I have this data:
##sequence-region Q75T13 1 641
Q75T13 UniProtKB Chain 1 641 . . . ID
Q75T13 UniProtKB Topological domain 1 60 . . . Note=Cytoplasmic
Q75T13 UniProtKB Transmembrane 61 85 . . . Note=Helical
Q75T13 UniProtKB Topological domain 86 641 . . . Note=Lumenal
##sequence-region Q9BRR3 1 403
Q9BRR3 UniProtKB Chain 1 403 . . . ID
Q9BRR3 UniProtKB Topological domain 1 22 . . . Note=Lumenal
Q9BRR3 UniProtKB Transmembrane 23 43 . . . Note=Helical
Q9BRR3 UniProtKB Topological domain 44 259 . . . Note=Cytoplasmic
##sequence-region Q96FM1 1 250
Q96FM1 UniProtKB Topological domain 120 135 . . . Note=Cytoplasmic
Q96FM1 UniProtKB Transmembrane 136 156 . . . Note=Helical
Q96FM1 UniProtKB Topological domain 157 169 . . . Note=Lumenal
Q96FM1 UniProtKB Transmembrane 170 190 . . . Note=Helical
Q96FM1 UniProtKB Topological domain 191 250 . . . Note=Lumenal
And I was wondering what the awk code would look like for:
For every row that contains the word Lumenal: if the previous row contains the word Transmembrane, subtract 12 from column 4 and print the Lumenal row. If the next row contains the word Transmembrane, add 12 to column 5 and print the Lumenal row. A row that meets both conditions is printed twice, once for each adjustment. The final file would be:
Q75T13 UniProtKB Topological domain 74 641 . . . Note=Lumenal
Q9BRR3 UniProtKB Topological domain 1 34 . . . Note=Lumenal
Q96FM1 UniProtKB Topological domain 145 169 . . . Note=Lumenal
Q96FM1 UniProtKB Topological domain 157 181 . . . Note=Lumenal
Q96FM1 UniProtKB Topological domain 179 250 . . . Note=Lumenal
Can someone help me? I am a little bit stuck. I have been trying with awk and grep.
Try this in R:
Use Python instead of an awk script and save the code somewhere. This is not a trivial reformatting issue, and you'll revisit this exact code at some point in the future, so do not waste your time writing a "throw-away" script.
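A minimal sketch of how the rule from the question could look in Python. This is only one possible reading, not a definitive implementation. The function names (`parse_rows`, `is_tm`, `extend_lumenal`) are made up for illustration. It assumes the columns are whitespace-separated as displayed in the question, that `##` lines are skipped, and that "previous" and "next" row only count when they belong to the same accession.

```python
import re

# One feature row: accession, source, feature type (which may contain a
# space, e.g. "Topological domain"), start, end, then the remaining columns.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s+(\d+)\s+(\d+)\s+(.*)$")

SHIFT = 12  # number of residues taken from the question


def parse_rows(lines):
    """Return (accession, source, type, start, end, rest) per feature row."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and ##sequence-region directives
        m = ROW.match(line)
        if m:
            acc, src, ftype, start, end, rest = m.groups()
            rows.append((acc, src, ftype, int(start), int(end), rest))
    return rows


def is_tm(row, acc):
    """True if row is a Transmembrane feature of the same accession."""
    return row is not None and row[0] == acc and row[2].lower() == "transmembrane"


def extend_lumenal(lines, shift=SHIFT):
    """Apply the rule from the question and return the output lines.

    Assumption: a Lumenal row flanked by helices on both sides yields two
    lines, one per adjustment, as in the expected output of the question.
    """
    rows = parse_rows(lines)
    out = []
    for i, (acc, src, ftype, start, end, rest) in enumerate(rows):
        if "lumenal" not in rest.lower():
            continue
        prev_row = rows[i - 1] if i > 0 else None
        next_row = rows[i + 1] if i + 1 < len(rows) else None
        if is_tm(prev_row, acc):
            # helix before the Lumenal domain: move the start back
            out.append(" ".join([acc, src, ftype, str(start - shift), str(end), rest]))
        if is_tm(next_row, acc):
            # helix after the Lumenal domain: move the end forward
            out.append(" ".join([acc, src, ftype, str(start), str(end + shift), rest]))
    return out
```

It could be called as `print("\n".join(extend_lumenal(open("features.gff"))))`, where `features.gff` is a placeholder file name. If the real file is tab-delimited, splitting each line on tabs would be more robust than the regular expression. The sketch does not clamp the shifted coordinates to the sequence range.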
This is the version in csv:
And the output:
@ Rafael Soler
Why did you delete the post?