How to split a string and swap the result in bash
I have a column that is the output of an annotation command (bcftools query -f '%POS\t%REF\t%ALT\t%BCSQ\n').
The column I'm interested in is the amino acid position and amino acid change for SNPs, like so:
402Y>402H
How can I use bash to swap these two around, but only if the value contains a ">"? Synonymous changes would just be 402Y, for example.
So the result would be
402H>402Y
Thanks
It would help to provide some example input, but in general you can do this:
$ echo "402Y>402H" | perl -pe 's/(\d+)(\w)>(\d+)(\w)/$1$4>$3$2/'
402H>402Y
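Since the question asks for bash, a pure-bash variant might look like the sketch below. It relies only on parameter expansion, leaves values without a ">" (synonymous changes) untouched, and does not care which character precedes the ">". The function name swap_change is hypothetical and not from the thread.

```bash
#!/usr/bin/env bash

# swap_change (hypothetical helper): swap the two sides of "A>B".
# Values that contain no ">" are printed unchanged.
swap_change() {
    local value=$1
    if [[ $value == *">"* ]]; then
        # ${value#*>}  : everything after the first ">"
        # ${value%%>*} : everything before the first ">"
        printf '%s>%s\n' "${value#*>}" "${value%%>*}"
    else
        printf '%s\n' "$value"
    fi
}

swap_change "402Y>402H"   # prints 402H>402Y
swap_change "75*>75Q"     # prints 75Q>75*
swap_change "402Y"        # prints 402Y
```

Quoting the argument matters here, because an unquoted * could be expanded by the shell before the function sees it.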
4.7 years ago by JC
How about trying this command?
$ cat in.txt
123520 T C missense Rv0104 Rv0104 protein_coding + 402Y>402H 123520T>C
199470 T G missense mce1A Rv0169 protein_coding + 313S>313A 199470T>G
199470 T G missense mce1A Rv0169 protein_coding + 75*>75Q 199470T>G
$ awk 'BEGIN{OFS="\t"} {if (split($9,a,">")==2) $9=a[2]">"a[1]; $1=$1; print}' in.txt
123520 T C missense Rv0104 Rv0104 protein_coding + 402H>402Y 123520T>C
199470 T G missense mce1A Rv0169 protein_coding + 313A>313S 199470T>G
199470 T G missense mce1A Rv0169 protein_coding + 75Q>75* 199470T>G
4.7 years ago by wm
Great, thanks.
I have a couple of questions though -
1) it seems to fail where I have 'stop lost' annotations:
75*>75Q
I presume this is because the * appears just before the >.
2) Do you know how I can do this in-place in the original file/table? For example the first two rows look like this:
I just need to apply this change to the 9th column here. Thanks
echo "75*>75Q" | perl -pe 's/(\d+)(.)>(\d+)(.)/$1$4>$3$2/'
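The "." matches any single character, so it also covers the *. The command prints the result to standard output, which you can redirect to a new file; editing the original file in place is not recommended.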
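For the table case, one way to restrict the swap to the 9th column and write to a new file is sketched below. It assumes the table is tab-separated; the function name swap_col9 and the temporary file names are hypothetical, and the sample row is copied from the awk answer in this thread.

```bash
#!/usr/bin/env bash

# swap_col9 (hypothetical helper): read table $1, swap "A>B" in column 9,
# write the result to $2. Rows whose column 9 has no ">" pass through unchanged.
swap_col9() {
    awk 'BEGIN { FS = OFS = "\t" }
         split($9, a, ">") == 2 { $9 = a[2] ">" a[1] }
         { print }' "$1" > "$2"
}

# Demo on a single tab-separated row in a scratch directory.
tmpdir=$(mktemp -d)
printf '199470\tT\tG\tmissense\tmce1A\tRv0169\tprotein_coding\t+\t75*>75Q\t199470T>G\n' > "$tmpdir/in.txt"

swap_col9 "$tmpdir/in.txt" "$tmpdir/out.txt"
cut -f9 "$tmpdir/out.txt"   # prints 75Q>75*
```

Once the new file has been checked, it can be moved over the original with mv if an in-place result is really needed.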