2.4 years ago
nmaden
I have the following input data in .tsv format (an excerpt from a file that contains >100k rows).
Chr:Pos ID REF ALT Q AD DP GQ GT
1:1246257 GGCT - 62.73 31,5 36 99 0/1
1:1267678 TGCA - 0 , ./.
1:17085502 rs2329006 T C 1171.77 20,52 72 99 0/1
1:17085564 rs3851921 A G 446.77 21,19 40 99 0/1
1:17085590 rs371449598 GCGCTG - 88.73 37,12 49 99 0/1
1:17085594 T C 0 , ./.
1:119964933 rs75429891 T C 373.77 26,19 45 99 0/1
1:120057158 rs6203 C T 1351.77 18,21 39 99 0/1
1:111783982 rs13721 C A 782.77 0,32 62 99 1/.
1:111783982 rs13721 C T 782.77 0,30 62 99 ./1
1:152195729 rs34061715 T - 576.73 0,17 21 38 ./1
2:73613032 rs193922695 - GGA 786.73 0,8 21 99 ./1
2:73613032 rs61156725 GGAGGA - 615.73 0,13 21 99 1/.
2:73613032 rs61156725 GGAGGA - 615.73 0,,13 21 99 2/.
2:241696841 rs71779512 TCC - 944.73 0,8 19 99 ./1
3:42251578 - GGA 1128.73 0,8 28 99 ./1
3:42251578 rs10530663 - GGA 1128.73 0,2,8 28 99 ./2
1:981931 rs2465128 A G 1100.77 10,37 47 99 0/1
1:982994 rs10267 T C 2661.77 54,28 82 99 0/1
1:984302 rs9442391 T C 1054.77 50,28 78 99 0/1
1:1007203 rs4633229 A G 372.77 9,7 16 99 0/1
1:1007432 rs4333796 G A 626.77 22,16 38 99 0/1
1:1246257 GGC - 62.73 31,5 36 99 0/1
1:1290276 rs75904949 C G 231.77 10,12 22 99 0/1
I want to remove the records whose GT is ./., 1/., ./1, 2/. or ./2 and print the rest of the table.
For this I used the awk command as below:
awk '($9!="./." && $9!="1/." && $9!="./1" && $9!="2/." && $9!="./2") {print $0}' input.tsv > output.tsv
The command runs without errors, but the output file still contains records with the GT values listed above.
Any help is appreciated.
Thanks
Show us such lines.
Also test:
awk -F '\t' '($9!="./." etc ...
I don't think this will work as intended.
When you use the 'negative' match ( != ), you most likely need the or operator ( || ) and not the and operator ( && ). Can you give that a try?
EDIT: never mind the above, just listen to Pierre Lindenbaum :)
It should work.
yeah, you're right. Let's see where OP takes this
(and I know better than to debate with you on awk topics, but I'm just in one of those moods ;) )
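For anyone following the && versus || exchange, here is a minimal sketch of why && is the right operator for chained != tests. The file names gt.txt, and.txt and or.txt are made up for illustration, and only two of the five GT values are tested to keep it short.

```shell
# Three GT values, one per line (hypothetical mini-input).
printf '0/1\n./.\n1/.\n' > gt.txt

# && keeps a row only if it differs from every unwanted value.
awk '$1!="./." && $1!="1/."' gt.txt > and.txt

# || is true for every row, because no value can equal both strings at once,
# so nothing gets filtered.
awk '$1!="./." || $1!="1/."' gt.txt > or.txt
```

With this input, and.txt keeps only the 0/1 line, while or.txt still holds all three lines.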
Hi, the code removes 1/., ./1, 2/. and ./2 but not ./. records. I tried varying it, but to no avail.
Thanks!
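A possible explanation, sketched under an assumption: the file really is tab-separated, and rows with no ID, DP or GQ have empty cells in those columns. By default awk splits on runs of whitespace, so empty cells vanish, the remaining fields shift left, and $9 is empty on exactly the rows that carry ./. in the example. Splitting on tabs, as suggested above, keeps GT in $9. The file names and the three rows below are a made-up miniature of the posted data.

```shell
# Tiny tab-separated sample; empty ID/DP/GQ cells are kept as empty fields.
printf '1:1246257\t\tGGCT\t-\t62.73\t31,5\t36\t99\t0/1\n'         >  sample.tsv
printf '1:1267678\t\tTGCA\t-\t0\t,\t\t\t./.\n'                    >> sample.tsv
printf '1:111783982\trs13721\tC\tA\t782.77\t0,32\t62\t99\t1/.\n'  >> sample.tsv

# Default splitting: field counts differ per row (8, 6 and 9 here),
# so $9 is not the GT column on the first two rows.
awk '{print NF}' sample.tsv > nf_default.txt

# Tab splitting: every row has 9 fields, so $9 is always GT.
awk -F '\t' '$9!="./." && $9!="1/." && $9!="./1" && $9!="2/." && $9!="./2"' \
    sample.tsv > output.tsv
```

Here output.tsv keeps only the 0/1 row. If the file turns out not to be tab-delimited, testing $NF (the last field) instead of $9 may be an alternative, assuming GT is always the last column.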