awk command for filtering
nmaden, 2.4 years ago

I have the following input data in .tsv format (an excerpt from a file with more than 100k lines):

Chr:Pos ID  REF ALT Q   AD  DP  GQ  GT
1:1246257       GGCT    -   62.73   31,5    36  99  0/1
1:1267678       TGCA    -   0   ,           ./.
1:17085502  rs2329006   T   C   1171.77 20,52   72  99  0/1
1:17085564  rs3851921   A   G   446.77  21,19   40  99  0/1
1:17085590  rs371449598 GCGCTG  -   88.73   37,12   49  99  0/1
1:17085594      T   C   0   ,           ./.
1:119964933 rs75429891  T   C   373.77  26,19   45  99  0/1
1:120057158 rs6203  C   T   1351.77 18,21   39  99  0/1
1:111783982 rs13721 C   A   782.77  0,32    62  99  1/.
1:111783982 rs13721 C   T   782.77  0,30    62  99  ./1
1:152195729 rs34061715  T   -   576.73  0,17    21  38  ./1
2:73613032  rs193922695 -   GGA 786.73  0,8 21  99  ./1
2:73613032  rs61156725  GGAGGA  -   615.73  0,13    21  99  1/.
2:73613032  rs61156725  GGAGGA  -   615.73  0,,13   21  99  2/.
2:241696841 rs71779512  TCC -   944.73  0,8 19  99  ./1
3:42251578      -   GGA 1128.73 0,8 28  99  ./1
3:42251578  rs10530663  -   GGA 1128.73 0,2,8   28  99  ./2
1:981931    rs2465128   A   G   1100.77 10,37   47  99  0/1
1:982994    rs10267 T   C   2661.77 54,28   82  99  0/1
1:984302    rs9442391   T   C   1054.77 50,28   78  99  0/1
1:1007203   rs4633229   A   G   372.77  9,7 16  99  0/1
1:1007432   rs4333796   G   A   626.77  22,16   38  99  0/1
1:1246257       GGC -   62.73   31,5    36  99  0/1
1:1290276   rs75904949  C   G   231.77  10,12   22  99  0/1

I want to filter out the records whose GT is ./., 1/., ./1, 2/. or ./2 and print the rest of the table.

For this I used the following awk command:

awk '($9!="./." && $9!="1/." && $9!="./1"  && $9!="2/." && $9!="./2") {print $0}'  input.tsv > output.tsv

The command runs without errors, but the output file still contains records with these GT values.

Any help is appreciated.

Thanks

but the output file still has the records with the above characters.

Show us such lines.
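One quick way to show how awk sees such lines is to print NF and $9 for each one. The sketch below assumes the file is truly tab-separated (the excerpt looks like it, but tabs are not visible here) and rebuilds one of the ./. rows with real tabs; the variable names are just for illustration:

```shell
# One of the ./. rows from the excerpt, rebuilt with real tabs (ID, DP and GQ are empty)
line=$(printf '1:1267678\t\tTGCA\t-\t0\t,\t\t\t./.')

# Default FS: runs of spaces/tabs count as ONE separator, so the empty fields vanish
default_view=$(echo "$line" | awk '{ print "NF=" NF " $9=[" $9 "]" }')

# -F '\t': every single tab is a separator, so the empty fields are kept
tab_view=$(echo "$line" | awk -F '\t' '{ print "NF=" NF " $9=[" $9 "]" }')

echo "default FS: $default_view"   # default FS: NF=6 $9=[]
echo "tab FS:     $tab_view"       # tab FS:     NF=9 $9=[./.]
```

On the real file, awk '{ print NR ": NF=" NF " $9=[" $9 "]" }' input.tsv | head -n 20 lists the same information per line; any row reporting NF below 9 has lost empty fields to the default separator.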

Also try splitting on tabs only: awk -F '\t' '($9!="./." etc ...
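A minimal sketch of that suggestion, applied to a few rows from the excerpt. It assumes tab-separated input; sample.tsv and filtered.tsv are hypothetical names standing in for input.tsv and output.tsv:

```shell
# Hypothetical sample.tsv: header plus three rows from the excerpt, with real tabs
printf 'Chr:Pos\tID\tREF\tALT\tQ\tAD\tDP\tGQ\tGT\n'                 > sample.tsv
printf '1:1267678\t\tTGCA\t-\t0\t,\t\t\t./.\n'                      >> sample.tsv
printf '1:17085502\trs2329006\tT\tC\t1171.77\t20,52\t72\t99\t0/1\n' >> sample.tsv
printf '1:111783982\trs13721\tC\tA\t782.77\t0,32\t62\t99\t1/.\n'    >> sample.tsv

# Same condition as in the question, but split on tabs only, so GT is always $9.
# A pattern with no action prints the whole line, so {print $0} is not needed.
awk -F '\t' '$9 != "./." && $9 != "1/." && $9 != "./1" && $9 != "2/." && $9 != "./2"' \
    sample.tsv > filtered.tsv

cat filtered.tsv   # header and the 0/1 row only
```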

I don't think this will work as intended.

When you use the 'negative' match (!=), you likely need the or operator (||) rather than the and operator (&&). Can you give that a try?

EDIT: never mind the above, just listen to Pierre Lindenbaum :)
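For the record, && is the right operator here. The goal is "GT is not any of these values", and by De Morgan's law not (A or B) equals (not A) and (not B). With ||, no value can equal two different strings at once, so at least one != test is always true and every line passes. A two-value check (gt.txt is a throwaway, hypothetical file):

```shell
# Two GT values: 0/1 should be kept, ./. should be dropped
printf '0/1\n./.\n' > gt.txt

# ||: every value differs from at least one of the two strings, so nothing is dropped
or_result=$(awk '$1 != "./." || $1 != "1/."' gt.txt)

# &&: a line passes only if it differs from every excluded value
and_result=$(awk '$1 != "./." && $1 != "1/."' gt.txt)

echo "with ||:"; echo "$or_result"    # 0/1 and ./.
echo "with &&:"; echo "$and_result"   # 0/1
```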

It should work:

echo -e '1.1\n./2' |  awk '($1!="./." && $1!="1/." && $1!="./1"  && $1!="2/." && $1!="./2")'
1.1

Yeah, you're right. Let's see where the OP takes this.

(and I know better than to debate with you on awk topics, but I'm just in one of those moods ;) )

Hi, the code removes 1/., ./1, 2/. and ./2, but not ./.; I tried several variations, to no avail.

Thanks!
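If the file is tab-separated as it appears, the surviving ./. rows are explained by their empty fields: in the excerpt, both ./. rows have empty ID, DP and GQ columns, the default whitespace splitting collapses them, and GT ends up in $6 rather than $9 (the ./1 row at 3:42251578 has an empty ID too, so it would survive for the same reason). With -F '\t' the original condition works. One possible variation, sketched below, keeps the excluded genotypes in an awk array so the list is easier to extend; skip, bad, sample.tsv and filtered.tsv are hypothetical names:

```shell
# Hypothetical sample.tsv with real tabs: header, a ./. row with empty fields, a 2/. row, a 0/1 row
printf 'Chr:Pos\tID\tREF\tALT\tQ\tAD\tDP\tGQ\tGT\n'                      > sample.tsv
printf '1:17085594\t\tT\tC\t0\t,\t\t\t./.\n'                             >> sample.tsv
printf '2:73613032\trs61156725\tGGAGGA\t-\t615.73\t0,,13\t21\t99\t2/.\n' >> sample.tsv
printf '1:981931\trs2465128\tA\tG\t1100.77\t10,37\t47\t99\t0/1\n'        >> sample.tsv

awk -F '\t' '
    # Load the genotypes to exclude into the array "skip" once, before reading input
    BEGIN {
        n = split("./. 1/. ./1 2/. ./2", bad, " ")
        for (i = 1; i <= n; i++) skip[bad[i]] = 1
    }
    # Print every line whose GT (column 9) is not a key of "skip"
    !($9 in skip)
' sample.tsv > filtered.tsv

cat filtered.tsv   # header and the 0/1 row only
```

If every GT with a missing allele should be dropped rather than just these five, $9 !~ /\./ with the same -F '\t' is a shorter condition; whether that matches the intent here is an assumption.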
