awk code for filtering ExAC MAF values
9.6 years ago
basalganglia ▴ 40

I have a VCF as shown below, and I want to keep rows whose ExAC value is less than or equal to 0.02, as well as rows where the value is unknown ("."). How can I write awk code for this? I tried cat a.txt | awk '$6 <= "0.02"' | awk '$6 == "."' > but it does not work. The ExAC values are in column 6. Could you please help me?

Chr     Start   End     Ref     Alt     ExAC_ALL        ExAC_AFR        ExAC_AMR        ExAC_EAS        ExAC_FIN        ExAC_NFE        ExAC_OTH        ExAC_SAS        Otherinfo
1       12783   12783   G       A       .       .       .       .       .       .       .       .       0.5     881.62  27      1       12783   .       G       A       881.62  .       ABHet=0.279;ABHom=0.689;AC=33;AF=0.786;AN=42;BaseQRankSum=2.245;DP=1005;Dels=0.00;FS=0.000;HaplotypeScore=0.1330;InbreedingCoeff=0.0782;MLEAC=33;MLEAF=0.786;MQ=5.42;MQ0=949;MQRankSum=-0.409;OND=0.293;QD=1.77;ReadPosRankSum=-0.211;ANN=A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000438504|unprocessed_pseudogene||n.*1783C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000541675|unprocessed_pseudogene||n.*1416C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000423562|unprocessed_pseudogene||n.*1669C>T|||||1580|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000488147|unprocessed_pseudogene||n.*1351C>T|||||1621|,A|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000538476|unprocessed_pseudogene||n.*1583C>T|||||1628|,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328|processed_transcript|2/2|n.468+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000515242|transcribed_unprocessed_pseudogene|2/2|n.465+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000518655|transcribed_unprocessed_pseudogene|2/3|n.481+62G>A||||||,A|intron_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000450305|transcribed_unprocessed_pseudogene|3/5|n.182+86G>A||||||   GT:AD:DP:GQ:PL  0/1:3,25:27:15:102,0,15
awk filter • 2.7k views
9.6 years ago
Ram 44k

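Quotes are not needed around numbers; writing "0.02" in quotes makes awk compare the field as a string rather than as a number. Piping one awk into a second one also means a row has to pass both tests, so only the "." rows would survive. Combine the two conditions with || instead:

awk '$6 <= 0.02 || $6 == "."' a.txt

In case that doesn't work, try using " " as a delimiter by passing the stream through tr -s " " first.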
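A slightly more explicit variant, as a sketch only: it assumes the table is tab-separated with a single header line and ExAC_ALL in column 6. The file names sample.txt and filtered.txt and the sample values are made up for illustration.

```shell
# Build a tiny tab-separated sample (hypothetical values) to try the filter on.
printf 'Chr\tStart\tEnd\tRef\tAlt\tExAC_ALL\n'  > sample.txt
printf '1\t12783\t12783\tG\tA\t.\n'            >> sample.txt
printf '1\t13000\t13000\tC\tT\t0.02\n'         >> sample.txt
printf '1\t14000\t14000\tA\tG\t0.35\n'         >> sample.txt
printf '1\t15000\t15000\tT\tC\t4.56E-02\n'     >> sample.txt

# Keep the header (NR == 1), rows without an ExAC entry ("."),
# and rows whose ExAC_ALL is numerically <= 0.02.
# Adding 0 forces a numeric comparison for the last test.
awk -F'\t' 'NR == 1 || $6 == "." || ($6 + 0) <= 0.02' sample.txt > filtered.txt
```

Testing for "." before the numeric comparison keeps the intent visible instead of relying on how awk happens to order strings, and the NR == 1 test stops the header line from being dropped.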


Yes, it works, but the file also includes numbers such as 0,04561, which is shown as 4,56E-02.

I don't understand the sentence "try using " " as a delimiter by passing the stream through tr -s " " first." Could you please explain it?

Thanks

BG
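On the tr question: tr -s " " squeezes every run of consecutive spaces into a single space, so each column is separated by exactly one delimiter. (awk's default field splitting already treats runs of blanks as one separator, so the step mainly matters when a single space is set as the delimiter explicitly.) The sketch below also handles decimal commas, on the assumption that they really are in the file and not just in how a spreadsheet program displays it. The file names and sample rows are hypothetical.

```shell
# Hypothetical sample: columns separated by runs of spaces, decimal commas in column 6.
printf 'chr1   100   100   G   A   0,04561\n'  > spaced.txt
printf 'chr1   200   200   C   T   0,01\n'    >> spaced.txt
printf 'chr1   300   300   A   G   .\n'       >> spaced.txt

# Squeeze each run of spaces into one space.
tr -s " " < spaced.txt > squeezed.txt

# Copy column 6 into v, replace a decimal comma with a point, then
# keep the row if the value is "." or numerically <= 0.02.
# LC_ALL=C makes sure "." is the decimal separator awk expects.
LC_ALL=C awk '{ v = $6; sub(/,/, ".", v); if (v == "." || (v + 0) <= 0.02) print }' squeezed.txt > kept.txt
```

Working on a copy of the field (v) leaves the original line untouched, so the rows written to kept.txt still show the value exactly as it appeared in the input.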

9.6 years ago

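A simple awk '$6<0.02' a.txt would do, since the "." would be included in that filter: "." is not a number, so awk falls back to a string comparison, and "." sorts before "0". Use <= instead of < if rows with a value of exactly 0.02 should be kept as well.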
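A quick way to check this behaviour on a few made-up values, using <= so that a value of exactly 0.02 is kept, as the question asks. This is only an illustration; LC_ALL=C is set on the assumption that plain byte-order string comparison is wanted.

```shell
# Numeric-looking fields are compared as numbers; "." and the header text are
# compared as strings against "0.02". "." sorts before "0", "E" sorts after it.
result=$(printf '.\n0.01\n0.02\n0.5\nExAC_ALL\n' | LC_ALL=C awk '$1 <= 0.02')
printf '%s\n' "$result"
```

The "." line passes only because of string ordering, and the header is dropped for the same reason, so an explicit $6 == "." test is the more readable choice when the filter has to be maintained.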
