I want to extract the annotation values (QUAL, BaseQRankSum, ClippingRankSum, DP, FS, MQRankSum, etc.) of the variants (SNPs and indels) called in my genome resequencing data so that I can:
- plot the distribution of these values before proceeding to stringent filtering;
- plot the correlation between several annotation values for the called variants.
Part of the variants_MA605.vcf file looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT MA605
scaffold_1111 62 . T A 61.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.358;ClippingRankSum=-1.231;DP=5;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=37.19;MQRankSum=-1.231;QD=12.35;ReadPosRankSum=0.358;SOR=1.022 GT:AD:DP:GQ:PL 0/1:2,3:5:73:90,0,73
scaffold_1111 301 . G A 119.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.227;ClippingRankSum=-1.598;DP=73;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=27.33;MQRankSum=1.356;QD=1.64;ReadPosRankSum=1.404;SOR=0.596 GT:AD:DP:GQ:PL 0/1:59,11:70:99:148,0,1738
scaffold_1111 340 . C T 105.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.547;ClippingRankSum=-0.490;DP=33;FS=9.645;MLEAC=1;MLEAF=0.500;MQ=22.79;MQRankSum=1.351;QD=3.21;ReadPosRankSum=1.116;SOR=2.799 GT:AD:DP:GQ:PL 0/1:23,10:33:99:134,0,601
I used SnpSift (part of SnpEff) with the following command:
java -jar SnpSift.jar extractFields variants_MA605.vcf CHROM POS ID AF QUAL > raw01VarMA605qual.txt
The output text file looks like this:
#CHROM POS ID AF QUAL
scaffold_1111 62 0.500 61.77
scaffold_1111 301 0.500 119.77
scaffold_1111 340 0.500 105.77
While extracting the QUAL values (and other string values: CHROM, REF, ALT) has been clear and straightforward, I am not able to pull the annotation values for AC, BaseQRankSum, ClippingRankSum, etc., because they are multiple annotation values packed into the INFO field. I have checked the documentation, but it was not clear to me and I have not been successful. How can I extract these INFO fields separately so that I can test for correlation between the annotation values?
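To make the goal concrete, here is a minimal Python sketch of the kind of extraction being asked about. It assumes a standard tab-delimited VCF in which INFO is the eighth column and entries are separated by semicolons. The function names and the "NA" placeholder for missing annotations are illustrative choices, not part of SnpSift or any other tool.

```python
def parse_info(info):
    """Split a VCF INFO string such as 'AC=1;AF=0.500' into a dict."""
    parsed = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # empty INFO column
        key, sep, value = item.partition("=")
        # Flag-type entries carry no '=' and are stored as True.
        parsed[key] = value if sep else True
    return parsed


def extract_annotations(lines, fields):
    """Yield [CHROM, POS, QUAL, <fields...>] for each variant record."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])  # INFO is the 8th VCF column
        # Annotations absent from a record are written as "NA".
        yield cols[:2] + [cols[5]] + [str(info.get(f, "NA")) for f in fields]


def write_table(vcf_path, out_path, fields):
    """Write the extracted values as a tab-delimited table with a header."""
    with open(vcf_path) as vcf, open(out_path, "w") as out:
        out.write("\t".join(["CHROM", "POS", "QUAL"] + fields) + "\n")
        for row in extract_annotations(vcf, fields):
            out.write("\t".join(row) + "\n")
```

For example, write_table("variants_MA605.vcf", "annotations_MA605.txt", ["BaseQRankSum", "DP", "FS", "MQRankSum"]) would produce one column per annotation (the output file name here is hypothetical). Missing values are written as "NA" so that the table can be loaded into R with read.delim without further cleanup.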
I have been using SnpSift to get the QUAL values into a text file and R to plot the distributions. Are there any tools other than SnpSift that might do a better job of extracting the annotations and producing the appropriate plots?
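For the correlation step, the calculation I have in mind is roughly the following. It is only a sketch: it assumes the annotation values have already been extracted into two equal-length lists of numbers, with None marking a variant where an annotation is missing. The function name pearson is illustrative, and Pearson correlation is just one possible choice of measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, skipping pairs with None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


# DP and QD of the three example records shown above.
dp = [5.0, 73.0, 33.0]
qd = [12.35, 1.64, 3.21]
print(pearson(dp, qd))
```

Three records are far too few to say anything meaningful. The call at the end only shows how two annotation columns would be passed in.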
Thanks in advance!
Thanks for spotting that. I was totally unaware that it had pulled the AF field values. Thanks for pointing that out to me.