Hi,
I want to filter SNPs from a VCF file, but I am not sure which parameters are good for SNP filtering. Several things in my VCF file confuse me. Here are some lines of the VCF output.
FORMAT INFO
**#CHROM POS ID REF ALT QUAL FILTER GT AD DP GQ PL AC AF AN INFO**
Chr01 16434 . T A 32.77 . 0/1 53,6 59 61 61,0,2212 AC=1 0.500 2 BaseQRankSum=0.226 ClippingRankSum=0.000 DP=135 ExcessHet=3.0103 FS=1.850 MLEAC=1 MLEAF=0.500 MQ=32.25 MQRankSum=1.082
Chr01 103148 . C A 1017.77 . 0/1 25,3 55 99 1046,0,886 AC=1 0.500 2 BaseQRankSum=0.009 ClippingRankSum=0.000 DP=55 ExcessHet=3.0103 FS=0.000 MLEAC=1 MLEAF=0.500 MQ=60.20 MQRankSum=0.949
Chr01 15650 . C A 424.77 . 0/1 3,11 14 58 453,0,58 AC=1 0.500 2 BaseQRankSum=0.853 ClippingRankSum=0.000 DP=25 ExcessHet=3.0103 FS=0.000 MLEAC=1 MLEAF=0.500 MQ=49.38 MQRankSum=0.585 QD=30.34 ReadPosRankSum=1.479 SOR=0.760
Chr01 15651 . C A 424.77 . 0/1 3,11 14 58 453,0,58 AC=1 0.500 2 BaseQRankSum=0.763 ClippingRankSum=0.000 DP=25 ExcessHet=3.0103 FS=0.000 MLEAC=1 MLEAF=0.500 MQ=49.38 MQRankSum=0.585 QD=30.34 ReadPosRankSum=1.481 SOR=0.760
Looking at the first record, AD=53,6. I understand this to mean that 53 reads carry the reference allele and 6 reads carry the alternate allele. Is that correct? If not, please tell me what it means. And if I am right, is this a good SNP?
My second question: some SNPs have a different DP in the INFO and FORMAT columns. What should I do with those? From what I have read, DP in the INFO column is the total read depth and DP in the FORMAT column is the allelic depth, so it would seem better to select SNPs on the basis of allelic depth. Please explain how I should select the SNPs.
Thanks in advance
Why do you want to filter them? What is your ultimate goal? These are all parameters you can use to filter the file, but only once you're clear on exactly what you need.
Thanks for replying. I want to keep only true SNPs, but before that I want to understand the results.
I assume you're looking for true variants and want to avoid false positives. If you're looking for polymorphisms, you might need to set some criteria based on population allele frequency and also look into phenotypic effects.
As a filtering tool, you can use SnpSift.
That being said, first things first: as @Ram asked, why do you want to filter, and what question are you trying to answer?
A nice worked example of filtering decisions can be found in section 8 of this page: http://userweb.eng.gla.ac.uk/cosmika.goswami/snp_calling/SNPCalling.html
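To make the idea of threshold-based filtering concrete, here is a minimal Python sketch applied to three of the records posted above. It is not SnpSift and not the method from the linked page. The cut-offs (`min_qual`, `min_mq`, `max_fs`) and the helper names are assumptions chosen only for illustration; sensible values depend on your data and on the question you are trying to answer.

```python
# Illustrative cut-offs only (assumptions, not recommendations from this thread).
THRESHOLDS = {"min_qual": 30.0, "min_mq": 40.0, "max_fs": 60.0}


def parse_info(info):
    """Turn 'KEY=VAL KEY=VAL' (space- or ';'-separated) into a dict."""
    fields = {}
    for item in info.replace(";", " ").split():
        key, _, value = item.partition("=")
        try:
            fields[key] = float(value)
        except ValueError:
            fields[key] = value  # keep flags and non-numeric values as text
    return fields


def passes(qual, info, thresholds=THRESHOLDS):
    """Return True if the record survives all three hard filters."""
    ann = parse_info(info)
    if qual < thresholds["min_qual"]:
        return False
    # A missing annotation does not fail the record in this sketch.
    if ann.get("MQ", float("inf")) < thresholds["min_mq"]:
        return False
    if ann.get("FS", 0.0) > thresholds["max_fs"]:
        return False
    return True


# (label, QUAL, subset of the INFO annotations) copied from the posted records.
records = [
    ("Chr01:16434", 32.77, "DP=135 FS=1.850 MQ=32.25"),
    ("Chr01:103148", 1017.77, "DP=55 FS=0.000 MQ=60.20"),
    ("Chr01:15650", 424.77, "DP=25 FS=0.000 MQ=49.38 QD=30.34 SOR=0.760"),
]

results = {label: passes(qual, info) for label, qual, info in records}
for label, ok in results.items():
    print(label, "PASS" if ok else "FAIL")
```

With these assumed cut-offs the first record fails on mapping quality (MQ=32.25) while the other two pass. Changing the thresholds changes the outcome, which is why the goal of the analysis needs to be settled first.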
Thank you. I have used most of the tools, but every time I am left with the same question: is it a true SNP or not? Can you please tell me why the DP value is different in the INFO and FORMAT columns?
Thanks
The difference between the DP field and the AD field is:
Thank you for your informative response. I have read about this. Can you please tell me why the AD value is always smaller than the DP value in my result file? There is actually a huge difference between AD and DP in my results. I read that the sum of AD may differ from the individual sample depth, especially when there are many non-informative reads. Does that mean that most of the reads aligned to those positions in my data are non-informative or not properly aligned? Thanks
Reads that are filtered out and not used for calling are not counted in the FORMAT DP value, but they are included in AD.
So does that mean AD >= DP? Am I right or not? I am sorry to bother you so much, but I want to get the concepts clear because I am new to analysing this type of data.
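Not always. Filtered reads are counted in AD but not in DP, whereas uninformative reads are counted in DP but not in AD, so the sum of AD can be larger or smaller than DP. In your records it is smaller or equal (for example, 25+3=28 against DP=55), which is consistent with many uninformative reads at those sites.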
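As a quick check against the four records posted at the top of the thread, the following minimal Python sketch compares the sum of AD with the FORMAT DP and INFO DP values and also reports the fraction of reads supporting the alternate allele. The function and field names are made up for this illustration, and the numbers are copied by hand from the post rather than parsed from a real VCF.

```python
# (label, AD, FORMAT DP, INFO DP) copied from the records posted in the question.
records = [
    ("Chr01:16434", "53,6", 59, 135),
    ("Chr01:103148", "25,3", 55, 55),
    ("Chr01:15650", "3,11", 14, 25),
    ("Chr01:15651", "3,11", 14, 25),
]


def summarize(label, ad, format_dp, info_dp):
    """Compare allele depths with the two DP values for one biallelic record."""
    ref_reads, alt_reads = (int(x) for x in ad.split(","))
    ad_sum = ref_reads + alt_reads
    return {
        "label": label,
        "ad_sum": ad_sum,
        "format_dp": format_dp,
        "info_dp": info_dp,
        # Fraction of informative reads that support the alternate allele.
        "alt_fraction": alt_reads / ad_sum if ad_sum else float("nan"),
    }


summaries = [summarize(*rec) for rec in records]
for s in summaries:
    print(
        f"{s['label']}: sum(AD)={s['ad_sum']} FORMAT_DP={s['format_dp']} "
        f"INFO_DP={s['info_dp']} alt_fraction={s['alt_fraction']:.2f}"
    )
```

For these records the sum of AD never exceeds FORMAT DP, and for Chr01:103148 it is much lower (28 against 55). The first record is called 0/1 with only 6 of 59 informative reads supporting the alternate allele, which is one reason to look at it more closely before trusting it.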