Total SNP calculation using command line vs picardtools
1
0
Entering edit mode
8.2 years ago
bitpir ▴ 250

Hello, I am trying to calculate the total number of SNPs in my VCF file. When I tried using Picardtool's CollectVariantCallingMetrics, it calculates 174387 SNPs. However, when I tried to use the command line to calculate my SNPs, I get 199320. The command line I used is as follow:

grep -v "#" VARIANT_VCF | wc -l

Does anyone know why there is a difference in 24933 SNPs called in the 2 calculation methods?

Thank you very much for your help!

picardtools SNP • 1.8k views
ADD COMMENT
0
Entering edit mode
8.2 years ago
igor 13k

Does your VCF contain both SNPs and indels? I think Picard only counts SNPs.

ADD COMMENT
0
Entering edit mode

Yes, my VCF contains both SNPs and indels. Under the CollectVariantCallingMetrics-generated file, theTOTAL SNP = 174387, TOTAL INDEL = 24542,TOTAL_MULTIALLELIC_SNPS = 6659, TOTAL_COMPLEX_INDELS=278. Adding those up, I get 205886 variants. I am using picardtools 2.0.1. I was reading GATK's website and it says that "Many factors affect these counts, including cohort size, relatedness between samples, strictness of filtering, ethnicity of samples, and even algorithm improvement due to updated software." I've also tried using picardtools 2.4.1 but did not get the same number as 2.0.1: TOTAL_SNP=174387, TOTAL_INDEL=24542, TOTAL_MULTIALLELIC_SNPS=646,TOTAL_COMPLEX_INDELS=292. Total variant = 199287.

ADD REPLY
0
Entering edit mode

Looks like TOTAL_SNP and TOTAL_INDEL stayed the same, which makes sense because those are simple events. The difference is in TOTAL_MULTIALLELIC_SNPS and TOTAL_COMPLEX_INDELS. They might be doing some overlapping. For example, a single indel may sometimes be split into multiple events by the variant caller and they might be trying to correct for that.

Since GATK discusses this tool on their website, it's probably a good idea to ask about it on the GATK forum.

ADD REPLY

Login before adding your answer.

Traffic: 1768 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6