Question

Freebayes and SnpEff give different numbers of SNPs

1

10.0 years ago

merodev ▴ 150

Hello friends,

I called SNPs using Freebayes and then ran SnpEff on the Freebayes output file to see the effects. The number of SNPs for the same .vcf file is different in these two programs. Is there any suitable explanation for this? The number of SNPs reported by SnpEff on the Freebayes output file is higher than the number of SNPs counted in Freebayes.

Thanks for your help

SNP snpeff freebayes • 4.9k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by merodev ▴ 150

score 3 · Answer 1 · 2014-12-22

3

10.0 years ago

Laura ★ 1.8k

Is SnpEff reporting multiple consequences for the same site on different lines, so that what you are seeing is duplication rather than new sites?

ADD COMMENT • link 10.0 years ago by Laura ★ 1.8k

0

Almost certainly the issue here. The same happens with VEP and any other effect prediction tool that outputs multiple transcript effects for a single variant.

ADD REPLY • link 10.0 years ago by User 59 13k

Ram · Answer 2 · 2014-12-25

2

10.0 years ago

Pablo ★ 1.9k

If a VCF entry has two (or more) alts, SnpEff counts it as two (or more) SNPs.

For instance

CHR:1, POS:1234, REF:A, ALT:C

is 1 SNP (A>C) , whereas this entry

CHR:1, POS:1234, REF:A, ALT:C,G

is counted as 2 SNPs (A>C and A>G).

I don't know how freebayes counts, but I'm assuming it might only be counting the number of VCF entries (which SnpEff also reports in the HTML summary).
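To make the difference between the two counting schemes concrete, here is a minimal Python sketch. It is not SnpEff's or freebayes' actual code; the function name `count_snps` and the example records are made up for illustration, and it assumes a plain tab-separated VCF where a SNP allele is a single-base REF paired with a single-base ALT.

```python
def count_snps(vcf_lines):
    """Return (records, snp_alleles) for an iterable of VCF text lines.

    records     = number of data lines (one per VCF entry)
    snp_alleles = number of single-base ALT alleles on single-base REFs,
                  so a line with ALT "C,G" contributes 2.
    """
    records = 0
    snp_alleles = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        alts = fields[4].split(",")
        records += 1
        for alt in alts:
            # Ignore missing (".") and spanning-deletion ("*") alleles.
            if len(ref) == 1 and len(alt) == 1 and alt not in (".", "*"):
                snp_alleles += 1
    return records, snp_alleles


# Hypothetical records mirroring the two entries described above.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1234\t.\tA\tC\t50\t.\t.",
    "1\t5678\t.\tA\tC,G\t50\t.\t.",
]

records, snp_alleles = count_snps(example_vcf)
print(f"VCF entries: {records}, SNP alleles: {snp_alleles}")
```

With these two entries the per-line count is 2 while the per-allele count is 3, which is the kind of gap described above.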

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Pablo ★ 1.9k

0

Thank you Pablo. This was exactly what I was looking for!

ADD REPLY • link 10.0 years ago by merodev ▴ 150

Ram · Answer 3 · 2014-12-21

0

10.0 years ago

Dan D 7.4k

Variance amongst variant caller results is a well-known issue (notice I didn't use the word "problem") in analysis of high-throughput sequencing data.

In simple terms, the reason for the discrepancy is that the various variant callers take different overall approaches to filtering and preparing the files for SNP calling. There is a selection of algorithms to choose from for each step in the variant-calling process, in addition to the spectrum of thresholds for quality, depth, and other metrics.

Given this, you should first ask yourself what you think defines an interesting or valid variant in the context of your experimental setup. Make sure you can explain, in basic language, why the variant caller you're using selects the variants it does. If you do this, you can be more confident in the variants you do see, while also being less paranoid about false negatives.

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Dan D 7.4k

2

SnpEff is a tool to annotate an existing VCF file, not a different variant caller.

ADD REPLY • link 10.0 years ago by Fedor Gusev ▴ 210

0

Great answer, but for the wrong question :)

ADD REPLY • link 10.0 years ago by User 59 13k

0

Thank you for the comments. What I am trying to figure out is how SnpEff summarizes SNP counts in its output HTML. Grepping for "TYPE=snp" or "snp" alone gives a lower count than the SnpEff summary.
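One possible reason a grep comes out lower: grep counts matching lines, while the TYPE tag in freebayes output holds one value per ALT allele, separated by commas. The Python sketch below compares the two counts. The helper names and the example records are hand-written for illustration, and whether SnpEff's summary uses exactly this per-allele logic is an assumption, not something confirmed here.

```python
def grep_like_count(vcf_lines, pattern="TYPE=snp"):
    """Count data lines containing the pattern, like `grep -c` on non-header lines."""
    return sum(
        1 for line in vcf_lines
        if not line.startswith("#") and pattern in line
    )


def per_allele_snp_count(vcf_lines):
    """Count every 'snp' value in the comma-separated TYPE tag of the INFO column."""
    total = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]
        for entry in info.split(";"):
            if entry.startswith("TYPE="):
                total += entry[len("TYPE="):].split(",").count("snp")
    return total


# Hand-written example records (not real freebayes output).
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tC\t50\t.\tDP=10;TYPE=snp",
    "1\t200\t.\tA\tC,G\t50\t.\tDP=12;TYPE=snp,snp",
    "1\t300\t.\tAT\tA,AG\t50\t.\tDP=8;TYPE=del,snp",
]

print("grep-like line count:", grep_like_count(example_vcf))
print("per-allele SNP count:", per_allele_snp_count(example_vcf))
```

On these three records the line-based count is 2 and the per-allele count is 4: the multi-allelic line is counted once by grep, and the line whose TYPE tag starts with `del` never matches "TYPE=snp" at all.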

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by merodev ▴ 150