DiscoSNP++ 2.1.7 vs 2.2.0: huge difference in the number of SNPs called
9.2 years ago
tkitapci ▴ 60

Hi,

I ran discoSNP++ twice on the same dataset, once with version 2.1.7 and once with version 2.2.0, using the provided run_discoSNP++.sh script with all default parameters.

The VCF file created by 2.1.7 contains 865,316 SNPs, while 2.2.0 calls only 72,151 SNPs. Is this expected? Does 2.2.0 use much stricter parameters for calling SNPs?

Thanks

Best Regards
T. Hamdi Kitapci

discosnp • 1.9k views
9.2 years ago

Hi,

One of the major novelties in disco 2.2.0 is that it automatically detects the coverage threshold below which k-mers are removed, because they are considered to contain a sequencing error.
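To make the effect of such a threshold concrete, here is a minimal sketch in Python of abundance-based k-mer filtering. It is not DiscoSnp++ code: the function names, the toy reads, and the two threshold values are made up for illustration, and it ignores details such as reverse complements.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_abundance):
    """Keep only k-mers seen at least `min_abundance` times."""
    return {kmer for kmer, n in counts.items() if n >= min_abundance}


# Toy data (hypothetical): five identical reads, plus one read that differs
# at a single position. That difference could be a sequencing error or a
# real, poorly covered variant; the counts alone cannot tell them apart.
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
counts = count_kmers(reads, k=4)

kept_low = solid_kmers(counts, min_abundance=1)   # permissive threshold
kept_high = solid_kmers(counts, min_abundance=3)  # stricter threshold

print(len(kept_low), len(kept_high))
```

With the permissive threshold, the k-mers spanning the rare difference survive and could give rise to a predicted variant; with the stricter one they are discarded, whether they came from an error or from a real variant.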

In 2.1.7, this value is 3 by default. In 2.2.0, the detected value can be found in the log (on the line starting with "thresholds").
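If the log is long, a few lines of Python can pull out that line. This is only a sketch: the helper name is made up, and the example log content below is a placeholder, not real DiscoSnp++ output. The only thing taken from the tool is that the relevant line starts with "thresholds".

```python
def find_threshold_lines(log_lines, prefix="thresholds"):
    """Return the log lines that start with `prefix` (case-insensitive)."""
    return [
        line.strip()
        for line in log_lines
        if line.lstrip().lower().startswith(prefix)
    ]


# Placeholder log content, invented for this example only.
example_log = [
    "some unrelated line\n",
    "thresholds PLACEHOLDER\n",
    "another unrelated line\n",
]

print(find_threshold_lines(example_log))
```

For a real run, you would pass the opened log file instead of `example_log`, for example `with open(path_to_your_log) as fh: find_threshold_lines(fh)`, where `path_to_your_log` is wherever you saved the output of the run.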

Regarding your results, two possibilities exist, and they may both apply:

  • the variants predicted only by 2.1.7 were in fact generated by sequencing errors; they are false positives, and it is a good thing that 2.2.0 did not predict them
  • the automatic threshold detection was too stringent and, as a consequence, real k-mers were removed and the associated variants were not found.

It is difficult to say which one applies without a closer look at your study details, the automatic c value, and the data complexity and coverage.

Pierre


Hi Pierre,

Thanks a lot for the clarification. I will keep this in mind for my analysis.

Thanks

Best Regards

T. Hamdi Kitapci
