The necessary coverage depends on the platform and run mode, too. Illumina's newer NextSeq platform, for example, has much lower quality and much less accurate quality scores than their top-quality MiSeq platform, as well as shorter reads. All three of those factors influence how much coverage is needed to accurately call variants. WGS needs lower coverage than exon-capture, though, because it has less bias. Using a NextSeq instead of a HiSeq/MiSeq might double your coverage target, and exon-capture might triple it.
Additionally, Illumina's newer software versions with quantized quality scores are simply not very good for calling variants, which would again increase the necessary coverage for a given confidence level. It's possible to recalibrate the quality scores, which will restore the full quality-score range and thus make it possible to more accurately distinguish SNVs from sequencing error, reducing the necessary coverage. But it's better to just select a platform that does not quantize quality scores in the first place. The newer 2-dye chemistries also seem to decrease quality, and patterned flow-cells decrease average insert size (longer inserts help resolve repeats), so the newer platforms with 2-dye chemistry or patterned flow-cells need more coverage for accurate variant calling.
I'm currently evaluating some NextSeq data from a fungus with 120x coverage. Some of the SNPs are present in 97% of reads; it's pretty obvious they are real. Some are present in 1 read only; they appear to be sequencing error. Some are present in around 25% of reads, with a fairly low average quality score. I'm really not sure about those - are they real? Sequencing error? A collapsed 4-copy repeat in the assembly? If this were MiSeq or HiSeq 2500 data, it would be obvious. But with current NextSeq data, the lowest possible quality score is 14, which indicates over 95% confidence that the call is correct. I have no idea what they are. Other variants are scattered across the whole coverage scale, between 2x and 120x. With inaccurate calls and quality scores, it's impossible to accurately call any variants or their ploidy unless you do massive oversequencing, and 30x would absolutely not be sufficient for a haploid, let alone a diploid.
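To make the Q14 arithmetic concrete, here is a minimal Python sketch. It converts a Phred score to an error probability, then asks how likely it would be to see a 25% allele fraction at 120x purely from sequencing error. The big assumptions are that errors are independent between reads and that the quality scores are accurate, which is exactly what is in doubt for this kind of data, so treat the result as a best-case bound and not as a variant caller. The function names and the "error splits evenly over the 3 wrong bases" simplification are mine, for illustration only.

```python
from math import comb

def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Q14 is the lowest score mentioned in the post.
q14_error = phred_to_error_prob(14)
print(f"Q14 error probability: {q14_error:.4f}")
print(f"Q14 confidence:        {1 - q14_error:.4f}")

# Assumption: every base at the site is as bad as Q14, and an error is
# equally likely to produce any of the 3 wrong bases.
depth = 120
alt_reads = 30  # about 25% of reads
p_same_wrong_base = q14_error / 3

p_by_chance = binom_tail(depth, alt_reads, p_same_wrong_base)
print(f"P(>= {alt_reads} identical errors in {depth} reads): {p_by_chance:.3g}")
```

If the scores could be trusted, a 25% allele would essentially never arise from independent errors, so the real question becomes whether the errors are systematic or the scores are wrong, and neither shows up in a calculation like this.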
What Is Considered A Good Coverage Depth In Exon Capture Seq
https://www.ncbi.nlm.nih.gov/pubmed/18987734
Thank you very much.
In my opinion it really depends on what your research question is. If it is disease/clinical related, you would like to be sure that a variant is there, and you don't want the hassle of validating variants with Sanger sequencing, so 30X is relatively good coverage. Usually a heterozygosity rate of <75% is used, so that would mean that at least 7 reads are needed to call a variant in a 30x-covered piece of genome... For a longer discussion, see also this post: What Is Considered A Good Coverage Depth In Exon Capture Seq
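As a rough check on the "at least 7 reads at 30x" rule of thumb, the sketch below computes how often a true heterozygous site would actually reach that many variant reads. It assumes an idealized binomial model (each read carries the variant allele with probability 0.5, no capture or mapping bias, exact depth at the site), and the function name is just for illustration.

```python
from math import comb

def het_detection_prob(depth, min_alt_reads, alt_fraction=0.5):
    """P(at least min_alt_reads variant reads) at a site with the given depth,
    assuming reads sample the variant allele independently with alt_fraction."""
    return sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )

# The rule of thumb from the post: 7 supporting reads.
# The depths other than 30 are arbitrary comparison points.
for depth in (10, 15, 20, 30):
    p = het_detection_prob(depth, min_alt_reads=7)
    print(f"{depth:>3}x: P(>= 7 variant reads at a het site) = {p:.4f}")
```

Under these assumptions almost every heterozygous site passes at a true 30x, but keep in mind that 30x is an average: real depth varies from site to site, and the pass rate drops quickly at the low-coverage ones.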
Thanks a lot for the reply:)
Here's a more recent analysis of sensitivity vs. read depth for WGS and WXS: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-247
Thank you for the information
Another reference, about advised coverage in exome sequencing: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-195
But as said by others, it really depends on what you are doing: de novo sequencing or resequencing, short or long reads, CNV detection or SNP detection, research or diagnostic, ...
I do SNP detection. Thanks a lot for the information.