What Coverage For Genome Re-Sequencing By Illumina?
12.1 years ago

Hello,

I was wondering what coverage you need for genome re-sequencing on Illumina (Illumina HiSeq 2000)?

I was told 100x, which seems high, but I read that people often seem to use a 20-30x coverage.

Moreover, is higher coverage needed to look for intra-population selective sweeps (from individual samples) than to investigate the genomic architecture of differentiation between sister species?

Thank you in advance for your answer.

illumina mapping coverage • 5.9k views

Which species do you intend to work with? Do you know the quality of its reference genome?


It's the genome of a phytopathogenic fungus with a high GC content, so I think we're going to re-sequence several individuals at high coverage at first (100x). Then we'll subsample the reads to see how far we can lower the coverage for further experiments without losing sensitivity.
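For planning that kind of experiment, the arithmetic can be sketched as below. This is only an illustration: the 40 Mb genome size and 100 bp read length are made-up placeholders (neither is given in this thread), and the function names are hypothetical. Mean coverage is taken as reads × read length / genome size, and the subsampling fraction is simply target coverage over current coverage.

```python
import math

def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads giving the requested mean coverage."""
    return math.ceil(genome_size_bp * coverage / read_length_bp)

def subsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep when thinning a library down to a lower coverage."""
    if target_coverage > current_coverage:
        raise ValueError("cannot subsample up to a higher coverage")
    return target_coverage / current_coverage

# Hypothetical values, NOT from the thread: replace with your own genome and run.
GENOME_SIZE_BP = 40_000_000
READ_LENGTH_BP = 100

for target in (100, 50, 30, 20, 10):
    n_reads = reads_needed(GENOME_SIZE_BP, target, READ_LENGTH_BP)
    keep = subsample_fraction(100, target)
    print(f"{target:>3}x: {n_reads:>10} reads, keep {keep:.0%} of a 100x library")
```

The fraction is the value you would hand to whatever read-subsampling tool you use; repeating variant calling at each level then shows where sensitivity starts to drop.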

12.1 years ago

Just an example of whole genome coverage:

[image: example whole-genome coverage plot; not preserved]

Rather than giving you a hard number, here are two articles that answer your questions.

Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data. Crawford & Lazzaro 2012.

Low-coverage sequencing: Implications for design of complex trait association studies. Li et al. 2011.

Whole genome depth modeling:

Exome dist:


Thank you for the links. The first one in particular is very relevant to my interests (non-human populations with small sample sizes).

12.1 years ago

Coverage should follow a Poisson distribution, so if your mean coverage is 30X, a position will have 20X or less about 3.5% of the time (and strictly less than 20X about 2.2% of the time). In theory, to get at least 30X at 99% of locations you will need a mean of 45X coverage.

Unfortunately, real genomes do not respect this distribution: you will often see deserts with few or no reads and hotspots with thousands of reads, although this is largely a mappability issue.
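The Poisson figures can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names are made up for illustration, and it assumes the idealised Poisson model only, not real mapping behaviour.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean); each term is computed in log space."""
    return sum(math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
               for i in range(k + 1))

def min_mean_for_depth(min_depth, fraction, max_mean=1000):
    """Smallest integer mean coverage such that P(depth >= min_depth) >= fraction."""
    for mean in range(1, max_mean + 1):
        if 1.0 - poisson_cdf(min_depth - 1, mean) >= fraction:
            return mean
    raise ValueError("no mean up to max_mean reaches the target")

p_at_most_20 = poisson_cdf(20, 30)          # share of positions at 20X or less
p_below_20 = poisson_cdf(19, 30)            # share of positions strictly below 20X
mean_needed = min_mean_for_depth(30, 0.99)  # mean giving >= 30X at 99% of positions

print(f"P(depth <= 20 | mean 30) = {p_at_most_20:.3f}")
print(f"P(depth <  20 | mean 30) = {p_below_20:.3f}")
print(f"mean needed for >= 30X at 99% of positions: {mean_needed}X")
```

The "3.5%" figure corresponds to 20X or less; the strict "below 20X" probability is a bit smaller.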


Yup, it is naughty data. I often see that a negative binomial is a better fit.


Using the negative binomial, what mean coverage is necessary to have 99% of bases covered at 30X?


I guess I should have been more clear: this was for exome data. I also added a plot for WG data in my original post.

Exome depth histograms often look more like:

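n <- 100000                            # number of simulated positions
hist(rpois(n, rgamma(n, 2, 0.0333)))   # gamma-Poisson mixture (negative binomial), mean depth about 60X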
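One way to approach the earlier question about the negative binomial is to treat depth as the same gamma-Poisson mixture and search for the smallest mean that leaves at least 99% of positions at 30X or more. The sketch below assumes the gamma shape of 2 from the R line carries over unchanged as the mean grows, which is a strong assumption; in practice the dispersion would have to be estimated from your own data. Function names are hypothetical.

```python
import math

def nbinom_cdf(k, mean, shape):
    """P(X <= k) when X ~ Poisson(L) and L ~ Gamma(shape, rate=shape/mean)."""
    log_p = math.log(shape / (shape + mean))
    log_q = math.log(mean / (shape + mean))
    return sum(
        math.exp(math.lgamma(i + shape) - math.lgamma(i + 1) - math.lgamma(shape)
                 + shape * log_p + i * log_q)
        for i in range(k + 1)
    )

def min_mean_nbinom(min_depth, fraction, shape, max_mean=5000):
    """Smallest integer mean coverage with P(depth >= min_depth) >= fraction."""
    for mean in range(1, max_mean + 1):
        if 1.0 - nbinom_cdf(min_depth - 1, mean, shape) >= fraction:
            return mean
    raise ValueError("no mean up to max_mean reaches the target")

# shape=2 mirrors rgamma(n, 2, ...) above; treat it as an assumption, not a measurement.
overdispersed = min_mean_nbinom(30, 0.99, shape=2)
# A very large shape makes the mixture collapse towards a plain Poisson.
near_poisson = min_mean_nbinom(30, 0.99, shape=1e6)
print(overdispersed, near_poisson)
```

With a shape as small as 2 the required mean comes out in the hundreds rather than near the Poisson figure, which mostly says that depth this overdispersed cannot realistically be fixed by sequencing deeper.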

12.1 years ago
Lee Katz ★ 3.2k

With bacteria, we are aiming for something like 50x. For high-quality SNPs, we aim for 100x so that even the lower-coverage positions are still well covered.

12.1 years ago

It also depends on what you are looking for. For homozygous SNPs, a 30x average will do pretty well. For heterozygous or mixed SNPs, 50x is more like it.
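A rough way to see why heterozygous sites are more demanding: at a het site each read carries the alternate allele with probability 0.5, so only about half of the depth supports the variant. The sketch below is a toy model, not a variant caller. It ignores sequencing error and mapping bias, and the threshold of 5 alternate reads is an arbitrary placeholder, not a recommendation from this thread.

```python
import math

def prob_het_missed(depth, min_alt_reads):
    """P(fewer than min_alt_reads alt-allele reads) at a het site with exactly `depth` reads."""
    return sum(math.comb(depth, k) for k in range(min_alt_reads)) / 2 ** depth

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
               for i in range(k + 1))

def prob_het_missed_at_mean(mean_coverage, min_alt_reads):
    """Same miss probability when depth itself is Poisson(mean_coverage).

    Thinning a Poisson depth with probability 0.5 gives alt reads ~ Poisson(mean/2).
    """
    return poisson_cdf(min_alt_reads - 1, mean_coverage / 2)

MIN_ALT_READS = 5  # hypothetical calling threshold, for illustration only
for cov in (10, 20, 30, 50):
    print(f"{cov}x: P(het missed) ~ {prob_het_missed_at_mean(cov, MIN_ALT_READS):.2e}")
```

Under the same threshold a homozygous site would have essentially all reads supporting the variant, so it reaches a given number of supporting reads at roughly half the coverage.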
