Optimum on-target coverage for exome-sequencing
10.1 years ago
hellbio ▴ 520

Hi all,

We performed exome sequencing on 70 samples, and the mean coverage is not uniform across samples: it ranges from 20x to 70x.

Which diagnostics should we use to decide whether a sample is well sequenced, and which samples should be sequenced again?

  1. What percentage of bases should fall in the on-target region, and how can we calculate the on-target bases?
  2. Is there a rule of thumb that a certain percentage of these on-target bases should reach a certain depth of coverage (X)?

Are there any other parameters to be considered to evaluate the quality of data?

Any suggestions are valuable.

next-gen • 6.0k views

It all depends on what you want to do with your data. If you want to genotype with a sequencing depth of at least 3, your coverage is most likely sufficient. However, genotyping with a stricter depth requirement (e.g. >=10) will probably result in data loss.


Thanks for your time. We would like to identify mutations and, more specifically, to make reliable heterozygous calls. So, in order to call heterozygotes reliably, what percentage of the on-target bases should be covered, and at what depth (?x)?


Again, it depends on what you want. We personally require a variant depth of at least 7. You could make a plot of your read depth per base or per window (from mapping or genotyping) and then decide what your cut-off will be.
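As a rough illustration of the per-window summary suggested above, here is a minimal Python sketch. It assumes you already have per-base depths for a region as a plain list of integers (for example, exported from whatever depth tool you use); the function names, the window size of 4 and the cut-off of 7 are only illustrative choices, not a recommendation.

```python
def window_mean_depth(depths, window):
    """Mean depth for consecutive, non-overlapping windows of `window` bases.

    The last window may be shorter if len(depths) is not a multiple of `window`.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def fraction_at_or_above(depths, cutoff):
    """Fraction of bases whose depth is at least `cutoff`."""
    if not depths:
        return 0.0
    return sum(d >= cutoff for d in depths) / len(depths)


# Toy per-base depths for a 12-base region (made-up numbers).
depths = [12, 15, 9, 7, 3, 0, 0, 5, 8, 20, 22, 18]
print(window_mean_depth(depths, 4))     # one mean per 4-base window
print(fraction_at_or_above(depths, 7))  # share of bases usable at a cut-off of 7
```

Plotting the window means along the target makes dips in coverage easy to spot before you settle on a cut-off.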


Thanks! Plotting read depth per base across the whole exome at a good resolution is crucial. Could you suggest some tools for plotting read depth per base for a whole exome?


With respect to coverage plots, I find the "read depth vs. % covered bases" plot most informative. You can get started here:

http://gettinggeneticsdone.blogspot.co.at/2014/03/visualize-coverage-exome-targeted-ngs-bedtools.html
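The numbers behind such a "read depth vs. % covered bases" curve can be sketched in a few lines of Python. This is not the code from the linked post; it is a minimal sketch that assumes you have the per-base depths of your target bases as a list of integers, and the name `cumulative_coverage` and the `max_depth` cap are my own illustrative choices.

```python
from collections import Counter


def cumulative_coverage(depths, max_depth=None):
    """Return [(d, fraction of bases with depth >= d)] for d = 0..max_depth.

    Depths above `max_depth` are capped, so the last point gives the
    fraction of bases with depth >= max_depth.
    """
    n = len(depths)
    if n == 0:
        return []
    if max_depth is None:
        max_depth = max(depths)
    counts = Counter(min(d, max_depth) for d in depths)
    curve = []
    remaining = n  # number of bases with depth >= current d
    for d in range(max_depth + 1):
        curve.append((d, remaining / n))
        remaining -= counts.get(d, 0)
    return curve


# Toy per-base depths for eight target bases (made-up numbers).
depths = [0, 2, 2, 5, 10, 10, 30, 40]
for d, frac in cumulative_coverage(depths, max_depth=10):
    print(f"depth >= {d:2d}: {frac:.3f}")
```

Plotting one such curve per sample on the same axes shows at a glance which samples drop off early, which is usually more telling than comparing mean coverage alone.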

10.1 years ago

GATK provides this tool to check whether the coverage is good enough for variant loci.

10.1 years ago
DG 7.3k

We expect exome sequencing coverage to be highly variable, and we expect certain exons to be routinely not captured and not sequenced. It is important to be aware of this, but it also means that setting any sort of arbitrary threshold for re-sequencing samples probably isn't going to work very well. To routinely call heterozygotes, the usual minimum depth is 7-10x. Just based on statistical sampling and the thresholds used by variant callers and genotype callers, that usually is your lower bound. Certainly above that (15x and above) you should be getting reliable variant calls and generally pretty reliable genotyping.
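To make the "statistical sampling" point concrete, here is a minimal sketch using a plain binomial model. It assumes each read at a heterozygous site shows the alternate allele with probability 0.5, and that a caller needs at least 3 alternate reads to report the variant; that threshold of 3 is an illustrative assumption, not a setting of any particular caller, and the model ignores sequencing error and mapping bias.

```python
from math import comb


def prob_at_least_k_alt(depth, min_alt, alt_fraction=0.5):
    """P(X >= min_alt) for X ~ Binomial(depth, alt_fraction)."""
    return sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt, depth + 1)
    )


# Chance of seeing at least 3 alternate-allele reads at a heterozygous site.
for depth in (7, 10, 15, 20):
    print(depth, round(prob_at_least_k_alt(depth, 3), 4))
```

Under these assumptions the chance of seeing enough alternate reads is roughly 77% at 7x and 95% at 10x, and it is above 99% at 15x, which is consistent with treating 7-10x as a lower bound rather than a comfortable depth.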

You should definitely be doing routine analysis of depth of coverage of your targeted exons. Depending on your particular study, you should be analyzing coverage across the exons of genes that are, say, known to cause the disease you are studying (if you are doing disease studies) or are highly likely candidates. Lack of coverage can indicate that you should re-sequence those gaps (maybe not the whole sample, though, as there is no guarantee you'll do any better the second time around). Lack of coverage can also be indicative of an actual deletion of all or part of an exon, so keep that in mind too.
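A per-exon check along these lines could look like the sketch below. It assumes per-base depths are already grouped by exon in a dictionary; the exon names, the 10x minimum depth and the 90% requirement are hypothetical values chosen only for illustration.

```python
def flag_low_coverage_exons(exon_depths, min_depth=10, min_fraction=0.9):
    """Return (exon, fraction of bases >= min_depth, mean depth) for exons
    where fewer than `min_fraction` of the bases reach `min_depth`."""
    flagged = []
    for exon, depths in exon_depths.items():
        if not depths:
            continue  # skip exons with no positions supplied
        fraction = sum(d >= min_depth for d in depths) / len(depths)
        mean = sum(depths) / len(depths)
        if fraction < min_fraction:
            flagged.append((exon, fraction, mean))
    return flagged


# Toy data: hypothetical exon names with made-up per-base depths.
exon_depths = {
    "GENE1_exon1": [25, 30, 28, 22],
    "GENE1_exon2": [12, 8, 3, 0],
    "GENE2_exon1": [0, 0, 0, 0],
}
for exon, fraction, mean in flag_low_coverage_exons(exon_depths):
    print(exon, fraction, mean)
```

An exon with zero depth in one sample but good depth in the others is worth a second look as a possible deletion, while an exon that is poorly covered in every sample is more likely a capture problem.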


Thank you for your suggestions. What should the on-target percentage (the percentage of bases mapping on target) be? Some samples have 40% and some have 80% on target. Doesn't that mean that in the samples with 40% we have already lost 60% of the data before any downstream analysis? What on-target cut-off should we look for or demand from the sequencing facility? Any suggestions would be valuable.


I would personally worry more about coverage stats than about on-target percentage. If you are covering your exons well, regardless of off-target mapping, the sample was sequenced fine. Although those ranges do seem quite wide...


Thanks again! But when the on-target percentage is 50%, isn't it true that the remaining 50% of the target is not covered, so that we cannot detect any variants there? Isn't that a loss of data?


On-target percentage is usually a measure of the percentage of reads (or, really, the percentage of bases) that map to the target regions, not a measure of the coverage of those regions. Some of the off-target mapping you expect and some you don't. You should get off-target mapping from the nucleotides that are 5' and 3' to the target regions (upstream, downstream, untargeted portions of the UTRs, intronic), since you are dealing with physical shearing of DNA and baited capture. The target region might be 200 nucleotides long, but you'll capture a chunk that is 250 or so in size, so obviously you will sequence the flanking regions as well. Those will all be considered off target. You will also have off-target reads from pseudogenes and the like.

Off-target is not the same thing as coverage. You need to assess the read depth of nucleotides in your targeted regions in order to assess whether you are potentially missing variants.
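A small worked example may help separate the two quantities. The sketch below is a simplification under stated assumptions: reads and targets are half-open (start, end) intervals on a single chromosome, targets do not overlap each other, and every aligned base counts equally. The function names are my own, not from any tool.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def on_target_fraction(reads, targets):
    """Fraction of sequenced bases that fall inside a target.

    Assumes the targets do not overlap each other.
    """
    total = sum(end - start for start, end in reads)
    on_target = sum(
        overlap(r_start, r_end, t_start, t_end)
        for r_start, r_end in reads
        for t_start, t_end in targets
    )
    return on_target / total if total else 0.0


def target_breadth(reads, targets, min_depth=1):
    """Fraction of target bases covered by at least `min_depth` reads."""
    covered = 0
    size = 0
    for t_start, t_end in targets:
        depth = [0] * (t_end - t_start)
        for r_start, r_end in reads:
            for pos in range(max(r_start, t_start), min(r_end, t_end)):
                depth[pos - t_start] += 1
        covered += sum(d >= min_depth for d in depth)
        size += t_end - t_start
    return covered / size if size else 0.0


# One 200-base target; four reads hang over its edges, two are far away.
targets = [(100, 300)]
reads = [(50, 150), (100, 200), (200, 300), (250, 350), (1000, 1100), (1200, 1300)]
print(on_target_fraction(reads, targets))           # share of read bases on target
print(target_breadth(reads, targets))               # share of target covered at >= 1x
print(target_breadth(reads, targets, min_depth=2))  # share of target covered at >= 2x
```

In this toy case only half of the sequenced bases are on target, yet every base of the target is covered at least once. A 50% on-target rate therefore says nothing by itself about how much of the target is missing; the depth over the target bases is what answers that.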
