Calculating Exome Coverage
3
2
Entering edit mode
10.6 years ago
Coryza ▴ 30

*// Edit to make the post more clear (Mapping done via Bowtie2). My problem is that when counting Exome Coverage via coverageBed gives different results than via genomeCoverageBed. So I'm not sure if I'm doing something wrong, or which of the 2 methods is correct.

1) My first step is to build an .bed file of my Illumina Paired-End reads, returning the positions that only fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].

2) My next step is to calculate the coverage of my new datafile via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10993449.0

Nucleotides/Length*100 24.253740909 % Coverage.

3) The next step was to calculate the coverage of my new datafile via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10576907.0

Nucleotides/Length*100 23.3347661863 % Coverage.

Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?

bedtools exome coverage bowtie2 • 9.2k views
ADD COMMENT
4
Entering edit mode
10.6 years ago

If you already have a .bed file with your targeted regions (in this case, the exome), I think the easiest solution to to use CalculateHsMetrics in Picard (which will also provide additional statistics beyond just on-target coverage)

http://picard.sourceforge.net/command-line-overview.shtml#CalculateHsMetrics

ADD COMMENT
0
Entering edit mode

+1. CalculateHSMetrics is awesome (once you figure out the goofy BED-type format it demands). If you decide to use it and have any trouble, post a comment here and I'll help you out when I see the question.

ADD REPLY
0
Entering edit mode

Dan, I have trouble figuring out the format of interval files. Links lead to that info seem all dead. Please help. Thanks!

ADD REPLY
0
Entering edit mode

Picard has been essentially folded in to the GATK. The new documentation is here: https://broadinstitute.github.io/picard/command-line-overview.html

ADD REPLY
2
Entering edit mode
10.6 years ago
DG 7.3k

Assuming you are using the same files for both commands (from your examples it wasn't clear) it looks like you are only dealing with a few bases. IIRC genomeCoverageBed only counts the bases on reads that lie within the targeted regions. IntersectBed returns all reads that intersect with the targeted regions in the bedfile, and then you are counting all of those bases. So it would include the bases that lie within intronic regions, etc as long as part of the read overlaps the exons.

ADD COMMENT
0
Entering edit mode

The first step is to intersectBed the derived data against the Illumina Targeted Exome file. In that case all the intronic regions are filtered out right? I thought only the regions that fall in the exon positions are returned, and in that case no intronic regions should be in my new file?

ADD REPLY
0
Entering edit mode

When you run intersectBed it gives you all reads that intersect that bed file. Many of the reads though can span the intron/exon boundary. Intersectbed doesn't just give you the bases that intersect, at least I don't think so. I would use HSMetrics as suggested above, or one of the GATK tools for calculating coverage as you can get much more detailed metrics.

ADD REPLY

Login before adding your answer.

Traffic: 1003 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6