Question

Calculating Exome Coverage

2

Entering edit mode

10.6 years ago

Coryza ▴ 30

*// Edit to make the post more clear (Mapping done via Bowtie2). My problem is that when counting Exome Coverage via coverageBed gives different results than via genomeCoverageBed. So I'm not sure if I'm doing something wrong, or which of the 2 methods is correct.

1) My first step is to build an .bed file of my Illumina Paired-End reads, returning the positions that only fall in targeted exon regions. I'm doing that via intersectBed -a [data.bed] -b [illuminaexonregions.bed].

2) My next step is to calculate the coverage of my new datafile via coverageBed -a [newdata.bed] -b [illuminaexonregions.bed]. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10993449.0

Nucleotides/Length*100 24.253740909 % Coverage.

3) The next step was to calculate the coverage of my new datafile via genomeCoverageBed -i [newdata.bed] -g [genome.txt] -d awk '$3>0 {print $1"\t"$2"\t"$3}'. I calculated some statistics:

Number of exons 214126 with a total length of 45326818

Number of matched nucleotides 10576907.0

Nucleotides/Length*100 23.3347661863 % Coverage.

Somehow there's a difference in matched nucleotides, which I can't explain. What am I doing wrong?

bedtools exome coverage bowtie2 • 9.2k views

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 10.6 years ago by Coryza ▴ 30

Ram · Answer 1 · 2014-04-03

4

Entering edit mode

10.6 years ago

Charles Warden 8.3k

If you already have a .bed file with your targeted regions (in this case, the exome), I think the easiest solution to to use CalculateHsMetrics in Picard (which will also provide additional statistics beyond just on-target coverage)

http://picard.sourceforge.net/command-line-overview.shtml#CalculateHsMetrics

ADD COMMENT • link 10.6 years ago by Charles Warden 8.3k

0

Entering edit mode

+1. CalculateHSMetrics is awesome (once you figure out the goofy BED-type format it demands). If you decide to use it and have any trouble, post a comment here and I'll help you out when I see the question.

ADD REPLY • link 10.6 years ago by Dan D 7.4k

0

Entering edit mode

Dan, I have trouble figuring out the format of interval files. Links lead to that info seem all dead. Please help. Thanks!

ADD REPLY • link 9.2 years ago by qysl • 0

0

Entering edit mode

Picard has been essentially folded in to the GATK. The new documentation is here: https://broadinstitute.github.io/picard/command-line-overview.html

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.2 years ago by DG 7.3k

Ram · Answer 2 · 2014-04-03

2

Entering edit mode

10.6 years ago

DG 7.3k

Assuming you are using the same files for both commands (from your examples it wasn't clear) it looks like you are only dealing with a few bases. IIRC genomeCoverageBed only counts the bases on reads that lie within the targeted regions. IntersectBed returns all reads that intersect with the targeted regions in the bedfile, and then you are counting all of those bases. So it would include the bases that lie within intronic regions, etc as long as part of the read overlaps the exons.

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 10.6 years ago by DG 7.3k

0

Entering edit mode

The first step is to intersectBed the derived data against the Illumina Targeted Exome file. In that case all the intronic regions are filtered out right? I thought only the regions that fall in the exon positions are returned, and in that case no intronic regions should be in my new file?

ADD REPLY • link 10.6 years ago by Coryza ▴ 30

0

Entering edit mode

When you run intersectBed it gives you all reads that intersect that bed file. Many of the reads though can span the intron/exon boundary. Intersectbed doesn't just give you the bases that intersect, at least I don't think so. I would use HSMetrics as suggested above, or one of the GATK tools for calculating coverage as you can get much more detailed metrics.

ADD REPLY • link 10.6 years ago by DG 7.3k