I am trying to get the mean depth coverage for bam files with bedtools.
So far I've been using the command:
bedtools coverage -hist -abam bamfile.bam -b targets_exome.bed > bamfile.bam.hist.all.txt
But I find myself with a result similar to:
1 9998 10030 HWI-H217:64:C48Y8ACXX:4:1110:12530:87823/1 1 - 9998 10030 0,0,0 1 32, 0, 0 32 32 1.0000000
1 9998 10030 HWI-H217:64:C48Y8ACXX:4:1110:12530:87823/2 1 - 9998 10030 0,0,0 1 32, 0, 0 32 32 1.0000000
1 9999 10096 HWI-H217:64:C48Y8ACXX:4:1303:11532:70370/1 0 - 9999 10096 0,0,0 1 97, 0, 0 97 97 1.0000000
1 9999 10096 HWI-H217:64:C48Y8ACXX:4:1303:11532:70370/2 0 - 9999 10096 0,0,0 1 97, 0, 0 97 97 1.0000000
where the different features are overlapping (see #1 and #3).
I want to compute average coverage, defined as the number of reads covering the captured coding sequence of a haploid reference. How to do it in this case?
Thanks Kevin. My doubt is, to get the average coverage, since I have overlapping regions, it is necessary to get the per-base read depth, right?
Hey, it really depends on what exactly you want.
If you want BEDTools to treat each BED region independently (and calculate the mean read depth in each), then use
bedtools coverage -a BED -b BAM -mean > MeanCoverageBED.bedgraph
. This will ignore the fact that regions may be overlapping, I believe.Another option is to first edit your BED regions so that overlapping regions are merged into a single entry, using BEDTools merge. After you do that and re-run the above command for determining mean read depth, you'll get a more informative mean for your regions.
You can also output the per-base read-depth, but you'll then have to calculate the mean yourself (possibly using
awk
again).BEDTools is very powerful but one has to exactly sure what they want to get and which specific commands to use. There are many other tools for computing coverage/read-depth metrics too.
Thanks..
bedtools merge
is exactly what I was looking for!Great! Glad to have helped out
What version of bedtools is that? Version: v2.17.0 doesn't have the "-mean:" option. only -hist
nevermind, found it in v2.27.1
What is the resource requirement for coverage calculation? I am trying on some large windows (1Mbp) segments, it crashed bedtools version 2.24.0 several times when I tried.
This suprises me - given bedtools should be handle genome coverage, it shouldn't crash when calculating some 1Mb windows' coverage? Or it's a bug in 2.24.0
Thanks for your insight in advance!
Should not be a bug. Can you let us know the command that you are running? Also, the size of your BAM file, and number of regions in your BED file?
Hi, I used the command
bedtools coverage -a BED -b [BAM] -mean > MeanCoverageBED.bedgraph
to generate mean depth of coverageAfter running a while it showed a message
killed
You ran out of compute resources.
My system has 400GB free space and 128 GB of RAM
On which system is it? - a managed cluster? It may have reached the max compute time specified by your admins.
My bam file
Unique bed
Hi Kevin,
How do i calculate % of coverage for each Exons from the BED file and BAM as input.?
thanks