Hi, currently, I am trying to calculate the coverage for my genes of interest (i.e. number of reads falling in or overlapping by the gene of interest) from DNA-seq data in a BAM file. Genes of interest are annotated in a BED file.
There are several answers on Biostar suggesting to use
bedtools coverage
e.g., bedtools coverage -abam sample.bam -b exons.bed -counts
. However, while checking the Samtools' manual, I stumbled upon samtools bedcov
with the following description:
read depth per BED region. Reports read depth per genomic region, as specified in the supplied BED file.
which can be used like this: samtools bedcov gene.bed sample.bam
.
I have tested these two and I am getting different results! For instance, for OR4F5
gene I get the following results (using the same BAM file for both):
chr1 69091 70008 "OR4F5" 61 #from bedtools coverage
and
chr1 69091 70008 "OR4F5" 4714 #from samtools bedcov
where 61
is very different than 4714
!
Another observation is that the result from Samtools
is instantaneous for the above test while Bedtools
takes a lot of time to produce the result. The speed itself has led me to think of using samtools bedcov
. However, I was wondering whether there is catch that I am overlooking! I tried to find more information about samtools bedcov
but I was not able to find anything more than that one-liner description.
I would be happy to hear about your feedback on this.
[EDIT]: This is how it looks like in IGV. It seems that Bedtools
results is closer to reality.
That's peculiar. My gut feeling says the
samtools bedcov
is reporting the total number of reads within a region. However testingsamtools view BAM <region> | wc -l
gives me different answers. I'm not familiar withbedtools coverage
but the value of 67 seems close to high coverage libraries.Another option using samtools that I know works.
samtools depth
will give you per base pair depth of coverage. You can take the median (or mean) coverage within a locus. If you're parsing regions that are large (>10kb) you can obtain the read countssamtools view BAM <region> | wc -l
and then determine the coverage by this equation (READ COUNTS / BASES SPANED) * AVERAGE READ LENGTHUsing either
samtools view
orsamtools depth
you can also limit the reads to ones with mapping qualities greater than 10 (or whatever value you like). You can also limit the read length insamtools depth
Thanks for your reply!
bedtools coverage
does not provide option for filtering based on mapping quality. The value of61
is achieved when one counts all mapped reads. If one uses your first approach and also filters reads with mapping score below 10, then the result would be6
. I have now noticed thatbedtools multicov
can also be used for the same purpose with support for filtering based on mapping quality. Also it only counts non duplicate reads.How does the data looks in IGV in that region ? Can you post the exact commands ?
Let me check! I'll get back to you asap!
Now, I have edited the post with an image from the IGV.