I'm trying to produce a simple window => coverage table from a bam file. I produced (non-sliding) windows with bedtools makewindows
and then ran bedtools coverage -abam alignment.bam -b reference.windows > coverage
. The problem is that I cannot make sense of the output I got. Here's a sample:
I 0 215 SRR1297046.176411/2 23 + 0 215 0,0,0 1 215, 0, 1 215 215 1.0000000
I 0 216 SRR1297046.2210768/1 23 + 0 216 0,0,0 1 216, 0, 1 216 216 1.0000000
I 1 222 SRR1297046.178368/1 24 + 1 222 0,0,0 1 221, 0, 1 221 221 1.0000000
The first field is the chromosome name and the fourth one is the read's, but what are the others? This output is inconsistent with the ones described here. How can I convert this output into a "window => coverage" table?
I'd appreciate any help.
Thank you Denvon! That worked. I tried
samtools depth -b reference.windows alignment.bam
but it gave me the coverage at each position, not windows. Did you mean that it'd be more accurate if I post-processed this output to get what I want?I also tried
samtools bedcov reference.windows alignment.bam
, which gave me the coverage at each window. However, the coverage values were way bigger (in hundreds of thousands) than those output by bedtools (in thousands). Do you know why that is?Ah, the "bedcov" command is new. It reports the sum of the per-base coverage, so just divide that by the window size to get the average coverage.
I think both samtools depth and bedcov will double count overlapping read pairs, unless the overlapping part has been clipped in one of the two reads (I use clipOverlap for that). However, samtools mpileup should do the right thing unless the option --ignore-overlaps (disable read-pair overlap detection) is set.
But if you want to count the fragments per window (as in a typical RNA-Seq or ChIP-Seq experiments) than you could use bedtools coverage using only read 1 or better merge read pairs to a single fragment.
Indeed, setting
-Q 1
with bedcov (and-q 1
with depth) would presumably get around this, though obviously one is then filtering out phred 0 bases as well.Mmm... -q 1 will filter for base quality but it will not affect the overlapping issue.
Overlaps are removed in mpileup internally by setting the phred scores to 0. It's a nasty hack, but that's how it's done.
For the record, using samtools mpileup doesn't yield the same result as using
samtools bedcov -q 1
ordepth -Q 1
. At least according to my results. I post-processed the outputs from all of the commands and found that mpileup gives a lower coverage than bothdepth -Q 1
andbedcov -q 1
(which give the same coverage).Here's the script I used to get the average coverage per window from
mpileup
's output. Takes the window size (non-sliding) as an argument. Can be modified easily to work withdepth
.Example usage:
I hope someone would find it useful:
Mpileup performs significantly more filtering, which is why you get discordant results.
See the defaults for
-A
,-d
and-Q
(as well as-x
).