Using BEDOPS bedmap
and bam2bed
, you can calculate a BED file containing reference elements (genomic windows over which you want to calculate coverage) and a semi-colon-delimited list of read overlap sizes in the final column:
$ bedmap --echo --echo-overlap-size --delim '\t' windows.bed <(bam2bed reads.bam) > coverage.bed
Once you have this, you can use Unix tools like paste
and awk
to paste the first through N-1 columns together with the median read overlap size calculated from the final, Nth column:
$ paste <(awk '{ for (i=1; i<(NF-1); i++) { printf("%s\t", $i); } printf("%s", $(NF-1)); }' coverage.bed) <(awk '{ n=split($NF, a, ";"); n=asort(a); if (n % 2) { printf("%d", a[(n + 1) / 2]); } else { printf("%d", a[(n / 2)] + a[(n / 2) + 1]) / 2.0); } }' coverage.bed) > answer.bed
In answer.bed
, the first through N-1 columns contain the window interval, and the last column contains the median overlap size.
You can put in as many metrics here as you want, mean, etc. just by adjusting the second awk
command to print those statistics.
Oh that's brilliant! Includes all sorts of summaries.
reformat.sh
from BBMap suite also produces summaries of all sorts. Look at the histogram parameters.I have tried
bbmap pileup
and it was great. It's blazing fast. Unfortunately, it only gives the total number of mapped reads (sum) over windows. It would've been awesome if it could provide other summary metrics (mean, median etc) as well. I am not familiar withreformat.sh
, but I will look into it.