You could do:
$ bedmap --echo --count regions-of-interest.bed <(bam2bed < reads.bam) > roi-with-read-counts.bed
The --echo
option reports the region of interest, and the --count
operation counts the number of reads that overlap each region of interest. The results are written to a file called roi-with-read-counts.bed
.
To speed this up further, if you have a computational cluster, you can split the task by chromosome and run each chromosome on separate nodes, e.g., convert the BAM to BED via a SLURM cluster:
$ bam2bed_slurm indexed-reads.bam reads.bed
Or convert via an SGE cluster:
$ bam2bed_sge indexed-reads.bam reads.bed
Or convert via GNU Parallel:
$ bam2bed_gnuParallel indexed-reads.bam reads.bed
Then do per-chromosome mapping on each node:
$ bedmap --echo --count --chrom ${chromosome} regions-of-interest.bed reads.bed > roi-with-read-counts.${chromosome}.bed
The --chrom
option takes a chromosome name ${chromosome}
and uses the sort order of the input BED files to quickly jump to the right place in the BED file.
If you just want basic stats use samtools idxstats. It very fast because it only reads the bam index.