Hi,
I'd like to ask for help with finding an efficient way of counting reads from a bam file that lie within an interval (from a bed file). The problem is I only want reads that lie entirely within a given interval (no matter how long they are, or what percentage of the given interval they cover). The intervals may be overlapping. I'm dealing with amplicon sequencing. Currently, the only way I am able to do this is by separately intersecting (bedtools) the bam with each region in my bed file and then using samtools -c
. This approach however takes too long.
To me this seems like a very basic problem which I believe must have been solved but I'm unable to google the right solution.
Thanks for any suggestions.
Thanks,
your solution is indeed the fastest. However, I eventually returned to my original bedtools intersect solution adding Pierre's tweak with
samtools view -bu
which is a fast command on an indexed bam and reduces the searching space efficiently. It also allowed me to play with the flag option and print forward and reverse reads separately. Thanks again guys.Thanks for the feedback!