Count reads within region
2
2
Entering edit mode
9.4 years ago

Hi,

I'd like to ask for help with finding an efficient way of counting reads from a bam file that lie within an interval (from a bed file). The problem is I only want reads that lie entirely within a given interval (no matter how long they are, or what percentage of the given interval they cover). The intervals may be overlapping. I'm dealing with amplicon sequencing. Currently, the only way I am able to do this is by separately intersecting (bedtools) the bam with each region in my bed file and then using samtools -c. This approach however takes too long.

To me this seems like a very basic problem which I believe must have been solved but I'm unable to google the right solution.

Thanks for any suggestions.

amplicon samtools reads bedtools • 5.8k views
ADD COMMENT
5
Entering edit mode
9.4 years ago

Using my tool samjs https://github.com/lindenb/jvarkit/wiki/SamJS

samtools view -bu -F 4  input.bam seq2:250-300 |\
java -jar jvarkit-git/dist-1.133/samjs.jar -e 'record.alignmentStart >= 250 && record.alignmentEnd <= 300' |\
samtools view -Sc -
ADD COMMENT
5
Entering edit mode
9.4 years ago

Using BEDOPS bedmap --count:

$ bedmap --echo --count --fraction-map 1 intervals.bed <(bam2bed < reads.bam) > answer.bed

The file answer.bed contains each interval and the number of reads that are contained entirely within that interval.

As long as inputs are not nested (overlapping is fine), you can add the --faster option and gain a further performance boost:

$ bedmap --faster ...
ADD COMMENT
0
Entering edit mode

Thanks,

your solution is indeed the fastest. However, I eventually returned to my original bedtools intersect solution adding Pierre's tweak with samtools view -bu which is a fast command on an indexed bam and reduces the searching space efficiently. It also allowed me to play with the flag option and print forward and reverse reads separately. Thanks again guys.

ADD REPLY
0
Entering edit mode

Thanks for the feedback!

ADD REPLY

Login before adding your answer.

Traffic: 1948 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6