Getting Number Of Reads In Intervals With Bedtools
2
1
Entering edit mode
12.0 years ago
user ▴ 950

What is the correct way to get the total number of reads strictly contained in each interval in a GFF from a BAM file while enforcing strandedness? What I am looking for is very close to this intersectBed feature:

-c    For each entry in A, report the number of overlaps with B.
    - Reports 0 for A entries that have no overlap with B.
    - Overlaps restricted by -f and -r.

Except that I'd like the number of overlaps in A for each entry in B (i.e. the other way around). If I do:

intersectBed -abam mybam.bam -b mygff.gff -s -f 1 -wb

Then my understanding is that this will report the entry in B for each overlap with A. But I'd like each entry in B to be outputted exactly once, with the number of reads from A that are contained strictly within it. I'm not sure how to enforce strict containment here.

Is coverageBed the solution to this? Or multicov? I'm not sure how to enforce strict containment using coverageBed - it's not clear to me if that's the default from the docs. Thanks.

bedtools rna-seq bam samtools • 9.6k views
ADD COMMENT
4
Entering edit mode
12.0 years ago

Another way to do this is to use the bedmap application from the BEDOPS suite for this, where your intervals are reference regions and reads are the map regions.

First, convert the GFF to sorted BED with the gff2bed script and sort-bed:

$ gff2bed.py < mygff.gff | sort-bed - > myGff.bed

Convert the reads from BAM to sorted BED with the bam2bed script:

$ bam2bed.csh < mybam.bam | sort-bed - > myBam.bed

With bedmap, add the --echo operator to output each reference interval from myGff.bed. Add the --fraction-map 1 operator to enforce the overlap criteria that mapped reads must be contained entirely within a reference interval. Add the --count operator to yield the number of mapped reads within each reference interval, which are printed adjacent to the reference interval data.

In sum, as an example:

$ bedmap --echo --fraction-map 1 --count myGff.bed myBam.bed > myAnswer.bed

If you need to apply strandedness criteria, such as shifting reverse-strand reads by a base, the bedops --range L:R operator can be used to pre-adjust element coordinates asymmetrically, e.g. bedops --range -1:-1 --everything foo.bed > bar.bed will reduce each start and stop coordinate by one base. Here, you might split your inputs by strand, pre-adjust a stranded input, run two bedmap operations on each input and then collate the two results at the end.

ADD COMMENT
0
Entering edit mode

Thanks, but I'd like to do this with bedtools if possible...

ADD REPLY
1
Entering edit mode

No problem. Hopefully it helps someone else trying to do the same thing.

ADD REPLY
3
Entering edit mode
12.0 years ago

Currently, the coverage tool cannot enforce fractional coverage. However, one strategy would be to use intersect to enforce the strict containment and then pass only those alignments that meet this criteria to the coverage tool. The example below demonstrates this strategy.

bedtools intersect -abam mybam.bam -b mygff.gff -s -f 1 -ubam | \
    bedtools coverage -abam - -b mygff.gff
ADD COMMENT
0
Entering edit mode

thanks. are there plans to add fractional coverage to coverage? would be nice because then you can use the pipe to pass an on-the-fly processed gff, e.g. cmd gff | bedtools coverage -abam mybam.bam -b - - right now the pipe is taken up by the bam

ADD REPLY
2
Entering edit mode

Perhaps look into mkfifo and named pipes: http://www.linuxjournal.com/article/2156

ADD REPLY
0
Entering edit mode

Fantastic! Thank you again

ADD REPLY

Login before adding your answer.

Traffic: 1650 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6