Let's say I have a bam file(alignment) and gff(annotations of exons).
You can use the BEDOPS bedmap
application and its --indicator
operand or the --count
operand to report any or count all reads which overlap exons.
First, use the BEDOPS conversion utilities to render your inputs into sorted BED files:
$ bam2bed < reads.bam > reads.bed
$ gff2bed < exons.gff > exons.bed
Then perform the indicator function operation:
$ bedmap --echo --indicator exons.bed reads.bed \
| awk -F '|' '($2 == 1)' - \
> exons-with-overlapping-reads.bed
The file exons-with-overlapping-reads.bed
is a sorted BED file that contains exons that have one or more overlapping reads. The awk
statement just filters for results that meet this condition.
If you want the actual count of reads over an exon, use the --count
operator and skip the awk
test:
$ bedmap --echo --count exons.bed reads.bed \
> exons-with-number-of-overlapping-reads.bed
The file exons-with-number-of-overlapping-reads.bed
is a sorted BED file that contains exons and the number of reads which overlap each exon (by one or more bases).
(The --indicator
operand is the same as --count
, where the indicator result is 1
, or "true" if the mapped-element count is greater than zero, and 0
, or "false" otherwise.)
If you're interested in subsets of the exons, use bedops --range
or grep
or other approaches (awk
, Perl, etc.) to filter or adjust the coordinates in your exons.bed
file, as needed. Then just run the bedmap
steps, as previously described.
If you'd like to learn more about this toolkit, I have posted a brief summary of BEDOPS Bedops: The Fastest, Most Scalable, And Easily-Parallelizable Genome Analysis Toolkit!.
you have an extra
-
in your command. The correct way would bebedtools multicov a.bam b.bam annotation.bed > multicov.txt
.