Question

How To Count The Reads Mapped To Exon And Exon-Exon Junctions?

3

Entering edit mode

11.5 years ago

lwc628 ▴ 230

I want to make a list of reads and binary value whether it was mapped to exon or exon-exon junctions, with the ultimate purpose of counting how many reads are mapped to exons and exon-exon junctions.

I have read people reports this statistics in papers, but I couldn't find exactly how to do this anywhere online.

Let's say I have a bam file(alignment) and gff(annotations of exons).

Only way I could think of now is to manually parse the sam files( by sorting and filtering reads that are properly mapped) and merge the coordinates with the gff file using bedtools.

But Is this a right way to do this? How would you do it? Is there a standard way to do this task? Is there a standard script or program for doing this?

exon sam bam reads gff • 16k views

ADD COMMENT • link updated 5.4 years ago by m13113153781 • 0 • written 11.5 years ago by lwc628 ▴ 230

score 7 · Answer 1 · 2013-06-07

Since you are referring to exon-exon junctions I am going to assume you are analyzing RNA-seq data. If you generated your BAM file with tophat, you will already have a junctions.bed file. This file contains a summary of all the exon-exon junctions found and the number of reads that mapped to each.

As for generally getting exon/transcript counts by starting with a BAM file and GTF/GFF, you might consider the htseq package. As you mentioned, bedtools can also be used to solve this problem. The bamtobed function may allow you to avoid manually parsing the BAM file.

These discussions sounds relevant:

score 1 · Answer 2 · 2013-06-09

Let's say I have a bam file(alignment) and gff(annotations of exons).

You can use the BEDOPS bedmap application and its --indicator operand or the --count operand to report any or count all reads which overlap exons.

First, use the BEDOPS conversion utilities to render your inputs into sorted BED files:

$ bam2bed < reads.bam > reads.bed
$ gff2bed < exons.gff > exons.bed

Then perform the indicator function operation:

$ bedmap --echo --indicator exons.bed reads.bed \
    | awk -F '|' '($2 == 1)' - \
    > exons-with-overlapping-reads.bed

The file exons-with-overlapping-reads.bed is a sorted BED file that contains exons that have one or more overlapping reads. The awk statement just filters for results that meet this condition.

If you want the actual count of reads over an exon, use the --count operator and skip the awk test:

$ bedmap --echo --count exons.bed reads.bed \
    > exons-with-number-of-overlapping-reads.bed

The file exons-with-number-of-overlapping-reads.bed is a sorted BED file that contains exons and the number of reads which overlap each exon (by one or more bases).

(The --indicator operand is the same as --count, where the indicator result is 1, or "true" if the mapped-element count is greater than zero, and 0, or "false" otherwise.)

If you're interested in subsets of the exons, use bedops --range or grep or other approaches (awk, Perl, etc.) to filter or adjust the coordinates in your exons.bed file, as needed. Then just run the bedmap steps, as previously described.

If you'd like to learn more about this toolkit, I have posted a brief summary of BEDOPS Bedops: The Fastest, Most Scalable, And Easily-Parallelizable Genome Analysis Toolkit!.

score 1 · Answer 3 · 2013-06-10

1

Entering edit mode

11.5 years ago

ewre ▴ 250

bedtools is the choice for me to cope with the task u are dealing with. usually I get multiple bam files' coverage info with

bedtools -multicov a.bam b.bam xxx > multicov.txt

ADD COMMENT • link 11.5 years ago by ewre ▴ 250

0

Entering edit mode

you have an extra - in your command. The correct way would be bedtools multicov a.bam b.bam annotation.bed > multicov.txt.

ADD REPLY • link 5.2 years ago by Dataman ▴ 380

score 0 · Answer 4 · 2019-06-20

0

Entering edit mode

5.4 years ago

m13113153781 • 0

here is a R package could do this ,https://bioconductor.org/packages/devel/bioc/vignettes/IntEREst/inst/doc/IntEREst.html

ADD COMMENT • link 5.4 years ago by m13113153781 • 0