Question

Calculate RNAseq coverage of a transcript

2

Entering edit mode

7.0 years ago

andrey.v.shubin ▴ 70

Hi Biostars community,

I have bam files with RNA-seq results from ~1000 of single cell samples. For every transcript in each sample, I would like to calculate what percentage of spliced transcript's length is covered by aligned reads. Besides, It would be great to have this value for coding part of transcripts only, or, ideally, for every exon.

What would be the most straightforward way to do this?

Thanks a lot!

RNA-Seq transcriptome • 3.6k views

ADD COMMENT • link 7.0 years ago by andrey.v.shubin ▴ 70

2

Entering edit mode

To rephrase: per sample, you want to count how many reads overlap with an exon? In that case: featureCounts. Calculating a percentage afterward shouldn't be too hard, e.g. using R/Python/....

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k

2

Entering edit mode

Agree with @WouterDeCoster, you can try to use featureCounts. You need to get an annototation file (GTF file), in featureCounts use isGTFAnnotationFile = TRUE (to set up your GTF file), GTF.featureType = "exon" (for read summarization), GTF.attrType = "exon_id" (to group features).

Otherwise, to quantify abundances of transcripts from RNA-Seq data, Kallisto could be an other way to do the trick (check this here https://pachterlab.github.io/kallisto/manual).

ADD REPLY • link 7.0 years ago by Bastien Hervé 5.9k

score 1 · Accepted Answer · 2017-12-07

1

Entering edit mode

7.0 years ago

andrey.v.shubin ▴ 70

Thanks to WouterDeCoster and Bastien Herve for their answers!

I have found out that bedtools does exactly what I need:

bedtools coverage -hist -abam "your bam file" -b "your bed file" > "file with results"

ADD COMMENT • link 7.0 years ago by andrey.v.shubin ▴ 70