8.3 years ago
PAn
Hi all,
I have RNA-seq BAM files aligned with TopHat for human samples, and I need to find the percentage of reads mapped to rRNA (in addition to the reads mapped to exonic, intronic and intergenic regions). Can anyone suggest how to get these alignment statistics for the rRNAs (based on the "rRNA" tag in the 2nd column of the GTF file)? I found a lot of posts about the exonic, intronic and intergenic percentages, but none about rRNA. I have the GTF file with annotations for all regions.
Thanks a lot!
I assume your GTF file does not have annotations for rRNA? Since the precise location and number of rDNA repeats in the human genome have not been determined, you may have to align your data to the human rDNA repeat to estimate the number of reads mapping there.
The annotation file does have annotations for the rRNAs: its 2nd column has "rRNA" as the value, and I created a BED file based on that as well. I am just not sure what the right steps are, or whether there are existing tools for this. Thanks!
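For anyone following along, here is a minimal sketch of how the rRNA rows could be pulled out of a GTF into BED-style records. It assumes a standard 9-column, tab-separated GTF where the 2nd column carries the "rRNA" value as described above; the function name `gtf_rrna_to_bed` and the example rows are made up for illustration.

```python
def gtf_rrna_to_bed(gtf_lines, tag="rRNA"):
    """Yield BED-style tuples for GTF rows whose 2nd column equals `tag`.

    GTF coordinates are 1-based and inclusive; BED is 0-based and half-open,
    so only the start coordinate is shifted by one.
    """
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        chrom, source, feature, start, end, _score, strand = fields[:7]
        if source != tag:
            continue
        yield (chrom, int(start) - 1, int(end), feature, ".", strand)


# Hypothetical example rows, not taken from a real annotation.
example = [
    "#comment line\n",
    "chr1\trRNA\texon\t101\t200\t.\t+\t.\tgene_id \"g1\";\n",
    "chr1\tprotein_coding\texon\t301\t400\t.\t-\t.\tgene_id \"g2\";\n",
]
rrna_bed = list(gtf_rrna_to_bed(example))
```

Each tuple can be written out tab-separated to get a BED file that counting tools or bedtools can take as input.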
You can use featureCounts or HTSeq-count to do the counting in that case. Take a look at the manuals for the tools first.
I would not go for this option. Sure, you'll get some reads landing in these features. But most of the reads mapping to rRNA will be discarded during mapping because they have too many mapping locations (rRNA repeats). I'd go for genomax2's first suggestion: download the rRNA sequences (5S..45S, and the two MT-based), build a bowtie2 index and run bowtie2. The mapping rate then equals your rRNA content.
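A rough sketch of reading the mapping rate back out, assuming the reads are available as FASTQ and that bowtie2's alignment summary (which it prints to stderr) was saved to a file. The file names in the comments and the helper name `rrna_percent` are placeholders, and the summary text below is a made-up example in bowtie2's usual layout.

```python
import re

# Commands assumed to have been run beforehand (file names are placeholders):
#   bowtie2-build rRNA.fa rRNA_index
#   bowtie2 -x rRNA_index -U reads.fastq -S rRNA.sam 2> rRNA_summary.txt


def rrna_percent(summary_text):
    """Return the 'overall alignment rate' from a bowtie2 summary as a float.

    Returns None if the line is not found.
    """
    match = re.search(r"([\d.]+)% overall alignment rate", summary_text)
    return float(match.group(1)) if match else None


# Made-up summary text for illustration only.
example_summary = """10000 reads; of these:
  10000 (100.00%) were unpaired; of these:
    9000 (90.00%) aligned 0 times
    800 (8.00%) aligned exactly 1 time
    200 (2.00%) aligned >1 times
10.00% overall alignment rate
"""
rate = rrna_percent(example_summary)
```

Because the index contains only rRNA sequences, the overall alignment rate is read here directly as the rRNA percentage of the library.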
Thanks all! My group was using a script that calculates the %exonic, %intronic and %intergenic: it builds the exonic BED file from the exon entries in the GTF file, removes the exonic regions from the genic regions, and then derives the intronic and intergenic regions using bedtools complement, merge, etc. I am exploring options while trying to build the rRNA BED file from the GTF file. I will look into the alignment approach using genomax2's suggestions.
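To illustrate the merge and complement steps mentioned above, here is a small pure-Python sketch of the interval logic for a single chromosome. It is not the group's script and not bedtools itself, only an approximation of what `bedtools merge` and `bedtools complement` compute; the function names and coordinates are invented for the example.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (start, end) intervals, half-open."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def complement(intervals, chrom_length):
    """Return the gaps on [0, chrom_length) not covered by `intervals`."""
    gaps = []
    cursor = 0
    for start, end in merge_intervals(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


# Invented genic intervals on a 60 bp toy chromosome.
genic = [(10, 20), (15, 30), (40, 50)]
merged_genic = merge_intervals(genic)
intergenic = complement(genic, 60)
```

With merged genic intervals in hand, the intergenic regions are simply their complement over the chromosome length, which is the same idea the script applies genome-wide.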
I was trying to figure out something similar too: RNA-seq rRNA contamination