Hi all,
I have a basic question. What is the advantage of mapping RNA-Seq reads against the genome, and how is it done? I have read a couple of posts but could not find a clear answer.
I would assume that if I map against the transcriptome, I will not get any information about intronic and intergenic regions. Is there a GTF file for genome-wide mapping?
Can you please suggest some links or other reading material?
Thanks
However, it is worth noting that if you are mapping to the transcriptome, you need to use a proper transcript quantification program, like salmon, kallisto or RSEM, rather than simple read counting. This is because most transcribed regions of the genome are part of multiple transcripts. This means that if you use a simple read aligner, like BWA, Bowtie2 or HISAT2, to align to the transcriptome, the vast majority of reads will map to multiple locations (and be ignored by most read counters under default settings, or counted more than once under alternate settings). If you are using a simple align-and-count strategy, genomic alignment lets you distinguish a read that maps to one location in the genome that happens to be part of multiple transcripts from a read that maps to multiple locations in the genome, and whose origin therefore cannot be established.
Salmon, kallisto, RSEM etc. don't suffer from this problem because they use the reads to estimate expression with an EM model, rather than just counting reads.
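To make the EM idea concrete, here is a toy sketch in Python. It is not how salmon, kallisto or RSEM are actually implemented: it ignores transcript length, fragment bias and mapping quality, and the transcript names and read sets are made up for illustration. It only shows how reads compatible with several transcripts get shared out in proportion to the current abundance estimates, instead of being dropped or double counted.

```python
def em_abundance(compat, n_iter=200):
    """Toy EM for relative transcript abundance.

    compat: list of sets; each set holds the (hypothetical) transcript ids
    that one read is compatible with. Transcript length is ignored here,
    which real quantifiers do account for.
    """
    transcripts = sorted(set().union(*compat))
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts,
        # weighted by the current abundance estimates.
        for ts in compat:
            total = sum(theta[t] for t in ts)
            for t in ts:
                counts[t] += theta[t] / total
        # M-step: renormalise the fractional counts into abundances.
        n = sum(counts.values())
        theta = {t: c / n for t, c in counts.items()}
    return theta


# 6 reads unique to T1, 3 reads shared by T1 and T2, 1 read unique to T2.
reads = [{"T1"}] * 6 + [{"T1", "T2"}] * 3 + [{"T2"}] * 1
est = em_abundance(reads)
print(est)  # T1 converges to 6/7 (about 0.857), T2 to 1/7 (about 0.143)
```

A plain counter that discards multi-mapped reads would only see 6 versus 1 reads here and throw away 30% of the data; the EM assigns the shared reads mostly to T1 because the unique reads already point that way.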
Hi,
Thanks for the explanation and the valuable links. The reason I'm asking is that I could not find any intron or intergenic features in the 3rd column of the GTF. Moreover, when I use featureCounts, it counts at the exon or gene level, which means I'm losing the information about introns and intergenic regions, right? Correct me if I'm wrong. I'm using STAR for aligning, with the hg38 genome and GTF, and I have the alignIntronMin and alignIntronMax parameters set. The BAM file has a good number of reads in the introns. How can I extract the counts that belong to introns or intergenic regions?
All of the standard bulk RNA-seq quantification pipelines discard reads mapping to intronic and intergenic regions. Depending on your biological question, there is an argument to be made for counting intronic reads, although intronic reads are generally considered a better reporter of the rate of transcription than of steady-state RNA levels, as the primary transcript is usually considered short-lived. Either way, I'm not aware of an easy way of doing this, other than doing some GTF-fu to add the intronic regions to the GTF records (for something like STAR/featureCounts) or to the FASTA (for something like salmon).
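As a rough idea of what that GTF-fu could look like, here is a minimal Python sketch that derives introns as the gaps between consecutive exons of each transcript. It assumes a standard 9-column, tab-separated GTF with a transcript_id attribute on every exon line; the function name and the example records are made up for illustration.

```python
import re
from collections import defaultdict


def introns_from_gtf(lines):
    """Return (chrom, start, end, strand, transcript_id) for each intron.

    Coordinates are 1-based and inclusive, as in GTF. An intron is the gap
    between two consecutive exons of the same transcript.
    """
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4])))

    introns = []
    for (chrom, strand, tx), intervals in exons.items():
        intervals.sort()
        for (_, end1), (start2, _) in zip(intervals, intervals[1:]):
            if start2 - end1 > 1:  # skip exons that touch or overlap
                introns.append((chrom, end1 + 1, start2 - 1, strand, tx))
    return introns


# Two made-up exons of one transcript: 100-200 and 301-400.
example = [
    "\t".join(["chr1", "demo", "exon", "100", "200", ".", "+", ".",
               'gene_id "g1"; transcript_id "tx1";']),
    "\t".join(["chr1", "demo", "exon", "301", "400", ".", "+", ".",
               'gene_id "g1"; transcript_id "tx1";']),
]
introns = introns_from_gtf(example)
print(introns)  # [('chr1', 201, 300, '+', 'tx1')]
```

The resulting intervals could be written back out as GTF lines with a feature type such as "intron" and counted with featureCounts by selecting that type with -t. Bear in mind that an intron of one transcript can overlap an exon of another, so you would probably want to subtract all annotated exons first if you need strictly intronic regions.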
As for intergenic reads... how would you like to count them? featureCounts reports the total number of reads that didn't overlap any annotation.
Be careful if there are a lot of intergenic reads. If you look at your BAM files in IGV and there is a continuous low-level read depth across the whole genome, at roughly the same level in intronic regions, this might suggest that you have genomic DNA contamination in your RNA prep. I'd expect around 1/3 of reads from a total RNA prep to map to exons, with a good chunk of the remainder mapping to introns, and 50% to 2/3 of reads to map to exons for a poly(A) sample.
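If you want numbers rather than eyeballing IGV, the check amounts to labelling each read as exonic, intronic or intergenic and comparing the fractions. Below is a deliberately simplified Python sketch: it works on a single chromosome, represents each read by one 1-based position, and uses a linear scan that would be far too slow for a real BAM file. The intervals and positions are made up for illustration.

```python
from collections import Counter


def classify(pos, exons, genes):
    """Label a 1-based position as exonic, intronic or intergenic."""
    if any(start <= pos <= end for start, end in exons):
        return "exonic"
    if any(start <= pos <= end for start, end in genes):
        return "intronic"  # inside a gene body but not in any exon
    return "intergenic"


def region_fractions(positions, exons, genes):
    """Fraction of positions falling in each region class."""
    counts = Counter(classify(p, exons, genes) for p in positions)
    total = len(positions)
    return {k: counts[k] / total for k in ("exonic", "intronic", "intergenic")}


# One made-up gene spanning 100-400 with exons at 100-200 and 301-400.
genes = [(100, 400)]
exons = [(100, 200), (301, 400)]
read_positions = [150, 350, 250, 500]

fractions = region_fractions(read_positions, exons, genes)
print(fractions)  # {'exonic': 0.5, 'intronic': 0.25, 'intergenic': 0.25}
```

A high intergenic fraction, with coverage per base similar to what you see in introns, would be consistent with the gDNA contamination pattern described above.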
Thanks. Yes, that is the case: introns are not considered in standard RNA-Seq analysis. First, I apologise for the silly question. I want to quantify the introns for two reasons. First, to quantify intron retention (only). Second, since I have total RNA-seq data, it is much more likely to contain intronic reads than a poly-A selected library. I checked the intergenic regions and they look fine (good suggestion from you). So I would assume the intronic reads come from pre-mRNAs that are not yet fully spliced, and they might carry some information related to, for example, gene length or biotype.
"As for intergenic reads... how would you like to count them? featureCounts reports the total number of reads that didn't overlap any annotation."
We can compute the intergenic regions ourselves, right? For example with bedtools complement or a similar tool.
Yep, as I said, there is no standard way to do it, but you can use some GTF-fu to create them yourself.
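For the intergenic part, the logic is just the complement of the gene intervals on each chromosome, similar in spirit to what bedtools complement does. A minimal Python sketch for one chromosome, assuming 0-based half-open intervals as in BED; the gene coordinates and chromosome length are made up for illustration.

```python
def complement(intervals, chrom_length):
    """Return the gaps not covered by any interval on one chromosome.

    Intervals are 0-based, half-open (start, end) pairs as in BED, and may
    overlap each other.
    """
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


# Three made-up genes, two of them overlapping, on a 1000 bp chromosome.
gene_intervals = [(100, 200), (150, 300), (500, 600)]
intergenic = complement(gene_intervals, 1000)
print(intergenic)  # [(0, 100), (300, 500), (600, 1000)]
```

Whether you treat genes on both strands together, as here, or compute the complement per strand depends on what you want "intergenic" to mean for a stranded library.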
Thanks for the information and your kindness.