Hi,
I am counting gene expression from RNA-seq data. Should I also count reads that map to introns, or only reads that map to exons? What is the general practice?
Thanks
Generally, no, you don't count intronic reads. You're interested in measuring mRNA, not pre-mRNA, and not mRNA plus some random intronic repeat region that happens to show expression. There's also the genomic DNA contamination issue (though really, that should add little to the overall counts).
Now, if you're specifically interested in looking at pre-mRNA -> mRNA processing, then the case would be different.
If I use the transcript annotation from the Ensembl database, the start-to-end span of each gene/transcript also covers its introns. In RNA-seq we usually expect exonic reads, since introns are spliced out, and that was my motivation for using a transcript/gene annotation file. But intronic reads are also quite common. So, if I want to use "multiBamCov" from bedtools, should I first remove the intronic reads from my BAM files? Or is counting against a transcript/gene annotation file a bad approach, and should I use an exon annotation file instead? Thanks.
OK, thanks, I will try this one as well.
I count reads in exons only, as introns are spliced out. This is the default in htseq-count, by the way.
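To make the difference between the two annotation choices concrete, here is a minimal toy sketch. All coordinates and reads are invented for illustration, and the overlap logic is a simplification: it is not how htseq-count or multiBamCov are actually implemented, and it ignores spliced alignments. It only shows why counting over a whole gene span picks up intronic reads that exon-only counting leaves out.

```python
# Toy comparison: counting reads over a whole gene span vs. over exons only.
# All coordinates are made up for illustration (0-based, half-open intervals).

def overlaps(read, interval):
    """True if two half-open (start, end) intervals share at least one base."""
    return read[0] < interval[1] and interval[0] < read[1]

def count_reads(reads, intervals):
    """Count reads that overlap at least one interval; each read counts once."""
    return sum(any(overlaps(r, iv) for iv in intervals) for r in reads)

gene_span = [(100, 1000)]              # transcript start..end, introns included
exons = [(100, 300), (600, 1000)]      # the same hypothetical gene, exons only

reads = [
    (120, 170),    # inside exon 1
    (280, 330),    # straddles the exon 1 / intron boundary
    (400, 450),    # intronic
    (450, 500),    # intronic
    (700, 750),    # inside exon 2
    (1200, 1250),  # intergenic
]

span_count = count_reads(reads, gene_span)
exon_count = count_reads(reads, exons)

print("gene span :", span_count)
print("exons only:", exon_count)
```

In this toy case the two purely intronic reads are counted against the gene span but not against the exons, so the span-based count is inflated relative to the exon-based one.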
I really don't know what to believe. For example, for the single-exon (intronless) mitochondrial rRNA gene FBgn0013686, multiBamCov from bedtools reported an unusually high 958085 reads, while htseq-count (default options; Ensembl GTF file) counted only 1453 reads. This happened for a few other genes as well! As another example, for the gene FBgn0261504 (Drosophila melanogaster), mapping with TopHat version 1.2.0 gave no reads (using bedtools multiBamCov), while the latest TopHat2 version 2.0.13 gave an unusually high 561450. How are such discrepancies possible when I kept the mapping parameters the same?
htseq-count ignores multimappers; multiBamCov doesn't. There are MANY rRNA copies in the genome.
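Regarding TopHat, you're using a different aligner between TopHat1 and TopHat2 (Bowtie 1 vs. Bowtie 2 by default), so keeping the same parameters doesn't guarantee the same alignments.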
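The multimapper point can be illustrated with a small toy model. The records below are invented, and the sketch assumes each alignment carries an NH value (the number of reported alignments for that read, as in the SAM `NH` tag). It is a rough picture of the two counting policies, not either tool's actual code.

```python
from collections import Counter

# Toy alignment records: (read_name, feature_hit, NH).
# NH = number of reported alignments for the read. Values are invented.
alignments = [
    ("r1", "rRNA_copy_A", 3),
    ("r1", "rRNA_copy_B", 3),
    ("r1", "rRNA_copy_C", 3),
    ("r2", "rRNA_copy_A", 2),
    ("r2", "rRNA_copy_B", 2),
    ("r3", "rRNA_copy_A", 1),
    ("r4", "geneX", 1),
]

def count_alignments(records, skip_multimappers):
    """Count alignments per feature, optionally dropping reads with NH > 1."""
    counts = Counter()
    for _name, feature, nh in records:
        if skip_multimappers and nh > 1:
            continue  # read aligns to more than one place: ignore it
        counts[feature] += 1
    return counts

all_counts = count_alignments(alignments, skip_multimappers=False)
unique_counts = count_alignments(alignments, skip_multimappers=True)

print("counting every alignment:", dict(all_counts))
print("unique reads only       :", dict(unique_counts))
```

With a multi-copy feature such as rRNA, most reads align to several copies, so a policy that counts every alignment can report far more reads for one copy than a policy that discards multimappers.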