Question

shall intronic reads be considered in RNAseq count?

2

Entering edit mode

10.0 years ago

bcn_yaar ▴ 20

Hi,

I am counting gene expression from a RNA-seq data. Shall I consider the reads that are mapped to introns as well or just the exons only? What is general trend?

Thanks

rna-seq • 4.6k views

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 10.0 years ago by bcn_yaar ▴ 20

0

Entering edit mode

I really don't know what to believe. For example, with single exon rRNA gene (no intron) (mt DNA) FBgn0013686, while multiBamCov from bedtools found as unusually high as 958085 reads, .... the HtSeq count (default option; Ensembl gtf file) counted only 1453 reads. These happened to few other genes as well!

Also, as example, for the gene FBgn0261504 (drosophila melanogaster), with TopHat version 1.2.0 mapping I got no read count (using bedtools multiBamCov) while with latest topHat2 version 2.0.13 I got unusually high as 561450. How such discrepancies possible, while I kept mapping parameters same!!

tophat --solexa-quals -m 2 -g 1 --butterfly-search -p 10 --GTF genes.gtf --no-novel-juncs --output-dir Result $BOWTIE_INDEXES/genome MT.fastq
tophat2 --solexa-quals -m 2 -g 1 -p 10 --GTF genes.gtf --no-novel-juncs --output-dir Result $BOWTIE_INDEXES/genome MT.fastq

ADD REPLY • link updated 3.4 years ago by Ram 44k • written 10.0 years ago by bcn_yaar ▴ 20

0

Entering edit mode

htseq-count ignores multimappers, multiBamCov doesn't. There are MANY rRNA copies in the genome.

Regarding tophat, you're using a different aligner between tophat1 and tophat2.

ADD REPLY • link updated 3.4 years ago by Ram 44k • written 10.0 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-11-13

4

Entering edit mode

10.0 years ago

Devon Ryan 104k

Generally, no, you don't count intronic reads. You're interested in measuring mRNA, not pre-mRNA or mRNA + random intronic repeat region showing expression. There's also the genomic contamination issue (though really, that should add little to the overall counts).

Now if you're specifically interested in looking at pre-mRNA ->mRNA processing, then the case would be different.

ADD COMMENT • link 10.0 years ago by Devon Ryan 104k

0

Entering edit mode

if I use the transcript annotation from Ensembl database, then the start and end span of the gene/transcript anyway covers the locations of introns. Now, usually in RNAseq we have exon reads..... as introns are spliced out. That was the motivation to use transcript/gene annotation file. But it is also quite common to see intronic read. So, if I want to use "multiBamCov" from bedtools, should I remove first intronic reads from my BAM files? Or, it is not a good approach to use Transcript/gene annotation file for counting, rather I should use Exon annotation file? - Thanks.

ADD REPLY • link 10.0 years ago by bcn_yaar ▴ 20

1

Entering edit mode

Why not make your life easier and (1) just download the GTF file from Ensembl and then (2) use featureCounts to get your counts?

ADD REPLY • link 10.0 years ago by Devon Ryan 104k

0

Entering edit mode

OK, thanks, I will try this one as well.

I really don't know what to believe. For example, with single exon rRNA gene (no intron) (mt DNA) FBgn0013686, while multiBamCov" from bedtools found as unusually high as 958085 reads, .... the HtSeq count (default option; Ensembl gtf file) counted only 1453 reads. These happened to few other genes as well!!

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 10.0 years ago by bcn_yaar ▴ 20

score 0 · Answer 2 · 2014-11-13

0

Entering edit mode

10.0 years ago

dariober 15k

I count reads in exons only as introns are spliced out. This is the default in htseq-count by the way.

ADD COMMENT • link 10.0 years ago by dariober 15k

0

Entering edit mode

if I use the transcript annotation from Ensembl database, then the start and end span of the gene/transcript anyway covers the locations of introns. Now, usually in RNAseq we have exon reads..... as introns are spliced out. That was the motivation to use transcript/gene annotation file. But it is also quite common to see intronic read. So, if I want to use "multiBamCov" from bedtools, should I remove first intronic reads from my BAM files? Or, it is not a good approach to use Transcript/gene annotation file for counting, rather I should use Exon annotation file? - Thanks.

ADD REPLY • link 10.0 years ago by bcn_yaar ▴ 20