Question

Should I use "assigned" reads or total reads (assigned + unassigned) to calculate the RPKM value?

0

10.1 years ago

gustavoborin01 ▴ 90

Dear all,

I'm recalculating RPKM values for RNA-Seq data using the featureCounts function in Rsubread, and I'd like to know whether I should use just the "Assigned" reads or the total reads, including the unassigned categories (ambiguity, multimapping, etc.; see below), in the RPKM formula. Searching the forums and Mortazavi et al. (2008), I have only found that "N is the total number of mappable reads in the experiment". Could anybody please help in this regard?

RPKM = N/(L*T)

where:

N: number of reads assigned to a gene
L: length of the gene (kb)
T: total mapped reads (Millions)

                           T_reesei_F24.1_GGCTAC_L008_R1_001.cleanreads.fastq.gz_tophat2.F24h.1_accepted_hits.bam      
Assigned                   32270962
Unassigned_Ambiguity       6896
Unassigned_MultiMapping    116803
Unassigned_NoFeatures      10751746
Unassigned_Unmapped        0
Unassigned_MappingQuality  0
Unassigned_FragementLength 0
Unassigned_Chimera         0

Thanks in advance!
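
A minimal sketch of the formula above, taking T as the "Assigned" count from the summary. The gene read count and gene length are hypothetical values chosen only for illustration, and `rpkm` is an illustrative helper, not a function from Rsubread.

```python
# Sketch of RPKM = N / (L * T), with T taken from the "Assigned" row above.
ASSIGNED_READS = 32_270_962


def rpkm(gene_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million reads in the denominator."""
    length_kb = gene_length_bp / 1_000          # L, in kilobases
    total_millions = total_reads / 1_000_000    # T, in millions of reads
    return gene_reads / (length_kb * total_millions)


# Hypothetical gene (not from the post): 1,000 assigned reads, 2,000 bp long.
example_reads = 1_000
example_length_bp = 2_000

example_rpkm = rpkm(example_reads, example_length_bp, ASSIGNED_READS)
print(f"RPKM with T = Assigned: {example_rpkm:.2f}")
```

Swapping a different total into the third argument changes every gene's RPKM in that sample by the same factor, which is why the choice of T matters mainly when comparing across samples.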

rpkm RNA-Seq R Rsubread • 5.2k views

ADD COMMENT • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by gustavoborin01 ▴ 90

1

Well, RPKM is calculated with respect to the total number of mapped reads.

If you are working with reads that map uniquely to the genome, then you should consider only the Assigned reads.

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Manvendra Singh ★ 2.2k

0

Thank you all! I really appreciate your answers!

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by gustavoborin01 ▴ 90

Ram · Answer 1 · 2014-10-28

3

10.1 years ago

Devon Ryan 104k

If you include things like Unassigned_Ambiguity in the numerator, then include them in the denominator. Likewise with Unassigned_MultiMapping. Unassigned_NoFeatures could be left as part of the denominator, though I wouldn't include it, since that'll bias things by sample quality. Having said that, I wouldn't calculate RPKMs at all, since in my opinion they shouldn't be used, but perhaps you have a good reason.
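
One way to read the point about denominators, sketched with the summary numbers from the question. The helper `library_size` and the variable names are illustrative assumptions, not part of featureCounts.

```python
# featureCounts summary from the question (categories with zero reads omitted).
summary = {
    "Assigned": 32_270_962,
    "Unassigned_Ambiguity": 6_896,
    "Unassigned_MultiMapping": 116_803,
    "Unassigned_NoFeatures": 10_751_746,
}


def library_size(summary, categories):
    """Sum the chosen summary categories to get the RPKM denominator."""
    return sum(summary[name] for name in categories)


assigned_only = library_size(summary, ["Assigned"])
all_mapped = library_size(summary, summary.keys())

# For a fixed numerator, RPKM scales with 1 / denominator, so this ratio is
# how much larger the assigned-only RPKM is than the all-mapped RPKM.
inflation = all_mapped / assigned_only

print(f"Assigned only: {assigned_only:,}")
print(f"All mapped:    {all_mapped:,}")
print(f"Ratio:         {inflation:.3f}")
```

Most of the gap between the two totals comes from Unassigned_NoFeatures. If that fraction differs between samples, a denominator that includes it scales each sample by a different amount, which is presumably the sample-quality bias mentioned above.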

ADD COMMENT • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Devon Ryan 104k

2

The StatOmique consortium tested different normalization methods, and RPKM was the worst one: http://bib.oxfordjournals.org/content/14/6/671.long

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Asaf 10k

2

This really can't be emphasized enough. RPKMs really are a bad solution in search of a problem.

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Devon Ryan 104k

0

I entirely agree, Devon.

But the problem is: if we want to compare gene expression levels, e.g. across cell lines, what should we rely on other than RPKM?

I think RPKM is a bad solution for shorter transcripts (<500 bp).

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Manvendra Singh ★ 2.2k

0

You'd be better off with counts. The really tricky comparison is between organisms, but that's largely an unsolved problem (last I looked, at least).

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Devon Ryan 104k

0

To compare between organisms, would it be better to consider only those reads that map uniquely to both genomes,

then divide the read counts in features by the total number of mapped reads,

then quantile-normalize them?

Would the data then be ready for comparison?
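
The steps proposed above can be sketched roughly as follows. This is a simplified illustration on made-up counts, with ties broken by gene order rather than averaged. The function names are hypothetical, and the sketch does not address whether the approach is valid for cross-organism comparison.

```python
def counts_per_million(counts, total_mapped):
    """Scale raw counts by the total number of mapped reads (in millions)."""
    return [c / (total_mapped / 1_000_000) for c in counts]


def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    `samples` is a list of equal-length lists, one list per sample.
    Each value is replaced by the mean, across samples, of the values
    holding the same rank.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Made-up counts for three genes in two samples (not from the thread).
sample_a = counts_per_million([50, 20, 30], total_mapped=1_000_000)
sample_b = counts_per_million([80, 20, 120], total_mapped=2_000_000)

normalized_a, normalized_b = quantile_normalize([sample_a, sample_b])
print(normalized_a)
print(normalized_b)
```

After this step both samples contain the same set of values and differ only in which gene holds which rank.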

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Manvendra Singh ★ 2.2k

0

The issue is more how things can be meaningfully normalized when the gene sets aren't even the same. But anyway, that's off-topic for this post.

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Devon Ryan 104k

0

Yes, certainly. I was just curious.

Thanks

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.1 years ago by Manvendra Singh ★ 2.2k