Question

Estimating RPKM or TPM in RNA-Seq data

0

10.1 years ago

darxsys ▴ 240

I am trying to test software for abundance estimation, and I want to generate my own set of reads for which the expected relative abundances are known in advance, so that I can compare the tool's output against that benchmark. If I have a set of N transcripts and I generate M reads from them, knowing the origin of each read, can I use that information to compute the expected RPKM or TPM, and how? Would TPM for a specific transcript just be num_reads_from_it / num_reads_overall * 10^6?

rpkm RNA-Seq • 3.8k views

updated 2.6 years ago by Ram 45k • written 10.1 years ago by darxsys ▴ 240

0

https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/

9.3 years ago by Lior Pachter ▴ 720

Ram · Accepted Answer · 2015-06-23

3

10.1 years ago

Rob 7.1k

If you know the number of reads originating from each transcript t (call it n_t), then you can compute TPM_t = 10^6 * [(n_t / l_t) / sum_t' (n_t' / l_t')], where l_t is the length of transcript t and the sum runs over all transcripts t'. Note that this is different from the formula you have above, which computes NPM (nucleotides per million), a measure of relative abundance that is not normalized for length. Also, I'd avoid FPKM / RPKM completely: there's no benefit relative to TPM, and there are some shortcomings (though it shouldn't really matter when assessing accuracy on simulated data in a single sample).

updated 2.6 years ago by Ram 45k • written 10.1 years ago by Rob 7.1k

1

Yes, you're right. I read that in this paper too and forgot about it. What I wrote is an estimate of NPM, which can easily be converted to TPM, or TPM can be calculated directly from your formula. I am also aware of the benefits of TPM and the drawbacks of RPKM, but as you said, it should not make much difference for my single sample, especially because I am not doing differential expression analysis. Thanks!
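
The NPM-to-TPM conversion mentioned here could look something like the sketch below: divide each NPM value by the transcript length and rescale so the values sum to one million. The function name `npm_to_tpm` and the example values are hypothetical, and raw transcript length is assumed.

```python
def npm_to_tpm(npm, lengths):
    """Convert per-transcript NPM values to TPM using transcript lengths."""
    # Dividing by length turns a nucleotide-level share into a
    # transcript-level share (up to a constant factor)
    per_length = {t: npm[t] / lengths[t] for t in npm}
    total = sum(per_length.values())
    if total == 0:
        raise ValueError("all NPM values are zero")
    # Rescale so that the values sum to one million
    return {t: 1e6 * v / total for t, v in per_length.items()}


# Hypothetical example: equal NPM, different lengths
npm = {"tx_a": 5e5, "tx_b": 5e5}
lengths = {"tx_a": 1000, "tx_b": 2000}
tpm = npm_to_tpm(npm, lengths)
```

Because the constant factor cancels in the normalization, this gives the same result as computing TPM directly from the read counts.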

10.1 years ago by darxsys ▴ 240