Estimating RPKM or TPM in RNA-Seq data
9.5 years ago
darxsys ▴ 240

I am trying to test a piece of software for abundance estimation. I would like to generate my own set of reads for which the expected relative abundances are known in advance, so that I can compare the software's output against that benchmark. If I have a set of N transcripts and I generate M reads from them, knowing the origin of each read, can I use that information to compute the expected RPKM or TPM, and how? Would TPM for a specific transcript just be num_reads_from_it / num_reads_overall * 10^6?

rpkm RNA-Seq • 3.5k views
9.5 years ago
Rob 6.9k

If you know the number of reads originating from each transcript t (call it n_t), then you can compute TPM_t = 10^6 * [(n_t / l_t) / sum_t' (n_t' / l_t')], where l_t is the length of transcript t and the sum runs over all transcripts t'. Note that this is different from the formula you have above, which computes NPM (nucleotides per million), a measure of relative abundance that is not normalized for length. Also, I'd avoid FPKM / RPKM completely: there's no benefit relative to TPM, and there are some shortcomings (though it shouldn't really matter when assessing accuracy on simulated data in a single sample).
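As a rough illustration of the formula above, here is a minimal Python sketch. The function names (`tpm_from_counts`, `reads_per_million`), the transcript IDs, and the toy counts and lengths are all made up for the example; it also assumes that l_t is the plain transcript length, as in the formula, rather than any fragment-length-adjusted value.

```python
def tpm_from_counts(counts, lengths):
    """Compute TPM_t = 10^6 * (n_t / l_t) / sum_t' (n_t' / l_t').

    counts  -- dict mapping transcript id -> number of reads n_t
    lengths -- dict mapping transcript id -> transcript length l_t
    (Hypothetical helper, not part of any library.)
    """
    # Length-normalized read rate for each transcript.
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {t: 1e6 * r / total for t, r in rates.items()}


def reads_per_million(counts):
    """The formula from the question: n_t / total_reads * 10^6 (no length term)."""
    total = sum(counts.values())
    return {t: 1e6 * n / total for t, n in counts.items()}


# Toy simulation truth: tA and tB get the same number of reads,
# but tB is twice as long.
counts = {"tA": 100, "tB": 100, "tC": 200}
lengths = {"tA": 1000, "tB": 2000, "tC": 1000}

tpm = tpm_from_counts(counts, lengths)
npm = reads_per_million(counts)

for t in counts:
    print(t, round(tpm[t], 2), round(npm[t], 2))
```

In this toy case tA and tB have identical values under the question's formula, while the TPM of tA is twice that of tB because tB is twice as long. The TPM values sum to 10^6 by construction.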


Yes, you're right. I read that in this paper too and forgot about it. What I wrote is an estimate of NPM, which can easily be converted to TPM; alternatively, TPM can be calculated directly from your formula. I am also aware of the benefits of TPM and the drawbacks of RPKM, but as you said, it should not make much difference for my single sample, especially because I am not doing differential expression analysis. Thanks!
