RPKM and TPM calculation
5.3 years ago
friasoler ▴ 50

Dear colleagues

I have used three different ways to calculate RPKM from my data: one from a colleague, one from a post on the internet, and my own. All three give different outputs. Obviously just one is the right one, but could you do your own calculation and share it with me? What TPM would you estimate for these three samples?

Thanks in advance
Roberto

gene              t1        t2        t3        length
A                 4         2         7         2565

total lib counts  26396044  22124753  27522357
RNA-Seq RPKM • 3.9k views

Why do you need these values? For years the consensus has been that RPKM is not the way to go (see Biostars discussions and advice from past years); it is not recommended for RNA-seq analysis. So my advice is not to worry about the differences, but to ask yourself what you want from your data and find the right way to analyze it.


It's not entirely clear what exactly you're after -- do you want to know THE one and only correct way of calculating RPKM? Or do you just want to add another entry to your collection of, presumably, different formulae? It'd be more helpful if you (i) included the formulae you've already tried, (ii) included the sources for those and (iii) stated your actual aim (as alluded to by Benn).

The TPM that I know cannot be calculated from a single gene's value alone, since it also takes into account the abundances and lengths of all the other transcripts.
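To illustrate the point about TPM, here is a minimal Python sketch, not a replacement for a proper quantification tool. It uses the common length-normalise-then-rescale definition of TPM with raw gene lengths (an assumption; tools such as Salmon or RSEM use effective lengths). Gene A's count and length come from the question; genes B and C and all of their numbers are hypothetical, invented only to show that A's TPM moves when the other genes change.

```python
def tpm(counts, lengths_bp):
    """TPM for one sample: reads per base for each gene, rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Gene A (length 2565, count 4) is from the question.
# Genes B and C are HYPOTHETICAL, made up for illustration only.
lengths = [2565, 1000, 4000]

sample_x = tpm([4, 100, 200], lengths)
sample_y = tpm([4, 1000, 200], lengths)  # only gene B's count differs

print("TPM of A in sample_x:", round(sample_x[0], 1))
print("TPM of A in sample_y:", round(sample_y[0], 1))
```

Gene A has the same count in both samples, yet its TPM differs because the denominator sums over every gene. With only gene A's counts, as posted, there is nothing to sum over.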


"obviously just one is the right one"

Actually, no, it is not obvious that one of them is right: all three could be wrong. As you didn't show the three alternative methods, we can't say which one is best, or even whether any is correct.


Here is a nice paper to read: https://www.ncbi.nlm.nih.gov/pubmed/22872506

5.3 years ago
cpm = gene_counts / (lib_size / 1e6)
rpkm/fpkm = cpm / (gene_length / 1e3)
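As a rough worked example, the two formulas above can be applied to the numbers in the question. This sketch assumes that the "total lib counts" row is the library size to divide by (total mapped reads per sample); if the colleague's or the internet method used a different denominator, the results will differ.

```python
# Values copied from the question: gene A, samples t1-t3.
counts = {"t1": 4, "t2": 2, "t3": 7}
lib_sizes = {"t1": 26396044, "t2": 22124753, "t3": 27522357}  # assumed: total mapped reads
gene_length = 2565  # gene length in bp

def cpm(count, lib_size):
    """Counts per million: count divided by library size in millions."""
    return count / (lib_size / 1e6)

def rpkm(count, lib_size, length_bp):
    """RPKM: CPM divided by gene length in kilobases."""
    return cpm(count, lib_size) / (length_bp / 1e3)

cpm_a = {s: cpm(counts[s], lib_sizes[s]) for s in counts}
rpkm_a = {s: rpkm(counts[s], lib_sizes[s], gene_length) for s in counts}

for s in counts:
    print(f"{s}: CPM={cpm_a[s]:.4f}  RPKM={rpkm_a[s]:.4f}")
```

Under that assumption the RPKM values come out at roughly 0.059, 0.035 and 0.099 for t1, t2 and t3.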

But why not just use pre-implemented functions like edgeR::cpm() or edgeR::rpkm(), which will do it for an entire matrix?

Btw, if you want TPM values you will need to use tools which directly generate those, such as Kallisto, Salmon or RSEM.


Just for completeness' sake for future users, here are some formulae which are supported, e.g. by Oshlack & Wakefield and the above-mentioned Wagner et al.


For TPM, Wagner also includes 'rl' in the numerator for the "average number of nucleotides mapped per read". The formula for TPM above is good, but it's not descriptive enough from a mathematical perspective: what does L_k mean? What is j iterating over? Wagner describes this well in the paper. The reason so many people are confused is that it's never fully explained. If we use variables in a formula, we need to explain what those variables stand for.
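One way to write TPM with every symbol spelled out is sketched below. This is not the formula from the post being discussed, and the symbol names are chosen here for illustration: r_i is the number of reads mapped to transcript i, l_i is the length of transcript i, rl is the average number of nucleotides mapped per read (the term attributed to Wagner above), and j runs over all transcripts quantified in the same sample.

```latex
\mathrm{TPM}_i
  = \frac{r_i \cdot rl \,/\, l_i}{\sum_{j} \left( r_j \cdot rl \,/\, l_j \right)} \times 10^{6}
  = \frac{r_i / l_i}{\sum_{j} r_j / l_j} \times 10^{6}
```

Because rl is treated here as a single per-sample constant, it cancels between numerator and denominator, which is why many write-ups omit it.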


l_k is a typo; it should probably be l_j, based on the verbose description under 1.
