5.0 years ago
star
I would like to normalize my data using the TPM method, as explained at https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/:
TPM is very similar to RPKM and FPKM. The only difference is the order of operations. Here’s how you calculate TPM:
- Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK).
- Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor.
- Divide the RPK values by the “per million” scaling factor. This gives you TPM.
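The three steps above can be sketched in R roughly as follows. This is only a minimal sketch, not code from the linked blog post: the function name `tpm_from_counts` and the toy matrix are made up for illustration, and the sketch assumes a genes-by-samples count matrix plus a vector of gene lengths in kilobases.

```r
# Hypothetical helper: counts is a genes x samples matrix,
# length_kb is a numeric vector with one length (in kb) per gene.
tpm_from_counts <- function(counts, length_kb) {
  stopifnot(nrow(counts) == length(length_kb))
  # Step 1: reads per kilobase. length_kb is recycled down each column,
  # so row i is divided by length_kb[i].
  rpk <- counts / length_kb
  # Step 2: per-sample "per million" scaling factor.
  scaling <- colSums(rpk) / 1e6
  # Step 3: divide each column (sample) by its own scaling factor.
  sweep(rpk, 2, scaling, "/")
}

# Toy data, invented purely for illustration.
toy_counts <- matrix(c(10, 20, 30,
                       5, 5, 10),
                     nrow = 3,
                     dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
toy_tpm <- tpm_from_counts(toy_counts, length_kb = c(1, 2, 3))
colSums(toy_tpm)  # each sample should sum to 1e6
```

A quick sanity check for any TPM matrix is that every sample (column) sums to one million; a sample whose RPK values are all zero would give NaN under this sketch.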
I used the code below, but I do not understand why the output is incorrect.
CODE:
RPK <- data.matrix(Data[-1] / Data$Length.Kbp)
TPM <- t(t(RPK)*1e6 / colSums(RPK))
Data:
Length.Kbp FB_1 FB_2 FB_3
1:15040-15500 0.46 0 4 0
1:108570-109500 0.93 1 5 0
1:248240-249110 0.87 2 1 1
RPK:
FB_1 FB_2 FB_3
1:15040-15500 0 8.695652 0
1:108570-109500 1.075269 5.376344 0
1:248240-249110 2.298851 1.149425 1.149425
TPM:
FB_1 FB_2 FB_3
1:15040-15500 0 2577162.0 0
1:108570-109500 70641.81 353209.1 0
1:248240-249110 2000000.00 1000000.0 1000000.0
whereas the value in the first row for FB_2 should be:
8.695652 * 1000000 / 15.221422 = 571277.2
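One way to check this is to rebuild the three example rows and compare the result with the hand calculation. The sketch below retypes the posted values into a data frame named `Data` (an assumption about how the original object was structured) and stores the column sums in a separate vector, `rpk_sums` (a name introduced here), so they can be inspected before dividing.

```r
# Posted example values, retyped by hand; the real object may differ.
Data <- data.frame(
  Length.Kbp = c(0.46, 0.93, 0.87),
  FB_1 = c(0, 1, 2),
  FB_2 = c(4, 5, 1),
  FB_3 = c(0, 0, 1),
  row.names = c("1:15040-15500", "1:108570-109500", "1:248240-249110")
)

# Reads per kilobase: every sample column divided by the region length.
RPK <- data.matrix(Data[-1] / Data$Length.Kbp)

# Keep the per-sample sums in a vector so the denominators are visible.
rpk_sums <- colSums(RPK)
print(rpk_sums)

# Scale each sample so that its values sum to one million.
TPM <- t(t(RPK) * 1e6 / rpk_sums)
print(TPM["1:15040-15500", "FB_2"])  # compare with the hand calculation
```

With these three rows the FB_2 denominator comes out at roughly 15.22, which agrees with the hand calculation above rather than with the posted TPM table.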
Did you try storing colSums2(RPK) in a vector and verifying a few values in it to ensure you're dividing by the right value? There is something odd about the third row: it seems to be exactly 1e6 x original_counts.
Also, your datasets don't conform to the code. If RPK <- data.matrix(Data / Data$Length.Kbp) is exactly what was run, then RPK would also have a column titled Length.Kbp with all values = 1. Did you remove that column?
Thanks for your reply! Yes, I have removed it and edited the code now.
In my code I just used a transpose:
TPM <- t(t(RPK)*1e6 / colSums(RPK))
and it seems to work, but I don't know what exactly happens when transposing twice.
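On the double transpose: when R combines a matrix with a shorter vector, the vector is recycled down the columns, so its i-th element lines up with row i, not column i. Transposing first puts the samples in rows, so the per-sample sums line up with the right values, and the second transpose restores the original layout. The small matrix below is invented for illustration, and `sweep()` is shown as one equivalent that avoids transposing.

```r
# Toy 2 x 2 matrix: columns A and B play the role of samples.
m <- matrix(c(1, 2, 3, 4), nrow = 2,
            dimnames = list(c("r1", "r2"), c("A", "B")))
sums <- colSums(m)                    # A = 3, B = 7

# Recycled down the columns: row r1 is divided by 3, row r2 by 7.
wrong <- m / sums

# In t(m) the samples are rows, so A is divided by 3 and B by 7;
# the outer t() flips the result back to the original orientation.
right <- t(t(m) / sums)

# Same result without transposing: divide along margin 2 (columns).
also_right <- sweep(m, 2, sums, "/")

colSums(right)                        # both columns sum to 1
```

Without the transposes, the division only happens to be correct when the number of rows equals the number of columns and the sums coincide, which is why the untransposed version can fail silently.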
Are you sure you should be using colSums and not rowSums? You're dividing transposed-RPK by per-sample RPK sums, not per-region RPK sums. Try using rowSums instead.
I want to divide each RPK value by the per-sample RPK sum, based on the explanation below:
1) Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK).
2) Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor.
3) Divide the RPK values by the “per million” scaling factor. This gives you TPM.
Please read those three statements and interpret them to get to the denominator you need to use. I can help you with specific questions, but I will not read English and translate it to reproducible code for you - you should be able to do that on your own.
I have edited your post and updated the TPM object with the formula above. Going forward, please give us the exact code you use - it is impossible to help you when you withhold critical information.