15 months ago
shakyaram079
How to convert data_mrna_seq_v2_rsem.txt to TPM values
The file contains the rsem.genes.normalized_results values from the TCGA LUSC (lung squamous cell carcinoma) data. How can I convert them into TPM values?
Can you give us a link to the dataset please?
Hi, I have the same question. You can find the dataset here.
I think those are simply log2-transformed TPM values, i.e. log2(TPM+x), where x is a pseudocount. I think x is either 1 or 0.001. In any case, just reverse the log2 transform (compute 2^value - x). If you did it correctly, the recovered values should sum to 10^6 across all transcripts/genes within any given sample.
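For anyone who wants to try this, here is a minimal sketch of the reverse transform and the sum check. It assumes the matrix has already been loaded into a pandas DataFrame with genes as rows and samples as columns, and that the pseudocount is 1; the function names and the toy values are made up for illustration and do not come from the LUSC file.

```python
import numpy as np
import pandas as pd


def undo_log2(log_expr: pd.DataFrame, pseudocount: float = 1.0) -> pd.DataFrame:
    """Invert log2(TPM + pseudocount); the pseudocount is an assumption."""
    tpm = np.exp2(log_expr) - pseudocount
    # Rounding error can push true zeros slightly negative, so clip at 0.
    return tpm.clip(lower=0)


def sums_to_million(tpm: pd.DataFrame, rel_tol: float = 0.01) -> pd.Series:
    """Per sample (column), is the total within rel_tol of 10^6?"""
    return (tpm.sum(axis=0) / 1e6 - 1).abs() <= rel_tol


# Toy matrix (hypothetical values): each sample sums to exactly 10^6.
toy_tpm = pd.DataFrame(
    {"S1": [500000.0, 300000.0, 200000.0], "S2": [999999.0, 1.0, 0.0]},
    index=["geneA", "geneB", "geneC"],
)
logged = np.log2(toy_tpm + 1.0)
recovered = undo_log2(logged, pseudocount=1.0)
print(sums_to_million(recovered))
```

If the check fails for every plausible pseudocount, the values were probably not log2(TPM+x) to begin with.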
Thank you for your reply. But I'm not sure whether the data were log2-transformed, because the range of values is too large. For example, the maximum value in the LUSC TCGA data is 1,737,510 for the ADAM6 gene in the TCGA-85-A513-01 sample. Can you explain what these values represent?
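A quick range check makes this concern concrete. TPM values within a sample sum to 10^6, so no single TPM value can exceed 10^6, and log2(TPM+1) can never exceed about 19.93. The sketch below applies both bounds to the one value quoted above; the function name is hypothetical and the pseudocount of 1 is an assumption.

```python
import math

import pandas as pd

TPM_TOTAL = 1e6  # TPM values in one sample sum to 10^6 by definition


def scale_report(expr: pd.DataFrame, pseudocount: float = 1.0) -> dict:
    """Compare the largest value against the upper bounds implied by TPM."""
    max_value = float(expr.max().max())
    log2_ceiling = math.log2(TPM_TOTAL + pseudocount)  # about 19.93 for x = 1
    return {
        "max_value": max_value,
        "could_be_log2_tpm": max_value <= log2_ceiling,
        "could_be_raw_tpm": max_value <= TPM_TOTAL,
    }


# Only the single value mentioned in the thread, not the full matrix.
example = pd.DataFrame({"TCGA-85-A513-01": [1737510.0]}, index=["ADAM6"])
report = scale_report(example)
print(report)
```

Both checks come out false for 1,737,510, so on these bounds the value is neither a log2-transformed TPM nor a plain TPM.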
The URL you linked to only has "batch normalized" expression. Unfortunately, there is no way to undo batch normalization to recover TPM values.