I've read this amazing article, yet I'm struggling to understand how they normalized the bulk RNA-seq data. I downloaded the data from the supplementary information, and the values for all genes are around 30. That can't be log2(TPM + 1), because the values are too high (TPM tops out at one million, so log2(TPM + 1) can't exceed about 20), and they are too uniform to be raw TPM.
The article says the values are log2(TPM + 1), but the RNA-seq data I downloaded suggests otherwise...
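One quick way to test the log2(TPM + 1) claim is to check two properties of the downloaded matrix: no value can exceed log2(10^6 + 1), about 19.93, and undoing the transform (2^x - 1) should give per-sample totals of roughly one million. The Python sketch below is a heuristic, not the authors' method. It assumes a genes × samples matrix that covers all genes (a filtered subset of genes won't sum to a million), and `looks_like_log2_tpm` is a hypothetical helper name.

```python
import numpy as np

# Upper bound for any log2(TPM + 1) value: a single gene cannot exceed 1e6 TPM.
MAX_LOG2_TPM = np.log2(1e6 + 1)  # about 19.93


def looks_like_log2_tpm(matrix, rtol=0.01):
    """Heuristic check that a matrix is plausibly log2(TPM + 1).

    Assumes rows = genes and columns = samples, with all genes present.
    Returns True if every value lies in [0, log2(1e6 + 1)] and each
    sample's back-transformed values sum to roughly one million.
    """
    m = np.asarray(matrix, dtype=float)
    # Range check: log2(TPM + 1) is never negative and never above ~19.93.
    if m.min() < 0 or m.max() > MAX_LOG2_TPM:
        return False
    tpm = np.exp2(m) - 1.0          # undo the log2(x + 1) transform
    col_sums = tpm.sum(axis=0)      # per-sample TPM totals
    return bool(np.allclose(col_sums, 1e6, rtol=rtol))


# Toy data: three genes, two samples, TPM columns that each sum to 1e6.
tpm = np.array([[5e5, 2e5], [3e5, 7e5], [2e5, 1e5]])
print(looks_like_log2_tpm(np.log2(tpm + 1)))       # True
print(looks_like_log2_tpm(np.full((3, 2), 30.0)))  # False: values exceed ~19.93
```

A matrix where every gene sits around 30 fails the range check immediately, which is consistent with the values having been transformed further (for example, batch-corrected or rescaled) after the log2(TPM + 1) step.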
So what type of normalization is it? It's very important for me to understand this. Thank you!
I see. Do you think there is any way to retrieve the uncorrected TPM data? Can I undo the corrections and scaling applied to the data?
In that case, I would contact the corresponding author. But unless you only want to use part of the data, the batch correction they applied is clearly necessary.