Question

What type of normalization did they use in this article?



18 months ago

JACKY ▴ 160

I've read this amazing article, yet I'm struggling to understand how they normalized the bulk RNA-seq data. I've downloaded the data from the supplementary information, and the values for all genes are around 30. That seems too high for log2-transformed values, and too uniform for TPM.

I've read in the article that it is log2(TPM + 1), but the RNA-seq data I've downloaded says otherwise...

So, what type of normalization is it? It's very important for me to understand this. Thank you!

normalization TPM r • 802 views

updated 18 months ago by Ram 44k • written 18 months ago by JACKY ▴ 160

score 0 · Answer 1 · 2023-05-31

Transcriptomic analysis

[...] We subsequently filtered genes that were not expressed in any of the samples (in each cohort independently) then upper quartile-normalized the TPMs to an upper quartile of 1000, and log2-transformed them. Since the sequencing had been performed in 4 separate batches, principal component analysis (PCA) was used to evaluate for batch effects and 4 batches were observed. These 4 batches were corrected for using ComBat. Subsequently, a PCA was performed on the ComBat-corrected expression matrix to confirm that batch effects had been adequately corrected for. Moreover, a constant that was equal to the first integer above the minimum negative expression value obtained post-ComBat (constant of +21) [was added] to eliminate negative gene expression values that were a by-product of ComBat correction. The ComBat-corrected expression matrix was used for all downstream analyses.

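It seems they added a constant of 21 to every value in the ComBat-corrected matrix (that is, to the log2-transformed, upper quartile-normalized TPMs, not to the raw TPMs) to get rid of the negatives that emerged from batch correction. That would also explain values around 30: log2(1000) is roughly 10, and adding 21 puts a gene at the upper quartile at about 31.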
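If it helps to see the arithmetic, here is a minimal Python sketch of the steps quoted above, not the authors' actual code. The target of 1000 and the constant of 21 come from the quoted methods; the pseudocount of 1, the choice to take the upper quartile over expressed genes only, the function names, and the toy TPM values are all assumptions made for illustration. ComBat is skipped entirely, so this only approximates the scale of the published values.

```python
import math
import statistics

UQ_TARGET = 1000.0  # upper-quartile target quoted in the methods
OFFSET = 21.0       # constant quoted in the methods
PSEUDOCOUNT = 1.0   # assumption: the quoted text does not state a pseudocount


def uq_log2_shift(tpm):
    """Upper-quartile-normalize one sample's TPMs, log2-transform, add OFFSET.

    ComBat is deliberately skipped here, so the result only mimics the scale.
    """
    # Assumption: the upper quartile is taken over expressed (non-zero) genes.
    expressed = [x for x in tpm if x > 0]
    uq = statistics.quantiles(expressed, n=4, method="inclusive")[2]
    return [math.log2(x / uq * UQ_TARGET + PSEUDOCOUNT) + OFFSET for x in tpm]


def undo_shift(value):
    """Map a downloaded value back to the upper-quartile-normalized TPM scale."""
    return 2 ** (value - OFFSET) - PSEUDOCOUNT


sample = [0.0, 1.0, 2.0, 4.0, 8.0]  # made-up TPMs for illustration only
shifted = uq_log2_shift(sample)
print(shifted)
print([undo_shift(v) for v in shifted])
```

Note that `undo_shift` only gets you back to batch-corrected, upper quartile-normalized values. The ComBat adjustment itself cannot be reversed from the supplementary table alone, so the original TPMs are not recoverable this way.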