Maths implications: log2 transformation before or after normalisation
5.1 years ago
gablenord ▴ 10

Hi guys,

A (hopefully) quite straightforward question:

What are the different implications of log2 transforming variables before or after performing normalisation (for example quantile norm) on a dataset?

I have in mind microarray gene expression data but I guess the question would stand for any type of data as well.

I found contradictory sources around, from normalisation functions (in R packages) that even expect log2 input data to people stating that log2 transformation MUST be done after normalisation. I would like to understand the implications better.

Thanks,

log2 normalisation gene-expression • 8.2k views

Well, it depends on the normalization. If it's quantile normalization, it should not make much of a difference whether you log-transform before or after, provided you don't have negative numbers.
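To illustrate the caveat about negative numbers, here is a minimal base-R sketch. The intensities are invented for illustration (they are not from this thread); the point is only what `log2` returns for non-positive input.

```r
# Hypothetical background-corrected intensities; values invented for illustration.
x <- c(-3, 0, 8, 64)

# log2 is undefined for negative input (NaN, with a warning) and gives -Inf at zero.
logged <- suppressWarnings(log2(x))

# Only strictly positive inputs yield finite log2 values.
finite_ok <- is.finite(logged)
```

So if the data can contain zeros or negatives, they need handling before any log step, whichever order is chosen.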


I had already read that, but I found it more prescriptive than descriptive. I was more interested in the why than in the how :)


Given the diversity of microarray designs and detection systems for each, I'm not surprised that you have come across seemingly contradictory material online.

As an example, for two-colour arrays, the 'raw' signal intensities are log (base 2) ratios between the cDNA in the test and reference samples - these are then further normalised and kept on the log (base 2) scale. Agilent produces most if not all of these two-colour arrays, I believe.

For the Affymetrix and Illumina arrays, the raw data is just fluorescent signal intensity from whatever detection system that they are using, so, it's not yet logged.

5.1 years ago
jomo018 ▴ 730

Log2 is a monotonic but non-linear transformation: the ratios between elements in a sample are not kept. Once it has been applied, a downstream linear operation such as depth normalization is less appropriate, so in such cases it makes more sense to begin with the linear operations and end with the non-linear ones. Quantile normalization is also non-linear. In that case both workflow orders are reasonable, noting that right from the start the ranking is kept but the original ratios are not.
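A small base-R sketch of the linear case, assuming a simple per-sample scaling as the "depth normalization". The counts and the `size_factor` are hypothetical values chosen for illustration. Dividing by a factor before the log corresponds to a subtraction on the log scale, not to a division of the logged values.

```r
# Hypothetical counts for one sample and a hypothetical depth scaling factor.
counts <- c(10, 40, 160)
size_factor <- 2

# Linear step first, then log.
scaled_then_logged <- log2(counts / size_factor)

# The equivalent operation on already-logged data is a shift (subtraction).
logged_then_shifted <- log2(counts) - log2(size_factor)

# Dividing already-logged values by the factor is a different operation.
logged_then_divided <- log2(counts) / size_factor

same_as_shift  <- isTRUE(all.equal(scaled_then_logged, logged_then_shifted))
same_as_divide <- isTRUE(all.equal(scaled_then_logged, logged_then_divided))
```

Here `same_as_shift` is `TRUE` and `same_as_divide` is `FALSE`, which is one way to see why a linear scaling step is easier to reason about when it is applied before the log.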


Thank you Jomo, that's what I wanted to know!


I moved this to an answer. Just to 'shore it up', the two approaches do produce different end results:

# Toy 4 x 3 matrix
mat <- matrix(c(5,2,3,4,4,1,4,2,3,4,6,8), ncol = 3)

# Quantile-normalise first, then log2-transform
log2(preprocessCore::normalize.quantiles(mat))
         [,1]     [,2]     [,3]
[1,] 2.502500 2.369234 1.000000
[2,] 1.000000 1.000000 1.584963
[3,] 1.584963 2.369234 2.222392
[4,] 2.222392 1.584963 2.502500

# log2-transform first, then quantile-normalise
preprocessCore::normalize.quantiles(log2(mat))
          [,1]      [,2]      [,3]
[1,] 2.4406427 2.3178151 0.8616542
[2,] 0.8616542 0.8616542 1.5283208
[3,] 1.5283208 2.3178151 2.1949875
[4,] 2.1949875 1.5283208 2.4406427
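
One way to see where the difference comes from: quantile normalisation replaces each value with an average across samples at the same rank. Averaging and then logging gives the log of an arithmetic mean, while logging and then averaging gives the log of a geometric mean, and the former is never smaller (Jensen's inequality). The ranks within each column are the same either way. Below is a base-R sketch of this; `quantile_normalise` is a hypothetical helper written for illustration, not preprocessCore's implementation, and its tie handling (averaging the two neighbouring reference quantiles) is an assumption of the sketch.

```r
# Simplified quantile normalisation in base R (a sketch, not preprocessCore's code).
# Assumes a complete numeric matrix with no missing values.
quantile_normalise <- function(m) {
  # Reference distribution: mean across columns at each sorted position.
  ref <- rowMeans(apply(m, 2, sort))
  apply(m, 2, function(col) {
    r <- rank(col, ties.method = "average")
    # Tied values receive the average of the neighbouring reference quantiles.
    (ref[floor(r)] + ref[ceiling(r)]) / 2
  })
}

mat <- matrix(c(5,2,3,4,4,1,4,2,3,4,6,8), ncol = 3)

log_after  <- log2(quantile_normalise(mat))   # normalise, then log2
log_before <- quantile_normalise(log2(mat))   # log2, then normalise

# Log of the arithmetic mean is never below the mean of the logs.
after_not_smaller <- all(log_after >= log_before - 1e-12)

# Within each column, the ordering of the rows is the same either way.
same_ranks <- all(apply(log_after, 2, rank) == apply(log_before, 2, rank))
```

On this toy matrix both `after_not_smaller` and `same_ranks` come out `TRUE`, which fits the observation that the values differ slightly while rank-driven downstream results stay close.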

Thanks Kevin. I did some tests on my own data and saw that, while the actual values are indeed slightly different, as you showed, the results of downstream analyses (unsupervised clustering, PCA, paired t-tests, ...) are essentially comparable, if not the same. I was therefore interested in understanding whether, apart from specific empirical experience, there is a "mathematical" reason why one way should be preferred over the other.

From @jomo018's answer, I gathered that the choice should be based (mostly?) on the linear or non-linear nature of the transformations involved. So there would be no "wrong" order in the case of log2 and quantile normalisation (both being non-linear), while there would be a problem with other, linear normalisations, for example.

Did I get it right?

Thanks
