Hi guys,
A (hopefully) quite straightforward question:
What are the different implications of log2 transforming variables before or after performing normalisation (for example quantile norm) on a dataset?
I have in mind microarray gene expression data but I guess the question would stand for any type of data as well.
I found contradictory sources around, from normalisation functions (in R packages) that expect log2-transformed input to people stating that the log2 transformation MUST be done after normalisation. I would like to understand the implications better.
Thanks,
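Well, it depends on the normalization. If it's quantile normalization, it should make little difference whether you log transform before or after, provided you don't have negative numbers. The log is monotonic, so the ranks that quantile normalization works on are unchanged. The only thing that changes is the reference distribution, which is averaged on the log scale in one case and on the raw scale in the other.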
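To make the point about quantile normalization concrete, here is a minimal sketch in Python with NumPy. It is not the implementation of any particular R package: `quantile_normalize` is a hypothetical helper that uses the plain row-mean reference and assumes no ties, and the intensities are simulated rather than real array data.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalise the columns (samples) of a genes x samples matrix.

    Hypothetical helper for illustration; assumes no ties within a column.
    """
    # Rank of every value within its own column (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Reference distribution: mean across samples of the k-th smallest value.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank.
    return reference[ranks]

rng = np.random.default_rng(0)
# Simulated strictly positive "raw intensities" (an assumption, not real data).
raw = rng.lognormal(mean=7.0, sigma=1.5, size=(1000, 4))

log_then_norm = quantile_normalize(np.log2(raw))
norm_then_log = np.log2(quantile_normalize(raw))

# The within-sample ordering is the same either way, because log2 is monotonic.
same_ranks = np.array_equal(
    np.argsort(log_then_norm, axis=0), np.argsort(norm_then_log, axis=0)
)
# The values differ because the mean of logs is not the log of the mean.
max_diff = np.abs(log_then_norm - norm_then_log).max()

print("same ranks:", same_ranks)
print("largest absolute difference (log2 units):", max_diff)
```

In this sketch the two orders give the same ranking of genes within each sample, but slightly different values, since one reference is an arithmetic mean on the log scale and the other on the raw scale. How large that difference is in practice depends on how much the samples differ from each other.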
See Normalisation before log2 transformation or after in Microarray Gene expression data?
I had already read that, but I found it more prescriptive than descriptive. I was more interested in the why than in the how :)
Given the diversity of microarray designs and detection systems for each, I'm not surprised that you have come across seemingly contradictory material online.
As an example, for two-colour arrays, the 'raw' signal intensities are log (base 2) ratios between the cDNA in the test and reference samples - these are then further normalised and kept on the log (base 2) scale. Agilent produces most if not all of these two-colour arrays, I believe.
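As a rough illustration of what a log (base 2) ratio looks like for a two-colour array, here is a small sketch. The intensities are made up, and which channel is test and which is reference is an assumption here, since that depends on the experimental design.

```python
import numpy as np

# Hypothetical background-corrected intensities for four probes.
red = np.array([1200.0, 300.0, 800.0, 50.0])     # assumed test channel
green = np.array([600.0, 300.0, 1600.0, 200.0])  # assumed reference channel

# Log ratio: +1 means twice as much signal in test, -1 means half as much.
M = np.log2(red / green)
# Average log intensity of the two channels for each probe.
A = 0.5 * (np.log2(red) + np.log2(green))

print("M:", M)
print("A:", A)
```

A ratio on the log2 scale is symmetric around zero, so a two-fold increase and a two-fold decrease are the same distance from "no change". That is one reason the data are kept on that scale for further normalisation.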
For the Affymetrix and Illumina arrays, the raw data are just fluorescent signal intensities from whatever detection system is being used, so they are not yet logged.