I have been reading about Principal Component Analysis (PCA) and using R to run PCA on some microarray data, starting from background-subtracted raw fluorescence values.
I can run PCA on any of the following datasets, each one giving different plots:
1) Unnormalised, log-transformed data
2) Normalised data
3) Normalised, log-transformed data
Am I right in saying that, depending on the question I am asking, I should run the PCA on 1) or 3)?
For example, if I wanted to see whether samples cluster by microarray chip, I would have to run it on 1), because otherwise normalisation would mask this effect?
Once I have done this and ascertained that there is no chip-specific clustering, I can run another PCA on normalised (for example quantile-normalised) data to see whether my samples cluster by treatment, for example?
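To make the two-step idea concrete, here is a minimal sketch of what I have in mind. It runs on simulated data, not my real arrays. The chip gain, the treatment effect, the sample layout, and the hand-written `quantile_normalise` helper are all assumptions made up for illustration (in practice a package function would probably be used for the normalisation step).

```r
set.seed(1)
n_genes   <- 500
chip      <- factor(rep(c("chipA", "chipB"), each = 4))
treatment <- factor(rep(c("control", "treated"), times = 4))

# Simulated background-subtracted intensities: genes in rows, samples in columns.
base <- rlnorm(n_genes, meanlog = 7, sdlog = 1)
raw <- sapply(seq_along(chip), function(i) {
  chip_gain <- if (chip[i] == "chipB") 2 else 1   # hypothetical chip-wide intensity shift
  base * chip_gain * rlnorm(n_genes, meanlog = 0, sdlog = 0.2)
})
colnames(raw) <- paste0("s", seq_along(chip))
# Hypothetical treatment effect: the first 50 genes are 3x higher in treated samples.
raw[1:50, treatment == "treated"] <- raw[1:50, treatment == "treated"] * 3

# Illustrative base-R quantile normalisation (hypothetical helper, not a package function).
quantile_normalise <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))           # mean of each quantile across samples
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# prcomp() expects samples in rows, hence the transpose.
pca_raw  <- prcomp(t(log2(raw)), center = TRUE, scale. = FALSE)
pca_norm <- prcomp(t(log2(quantile_normalise(raw))), center = TRUE, scale. = FALSE)

# Distance between group means along PC1, as a crude measure of separation.
gap <- function(scores, g) abs(diff(tapply(scores, g, mean)))
print(c(chip = gap(pca_raw$x[, 1], chip),  treatment = gap(pca_raw$x[, 1], treatment)))
print(c(chip = gap(pca_norm$x[, 1], chip), treatment = gap(pca_norm$x[, 1], treatment)))

if (interactive()) {
  plot(pca_raw$x[, 1:2],  col = chip,      pch = 19, main = "Unnormalised, log2")
  plot(pca_norm$x[, 1:2], col = treatment, pch = 19, main = "Quantile-normalised, log2")
}
```

In this toy setup the chip effect is a pure per-sample gain, which quantile normalisation removes completely. Real chip or batch effects may not be that simple, so the sketch only illustrates the workflow I am asking about.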
It is also my understanding that this type of data should really always be log transformed, to stop the PCs being dominated by the huge differences in fluorescence values?
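This is a rough sketch of how I understand the log-transformation argument, again on simulated intensities. The distribution parameters and the `top_share` helper are assumptions for illustration only. Since an unscaled PCA weights each gene by its variance, the sketch compares how much of the total variance sits in the ten most variable genes before and after taking logs.

```r
set.seed(2)
n_genes   <- 1000
n_samples <- 6

# Simulated intensities with multiplicative noise, so variance grows with mean intensity.
mean_intensity <- rlnorm(n_genes, meanlog = 7, sdlog = 1.5)
raw <- sapply(seq_len(n_samples), function(i) {
  mean_intensity * rlnorm(n_genes, meanlog = 0, sdlog = 0.3)
})

# Fraction of the total per-gene variance carried by the 10 most variable genes
# (hypothetical helper for this sketch).
top_share <- function(x) {
  v <- apply(x, 1, var)
  sum(sort(v, decreasing = TRUE)[1:10]) / sum(v)
}

print(top_share(raw))        # raw scale
print(top_share(log2(raw)))  # log2 scale
```

If the share is much larger on the raw scale, the leading PCs of the raw data would mostly reflect a handful of very bright probes, which is the effect I am trying to avoid.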
Also, I could not find a consensus on whether to scale the data or not. I presume that, as all the data are on the same scale here, scaling should have little effect on the outcome?
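One way I could check this presumption rather than assume it is to run `prcomp` with and without `scale.` on the same log-scale matrix and compare the results. The sketch below does that on simulated log2 values. The noise level, the treatment effect, and the `pc1_share` helper are made-up assumptions.

```r
set.seed(3)
n_genes   <- 500
treatment <- factor(rep(c("control", "treated"), times = 4))

# Simulated log2 expression: genes in rows, 8 samples in columns.
log_expr <- matrix(rnorm(n_genes * 8, mean = 10, sd = 0.3), nrow = n_genes)
# Hypothetical treatment effect on the first 50 genes.
log_expr[1:50, treatment == "treated"] <- log_expr[1:50, treatment == "treated"] + 1.5

pca_unscaled <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)
pca_scaled   <- prcomp(t(log_expr), center = TRUE, scale. = TRUE)

# Proportion of total variance explained by PC1 (hypothetical helper).
pc1_share <- function(p) p$sdev[1]^2 / sum(p$sdev^2)

print(pc1_share(pca_unscaled))
print(pc1_share(pca_scaled))
# How similar are the PC1 sample scores? The sign of a PC is arbitrary, hence abs().
print(abs(cor(pca_unscaled$x[, 1], pca_scaled$x[, 1])))
```

With `scale. = TRUE` every gene is rescaled to unit variance, so genes that only carry noise get the same weight as genes that actually change. That is why I am unsure whether "same scale" is enough to make scaling harmless. Note also that `prcomp` will stop with an error under `scale. = TRUE` if any gene has zero variance across samples.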
Thank you