10.0 years ago
chengzhao41
Should I log-transform microarray gene expression data before applying PCA? What kind of assumption does PCA make about the data in terms of distribution? Does it work best when normally distributed?
Right now my data looks as follows (after scaling to zero mean and unit variance):
range(scale(data$cell_line))
[1] -14.35770 16.15416
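PCA is not a hypothesis or statistical test; it makes no distributional assumptions about your data and will produce results no matter what you put in. Whether those results are what you want is another question. In fact, shifting the data or multiplying all of it by a single constant does not change the principal components (scaling each gene separately to unit variance does, since it changes the relative weight of the genes). Log, on the other hand, is nonlinear and will have an impact.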
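To make the point about shifting and scaling concrete, here is a minimal sketch in R. The matrix `x` is simulated, hypothetical data (not the poster's), and `prop_var` is a helper defined only for this illustration. It compares PCA on raw values, on a uniformly shifted and rescaled copy, and on log2 values.

```r
set.seed(1)

# Hypothetical data: 50 samples x 20 genes of positive, right-skewed intensities
x <- matrix(rlnorm(50 * 20, meanlog = 6, sdlog = 1), nrow = 50)

pca_raw     <- prcomp(x)            # prcomp centers each column by default
pca_shifted <- prcomp(3 * x + 100)  # same shift and scale applied to every value
pca_log     <- prcomp(log2(x))      # nonlinear transform

# Proportion of variance explained by each component
prop_var <- function(p) p$sdev^2 / sum(p$sdev^2)

# Unchanged by a uniform shift and rescale
same_after_shift <- isTRUE(all.equal(prop_var(pca_raw), prop_var(pca_shifted)))

# Changed by the log transform
same_after_log <- isTRUE(all.equal(prop_var(pca_raw), prop_var(pca_log)))

print(c(shift = same_after_shift, log = same_after_log))
```

The loadings in `pca_raw$rotation` and `pca_shifted$rotation` should also agree up to sign, while those from `pca_log` generally will not.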
Since your scaled data looks roughly symmetric, it may already be on a log scale, in which case you do not need any further transformation. It does, however, contain some outliers; capping those at ±6 could give better-behaved data.
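Capping can be done with `pmin` and `pmax`. This is only a sketch: `z` is a simulated stand-in for the scaled data, and the threshold of 6 is the rough value suggested above, not a rule, so adjust it for your own data.

```r
set.seed(1)

# Hypothetical stand-in for the scaled data: mostly normal values plus two extremes
z <- scale(c(rnorm(1000), -40, 45))

cap <- 6  # assumed threshold; tune for your data

# Clamp every value into [-cap, cap]
z_capped <- pmin(pmax(z, -cap), cap)

range(z_capped)
```

Values inside the interval are left as they are; only the extremes are pulled back to the boundary, so they keep their direction but no longer dominate the variance.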