Should I log-transform microarray gene expression data before applying PCA?
10.0 years ago
chengzhao41 ▴ 110

Should I log-transform microarray gene expression data before applying PCA? What assumptions does PCA make about the distribution of the data? Does it work best when the data are normally distributed?

Right now my data looks as follows (after scaling to zero mean and unit variance):

range(scale(data$cell_line))
[1] -14.35770  16.15416

PCA is not a hypothesis test or a statistical model; it makes no distributional assumptions about your data and will produce results no matter what you put in. Whether those results are what you want is another question. Shifting the data (PCA centres each variable anyway) or multiplying every value by the same constant does not change the principal directions, but the log is nonlinear and will change them.
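A minimal sketch of the shift/rescale/log point, in pure Python. The toy intensities and the helper names `first_pc` and `alignment` are hypothetical, chosen only for illustration. It finds the leading principal direction of two-variable data from the closed-form top eigenvector of the 2×2 sample covariance matrix, and compares directions up to sign, since the sign of a principal component is arbitrary.

```python
import math

def first_pc(points):
    """Leading principal direction of 2-D points.

    Uses the closed-form angle of the top eigenvector of the 2x2 sample
    covariance matrix [[sxx, sxy], [sxy, syy]].
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Centring: any constant added to a column cancels out here.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

def alignment(u, v):
    """|cos| of the angle between two unit vectors (PC signs are arbitrary)."""
    return abs(u[0] * v[0] + u[1] * v[1])

# Hypothetical raw intensities for two genes measured in four samples.
raw = [(1000, 1), (2000, 10), (3000, 100), (4000, 1000)]

base = first_pc(raw)
shifted = first_pc([(x + 50, y + 50) for x, y in raw])   # add a constant
rescaled = first_pc([(3 * x, 3 * y) for x, y in raw])    # same factor everywhere
logged = first_pc([(math.log2(x), math.log2(y)) for x, y in raw])

print(round(alignment(base, shifted), 6))   # 1.0: direction unchanged
print(round(alignment(base, rescaled), 6))  # 1.0: direction unchanged
print(alignment(base, logged))              # well below 1: the log changes PC1
```

Note that rescaling each gene by a different factor (as `scale()` does) is not a uniform rescaling, so it can change the components.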

Since your scaled data look roughly symmetrical (about -14.4 to 16.2), the data may already be on a log scale, in which case no further transformation is needed. They do, however, contain some outliers; capping the scaled values at ±6 could give better-behaved data.
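A rough sketch of the two suggestions above, in pure Python: sample skewness as a quick symmetry check (raw intensities are typically right-skewed, while log-scale values tend to be closer to symmetric), and capping of already-scaled values at ±6. The helper names, the toy intensities and the default `limit=6.0` (taken from the ±6 suggestion) are illustrative assumptions; a skewness near 0 is only a hint, not proof that the data were log-transformed.

```python
import math
import statistics

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (0 for a symmetric distribution)."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean([(v - mu) ** 2 for v in values])
    m3 = statistics.fmean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5

def cap(values, limit=6.0):
    """Clip every value into [-limit, limit] so a few extreme points cannot dominate."""
    return [max(-limit, min(limit, v)) for v in values]

# Hypothetical raw intensities for one gene: strongly right-skewed.
raw = [1, 10, 100, 1000]
logged = [math.log2(v) for v in raw]

print(skewness(raw) > 0.5)             # True: raw scale is right-skewed
print(abs(skewness(logged)) < 1e-6)    # True: these log values are symmetric
print(cap([-14.3577, 0.4, 16.15416]))  # [-6.0, 0.4, 6.0]
```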

10.0 years ago
Ann ★ 2.4k

Rule of thumb: Put data on the same scale before applying PCA.

See here and here.

Also, Jeff Leek posted this easy-to-follow, short lecture on PCA from Week 3 of his Coursera class.
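To illustrate the rule of thumb, a minimal pure-Python sketch on a hypothetical 4-sample × 3-gene matrix (all values and helper names are made up). With unscaled data, the gene with the largest variance dominates the total variance and so tends to dominate the first principal component; z-scoring each gene, as R's `scale()` does, gives every gene unit variance, which makes PCA equivalent to PCA on the correlation matrix.

```python
import statistics

def standardize_columns(matrix):
    """Z-score each column (gene) of a samples x genes matrix, like R's scale()."""
    scaled_columns = []
    for col in zip(*matrix):
        mu = statistics.fmean(col)
        sd = statistics.stdev(col)  # sample SD (n - 1), matching R's sd()
        scaled_columns.append([(v - mu) / sd for v in col])
    # Transpose back to samples x genes.
    return [list(row) for row in zip(*scaled_columns)]

def variance_share(matrix):
    """Fraction of the total variance contributed by each column."""
    variances = [statistics.variance(col) for col in zip(*matrix)]
    total = sum(variances)
    return [v / total for v in variances]

# Hypothetical expression matrix: 4 samples (rows) x 3 genes (columns);
# gene 1 is on a much larger scale than genes 2 and 3.
data = [
    [1200.0, 5.1, 7.9],
    [3400.0, 4.8, 8.4],
    [2100.0, 5.6, 7.2],
    [5000.0, 5.0, 8.8],
]

print(variance_share(data))                       # gene 1 carries almost all variance
print(variance_share(standardize_columns(data)))  # each gene contributes about 1/3
```

In R, `prcomp(x, scale. = TRUE)` applies the same per-variable scaling before the decomposition.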
