I'm looking for references and comments regarding the validity of the following method for data denoising, which I found while reading code that analyzes a gene expression dataset. The dataset consists of columns x1, ..., xn of length m (expression levels for n genes observed in m samples). Someone with knowledge of the dataset said that if we look at the top 10% of columns with the maximum variance, we find that those columns have the maximum variance due to artifacts (noise) in the measurement of the expression levels of the corresponding genes. In addition, we know that the remaining 90% of the columns are either not affected by any noise, or are affected by noise to a much lesser degree than the top 10%.
Now, in the code that I'm examining, the following method is used to remove the noise from the dataset. They calculate the principal components y1, ..., yn of the variables x1, ..., xn. They take y1 (the leading principal component) and assume (this is my guess) that it mostly captures the variance caused by the artifacts described above. Then they transform the data (all n columns) using the following rule:
xi = xi - (projection of xi onto y1).
That is, from each column they remove the component that is collinear with y1, and keep the component that is orthogonal to y1.
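To make the transformation concrete, here is a minimal NumPy sketch of what I understand the code to be doing (the toy matrix and variable names are my own, not from the original code); it computes the leading principal component of the centered data and subtracts from each column its projection onto that component:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy expression matrix: m samples (rows) x n genes (columns)
X = rng.normal(size=(50, 20))

# center each column before computing principal components
Xc = X - X.mean(axis=0)

# leading principal component via SVD: the first left-singular vector,
# scaled by the first singular value, is the score vector y1 (length m)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
y1 = U[:, 0] * s[0]
u1 = y1 / np.linalg.norm(y1)  # unit vector along y1

# remove from each column xi its projection onto y1:
# xi <- xi - (u1 . xi) * u1, done for all columns at once
X_denoised = Xc - np.outer(u1, u1 @ Xc)

# sanity check: every column is now orthogonal to y1
assert np.allclose(u1 @ X_denoised, 0)
```

After this step every column lies in the subspace orthogonal to y1, so the variance along the leading principal direction has been removed entirely.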
Can anybody please provide any references for this method or comment on its applicability in this case?
You might want to look at approaches like ComBat and SVA, which remove such biases in a statistically controlled way. Alternatively, you could model the "noise" as a covariate in a linear model. Knowledge of the experimental design is important for judging to what extent the "noise" would be expected to affect the results.
I agree; look at SVA. It does basically what (I think) you're describing, except on the residuals of the model that you specify. Some have found that ComBat or PEER does a better job of batch-effect removal, so you could have a look at those as well.