Unexpected sample clustering after microarray normalization
8.3 years ago
lvaremo • 0

I am analyzing a dataset of ~1000 Illumina microarrays from a human population. There are no defined subgroups, as the data are from a healthy "normal" population. Nevertheless, after performing quantile normalization (using the normalizeBetweenArrays function of the limma R package) on the log2-transformed data, a PCA plot reveals two very distinct clusters (the smaller consisting of ~130 samples).
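To make the normalization step concrete, here is a minimal pure-Python sketch of what quantile normalization does to log2-transformed arrays. It is an illustration only, not limma's implementation: the function name `quantile_normalize` and the toy intensities are hypothetical, and ties are broken by position rather than handled the way `normalizeBetweenArrays` does.

```python
import math


def quantile_normalize(columns):
    """Force every array (column) to share the same empirical distribution.

    columns: list of equal-length lists, one list of log2 intensities per array.
    Simplification: ties are broken by probe position, missing values are not handled.
    """
    n_probes = len(columns[0])
    # Sort each array, then average across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns)
        for r in range(n_probes)
    ]
    normalized = []
    for col in columns:
        # order[r] is the index of the probe holding the r-th smallest value.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for r, i in enumerate(order):
            out[i] = rank_means[r]
        normalized.append(out)
    return normalized


# Toy raw intensities for three arrays (hypothetical values).
raw = [[32, 8, 128, 16], [64, 4, 16, 256], [8, 2, 4, 32]]
log2_data = [[math.log2(v) for v in col] for col in raw]
norm = quantile_normalize(log2_data)
```

After this step every array has an identical set of values and only the within-array probe ranks differ, which is why a separation that appears only after normalization has to come from rank differences between arrays rather than from overall intensity.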

The only pattern I have found is that arrays with high raw expression (high average signal in the raw data) are overrepresented in the smaller cluster. However, they are not unique to that cluster, so this does not completely explain the distinct separation.
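One way to quantify that overrepresentation is to compare, per cluster, the fraction of arrays whose mean raw signal lies above the overall median. The sketch below assumes you already have a per-array mean raw intensity and a cluster label taken from the PCA plot; the function name, the labels, and the numbers are all hypothetical.

```python
from statistics import median


def high_signal_fraction(array_means, labels):
    """Fraction of arrays per cluster whose mean raw signal exceeds the overall median."""
    cutoff = median(array_means)
    fractions = {}
    for cluster in sorted(set(labels)):
        members = [m for m, lab in zip(array_means, labels) if lab == cluster]
        fractions[cluster] = sum(m > cutoff for m in members) / len(members)
    return fractions


# Hypothetical per-array mean raw intensities and PCA cluster labels.
array_means = [210.0, 195.0, 205.0, 480.0, 510.0, 220.0, 450.0, 200.0]
labels = ["large", "large", "large", "small", "small", "large", "small", "large"]
fractions = high_signal_fraction(array_means, labels)
```

A large gap between the two fractions, with some high-signal arrays still sitting in the larger cluster, would match the pattern described here: raw intensity is associated with the split but does not fully determine it.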

I have failed to find any other explanation for this separation (which, by the way, is not visible in the raw data). There is no connection to age or gender, no particular genes driving the separation (determined by inspecting the PCA loadings), and no apparent influence from highly or lowly expressed genes or from genes with high or low variance.

It could perhaps be some sort of batch effect; however, we do not have access to information such as experiment date, operator, etc.

Has anyone encountered a similar situation, or does anyone have suggestions for other things to check that could explain the unexpected separation?

Thanks!

microarray normalization PCA clustering • 1.8k views

I suspect you are seeing a batch effect. It is not uncommon for differences between batches to explain more of the variance in the data than the biological effects of interest. The fact that total expression levels also differ between the two groups supports that idea.

