Hi guys, I have a question about microarray gene expression normalization and clustering. I have a gene expression microarray matrix of around 13,000 genes (the rows) and 200 samples (the columns). I normalized the matrix using RMA (which gives values on the log2 scale) and then clustered both the samples and the genes using Pearson correlation and average linkage for HCL. The genes and the samples cluster very well! If I repeat the normalization using MAS5 (followed by a log2 transform) and cluster with the same criteria as above, the genes and the samples do not cluster anymore!

I tried centering the genes: for each row (gene) I subtracted its median value across the samples, both after RMA and after MAS5 normalization. Again, the genes and the samples cluster very well with RMA but not with MAS5.

Then, for each gene (row) I computed the median across all samples. After RMA normalization the distribution of these per-gene medians is normal (according to the Shapiro test), while after MAS5 it is not. Can this affect the quality of the clustering? Why is there such a big difference between the two methods?
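For anyone who wants to reproduce the steps described above, here is a minimal sketch in Python with NumPy and SciPy. The choice of software is an assumption (the post does not say what was used), the function names are hypothetical, and the matrix is a small synthetic stand-in for a log2 expression matrix, not real RMA or MAS5 output.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from scipy.stats import shapiro


def median_center_rows(expr):
    """Subtract each gene's (row's) median across samples."""
    return expr - np.median(expr, axis=1, keepdims=True)


def cluster_samples(expr, n_clusters):
    """Average-linkage HCL of the samples (columns).

    The distance between two samples is 1 - Pearson correlation.
    """
    dist = pdist(expr.T, metric="correlation")
    tree = linkage(dist, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def gene_median_normality(expr):
    """Shapiro-Wilk test on the per-gene medians.

    SciPy documents that the p-value may be inaccurate above 5000 values,
    which matters for a 13,000-gene matrix.
    """
    return shapiro(np.median(expr, axis=1))


# Synthetic stand-in (assumption): 500 genes, two groups of 10 samples,
# with the first 50 genes shifted up in the second group.
rng = np.random.default_rng(0)
n_genes, n_per_group = 500, 10
baseline = rng.normal(8.0, 1.0, size=(n_genes, 1))
expr = baseline + rng.normal(0.0, 0.3, size=(n_genes, 2 * n_per_group))
expr[:50, n_per_group:] += 2.0

centered = median_center_rows(expr)
labels = cluster_samples(centered, n_clusters=2)
stat, p_value = gene_median_normality(expr)

print("sample cluster labels:", labels)
print("Shapiro-Wilk on gene medians: W =", stat, "p =", p_value)
```

Clustering the genes instead of the samples would use `pdist(expr, ...)` without the transpose. Running the same functions on the RMA and the MAS5 matrices would make the comparison in the question concrete.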
What do you mean by cluster well? Are you getting more clusters with one versus the other? Are you getting better cluster densities? Are you getting better cluster separation? Do the clusters make more sense biologically?
Hi Damian! The genes and samples group well together. In other words, with RMA I get better cluster separation!
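One way to put a number on "better cluster separation" is the average silhouette width, computed on the same correlation distances used for the clustering. This metric is not mentioned in the thread; it is just one common option. The sketch below is in Python with NumPy and SciPy, the function names are hypothetical, and the two matrices are synthetic examples (one with two real sample groups, one pure noise), not RMA or MAS5 data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform


def mean_silhouette(dist, labels):
    """Average silhouette width from a square distance matrix.

    Needs at least two clusters. Singleton clusters score 0.
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # do not compare a sample with itself
        if not same.any():
            scores.append(0.0)
            continue
        a = dist[i, same].mean()  # mean distance within own cluster
        b = min(dist[i, labels == c].mean()  # nearest other cluster
                for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def sample_separation(expr, n_clusters):
    """Cluster samples (Pearson distance, average linkage) and score them."""
    condensed = pdist(expr.T, metric="correlation")
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return mean_silhouette(squareform(condensed), labels)


# Synthetic examples (assumption): 500 genes x 20 samples.
rng = np.random.default_rng(1)
noise_only = rng.normal(0.0, 0.3, size=(500, 20))
structured = rng.normal(0.0, 0.3, size=(500, 20))
structured[:50, :10] -= 1.0  # 50 genes low in the first 10 samples
structured[:50, 10:] += 1.0  # and high in the last 10

structured_score = sample_separation(structured, n_clusters=2)
noise_score = sample_separation(noise_only, n_clusters=2)
print("structured:", structured_score, "noise only:", noise_score)
```

Values close to 1 mean tight, well-separated clusters, and values near 0 mean the clusters overlap. Computing the score for the RMA and the MAS5 matrices at the same number of clusters would give a direct comparison.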