I asked a question previously, but I think the answers did not match my goal, and I am completely confused!
I have data from several microarray studies that used two kinds of Affymetrix platforms for mouse. The studies have nearly the same biological background, and only one variable changes among the samples. The same holds within each study: each has its own dataset in which only one variable is altered. For example, in one study abc was added to the culture media, and then abcd. In the other microarray dataset, they added abc, abcde and abcdef. After running the limma package on each microarray dataset separately, I extracted a small set of differentially expressed genes that I am interested in. I then compared them using hierarchical clustering (Euclidean distance on log-transformed expression values). Unexpectedly, the samples from one study clustered close together and the samples from the other study fell into another cluster. Before clustering I had assumed that the abc samples from the two different studies would fall into one cluster, but my hypothesis was wrong.
- So could this be because of the different Affymetrix platforms and a batch effect?
- Would ComBat or sva be a good package for compensating for the batch effect?
- Or could the clustering method be wrong? How can I normalize gene expression data from different microarray studies so that they are comparable with each other?
You received an answer to your previous question: A: Problem in hierarchical clustering
Nowhere do I see you mentioning the exact array versions that you are using. Is the same array version used in each study?
Also, why did you choose hierarchical clustering? Did you simply merge the datasets and then cluster everything together?
What is your ultimate goal?
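To illustrate why merging and clustering directly can group samples by study, here is a minimal toy sketch in plain Python. All numbers are made up (hypothetical log2 values for three genes), and the helper names `nearest` and `center_on_reference` are invented for this illustration. It assumes a constant per-gene offset between the two studies and uses the shared abc condition as a within-study reference. This is not ComBat or sva, only a simple stand-in to show the idea; a real analysis would use the Bioconductor tools you mention.

```python
import math

# Hypothetical log2 expression values for 3 genes (NOT real data).
# study2 carries an assumed per-gene offset relative to study1,
# standing in for a platform/batch effect.
samples = {
    ("study1", "abc"):   [8.0, 6.0, 7.0],
    ("study1", "abcd"):  [9.0, 6.0, 5.0],
    ("study2", "abc"):   [11.0, 4.0, 9.5],
    ("study2", "abcde"): [12.5, 4.5, 7.5],
}


def nearest(key, data):
    """Return the key of the sample closest to `key` by Euclidean distance."""
    others = [k for k in data if k != key]
    return min(others, key=lambda k: math.dist(data[key], data[k]))


def center_on_reference(data, reference="abc"):
    """Express each sample relative to its own study's reference sample.

    Assumes every study contains exactly one sample for `reference`.
    """
    centered = {}
    for (study, treatment), values in data.items():
        ref = data[(study, reference)]
        centered[(study, treatment)] = [v - r for v, r in zip(values, ref)]
    return centered


# On the raw merged values, the closest neighbour of study1/abc
# is the other sample from the same study, not study2/abc.
print("raw:     ", nearest(("study1", "abc"), samples))

# After subtracting each study's own abc profile, the study offset
# cancels and the two abc samples become each other's nearest neighbour.
centered = center_on_reference(samples)
print("centered:", nearest(("study1", "abc"), centered))
```

In this toy setup the study offset dominates the Euclidean distance until it is removed. With real data the offset is neither constant nor known, and with different treatments in each study the batch is partly confounded with the biology, which is why the exact array versions and your ultimate goal matter.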