Hi, I am trying to combine several microarray datasets downloaded from GEO. They were all generated on the same platform (GPL96) and normalized with the same algorithm (RMA). I thought these similarities would make them statistically comparable, but it seems I was wrong.
A simple hierarchical clustering based on Euclidean distance shows that the samples from each dataset cluster together!
I have read about methods such as DWD (Distance Weighted Discrimination) for combining datasets, but I still have a hard time using it, mainly because it doesn't have an R implementation.
Any suggestions here?
Thanks in advance
--Saman
I think you should not use Euclidean distance in this case. Pearson-based distances would be a better choice.
I am not quite sure what you mean. Not using Euclidean distance for what?
For the clustering. You are using Euclidean distance for the clustering, but there are other ways to measure the distance between two expression profiles. See the Wikipedia article "Euclidean distance" for more details.
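To illustrate the difference, here is a minimal Python sketch (NumPy only) on hypothetical simulated data. It assumes the batch effect is a sample-wide linear distortion (a constant shift plus a scale change applied to every gene). Under that assumption, Euclidean distance treats the shifted copy of a sample as far away, while the correlation distance 1 - r treats it as identical. Real batch effects are often gene-specific, so switching metrics will not necessarily remove them; this only shows what each metric is sensitive to.

```python
import numpy as np


def euclidean_distance(x, y):
    """Euclidean distance between two expression profiles."""
    return float(np.sqrt(np.sum((x - y) ** 2)))


def pearson_distance(x, y):
    """Correlation distance 1 - r, where r is Pearson's correlation."""
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - r)


rng = np.random.default_rng(0)
n_genes = 1000

# Hypothetical simulated log2 expression profile of one sample.
sample_a = rng.normal(loc=8.0, scale=2.0, size=n_genes)

# The same sample as seen in a second dataset, distorted by an assumed
# sample-wide batch effect: a constant shift plus a scale change.
sample_b = 1.1 * sample_a + 1.5

# A different (noisier) sample from the first dataset, with no batch shift.
sample_c = sample_a + rng.normal(scale=1.0, size=n_genes)

print("Euclidean a-b:", euclidean_distance(sample_a, sample_b))
print("Euclidean a-c:", euclidean_distance(sample_a, sample_c))
print("Pearson   a-b:", pearson_distance(sample_a, sample_b))
print("Pearson   a-c:", pearson_distance(sample_a, sample_c))
```

With Euclidean distance, sample_a looks closer to sample_c than to its own batch-shifted copy; with the correlation distance the ordering flips. In SciPy, `scipy.spatial.distance.pdist(X, metric="correlation")` computes the same 1 - r distance between the rows of X and can be passed to hierarchical clustering.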
Thanks to both of you; I had already forgotten about my post! So you mean that if I use Pearson correlation as the distance, I wouldn't see that effect? I can check that and will let you know whether it makes a difference or not.
Can I take some CEL files for disease1 from experiment1 and some CEL files for the same disease1 from experiment2 (and so on for other experiments), and then normalize the raw data together? Is that a good idea?
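Normalizing pooled raw data together means the across-array step of RMA, quantile normalization, sees all arrays at once. The sketch below shows only that step, in Python with NumPy, on hypothetical simulated log2 intensities; it is not full RMA, which also performs background correction and median-polish summarization of probe sets (in R this is usually done on CEL files with `rma()` from the Bioconductor affy package). The function name `quantile_normalize` and the simulated "experiments" are illustrative assumptions.

```python
import numpy as np


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Every column is mapped onto the same reference distribution: the mean
    of the sorted columns. Ties are not treated specially (some
    implementations average the reference values of tied ranks).
    """
    matrix = np.asarray(matrix, dtype=float)
    # Rank of each value within its column (0 = smallest).
    ranks = np.argsort(np.argsort(matrix, axis=0), axis=0)
    # Reference distribution: mean across samples of each sorted position.
    reference = np.mean(np.sort(matrix, axis=0), axis=1)
    # Replace each value with the reference value at the same rank.
    return reference[ranks]


rng = np.random.default_rng(1)
# Hypothetical log2 intensities: 500 probes, 3 samples per experiment.
experiment1 = rng.normal(loc=7.0, scale=1.5, size=(500, 3))
# Assumed lab/scanner difference: shifted and more spread-out intensities.
experiment2 = rng.normal(loc=8.0, scale=2.0, size=(500, 3))

# Pool the arrays from both experiments and normalize them jointly.
pooled = np.hstack([experiment1, experiment2])
normalized = quantile_normalize(pooled)

print("column means before:", pooled.mean(axis=0).round(2))
print("column means after: ", normalized.mean(axis=0).round(2))
```

After joint normalization every array has exactly the same intensity distribution. This aligns overall intensity differences between experiments, but it does not remove gene-specific batch effects, and it cannot separate disease from experiment if the two are confounded in the pooled design.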
Hi Saman
I am keen to combine multiple GEO datasets (all run on the Affymetrix U133 Plus 2.0 array) and came across your thread. I was wondering what approach you ended up using to combine your datasets. I would appreciate any help.
Thanks