I am trying to replicate some computational experiments I found in a paper. In this paper, the authors have ~30 GeneChip Human Genome U133 Plus 2.0 arrays, one for each experimental sample, with no references. They process the .CEL files into log2 RMA normalized signal intensity files. They then create a dendrogram that demonstrates that there are 2 main phenotypes within these 30 samples, based on gene expression.
I am trying to replicate their work, but I'm not sure how they went from the log2 RMA normalized signal intensity files to a clustered dendrogram. Their explanation is, "Hierarchical clustering was performed using Euclidean distance and a complete linkage metric."
I've reached out to the authors, but this paper is nearly 5 years old, so I may not get a response. Does anybody know how this can be done?
Will try, thanks!
So I tried using these commands with my matrix. The full matrix tracks 56,000 genes, and R crashes, stating,
Error: cannot allocate vector of size 544.4 Gb.
I tried just using a subset of 100 genes, and the command executed, so I have a hie_clust object. However, when I plot this, I get a dendrogram that clusters the individual genes rather than the samples. How can I fix this? Also, is there a way to get a text list of the clustering rather than a plot? Thanks for your help, I'm not very good with R!
dist() computes the distances between the rows of your matrix, so you can just transpose your_matrix using t(your_matrix).
hie_clust is an object holding the clustering information. If you type hie_clust$ you can access the ordering, the heights, etc. You can perform different operations on the hclust object, like cutting it into k clusters. Example:
cutree(hie_clust,k = 10)
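Putting the pieces above together, here is a minimal sketch of clustering the samples rather than the genes, with text output instead of a plot. The matrix expr is simulated stand-in data (genes in rows, samples in columns), not the data from the paper. The names expr, sample_dist and groups are made up for illustration, and k = 2 is only an assumption based on the two phenotypes mentioned in the question.

```r
set.seed(1)
# Simulated stand-in for a log2 RMA matrix: 100 genes (rows) x 6 samples (columns)
expr <- matrix(rnorm(600), nrow = 100, ncol = 6,
               dimnames = list(paste0("gene", 1:100), paste0("sample", 1:6)))

# dist() works on rows, so transpose to put the samples in rows
sample_dist <- dist(t(expr), method = "euclidean")
hie_clust <- hclust(sample_dist, method = "complete")

# Text output instead of a plot
hie_clust$labels[hie_clust$order]   # sample names in dendrogram (left-to-right) order
groups <- cutree(hie_clust, k = 2)  # named vector: sample name -> cluster number
print(groups)
split(names(groups), groups)        # sample names listed per cluster
```

With samples in rows, dist() only has to compute one distance per pair of samples, so for about 30 arrays the distance object stays small regardless of how many genes the matrix tracks.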
Thanks for the reply, Gian. I've tried transposing my matrix, but for some reason the terminal dendrogram branches still do not represent samples (there are far more of them than input samples).
Solved my problem: I was accidentally calling as.matrix on a matrix. It works great now, thanks!
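For anyone hitting the same symptom (more terminal branches than samples), a quick check along these lines might catch it before plotting. This is only a sketch on simulated data; leaves_match_samples is a hypothetical helper written for this example, not a base R function, and it assumes the expression matrix has genes in rows and samples in columns.

```r
# Hypothetical helper (not part of base R):
# TRUE if the dendrogram has exactly one leaf per sample (column of mat)
leaves_match_samples <- function(mat, hc) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  length(hc$order) == ncol(mat)
}

set.seed(1)
# Simulated stand-in data: 10 genes (rows) x 4 samples (columns)
expr <- matrix(rnorm(40), nrow = 10, ncol = 4,
               dimnames = list(paste0("gene", 1:10), paste0("sample", 1:4)))

by_sample <- hclust(dist(t(expr)))  # samples as rows: 4 leaves
by_gene   <- hclust(dist(expr))     # genes as rows: 10 leaves

leaves_match_samples(expr, by_sample)  # TRUE
leaves_match_samples(expr, by_gene)    # FALSE
```

Checking dim(your_matrix) before calling dist() serves the same purpose: the number of rows passed to dist() is the number of leaves in the dendrogram.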