5.7 years ago
BlastedBadger
I created a gene distance matrix based on GO term similarity (using the R package GOSemSim).
I am then basically doing the following, where gene_dist is my distance matrix:
plot(hclust(gene_dist, method="average"))
I don't understand why the tree displayed by plot.hclust is not ultrametric, even though "average" agglomeration is UPGMA. The similarities from GOSemSim::mgeneSim (measure "Wang", combine "Best Matching Average") probably do not produce Euclidean distances, but that shouldn't matter.
Also, if I use the library ape for plotting:
library(ape)
plot(as.phylo(hclust(gene_dist, method="average")))
It shows as ultrametric. Why? Is it just a display choice?
Here is some reproducible example code:
library(GOSemSim)
examplelist <- c("ZNF575", "GALNT11", "GJC3", "POLRMT", "PKDCC", "COL18A1", "INS-IGF2", "IQSEC1", "CFC1", "OPA3")
hsGOex <- godata('org.Hs.eg.db', ont='BP', computeIC=FALSE, keytype='SYMBOL')
gene_sim_ex <- mgeneSim(genes=examplelist, semData=hsGOex, measure='Wang', combine='BMA')
isSymmetric(gene_sim_ex)
## [1] TRUE
gene_dist_ex <- as.dist(1 - gene_sim_ex)
labels(gene_dist_ex)
## [1] "GALNT11" "POLRMT" "PKDCC" "COL18A1" "IQSEC1" "CFC1" "OPA3"
plot(hclust(gene_dist_ex, method="average"))
library(ape)
dev.new()
plot(as.phylo(hclust(gene_dist_ex, method="average")))
axisPhylo()
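One way to test whether this is only a display choice is to check the tree itself rather than the plot. The sketch below is a rough check, not a definitive answer. It uses a made-up random dissimilarity matrix (toy_dist, standing in for gene_dist_ex so that it runs without GOSemSim) and a hypothetical helper is_ultrametric() that tests the ultrametric inequality on the cophenetic distances. It then redraws the dendrogram with hang = -1, which according to the plot.hclust documentation makes the labels hang down from height 0 instead of a fixed fraction below each merge (the default is hang = 0.1).

```r
# Toy stand-in for gene_dist_ex (assumption: any symmetric dissimilarity matrix will do)
set.seed(1)
m <- matrix(runif(49), nrow = 7)
m <- (m + t(m)) / 2          # make it symmetric
diag(m) <- 0                 # zero self-distance
rownames(m) <- colnames(m) <- paste0("g", 1:7)
toy_dist <- as.dist(m)

hc <- hclust(toy_dist, method = "average")

# Hypothetical helper: TRUE if d[i, j] <= max(d[i, k], d[k, j]) for all i, j, k
is_ultrametric <- function(d, tol = 1e-8) {
  n <- nrow(d)
  for (i in seq_len(n)) for (j in seq_len(n)) for (k in seq_len(n)) {
    if (d[i, j] > max(d[i, k], d[k, j]) + tol) return(FALSE)
  }
  TRUE
}

# Cophenetic distances are the distances implied by the tree
coph <- as.matrix(cophenetic(hc))
ultra <- is_ultrametric(coph)
print(ultra)

# Same tree, two drawings
plot(hc)              # default hang = 0.1: leaves stop a fixed fraction below their merge
plot(hc, hang = -1)   # leaves all drawn down to height 0
```

If ultra is TRUE and the second plot aligns all the leaves, then the tree itself is ultrametric and the ragged tips in the first plot come only from the default hang setting.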