Hi,
I am using the R package ConsensusClusterPlus. Here is an example with the ALL data:
library(ConsensusClusterPlus)
library(ALL)
data(ALL)
d = exprs(ALL)
res <- ConsensusClusterPlus(d,
                            clusterAlg = "pam",
                            finalLinkage = "average",
                            distance = "spearman",
                            plot = NULL,
                            reps = 1000,
                            maxK = 10,
                            pItem = 0.8,
                            pFeature = 1,
                            seed = 100)
So if I want the cluster membership of each sample at k = 5, I can get it with:
cluster5 <- res[[5]]
> head(cluster5$consensusClass, n = 10)
01005 01010 03002 04006 04007 04008 04010 04016 06002 08001
    1     2     1     2     1     1     2     1     1     3
My question is: how do I extract the features (genes, in this case) that contribute most to each cluster?
Since you are clustering patients/samples by their expression values, my best guess would be to split the patients by cluster membership. For example, for cluster 1, take the matrix of patients assigned to cluster 1 and compare each gene's expression against the patients in the other clusters. You can use something like a Wilcoxon rank-sum test (wilcox.test in R) for each gene, then sort the results by fold change or p-value.
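As a rough illustration of that idea, here is a minimal sketch in base R. It runs on simulated data so that it is self-contained; the helper name cluster_markers and the objects d and cl are made up for this example and are not part of ConsensusClusterPlus. It also adds a Benjamini-Hochberg adjustment, which is my own suggestion given that thousands of genes are tested at once.

```r
# Simulated stand-ins (assumptions, not the ALL data): 'd' is a genes x samples
# matrix and 'cl' is a named vector of cluster labels, shaped like
# res[[5]]$consensusClass.
set.seed(1)
d <- matrix(rnorm(200 * 30), nrow = 200,
            dimnames = list(paste0("gene", 1:200), paste0("s", 1:30)))
cl <- setNames(rep(1:3, each = 10), colnames(d))
d[1:5, cl == 1] <- d[1:5, cl == 1] + 3   # make 5 genes high in cluster 1

# Hypothetical helper: one-vs-rest Wilcoxon rank-sum test for every gene.
cluster_markers <- function(d, cl, k) {
  in_k <- cl[colnames(d)] == k           # match labels to columns by sample name
  p <- apply(d, 1, function(x) wilcox.test(x[in_k], x[!in_k])$p.value)
  # Difference of group means; on log2-scale data this is a log2 fold change.
  mean_diff <- rowMeans(d[, in_k, drop = FALSE]) -
    rowMeans(d[, !in_k, drop = FALSE])
  out <- data.frame(gene = rownames(d),
                    mean_diff = mean_diff,
                    p_value = p,
                    p_adj = p.adjust(p, method = "BH"),
                    stringsAsFactors = FALSE)
  # Smallest p-values first, ties broken by the larger absolute difference.
  out[order(out$p_value, -abs(out$mean_diff)), ]
}

markers1 <- cluster_markers(d, cl, k = 1)
head(markers1, n = 10)
```

With the real data you would presumably pass d <- exprs(ALL) and cl <- res[[5]]$consensusClass instead of the simulated objects, and call the helper once per cluster (for example with lapply over 1:5). Note that genes found this way are tested on the same data that defined the clusters, so the p-values are better read as a ranking than as formal significance.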
Hi, this is not an answer, but I would like to know whether you have found a way to extract the most contributing features for each cluster. I am also stuck at this point.