fviz_nbclust (kmeans) with method "gap_stat" error: did not converge in 10 iterations
6.5 years ago
lessismore ★ 1.4k

Dear all,

I'm trying to find the optimal number of clusters to fit to a gene expression dataset.

For this, I'm using the packages FactoMineR and factoextra and the function fviz_nbclust on my scaled data frame (a simple data frame with genes in rows and samples in columns).

scale() z-scores by column, so I transpose first and then scale. Then I transpose back and calculate the optimal number of clusters.

The problem is that I get the warning message "did not converge in 10 iterations".
My question is: do you know a way to change the number of iterations?

This is the code I'm using:

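df <- scale(t(mydata))   # scale() works on columns, so put genes in columns first
df <- t(df)              # transpose back: genes in rows, samples in columns
fviz_nbclust(df, kmeans, method = "gap_stat")
fit <- kmeans(df, k)     # k: see the note below
mydata2 <- data.frame(df, fit$cluster)

k: this value is dictated by the predicted number of clusters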
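
One way to turn the gap statistic into a concrete number of clusters, sketched under assumptions: this calls cluster::clusGap() directly (rather than going through fviz_nbclust) and applies cluster::maxSE() with the "firstSEmax" rule, which is only one of several available selection rules. The toy matrix x and the names gap, k_hat, fit, and x2 are hypothetical stand-ins, not part of the original code.

```r
library(cluster)   # clusGap() and maxSE()

set.seed(1)
# Toy stand-in for df: two well-separated groups of 50 rows each
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

# Gap statistic for k = 1..5 (small B to keep the example quick)
gap <- clusGap(x, FUNcluster = kmeans, K.max = 5, B = 20, verbose = FALSE)

# Pick k from the gap values and their simulation standard errors
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")

# Fit with the chosen k and attach the labels, as in the code above
fit <- kmeans(x, centers = k_hat)
x2 <- data.frame(x, cluster = fit$cluster)
```

The selection rule matters: different maxSE() methods can return different k for the same gap curve, so it is worth checking the plot as well.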

Thanks in advance

Clustering kmeans • 12k views
6.5 years ago

Good morning, my friend,

Yes, you can create a custom kmeans function and then supply that to fviz_nbclust(). In the custom function, you increase the iter.max parameter from the default of 10 to something higher, like (here) 50:

MyKmeansFUN <- function(x,k) list(cluster=kmeans(x, k, iter.max=50))

fviz_nbclust(df, FUNcluster=MyKmeansFUN, method="gap_stat")

Note that I also have parallel-processing-enabled gap statistic functions on my GitHub page:

Please try that.

Kevin


Dear Kevin, thanks a lot for your help. Do you have any idea about what this means?

> fviz_nbclust(df, FUNcluster=MyKmeansFUN, method="gap_stat")
    Clustering k = 1,2,..., K.max (= 10): .. done
    Bootstrapping, b = 1,2,..., B (= 100)  [one "." per sample]:
    .................................................. 50 
    .................................................. 100 
    There were 50 or more warnings (use warnings() to see the first 50)

> warnings()
Warning messages:
1: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
2: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
3: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
4: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
.....
50: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length

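This is a warning rather than an error. It occurs when you perform an element-wise operation on two vectors and the length of the longer one is not a multiple of the length of the shorter one, so the shorter vector cannot be recycled evenly. For example: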

c(1, 2, 3, 4) * c(10, 10, 10)
[1] 10 20 30 40
Warning message:
In c(1, 2, 3, 4) * c(10, 10, 10) :
  longer object length is not a multiple of shorter object length

c(1, 2, 3, 4) + c(10, 10, 10)
[1] 11 12 13 14
Warning message:
In c(1, 2, 3, 4) + c(10, 10, 10) :
  longer object length is not a multiple of shorter object length
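
A possible cause in this thread, offered as a hedged sketch rather than a confirmed diagnosis: the wrapper posted earlier stores the entire kmeans object under `cluster`, not just the label vector. The `ans * length(l) + if1` expression in the warning comes from base R's interaction(), which split() calls when it is given a list as the grouping argument. The toy data and the names wrapped, labels_only, bad, and good below are hypothetical; only base R is used.

```r
set.seed(42)
x <- matrix(rnorm(200), ncol = 2)   # toy data: 100 rows, 2 columns

# Wrapper as posted in the thread: `cluster` holds the whole kmeans object
wrapped <- function(x, k) list(cluster = kmeans(x, k, iter.max = 50))

# Variant that keeps only the integer label vector
labels_only <- function(x, k) list(cluster = kmeans(x, k, iter.max = 50)$cluster)

bad  <- wrapped(x, 3)$cluster       # a list of class "kmeans"
good <- labels_only(x, 3)$cluster   # one integer label per row of x

idx <- seq_len(nrow(x))

# Splitting by a list combines its unequal-length components via interaction(),
# which recycles them and emits the "not a multiple" warning
groups_bad <- split(idx, bad)

# Splitting by the plain label vector gives one group per cluster, no warning
groups_good <- split(idx, good)
```

If this is indeed what happens inside the gap statistic computation, the groups used to measure within-cluster dispersion are not the real clusters, so the resulting gap curve should not be trusted even though the call finishes.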

What are the dimensions of your input data? Are you using k.max=10 and B=100?

> dim(df)
[1] 2068   25

I'm using the default parameters:

fviz_nbclust(df, FUNcluster = MyKmeansFUN, method = "gap_stat", diss = NULL, k.max = 10, nboot = 100,
  verbose = interactive(), barfill = "steelblue", barcolor = "steelblue",
  linecolor = "steelblue", print.summary = TRUE, ...)

Hi, Kevin!

I've got exactly the same question (but for the clusGap function). I tried your solution with a custom kmeans function and it produces the same warnings. My input is a large matrix (20k × 300), and even 15 clusters is a lot, so it is unlikely to converge within 10 iterations. Could you please help me solve the problem? The code is below. Thanks a lot!

MyKmeans <- function(x,k) list(cluster=kmeans(x, k, iter.max=300,nstart = 25))

gap_stat <- clusGap(w2v300_emb, FUN = MyKmeans, K.max = 15, B = 50)
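
A minimal sketch of a wrapper that should avoid these warnings, assuming they arise because `cluster` is set to the whole kmeans object: clusGap() reads `$cluster` from whatever FUNcluster returns, and a kmeans object already carries that component as a label vector. The name MyKmeans2 and the toy matrix x are hypothetical, and K.max and B are kept small only so that the example runs quickly.

```r
library(cluster)   # provides clusGap()

set.seed(1)
# Toy stand-in for the real matrix: two groups of 50 rows each
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

# Return the kmeans object itself; its `cluster` component is the label vector
MyKmeans2 <- function(x, k, ...) {
  kmeans(x, centers = k, iter.max = 300, nstart = 25, ...)
}

gap_stat <- clusGap(x, FUNcluster = MyKmeans2, K.max = 5, B = 20, verbose = FALSE)

# Alternative without a wrapper: clusGap() forwards extra arguments to FUNcluster
gap_stat2 <- clusGap(x, FUNcluster = kmeans, K.max = 5, B = 20, verbose = FALSE,
                     iter.max = 300, nstart = 25)
```

On a 20k × 300 matrix, nstart = 25 combined with bootstrapping multiplies the run time considerably, so it may be worth timing a single kmeans() call before launching the full computation.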
