Question

fviz_nbclust (kmeans) with method "gap_stat" error: did not converge in 10 iterations

6.5 years ago

lessismore ★ 1.4k

Dear all,

I'm trying to find the optimal number of clusters for a gene expression dataset.

For this, I'm using the packages FactoMineR and factoextra, and the function fviz_nbclust on my scaled data frame (a simple data frame with genes in rows and samples in columns).

scale() z-scores by column, so I transpose first and then scale. Then I transpose back and calculate the optimal number of clusters.

The problem is that I get the warning message "did not converge in 10 iterations".
Do you know a way to change the number of iterations?

This is the code I'm using:

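library(factoextra)

df <- scale(t(mydata))  # scale() works column-wise, so transpose to z-score each gene
df <- t(df)             # back to genes in rows, samples in columns
fviz_nbclust(df, kmeans, method = "gap_stat")
fit <- kmeans(df, ?)
mydata2 <- data.frame(df, fit$cluster)

?: this value is given by the predicted number of clusters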
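
A minimal sketch of how the `?` could be filled in programmatically rather than read off the plot. It calls cluster::clusGap() directly (as far as I know, the gap statistic in fviz_nbclust is computed with it) and picks k with cluster::maxSE(). The data and the names `toy`, `make_group`, `df_toy` and `best_k` are hypothetical stand-ins, not part of the original question, and the "firstSEmax" rule is just one of the selection rules that maxSE() offers.

```r
library(cluster)  # clusGap() and maxSE()

set.seed(1)

# Hypothetical stand-in for `mydata`: 60 genes (rows) x 8 samples (columns),
# built from three expression patterns plus noise
make_group <- function(pattern, n) {
  t(replicate(n, pattern + rnorm(length(pattern), sd = 0.5)))
}
toy <- rbind(
  make_group(c(rep(2, 4), rep(-2, 4)), 20),
  make_group(c(rep(-2, 4), rep(2, 4)), 20),
  make_group(rep(c(2, -2), 4), 20)
)

# z-score each gene (row), as in the question
df_toy <- t(scale(t(toy)))

# Extra arguments (iter.max, nstart) are forwarded by clusGap() to kmeans()
gap <- clusGap(df_toy, FUNcluster = kmeans, K.max = 6, B = 20,
               iter.max = 50, nstart = 10, verbose = FALSE)

# Choose k from the gap values and their simulation standard errors
best_k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")

# Final clustering with the chosen k, attached to the scaled data
fit <- kmeans(df_toy, centers = best_k, iter.max = 50, nstart = 10)
toy2 <- data.frame(df_toy, cluster = fit$cluster)
```

Whether "firstSEmax" or another rule is appropriate depends on the data, so it is worth comparing the chosen k against the gap statistic plot.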

Thanks in advance

Clustering kmeans • 12k views

ADD COMMENT • link updated 3.2 years ago by hse.kate112 • 0 • written 6.5 years ago by lessismore ★ 1.4k

score 2 · Answer 1 · 2018-06-14

6.5 years ago

Kevin Blighe 88k

Good morning, my friend,

Yes, you can create a custom kmeans function and then supply that to fviz_nbclust(). In the custom function, you increase the iter.max parameter from the default of 10 to something higher, like (here) 50:

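# Return the assignment vector ($cluster), which is what the gap statistic code expects in `cluster`
MyKmeansFUN <- function(x, k) list(cluster = kmeans(x, k, iter.max = 50)$cluster)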

fviz_nbclust(df, FUNcluster=MyKmeansFUN, method="gap_stat")
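
As a hedged illustration of what such a custom function needs to return, here is a small sketch run against cluster::clusGap() on synthetic data. The name `kmeans_iter` and the toy matrix `x` are made up for this example. The only requirement I am relying on is that the function takes the data and k and returns a list whose `cluster` element is the integer assignment vector.

```r
library(cluster)

set.seed(42)

# Synthetic data: two well-separated groups of 50 points in 2 dimensions
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

# Custom clustering function with a higher iteration cap.
# `cluster` must hold the assignment vector, not the whole kmeans object.
kmeans_iter <- function(x, k, iter.max = 50, ...) {
  km <- kmeans(x, centers = k, iter.max = iter.max, ...)
  list(cluster = km$cluster)
}

gap <- clusGap(x, FUNcluster = kmeans_iter, K.max = 5, B = 20, verbose = FALSE)
print(gap, method = "firstSEmax")
```

A function of this shape should be usable as the FUNcluster argument of fviz_nbclust() as well, though I have only checked it against clusGap() here.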

Note that I also have gap statistic functions with parallel processing enabled on my GitHub page.

Please try that.

Kevin

ADD COMMENT • link 6.5 years ago by Kevin Blighe 88k

Dear Kevin, thanks a lot for your help. Do you have any idea about what this means?

> fviz_nbclust(df, FUNcluster=MyKmeansFUN, method="gap_stat")
    Clustering k = 1,2,..., K.max (= 10): .. done
    Bootstrapping, b = 1,2,..., B (= 100)  [one "." per sample]:
    .................................................. 50 
    .................................................. 100 
    There were 50 or more warnings (use warnings() to see the first 50)

> warnings()
Warning messages:
1: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
2: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
3: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
4: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
.....
50: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length

ADD REPLY • link 6.5 years ago by lessismore ★ 1.4k

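This is a warning rather than an error. R raises it when you do element-wise arithmetic on two vectors of different lengths and the longer length is not a multiple of the shorter one, so the shorter vector is only partially recycled. For example: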

c(1, 2, 3, 4) * c(10, 10, 10)
[1] 10 20 30 40
Warning message:
In c(1, 2, 3, 4) * c(10, 10, 10) :
  longer object length is not a multiple of shorter object length

c(1, 2, 3, 4) + c(10, 10, 10)
[1] 11 12 13 14
Warning message:
In c(1, 2, 3, 4) + c(10, 10, 10) :
  longer object length is not a multiple of shorter object length
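
In the gap statistic run, the expression named in the warnings (`ans * length(l) + if1`) looks like it comes from base R's interaction(), which split() calls when it is given a list of grouping factors instead of a single vector. One plausible trigger, sketched below on made-up data, is a clustering function that hands back a whole kmeans object where the integer assignment vector is expected. The names `x`, `km`, `ii`, `groups_ok` and `recycling_warning` are illustrative only.

```r
set.seed(1)

# 21 observations in 2 dimensions; 21 is deliberately not a multiple of the
# number of entries in the 2 x 2 centers matrix
x <- matrix(rnorm(42), ncol = 2)
km <- kmeans(x, centers = 2)
ii <- seq_len(nrow(x))

# Grouping by the assignment vector: one group per cluster, no warning
groups_ok <- split(ii, km$cluster)
lengths(groups_ok)

# Grouping by the whole kmeans object: split() treats the list elements
# (cluster, centers, totss, ...) as grouping factors of different lengths
# and combines them with interaction(), which recycles the shorter ones
recycling_warning <- tryCatch(
  {
    split(ii, km)
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
recycling_warning
```

If this is what happens in your run, returning `kmeans(...)$cluster` from the custom function should make the warnings disappear.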

What are the dimensions of your input data? Are you using k.max=10 and B=100?

ADD REPLY • link 6.5 years ago by Kevin Blighe 88k

> dim(df)
[1] 2068   25

I'm using the default parameters:

fviz_nbclust(df, FUNcluster = MyKmeansFUN, method = "gap_stat", diss = NULL, k.max = 10, nboot = 100,
  verbose = interactive(), barfill = "steelblue", barcolor = "steelblue",
  linecolor = "steelblue", print.summary = TRUE, ...)

ADD REPLY • link 6.5 years ago by lessismore ★ 1.4k

Hi, Kevin!

I've got exactly the same question (but for the clusGap function). I tried your solution with a custom kmeans function and it produces the same warnings. My input is a large matrix (20k x 300), and even 15 clusters is a lot, so kmeans is unlikely to converge within 10 iterations. Could you please help me solve the problem? The code is below. Thanks a lot!

MyKmeans <- function(x,k) list(cluster=kmeans(x, k, iter.max=300,nstart = 25))

gap_stat <- clusGap(w2v300_emb, FUN = MyKmeans, K.max = 15, B = 50)
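
The wrapper above stores the whole kmeans object under `cluster`, which is a likely source of the recycling warnings, since clusGap() reads `$cluster` as the assignment vector. Below is a hedged sketch of a variant that returns the vector and also collects any warnings raised during the run, so you can see whether "did not converge" messages remain after raising iter.max. `emb` is a small synthetic stand-in for `w2v300_emb`, and `my_kmeans` and `collected` are names made up for this example.

```r
library(cluster)

set.seed(7)

# Small synthetic stand-in for the embedding matrix (300 rows x 4 columns)
emb <- rbind(matrix(rnorm(150 * 4, mean = 0), ncol = 4),
             matrix(rnorm(150 * 4, mean = 3), ncol = 4))

# Return only the assignment vector in `cluster`
my_kmeans <- function(x, k) {
  list(cluster = kmeans(x, centers = k, iter.max = 300, nstart = 25)$cluster)
}

# Collect warning messages instead of letting them scroll past
collected <- character(0)
gap_stat <- withCallingHandlers(
  clusGap(emb, FUNcluster = my_kmeans, K.max = 5, B = 10, verbose = FALSE),
  warning = function(w) {
    collected <<- c(collected, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Tabulate whatever warnings were raised (empty if there were none)
table(collected)
```

On a 20k x 300 matrix with K.max = 15 the run will be far slower than this toy case, so it may be worth trying a small B first to confirm the warnings are gone before scaling up.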

ADD REPLY • link 3.2 years ago by hse.kate112 • 0