38

I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20).

I get the following error: Quick-TRANSfer stage steps exceeded maximum (= 31834400). Although the code can be viewed at http://svn.r-project.org/R/trunk/src/library/stats/R/kmeans.R, I am unsure what is going wrong. I assume the problem is related to the size of my dataset, but I would be grateful if someone could clarify what I can do to mitigate it.
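A quick arithmetic check on the number in the message: it is exactly 50 times the row count. This is consistent with a Quick-Transfer step cap of 50 steps per row, which is an assumption based on one reading of kmeans.R and may differ between R versions.

```r
# Row count taken from the question.
n_rows <- 636688

# Assumed cap: 50 Quick-Transfer steps per row (not confirmed for every R version).
step_limit <- 50 * n_rows

# The value reported in the message.
reported_limit <- 31834400

stopifnot(step_limit == reported_limit)
```

If that reading is right, the limit scales with the data size, so the dataset being large does not by itself explain why the limit is hit.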

tonytonov
Anna Dunietz
  • I think it's more likely to do with the number of centers. Really? 100 clusters? Did you try a different algorithm, as in: `kmeans(dataset, algorithm="Lloyd", ...)`? That error message seems specific to the default algorithm, `Hartigan-Wong`. – jlhoward Jan 27 '14 at 14:04
  • @jlhoward - thanks! I then did try Lloyd and got no errors, although I really would prefer using Hartigan-Wong. – Anna Dunietz Jan 27 '14 at 18:23
  • Note, the actual error flag is from here: http://svn.r-project.org/R/trunk/src/library/stats/src/kmns.f (search `IFAULT = 4`). Still doesn't really explain what it means. – naught101 Sep 15 '14 at 23:50

4 Answers

29

I just had the same issue.

See the documentation of kmeans in R via ?kmeans:

The Hartigan-Wong algorithm generally does a better job than either of those, but trying several random starts (‘nstart’> 1) is often recommended. In rare cases, when some of the points (rows of ‘x’) are extremely close, the algorithm may not converge in the “Quick-Transfer” stage, signalling a warning (and returning ‘ifault = 4’). Slight rounding of the data may be advisable in that case.

In these cases, you may need to switch to the Lloyd or MacQueen algorithms.
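As a minimal sketch of the "slight rounding" suggestion from the documentation, the data can be rounded before clustering. The synthetic matrix `X`, the choice of 3 digits, and the cluster count are all assumptions for illustration; the right precision depends on the scale of your data.

```r
set.seed(1)

# Synthetic stand-in for the real dataset (assumption, not the asker's data).
X <- matrix(rnorm(1000), ncol = 2)

# Round to 3 decimal places; the number of digits is an arbitrary choice here.
X_rounded <- round(X, digits = 3)

kms <- kmeans(X_rounded, centers = 4, nstart = 5, iter.max = 50)

# 0 means no problem was flagged; 4 would indicate the Quick-Transfer issue.
kms$ifault
```

Rounding changes the data slightly, so it is worth checking that the resulting clusters are still meaningful for your purpose.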

The nasty thing about R here is that it continues with a warning that may go unnoticed. For my benchmark purposes, I consider this to be a failed run, and thus I use:

if (kms$ifault == 4) stop("Failed in Quick-Transfer")

Depending on your use case, you may want to do something like

if (kms$ifault == 4) kms <- kmeans(X, centers = kms$centers, algorithm = "MacQueen")

instead, to continue with a different algorithm.
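Putting the two checks together, here is a hedged sketch of a wrapper that tries Hartigan-Wong first and restarts from its centers with MacQueen if `ifault` is 4. The function name `kmeans_with_fallback` and the synthetic data are hypothetical, introduced only for this example. Note that `suppressWarnings` hides all warnings from the first run, not only the Quick-Transfer one.

```r
# Hypothetical helper: not part of the stats package.
kmeans_with_fallback <- function(X, k, iter.max = 50, nstart = 1) {
  # First attempt with the default algorithm; warnings are silenced because
  # the ifault component is inspected explicitly below.
  kms <- suppressWarnings(
    kmeans(X, centers = k, iter.max = iter.max, nstart = nstart,
           algorithm = "Hartigan-Wong")
  )
  # isTRUE() guards against a missing or NULL ifault component.
  if (isTRUE(kms$ifault == 4L)) {
    # Continue from the centers found so far, using a different algorithm.
    kms <- kmeans(X, centers = kms$centers, iter.max = iter.max,
                  algorithm = "MacQueen")
  }
  kms
}

set.seed(42)
X <- matrix(rnorm(2000), ncol = 2)  # synthetic data for illustration
kms <- kmeans_with_fallback(X, k = 5, nstart = 5)
kms$size
```

For benchmarking you would probably want to replace the fallback branch with `stop()`, since a run that switched algorithms is no longer a pure Hartigan-Wong run.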

If you are benchmarking k-means, note that R uses iter.max=10 by default. It may take many more than 10 iterations to converge.
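To see the default and how many iterations a run actually used, something like the following should work. The synthetic data and the value 100 are assumptions for illustration; the `iter` component is reported by `kmeans` in reasonably recent R versions.

```r
# Default iteration cap, read from the function's own argument list.
default_iter_max <- formals(stats::kmeans)$iter.max
default_iter_max

set.seed(7)
X <- matrix(rnorm(4000), ncol = 2)  # synthetic data for illustration

# Raise the cap well above the default.
kms <- kmeans(X, centers = 10, iter.max = 100)

# Number of outer iterations the run reports having used.
kms$iter
```

If `kms$iter` comes back close to the cap you set, the run may have stopped before converging and the cap is worth raising further.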

Erich Schubert
13

I had the same problem; it seems to be related to available memory.

Running Garbage Collection before the function worked for me:

gc()

Or see: "Increasing (or decreasing) the memory available to R processes"

ABarnhard
2

@jlhoward's comment:

Try

kmeans(dataset, algorithm="Lloyd", ...)
dfrankow
0

I got the same error message, but in my case it helped to increase the maximum number of iterations, iter.max. That argues against the memory-overload explanation.

Jørgen K. Kanters