I see that the WEKA interface requires a minimum and maximum number of clusters to be specified before running the X-means clustering algorithm. What is a good way to determine these numbers? Isn't X-means supposed to take away the burden of choosing the number of clusters?

DaTaBomB

1 Answer

You can use any background knowledge you have about the data to set the minimum and maximum number of clusters. XMeans takes some of the burden off you: instead of a single number of clusters, you only specify bounds on it, and the algorithm picks a number within those bounds. If you have no background knowledge, you could set the bounds to very low and very high values.

For example, if you want to cluster questions on Stack Overflow and you know the tags assigned to each, you could derive bounds from the total number of tags, the number of tags per question, etc.
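As a rough illustration of the tag idea, here is a minimal Python sketch. The function name `cluster_bounds_from_tags`, the sample data, and the heuristic itself are assumptions made for this example, not something WEKA provides or prescribes: the upper bound is the number of distinct tags (one cluster per tag), and the lower bound is the number of distinct tags divided by the mean number of tags per question, rounded up. With no tags at all, it falls back to deliberately wide bounds.

```python
import math


def cluster_bounds_from_tags(question_tags):
    """Suggest (min_clusters, max_clusters) from per-question tag lists.

    Assumed heuristic, for illustration only:
      - upper bound: number of distinct tags (one cluster per tag)
      - lower bound: distinct tags / mean tags per tagged question, rounded up
    """
    distinct_tags = set()
    total_assignments = 0
    tagged_questions = 0
    for tags in question_tags:
        if tags:
            tagged_questions += 1
            total_assignments += len(tags)
            distinct_tags.update(tags)

    if not distinct_tags:
        # No background knowledge: use a very low and a very high value.
        return 2, max(2, len(question_tags))

    mean_tags = total_assignments / tagged_questions  # always >= 1 here
    lower = max(2, math.ceil(len(distinct_tags) / mean_tags))
    upper = max(lower, len(distinct_tags))
    return lower, upper


# Hypothetical sample: four questions with their tags.
questions = [
    ["java", "weka"],
    ["python", "numpy"],
    ["java", "clustering"],
    ["weka", "clustering", "k-means"],
]
print(cluster_bounds_from_tags(questions))  # prints (3, 6)
```

The two numbers would then go into the minimum and maximum fields of the XMeans dialog. Treat them as a starting point and adjust them after looking at the clusters you get.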

The answers to this question may help. In general, you'll have to experiment with different values and see which produce the result you like most.

Lars Kotthoff