Questions tagged [k-means]

In statistics and data mining, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (least squares).

For more detail, see the Wikipedia entry at http://en.wikipedia.org/wiki/K-means_clustering
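
As a rough illustration of the definition above, the following is a minimal plain-Python sketch of the standard iterative procedure (often called Lloyd's algorithm): assign each observation to the nearest mean, then recompute each mean. The function names and the sample points are hypothetical, and using the first k points as initial means is a simplification chosen to keep the sketch deterministic; library implementations use better initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Alternate nearest-mean assignment and mean update until stable."""
    # Simplifying assumption: the first k points serve as the initial means.
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are unchanged, so the means would not move
        labels = new_labels
        # Update step: each mean becomes the average of its assigned observations.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, labels = kmeans(points, k=2)
print(labels)
```

The result depends on the initial means, which is why practical implementations usually run several restarts.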

3112 questions
445
votes
8 answers

Cluster analysis in R: determine the optimal number of clusters

Being a newbie in R, I'm not very sure how to choose the best number of clusters to do a k-means analysis. After plotting a subset of the data below, how many clusters will be appropriate? How can I perform cluster dendro analysis? n = 1000 kk = 10 …
user2153893
  • 4,487
  • 3
  • 11
  • 5
185
votes
8 answers

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?
bmasc
  • 1,980
  • 2
  • 12
  • 9
145
votes
20 answers

How do I determine k when using k-means clustering?

I've been studying k-means clustering, and one thing that's not clear is how you choose the value of k. Is it just a matter of trial and error, or is there more to it?
Jason Baker
  • 171,942
  • 122
  • 354
  • 501
76
votes
2 answers

Will scikit-learn utilize GPU?

Reading implementation of scikit-learn in TensorFlow: http://learningtensorflow.com/lesson6/ and scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html I'm struggling to decide which implementation to…
blue-sky
  • 45,835
  • 124
  • 360
  • 647
54
votes
16 answers

K-means algorithm variation with equal cluster size

I'm looking for the fastest algorithm for grouping points on a map into equally sized groups, by distance. The k-means clustering algorithm looks straightforward and promising, but does not produce equally sized groups. Is there a variation of this…
pixelistik
  • 6,797
  • 2
  • 28
  • 39
48
votes
8 answers

Python k-means algorithm

I am looking for a Python implementation of the k-means algorithm with examples to cluster and cache my database of coordinates.
Eeyore
  • 2,024
  • 5
  • 32
  • 49
45
votes
3 answers

Scikit Learn - K-Means - Elbow - criterion

Today I'm trying to learn something about K-means. I have understood the algorithm and I know how it works. Now I'm looking for the right k... I found the elbow criterion as a method to detect the right k, but I do not understand how to use it with…
Linda
  • 2,195
  • 4
  • 23
  • 32
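
A hedged sketch of the elbow criterion mentioned in the question above, written in plain Python rather than scikit-learn: run k-means for several values of k, record the within-cluster sum of squares (WCSS), and look for the k after which the decrease levels off. The helper `kmeans_wcss`, the sample points, and the first-k-points initialization are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, iterations=100):
    """Run a basic k-means and return the within-cluster sum of squares."""
    # Simplifying assumption: the first k points serve as the initial centers.
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # empty cluster keeps its center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    # WCSS: squared distance from every point to its nearest center.
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical data: three groups, interleaved so the first k points are spread out.
points = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
    (1.2, 0.9), (5.1, 4.8), (9.2, 1.1),
    (0.8, 1.1), (4.9, 5.2), (8.8, 0.9),
]
wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 3))
```

Plotting these values against k and picking the point where the curve bends is the usual way to read the "elbow"; the choice remains a judgment call rather than a strict rule.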
41
votes
3 answers

Simple approach to assigning clusters for new data after k-means clustering

I'm running k-means clustering on a data frame df1, and I'm looking for a simple approach to computing the closest cluster center for each observation in a new data frame df2 (with the same variable names). Think of df1 as the training set and df2…
josliber
  • 41,865
  • 12
  • 88
  • 126
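
The idea in the question above, assigning each new observation to its closest existing cluster center, can be sketched independently of R. This is a minimal plain-Python illustration; `assign_to_nearest`, the centers, and the new points are hypothetical stand-ins for the centers fitted on df1 and the rows of df2.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_to_nearest(centers, new_points):
    """Return, for each new point, the index of the closest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in new_points
    ]


# Hypothetical centers from an earlier k-means fit (the "training" data).
centers = [(0.0, 0.0), (10.0, 10.0)]
# Hypothetical new observations with the same variables.
new_points = [(1.0, -1.0), (9.0, 12.0), (4.0, 4.0)]
labels = assign_to_nearest(centers, new_points)
print(labels)
```

No refitting is involved: the centers stay fixed and only the assignment step of k-means is applied to the new data.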
39
votes
2 answers

Calculating the percentage of variance measure for k-means?

On the Wikipedia page, an elbow method is described for determining the number of clusters in k-means. The built-in method of scipy provides an implementation, but I am not sure I understand how the distortion, as they call it, is calculated. More…
Legend
  • 104,480
  • 109
  • 255
  • 385
39
votes
3 answers

How Could One Implement the K-Means++ Algorithm?

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked, namely the initialization, as the rest is like in the original K-Means algorithm. Is the probability function used based…
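
For reference, a minimal plain-Python sketch of k-means++ seeding as it is commonly described: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to D(x)^2, the squared distance from a point to its nearest already-chosen center. The function name `kmeans_pp_init`, the fixed seed, and the sample points are assumptions for illustration, and the sketch assumes the data contains at least k distinct points.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with k-means++ style D^2 weighting."""
    rng = random.Random(seed)
    # First center: chosen uniformly at random from the data.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # D(x)^2 for every point: squared distance to the nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        # Next center: sampled with probability proportional to D(x)^2.
        # Already-chosen points have weight 0 and so are not drawn again.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical data: three loose groups of distinct points.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9),
]
initial_centers = kmeans_pp_init(points, k=3)
print(initial_centers)
```

After seeding, the usual assign-and-update iterations proceed unchanged from these centers.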
38
votes
7 answers

Kmeans without knowing the number of clusters?

I am attempting to apply k-means on a set of high-dimensional data points (about 50 dimensions) and was wondering if there are any implementations that find the optimal number of clusters. I remember reading somewhere that the way an algorithm…
Legend
  • 104,480
  • 109
  • 255
  • 385
38
votes
4 answers

kmeans: Quick-TRANSfer stage steps exceeded maximum

I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20). I get the following error: Quick-TRANSfer stage steps exceeded maximum…
Anna Dunietz
  • 735
  • 1
  • 7
  • 18
32
votes
1 answer

Cluster one-dimensional data optimally?

Does anyone have a paper that explains how the Ckmeans.1d.dp algorithm works? Or: what is the optimal way to do k-means clustering in one dimension?
Laciel
  • 347
  • 1
  • 3
  • 6
31
votes
6 answers

How to get the samples in each cluster?

I am using the sklearn.cluster KMeans package. Once I finish the clustering, if I need to know which values were grouped together, how can I do it? Say I had 100 data points and KMeans gave me 5 clusters. Now I want to know which data points are in…
user77005
  • 1,271
  • 4
  • 14
  • 24
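
One way to approach the question above, sketched in plain Python: given one cluster label per data point (such as the `labels_` attribute of a fitted scikit-learn `KMeans`), collect the indices of the points that share each label. The helper `group_by_cluster` and the label values are hypothetical.

```python
from collections import defaultdict


def group_by_cluster(labels):
    """Map each cluster label to the list of sample indices carrying it."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    return dict(clusters)


# Hypothetical labels, one per data point, in the same order as the data.
labels = [0, 2, 1, 0, 2, 2]
clusters = group_by_cluster(labels)
for label in sorted(clusters):
    print(label, clusters[label])
```

The indices can then be used to look up the original rows, since the labels follow the order of the input data.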
31
votes
2 answers

Will pandas dataframe object work with sklearn kmeans clustering?

dataset is a pandas DataFrame. This is sklearn.cluster.KMeans: km = KMeans(n_clusters = n_Clusters) km.fit(dataset) prediction = km.predict(dataset) This is how I decide which entity belongs to which cluster: for i in range(len(prediction)): …
Dark Knight
  • 699
  • 1
  • 7
  • 17