Questions tagged [k-means]

In statistics and data mining, k-means clustering is a method of cluster analysis that aims to partition n observations into k clusters, with each observation assigned to the cluster whose mean is nearest (the least-squares criterion).

For details, see the Wikipedia entry at http://en.wikipedia.org/wiki/K-means_clustering
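
The nearest-mean assignment in the tag description can be illustrated with a minimal sketch of the standard iterative procedure (often called Lloyd's algorithm). This is not taken from any question on this page: the function names, the random choice of initial centers, and the toy data are assumptions made for illustration, and only the Python standard library is used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative k-means: returns (centers, labels).

    Assumption: initial centers are k points sampled at random.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means will too
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean(members)
    return centers, labels


def within_cluster_ss(points, centers, labels):
    """Least-squares objective: sum of squared distances to assigned centers."""
    return sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))


# Toy data (hypothetical): two well-separated groups of three points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
print(labels, within_cluster_ss(data, centers, labels))
```

The result depends on the initial centers, which is why several of the questions below concern initialization and the choice of k.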

3112 questions

445

votes

8 answers

Cluster analysis in R: determine the optimal number of clusters

Being a newbie in R, I'm not very sure how to choose the best number of clusters to do a k-means analysis. After plotting a subset of below data, how many clusters will be appropriate? How can I perform cluster dendro analysis? n = 1000 kk = 10 …

r cluster-analysis k-means

asked Mar 13 '13 at 02:39

user2153893

4,487
3
11
5

185

votes

8 answers

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

python machine-learning cluster-analysis k-means scikit-learn

asked Apr 03 '11 at 12:39

bmasc

1,980
2
12
9

145

votes

20 answers

How do I determine k when using k-means clustering?

I've been studying k-means clustering, and one thing that's not clear is how you choose the value of k. Is it just a matter of trial and error, or is there more to it?

cluster-analysis k-means

asked Nov 24 '09 at 22:58

Jason Baker

171,942
122
354
501

votes

2 answers

Will scikit-learn utilize GPU?

Reading the implementation of scikit-learn in TensorFlow: http://learningtensorflow.com/lesson6/ and scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html I'm struggling to decide which implementation to…

python tensorflow scikit-learn k-means neuraxle

asked Jan 10 '17 at 11:37

blue-sky

45,835
124
360
647

votes

16 answers

K-means algorithm variation with equal cluster size

I'm looking for the fastest algorithm for grouping points on a map into equally sized groups, by distance. The k-means clustering algorithm looks straightforward and promising, but does not produce equally sized groups. Is there a variation of this…

algorithm map cluster-analysis k-means

asked Mar 27 '11 at 21:27

pixelistik

6,797
2
28
39

votes

8 answers

Python k-means algorithm

I am looking for a Python implementation of the k-means algorithm, with examples, to cluster and cache my database of coordinates.

python algorithm cluster-analysis k-means

asked Oct 09 '09 at 19:16

Eeyore

2,024
5
32
49

votes

3 answers

Scikit Learn - K-Means - Elbow - criterion

Today I'm trying to learn something about K-means. I have understood the algorithm and I know how it works. Now I'm looking for the right k... I found the elbow criterion as a method to detect the right k, but I do not understand how to use it with…

python machine-learning scikit-learn cluster-analysis k-means

asked Oct 05 '13 at 12:19

Linda

2,195
4
23
32

votes

3 answers

Simple approach to assigning clusters for new data after k-means clustering

I'm running k-means clustering on a data frame df1, and I'm looking for a simple approach to computing the closest cluster center for each observation in a new data frame df2 (with the same variable names). Think of df1 as the training set and df2…

r k-means

asked Dec 16 '13 at 21:27

josliber

41,865
12
88
126

votes

2 answers

Calculating the percentage of variance measure for k-means?

On the Wikipedia page, an elbow method is described for determining the number of clusters in k-means. The built-in method of scipy provides an implementation, but I am not sure I understand how the distortion, as they call it, is calculated. More…

python numpy statistics cluster-analysis k-means

asked Jul 11 '11 at 04:55

Legend

104,480
109
255
385

votes

3 answers

How Could One Implement the K-Means++ Algorithm?

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked, that is, the initialization, since the rest is the same as in the original K-Means algorithm. Is the probability function used based…

algorithm language-agnostic machine-learning cluster-analysis k-means

asked Mar 28 '11 at 23:45

Anton Andreev

1,886
1
19
23
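
For the K-Means++ question above, the seeding step can be sketched as follows: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to the squared distance from a point to its nearest already-chosen center. The function name `kmeans_pp_init` and the sample data are hypothetical, and the sketch assumes the data contains at least k distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with k-means++ style seeding (illustrative).

    Assumption: `points` holds at least k distinct points; otherwise all
    weights can become zero and random.choices raises ValueError.
    """
    rng = random.Random(seed)
    # First center: uniform random choice.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        # Next center: sampled with probability proportional to D(x)^2.
        # Already-chosen points have weight 0, so they are never picked again.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical data: three loose groups on a plane.
data = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
print(kmeans_pp_init(data, k=3))
```

After seeding, the usual assign-and-update iterations proceed unchanged.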

votes

7 answers

Kmeans without knowing the number of clusters?

I am attempting to apply k-means on a set of high-dimensional data points (about 50 dimensions) and was wondering if there are any implementations that find the optimal number of clusters. I remember reading somewhere that the way an algorithm…

python machine-learning data-mining k-means

asked Jul 07 '11 at 18:58

Legend

104,480
109
255
385

votes

4 answers

kmeans: Quick-TRANSfer stage steps exceeded maximum

I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20). I get the following error: Quick-TRANSfer stage steps exceeded maximum…

r cluster-analysis k-means

asked Jan 27 '14 at 13:55

Anna Dunietz

votes

1 answer

Cluster one-dimensional data optimally?

Does anyone have a paper that explains how the Ckmeans.1d.dp algorithm works? Or: what is the optimal way to do k-means clustering in one dimension?

r cluster-analysis k-means

asked Oct 23 '11 at 22:12

Laciel

votes

6 answers

How to get the samples in each cluster?

I am using the sklearn.cluster KMeans package. Once I finish the clustering, if I need to know which values were grouped together, how can I do it? Say I had 100 data points and KMeans gave me 5 clusters. Now I want to know which data points are in…

python scikit-learn cluster-analysis k-means

asked Mar 24 '16 at 07:56

user77005

1,271
4
14
24
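
One possible approach to the question above, sketched under assumptions: a fitted scikit-learn `KMeans` object exposes a `labels_` attribute holding one cluster label per input row, so grouping row indices by label recovers the members of each cluster. To keep the sketch runnable without scikit-learn, a hand-written list stands in for `labels_`; the helper name `group_by_label` is hypothetical.

```python
from collections import defaultdict


def group_by_label(labels):
    """Map each cluster label to the list of row indices assigned to it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return dict(groups)


# Hypothetical labels; with scikit-learn these would come from
# km.labels_.tolist() after calling km.fit(data).
labels = [0, 2, 1, 0, 2, 2]
clusters = group_by_label(labels)
for label, indices in sorted(clusters.items()):
    print(label, indices)
```

The indices can then be used to look up the original rows for each cluster.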

votes

2 answers

Will pandas dataframe object work with sklearn kmeans clustering?

dataset is a pandas DataFrame. This is sklearn.cluster.KMeans: km = KMeans(n_clusters = n_Clusters) km.fit(dataset) prediction = km.predict(dataset) This is how I decide which entity belongs to which cluster: for i in range(len(prediction)): …

python pandas scikit-learn cluster-analysis k-means

asked Jan 19 '15 at 02:17

Dark Knight
