Questions tagged [hierarchical-clustering]

Hierarchical clustering is a clustering technique that generates clusters at multiple hierarchical levels, thereby producing a tree of clusters. Its tree-structured output gives analysts considerable visualization potential.

Examples

Common methods include DIANA (DIvisive ANAlysis), which performs top-down clustering: it starts with the entire data set as a single cluster and repeatedly splits clusters until each data point sits in its own cluster, or until a user-defined stopping condition is reached.

Another widely known method is AGNES (AGglomerative NESting), which works in the opposite, bottom-up direction: each data point starts as its own cluster and the closest clusters are merged step by step until only one cluster remains (or a stopping condition is met).
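As an illustration of this agglomerative flow, here is a minimal SciPy sketch (the random data, the 'average' linkage and the choice of three clusters are arbitrary assumptions for the example):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Toy data: 20 points in 2-D, purely illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))

    # Agglomerative (bottom-up) clustering: each point starts as its own
    # cluster and the closest pair of clusters is merged at every step.
    Z = linkage(X, method="average")   # Z encodes the full merge tree

    # Cut the tree into, say, three flat clusters.
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(labels)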

Distance metrics & some advantages

There are a multitude of ways to compute the distance between clusters (the linkage criterion) that these techniques use when deciding how to split or merge; for example, complete linkage and single linkage use the maximum and the minimum pairwise distance between two clusters, respectively.
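For instance, a small hedged sketch (the four toy points are made up) showing how the linkage criterion changes the merge heights recorded in the tree:

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage

    # Two tight pairs of points, far apart from each other.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.2]])
    D = pdist(X)  # condensed vector of pairwise distances

    # Single linkage merges on the *minimum* pairwise distance between
    # clusters, complete linkage on the *maximum*; column 2 of the
    # linkage matrix holds the distance at which each merge happened.
    print(linkage(D, method="single")[:, 2])
    print(linkage(D, method="complete")[:, 2])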

Hierarchical clustering offers analysts strong visualization potential, since its output is a hierarchical classification of the dataset, typically drawn as a dendrogram. Such trees (hierarchies) can be used in a myriad of ways.
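A minimal sketch of that visualization with SciPy and matplotlib (the random data and Ward linkage are just assumptions for the example):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(1)
    X = rng.normal(size=(15, 2))
    Z = linkage(X, method="ward")

    # The dendrogram shows every merge and its height, so a clustering at
    # any level of granularity can be read off the tree.
    dendrogram(Z)
    plt.xlabel("observation index")
    plt.ylabel("merge distance")
    plt.show()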

Other non-hierarchical clustering techniques

Other clustering methodologies include, but are not limited to, partitioning techniques (such as k-means and PAM) and density-based techniques (such as DBSCAN), the latter being known for discovering clusters of unusual, non-convex shapes.
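For contrast, a minimal scikit-learn sketch of the two non-hierarchical families mentioned above (the data and all parameter values are arbitrary assumptions):

    import numpy as np
    from sklearn.cluster import KMeans, DBSCAN

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 2))

    # Partitioning: k-means needs the number of clusters up front.
    km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Density-based: DBSCAN infers the number of clusters itself and can
    # recover non-convex shapes; points labelled -1 are treated as noise.
    db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
    print(km_labels, db_labels)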

Suggested learning sources to look into

  • Han, Kamber and Pei's Data Mining book, whose lecture slides and companion material can be found here.
  • Wikipedia has an entry on the topic here.
1079 questions
43 votes, 2 answers

Use Distance Matrix in scipy.cluster.hierarchy.linkage()?

I have a distance matrix n*n M where M_ij is the distance between object_i and object_j. So as expected, it takes the following form: / 0 M_01 M_02 ... M_0n\ | M_10 0 M_12 ... M_1n | | M_20 M_21 0 ... …
Sibbs Gambling
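For the question above, answers usually point out that scipy.cluster.hierarchy.linkage expects the condensed form of a distance matrix, not the square n*n form; a hedged sketch (the random matrix is only a stand-in for M):

    import numpy as np
    from scipy.spatial.distance import squareform
    from scipy.cluster.hierarchy import linkage

    # Build a symmetric n*n distance matrix with zeros on the diagonal
    # (a random one here, standing in for the question's M).
    rng = np.random.default_rng(3)
    A = rng.random((5, 5))
    M = (A + A.T) / 2
    np.fill_diagonal(M, 0.0)

    # linkage() wants the *condensed* form (the upper triangle flattened
    # into a 1-D vector); squareform() converts between the two.
    Z = linkage(squareform(M), method="average")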
37 votes, 1 answer

Tutorial for scipy.cluster.hierarchy

I'm trying to understand how to manipulate a hierarchy cluster but the documentation is too ... technical?... and I can't understand how it works. Is there any tutorial that can help me to start with, explaining step by step some simple…
user2988577
35 votes, 4 answers

Text clustering with Levenshtein distances

I have a set (2k - 4k) of small strings (3-6 characters) and I want to cluster them. Since I use strings, previous answers on How does clustering (especially String clustering) work?, informed me that Levenshtein distance is good to be used as a…
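A common pattern for problems like this one (a sketch only; the word list and the plain-Python Levenshtein function are illustrative, not taken from the question) is to precompute the pairwise distances and feed them to a hierarchical linkage:

    import numpy as np
    from scipy.spatial.distance import squareform
    from scipy.cluster.hierarchy import linkage, fcluster

    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[j - 1] + 1,               # insertion
                               prev[j - 1] + (ca != cb)))    # substitution
            prev = cur
        return prev[-1]

    words = ["cat", "cats", "bat", "dog", "dogs", "dig"]
    n = len(words)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = levenshtein(words[i], words[j])

    # Condense the symmetric matrix and build the cluster tree.
    Z = linkage(squareform(D), method="average")
    print(fcluster(Z, t=2, criterion="distance"))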
29 votes, 1 answer

differences in heatmap/clustering defaults in R (heatplot versus heatmap.2)?

I'm comparing two ways of creating heatmaps with dendrograms in R, one with made4's heatplot and one with gplots of heatmap.2. The appropriate results depend on the analysis but I'm trying to understand why the defaults are so different, and how to…
user248237
28 votes, 2 answers

Extracting clusters from seaborn clustermap

I am using the seaborn clustermap to create clusters and visually it works great (this example produces very similar results). However I am having trouble figuring out how to programmatically extract the clusters. For instance, in the example link,…
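One approach seen for this kind of question (a hedged sketch, assuming the ClusterGrid returned by clustermap exposes dendrogram_row.linkage, as recent seaborn versions do) is to cut the linkage matrix that the clustermap already computed:

    import numpy as np
    import seaborn as sns
    from scipy.cluster.hierarchy import fcluster

    data = np.random.rand(30, 8)            # toy data, rows = samples
    g = sns.clustermap(data)

    # The ClusterGrid keeps the row/column linkage matrices it used, so
    # the same tree can be cut into flat cluster labels.
    row_linkage = g.dendrogram_row.linkage
    row_labels = fcluster(row_linkage, t=4, criterion="maxclust")
    print(row_labels)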
24 votes, 2 answers

Hierarchical clustering of 1 million objects

Can anyone point me to a hierarchical clustering tool (preferable in python) that can cluster ~1 Million objects? I have tried hcluster and also Orange. hcluster had trouble with 18k objects. Orange was able to cluster 18k objects in seconds, but…
24 votes, 1 answer

How to give sns.clustermap a precomputed distance matrix?

Usually when I do dendrograms and heatmaps, I use a distance matrix and do a bunch of SciPy stuff. I want to try out Seaborn but Seaborn wants my data in rectangular form (rows=samples, cols=attributes, not a distance matrix)? I essentially want…
O.rka
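One workaround (a sketch under the assumption that clustermap's row_linkage/col_linkage arguments are acceptable here) is to compute the linkages yourself from any distance you like and let seaborn only do the drawing:

    import numpy as np
    import seaborn as sns
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage

    data = np.random.rand(20, 6)   # rows = samples, cols = attributes

    # Build the row/column linkages from a precomputed distance of your choice...
    row_linkage = linkage(pdist(data, metric="cityblock"), method="average")
    col_linkage = linkage(pdist(data.T, metric="cityblock"), method="average")

    # ...and hand them to clustermap so it only handles the rendering.
    sns.clustermap(data, row_linkage=row_linkage, col_linkage=col_linkage)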
24 votes, 3 answers

Clustering words based on Distance Matrix

My objective is to cluster words based on how similar they are with respect to a corpus of text documents. I have computed Jaccard Similarity between every pair of words. In other words, I have a sparse distance matrix available with me. Can anyone…
23 votes, 1 answer

how do I get the subtrees of dendrogram made by scipy.cluster.hierarchy

I had a confusion regarding this module (scipy.cluster.hierarchy) ... and still have some ! For example we have the following dendrogram: My question is how can I extract the coloured subtrees (each one represent a cluster) in a nice format, say…
titan
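One way to walk subtrees programmatically (a sketch; whether it matches the question's colour-based notion of a subtree is an assumption) is to convert the linkage matrix into a tree of ClusterNode objects:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, to_tree

    rng = np.random.default_rng(4)
    X = rng.normal(size=(12, 2))
    Z = linkage(X, method="average")

    # to_tree() turns the linkage matrix into a binary tree of ClusterNode
    # objects; pre_order() lists the observation indices under a subtree.
    root = to_tree(Z)
    left, right = root.get_left(), root.get_right()
    print(left.pre_order(), right.pre_order())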
22 votes, 5 answers

Distributed hierarchical clustering

Are there any algorithms that can help with hierarchical clustering? Google's map-reduce has only an example of k-clustering. In case of hierarchical clustering, I'm not sure how it's possible to divide the work between nodes. Other resource that I…
Roman
22 votes, 3 answers

How to specify a distance function for clustering?

I'd like to cluster points given to a custom distance and strangely, it seems that neither scipy nor sklearn clustering methods allow the specification of a distance function. For instance, in sklearn.cluster.AgglomerativeClustering, the only thing…
Mark Morrisson
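A common route (a sketch; the weighted Manhattan distance is just an illustrative stand-in for a real custom metric) is to compute the condensed distances with pdist and a callable, then cluster the result hierarchically:

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage, fcluster

    def my_distance(u, v):
        # Any symmetric, non-negative function of two 1-D vectors works;
        # this weighted Manhattan distance is only an example.
        return float(np.sum(np.abs(u - v) * np.array([1.0, 2.0])))

    X = np.random.rand(10, 2)
    Z = linkage(pdist(X, metric=my_distance), method="average")
    print(fcluster(Z, t=3, criterion="maxclust"))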
20 votes, 1 answer

How to adjust branch lengths of dendrogram in matplotlib (like in astrodendro)? [Python]

Here is my resulting plot below but I would like it to look like the truncated dendrograms in astrodendro such as this: There is also a really cool looking dendrogram from this paper that I would like to recreate in matplotlib. Below is the code…
O.rka
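SciPy's own truncation options cover part of this (a hedged sketch; fully replicating the astrodendro look would still need custom matplotlib work):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(6)
    Z = linkage(rng.normal(size=(50, 4)), method="ward")

    # truncate_mode="lastp" collapses everything below the last p merges,
    # which shortens and declutters the crowded lower branches.
    dendrogram(Z, truncate_mode="lastp", p=12, show_contracted=True)
    plt.show()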
20 votes, 3 answers

Tag hierarchies and handling of

This is a real issue that applies on tagging items in general (and yes, this applies to StackOverflow too, and no, it is not a question about StackOverflow). The whole tagging issue helps cluster similar items, whatever items they may be (jokes,…
tzot
19 votes, 4 answers

How to get flat clustering corresponding to color clusters in the dendrogram created by scipy

Using the code posted here, I created a nice hierarchical clustering: Let's say the dendrogram on the left was created by doing something like Y = sch.linkage(D, method='average') # D is a distance matrix cutoff = 0.5*max(Y[:,2]) Z =…
conradlee
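The usual pattern for this (a sketch; random observations stand in for the question's distance matrix D, and the 0.5 cutoff is taken from the excerpt above, not verified against the accepted answer) is to pass the colouring threshold to fcluster with criterion='distance':

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

    rng = np.random.default_rng(5)
    X = rng.normal(size=(15, 3))
    Y = linkage(X, method="average")

    cutoff = 0.5 * max(Y[:, 2])
    dendrogram(Y, color_threshold=cutoff)                  # colours branches below the cutoff
    labels = fcluster(Y, t=cutoff, criterion="distance")   # the same cut, as flat labels
    print(labels)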
17 votes, 2 answers

Implementing an efficient graph data structure for maintaining cluster distances in the Rank-Order Clustering algorithm

I'm trying to implement the Rank-Order Clustering here is a link to the paper (which is a kind of agglomerative clustering) algorithm from scratch. I have read through the paper (many times) and I have an implementation that is working although it…
YellowPillow