
I'm using the mclust::Mclust() function to cluster a small dataset. However, I'm struggling to extract the cluster classification for each observation so that I can add it to the dataset.

Here is the data:

df <- structure(list(latitud = c(-43.8189010620117, -34.2731018066406, 
-47.0666999816895, -35.7543983459473, -47.1413993835449, -36.6260986328125, 
-37.2118988037109, -33.3086013793945, -37.2792015075684, -35.4524993896484, 
-36.5856018066406, -44.6591987609863, -28.6996994018555, -48.1591987609863, 
-45.4000015258789, -29.94580078125, -30.4386005401611, -31.6646995544434, 
-51.2000007629395, -51.3328018188477, -51.25, -45.551700592041, 
-39.0144004821777, -38.6081008911133, -34.9844017028809, -32.8403015136719, 
-29.9953002929688, -18.3999996185303, -35.6169013977051, -35.9085998535156, 
-35.4068984985352, -32.7571983337402, -32.8502998352051, -33.5938987731934, 
-38.4303016662598, -38.6866989135742, -45.4057998657227, -37.5503005981445, 
-37.8997001647949, -38.0368995666504, -37.7047004699707, -37.7963981628418, 
-37.7092018127441, -31.5835990905762, -30.9242000579834, -38.2008018493652, 
-31.6881008148193, -31.8117008209229, -27.9747009277344, -30.7047004699707, 
-36.6500015258789, -34.4921989440918, -34.6581001281738, -47.3499984741211, 
-47.5, -33.7219009399414, -33.6613998413086, -35.5574989318848
), longitud = c(-72.38330078125, -71.371696472168, -72.8000030517578, 
-71.0864028930664, -72.7257995605469, -72.4891967773438, -72.3242034912109, 
-70.3572006225586, -71.9847030639648, -71.7332992553711, -71.5255966186523, 
-71.8082962036133, -70.5500030517578, -73.0888977050781, -72.5999984741211, 
-70.5327987670898, -71.002197265625, -71.2546997070312, -72.9332962036133, 
-73.1091995239258, -72.5167007446289, -72.0680999755859, -73.0828018188477, 
-72.8478012084961, -72.0100021362305, -71.0255966186523, -70.5867004394531, 
-70.3000030517578, -71.7677993774414, -71.2981033325195, -72.2082977294922, 
-70.736701965332, -70.5093994140625, -70.3792037963867, -72.0105972290039, 
-72.502799987793, -72.6231002807617, -72.5903015136719, -71.6239013671875, 
-71.4781036376953, -71.7683029174805, -71.6988983154297, -71.823600769043, 
-71.4606018066406, -70.7731018066406, -71.2988967895508, -71.2658004760742, 
-70.9302978515625, -69.997802734375, -70.9244003295898, -72.4499969482422, 
-71.3731002807617, -71.3019027709961, -72.8499984741211, -72.9749984741211, 
-71.5550003051758, -71.3371963500977, -71.7067031860352)), row.names = c(NA, 
-58L), class = c("tbl_df", "tbl", "data.frame"))

Clustering:

library(mclust)
d_clust <- Mclust(df)

Now, when I run plot(d_clust) it shows all the graphs, but it doesn't show me which cluster each row belongs to. I have looked at the documentation and other resources (1, 2, 3), and the Stack Overflow questions related to Mclust() (1, 2) don't answer my question.

I'm looking for something like this:

| latitud | longitud | cluster_id |

By the way, class(d_clust) returns "Mclust". How is it possible to plot d_clust when running d_clust on its own doesn't give you a table/data frame to plot?

Chris

1 Answer


When you run Mclust, it fits several different covariance models over a range of values of G (the number of clusters), so do check out the BIC plot:

[Figure: BIC plot from plot(d_clust)]

Mclust keeps only the best model according to BIC, and stores its covariance model and number of clusters as d_clust$modelName and d_clust$G.
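If you'd rather read the selected model off the object than off the plot, the fitted object carries everything. A minimal sketch, using R's built-in `faithful` data purely as a stand-in for the question's `df` so the block runs on its own (it assumes the mclust package is installed):

```r
library(mclust)

# Stand-in data: swap `faithful` for your own `df`
fit <- Mclust(faithful)   # by default tries G = 1:9 for every covariance model

fit$modelName   # covariance model code selected by BIC
fit$G           # number of mixture components selected by BIC

# Top model/G combinations ranked by BIC (in mclust, larger BIC is better)
summary(fit$BIC)

# Just the BIC panel of plot(fit)
plot(fit, what = "BIC")
```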

Once you know which model is used (I think it's EVE with G=4 in your case), the classification makes sense, and you can extract it simply with:

d_clust$classification
# or
results = data.frame(df,cluster=d_clust$classification)
head(results)
   latitud longitud cluster
1 -43.8189 -72.3833       1
2 -34.2731 -71.3717       2
3 -47.0667 -72.8000       1
4 -35.7544 -71.0864       3
5 -47.1414 -72.7258       1
6 -36.6261 -72.4892       3
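To get exactly the `cluster_id` column from the question, and optionally a measure of how confident each assignment is, something like the sketch below should work. Again `faithful` stands in for `df`. `classification` is the hard label (the component with the highest posterior probability in `fit$z`), and `uncertainty` is one minus that maximum probability. The cluster numbers themselves are arbitrary component indices, not an ordering.

```r
library(mclust)

# Stand-in data: swap `faithful` for your own `df`
fit <- Mclust(faithful)

# One hard label per row, named as in the question
results <- data.frame(faithful, cluster_id = fit$classification)

# Optional soft information: fit$z is an n x G matrix of posterior
# membership probabilities; fit$uncertainty is 1 - max(z) per row
results$uncertainty <- fit$uncertainty

head(results)
```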

You can also plot:

with(results,plot(latitud,longitud,col=factor(cluster)))

[Figure: scatter plot of latitud vs. longitud, coloured by cluster]
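On the side question about `class(d_clust)`: `plot()` is an S3 generic, so calling it on an object of class "Mclust" dispatches to mclust's `plot.Mclust()` method. That method draws from the data and fitted parameters stored inside the object itself (for example `d_clust$data` and `d_clust$parameters`), so no separate data frame is needed. A short sketch with `faithful` standing in for `df`:

```r
library(mclust)

# Stand-in data: swap `faithful` for your own `df`
fit <- Mclust(faithful)

class(fit)   # "Mclust": the class plot() dispatches on
names(fit)   # stored components: data, classification, z, parameters, BIC, ...

# plot(fit) calls plot.Mclust(fit); `what` selects a single panel
plot(fit, what = "classification")
plot(fit, what = "uncertainty")
```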

Then you can think about whether the clustering makes sense, for example whether you should instead use G=4.
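If you decide BIC's choice doesn't fit your problem, you can fix the number of clusters (and, if you like, the covariance model) yourself through the `G` and `modelNames` arguments. A sketch with `faithful` as a stand-in; the values `4` and `"EEE"` are illustrative only, not a recommendation for the question's data.

```r
library(mclust)

# Stand-in data: swap `faithful` for your own `df`
fit_default <- Mclust(faithful)                         # BIC picks both G and model
fit_g4      <- Mclust(faithful, G = 4)                  # G fixed; BIC still picks the model
fit_eee     <- Mclust(faithful, G = 4, modelNames = "EEE")  # G and model both fixed

# Compare BIC of the selected models (larger is better in mclust)
c(default = fit_default$bic, g4 = fit_g4$bic, g4_eee = fit_eee$bic)

table(fit_g4$classification)   # cluster sizes under the forced G
```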

StupidWolf