I am having trouble with my k-means clustering results in Alteryx. I am trying to do topic modelling on a data set of around 5000 text descriptions. After cleaning and parsing the data and removing stop words and other common words, I created a Document Term Matrix of 20 words by around 5000 documents.
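To make the setup concrete, here is a minimal Python sketch of the kind of matrix I mean. It is not my actual Alteryx workflow: the toy documents, the three-word vocabulary, and the `build_dtm` helper are all made up for illustration, and I am assuming raw word counts as the cell values.

```python
from collections import Counter

def build_dtm(documents, vocabulary):
    """Return one row per document and one column per vocabulary word (raw counts)."""
    rows = []
    for text in documents:
        # Count whitespace-separated, lower-cased tokens in this document.
        counts = Counter(text.lower().split())
        # Counter returns 0 for words that do not occur in the document.
        rows.append([counts[word] for word in vocabulary])
    return rows

# Hypothetical stand-ins: the real data has ~5000 descriptions and 20 words.
documents = [
    "pump failure pump leak",
    "leak in valve",
    "valve replaced",
]
vocabulary = ["pump", "leak", "valve"]

dtm = build_dtm(documents, vocabulary)
for row in dtm:
    print(row)
```

In this layout each row is a document and each column is a word, so the real matrix would be roughly 5000 rows by 20 columns.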
After running K-Means Clustering in Alteryx, no matter how many clusters I specify, every cluster except one contains exactly 1 word, and the remaining cluster contains all the rest. For example:
2 Clusters
- Cluster 1: 19 words
- Cluster 2: 1 word
3 Clusters
- Cluster 1: 18 words
- Cluster 2: 1 word
- Cluster 3: 1 word
5 Clusters
- Cluster 1: 16 words
- Cluster 2: 1 word
- Cluster 3: 1 word
- Cluster 4: 1 word
- Cluster 5: 1 word
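The size summaries above are just counts of how many items were assigned to each cluster. A minimal Python sketch of that tabulation follows; the `labels` list is a hand-written stand-in that matches the 5-cluster case, not an export from Alteryx, and `cluster_sizes` is a hypothetical helper name.

```python
from collections import Counter

def cluster_sizes(labels):
    """Map each cluster id to the number of items assigned to it."""
    return dict(sorted(Counter(labels).items()))

# Stand-in assignments for the 5-cluster run: 16 items in cluster 1,
# and a single item in each of clusters 2 to 5.
labels = [1] * 16 + [2, 3, 4, 5]

sizes = cluster_sizes(labels)
for cluster_id, size in sizes.items():
    print(f"Cluster {cluster_id}: {size}")
```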
This happens regardless of the number of clusters I specify. Could someone shed some light on whether these results mean there is a problem with my data, or whether I am using the wrong settings?
Thanks in advance!