Clustering Without Labels: K-Means, Hierarchical, and How They See the World Differently
Everything up to now has been supervised: models that learn from labeled data. Today I hit the first unsupervised algorithms: clustering. The task is to find groups in data that nobody labeled. No answer key.
K-Means is the most common clustering algorithm. You pick k (the number of clusters) upfront. The algorithm:
- Places k centroids randomly
- Assigns each point to its nearest centroid
- Moves each centroid to the average of its assigned points
- Repeats until centroids stop moving
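The four steps above map almost line for line onto code. This is a minimal pure-Python sketch, not a library implementation. It assumes points are equal-length tuples, uses Euclidean distance, seeds the centroids with k randomly chosen data points, and keeps a centroid where it is if its cluster ever goes empty (one of several possible policies). The names `kmeans` and `nearest` are my own.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-Means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 1: place k centroids randomly (here: k distinct data points).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Empty cluster: keep the old centroid (an assumption, not the only option).
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    return centroids, [nearest(p, centroids) for p in points]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(pts, k=2)
print(centroids, labels)
```

Which cluster gets label 0 depends on the random seed; only the grouping itself is meaningful.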
The problems: K-Means is sensitive to initial centroid placement (hence K-Means++ for smarter initialization), only finds roughly spherical clusters, and requires you to know k in advance. Picking k usually relies on the elbow method: plot the within-cluster sum of squares (WCSS) against k and pick the value where the curve bends, i.e. where adding another cluster stops buying much reduction.
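K-Means++ only changes the initialization step. Instead of dropping all k centroids at random, it picks the first one uniformly from the data and each later one with probability proportional to its squared distance from the nearest centroid already chosen, so the seeds tend to land in different clusters. The sketch below is pure Python with hypothetical function names, and assumes k is at most the number of distinct points. It also includes a `wcss` helper, the quantity on the y-axis of an elbow plot.

```python
import math
import random


def kmeans_plus_plus_init(points, k, seed=0):
    """K-Means++ seeding: spread the initial centroids out."""
    rng = random.Random(seed)
    # First centroid: a data point chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # D(x)^2: squared distance from each point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Next centroid: sampled with probability proportional to D(x)^2,
        # so far-away points are much more likely to be picked.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def wcss(points, centroids):
    """Within-cluster sum of squares: squared distance of each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = kmeans_plus_plus_init(pts, k=2)
print(seeds, wcss(pts, seeds))
```

To draw the elbow curve, you would run K-Means for k = 1, 2, 3, and so on, compute `wcss` for each fitted result, and look for the bend.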
Hierarchical clustering doesn't need k upfront. It builds a dendrogram, a tree showing how points merge into clusters step by step. It comes in two directions:
- Agglomerative (bottom-up): Start with every point as its own cluster. Merge the two closest clusters. Repeat until one cluster remains.
- Divisive (top-down): Start with everything in one cluster. Split recursively.
You then "cut" the dendrogram at a height that gives you the number of clusters you want.
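Here is a deliberately naive agglomerative sketch (my own code, not SciPy's or scikit-learn's). It starts with singleton clusters, repeatedly merges the closest pair under a chosen linkage rule (single = nearest pair of points, complete = farthest pair, average = mean pairwise distance), and records each merge with its distance, which is the information a dendrogram draws. For these three linkages merge distances never decrease, so stopping once `n_clusters` remain corresponds to cutting the dendrogram at a height that leaves that many branches. It recomputes every cluster distance on every pass, so it is only meant for tiny inputs.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters=1, linkage="single"):
    """Naive bottom-up clustering. Returns (clusters, merges).

    clusters: lists of point indices left after merging down to n_clusters.
    merges: (cluster_a, cluster_b, distance) per merge, in order (the dendrogram).
    """
    def cluster_dist(a, b):
        d = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":      # closest pair of points
            return min(d)
        if linkage == "complete":    # farthest pair of points
            return max(d)
        if linkage == "average":     # mean over all cross-cluster pairs
            return sum(d) / len(d)
        raise ValueError(f"unknown linkage: {linkage!r}")

    # Start with every point as its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: cluster_dist(clusters[pair[0]], clusters[pair[1]]))
        merges.append((clusters[a], clusters[b], cluster_dist(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters, merges


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, merges = agglomerative(pts, n_clusters=2)
print(clusters)
```

Calling it with `n_clusters=1` builds the full tree; the last entry in `merges` is the top of the dendrogram.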
What clicked
K-Means is faster and scales better, but it assumes you know k and that clusters are roughly spherical. Hierarchical clustering gives more flexibility and a visual picture of the structure, but it is O(n²) in memory and at least O(n²) in time, so it doesn't scale to large datasets.
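To make the scaling problem concrete: agglomerative clustering typically needs the distance between every pair of points, n(n-1)/2 values, which is O(n²) memory. A back-of-envelope helper, assuming 8-byte floats stored once per pair (the condensed layout SciPy's `pdist` returns); the function name is hypothetical:

```python
def condensed_distance_bytes(n, bytes_per_value=8):
    """Memory for all n*(n-1)/2 pairwise distances, each stored once."""
    return n * (n - 1) // 2 * bytes_per_value


for n in (1_000, 10_000, 100_000):
    gb = condensed_distance_bytes(n) / 1e9
    print(f"n = {n:>7,}: {gb:8.3f} GB")
```

That works out to roughly 0.4 GB at 10,000 points and about 40 GB at 100,000, while K-Means only holds the data, the labels, and k centroids.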
Still shaky on
How do you evaluate clustering quality when there are no labels? I know about silhouette score and within-cluster sum of squares but haven't worked through what "good" looks like in practice.
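For reference, the silhouette score is simple to compute by hand. For each point, a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster; the point's score is (b - a) / max(a, b), and the overall score is the mean over all points. Values near 1 mean tight, well-separated clusters, values near 0 mean points sit on a boundary, and negative values suggest points are in the wrong cluster. A minimal pure-Python sketch, assuming Euclidean distance, at least two clusters, and the usual convention that a point alone in its cluster scores 0:

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by that point's cluster.
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:
            # Singleton cluster: scored 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # cohesion: mean distance within own cluster
        # separation: mean distance to the closest other cluster
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette_score(pts, [0, 0, 0, 1, 1, 1]))  # high: two well-separated groups
print(silhouette_score(pts, [0, 1, 0, 1, 0, 1]))  # lower: groups mixed together
```

Comparing this score across several values of k is one common way to choose k alongside the elbow plot.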
What's next
What if the problem isn't grouping but compression: reducing 50 features to 3 while keeping the most important signal? That's PCA.