In the previous posts of the ‘Clustering in ML Series’, we learned about the Density-Based Clustering methods in ML. In this part, we will discuss another clustering method, known as Connectivity-Based Clustering. It is also known as Hierarchical clustering.

In part 2 of this Clustering in ML series, we learned about k-means, which is easy to understand and implement in practice but has some limitations; Connectivity-based clustering is an alternative approach that addresses several of them. Because k-means starts from a random choice of cluster centers, the results produced by running the algorithm multiple times may differ, whereas the results of Hierarchical clustering are reproducible. k-means works well when the clusters are hyperspherical (like a circle in 2D, a sphere in 3D), while Hierarchical clustering is not shape-dependent. Hierarchical clustering has the added advantage that its results can be easily visualized using an intuitive tree-based representation called a dendrogram. Finally, k-means requires prior knowledge of k, i.e. the number of clusters you want to divide your data into, whereas in Hierarchical clustering you can stop at whatever number of clusters you find appropriate by interpreting the dendrogram.

Connectivity-based clustering rests on the core idea that objects are more related to nearby objects than to objects farther away. This is similar to Centroid-based clustering, but instead of clusters being defined by a set of centers, clusters are described by the maximum distance needed to connect parts of the cluster; changing this distance produces different clusterings. In short, Hierarchical clustering is a hierarchical decomposition of the data based on group similarity, and the resulting hierarchy is represented by a tree structure called a dendrogram.

Types of Hierarchical Clustering

1- Agglomerative clustering

It is also known as AGNES (AGglomerative NESting) and works in a bottom-up manner. That is, each observation initially forms a single-element cluster (leaf). At each step of the algorithm, the two most similar clusters are combined into a new, bigger cluster (node). This procedure is iterated until all points are members of one single big cluster (root). The result is a tree that can be displayed using a dendrogram.

Recursively form new bigger clusters by combining small clusters
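To make the bottom-up idea concrete, here is a minimal sketch using scikit-learn's AgglomerativeClustering. The toy data points and parameter choices are illustrative only and are not taken from the post.

```python
# A minimal sketch of bottom-up (agglomerative) clustering with scikit-learn.
# The toy data and parameters are illustrative, not from the post.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six 2-D points forming two loose groups
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Start with each point as its own cluster and keep merging the two
# closest clusters until only two clusters remain.
agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = agg.fit_predict(X)

print(labels)  # e.g. [0 0 0 1 1 1] -- each point's final cluster
```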

2- Divisive hierarchical clustering

It is also known as DIANA (DIvisive ANAlysis) and works in a top-down manner. DIANA is like the reverse of AGNES: it begins with the root, in which all observations are included in a single cluster. At each step of the algorithm, the most heterogeneous cluster is split into two smaller clusters. The process is iterated until every observation is in its own cluster.

Recursively form new small clusters by dividing a bigger cluster
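DIANA itself is not available in scikit-learn, so the sketch below only approximates the top-down idea: it repeatedly bisects the most heterogeneous cluster (the one with the largest diameter) using 2-means. This is a simplified, assumed illustration, not the exact DIANA splinter-group procedure.

```python
# A rough, simplified top-down sketch: repeatedly bisect the cluster with the
# largest diameter using 2-means. Illustrative only; not the exact DIANA method.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters):
    labels = np.zeros(len(X), dtype=int)              # start: one big cluster (root)
    while labels.max() + 1 < n_clusters:
        # Pick the most heterogeneous cluster = largest pairwise distance (diameter)
        diameters = {c: (pdist(X[labels == c]).max() if (labels == c).sum() > 1 else 0.0)
                     for c in np.unique(labels)}
        worst = max(diameters, key=diameters.get)
        idx = np.where(labels == worst)[0]
        # Split that cluster into two sub-clusters
        sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        labels[idx[sub == 1]] = labels.max() + 1      # second half gets a new label
    return labels

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
print(divisive_clustering(X, 2))  # e.g. [0 0 0 1 1 1]
```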

Measuring the Dissimilarity Between Two Clusters

The most common methods are:

  • Complete linkage clustering – Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the largest value of these dissimilarities as the distance between the two clusters. It tends to produce more compact clusters.
  • Single linkage clustering – Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the smallest of these dissimilarities as a linkage criterion. It tends to produce long, “loose” clusters.
  • Mean linkage clustering – Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the average of these dissimilarities as the distance between the two clusters. Can vary in the compactness of the clusters it creates.
  • Centroid linkage clustering – Computes the dissimilarity between the centroid for cluster 1 (a mean vector of length p, one element for each variable) and the centroid for cluster 2.
  • Ward’s method – Minimizes the total within-cluster variance. At each step, the pair of clusters whose merger leads to the smallest increase in total within-cluster variance is merged. Tends to produce more compact clusters.

Complete linkage and Ward’s method are preferred for AGNES clustering, while for DIANA, the mean or average linkage clustering method is preferred. In Hierarchical clustering, we use a concept called a proximity matrix. This stores the distance between every pair of points; how cluster-to-cluster distances are then computed from it depends on the type of linkage used.
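As a rough illustration (the data points here are made up), the sketch below builds the proximity matrix with SciPy and shows that the same pairwise distances lead to different merge distances under different linkage rules.

```python
# A small sketch of the proximity matrix and linkage step with SciPy.
# The data points are invented for illustration.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Proximity matrix: pairwise Euclidean distances between all points
proximity = squareform(pdist(X))
print(np.round(proximity, 2))

# The same pairwise distances drive every linkage method; only the rule
# for measuring cluster-to-cluster distance changes.
for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method)
    # The last row of Z holds the final merge; column 2 is its distance
    print(f"{method:>8}: final merge distance = {Z[-1, 2]:.2f}")
```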

Choosing the Number of Clusters in Hierarchical Clustering

For this, we use a dendrogram, a tree-like diagram that records the sequence of merges or splits. Whenever we merge two clusters, the dendrogram records the distance between them and represents it graphically. The samples of the dataset sit on the x-axis and the distance on the y-axis; whenever two clusters are merged, they are joined in the dendrogram, and the height of the join is the distance between them. Let’s say we are merging points like the ones below:

Then our overall dendrogram will look like this:

The taller a vertical line in the dendrogram, the larger the distance between the clusters it joins. We can now set a threshold distance and draw a horizontal line at that height. Say we set the threshold to 12 and draw a horizontal line there. The number of clusters is then the number of vertical lines intersected by this threshold line. In this example, since the threshold line intersects 2 vertical lines, we get 2 clusters: one containing samples (1, 2, 4) and the other containing samples (3, 5). This is how we can decide the number of clusters using a dendrogram in Hierarchical Clustering.
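As a sketch of this procedure, the code below builds a small dendrogram with SciPy and cuts it at a distance threshold of 12. The sample coordinates are invented to roughly reproduce the (1, 2, 4) / (3, 5) split described above, since the original figure is not reproduced here.

```python
# A sketch of cutting the dendrogram at a distance threshold to pick the
# number of clusters. The data and the threshold of 12 are illustrative only.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1, 1], [2, 1], [14, 3], [4, 2], [15, 4]])  # samples 1..5

Z = linkage(X, method="ward")

# Draw the dendrogram with a horizontal threshold line at distance 12
dendrogram(Z, labels=[1, 2, 3, 4, 5])
plt.axhline(y=12, color="r", linestyle="--")
plt.xlabel("sample")
plt.ylabel("distance")
plt.show()

# Every merge above the threshold is "cut", leaving one cluster per
# intersected vertical line -- here two clusters.
labels = fcluster(Z, t=12, criterion="distance")
print(labels)  # e.g. [1 1 2 1 2]: samples (1, 2, 4) vs (3, 5)
```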

However, connectivity-based clustering has some limitations as well. Noise and outliers in datasets cause problems for it: they either show up as additional clusters, which increases the complexity of the clustering, or cause clusters to merge improperly. These methods can also fail when clusters have very different densities. Finally, they do not handle big data well: standard agglomerative hierarchical clustering needs O(n^2) memory for the proximity matrix and O(n^3) time in the general case (around O(n^2) time for optimized variants such as single linkage).

This was a quick introduction to Connectivity-based clustering. In the next part of this series, we will study two more clustering approaches: Distribution-Based Clustering and Fuzzy Clustering. See you in the next part of this series! 🙂


Author

Shubham Bindal
