Clustering is a hugely important step of exploratory data analysis and finds plenty of great applications. Typically, clustering technique will identify different groups of observations among your data. For example, if you need to perform market segmentation, cluster analysis will help you with labeling each segment so that you can evaluate each segment’s potential and target some attractive segments. Therefore, your marketing program and positioning strategy rely heavily on the very fundamental step – grouping of your observations and creation of meaningful segments. We may also find many more use cases in computer science, biology, medicine or social science. However, it often turns out to be quite difficult to define properly how a well-separated cluster looks like.
Today, I will discuss some technical aspects of hierarchical cluster analysis, namely Agglomerative Clustering. One great advantage of this hierarchical approach would be fully automatic selection of the appropriate number of clusters. This is because in genuine unsupervised learning problem, we have no idea how many clusters we should look for! Also, in my view, this clever clustering technique solves some ambiguity issues regarding vague definition of a cluster and thus is more than suitable for automatic detection of such structures. On the other hand, the agglomerative clustering process employs standard metrics for clustering quality description. Therefore, it will be fairly easy to observe what is going on. Continue reading