Author Archives: petrbour

Intuition vs Unsupervised Learning – Agglomerative Clustering in practice

Clustering is a hugely important step of exploratory data analysis and finds plenty of great applications. Typically, a clustering technique identifies different groups of observations in your data. For example, if you need to perform market segmentation, cluster analysis helps you label each segment so that you can evaluate its potential and target the most attractive ones. Your marketing program and positioning strategy therefore rely heavily on this fundamental step: grouping your observations into meaningful segments. There are many more use cases in computer science, biology, medicine and social science. However, it often turns out to be quite difficult to define properly what a well-separated cluster looks like.

Today, I will discuss some technical aspects of hierarchical cluster analysis, namely Agglomerative Clustering. One great advantage of this hierarchical approach is that the appropriate number of clusters can be selected fully automatically. This matters because in a genuine unsupervised learning problem we have no idea how many clusters we should look for! Also, in my view, this clever clustering technique resolves some of the ambiguity that comes from the vague definition of a cluster, and it is therefore well suited to automatic detection of such structures. At the same time, the agglomerative clustering process relies on standard metrics for describing clustering quality, so it is fairly easy to observe what is going on.
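To make the idea concrete, here is a minimal sketch of agglomerative clustering in plain Python. It is only an illustration under my own assumptions, not the exact procedure from the full post: I assume average linkage with Euclidean distance, and I pick the number of clusters with a simple "largest gap in merge distance" heuristic. The function names (`average_linkage`, `agglomerate`, `cut_at_largest_gap`) and the toy points are hypothetical.

```python
import math


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points):
    """Repeatedly merge the two closest clusters until one remains.

    Returns a list of (merge_distance, clusters_before_that_merge).
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    history = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        history.append((d, [list(c) for c in clusters]))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return history


def cut_at_largest_gap(history):
    """Assumed heuristic: stop just before the biggest jump in merge distance.

    Needs at least two merges, i.e. at least three points.
    """
    jump = max(range(1, len(history)),
               key=lambda i: history[i][0] - history[i - 1][0])
    return history[jump][1]


# Toy data (made up for illustration): three visually obvious groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1)]

segments = cut_at_largest_gap(agglomerate(points))
print(len(segments), sorted(sorted(c) for c in segments))
```

The sequence of merge distances plays the role of the quality metric here: as long as nearby points are being merged the distance grows slowly, and a sudden jump suggests that two genuinely separate groups are about to be joined. The naive search over all cluster pairs is fine for a handful of points but would be far too slow for real data.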

Analytical Market Segmentation with t-SNE and Clustering Pipeline

Irrespective of whether the underlying data comes from e-shop customers, your clients, small businesses or large profit and non-profit organizations, market segmentation analysis brings valuable insights and helps you turn otherwise hidden information to your advantage, for example into greater sales. It is therefore vitally important to use an efficient analytical pipeline, one that not only helps you understand your customer base but also serves you when planning tailored offers, advertising, promos or strategy. Let us play with some advanced analytics in order to provide a simple example of how segmentation techniques, namely clustering, projection pursuit and t-SNE, can improve efficiency.

As your goal might be to improve your sales through tailored customer contact, you need to discover homogeneous groups of people. Different groups of customers behave and respond differently, so it is only natural to treat them differently. The idea is to increase profit in each segment separately by applying a different strategy to each. Thus, we need to accomplish two fundamental tasks:

  1. identify homogeneous market segments (i.e. which people are in which group)
  2. identify important features (i.e. what is decisive for customer behavior)

In this post, I am focusing on the first problem from a technical point of view, using some advanced analytic methods. For the sake of a brief demonstration, I will work with a simple dataset describing the annual spending of clients of a wholesale distributor on diverse product categories. As the figure below shows, it would be difficult to detect any well-separated clusters of clients at first sight.

[Figure: scatter_all]


Boosting Your Model Accuracy with AdaBoost

In this post, I am going to talk about AdaBoost, an exceptional ensemble method for improving classification accuracy (boosting). The AdaBoost algorithm efficiently converts a weak classifier, defined as a classifier that achieves only slightly better accuracy than random guessing, into a strong classifier, which performs significantly better. AdaBoost is fast, needs very little tuning (essentially only the number of boosting rounds), and can be combined with any weak learner, for example a decision tree.
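The weak-to-strong conversion can be sketched in a few lines of plain Python. This is a minimal illustration of the classic binary AdaBoost scheme with labels in {-1, +1}, assuming one-level decision trees (decision stumps) as the weak learner. The helper names (`train_stump`, `stump_predict`, `adaboost`, `predict`) and the toy data are hypothetical and chosen only for this example.

```python
import math


def train_stump(X, y, w):
    """Find the threshold stump with the lowest weighted error.

    Returns (weighted_error, feature_index, threshold, polarity).
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                pred = [polarity if row[f] <= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def stump_predict(stump, row):
    _, f, t, polarity = stump
    return polarity if row[f] <= t else -polarity


def adaboost(X, y, rounds):
    """Train `rounds` stumps, reweighting the samples after each one."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # vote of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted majority vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data (made up): no single threshold separates the two classes.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = adaboost(X, y, rounds=3)
accuracy = sum(predict(model, row) == label for row, label in zip(X, y)) / len(X)
print(accuracy)
```

On this toy set the best single stump still misclassifies three of the ten points, while the weighted vote of three stumps separates the training data. The key design choice is the reweighting step: each new stump is forced to concentrate on the samples its predecessors got wrong.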

Imagine you are dealing with a classification task critical to your underlying business. For example, you may want to identify two different groups of customers in order to make a targeted offer, suitable only for one of the groups. In this case, the more accurately you classify your two major groups of customers, the more profit you gain on your targeted offers.
