In this post, I am going to talk about an exceptional ensemble method for improving classification accuracy (boosting) called AdaBoost. AdaBoost algorithm efficiently converts a weak classifier, which is defined as a classifier that achieves only a slightly better accuracy than random guessing, into a strong classifier, which performs significantly better. AdaBoost is fast, does not require any inner parameters to tune and we can combine it with any weak learner, for example Decision Tree.
Imagine you are dealing with a classification task critical to your underlying business. For example, you may want to identify two different groups of customers in order to make a targeted offer, suitable only for one of the groups. In this case, the more accurately you classify your two major groups of customers, the more profit you gain on your targeted offers.
Let us formalize this task into a binary classification problem, in which you need to separate your data into two different classes A and B. Your data set may be similar to the picture below. We are giving an example of non-trivial situation. You would not be able to split the observations using a simple linear cut. In a complicated case like this, many basic classification algorithms will achieve only a slightly better accuracy than random guessing.
During each iteration of the AdaBoost algorithm, we increase the weight of an observation if the observation has been misclassified, otherwise, we decrease the weight. In a nutshell, this enables the weak classifier to concentrate on previously misclassified observations so that the overall accuracy is improved. At the last iteration, our AdaBoost classifier may produce a decision function as shown below. The brighter color corresponds to the higher classification probability:
For those who like to dig a little deeper, let me note that AdaBoost algorithm has some nice theoretical properties. For example, the training error of AdaBoost classification is reduced exponentially with the number of iterations. This ensures very rapid convergence of the algorithm to the best possible accuracy achieved as shown in the picture below:
And what is more, the AdaBoost classifier tends to overfit only very rarely – it is an empirical fact observed by many experiments that the generalization error of the AdaBoost classifier often continues to decrease even after the training error reaches zero.
For anyone interested in practical application, AdaBoost algorithm can be found as out-of-the-box method in machine learning packages
scikit-learn (Python) and
caret (R). More advanced example can be found in here.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning
and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139,