Ensemble Learning combines several different models into one larger model in order to further boost accuracy beyond what any single model achieves.
Brief Introduction
Intuition
Combining several different predictions for the same problem can achieve better accuracy than any single prediction.
Reasons:
- It is often easy to find many reasonably accurate ‘rules of thumb’, but hard to find a single rule that is highly accurate over the whole input space.
- If the training sample is small and the Hypothesis Space is large, several hypotheses may achieve the same training accuracy. Choosing only one of them may perform poorly on the test set.
- Learning algorithms may converge to a local optimum. Combining different hypotheses reduces the risk of settling on a single bad local optimum.
Learners
- Strong Learners: Learning algorithms that achieve high accuracy.
- Weak Learners: Learning algorithms that are only slightly more accurate than random guessing (error = 1/2 - γ for some small γ > 0).
- Question: Can we turn a Weak Learner into a Strong Learner, or combine several Weak Learners into one Strong Learner?
Sometimes a single classifier does not perform well on its own, but a combination of classifiers does. We can therefore combine their weighted predictions to make the final prediction.
Ensemble Strategy
Average
- Simple Mean
- Weighted Mean
Vote
- Majority Voting Method
- Weighted Voting Method
Learning
- Weighted Majority Algorithm
- Stacking
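To make these combination rules concrete, here is a minimal sketch (an assumed implementation, not from any particular library) of simple/weighted averaging and majority/weighted voting, assuming each model's predictions have already been collected into NumPy arrays; all function names are illustrative.

```python
import numpy as np

def simple_mean(preds):
    # preds: array of shape (n_models, n_samples) with regression outputs
    return preds.mean(axis=0)

def weighted_mean(preds, weights):
    # weights: array of shape (n_models,), e.g. proportional to validation accuracy
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * preds).sum(axis=0) / w.sum()

def majority_vote(labels):
    # labels: array of shape (n_models, n_samples) with non-negative integer class labels
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, labels)

def weighted_vote(labels, weights, n_classes):
    # each model casts a vote for its predicted class, weighted by the model's weight
    n_samples = labels.shape[1]
    scores = np.zeros((n_classes, n_samples))
    for model_labels, w in zip(labels, weights):
        scores[model_labels, np.arange(n_samples)] += w
    return scores.argmax(axis=0)
```

The learning-based strategies (Weighted Majority, Stacking) go one step further: instead of fixing the weights by hand, a second-level learner is trained on the base models' outputs to decide how to combine them.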
Algorithms
AdaBoost (Adaptive Boosting)
- Initialization: AdaBoost starts by assigning equal weights to all training examples.
- Sequential Training: AdaBoost iteratively trains a series of weak learners on the training data. In each iteration, the algorithm focuses on the examples that were misclassified in the previous iterations by assigning higher weights to them. This way, AdaBoost gives more emphasis to the difficult examples that the weak learners struggle to classify correctly.
- Weighted Voting: After training each weak learner, AdaBoost assigns a weight to it based on its performance. The weight of each weak learner is determined by its accuracy in classifying the training examples. The more accurate a weak learner is, the higher its weight in the final model.
- Final Model: The final model in AdaBoost is a weighted combination of all the weak learners, where the weights are determined based on the accuracy of each learner. During prediction, the weak learners’ outputs are combined through a weighted voting scheme to make the final prediction.
- Notes
- The performance depends on the data and the weak learners.
- AdaBoost tends to lose its effectiveness in the following situations:
- The Weak Learners are too complex (over-fitting).
- The Weak Learners are too weak (error close to 1/2, so boosting makes little progress).
- Empirical results suggest that AdaBoost is sensitive to noisy (e.g., mislabeled) examples.
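The following is a minimal from-scratch sketch of the four steps above for binary classification, assuming labels in {-1, +1} and scikit-learn decision stumps as the weak learners; the function names `adaboost_fit` and `adaboost_predict` are illustrative, not a library API.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    # y is assumed to contain labels in {-1, +1}
    n = len(y)
    w = np.full(n, 1.0 / n)                        # 1) equal weights on all training examples
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # 2) train a weak learner on the reweighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # weighted training error
        err = np.clip(err, 1e-10, 1 - 1e-10)       # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - err) / err)      # 3) more accurate learner -> larger weight
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # 4) final model: weighted vote of all weak learners
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```

For practical use, scikit-learn's `AdaBoostClassifier` provides a ready-made implementation of the same scheme.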
Bagging (Bootstrap Aggregating)
- Bootstrap Sampling:
- Randomly select multiple subsets of the original dataset with replacement. Each subset is of the same size as the original dataset.
- Base Learner Training:
- Train a base learner (e.g., decision tree) on each bootstrap sample independently. (Each base learner learns from a different subset of the data, which introduces diversity in the models.)
- Parallel Training:
- Because each base learner is trained independently of the others, the learners can be trained in parallel (in contrast to boosting, which trains them sequentially).
- Prediction Aggregation:
- Once all base learners are trained, predictions are made on new data points using each individual model. For regression tasks, the final prediction is often the average of the predictions from all base learners. For classification tasks, the final prediction is determined by majority voting among the base learners.
- Final Prediction:
- The final prediction of the Bagging ensemble model is the aggregated result of all base learners.
- Evaluation:
- The performance of the Bagging ensemble model is evaluated using metrics such as accuracy, mean squared error, or other relevant evaluation metrics.
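As a rough illustration of the steps above, here is a minimal from-scratch sketch of Bagging for classification, assuming `X` and `y` are NumPy arrays with non-negative integer class labels and scikit-learn decision trees as the base learners; `bagging_fit` and `bagging_predict` are illustrative names only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, random_state=0):
    rng = np.random.default_rng(random_state)
    n = len(y)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)   # bootstrap sample: same size, drawn with replacement
        tree = DecisionTreeClassifier()
        tree.fit(X[idx], y[idx])           # each learner sees a different subset of the data
        models.append(tree)
    return models

def bagging_predict(models, X):
    # classification: majority vote over the base learners
    preds = np.stack([m.predict(X) for m in models])   # shape (n_estimators, n_samples)
    vote = lambda col: np.bincount(col).argmax()       # requires integer class labels >= 0
    return np.apply_along_axis(vote, 0, preds)
```

In practice, scikit-learn's `BaggingClassifier` implements the same idea and can train the base learners in parallel via its `n_jobs` parameter; Random Forests extend Bagging by additionally subsampling features at each split.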