A decision tree (DT) solves classification problems with discrete features (if some features are continuous, we discretize them first). To decide which feature to split on at each node, we calculate the entropy or the Gini impurity. From the impurity we can compute the information gain, which tells us both which feature to pick and when to stop splitting.
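For instance, a continuous feature can be binned before building the tree. A minimal sketch with pandas (the column name and bin edges here are purely hypothetical):

```python
import pandas as pd

# Hypothetical continuous feature "age", discretized into labeled bins
df = pd.DataFrame({"age": [12, 25, 37, 48, 63, 71]})
df["age_bin"] = pd.cut(df["age"],
                       bins=[0, 18, 40, 65, 100],
                       labels=["child", "young", "middle", "senior"])
print(df["age_bin"].tolist())  # each value now belongs to a discrete category
```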
Entropy
For entropy, suppose the samples at a node fall into $K$ classes with proportions $p_1, \dots, p_K$. We define

$$H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k.$$

Since $0 \le p_k \le 1$ (with the convention $0 \log_2 0 = 0$), every term is non-negative, so $H(D) \ge 0$: it is $0$ for a pure node and reaches its maximum $\log_2 K$ when the classes are evenly mixed.
In information theory, entropy measures the impurity or uncertainty of a distribution: the lower the entropy, the purer the node. This is what we use to decide which feature to split on.
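As a minimal sketch (assuming the class labels are given as a 1-D numpy array), the entropy of a node can be computed as:

```python
import numpy as np

def entropy(labels):
    """Entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # classes with zero count never appear here, so 0*log2(0) is never evaluated
    return -np.sum(p * np.log2(p))

print(entropy(np.array([0, 0, 1, 1])))  # 1.0: evenly mixed node
print(entropy(np.array([0, 0, 0, 0])))  # -0.0 (i.e. zero): pure node
```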
Gini Impurity
For the Gini impurity, with the same class proportions, we define

$$\mathrm{Gini}(D) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2.$$
Intuitively, it measures how mixed the classes in a node are: it is $0$ for a pure node and grows as the classes become more evenly mixed.
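A matching sketch for the Gini impurity, under the same assumptions as the entropy snippet above:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini(np.array([0, 0, 1, 1])))  # 0.5: evenly mixed binary node
print(gini(np.array([0, 0, 0, 0])))  # 0.0: pure node
```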
Information Gain (IG)
We define the information gain of splitting a dataset $D$ on a feature $A$ as

$$IG(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|} H(D_v),$$

where $D_v$ is the subset of $D$ whose value of $A$ is $v$. At each node we choose the feature with the largest information gain; replacing $H$ with the Gini impurity gives the analogous criterion.
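A sketch of the information-gain computation for a single discrete feature (the feature and label arrays here are toy assumptions; the entropy helper is repeated so the snippet runs on its own):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Gain of splitting `labels` by the discrete values of `feature`."""
    weighted = 0.0
    for v in np.unique(feature):
        subset = labels[feature == v]
        weighted += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - weighted

# A feature that perfectly separates two balanced classes gains 1 bit
feature = np.array(["a", "a", "b", "b"])
labels = np.array([0, 0, 1, 1])
print(information_gain(feature, labels))  # 1.0
```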
Pruning
Pruning is an important technique for avoiding overfitting. It comes in two forms: pre-pruning and post-pruning.
Pre-Pruning
Normally, there are two conditions under which we stop growing the tree early; a scikit-learn sketch of both follows the list.
- Set a maximum depth (number of layers) for the tree.
- Set a minimum information gain threshold: if the best split's gain falls below it, we stop splitting that node.
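As a sketch of both conditions, scikit-learn's DecisionTreeClassifier exposes max_depth (a cap on the number of layers) and min_impurity_decrease (a threshold on the weighted impurity decrease, which plays the role of a minimum-gain cutoff); the exact threshold values below are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: never grow past depth 3, and skip splits whose
# weighted impurity decrease falls below 0.01
clf = DecisionTreeClassifier(criterion="entropy",
                             max_depth=3,
                             min_impurity_decrease=0.01)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```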
Post-Pruning
Normally, we use a validation set to guide post-pruning: we greedily remove a node (collapsing its subtree into a leaf) whenever doing so improves performance on the validation set, as sketched below.
Alternatively, we can use rule post-pruning, as C4.5 does: the tree is first converted into a set of rules, and each rule is then pruned independently.
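A minimal sketch of the validation-set (reduced-error) pruning idea described above; the Node class, the toy tree, and the validation data are all illustrative assumptions:

```python
class Node:
    """Tiny decision-tree node: a leaf with a label, or an internal node
    that routes on one feature index to children keyed by feature value."""
    def __init__(self, label=None, feature=None, children=None, majority=None):
        self.label = label            # set on leaves only
        self.feature = feature        # feature index tested at internal nodes
        self.children = children or {}
        self.majority = majority      # majority training class at this node

    def is_leaf(self):
        return self.label is not None

    def predict(self, x):
        if self.is_leaf():
            return self.label
        child = self.children.get(x[self.feature])
        return child.predict(x) if child is not None else self.majority

def accuracy(tree, X, y):
    return sum(tree.predict(x) == t for x, t in zip(X, y)) / len(y)

def reduced_error_prune(tree, node, X_val, y_val):
    """Greedily collapse subtrees into leaves while validation accuracy does not drop."""
    if node.is_leaf():
        return
    for child in node.children.values():
        reduced_error_prune(tree, child, X_val, y_val)
    before = accuracy(tree, X_val, y_val)
    saved_feature, saved_children = node.feature, node.children
    node.feature, node.children, node.label = None, {}, node.majority   # tentative prune
    if accuracy(tree, X_val, y_val) < before:
        node.label = None                                               # pruning hurt: undo
        node.feature, node.children = saved_feature, saved_children

# Toy tree: the subtree under feature-0 value 1 has overfit the training noise
noisy = Node(feature=1, majority=1,
             children={0: Node(label=1, majority=1), 1: Node(label=0, majority=0)})
root = Node(feature=0, majority=0,
            children={0: Node(label=0, majority=0), 1: noisy})

X_val = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_val = [0, 0, 1, 1]
print(accuracy(root, X_val, y_val))           # 0.75 before pruning
reduced_error_prune(root, root, X_val, y_val)
print(accuracy(root, X_val, y_val))           # 1.0 after the noisy subtree is collapsed
```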