[7] About Random Forest

A random forest can be thought of as an ensemble of decision trees. Because individual decision trees suffer from high variance, the goal of a random forest is to average many decision trees to improve generalization performance and reduce the risk of overfitting.


Learning Process for Random Forests
1. Draw a random bootstrap sample of size n from the training set (sampling with replacement).
2. Grow a decision tree from the bootstrap sample. At each node:
    a. Randomly select d features without replacement.
    b. Split the node using the feature that produces the best split according to an objective function such as information gain.
3. Repeat steps 1 and 2 k times.
4. Collect the predictions from the k trees and assign the class label by majority vote.
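
To make the four steps concrete, here is a minimal sketch of the procedure in Python. It uses scikit-learn's DecisionTreeClassifier as the base learner; the class name SimpleRandomForest and its parameters are hypothetical, written only to mirror the steps above.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

class SimpleRandomForest:
    """Hypothetical sketch of the random forest learning process."""

    def __init__(self, n_trees=100, d="sqrt", random_state=None):
        self.n_trees = n_trees  # k in step 3
        self.d = d              # features considered per split (step 2a)
        self.rng = np.random.default_rng(random_state)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.n_trees):
            # Step 1: bootstrap sample of size n, drawn with replacement
            idx = self.rng.integers(0, n, size=n)
            # Step 2: grow a tree; max_features=d restricts each split to a
            # random feature subset, and entropy corresponds to information gain
            tree = DecisionTreeClassifier(max_features=self.d,
                                          criterion="entropy")
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Step 4: majority vote across the k trees
        votes = np.array([t.predict(X) for t in self.trees])
        return np.array([Counter(col).most_common(1)[0][0]
                         for col in votes.T])
```

Note that X and y are assumed to be NumPy arrays so that bootstrap indexing (X[idx]) works directly.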



Characteristics of Random Forests
- Unlike a single decision tree, each split considers only a random subset of d features.
- Pruning is generally not necessary.
- The main hyperparameter to tune is the number of trees (k in step 3).

A random forest is a robust model that generalizes well, so in most cases simply increasing the number of trees improves performance. As the bootstrap sample size decreases, the diversity of the individual trees increases: the smaller the bootstrap sample, the greater the randomness of the forest and the smaller the risk of overfitting. However, a smaller bootstrap sample also tends to lower the overall performance of the forest, since each tree sees less of the training data.

In scikit-learn ('sklearn'), the library commonly used to implement random forests, the bootstrap sample size defaults to the number of samples in the original training set, which usually gives a balanced bias-variance trade-off. For the number of features d considered at each split, the default for classification is \(d = \sqrt{m}\), where m is the number of features in the training set.
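
For example, a short scikit-learn snippet (the dataset and parameter values are chosen arbitrarily for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# n_estimators is k (the number of trees); max_features='sqrt' considers
# sqrt(m) features per split, the default for classification.
# max_samples shrinks the bootstrap sample below the training-set size,
# which increases tree diversity as described above.
forest = RandomForestClassifier(n_estimators=100,
                                max_features="sqrt",
                                max_samples=0.5,  # bootstrap sample = 50% of n
                                random_state=42)
forest.fit(X, y)
print(forest.score(X, y))
```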
