[7] About Random Forest

A random forest can be thought of as an ensemble of decision trees. Because individual decision trees suffer from high variance, the goal of a random forest is to average many decision trees to improve generalization performance and reduce the risk of overfitting.


Learning Process for Random Forests
1. Draw a random bootstrap sample of size n from the training set (sampling with replacement).
2. Grow a decision tree from the bootstrap sample. At each node:
    a. Randomly select d features without replacement.
    b. Split the node using the feature that produces the best split according to an objective function such as information gain.
3. Repeat steps 1 and 2 k times.
4. Collect the predictions from the k trees and assign the class label by majority vote.
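
To make the four steps concrete, here is a minimal sketch of the procedure in Python. It uses scikit-learn's DecisionTreeClassifier as the base learner; the class name SimpleRandomForest and its parameters are hypothetical, written only to mirror the steps above.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

class SimpleRandomForest:
    """Hypothetical sketch of the random forest learning process."""

    def __init__(self, n_trees=100, d="sqrt", random_state=None):
        self.n_trees = n_trees  # k in step 3
        self.d = d              # features considered per split (step 2a)
        self.rng = np.random.default_rng(random_state)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.n_trees):
            # Step 1: bootstrap sample of size n, drawn with replacement
            idx = self.rng.integers(0, n, size=n)
            # Step 2: grow a tree; max_features=d restricts each split to a
            # random feature subset, and entropy corresponds to information gain
            tree = DecisionTreeClassifier(max_features=self.d,
                                          criterion="entropy")
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Step 4: majority vote across the k trees
        votes = np.array([t.predict(X) for t in self.trees])
        return np.array([Counter(col).most_common(1)[0][0]
                         for col in votes.T])
```

Note that X and y are assumed to be NumPy arrays so that bootstrap indexing (X[idx]) works directly.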



Characteristics of Random Forests
- Unlike a single decision tree, each split considers only a random subset of d features.
- Pruning is generally not necessary.
- The main hyperparameter to tune is the number of trees (k in step 3).

A random forest is a robust model that generalizes well, so in most cases simply increasing the number of trees improves performance. As the bootstrap sample size decreases, the diversity of the individual trees increases: the smaller the bootstrap sample, the greater the randomness of the forest and the smaller the risk of overfitting. However, a smaller bootstrap sample also tends to lower the overall performance of the forest, since each tree sees less of the training data.

In scikit-learn ('sklearn'), the library commonly used to implement random forests, the bootstrap sample size defaults to the number of samples in the original training set, which usually gives a balanced bias-variance trade-off. For the number of features d considered at each split, the default for classification is \(d = \sqrt{m}\), where m is the number of features in the training set.
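
For example, a short scikit-learn snippet (the dataset and parameter values are chosen arbitrarily for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# n_estimators is k (the number of trees); max_features='sqrt' considers
# sqrt(m) features per split, the default for classification.
# max_samples shrinks the bootstrap sample below the training-set size,
# which increases tree diversity as described above.
forest = RandomForestClassifier(n_estimators=100,
                                max_features="sqrt",
                                max_samples=0.5,  # bootstrap sample = 50% of n
                                random_state=42)
forest.fit(X, y)
print(forest.score(X, y))
```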
