English(9)
-
[9] Handling missing data (Python Data Analysis, Machine Learning)
Handling missing data - Missing data can be addressed in two main ways: 1. Delete a sample (row) or column (feature) that contains missing data. 2. Use imputation to fill in missing data - the mean, median, most frequent value, a constant, etc. are used. Let's check with code. 0. Create a DataFrame with missing data. Import the required libraries: import pandas as pd # used to create the DataFrame import num..
2021.02.06 -
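The two strategies from the teaser above (deletion vs. imputation) can be sketched with pandas; the toy DataFrame here is a hypothetical example, not the post's own data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with missing values.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0],
    "B": [5.0, np.nan, np.nan, 8.0],
})

# Strategy 1: delete rows (samples) or columns (features) with missing data.
dropped_rows = df.dropna(axis=0)  # keeps only complete rows
dropped_cols = df.dropna(axis=1)  # keeps only complete columns

# Strategy 2: impute missing values, e.g. with each column's mean.
imputed = df.fillna(df.mean())

print(imputed)
```

`df.mean()` skips NaN entries by default, so each hole is filled with the mean of the observed values in its column; median, most-frequent, or constant imputation follow the same pattern.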
[8] Understanding K-Nearest Neighbors (KNN)
KNN KNN is a typical 'lazy learner'. That is, instead of learning a discriminant function from the training data, it simply stores the training dataset in memory. Thanks to this, there is no cost in the training process; instead, the computational cost at prediction time is high. Memory-based classifiers have the advantage of being able to adapt immediately ..
2021.01.17 -
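The lazy-learner idea above can be sketched in a few lines of NumPy: `fit` only memorizes the data, and all the distance computation happens at prediction time. The class name and the toy points are hypothetical, not the post's code.

```python
import numpy as np

class SimpleKNN:
    """Minimal k-nearest-neighbors classifier (illustrative sketch)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Lazy learning": training is just storing the dataset in memory.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # All the work happens here, at prediction time:
            # distances to every stored sample, then a majority vote.
            dists = np.linalg.norm(self.X - x, axis=1)
            nearest = self.y[np.argsort(dists)[: self.k]]
            values, counts = np.unique(nearest, return_counts=True)
            preds.append(values[np.argmax(counts)])
        return np.array(preds)

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
clf = SimpleKNN(k=3).fit(X_train, y_train)
print(clf.predict([[0.2, 0.1], [5.5, 5.5]]))
```

Note how `fit` is O(1) while `predict` scans the whole training set per query, which is exactly the trade-off the teaser describes.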
[7] About RandomForest
A random forest can be thought of as an ensemble of decision trees. Because individual decision trees suffer from high variance, the goal of a random forest is to average multiple decision trees, improving generalization performance and reducing the risk of overfitting. Training process for a random forest: 1. Draw n random bootstrap samples from the training set, with replacement. 2. Train a de..
2021.01.07 -
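The bootstrap-then-aggregate process above can be sketched in plain NumPy. As a simplifying assumption, each "tree" here is only a one-split decision stump, and the dataset is a hypothetical toy; the structure of the loop (sample with replacement, fit, majority-vote) is the part that matches the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: label is 1 when the two features sum to > 0.
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def bootstrap_sample(X, y, rng):
    # Step 1: draw n samples from the training set *with replacement*.
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

def fit_stump(Xb, yb):
    # A one-split "tree" (decision stump) standing in for a full decision
    # tree: pick the feature/threshold pair with the lowest training error.
    best_f, best_t, best_err = 0, 0.0, 2.0
    for f in range(Xb.shape[1]):
        for t in Xb[:, f]:
            err = ((Xb[:, f] > t).astype(int) != yb).mean()
            if err < best_err:
                best_f, best_t, best_err = f, t, err
    return lambda X, f=best_f, t=best_t: (X[:, f] > t).astype(int)

# Step 2: train one weak tree per bootstrap sample.
trees = [fit_stump(*bootstrap_sample(X, y, rng)) for _ in range(25)]
# Final step: aggregate the trees' predictions by majority vote.
votes = np.mean([t(X) for t in trees], axis=0)
forest_pred = (votes > 0.5).astype(int)
print("training accuracy:", (forest_pred == y).mean())
```

No single stump can represent the diagonal boundary, but averaging many of them trained on different bootstrap samples smooths the prediction, which is the variance-reduction argument the teaser makes.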
[6] Information gain and impurity in decision trees
Decision trees are so named because classifying samples by successive criteria produces a tree-like structure. The criterion a decision tree uses for splitting is information gain, which is determined from impurity. As the name suggests, impurity is an indicator of how mixed the classes within a node are. \( IG(D_p,f) = I(D_p) - \sum_{j=1}^{m}\frac{N_j}{N_p}I(D_j) \) Info..
2021.01.06 -
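The \(IG\) formula above can be evaluated directly once an impurity measure \(I\) is chosen; here is a minimal sketch using Gini impurity on a hypothetical split (the labels are made up for illustration).

```python
import numpy as np

def gini(labels):
    """Gini impurity I(D) = 1 - sum_k p_k^2 over the node's class shares."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, children):
    """IG = I(D_p) - sum_j (N_j / N_p) * I(D_j), as in the formula above."""
    n_p = len(parent)
    weighted = sum(len(c) / n_p * gini(c) for c in children)
    return gini(parent) - weighted

# Hypothetical split: a 50/50 parent node split into two purer children.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]
print(information_gain(parent, [left, right]))  # positive: the split helps
```

A pure node has impurity 0 and an even 50/50 node has Gini impurity 0.5, so any split that moves the children toward purity yields a positive information gain.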
[5] Solving non-linear problems with kernel SVM
Algorithms such as the linear SVM and regression cannot separate classes that are divided nonlinearly. Kernel methods based on a mapping function \(\phi\) can solve such nonlinear problems. Using the mapping function, nonlinear combinations of the original features are projected into a higher-dimensional space where they become linearly separable, and there a hyperplane separates ..
2021.01.04 -
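The mapping-function idea above can be seen on the classic XOR pattern, which no line in 2-D can separate. The mapping \(\phi(x_1, x_2) = (x_1, x_2, x_1 x_2)\) and the separating plane below are hypothetical choices for illustration, not the post's own.

```python
import numpy as np

# XOR-style data: not linearly separable in the original 2-D space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

def phi(X):
    # Mapping function: project (x1, x2) -> (x1, x2, x1 * x2).
    return np.column_stack([X, X[:, 0] * X[:, 1]])

Z = phi(X)
# In the mapped 3-D space, the plane x1 + x2 - 2*x1*x2 = 0.5 separates
# the two classes with a simple linear decision rule.
w, b = np.array([1.0, 1.0, -2.0]), -0.5
pred = (Z @ w + b > 0).astype(int)
print(pred)
```

A kernel SVM performs this projection implicitly through the kernel function rather than computing \(\phi\) explicitly, which keeps the high-dimensional space affordable.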
[4] Understanding Support Vector Machine (SVM) principles (linear)
The optimization objective of an SVM is to maximize the margin. Margin: the distance between the hyperplane (decision boundary) that separates the classes and the training samples closest to this hyperplane. * Decision boundary: the boundary that separates the classes. Decision boundaries with large margins tend to reduce generalization error; decision boundaries with small margins are likel..
2020.12.31
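The margin definition above can be checked numerically. For a hypothetical separable toy set, with the hyperplane scaled so the closest samples satisfy \(y_i(\mathbf{w}\cdot\mathbf{x}_i + b) = 1\), the width between the two margin boundaries is \(2/\lVert\mathbf{w}\rVert\); the data and weights below are illustrative, not from the post.

```python
import numpy as np

# Hypothetical linearly separable data along the diagonal.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, -1, -1])

# Maximum-margin hyperplane w . x + b = 0 for this data:
w = np.array([0.5, 0.5])
b = 0.0

# Functional margins y_i (w . x_i + b); the support vectors
# ([1,1] and [-1,-1], the samples closest to the boundary) sit at exactly 1.
margins = y * (X @ w + b)
# Geometric width between the two margin boundaries: 2 / ||w||.
width = 2.0 / np.linalg.norm(w)
print(margins, width)
```

Maximizing \(2/\lVert\mathbf{w}\rVert\) is equivalent to minimizing \(\lVert\mathbf{w}\rVert\) subject to all functional margins being at least 1, which is the standard linear SVM optimization problem the post's title refers to.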