Machine Learning (8)
-
[9] Handle missing data (Python Data Analysis, Machine Learning)
Handle missing data - Missing data can be addressed in two main ways: 1. Delete the sample (row) or feature (column) that contains missing data. 2. Impute the missing values, typically with the mean, median, most frequent value, or a constant. Let's check this with code. 0. Create a data frame with missing data. Import the required libraries: import pandas as pd # used to create the DataFrame import num..
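The two approaches named above can be sketched with pandas as follows. This is a minimal illustration, not the post's own code: the toy DataFrame, its column names, and its values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names and values are assumptions for illustration.
df = pd.DataFrame(
    {
        "A": [1.0, 5.0, 10.0],
        "B": [2.0, 6.0, 11.0],
        "C": [3.0, np.nan, 12.0],
        "D": [4.0, 8.0, np.nan],
    }
)

# Count the missing values in each column.
missing_per_column = df.isnull().sum()

# Option 1: delete rows (samples) or columns (features) that contain NaN.
rows_dropped = df.dropna(axis=0)
cols_dropped = df.dropna(axis=1)

# Option 2: fill each NaN with a statistic of its own column, or with a constant.
mean_filled = df.fillna(df.mean())
median_filled = df.fillna(df.median())
constant_filled = df.fillna(0.0)

print(missing_per_column)
print(mean_filled)
```

Deleting is simple but discards information; filling keeps every sample at the price of inserting estimated values.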
2021.02.06 -
[8] Understanding K-Nearest Neighbors (KNN)
KNN is a typical 'lazy learner'. That is, instead of learning a discriminant function from the training data, it simply stores the training dataset in memory. As a result, the training step costs almost nothing, but the computational cost of the prediction phase is high. Memory-based classifiers have the advantage of being able to adapt immediately ..
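The lazy-learning idea can be sketched in a few lines of plain Python. The class name `SimpleKNN`, the Euclidean distance, and the toy points are assumptions made for this illustration; the post itself may use a library implementation.

```python
import math
from collections import Counter


class SimpleKNN:
    """Hypothetical minimal KNN classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorizes the data: no model parameters are learned.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, x):
        # All the work happens here: measure the distance to every stored sample.
        distances = [(math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y)]
        distances.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in distances[: self.k]]
        # Majority vote among the k nearest neighbors.
        return Counter(nearest_labels).most_common(1)[0][0]


X_train = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
y_train = ["a", "a", "a", "b", "b", "b"]
knn = SimpleKNN(k=3).fit(X_train, y_train)
print(knn.predict_one((1.5, 1.5)), knn.predict_one((6.5, 6.5)))
```

Note that `fit` does no computation, while `predict_one` scans the whole training set, which is the cost trade-off described above.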
2021.01.17 -
[7] About RandomForest
A random forest can be thought of as an ensemble of decision trees. Because individual decision trees suffer from high variance, the goal of a random forest is to average multiple decision trees to improve generalization performance and reduce the risk of overfitting. Learning Process for Random Forests 1. Draw a random bootstrap sample of size n from the training set, sampling with replacement. 2. Learn the de..
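Step 1 of the learning process, bootstrap sampling, might look like the sketch below. It covers only that step, since the rest of the list is cut off in the excerpt; the helper name `bootstrap_sample` and the integer stand-ins for training samples are hypothetical.

```python
import random


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items from dataset with replacement (hypothetical helper)."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
training_set = list(range(10))  # stand-in for ten training samples
sample = bootstrap_sample(training_set, rng)

# Because sampling is with replacement, some items usually repeat
# and others are left out of this particular sample.
print(sorted(sample))
print("distinct items drawn:", len(set(sample)))
```

Each tree in the ensemble would be trained on its own sample drawn this way, which is what makes the individual trees differ from one another.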
2021.01.07 -
[6] Information gain and impurity of decision tree
Decision trees get their name from the way they classify samples: by branching, like a tree, on a sequence of criteria. The criterion for choosing a split is information gain, which is computed from impurity. As the name suggests, impurity is an indicator of how mixed the classes are within a node. \( IG(D_p,f) = I(D_p) - \sum_{j=1}^{m}\frac{N_j}{N_p}I(D_j) \) Info..
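The information gain formula above can be checked with a small worked example. The excerpt does not say which impurity measure \(I\) the post uses, so Gini impurity is assumed here; the function names and the label counts are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed choice for I)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def information_gain(parent, children, impurity=gini):
    """IG(D_p, f) = I(D_p) - sum_j (N_j / N_p) * I(D_j)."""
    n_parent = len(parent)
    weighted_child_impurity = sum(
        len(child) / n_parent * impurity(child) for child in children
    )
    return impurity(parent) - weighted_child_impurity


# Parent node with 40 samples of each class, split into two child nodes.
parent = [0] * 40 + [1] * 40
left = [0] * 30 + [1] * 10
right = [0] * 10 + [1] * 30

ig = information_gain(parent, [left, right])
print(ig)
```

Here the parent impurity is 0.5 and each child has impurity 0.375, so the gain is 0.5 - 0.375 = 0.125. A split that produced two pure children would reach the maximum gain of 0.5 for this parent.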
2021.01.06 -
[5] Solving Nonlinear Problems with Kernel SVM
Linear algorithms such as linear SVMs and regression cannot separate classes whose boundary is nonlinear. Kernel methods, which rely on a mapping function \(\phi\), can solve such nonlinear problems. The mapping function projects nonlinear combinations of the original features into a higher-dimensional space in which the classes become linearly separable, where a hyperplane separates them and..
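To make the role of the mapping function concrete, here is a small sketch with an explicit \(\phi\). The particular mapping \(\phi(x_1, x_2) = (x_1, x_2, x_1^2 + x_2^2)\), the two rings of points, and the threshold are assumptions chosen for illustration, not necessarily what the post uses.

```python
import math


def phi(x1, x2):
    """Assumed mapping from 2-D to 3-D: the third coordinate is the squared radius."""
    return (x1, x2, x1 ** 2 + x2 ** 2)


angles = (0.0, 1.5, 3.0, 4.5)
# Two concentric rings: no straight line in the (x1, x2) plane separates them.
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]
outer = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in angles]


def classify(point, threshold=1.0):
    # In the mapped space the plane z3 = threshold separates the two rings.
    z = phi(*point)
    return "outer" if z[2] > threshold else "inner"


print([classify(p) for p in inner])
print([classify(p) for p in outer])
```

A kernel SVM does not compute \(\phi\) explicitly as this sketch does; the kernel function evaluates inner products in the mapped space directly, which is what keeps the approach practical in high dimensions.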
2021.01.04 -
[3] Logistic Regression Principles
Logistic regression uses the sigmoid function as its activation function and the log-likelihood function as its cost function. Activation function Odds ratio: the odds in favor of a particular event, that is, the probability that it occurs divided by the probability that it does not. $$\frac{P}{(1 - P)}$$ Where P is the probability of a positive sample, meaning the probability that the event to be predicted occurs. The logit (log-odds) function is usually defined by ..
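The relationship between the odds, their logarithm, and the sigmoid function can be checked numerically. This is a minimal sketch using the standard definitions; the function names are chosen for this example and the excerpt's own definition is cut off.

```python
import math


def odds(p):
    """Odds in favor of an event with probability p: P / (1 - P)."""
    return p / (1.0 - p)


def logit(p):
    """Log-odds: the natural logarithm of the odds."""
    return math.log(odds(p))


def sigmoid(z):
    """Logistic sigmoid, the inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.8
print(odds(p))            # about 4, i.e. odds of 4 to 1
print(sigmoid(logit(p)))  # recovers p, up to floating-point rounding
print(sigmoid(0.0))       # a net input of 0 corresponds to probability 0.5
```

The logit maps a probability in (0, 1) onto the whole real line, and the sigmoid maps a real-valued net input back to a probability, which is why it serves as the activation function.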
2020.12.31