Logistic regression uses the sigmoid function as its activation function and the log-likelihood function as its cost function.
Activation function
Odds ratio: the ratio of the probability that a particular event occurs to the probability that it does not.
$$\frac{P}{(1 - P)}$$
Here P is the probability of the positive sample, that is, the probability that the event we want to predict actually occurs.
The logit function is defined as the logarithm of the odds ratio.
$$ logit(P) = log \frac{P}{(1 - P)} $$
The linear relationship between the weighted sum of the features and the log-odds can be written as follows:
$$ logit(P(y=1 \mid x)) = W_0X_0 + W_1X_1 + W_2X_2 + \cdots + W_mX_m = W^T X$$
* \(P(y=1 \mid x)\) is the conditional probability that the sample belongs to class 1 given its features x.
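As a quick numerical check of these definitions, here is a minimal Python sketch (the helper names odds and logit are just illustrative) that computes the odds ratio and the log-odds for a given positive-class probability:

```python
import numpy as np

def odds(p):
    """Odds ratio: probability of the event divided by the probability of no event."""
    return p / (1.0 - p)

def logit(p):
    """Logit function: natural log of the odds ratio."""
    return np.log(odds(p))

# A positive-class probability of 0.8 gives odds of 4.0
# and a log-odds (logit) of about 1.386.
print(odds(0.8))   # 4.0
print(logit(0.8))  # 1.3862943611198906
```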
Since what we actually want the algorithm to predict is the probability that a sample belongs to a particular class, we use the inverse of the logit function. This inverse is called the logistic sigmoid function, or simply the sigmoid function, where z denotes the net input \(w^T x\).
$$ \phi(z) = \frac{1}{1 + e^{-z}} $$
The logistic sigmoid function has an S-shaped curve. Thresholding its output at 0.5 gives the predicted class label:
$$ \hat{y} = 1, \{\phi(z) \ge 0.5\} $$
$$ \hat{y} = 0, \{\phi(z) < 0.5\} $$
\(\phi(z)\) converges to 1 as z grows large, because \(e^{-z}\) approaches 0. Conversely, \(\phi(z)\) converges to 0 as z becomes very negative, because \(e^{-z}\) grows and so does the denominator. If z is 0, \(\phi(z)\) is 0.5. Logistic regression uses this logistic sigmoid function as its activation function and, by this principle, classifies a sample according to whether the net input z is greater than or less than zero. In other words, the decision rule can be rewritten directly in terms of z as follows:
$$ \hat{y} = 1, \{z \ge 0\} $$
$$ \hat{y} = 0, \{z < 0\} $$
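A minimal sketch of this decision rule in Python, assuming NumPy and a hypothetical weight vector w; thresholding z at 0 gives the same labels as thresholding \(\phi(z)\) at 0.5:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps the net input z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w):
    """Predict class labels from the net input z = w^T x.
    Thresholding phi(z) at 0.5 is equivalent to thresholding z at 0."""
    z = np.dot(X, w)
    return np.where(z >= 0.0, 1, 0)

# Example with made-up weights and two samples
w = np.array([0.5, -0.25])     # hypothetical weight vector
X = np.array([[2.0, 1.0],      # z = 0.75  -> phi(z) ~ 0.68 -> class 1
              [-1.0, 3.0]])    # z = -1.25 -> phi(z) ~ 0.22 -> class 0
print(sigmoid(np.dot(X, w)))   # [0.679... 0.222...]
print(predict(X, w))           # [1 0]
```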
Cost function
- Cost function of earlier models such as Adaline: defined as the sum of squared errors (SSE); see the code sketch after this list.
$$ J(w) = \sum_i \frac{1}{2}(\phi(z^{(i)})-y^{(i)})^2 $$
SSE
- Cost function in logistic regression: defined using the log-likelihood function.
$$ l(w) = logL(w) = \sum_{i=1}^{n} \left[y^{(i)} log(\phi(z^{(i)})) + (1-y^{(i)})log(1-\phi(z^{(i)})) \right] $$
Log-likelihood function
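A small sketch comparing the two cost functions on made-up activations; the helper names, the labels, and the \(\phi(z)\) values below are purely illustrative:

```python
import numpy as np

def sse_cost(y, phi_z):
    """Sum of squared errors, as used by Adaline-style models."""
    return 0.5 * np.sum((phi_z - y) ** 2)

def logistic_cost(y, phi_z):
    """Negative log-likelihood, as used by logistic regression."""
    return -np.sum(y * np.log(phi_z) + (1 - y) * np.log(1 - phi_z))

# Hypothetical labels and sigmoid activations for three samples
y = np.array([1, 0, 1])
phi_z = np.array([0.9, 0.2, 0.6])
print(sse_cost(y, phi_z))       # 0.105
print(logistic_cost(y, phi_z))  # ~0.839
```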
Definition of likelihood function
The likelihood function is the product over all samples of the probability of being a positive sample raised to the power \(y^{(i)}\) and the probability of being a negative sample (1 minus the probability of being a positive sample) raised to the power \(1-y^{(i)}\).
$$ L(w) = \prod_{i=1}^{n} (\phi(z^{(i)}))^{y^{(i)}} (1 -\phi(z^{(i)}))^{1-y^{(i)}} $$
Characteristics of the likelihood function
- For a positive sample (y = 1), the factor for the negative class becomes 1, so only the probability of being a positive sample remains.
- For a negative sample (y = 0), the factor for the positive class becomes 1, so only the probability of being a negative sample remains.
- To maximize the likelihood, the predicted probability of the positive class should be as high as possible for positive samples and as low as possible for negative samples.
- Writing the likelihood as a product like this assumes that the samples are independent of each other.
Applying the log function to the likelihood function gives the log-likelihood function.
$$ logL(w) = \sum_{i=1}^{n} \left[y^{(i)} log(\phi(z^{(i)})) + (1-y^{(i)})log(1-\phi(z^{(i)})) \right] $$
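A short sketch with hypothetical labels and predicted probabilities, showing that taking the log turns the product over samples into a sum and gives the same value either way:

```python
import numpy as np

def likelihood(y, phi_z):
    """Likelihood L(w): product of the per-sample Bernoulli probabilities."""
    return np.prod(phi_z ** y * (1 - phi_z) ** (1 - y))

def log_likelihood(y, phi_z):
    """Log-likelihood logL(w): the product becomes a sum, which is
    numerically safer and easier to differentiate."""
    return np.sum(y * np.log(phi_z) + (1 - y) * np.log(1 - phi_z))

# Hypothetical labels and predicted positive-class probabilities
y = np.array([1, 0, 1, 1])
phi_z = np.array([0.9, 0.1, 0.8, 0.7])
print(likelihood(y, phi_z))          # 0.4536
print(np.log(likelihood(y, phi_z)))  # -0.7905...
print(log_likelihood(y, phi_z))      # same value, computed as a sum
```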
Advantages of the log-likelihood function
1. Applying the log function prevents the numerical underflow that occurs when the likelihood becomes very small.
2. It turns the product of factors into a sum of terms.
3. It makes the derivative easier to obtain, so the maximum can be found by differentiation.
Logistic regression uses this to define its cost function, the negative log-likelihood, as follows:
$$ J(w) = \sum_{i=1}^{n} \left[-y^{(i)} log(\phi(z^{(i)})) - (1-y^{(i)})log(1-\phi(z^{(i)})) \right] $$
Cost function in logistic regression
Since this is a binary classification problem, y takes only the values 0 and 1.
The cost function therefore reduces to \(-log(\phi(z))\) when y is 1, and to \(-log(1-\phi(z))\) when y is 0.
If you graph this, you can see the following.
The x-axis is the sigmoid activation \(\phi(z)\) and the y-axis is the logistic cost. Looking at the solid line, correctly predicting a sample that belongs to class 1 drives the cost toward zero, while an incorrect prediction drives the cost toward infinity. Likewise, correctly predicting a sample that belongs to class 0 drives the cost toward zero, while an incorrect prediction drives it toward infinity. In other words, the benefit of defining the cost function from the log-likelihood is that it imposes an increasingly large cost on wrong predictions.
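A short matplotlib sketch (assuming matplotlib is available) that reproduces a plot of these two cost terms:

```python
import numpy as np
import matplotlib.pyplot as plt

# phi(z) values strictly between 0 and 1 to avoid log(0)
phi_z = np.linspace(0.001, 0.999, 500)

# Per-sample cost for each true class label
cost_y1 = -np.log(phi_z)        # cost when y = 1
cost_y0 = -np.log(1 - phi_z)    # cost when y = 0

plt.plot(phi_z, cost_y1, label='J(w) if y = 1')
plt.plot(phi_z, cost_y0, linestyle='--', label='J(w) if y = 0')
plt.xlabel(r'$\phi(z)$')
plt.ylabel('J(w)')
plt.legend(loc='upper center')
plt.show()
```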