Regression analysis is a statistical method for describing the relationship between variables.
Linear Regression
Underlying Assumptions
- There is a linear relationship between the independent and dependent variables.
- The data points are independent of each other.
- No multicollinearity: the independent variables are not correlated with one another.
- The residuals are independent, have equal variance (homoscedasticity), and are normally distributed.
Target
Find a line $Y = WX + b$ that fits the data.
Loss Function
Least Squares Method
The loss function is the sum of squared residuals:

$$L(W, b) = \sum_{i=1}^{n} (y_i - W x_i - b)^2$$

To minimize the loss function, we transform this into a convex optimization problem: setting both partial derivatives to zero,

$$\frac{\partial L}{\partial W} = -2\sum_{i=1}^{n} x_i (y_i - W x_i - b) = 0, \qquad \frac{\partial L}{\partial b} = -2\sum_{i=1}^{n} (y_i - W x_i - b) = 0$$

Combining these two equations, we get:

$$W = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - W \bar{x}$$
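The closed-form solution can be sketched in a few lines of Python (a minimal illustration; `fit_line` is a name chosen here, not from the original):

```python
# Least-squares fit of y = w*x + b using the closed-form solution:
# w = sum((x_i - x_mean)(y_i - y_mean)) / sum((x_i - x_mean)^2),  b = y_mean - w*x_mean

def fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    w = num / den
    b = y_mean - w * x_mean
    return w, b
```

For points lying exactly on $y = 2x + 1$, such as $(1,3), (2,5), (3,7)$, `fit_line` recovers $W = 2$ and $b = 1$.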
Gradient Descent
In addition to least squares, the intercept and slope can be iteratively updated using gradient-based methods.
- Initialize $W$ and $b$ (e.g., to 0).
- Repeat the following update until convergence, with learning rate $\eta$:

$$W \leftarrow W - \eta \frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}$$
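The iterative procedure above can be sketched as follows (an illustrative implementation; the learning rate and step count are arbitrary choices, not from the original):

```python
# Gradient descent for simple linear regression.
# Loss: L(w, b) = sum_i (y_i - w*x_i - b)^2
# Gradients: dL/dw = -2 * sum_i x_i * (y_i - w*x_i - b)
#            dL/db = -2 * sum_i (y_i - w*x_i - b)

def gradient_descent(xs, ys, lr=0.01, steps=5000):
    w, b = 0.0, 0.0                      # step 1: initialize
    for _ in range(steps):               # step 2: repeat the update
        residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
        grad_w = -2 * sum(x * r for x, r in zip(xs, residuals))
        grad_b = -2 * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On the same data as before, the iterates converge to (approximately) the closed-form solution, provided the learning rate is small enough for the loss to decrease at each step.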
Multiple Linear Regression
When there are multiple independent variables, we use matrices to express the model: let $X \in \mathbb{R}^{n \times (d+1)}$ be the design matrix (with a column of ones for the intercept) and $y \in \mathbb{R}^{n}$ the targets, so the prediction is $\hat{y} = XW$.

So, the loss function becomes

$$L(W) = \|y - XW\|^2 = (y - XW)^\top (y - XW)$$

Solving for the partial derivative, we get

$$\frac{\partial L}{\partial W} = -2 X^\top (y - XW) = 0$$

So

$$W = (X^\top X)^{-1} X^\top y$$

We can see that, if $X^\top X$ is invertible (i.e., the columns of $X$ are linearly independent), this closed-form solution exists; otherwise the pseudoinverse or regularization is needed.
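The normal-equation solution can be sketched with NumPy (assuming $X^\top X$ is invertible; `fit_multiple` is an illustrative name):

```python
import numpy as np

# Multiple linear regression via the normal equation W = (X^T X)^{-1} X^T y.

def fit_multiple(X, y):
    # Prepend a column of ones so the intercept is absorbed into W.
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    # Solve (X^T X) W = X^T y rather than inverting explicitly (more stable).
    return np.linalg.solve(X.T @ X, X.T @ y)
```

For data generated by $y = 1 + 2x_1 + 3x_2$, the returned vector is $(b, W_1, W_2) = (1, 2, 3)$.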
Correlation Coefficient for Linear Regression
We define the correlation coefficient $r$ as

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
It describes the degree of linear correlation between the two variables.
Coefficient of Determination
The coefficient of determination is defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

It measures how well the model fits the data:
- It gives the percentage of the fluctuation in $y$ that can be described by the fluctuation in $x$.
- The closer $R^2$ is to 1, the better the independent variable explains the dependent variable in the regression analysis.
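Both quantities can be computed directly from their definitions (a sketch; function names are chosen here for illustration):

```python
import math

# Correlation coefficient r and coefficient of determination R^2.

def pearson_r(xs, ys):
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_mean) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_mean) ** 2 for y in ys))
    return cov / (sx * sy)

def r_squared(ys, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

For a perfect linear relationship both $r$ and $R^2$ equal 1; for simple linear regression, $R^2 = r^2$.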