In regression analysis, when only one independent variable and one dependent variable are involved and the relationship between them can be approximated by a straight line, the analysis is called simple (univariate) linear regression. When two or more independent variables are included and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression. The two commonly used fitting methods are the gradient descent method and the least squares method.

1. Gradient descent method (GD)

1.1 Principle:

$$\theta := \theta - \alpha \frac{\partial J(\theta)}{\partial \theta}$$

where $\alpha$ is the learning rate, also called the step size (learning rate).
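
To make the update rule concrete, here is a minimal NumPy sketch (my own illustration, not from the original post); `grad` stands for any function that returns the gradient of $J(\theta)$:

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, n_iters=1000, tol=1e-8):
    """Iterate theta := theta - alpha * grad(theta) until the update is tiny."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        step = alpha * grad(theta)
        if np.linalg.norm(step) < tol:  # updates have become negligible
            break
        theta = theta - step
    return theta

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
print(gradient_descent(lambda t: 2 * (t - 3), theta0=[0.0]))  # -> approx [3.]
```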

1.2 Hypothesis function:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x$$

1.3 Loss function:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
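
To make the two definitions concrete, here is a small sketch (my own illustration, assuming NumPy and a design matrix `X` whose first column is all ones for the intercept term $\theta_0$):

```python
import numpy as np

def hypothesis(theta, X):
    """h_theta(x) = theta^T x; X has shape (m, n+1) with a leading bias column."""
    return X @ theta

def loss(theta, X, y):
    """J(theta) = (1 / 2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(y)
    r = hypothesis(theta, X) - y
    return (r @ r) / (2 * m)
```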

1.4 Algorithm variants:

        1.4.1 Batch gradient descent method (BGD)

Batch gradient descent is the most commonly used form of gradient descent: each update uses all m training samples. The update formula is:

$$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
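
A one-step sketch of this update in NumPy (my own illustration, following the `hypothesis`/`loss` conventions above):

```python
def bgd_step(theta, X, y, alpha):
    """One batch gradient descent update: the gradient averages over all m samples."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m  # (1/m) * sum_i (h(x_i) - y_i) * x_i
    return theta - alpha * grad
```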

        1.4.2 Stochastic gradient descent method (SGD)

Stochastic gradient descent follows the same principle as batch gradient descent; the difference is that the gradient is computed not from all m samples but from a single sample i. The update formula is:

$$\theta_j := \theta_j - \alpha \left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

             
Stochastic gradient descent and the batch gradient descent of 1.4.1 are two extremes: one uses a single sample for each gradient step, the other uses all the data. Their advantages and disadvantages are correspondingly stark. In terms of training speed, stochastic gradient descent uses only one sample per update and is fast, while batch gradient descent becomes unsatisfyingly slow when the sample size is large. In terms of accuracy, stochastic gradient descent determines the gradient direction from a single sample, so the resulting solution may well not be optimal.
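
A sketch of one SGD pass under the same conventions (my own illustration; shuffling with a seeded NumPy generator is an implementation choice, not something the original post prescribes):

```python
import numpy as np

def sgd_epoch(theta, X, y, alpha, seed=0):
    """One pass over the data, updating theta from one random sample at a time."""
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(y)):
        grad_i = (X[i] @ theta - y[i]) * X[i]  # gradient from sample i only
        theta = theta - alpha * grad_i
    return theta
```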

        1.4.3 Mini-batch gradient descent method (MBGD)

               
Mini-batch gradient descent is a compromise between batch gradient descent and stochastic gradient descent: of the m samples, each iteration uses x of them, with 1 < x < m. Generally x = 10 works, and x can be adjusted to suit the sample data. The update formula is:

$$\theta_j := \theta_j - \alpha \frac{1}{x}\sum_{k=i}^{i+x-1}\left(h_\theta(x^{(k)}) - y^{(k)}\right)x_j^{(k)}$$
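
A sketch of one mini-batch pass (my own illustration, with `batch_size=10` matching the x = 10 suggestion above):

```python
import numpy as np

def mbgd_epoch(theta, X, y, alpha, batch_size=10, seed=0):
    """One pass over the data in shuffled mini-batches of batch_size samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        grad = X[b].T @ (X[b] @ theta - y[b]) / len(b)  # average over the batch
        theta = theta - alpha * grad
    return theta
```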

1.5 Algorithm tuning:

When using the gradient descent method, some tuning is needed. What needs to be tuned?

① Step size (learning rate) selection. In the algorithm description above, the step size was taken as 1, but the right value depends on the data sample: try several values, from large to small, run the algorithm with each, and watch the iterations. If the loss function keeps decreasing, the value is workable; otherwise, decrease the step size. As mentioned earlier, too large a step makes the iterations move too fast and may even skip past the optimal solution, while too small a step makes the iterations so slow that the algorithm takes a very long time to finish. A good step size therefore takes several runs to find. The following values can be used for testing, with a sweep helper sketched below:

0.001, 0.01, 0.1, 1, 10, ... or alternatively 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ... That is, move in roughly 3x or 10x steps: first narrow the step size down to a range coarsely, then fine-tune within it.
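
One way to automate this coarse sweep (my own sketch; the candidate list mirrors the values above, while the 50-iteration budget and zero initialization are arbitrary choices):

```python
import numpy as np

def pick_step_size(X, y, n_iters=50):
    """Try each candidate step size briefly and keep the one with the lowest loss."""
    m = len(y)
    candidates = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
    best_alpha, best_loss = None, np.inf
    for alpha in candidates:
        theta = np.zeros(X.shape[1])
        for _ in range(n_iters):
            theta -= alpha * X.T @ (X @ theta - y) / m  # batch GD update
        j = np.sum((X @ theta - y) ** 2) / (2 * m)      # final loss
        if np.isfinite(j) and j < best_loss:            # diverged runs give inf/nan
            best_alpha, best_loss = alpha, j
    return best_alpha
```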

② Initial value selection for the algorithm parameters. Different initial values may lead to different minima, because gradient descent in general only finds a local minimum; of course, if the loss function is convex, the solution found is guaranteed to be optimal. Because of the risk of local optima, run the algorithm several times with different initial values, compare the minimum loss reached by each run, and keep the initialization that minimizes the loss function, as in the sketch below.
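
A minimal multi-start sketch (my own illustration; the Gaussian initialization and the number of restarts are arbitrary choices):

```python
import numpy as np

def multi_start_gd(X, y, alpha=0.1, n_starts=5, n_iters=1000, seed=0):
    """Run gradient descent from several initial points; keep the lowest-loss solution."""
    rng = np.random.default_rng(seed)
    m = len(y)
    best_theta, best_loss = None, np.inf
    for _ in range(n_starts):
        theta = rng.normal(size=X.shape[1])  # a different initial value each run
        for _ in range(n_iters):
            theta -= alpha * X.T @ (X @ theta - y) / m
        j = np.sum((X @ theta - y) ** 2) / (2 * m)
        if j < best_loss:
            best_theta, best_loss = theta, j
    return best_theta
```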

③ Normalization. Because different features take values in different ranges, the iterations can be slow. To reduce the influence of feature scale, the feature data can be standardized: for each feature x, compute its mean $\bar{x}$ and standard deviation $\mathrm{std}(x)$, then transform it to:

$$x := \frac{x - \bar{x}}{\mathrm{std}(x)}$$

The transformed feature has expectation 0 and variance 1, which can greatly speed up the iterations.
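
A sketch of this standardization in NumPy (my own illustration; guarding constant columns is an added safety check, not part of the original formula):

```python
import numpy as np

def standardize(X):
    """Rescale each feature to mean 0 and standard deviation 1."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # leave constant columns (e.g. bias) alone
    return (X - mu) / sigma
```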

2. Least squares method (OLS)

Matrix expression:

$$J(\theta) = \frac{1}{2}(X\theta - Y)^T(X\theta - Y)$$

Let:

$$\frac{\partial J(\theta)}{\partial \theta} = X^T(X\theta - Y) = 0,$$

then:

$$\theta = (X^TX)^{-1}X^TY$$

Of course, $(X^TX)^{-1}$ may not exist, though this is rare. If $X^TX$ is not invertible, the following two situations are generally considered:

        (1) Remove redundant features: some features are linearly dependent on others.

        (2) Too many features: delete some of them (for example when m < n, i.e., fewer samples than features), or use regularization for small-sample data.
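
A sketch of the closed-form solution in NumPy (my own illustration; using `np.linalg.solve` rather than an explicit inverse is a numerical-stability choice, and the pseudo-inverse fallback covers the non-invertible case discussed above):

```python
import numpy as np

def ols(X, y):
    """Closed-form least squares: theta = (X^T X)^(-1) X^T y."""
    XtX, Xty = X.T @ X, X.T @ y
    try:
        return np.linalg.solve(XtX, Xty)
    except np.linalg.LinAlgError:
        return np.linalg.pinv(XtX) @ Xty  # X^T X singular: fall back to pseudo-inverse

# Tiny usage example: y = 1 + 2x, fit with a bias column
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(ols(X, y))  # -> approx [1., 2.]
```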

3. Choosing between gradient descent and least squares

| | Gradient descent | Least squares |
| --- | --- | --- |
| Advantages | Still works well when the number of features n is large | No learning rate to choose; no iterations required; no normalization required |
| Disadvantages | A learning rate must be chosen; many iterations are needed; normalization is required when feature value ranges differ widely | Very slow when the number of features n is large, because computing the inverse matrix is expensive |

In general, when n < 10000, use the least squares method; when n >= 10000, consider the gradient descent method.

For some more complex algorithms, only the gradient descent method can be chosen.
