On Linear Classifiers

   Suppose we are given a data set whose samples belong to two different categories. Advertising data is a typical example of such a binary classification problem: data that was clicked is generally called the positive sample, and data that was not clicked the negative sample. We now want to find a linear classifier that separates these data into the two categories (in reality, of course, advertising data is far too complex to be separated by a linear classifier). Let X denote the sample data and Y the sample category (for example 1 and -1, or 1 and 0). The goal of our linear classifier is to find a hyperplane that separates the two types of samples. This hyperplane can be described by the following formula:

ω^T x + b = 0

   For logistic regression, we have:

h_θ(x) = g(θ^T x) = 1 / (1 + e^{-θ^T x})

   where x is a sample, x = [x_1, x_2, ⋯, x_n] is an n-dimensional vector, and the function g is what we usually call the logistic function. The more general form of g is:

g(z) = 1 / (1 + e^{-z})
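The formula translates directly into code. A minimal sketch in Python:

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The curve is centered at z = 0, where it returns exactly 0.5;
# large positive inputs approach 1, large negative inputs approach 0.
print(logistic(0))    # 0.5
print(logistic(10))   # very close to 1
print(logistic(-10))  # very close to 0
```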

   Anyone with even a little exposure to machine learning is probably very familiar with this formula. It appears not only in logistic regression but also in SVMs and ANNs; it is used very widely. Most references simply give the formula directly. But have you ever wondered: since this formula is so widely used, why do we use it in the first place?

   Many people are taken aback by the question: that's just what everybody uses, that's what the books say. True, but when something keeps appearing right in front of your eyes, shouldn't you think about why? For me at least, if something has shown up for the third time and I still don't know why, I will try to figure it out.

Why use the logistic function?

   Students who have studied pattern recognition will have learned about all kinds of classifiers. The simplest is naturally the linear classifier, and among linear classifiers the simplest is probably the perceptron, which appeared in the 1950s and 1960s:

y = 0, if ∑_{i=1}^{n} ω_i x_i ≤ b
y = 1, if ∑_{i=1}^{n} ω_i x_i > b
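This decision rule can be sketched in a few lines of Python (the weights and threshold below are made-up illustration values, not from the original):

```python
def perceptron_predict(x, w, b):
    """Perceptron rule: inner product of features and weights, compared to threshold b."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > b else 0

# Hypothetical 2-dimensional samples, weights, and threshold:
w = [0.5, 0.5]
b = 1.0
print(perceptron_predict([1.0, 2.0], w, b))  # inner product 1.5 > 1.0, so class 1
print(perceptron_predict([0.5, 0.5], w, b))  # inner product 0.5 ≤ 1.0, so class 0
```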

   The idea of the perceptron is to take the dot product (inner product) of all the features with the weights, then compare that value against a threshold to split the samples into two classes. Anyone who knows a little about neural networks will be familiar with the following picture:

   That's right: it is a diagram of a perceptron.
   My postgraduate entrance exam covered control theory. If you have studied control theory or signals and systems, you will recognize that the perceptron is equivalent to the step function from those two courses:

   The two are essentially the same: set a threshold, then classify by comparing the sample against that threshold.

   This model is simple and intuitive, and it is easy to implement (it may well be the simplest classifier there is). The problem is that it is not smooth enough. First, suppose the threshold is t_0 = 10 and a sample comes in whose computed value is 10.01. Should that sample be classified as 1 or as 0? The hard cutoff does not feel very reliable. Second, the function has a step at t_0: it jumps abruptly from 0 to 1. This discontinuity makes it inconvenient to handle mathematically.
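The contrast can be made concrete with a small sketch: a hard step function versus a logistic curve shifted to be centered at the same threshold t_0 (the centering-at-t_0 form is my illustration, not from the original):

```python
import math

def step(t, t0):
    # Hard threshold: jumps from 0 straight to 1 exactly at t0.
    return 1 if t > t0 else 0

def logistic_at(t, t0):
    # Smooth alternative: continuous everywhere, equal to 0.5 at t0.
    return 1.0 / (1.0 + math.exp(-(t - t0)))

t0 = 10
# Just past the threshold, the step function is already fully confident...
print(step(10.01, t0))          # 1
# ...while the logistic output is barely above 0.5, reflecting genuine uncertainty.
print(logistic_at(10.01, t0))   # a little over 0.5
```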

   After all this build-up, it is finally the logistic function's turn. Compared with the perceptron, or the step function, what are its strengths?

   Looking at the graph of the logistic function, we can easily summarize its advantages:
  1. Its input range is (−∞, +∞), while its output happens to lie in (0, 1), which is exactly what a probability distribution requires. Describing the classifier with a probability is naturally much more convenient than with a hard threshold;
  2. It is a monotonically increasing function with good continuity; there are no discontinuities.

   Having read this far, you should now understand why the logistic function is used.
   Follow-up articles in the logistic regression series are coming soon.
