On Linear Classifiers

Suppose we are given a data set whose samples belong to two different categories. Advertising data is a typical binary classification problem: samples that were clicked are called positive samples, and samples that were not clicked are called negative samples. We now want to find a linear classifier that separates the data into these two classes (of course, real advertising data is particularly complex and cannot actually be separated by a linear classifier). Let X denote the sample data and Y the sample label (for example 1 and -1, or 1 and 0). The goal of a linear classifier is to find a hyperplane that separates the two classes of samples. This hyperplane can be described by the following formula:

ω^T x + b = 0
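As a minimal sketch, classifying a point by which side of this hyperplane it falls on comes down to the sign of ω^T x + b. The weights and points below are made up purely for illustration:

```python
def linear_classify(w, x, b):
    """Return 1 if the point lies on the positive side of the
    hyperplane w.x + b = 0, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

w = [2.0, -1.0]   # hypothetical weight vector
b = -0.5          # hypothetical bias
print(linear_classify(w, [1.0, 0.5], b))   # 2*1 - 0.5 - 0.5 = 1.0 > 0  -> 1
print(linear_classify(w, [0.0, 1.0], b))   # 0 - 1 - 0.5 = -1.5 <= 0    -> -1
```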

For logistic regression, we have:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(−θ^T x))

where x is a sample, x = [x_1, x_2, ⋯, x_n] is an n-dimensional vector, and the function g is what we usually call the logistic function. The more general form of g is:

g(z) = 1 / (1 + e^(−z))
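The two formulas above translate directly into code. A small sketch using only the standard library (the parameter values in the example are arbitrary):

```python
import math

def logistic(z):
    """The logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x)."""
    return logistic(sum(t * xi for t, xi in zip(theta, x)))

print(logistic(0))    # 0.5, the midpoint of the curve
print(logistic(6))    # close to 1
print(logistic(-6))   # close to 0
```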

Students with even a little knowledge of machine learning will be very familiar with this formula. It appears not only in logistic regression but also in SVMs and ANNs; it is used very widely. Most references simply state the formula without explanation. But have you ever stopped to wonder: since this formula is used so widely, why do we use it in the first place?

Many people are probably caught off guard by that question: it's just what everybody uses, it's what the books say. True, but when something keeps appearing in front of your eyes, shouldn't you ask why? For me at least, if something shows up for the third time and I still don't know why, I try to figure it out.

Why Use the Logistic Function?

Students who have studied pattern recognition will have learned about all kinds of classifiers. The simplest is naturally the linear classifier, and among linear classifiers the simplest is probably the perceptron, which appeared in the 1950s and 1960s:

y = 0, if ∑_{i=1}^{n} ω_i x_i ≤ b
y = 1, if ∑_{i=1}^{n} ω_i x_i > b
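The decision rule above can be sketched in a few lines. The weights and threshold below are toy values, assumed purely for illustration:

```python
def perceptron_predict(w, x, b):
    """Perceptron rule: output 1 if the weighted sum of features
    exceeds the threshold b, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > b else 0

w, b = [1.0, 2.0], 2.5           # hypothetical weights and threshold
print(perceptron_predict(w, [1.0, 1.0], b))  # 1*1 + 2*1 = 3.0 > 2.5  -> 1
print(perceptron_predict(w, [1.0, 0.5], b))  # 1 + 1 = 2.0 <= 2.5     -> 0
```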

The idea of the perceptron is to take the dot product (inner product) of the features and the weights, compare the result with a threshold, and classify the sample accordingly. Anyone with even a slight knowledge of neural networks will be familiar with this picture:

That's right: this is the diagram of a perceptron.
My postgraduate entrance exam subject was control theory. If you have studied control theory or signals and systems, you will recognize that the perceptron is equivalent to the step function from those two courses:

The two are essentially the same: set a threshold, then classify a sample by comparing its value with that threshold.

This model is simple, intuitive, and easy to implement (it is about the simplest classifier there is). The problem is that it is not smooth enough. First, suppose the threshold is t_0 = 10 and a new sample arrives whose computed value is 10.01. Should that sample be classified as 1 or 0? The decision does not feel very reliable. Second, the function has a step at t_0: it jumps abruptly from 0 to 1, so it is discontinuous there, which makes it inconvenient to handle mathematically.

After all this buildup, it is finally the logistic function's turn. Compared with the perceptron or the step function, what are its strengths?

From the graph of the logistic function, we can easily summarize its advantages:
1. Its input range is (−∞, +∞), while its output falls exactly in (0, 1), which is precisely what a probability requires. Describing the classifier with a probability is naturally much more convenient than a hard threshold;
2. It is a monotonically increasing function with good continuity and no discontinuities.
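Both advantages are easy to verify numerically. A quick sketch (the sample inputs are arbitrary):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# 1. The output stays strictly inside (0, 1), even for large inputs.
for z in (-30, -1, 0, 1, 30):
    assert 0.0 < logistic(z) < 1.0

# 2. Monotonically increasing: a larger input gives a larger output.
zs = [-3, -1, 0, 1, 3]
vals = [logistic(z) for z in zs]
assert vals == sorted(vals)

print("both properties hold")
```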

If you have read this far, you should now understand why the logistic function is used.
A series of follow-up articles on logistic regression is coming soon.
