Suppose a point in $(n-1)$-dimensional space is given by the variable $\vec{x} = (x_1, x_2, \cdots, x_{n-1})^T$, and suppose there are $m$ such sample points, denoted $\vec{x}^{(1)}, \vec{x}^{(2)}, \cdots, \vec{x}^{(m)}$. We want to find a hyperplane that fits these sample points as well as possible. Formally, this is equivalent to finding coefficients $\vec{w}$ and $b$ such that $\vec{w}^T\vec{x} + b \approx y$. To simplify this expression, we absorb $b$ into $\vec{w}$: write $(\vec{w}^T, b)$ as the new $\vec{w}^T$ and set $\vec{x}^{(i)}_n = 1$. The problem is then equivalent to finding $\vec{w}$ such that $\vec{w}^T\vec{x} \approx y$.
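As a small illustration (a NumPy sketch with made-up sample values), the trick above just appends a constant coordinate $1$ to every sample, so that $b$ becomes the last component of $\vec{w}$:

```python
import numpy as np

# Hypothetical samples in (n-1)-dimensional space (here n-1 = 2)
samples = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

# Append x_n = 1 to each sample, so w^T x + b becomes (w^T, b) x'
augmented = np.hstack([samples, np.ones((samples.shape[0], 1))])
print(augmented.shape)  # (3, 3)
```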

We stack the sample points into a matrix $X$:
$$X = \begin{pmatrix} {\vec{x}^{(1)}}^{T} \\ {\vec{x}^{(2)}}^{T} \\ \vdots \\ {\vec{x}^{(m)}}^{T} \end{pmatrix} = \begin{pmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}$$

Then the statement above is equivalent to finding $\vec{w}$ such that $X\vec{w} \approx \vec{y}$.

Consider a special case first: suppose all the sample points lie exactly on one hyperplane, and the span of the sample points is the full $n$-dimensional space, i.e. $m \ge n$ and $\mathrm{rank}(X) = n$.
In this case the equation $X\vec{w} = \vec{y}$ has exactly one solution (which gives the hyperplane). The derivation is as follows:
$$X\vec{w} = \vec{y} \Leftrightarrow X^TX\vec{w} = X^T\vec{y} \Leftrightarrow \vec{w} = (X^TX)^{-1}X^T\vec{y}$$

(Note: since $X$ has full column rank, $\mathrm{rank}(X^TX) = \mathrm{rank}(X) = n$, i.e. $X^TX$ is an invertible square matrix.)
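A quick numerical check of this special case (a sketch with randomly generated data; the variable names are illustrative): when the targets are generated exactly from a hyperplane and $X$ has full column rank, the formula $(X^TX)^{-1}X^T\vec{y}$ recovers the coefficients exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.standard_normal((m, n))       # full column rank (with probability 1)
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true                        # all points lie exactly on the hyperplane

w = np.linalg.inv(X.T @ X) @ X.T @ y  # w = (X^T X)^{-1} X^T y
print(np.allclose(w, w_true))         # True
```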

In general, however, the sample points will not all lie on the same hyperplane, so the equation $X\vec{w} = \vec{y}$ has no solution. Such a system, in which the number of equations exceeds the number of unknowns, is called an overdetermined system. In this situation we instead look for a hyperplane such that $X\vec{w} \approx \vec{y}$, with the error $\Vert X\vec{w} - \vec{y} \Vert$ as small as possible (here $\Vert\cdot\Vert$ denotes the $L_2$ norm; measuring error by Euclidean distance is the natural choice). Formally, this is
$$\vec{\hat{w}} = \arg\min_{\vec{w}} \Vert X\vec{w} - \vec{y} \Vert$$
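To see the overdetermined case concretely (a sketch; the noise level and seed are arbitrary), we can generate noisy targets so that $X\vec{w} = \vec{y}$ has no exact solution, and minimize $\Vert X\vec{w} - \vec{y} \Vert$ with `np.linalg.lstsq`, which solves exactly this objective:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 3                                    # more equations than unknowns
X = rng.standard_normal((m, n))
w_true = np.array([1.0, 2.0, 3.0])
y = X @ w_true + 0.01 * rng.standard_normal(m)  # noise: no exact solution

# lstsq minimizes ||Xw - y||_2 over w
w_hat, residual, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to w_true, but not exact
```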

To simplify the calculation, we may equivalently minimize the squared norm (squaring is monotone on nonnegative values, so the minimizer is unchanged):
$$\vec{\hat{w}} = \arg\min_{\vec{w}} \Vert X\vec{w} - \vec{y} \Vert = \arg\min_{\vec{w}} \Vert X\vec{w} - \vec{y} \Vert^2$$

Let
$$L(w_1, w_2, \cdots, w_n) = \Vert X\vec{w} - \vec{y} \Vert^2$$
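In code, $L$ is just the sum of squared residuals (a sketch with made-up numbers):

```python
import numpy as np

def L(w, X, y):
    """Squared-error loss L(w) = ||Xw - y||^2."""
    r = X @ w - y
    return r @ r  # same as np.linalg.norm(r) ** 2

X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
y = np.array([1.0, 2.0, 2.0])
print(L(np.zeros(2), X, y))  # 9.0
```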

As before, assume that $X$ has full column rank.
The problem above is now an extremum problem, so it is natural to find the extreme value via derivatives.
Taking the partial derivative with respect to each $w_i$ and setting it to zero:
$$\frac{\partial L}{\partial w_i} = 2(x_i^{(1)}, x_i^{(2)}, \cdots, x_i^{(m)})(X\vec{w} - \vec{y}) = 0$$

Therefore
$$\left(\frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \cdots, \frac{\partial L}{\partial w_n}\right)^T = \vec{0} \Leftrightarrow 2X^T(X\vec{w} - \vec{y}) = \vec{0} \Leftrightarrow X^TX\vec{w} - X^T\vec{y} = \vec{0}$$

which gives $\vec{w} = (X^TX)^{-1}X^T\vec{y}$.
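We can verify numerically that the $\vec{w}$ obtained this way is indeed a stationary point of $L$ (a sketch; `np.linalg.solve` is used rather than forming the inverse explicitly, which applies the same formula in a numerically preferable way):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 40, 4
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Solve X^T X w = X^T y, equivalent to w = (X^T X)^{-1} X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)

grad = 2 * X.T @ (X @ w - y)  # gradient of ||Xw - y||^2
print(np.allclose(grad, 0))   # True: the gradient vanishes at w
```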

The above is the basic idea of the linear least squares method.
However, two questions remain:
(1) Why is the stationary point found this way a minimum?
(2) Why is this minimum the one we want, i.e. the global minimum?
