Machine learning is broadly divided into supervised learning, unsupervised learning, and reinforcement learning. The autoencoder belongs to unsupervised learning, but it behaves more like self-supervision: no external labels supervise it; it supervises itself. It is trained on an input sample $x$, and its output is then compared against that same $x$.
Thanks to this property, an autoencoder can generate data similar to the training data, for example reconstructing images.
<> The structure of AE
The data used in unsupervised learning carries no extra annotation; there is only the data $x$ itself.
Using the data $x$ itself as the supervisory signal to guide training, we hope the neural network learns the mapping $f_{\theta}: x \to x$. We split the network $f_{\theta}$ into two parts: the first subnetwork learns the mapping $g_{\theta_1}: x \to z$, and the second subnetwork learns the mapping $h_{\theta_2}: z \to x$.
We regard $g_{\theta_1}$ as a data encoding (Encode) process that maps the high-dimensional input $x$ to a low-dimensional latent variable $z$; it is called the encoder network (encoder). We regard $h_{\theta_2}$ as a data decoding (Decode) process that decodes the latent variable $z$ back to the dimensionality of $x$; it is called the decoder network (decoder).
The encoder and the decoder together complete the encoding and decoding of the input data $x$, so the whole network $f_{\theta}$ is called an autoencoder (Auto-Encoder).
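The source includes no code, but the encoder/decoder split described above maps directly onto a small network. Below is a minimal sketch in PyTorch (the framework is my assumption; the layer sizes such as `in_dim=784` and `hidden_dim=20` are hypothetical):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Minimal autoencoder: encoder g (x -> z) and decoder h (z -> x_bar)."""
    def __init__(self, in_dim=784, hidden_dim=20):
        super().__init__()
        # g_{theta1}: high-dimensional x -> low-dimensional latent z
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, hidden_dim),
        )
        # h_{theta2}: latent z -> reconstruction x_bar with the dimension of x
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid(),  # assumes inputs scaled to [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)       # encode
        x_bar = self.decoder(z)   # decode
        return x_bar
```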
<> Operation flow
The ideal outcome is that the decoder output recovers the original input perfectly or approximately, i.e. $\overline{x} \approx x$. The optimization objective is therefore written as:
$$\text{Minimize } L = \mathrm{dist}(x, \overline{x}), \qquad \overline{x} = h_{\theta_2}(g_{\theta_1}(x))$$
where $\mathrm{dist}(x, \overline{x})$ denotes a distance measure between $x$ and $\overline{x}$.
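As a sketch of this objective, the following training step (reusing the `AutoEncoder` sketch above) uses mean squared error as one common choice of $\mathrm{dist}(x, \overline{x})$; the source does not prescribe a particular distance:

```python
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
dist = nn.MSELoss()  # one common choice of dist(x, x_bar)

def train_step(x):
    # x: a batch of flattened inputs, shape (batch, in_dim)
    x_bar = model(x)        # x_bar = h(g(x))
    loss = dist(x_bar, x)   # minimize dist(x, x_bar)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```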
<> Variant networks of AE
To push the autoencoder toward learning the true distribution of the data, a family of autoencoder variants has been developed.
<> Denoising autoencoder (Denoising Auto-Encoder)
Random noise is added to the input data. For example, the input $x$ is perturbed with noise $\varepsilon$ sampled from a Gaussian distribution:
$$\widetilde{x} = x + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \mathrm{var})$$
With the noise added, the network must learn the true latent variable $z$ of the data from $\widetilde{x}$ and recover the original input $x$. The optimization objective of the model is:
$$\theta^* = \underset{\theta}{\arg\min}\ \mathrm{dist}\big(h_{\theta_2}(g_{\theta_1}(\widetilde{x})),\, x\big)$$
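A minimal sketch of this idea, reusing the model, loss, and optimizer from the earlier snippets: Gaussian noise with a hypothetical standard deviation `noise_std` perturbs the input, while the loss still compares against the clean $x$:

```python
def denoising_step(x, noise_std=0.1):
    # Perturb the input: x_tilde = x + eps, eps ~ N(0, var)
    x_tilde = x + noise_std * torch.randn_like(x)
    x_bar = model(x_tilde)   # reconstruct from the noisy input
    loss = dist(x_bar, x)    # compare against the *clean* input x
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```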
<> Sparse autoencoder (Dropout Auto-Encoder)
The expressive capacity of the network is reduced by randomly dropping connections, which helps prevent overfitting.
Network units are activated selectively according to the input data. This limits the network's capacity to memorize the input data without limiting its ability to extract features from it. It lets us treat the representation of the latent state and its regularization separately: we can choose the latent representation (the coding dimension) according to the meaning of the given data context, while applying regularization through a sparsity constraint.
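One common way to realize the sparsity constraint, sketched under the assumption that an L1 penalty on the latent code is acceptable (a KL penalty toward a target activation rate is another standard choice), again reusing the earlier model and optimizer:

```python
def sparse_step(x, sparsity_weight=1e-3):
    z = model.encoder(x)
    x_bar = model.decoder(z)
    # Reconstruction term plus a sparsity constraint on the latent code z;
    # the L1 penalty pushes most activations toward zero.
    loss = dist(x_bar, x) + sparsity_weight * z.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```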
<> Contractive autoencoder (Contractive Auto-Encoder)
One would expect very similar inputs to be given very similar codes. We can train the model toward this goal by making the derivatives of the hidden-layer activations with respect to the input small. In other words, for small changes in the input, the code should stay nearly the same. This is related to the denoising autoencoder, since small input perturbations are essentially treated as noise and we want the model to be robust to them.
The denoising autoencoder makes the reconstruction function (the decoder) resistant to small input perturbations, while the contractive autoencoder makes the feature-extraction function (the encoder) resistant to infinitesimal input perturbations.
The model is explicitly encouraged to learn a coding in which similar inputs have similar codes. Essentially, it is forced to learn how to contract a neighborhood of inputs into a smaller neighborhood of outputs. Note that the slope (derivative) of the reconstruction is essentially zero over the local neighborhood of the input data.
This is achieved by adding a loss term that penalizes large derivatives with respect to the input training samples; in essence, it penalizes cases where a small change in the input leads to a large change in the coding space.
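A minimal sketch of such a penalty term. For a single-layer sigmoid encoder, the squared Frobenius norm of $\partial z / \partial x$ has a closed form, so the snippet below (its model, names, and the weight `lam` are illustrative assumptions) uses that rather than the two-layer encoder above:

```python
class ContractiveAE(nn.Module):
    """Single-layer contractive AE so the Jacobian penalty has a closed form."""
    def __init__(self, in_dim=784, hidden_dim=20):
        super().__init__()
        self.enc = nn.Linear(in_dim, hidden_dim)
        self.dec = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        z = torch.sigmoid(self.enc(x))       # encoder g
        x_bar = torch.sigmoid(self.dec(z))   # decoder h
        return x_bar, z

cae = ContractiveAE()
cae_opt = torch.optim.Adam(cae.parameters(), lr=1e-3)

def contractive_step(x, lam=1e-4):
    x_bar, z = cae(x)
    recon = dist(x_bar, x)
    # Squared Frobenius norm of dz/dx for a sigmoid encoder:
    # ||J||_F^2 = sum_j (z_j (1 - z_j))^2 * sum_i W_{ji}^2
    w_sq = (cae.enc.weight ** 2).sum(dim=1)               # shape (hidden_dim,)
    jac_norm = ((z * (1 - z)) ** 2 * w_sq).sum(dim=1).mean()
    loss = recon + lam * jac_norm
    cae_opt.zero_grad()
    loss.backward()
    cae_opt.step()
    return loss.item()
```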
<> Variational autoencoder (Variational Auto-Encoder)
The basic autoencoder essentially learns the mapping between the input $x$ and the latent variable $z$; it is a discriminative model. Can it be turned into a generative model?
Given the distribution $P(z)$ of the latent variable, if we can learn the conditional probability distribution $P(x|z)$, then we can sample from the joint distribution $P(x, z) = P(x|z)P(z)$ to generate different samples.
From the neural-network perspective, the VAE, like the autoencoder, consists of two subnetworks, an encoder and a decoder. The encoder takes the input $x$ and outputs the latent variable $z$; the decoder decodes the latent variable $z$ into the reconstruction $\overline{x}$. The difference is that the VAE explicitly constrains the distribution of the latent variable $z$, expecting it to follow a preset prior distribution $P(z)$. Therefore, in addition to the original reconstruction error term, the loss function also includes a constraint on the distribution of $z$.
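A hedged sketch of this loss design in the same PyTorch setting: the encoder outputs the mean and log-variance of $q(z|x)$, the reparameterization trick samples $z$, and the loss adds a KL term pulling $q(z|x)$ toward the standard normal prior $P(z) = \mathcal{N}(0, I)$ (all layer sizes are illustrative):

```python
class VAE(nn.Module):
    """VAE sketch: the encoder outputs mean and log-variance of q(z|x)."""
    def __init__(self, in_dim=784, hidden_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, hidden_dim)
        self.logvar = nn.Linear(256, hidden_dim)
        self.dec = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, x_bar, mu, logvar):
    # Reconstruction error term
    recon = nn.functional.mse_loss(x_bar, x, reduction="sum")
    # KL(q(z|x) || N(0, I)): the constraint pushing z toward the prior P(z)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```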