Machine learning is broadly divided into supervised learning, unsupervised learning, and reinforcement learning. The autoencoder is a kind of unsupervised learning, but a self-disciplined one: nobody else supervises it, so it supervises itself. It trains on the input samples $x$ and then compares the result with $x$.

Thanks to this property, the autoencoder can generate data similar to the training data, for example by reconstructing images.

<> The structure of AE

The data used in unsupervised learning has no additional annotation information, only the data $x$ itself.

The data $x$ itself is used as the supervisory signal to guide network training, in the hope that the neural network learns the mapping $f_{\theta}: x \to x$. We split the network $f_{\theta}$ into two parts: the front subnetwork tries to learn the mapping $g_{\theta_1}: x \to z$, and the back subnetwork learns the mapping $h_{\theta_2}: z \to x$.

We think of $g_{\theta_1}$ as a data encoding (Encode) process that encodes the high-dimensional input $x$ into a low-dimensional hidden variable $z$; it is called the encoder network (encoder). We think of $h_{\theta_2}$ as a data decoding (Decode) process that decodes the encoded $z$ back into the high-dimensional $x$; it is called the decoder network (decoder).

Together, the encoder and decoder carry out the encoding and decoding of the input data $x$, so the whole network $f_{\theta}$ is called an autoencoder (Auto-Encoder).

<> Operation flow

The ideal state is that the output of the decoder recovers the original input perfectly or approximately, namely $\overline{x} \approx x$. The optimization objective is therefore written as:

$$\text{Minimize}\ L = dist(x, \overline{x})$$
$$\overline{x} = h_{\theta_2}(g_{\theta_1}(x))$$

where $dist(x, \overline{x})$ denotes a distance measure between $x$ and $\overline{x}$.
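The objective above can be sketched in a few lines of NumPy. This is only an illustrative forward pass under assumptions the text does not state: a single `tanh` layer for the encoder, a linear decoder, and the mean squared Euclidean distance as `dist`. The layer sizes and the names `W1`, `b1`, `W2`, `b2` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W1, b1):
    """g_{theta_1}: map inputs x of shape (batch, d) to hidden variables z of shape (batch, k)."""
    return np.tanh(x @ W1 + b1)

def decoder(z, W2, b2):
    """h_{theta_2}: map hidden variables z back to the input space."""
    return z @ W2 + b2

def dist(x, x_bar):
    """Mean squared Euclidean distance (one possible choice of dist, assumed here)."""
    return np.mean(np.sum((x - x_bar) ** 2, axis=1))

d, k = 8, 3  # high-dimensional input, low-dimensional hidden variable (illustrative sizes)
W1 = rng.normal(scale=0.1, size=(d, k))
b1 = np.zeros(k)
W2 = rng.normal(scale=0.1, size=(k, d))
b2 = np.zeros(d)

x = rng.normal(size=(16, d))   # a batch of unlabeled samples
z = encoder(x, W1, b1)         # z = g(x)
x_bar = decoder(z, W2, b2)     # x_bar = h(g(x))
loss = dist(x, x_bar)          # L = dist(x, x_bar), the quantity to minimize
```

Training would then adjust `W1`, `b1`, `W2`, `b2` to reduce `loss`; note that the only "label" used is `x` itself.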

<> Variant networks of AE

To push the autoencoder toward learning the true distribution of the data, a series of autoencoder variants have been proposed.

<> Denoising autoencoder (Denoising Auto-Encoder)

Random noise is added to the input data as a perturbation. For example, noise $\varepsilon$ sampled from a Gaussian distribution is added to the input $x$:

$$\widetilde{x} = x + \varepsilon, \quad \varepsilon \sim N(0, var)$$

With the noise added, the network has to learn the true hidden variable $z$ of the data from $\widetilde{x}$ and restore the original input $x$. The optimization objective of the model is:

$$\theta^{*} = \underset{\theta}{\arg\min}\ dist(h_{\theta_2}(g_{\theta_1}(\widetilde{x})), x)$$
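A small sketch of the corruption step and the denoising loss, assuming NumPy and a squared-error `dist`. The helper names `add_gaussian_noise` and `denoising_loss` are hypothetical, and the `identity` function is only a stand-in for a real encoder and decoder pair.

```python
import numpy as np

def add_gaussian_noise(x, var, rng):
    """Return x_tilde = x + eps, with eps ~ N(0, var) drawn independently per entry."""
    eps = rng.normal(loc=0.0, scale=np.sqrt(var), size=x.shape)
    return x + eps

def denoising_loss(x, x_tilde, autoencoder):
    """dist(h(g(x_tilde)), x): reconstruct from the noisy input, compare with the clean x."""
    x_bar = autoencoder(x_tilde)
    return np.mean(np.sum((x_bar - x) ** 2, axis=1))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))                       # clean inputs
x_tilde = add_gaussian_noise(x, var=0.01, rng=rng)

identity = lambda v: v   # placeholder for h(g(.)), an assumption for illustration only
loss = denoising_loss(x, x_tilde, identity)
```

The key point is that the network sees `x_tilde` but is scored against `x`, so simply copying the input (as `identity` does) leaves a nonzero loss.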

<> Sparse autoencoder (Dropout Auto-Encoder)

Randomly disconnecting parts of the network reduces its expressive capacity and helps prevent overfitting.

Regions of the network are activated selectively depending on the input data. This limits the network's capacity to memorize the input data without limiting its ability to extract features from the data. It lets us treat the representation of the latent state and the regularization of the network as separate concerns: we can choose the latent state representation (the coding dimension) according to what makes sense for the given data, while applying regularization through a sparsity constraint.
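A minimal sketch of the two ideas in this section, assuming NumPy. The names `dropout`, `sparse_loss`, and `lam` are hypothetical. The L1 penalty on the hidden variable is only one common way to express a sparsity constraint, and the rescaling by `1 / (1 - p)` is the usual "inverted dropout" convention rather than something the text specifies.

```python
import numpy as np

def dropout(z, p, rng):
    """Randomly zero each activation with probability p, rescaling the survivors."""
    mask = rng.random(z.shape) >= p
    return z * mask / (1.0 - p)

def sparse_loss(x, x_bar, z, lam):
    """Reconstruction error plus lam times an L1 sparsity penalty on the hidden variable z."""
    recon = np.mean(np.sum((x - x_bar) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + lam * sparsity

rng = np.random.default_rng(0)
x = np.array([[1.0, 2.0], [3.0, 4.0]])
x_bar = x.copy()                                   # pretend the reconstruction is perfect

z_dense = np.array([[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]])
z_sparse = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.5]])

loss_dense = sparse_loss(x, x_bar, z_dense, lam=0.1)
loss_sparse = sparse_loss(x, x_bar, z_sparse, lam=0.1)
dropped = dropout(z_dense, p=0.5, rng=rng)
```

With identical reconstructions, the code that uses fewer active units gets the lower loss, which is how the penalty nudges the network toward sparse codes without shrinking the coding dimension.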

<> Contractive autoencoder (Contractive Auto-Encoder)

We would expect very similar inputs to produce very similar learned codes. We can train the model explicitly for this by requiring the derivative of the hidden-layer activations with respect to the input to be small. In other words, for small changes in the input, the encoded state should stay very similar. This resembles the denoising autoencoder, because a small perturbation of the input is essentially noise, and we want the model to be robust to noise.

The denoising autoencoder makes the reconstruction function (decoder) resist small perturbations of the input, while the contractive autoencoder makes the feature extraction function (encoder) resist infinitesimal perturbations of the input.

The model is explicitly encouraged to learn a code in which similar inputs have similar codes. In effect, it is forced to learn how to contract a neighborhood of inputs into a smaller neighborhood of outputs. Note that the slope (derivative) of the reconstructed data is close to zero over a local neighborhood of the input data.

This can be achieved by constructing a loss term that penalizes large derivatives of the hidden-layer activations with respect to the input training samples. In essence, it penalizes cases where a small change in the input leads to a large change in the coding space.
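One common way to write such a loss term is the squared Frobenius norm of the Jacobian of the code with respect to the input. The sketch below assumes a single sigmoid encoder layer $z = \sigma(Wx + b)$, for which $\partial z_j / \partial x_i = z_j (1 - z_j) W_{ji}$. The encoder form and all function names are assumptions for illustration, and `numerical_penalty` exists only to check the closed form by finite differences.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b):
    """z = sigmoid(W x + b) for a single input vector x of shape (d,)."""
    return sigmoid(W @ x + b)

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the Jacobian dz/dx for the sigmoid encoder."""
    z = encode(x, W, b)
    dz = z * (1.0 - z)                                  # sigmoid derivative, shape (k,)
    return np.sum((dz ** 2) * np.sum(W ** 2, axis=1))   # sum_j dz_j^2 * sum_i W_ji^2

def numerical_penalty(x, W, b, h=1e-6):
    """Same quantity from a central finite-difference Jacobian (check only)."""
    k, d = W.shape
    J = np.zeros((k, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        J[:, i] = (encode(x + e, W, b) - encode(x - e, W, b)) / (2.0 * h)
    return np.sum(J ** 2)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

penalty = contractive_penalty(x, W, b)
```

In training, this penalty would be added to the reconstruction error with a weighting coefficient, so the encoder is rewarded for codes that change little when the input changes little.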

<> Variational autoencoder (Variational Auto-Encoder)

The basic autoencoder essentially learns the mapping between the input $x$ and the hidden variable $z$, so it is a discriminative model. Can it be turned into a generative model?

Given the distribution $P(z)$ of the hidden variable, if we can learn the conditional probability distribution $P(x|z)$, then we can sample from the joint probability distribution $P(x,z) = P(x|z)P(z)$ to generate different samples.

From the neural network perspective, the VAE, like the autoencoder model, has two subnetworks: an encoder and a decoder. The encoder takes the input $x$ and outputs the hidden variable $z$; the decoder decodes the hidden variable $z$ into the reconstruction $\overline{x}$. The difference is that the VAE explicitly constrains the distribution of the hidden variable $z$, expecting $z$ to follow the preset prior distribution $P(z)$. Therefore, in the design of the loss function, a constraint on the distribution of the hidden variable $z$ is added alongside the original reconstruction error term.
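A sketch of how such a two-term loss is often written, under assumptions the text does not spell out: the prior $P(z)$ is a standard normal, the encoder outputs a mean `mu` and log-variance `log_var` per sample, the distribution constraint is the closed-form KL divergence to that prior, and the reconstruction term is a squared error. All names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), where sigma = exp(0.5 * log_var)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def vae_loss(x, x_bar, mu, log_var):
    """Reconstruction error term plus the constraint on the distribution of z."""
    recon = np.sum((x - x_bar) ** 2, axis=1)
    return np.mean(recon + kl_to_standard_normal(mu, log_var))

rng = np.random.default_rng(0)
mu = np.array([[1.0, 0.0]])        # encoder output: mean of z (illustrative values)
log_var = np.zeros((1, 2))         # encoder output: log-variance of z
z = reparameterize(mu, log_var, rng)

x = np.array([[0.2, 0.4, 0.6]])
x_bar = x.copy()                   # pretend the decoder reconstructs x exactly
loss = vae_loss(x, x_bar, mu, log_var)
```

Even with a perfect reconstruction the loss is nonzero here, because the encoder's distribution (mean 1 in the first coordinate) deviates from the assumed prior. Once trained, new samples could be generated by drawing $z$ from the prior and passing it through the decoder.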
