In a feedforward neural network, information flows in one direction only. The network can be regarded as a complex function whose output depends solely on the current input, so it cannot process sequential (time-series) data.

Recurrent neural networks, by contrast, are a class of neural networks with short-term memory. In a recurrent neural network, a neuron can receive information not only from other neurons but also from itself, forming a network structure with loops. The parameters of a recurrent neural network can be learned with the backpropagation through time (BPTT) algorithm, which propagates error information step by step in reverse time order. When the input sequence is long, however, the gradient may explode or vanish. Exploding gradients can be handled by gradient clipping, while vanishing gradients require introducing a gating mechanism, as in long short-term memory (LSTM) networks.
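The gradient-clipping remedy mentioned above is simple to state concretely. The sketch below (a minimal illustration, not taken from the original text; the function name `clip_by_global_norm` is my own) rescales a set of gradient arrays so that their joint L2 norm never exceeds a threshold:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm
    is at most max_norm; returns (clipped grads, original norm)."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads], total
    return grads, total
```

Clipping only changes the step length, not its direction, which is why it tames exploding gradients without biasing the descent direction.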

There are three ways to add memory to a network:

* Time-delay neural networks: add delay units to the non-output layers of a feedforward network to record the outputs of neurons at recent time steps. Because the delays share weights along the time dimension, this also reduces the number of parameters.
* Nonlinear autoregressive models with exogenous inputs (NARX): at each time step t an external input produces an output, and delay units record the most recent external inputs and outputs.
* Recurrent neural networks: use neurons with self-feedback and can process time-series data of arbitrary length. Given an input sequence, a recurrent neural network updates the activity of the hidden layer with feedback edges from the hidden activity at the previous time step together with the current input.

The hidden-layer state of a recurrent neural network depends not only on the input at the current time step but also on the hidden state at the previous time step, with a sigmoid activation function applied. If the state at each time step is viewed as one layer of a feedforward network, a recurrent neural network can be regarded as a neural network that shares weights along the time dimension; that is, its three sets of weights (input-to-hidden, hidden-to-hidden, and hidden-to-output) are shared across all time steps.
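The hidden-state update described above can be written as a single function. This is a minimal sketch (the names `rnn_step`, `U`, `W`, `b` are my own conventions: input-to-hidden weights, hidden-to-hidden weights, and bias), using the sigmoid activation the text mentions:

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, b):
    """One recurrent update: the new hidden state depends on the
    current input x_t AND the previous hidden state h_prev.
    Sigmoid activation, as in the text (tanh is the other common choice)."""
    z = U @ x_t + W @ h_prev + b
    return 1.0 / (1.0 + np.exp(-z))
```

Running the whole sequence is just a loop that calls `rnn_step` repeatedly with the same `U`, `W`, `b`, which is exactly the weight sharing along the time dimension.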

Because its short-term memory acts like a storage device, a recurrent neural network is computationally very powerful: a feedforward neural network can approximate any continuous function, while a recurrent neural network can simulate any program.

According to the universal approximation theorem, a two-layer feedforward neural network can approximate any continuous function on any bounded closed set. Therefore, the two functions of a dynamical system (the state transition and the output) can each be approximated by a two-layer feedforward neural network.

All Turing machines can be simulated by a fully connected recurrent network composed of neurons with sigmoid activation functions. A fully connected recurrent neural network can therefore approximately solve all computable problems.

Application modes of recurrent neural networks fall into three categories: sequence-to-category, synchronous sequence-to-sequence, and asynchronous sequence-to-sequence.
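As an illustration of the first mode, the sketch below maps a whole input sequence to a single class distribution. It is a minimal example under my own assumptions (classifying from the final hidden state with a softmax; `V` and `c` are hypothetical output-layer parameters; averaging all hidden states is another common choice):

```python
import numpy as np

def sequence_to_category(xs, U, W, b, V, c):
    """Sequence-to-category mode: run the recurrent update over the
    whole input sequence, then classify from the final hidden state."""
    h = np.zeros(W.shape[0])
    for x in xs:
        h = np.tanh(U @ x + W @ h + b)   # shared weights at every step
    logits = V @ h + c
    e = np.exp(logits - logits.max())     # stable softmax
    return e / e.sum()                    # class probabilities
```

In the synchronous sequence-to-sequence mode one would instead emit `V @ h + c` at every step of the loop; in the asynchronous mode (e.g. machine translation) a separate decoder loop produces the outputs after the input is consumed.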

The parameters of a recurrent neural network can be learned by gradient descent. There are two main ways to compute the gradient: the backpropagation through time (BPTT) algorithm and the real-time recurrent learning (RTRL) algorithm.

The main idea of backpropagation through time is to compute the gradient with an error backpropagation algorithm similar to that of a feedforward network: the recurrent network is treated as an unrolled multilayer feedforward network in which "each layer" corresponds to "each time step". The gradient can then be computed by ordinary backpropagation, and because the parameters are shared, each parameter's gradient is the sum of its gradients over all layers (time steps).
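The "unroll, backpropagate, and sum over time steps" idea can be made concrete. Below is a minimal sketch under my own assumptions (tanh activation, loss gradient given only at the final hidden state; `rnn_forward` and `bptt` are my own names), showing how the gradients of the shared weights `U` and `W` accumulate across all time steps:

```python
import numpy as np

def rnn_forward(xs, h0, U, W, b):
    """Run a simple tanh RNN over a sequence, keeping all hidden states."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(U @ x + W @ hs[-1] + b))
    return hs

def bptt(xs, hs, dL_dhT, U, W):
    """Backpropagate a loss gradient on the final hidden state in
    reverse time order; shared-parameter gradients are SUMMED over steps."""
    dU, dW = np.zeros_like(U), np.zeros_like(W)
    dh = dL_dhT
    for t in reversed(range(len(xs))):
        dz = dh * (1.0 - hs[t + 1] ** 2)  # back through tanh at step t
        dU += np.outer(dz, xs[t])          # same U reused at every step
        dW += np.outer(dz, hs[t])          # same W reused at every step
        dh = W.T @ dz                      # pass error to previous state
    return dU, dW
```

Note that the forward pass must store every intermediate hidden state, which is exactly the high space complexity of BPTT discussed below.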

Real-time recurrent learning (RTRL) computes the gradient by forward propagation.

Both the RTRL and BPTT algorithms are based on gradient descent and compute the gradient by applying the chain rule in forward mode and backward mode, respectively. In recurrent neural networks, the output dimension is usually much lower than the input dimension, so BPTT requires less computation; however, BPTT must store the intermediate gradients of all time steps, giving it high space complexity. RTRL does not need to propagate gradients backward, so it is well suited to tasks that require online learning or involve infinite sequences.
