Recurrent neural
networks (RNNs) are among the most popular neural network architectures. If you have heard about
LSTM (Long Short-Term Memory), it is one type of RNN. Note that
RECURSIVE neural networks are a generalization of recurrent neural networks. The
difference between them is how the weights are shared: in a recursive neural network, the shared
weights are applied at every node, whereas in a recurrent neural network, the shared
weights are applied across the time steps of a sequence.
Problem:
Can you guess which
word should fill the blank in the sentence "I like French …"? If we represent the
sentence as numbers (word indices from a dictionary), we are facing a
sequence problem: predicting the next word given the previous words. RNNs do
not only deal with sequence problems; they also build a neural network that can
remember. That is exactly what the brain does regularly.
Normally, a
feedforward neural network only passes information forward through its layers and forgets
the information from previous inputs. In an RNN, the information can be remembered,
updated, and forgotten.
Model:
$$ s_{k} = s_{k-1} \cdot w_{rec} + x_{k} \cdot w_{x} $$
$s_{k}, s_{k-1}$: the state at step $k$ and at step $k-1$; each can be a single unit or a whole layer.
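As a minimal sketch of this recurrence (assuming a scalar state; the function name `rnn_step` is just for illustration):

```python
def rnn_step(s_prev, x, w_rec, w_x):
    """One recurrence step: the new state mixes the previous state and the current input."""
    return s_prev * w_rec + x * w_x
```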
Let's look at another
intuition of RNNs:
Training
Autoregressive model:
Feedforward neural net:
To train an
RNN, we have to unroll it. We can think of an RNN as a feedforward neural net
with many hidden layers, all sharing the same weights. We can also think of this training
algorithm in the time domain.
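A small sketch of this unrolling, assuming the scalar model above (the helper name `forward_unrolled` is hypothetical): each time step plays the role of one hidden layer, but every step reuses the same `w_rec` and `w_x`.

```python
def forward_unrolled(x_seq, w_rec, w_x, s0=0.0):
    """Unroll the RNN over the whole sequence and keep every state.
    Each iteration acts like one hidden layer of a feedforward net,
    but all iterations share the same two weights."""
    states = [s0]                                # states[k] holds s_k, with s_0 = s0
    for x in x_seq:
        states.append(states[-1] * w_rec + x * w_x)
    return states
```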
Regularization
One of the biggest
downsides of RNNs is the vanishing/exploding gradient problem. Besides common
techniques such as weight penalties and dropout, LSTM is an RNN architecture that can avoid this
problem. We will talk about it in the next section.
Example
The simplest RNN
network computes the sum of the ones (1) that appear in a list of ones and zeros
(1, 0).
[0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1] → sum = 5
Forward
$$ s_{k} = s_{k-1} \cdot w_{rec} + x_{k} \cdot w_{x} $$
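With both weights fixed to 1, this forward pass simply accumulates the inputs and reproduces the sum above; during training the weights would start at other values and should be pushed toward 1 (a sketch under that assumption):

```python
x_seq = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1]
w_rec, w_x = 1.0, 1.0        # the "ideal" weights for counting ones
s = 0.0                      # s_0
for x in x_seq:
    s = s * w_rec + x * w_x  # s_k = s_{k-1} * w_rec + x_k * w_x
print(s)                     # 5.0, the number of ones in the list
```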
Backward
$$ \frac{\partial E}{\partial w_{rec}} = \sum_{k=1}^{n} \frac{\partial E_{k}}{\partial s_{k}} \cdot \frac{\partial s_{k}}{\partial w_{rec}} = \sum_{k=1}^{n} \frac{\partial E_{k}}{\partial s_{k}} \cdot s_{k-1} $$
$$ \frac{\partial E}{\partial w_{x}} = \sum_{k=1}^{n} \frac{\partial E_{k}}{\partial s_{k}} \cdot \frac{\partial s_{k}}{\partial w_{x}} = \sum_{k=1}^{n} \frac{\partial E_{k}}{\partial s_{k}} \cdot x_{k} $$
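A sketch of backpropagation through time for this scalar example, assuming for illustration that the error is a squared loss at the last step only, E = 0.5 * (s_n - target)^2 (the helper name `backward` and the starting weight values are hypothetical):

```python
def backward(x_seq, states, target, w_rec):
    """Accumulate dE/dw_rec and dE/dw_x by walking back through time."""
    n = len(x_seq)
    grad_s = states[n] - target               # dE/ds_n for the squared loss
    grad_w_rec, grad_w_x = 0.0, 0.0
    for k in range(n, 0, -1):
        grad_w_rec += grad_s * states[k - 1]  # dE/ds_k * s_{k-1}
        grad_w_x   += grad_s * x_seq[k - 1]   # dE/ds_k * x_k
        grad_s     *= w_rec                   # dE/ds_{k-1} = dE/ds_k * w_rec
    return grad_w_rec, grad_w_x

# usage: forward pass with some initial weights, then the two gradients
x_seq = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1]
w_rec, w_x = 0.5, 0.5
states = [0.0]
for x in x_seq:
    states.append(states[-1] * w_rec + x * w_x)
print(backward(x_seq, states, target=5.0, w_rec=w_rec))
```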