
Introduction to recurrent neural networks


Recurrent neural networks (RNNs) are among the most popular neural network architectures. If you have heard of LSTM (Long Short-Term Memory), it is one type of RNN. Note that RECURSIVE neural networks are a generalization of recurrent neural networks. The difference between them is where the shared weights are placed: in a recursive neural network, shared weights are applied at every node of a tree, whereas in a recurrent neural network, shared weights are applied across the steps of a sequence.



Problem:
Can you guess which word completes the sentence "I like French …"? If we represent the sentence as a sequence of word indices from a dictionary, we are facing a sequence problem: predicting the next word given the previous words. RNNs do not only deal with sequence problems; they also build a neural network that can remember, which is exactly what the brain does regularly.
Normally, a feedforward neural network only passes information forward through its layers and forgets the information from previous layers. In an RNN, information can be remembered, updated, or forgotten.

Model:
$$ s_{k} = s_{k-1} \cdot w_{rec} + x_{k} \cdot w_{x} $$


$s_{k}, s_{k-1}$: the states at steps $k$ and $k-1$; each can be a single unit or a whole layer. $x_{k}$ is the input at step $k$, and $w_{rec}$, $w_{x}$ are the shared recurrent and input weights.
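
As a minimal sketch in plain Python (the weight values here are hypothetical), the recurrence simply reuses the same two weights at every step:

```python
# Sketch of the state update s_k = s_{k-1} * w_rec + x_k * w_x.
# Scalar state for clarity; w_rec and w_x are example values.
def forward_states(x, w_rec, w_x, s0=0.0):
    states = [s0]
    for x_k in x:
        states.append(states[-1] * w_rec + x_k * w_x)  # same weights every step
    return states

print(forward_states([0, 1, 1, 0], w_rec=1.0, w_x=1.0))
# [0.0, 0.0, 1.0, 2.0, 2.0]
```

With $w_{rec} = w_{x} = 1$ the state simply accumulates the inputs, which is exactly the counting behaviour used in the example below.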

Let us look at another intuition of an RNN: the same cell is applied at every time step, receiving the new input $x_{k}$ and the previous state $s_{k-1}$, and passing the new state $s_{k}$ forward.

Training
Autoregressive model:
In an autoregressive model, the outputs at the previous steps (for example, steps t-2 and t-1) are fed back in as inputs when predicting the output at step t.
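
For instance, a second-order autoregressive model (a sketch; $w_{1}$ and $w_{2}$ are learned coefficients) predicts the next value as a weighted sum of the two previous outputs:

$$ y_{t} = w_{1} \cdot y_{t-1} + w_{2} \cdot y_{t-2} $$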


Feedforward neural net:
To train an RNN, we have to unroll it. We can think of an RNN as a feedforward neural net with many hidden layers that all share the same weights. Equivalently, we can think of this training algorithm as operating in the time domain (backpropagation through time).
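
For example, unrolling our model for three steps starting from $s_{0}$ makes the shared weights visible in every "layer":

$$ s_{3} = ((s_{0} \cdot w_{rec} + x_{1} \cdot w_{x}) \cdot w_{rec} + x_{2} \cdot w_{x}) \cdot w_{rec} + x_{3} \cdot w_{x} $$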


Regularization 
One of the biggest downsides of RNNs is the vanishing/exploding gradient problem. Besides common techniques such as weight penalties and dropout, LSTM is an RNN architecture that can avoid this problem. We will talk about it in the next section.
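
In our simple model it is easy to see where the problem comes from: the gradient flowing from step $n$ back to step $k$ is multiplied by $w_{rec}$ once per step,

$$ \frac{\partial s_{n}}{\partial s_{k}} = w_{rec}^{\,n-k} $$

so it vanishes toward zero when $|w_{rec}| < 1$ and explodes when $|w_{rec}| > 1$.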

Example
The simplest RNN example is counting: summing the ones that appear in a list of ones and zeros (1, 0).

[0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1] → sum = 5

Forward 
$$ s_{k} = s_{k-1} \cdot w_{rec} + x_{k} \cdot w_{x} $$

Backward
$$ \frac{\partial{E}}{\partial{w_{rec}}} = \sum_{k=1}^{n}{\frac{\partial{E}}{\partial{s_{k}}} \cdot \frac{\partial{s_{k}}}{\partial{w_{rec}}}} = \sum_{k=1}^{n}{\frac{\partial{E}}{\partial{s_{k}}} \cdot s_{k-1}} $$
$$ \frac{\partial{E}}{\partial{w_{x}}} = \sum_{k=1}^{n}{\frac{\partial{E}}{\partial{s_{k}}} \cdot \frac{\partial{s_{k}}}{\partial{w_{x}}}} = \sum_{k=1}^{n}{\frac{\partial{E}}{\partial{s_{k}}} \cdot x_{k}} $$
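
Putting the forward and backward passes together, here is a minimal sketch that implements these gradients for the counting example and verifies them with a finite-difference check (the squared-error loss on the final state and the weight value 0.9 are assumptions for illustration):

```python
import numpy as np

def forward(x, w_rec, w_x):
    # s[k] is the state after seeing x_1..x_k; s[0] is the initial state.
    s = np.zeros(len(x) + 1)
    for k in range(len(x)):
        s[k + 1] = s[k] * w_rec + x[k] * w_x
    return s

def loss(x, target, w_rec, w_x):
    return (forward(x, w_rec, w_x)[-1] - target) ** 2

def gradients(x, target, w_rec, w_x):
    s = forward(x, w_rec, w_x)
    grad_s = 2.0 * (s[-1] - target)   # dE/ds_n for E = (s_n - target)^2
    g_rec = g_x = 0.0
    for k in range(len(x), 0, -1):
        g_rec += grad_s * s[k - 1]    # accumulate dE/ds_k * s_{k-1}
        g_x   += grad_s * x[k - 1]    # accumulate dE/ds_k * x_k
        grad_s *= w_rec               # step back: dE/ds_{k-1} = dE/ds_k * w_rec
    return g_rec, g_x

x = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1]
target = 5.0
g_rec, g_x = gradients(x, target, w_rec=0.9, w_x=0.9)

# Compare with a numerical (central-difference) approximation.
eps = 1e-6
num_rec = (loss(x, target, 0.9 + eps, 0.9) - loss(x, target, 0.9 - eps, 0.9)) / (2 * eps)
num_x = (loss(x, target, 0.9, 0.9 + eps) - loss(x, target, 0.9, 0.9 - eps)) / (2 * eps)
print(np.allclose([g_rec, g_x], [num_rec, num_x]))  # True
```

Training then amounts to repeatedly stepping $w_{rec}$ and $w_{x}$ against these gradients; with this linear model the solution is $w_{rec} = w_{x} = 1$.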


