Part 1: What the…RNN?

Dipesh Trikam
May 11, 2021

If you use a smartphone and frequently search the internet, odds are you’ve used applications that leverage Recurrent Neural Networks, or RNNs for short. RNNs power speech recognition services, language translation, stock prediction, and image captioning systems that describe the content of pictures. I’m not going to bore you with all the maths involved in RNNs, but rather use visualisations to help you understand what is going on.

Siri

If you were asked to decide which way the ball will go in the picture below, what would you choose? There are three arrows, but if you’ve ever had a bad kicking day, you’ll know the ball might not even go into the net.

There are different ways of predicting the motion of this ball, with varying levels of accuracy. As humans, we might look at the position of the kicker’s foot or the motion of the leg before making a judgement.

When the ball is kicked, we have a lot more confidence that it is heading in a particular direction. As a goalkeeper, the quicker you can react to this sequence of events, the better your chance of saving the goal.

Why are the letters of the alphabet in a fixed sequence? “The practice of having the letters in an established order makes sense: It’s easier to teach and to learn.”

As humans, we have been learning sequences since we were children; the alphabet is one example. There is no logic to its order: it’s not arranged by vowels and consonants, similar sounds, or how often the letters are used. We usually rely on songs or other tricks to remember it. If we are asked to name the three letters after ‘F’, most people start from a few letters before, or even from the beginning, to make sure the sequence is right.

Alphabet Order

Now being able to say the alphabet in reverse is much more difficult because it’s not something we have spent time learning. It’s definitely not impossible but it would require some training.

RNNs are good at processing sequential data for predictions. How?

All the concepts you have just read about have one major theme in common: they involve sequential data. RNNs use sequential memory, the same thing our brains use to remember the alphabet and figure out the pattern.

Looping is how the network takes in more sequential data

When building NLP engines, for example for Google search or voice assistants, we want to classify a piece of text by its intent. The text is fed into the loop one word at a time. Each word is encoded sequentially, and the hidden state, which carries information from the previous steps, is passed on to the next step. This process repeats until the last word; the final hidden state is then passed to a feed-forward layer, which by now holds enough information to classify the text and output the result.
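The loop above can be sketched in a few lines of NumPy. This is a minimal, illustrative forward pass only: the tiny vocabulary, the intent labels, and the random weights are all hypothetical stand-ins, not a real trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and intent labels (illustrative only)
vocab = {"what": 0, "time": 1, "is": 2, "it": 3}
intents = ["ask_time", "other"]
hidden = 4

# Untrained random weights; a real model would learn these
Wx = rng.normal(scale=0.1, size=(hidden, len(vocab)))   # input -> hidden
Wh = rng.normal(scale=0.1, size=(hidden, hidden))       # hidden -> hidden
Wy = rng.normal(scale=0.1, size=(len(intents), hidden)) # hidden -> output

def classify(words):
    h = np.zeros(hidden)              # hidden state, reset for each sequence
    for word in words:                # feed the text in one word at a time
        x = np.zeros(len(vocab))
        x[vocab[word]] = 1.0          # one-hot encode the current word
        h = np.tanh(Wx @ x + Wh @ h)  # new state mixes input + previous state
    logits = Wy @ h                   # feed-forward layer on the final state
    return np.exp(logits) / np.exp(logits).sum()  # softmax over intents

probs = classify(["what", "time", "is", "it"])
```

Because `h` is passed back into the loop at every step, the final state summarises the whole sequence, which is why only that last state needs to reach the feed-forward layer.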

What is a weight? A weight is a learned number that helps transform an input into an output. With RNNs we also talk about the term ‘state’, which is not the same thing:

The ‘state’ is short-term memory and is reset after the sequence completes.

As the RNN cell gets inputs, it:

  1. Processes each input, updating its state as it goes.
  2. Emits an output.
  3. After the final input, adjusts its weights, via back-propagation, so that inputs map to the correct outputs.
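The three steps above can be shown with a deliberately tiny, hypothetical cell whose state update is just `h = h + w * x`, so the gradient can be written by hand. This is a toy sketch of the idea of learning a weight from the error, not how a real RNN library implements back-propagation.

```python
# Toy "RNN cell": state update is h = h + w * x  (hypothetical example)
def forward(w, xs):
    h = 0.0
    for x in xs:          # 1. process the inputs, changing the state each time
        h = h + w * x
    return h              # 2. emit an output after the last input

xs = [1.0, 2.0, 3.0]
target = 6.0              # we want the cell to learn to sum, i.e. w = 1

# 3. back-propagation: nudge w downhill on the squared error
w = 0.0
for _ in range(100):
    y = forward(w, xs)
    grad = 2 * (y - target) * sum(xs)   # dL/dw for L = (y - target)**2
    w -= 0.01 * grad
```

After training, `w` settles near 1 and `forward(w, xs)` returns roughly the target: the weight was found purely by repeatedly comparing the output to the correct answer, which is all back-propagation is doing at heart.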

Keep an 👀 out for Part 2…coming soon.

Teaser: Now that we know at a high level more about how RNNs work intuitively, let’s look at one way you could get started.


My opinions and insights in layman’s terms. 🤖 Check out some of my other work here: https://dipesht.myportfolio.com/