Fractal Analytics Blog

Do traditional machine learning algorithms have memory?

By Abhishek Priyam
April 28, 2015

Machine learning algorithms are getting better and better at understanding and predicting data. They learn from historical data and predict the future. But are they actually learning from their history? Do they even have a memory to know what the history is?

At a high level, a machine learning algorithm learns that a certain situation or state occurred in the past, defined more formally by some independent variables {x1, x2, x3, ...}. That situation led to an outcome, defined by the dependent variable y. The algorithm learns from these historical (x, y) pairs and tries to predict y for an unknown future state. To some extent it feels like the algorithm has a sense of history, because it uses past data to predict. But does it actually have a memory of the data points it processed in the previous iteration?

Searching for memory in conventional Machine Learning techniques

To investigate whether most machine learning models have a memory of historic events, let us take an abstract problem: predicting a person's mood (happy, neutral or sad) in a given situation.

We have the following x : y data points –

{ breakfast : happy, drive : neutral, work : neutral, lunch : happy, fight : sad, dinner : sad, sleep : neutral, breakfast : happy, fight : sad, lunch : ? }

The data set says that the person was happy during breakfast, neutral while driving, neutral at work, and so on. Given this data set, we have to predict the person's mood at “lunch”. We can use some arbitrary machine learning algorithm and chances are it will predict the mood as “happy” for lunch, because most food events in the history, including the earlier lunch, are associated with the “happy” state. Let us take a different, slightly modified data set –

{ breakfast : happy, work : neutral, lunch : happy, fight : sad, dinner : sad, drive : neutral, fight : sad, sleep : neutral, breakfast : happy, lunch : ? }

The result will not change: the algorithm will predict the mood as “happy” again, because we have the same data points, only reshuffled, and most machine learning algorithms do not care about the order in which they receive the data. More formally, many algorithms assume that each data point is independent of the data points that came before it. This is because the algorithm has no memory of the previous data point while it processes the new one.
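The order-invariance described above can be sketched with a deliberately simple stand-in for “some arbitrary machine learning algorithm”: a lookup that predicts the mood seen most often for each event. The function names `train` and `predict` are hypothetical, and this toy learner is only an assumption used for illustration, not a method from any particular library.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Count moods per event. The position of a pair in the list is never used."""
    counts = defaultdict(Counter)
    for event, mood in pairs:
        counts[event][mood] += 1
    return counts

def predict(counts, event):
    """Return the mood observed most often for this event."""
    return counts[event].most_common(1)[0][0]

# The two data sets from the text, without the unknown final "lunch".
first = [("breakfast", "happy"), ("drive", "neutral"), ("work", "neutral"),
         ("lunch", "happy"), ("fight", "sad"), ("dinner", "sad"),
         ("sleep", "neutral"), ("breakfast", "happy"), ("fight", "sad")]
second = [("breakfast", "happy"), ("work", "neutral"), ("lunch", "happy"),
          ("fight", "sad"), ("dinner", "sad"), ("drive", "neutral"),
          ("fight", "sad"), ("sleep", "neutral"), ("breakfast", "happy")]

print(predict(train(first), "lunch"))    # happy
print(predict(train(second), "lunch"))   # happy
print(train(first) == train(second))     # True: the learned counts are identical
```

Because the learner only ever sees a bag of (event, mood) pairs, reshuffling the rows cannot change what it learns.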

A thought experiment

Let’s look at the data one more time but in a different form –


If you treat this as a sequence of events, you might suspect that earlier events affect the mood later on, and conclude that after a fight the mood stays “sad” no matter what the next event is, until you sleep. We identified this pattern, but our machine learning algorithm did not. Why?

Because we have memory, we were able to link the current event with the events just before it. This situation often arises when we process time series data. There are various ways to solve the problem, but with most algorithms we have to remind the algorithm what the previous data points were: x(i-1), x(i-2), x(i-3), ... can be fed in along with x(i). However, it becomes impractical to supply more than two or three historic data points per variable, even more so when there are multiple independent variables.
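As a minimal sketch of this “reminding” step, the snippet below turns the event sequence into rows that carry x(i) together with x(i-1) and x(i-2). The helper name `add_lags` and the `"none"` padding value are assumptions made for this illustration.

```python
def add_lags(events, n_lags=2, pad="none"):
    """Build rows of (x(i), x(i-1), ..., x(i-n_lags)) from a sequence."""
    rows = []
    for i, current in enumerate(events):
        # Look back k steps; use the padding value before the sequence starts.
        lags = [events[i - k] if i - k >= 0 else pad
                for k in range(1, n_lags + 1)]
        rows.append((current, *lags))
    return rows

events = ["breakfast", "drive", "work", "lunch", "fight",
          "dinner", "sleep", "breakfast", "fight", "lunch"]

rows = add_lags(events, n_lags=2)
print(rows[0])    # ('breakfast', 'none', 'none')
print(rows[5])    # ('dinner', 'fight', 'lunch')
print(rows[-1])   # ('lunch', 'fight', 'breakfast')
```

With these rows, an order-blind algorithm can at least see that the final “lunch” came right after a “fight”. The cost is that every independent variable gains `n_lags` extra columns, which is why the approach gets unwieldy quickly.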

Linear and logistic regression, feed-forward neural networks and gradient boosting models, among many others, suffer from this form of amnesia about historic data points. They cannot understand the sequence of events and in some cases lose a big chunk of information. Time series forecasting techniques tackle this issue quite well, but they do not work easily with more than one or two independent variables.

Machine Learning techniques with “memory”

There is a class of neural networks called “recurrent neural networks” (RNNs), which are typically trained with “backpropagation through time” (BPTT). They have the ability to “look back” in time and can help in situations where the sequence of the data is important.
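To give a feel for what “looking back” means, here is a minimal sketch of the forward pass of a one-unit recurrent cell. The weights `w_in` and `w_rec` and the numeric encoding of events are hypothetical values picked for illustration; in a real network they would be learned (for example with BPTT), which this sketch does not do.

```python
import math

def rnn_final_state(inputs, w_in=0.8, w_rec=0.5, h0=0.0):
    """Run h_t = tanh(w_in * x_t + w_rec * h_{t-1}) over the inputs."""
    h = h0
    for x in inputs:
        # The previous hidden state h feeds into the new one: this is the memory.
        h = math.tanh(w_in * x + w_rec * h)
    return h

# Hypothetical numeric codes for events (an assumption, not from the article).
code = {"breakfast": 1.0, "fight": -1.0, "sleep": 0.0}

a = rnn_final_state([code[e] for e in ["breakfast", "fight", "sleep"]])
b = rnn_final_state([code[e] for e in ["fight", "sleep", "breakfast"]])

print(a, b)     # same events, different order
print(a != b)   # True: the final state depends on the order
```

Unlike the order-blind models discussed earlier, the hidden state carries a trace of earlier inputs forward, so reshuffling the same events produces a different result.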

There are plenty of resources available on the internet on recurrent neural networks and BPTT. To learn more, you can take a look at the free online course “Intelligent Systems and Control”, coordinated by IIT Kanpur on NPTEL (National Programme on Technology Enhanced Learning).


1. Wikipedia – Recurrent Neural Network

2. A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach

3. Intelligent Systems and Control  – IIT Kanpur on NPTEL


About the author:

Based out of Gurgaon, Abhishek Priyam works at Fractal Analytics. He works with telecom players, helping business leaders adopt analytics in their strategic decision making. He can be contacted on LinkedIn, Twitter and his personal website.

Category: Advanced Analytics
