# 3.11. LSTM - Long Short Term Memory

http://data.is/1bKs2mG

International airline passengers: monthly totals in thousands. Jan 49 – Dec 60

In [1]:

from conx import Network, Layer, LSTMLayer, plot, frange

Using Theano backend.
conx, version 3.5.5


For this experiment, we will use the monthly totals of international airline passengers from January 1949 through December 1960, in thousands:

In [2]:

data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 115,
126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140, 145, 150,
178, 163, 172, 178, 199, 199, 184, 162, 146, 166, 171, 180, 193,
181, 183, 218, 230, 242, 209, 191, 172, 194, 196, 196, 236, 235,
229, 243, 264, 272, 237, 211, 180, 201, 204, 188, 235, 227, 234,
264, 302, 293, 259, 229, 203, 229, 242, 233, 267, 269, 270, 315,
364, 347, 312, 274, 237, 278, 284, 277, 317, 313, 318, 374, 413,
405, 355, 306, 271, 306, 315, 301, 356, 348, 355, 422, 465, 467,
404, 347, 305, 336, 340, 318, 362, 348, 363, 435, 491, 505, 404,
359, 310, 337, 360, 342, 406, 396, 420, 472, 548, 559, 463, 407,
362, 405, 417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390,
432]


Plotting the data shows a regular, but varying, cyclic pattern:

In [3]:

plot(["", data],
     title="International airline passengers: monthly totals in thousands. Jan 49 – Dec 60",
     xlabel="year",
     ylabel="counts (thousands)",
     xs=frange(1949, 1961, 1/12))


Let’s scale the counts into the range 0 - 1:

In [4]:

def scale(data):
    """
    Scale data to between 0 and 1
    """
    minv = min(data)
    maxv = max(data)
    span = maxv - minv
    return [(v - minv)/span for v in data]

In [5]:

scaled_data = scale(data)

In [6]:

plot(["Scaled Data", scaled_data])
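Because the network will be trained on scaled values, its outputs also live in the range 0 to 1. To read them as passenger counts again, the scaling has to be inverted. The following is a minimal sketch of that round trip on a small excerpt of the counts; `unscale` is a hypothetical helper that is not part of the notebook or of conx.

```python
def scale(data):
    """Scale data to between 0 and 1 (same formula as the notebook's scale)."""
    minv = min(data)
    maxv = max(data)
    span = maxv - minv
    return [(v - minv) / span for v in data]


def unscale(scaled, minv, maxv):
    """Hypothetical inverse of scale: map values in [0, 1] back to counts."""
    span = maxv - minv
    return [v * span + minv for v in scaled]


# Small excerpt of the passenger counts that includes the overall
# minimum (104) and maximum (622) of the full series.
sample = [112, 118, 132, 104, 622]
scaled_sample = scale(sample)
restored = unscale(scaled_sample, min(sample), max(sample))

print(scaled_sample)
print(restored)
```

Note that the minimum and maximum must be the ones used when scaling, so they need to be kept alongside the scaled data.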


For our dataset, we will construct a history sequence. First, we need to put each scaled value into a list. This is the list of features. In our case, we have just one feature:

In [7]:

sequence = [[datum] for datum in scaled_data]


We want the inputs -> targets pairs to be constructed as follows:

1. [S0] -> S1
2. [S1] -> S2

where Sn is the list of features at step n of the sequence.

We need to tell the network the shape of the sequence, using three values:

• time_steps - the length of the history
• batch_size - how many samples are loaded at once
• features - the length of each input bank vector

In [8]:

time_steps = 1  # history
batch_size = 1  # how many to load at once
features = 1    # features (length of input vector)

In [9]:

def create_dataset(sequence, time_steps):
    """
    Pair each window of time_steps items with the item that follows it
    """
    dataset = []
    for i in range(len(sequence)-time_steps-1):
        dataset.append([sequence[i:(i+time_steps)],
                        sequence[i + time_steps]])
    return dataset

In [10]:

dataset = create_dataset(sequence, time_steps)

In [11]:

print(dataset[0])
print(dataset[1])

[[[0.015444015444015444]], [0.02702702702702703]]
[[[0.02702702702702703]], [0.05405405405405406]]
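As a sanity check on the structure printed above, it can help to confirm that every input has the shape (time_steps, features) and to count the pairs that were generated. The sketch below repeats the notebook's `create_dataset` so that it runs on its own; `input_shape` and the toy sequence are hypothetical and used only for illustration.

```python
def create_dataset(sequence, time_steps):
    """Same indexing as the notebook's create_dataset."""
    dataset = []
    for i in range(len(sequence) - time_steps - 1):
        dataset.append([sequence[i:(i + time_steps)],
                        sequence[i + time_steps]])
    return dataset


def input_shape(sample_input):
    """Hypothetical helper: (time_steps, features) of one nested-list input."""
    return (len(sample_input), len(sample_input[0]))


# Toy stand-in for the scaled sequence: ten steps with one feature each.
toy_sequence = [[i / 10] for i in range(10)]
toy_dataset = create_dataset(toy_sequence, time_steps=1)

# Collect the distinct input shapes; a well-formed dataset has exactly one.
shapes = {input_shape(inputs) for inputs, target in toy_dataset}
print(len(toy_dataset), shapes)
```

The number of pairs is `len(sequence) - time_steps - 1`, so the 144 monthly values with `time_steps = 1` give 142 pairs.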


Now we construct the network giving the batch_shape in terms of (look_back, banks, width):

In [12]:

net = Network("LSTM")
net.connect()

In [13]:

net.dataset.load(dataset)

In [14]:

net.dashboard()

In [15]:

net.dataset.split(.33)

In [16]:

net.propagate([[.02]])

Out[16]:

[-0.000337743986165151]

In [17]:

outputs = [net.propagate(i) for i in net.dataset.inputs]
plot([["Network", outputs], ["Training data", net.dataset.targets]])

In [18]:

if net.saved():
    net.plot_loss_acc()
else:
    net.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)

In [20]:

outputs = [net.propagate(i) for i in net.dataset.inputs]
plot([["Network output", outputs], ["Training data", net.dataset.targets]])


NOTE: Even though the above plot of the Network output appears to closely track the Training data, don’t be fooled! As can be seen in the accuracy plot after training, the trained network has about 70% accuracy. Why does it look so good in the plot? Take a moment to consider how it can look so good, and yet be so bad.
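One way to probe this question is to compare against a naive baseline that has learned nothing at all: predict that each value equals the one before it. Plotted, such a forecast is simply the series shifted by one step, so it also hugs the data. The sketch below measures that baseline on the first twelve months; the helper names are hypothetical, and the tolerance of 0.1 is an assumption made for illustration, not a value taken from the notebook.

```python
def persistence_forecast(series):
    """Naive baseline: predict that each value equals the one before it."""
    return series[:-1], series[1:]  # (predictions, targets)


def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def fraction_within(predictions, targets, tolerance):
    """Share of predictions that land within `tolerance` of their target."""
    hits = sum(abs(p - t) <= tolerance for p, t in zip(predictions, targets))
    return hits / len(targets)


# First twelve months of the counts, scaled with the full-series minimum
# and maximum (104 and 622) so the values match the notebook's scaled_data.
first_year = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
scaled_year = [(v - 104) / (622 - 104) for v in first_year]

predictions, targets = persistence_forecast(scaled_year)
print("MSE:", mean_squared_error(predictions, targets))
print("within tolerance:", fraction_within(predictions, targets, tolerance=0.1))
```

Passing the full `scaled_data` instead of the excerpt gives a reference point against which the network's error and accuracy can be judged.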

## 3.11.1. LSTM with Window

In [21]:

time_steps = 3

In [22]:

dataset = create_dataset(sequence, time_steps)

In [23]:

print(dataset[0])
print(dataset[1])

[[[0.015444015444015444], [0.02702702702702703], [0.05405405405405406]], [0.04826254826254826]]
[[[0.02702702702702703], [0.05405405405405406], [0.04826254826254826]], [0.032818532818532815]]
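With floating point values it is hard to see at a glance how the window moves. The sketch below runs the same indexing on a toy sequence of integers, which makes it visible that each input slides forward by one step and shares `time_steps - 1` items with its neighbor. The toy sequence is hypothetical; `create_dataset` is repeated so that the block runs on its own.

```python
def create_dataset(sequence, time_steps):
    """Same indexing as the notebook's create_dataset."""
    dataset = []
    for i in range(len(sequence) - time_steps - 1):
        dataset.append([sequence[i:(i + time_steps)],
                        sequence[i + time_steps]])
    return dataset


# Toy sequence: eight steps, one integer feature each.
toy_sequence = [[n] for n in range(8)]
windows = create_dataset(toy_sequence, time_steps=3)

for inputs, target in windows:
    print(inputs, "->", target)

# Consecutive inputs overlap in all but one position.
overlaps = [a[0][1:] == b[0][:-1] for a, b in zip(windows, windows[1:])]
print(all(overlaps))
```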

In [24]:

net2 = Network("LSTM with Window")
net2.connect()

In [25]:

net2.dataset.load(dataset)
net2.dataset.split(.33)

In [26]:

net2

Out[26]:

In [27]:

net2.propagate([[0.1], [0.2], [0.3]])

Out[27]:

[0.010807438753545284]

In [29]:

outputs = [net2.propagate(i) for i in net2.dataset.inputs]
plot([["Network output", outputs], ["Training data", net2.dataset.targets]])

In [30]:

if net2.saved():
    net2.plot_loss_acc()
else:
    net2.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)

In [31]:

net2.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)

========================================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  200 |   0.00173 |   0.98925 |   0.00713 |   0.68085
Saving network... Saved!

In [33]:

outputs = [net2.propagate(i) for i in net2.dataset.inputs]
plot([["Network output", outputs], ["Training data", net2.dataset.targets]])


## 3.11.2. LSTM with State

In [34]:

net3 = Network("LSTM with Window and State")
net3.connect()

In [35]:

net3.dataset.load(dataset)
net3.dataset.split(.33)

In [36]:

net3

Out[36]:

In [37]:

if net3.saved():
    net3.plot_loss_acc()
else:
    net3.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)

In [39]:

outputs = [net3.propagate(i) for i in net3.dataset.inputs]
plot([["Network output", outputs], ["Training data", net3.dataset.targets]])


## 3.11.3. LSTM - Stacked

In [40]:

net4 = Network("LSTM with Window and State and Stacked")
net4.connect()

In [41]:

net4.dataset.load(dataset)
net4.dataset.split(.33)

In [42]:

net4

Out[42]:

In [43]:

net4.propagate([[0.1], [-0.2], [0.8]])

Out[43]:

[0.0064370655454695225]

In [44]:

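if net4.saved():
    net4.plot_loss_acc()
else:
    net4.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)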

In [45]:

outputs = [net4.propagate(i) for i in net4.dataset.inputs]