3.15. LSTM - Long Short-Term Memory

International airline passengers: monthly totals in thousands. Jan 49 – Dec 60

After https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

[1]:

import conx as cx

Using TensorFlow backend.
ConX, version 3.6.0

For this experiment, we will use the monthly totals of international airline passengers, in thousands, from January 1949 through December 1960:

[2]:

data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 115,
        126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140, 145, 150,
        178, 163, 172, 178, 199, 199, 184, 162, 146, 166, 171, 180, 193,
        181, 183, 218, 230, 242, 209, 191, 172, 194, 196, 196, 236, 235,
        229, 243, 264, 272, 237, 211, 180, 201, 204, 188, 235, 227, 234,
        264, 302, 293, 259, 229, 203, 229, 242, 233, 267, 269, 270, 315,
        364, 347, 312, 274, 237, 278, 284, 277, 317, 313, 318, 374, 413,
        405, 355, 306, 271, 306, 315, 301, 356, 348, 355, 422, 465, 467,
        404, 347, 305, 336, 340, 318, 362, 348, 363, 435, 491, 505, 404,
        359, 310, 337, 360, 342, 406, 396, 420, 472, 548, 559, 463, 407,
        362, 405, 417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390,
        432]

Plotting the data shows a regular, but varying, cyclic pattern:

[4]:

cx.plot(["", data],
        title="International airline passengers: monthly totals in thousands. Jan 49 – Dec 60",
        xlabel="year",
        ylabel="counts (thousands)",
        xs=cx.frange(1949, 1961, 1/12))

Let’s scale the counts into the range 0 - 1:

[5]:

def scale(data):
    """
    Scale data to between 0 and 1
    """
    minv = min(data)
    maxv = max(data)
    span = maxv - minv
    return [(v - minv)/span for v in data]

[6]:

scaled_data = scale(data)

[8]:

cx.plot(["Scaled Data", scaled_data])
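Because the network is trained on scaled values, its outputs also live on the scaled axis. A minimal sketch of mapping scaled values back to passenger counts follows. `unscale` is a hypothetical helper that is not part of ConX or the original notebook, and the three sample counts are simply the first three values of `data`:

```python
def scale(data):
    """Scale data to between 0 and 1 (same logic as the notebook's scale)."""
    minv = min(data)
    maxv = max(data)
    span = maxv - minv
    return [(v - minv) / span for v in data]

def unscale(scaled, minv, maxv):
    """Hypothetical inverse of scale(): map scaled values back to counts."""
    span = maxv - minv
    return [v * span + minv for v in scaled]

# First three monthly totals; scaled relative to this small sample only,
# not relative to the full series.
counts = [112, 118, 132]
scaled = scale(counts)
restored = unscale(scaled, min(counts), max(counts))

print(scaled)
print(restored)
```

To undo the scaling, the original minimum and maximum must be kept, since `scale` itself does not return them.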

For our dataset, we will construct a history sequence. First, we need to put each scaled value into a list. This is the list of features. In our case, we have just one feature:

[9]:

sequence = [[datum] for datum in scaled_data]

We want the inputs -> targets pairs to be constructed as follows:

[S0] -> S1
[S1] -> S2
…

where Sn is a list of features in the sequence.

We need to tell the network the shape of the sequence. This requires three values:

time_steps - the length of the history
batch_size - how many samples are loaded at once
features - the length of each input bank vector

[10]:

time_steps = 1  # history
batch_size = 1  # how many to load at once
features = 1    # features (length of input vector)

[11]:

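def create_dataset(sequence, time_steps):
    """
    Build [inputs, target] pairs: each input is a window of
    time_steps consecutive items, and the target is the next item.
    """
    dataset = []
    # The -1 in the bound stops one pair short, so the final
    # item in the sequence is never used as a target.
    for i in range(len(sequence)-time_steps-1):
        dataset.append([sequence[i:(i+time_steps)],
                       sequence[i + time_steps]])
    return dataset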

[12]:

dataset = create_dataset(sequence, time_steps)

[13]:

print(dataset[0])
print(dataset[1])

[[[0.015444015444015444]], [0.02702702702702703]]
[[[0.02702702702702703]], [0.05405405405405406]]
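As a quick sanity check, the nesting of each printed pair can be compared against the values chosen above. The sketch below is not part of the original notebook; `nested_shape` is a hypothetical helper that assumes non-empty, regularly nested lists:

```python
def nested_shape(value):
    """Hypothetical helper: shape of a regular nested list, e.g. [[1], [2]] gives (2, 1)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)

time_steps = 1
features = 1

# First [inputs, target] pair, copied from the output above
pair = [[[0.015444015444015444]], [0.02702702702702703]]
inputs, target = pair

# Each input should be time_steps rows of features values;
# each target should be a single row of features values.
print(nested_shape(inputs) == (time_steps, features))
print(nested_shape(target) == (features,))
```

The batch dimension does not appear in an individual pair; it only describes how many such inputs are presented to the network at once.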

Now we construct the network, giving the batch_shape as (batch_size, time_steps, features):

[14]:

net = cx.Network("LSTM")
net.add(cx.Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net.add(cx.LSTMLayer("lstm", 4))
net.add(cx.Layer("output", 1))
net.connect()
net.compile(error="mse", optimizer="adam")

[15]:

net.dataset.load(dataset)

[19]:

dash = net.dashboard()
dash

[17]:

net.dataset.split(.33)
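For a time series, a hold-out set is usually taken from the end of the sequence, so that the network is validated on months that come after everything it was trained on. The sketch below illustrates that idea in plain Python. `chronological_split` is a hypothetical helper, and whether ConX's `split` selects and rounds in exactly this way is an assumption, not something the notebook states:

```python
def chronological_split(pairs, validation_fraction):
    """Hypothetical helper: earliest pairs for training, latest pairs for validation."""
    n_train = int(len(pairs) * (1 - validation_fraction))
    return pairs[:n_train], pairs[n_train:]

# Stand-in for the 142 [inputs, target] pairs that create_dataset
# builds from 144 monthly values when time_steps is 1.
pairs = [[[[i]], [i + 1]] for i in range(142)]

train, validation = chronological_split(pairs, 0.33)
print(len(train), len(validation))
```

With 142 pairs and a fraction of 0.33, this sketch yields 95 training pairs and 47 validation pairs.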

[22]:

dash.propagate([[1]])

[22]:

[0.127245232462883]

[23]:

outputs = [net.propagate(i) for i in net.dataset.inputs]
cx.plot([["Network", outputs], ["Training data", net.dataset.targets]])

[24]:

if net.saved():
    net.load()
    net.plot_results()
else:
    net.train(100, batch_size=batch_size, shuffle=False, save=True)

========================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  100 |   0.00200 |   0.97895 |   0.00900 |   0.72340
Saving network... Saved!

[20]:

outputs = [net.propagate(i) for i in net.dataset.inputs]
cx.plot([["Network output", outputs], ["Training data", net.dataset.targets]])

NOTE: Even though the above plot of the network output appears to track the training data closely, don’t be fooled! As the results after training show, the trained network reaches only about 72% accuracy on the validation set. Take a moment to consider how the plot can look so good while the accuracy is so poor.
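One way to probe this question is to compare the network against a persistence baseline, a predictor that simply repeats the most recent value as its forecast. The sketch below is not from the original notebook. It uses only the first two years of `data`, scaled relative to that subset, and the `tolerance` of 0.1 used to count a prediction as correct is an assumption for illustration rather than a documented ConX setting:

```python
# First 24 monthly totals (1949 and 1950) from `data`
counts = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
          115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]

minv, maxv = min(counts), max(counts)
scaled = [(v - minv) / (maxv - minv) for v in counts]

# Persistence baseline: predict that the next value equals the current one
predictions = scaled[:-1]
targets = scaled[1:]

errors = [p - t for p, t in zip(predictions, targets)]
mse = sum(e * e for e in errors) / len(errors)

tolerance = 0.1  # assumed threshold for counting a prediction as "correct"
within = sum(1 for e in errors if abs(e) <= tolerance)

print("baseline MSE:", round(mse, 4))
print("fraction within tolerance:", within / len(errors))
```

If a trained network's error and plot are not clearly better than such a baseline, it may have learned little more than to echo its input, shifted by one time step.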

3.15.1. LSTM with Window

[25]:

time_steps = 3

[26]:

dataset = create_dataset(sequence, time_steps)

[27]:

print(dataset[0])
print(dataset[1])

[[[0.015444015444015444], [0.02702702702702703], [0.05405405405405406]], [0.04826254826254826]]
[[[0.02702702702702703], [0.05405405405405406], [0.04826254826254826]], [0.032818532818532815]]
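The windowing is easier to follow on a toy sequence whose values are their own indices. The sketch below repeats the notebook's `create_dataset` so that it runs on its own; the `toy` sequence is made up for illustration:

```python
def create_dataset(sequence, time_steps):
    # Same logic as the notebook's create_dataset
    dataset = []
    for i in range(len(sequence) - time_steps - 1):
        dataset.append([sequence[i:(i + time_steps)],
                        sequence[i + time_steps]])
    return dataset

# Toy sequence with one feature per step
toy = [[0], [1], [2], [3], [4], [5]]

for inputs, target in create_dataset(toy, 3):
    print(inputs, "->", target)
# [[0], [1], [2]] -> [3]
# [[1], [2], [3]] -> [4]
```

Each input is a sliding window of three consecutive steps, and the target is the step that follows it. Note that with this loop bound the final value, `[5]`, is never used as a target.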

[28]:

net2 = cx.Network("LSTM with Window")
net2.add(cx.Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net2.add(cx.LSTMLayer("lstm", 4))
net2.add(cx.Layer("output", 1))
net2.connect()
net2.compile(error="mse", optimizer="adam")

[29]:

net2.dataset.load(dataset)
net2.dataset.split(.33)

[33]:

dash2 = net2.dashboard()
dash2

[35]:

dash2.propagate([[0.1], [0.2], [0.8]])

[35]:

[-0.07787566632032394]

[37]:

outputs = [net2.propagate(i) for i in net2.dataset.inputs]
cx.plot([["Network output", outputs], ["Training data", net2.dataset.targets]])

[38]:

if net2.saved():
    net2.load()
    net2.plot_results()
else:
    net2.train(100, batch_size=batch_size, shuffle=False, save=True)

========================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  100 |   0.00220 |   0.96774 |   0.00993 |   0.72340
Saving network... Saved!

[40]:

outputs = [net2.propagate(i) for i in net2.dataset.inputs]
cx.plot([["Network output", outputs], ["Training data", net2.dataset.targets]])

3.15.2. LSTM with State

[41]:

net3 = cx.Network("LSTM with Window and State")
net3.add(cx.Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net3.add(cx.LSTMLayer("lstm", 4, stateful=True))
net3.add(cx.Layer("output", 1))
net3.connect()
net3.compile(error="mse", optimizer="adam")

[42]:

net3.dataset.load(dataset)
net3.dataset.split(.33)

[44]:

dash3 = net3.dashboard()
dash3

[45]:

if net3.saved():
    net3.load()
    net3.plot_results()
else:
    net3.train(100, batch_size=batch_size, shuffle=False, save=True)

========================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  100 |   0.00199 |   0.97849 |   0.01015 |   0.68085
Saving network... Saved!

[39]:

outputs = [net3.propagate(i) for i in net3.dataset.inputs]
cx.plot([["Network output", outputs], ["Training data", net3.dataset.targets]])

3.15.3. LSTM - Stacked

[40]:

net4 = cx.Network("LSTM with Window and State and Stacked")
net4.add(cx.Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net4.add(cx.LSTMLayer("lstm-1", 4, stateful=True, return_sequences=True))
net4.add(cx.LSTMLayer("lstm-2", 4, stateful=True))
net4.add(cx.Layer("output", 1))
net4.connect()
net4.compile(error="mse", optimizer="adam")

[41]:

net4.dataset.load(dataset)
net4.dataset.split(.33)

[42]:

net4

[43]:

net4.propagate([[0.1], [-0.2], [0.8]])

[43]:

[0.0064370655454695225]

[44]:

if net4.saved():
    net4.load()
    net4.plot_results()
else:
    net4.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)

[45]:

outputs = [net4.propagate(i) for i in net4.dataset.inputs]
cx.plot([["Network output", outputs], ["Training data", net4.dataset.targets]])
