3.14. LSTM - Long Short Term Memory

http://data.is/1bKs2mG

International airline passengers: monthly totals in thousands. Jan 49 – Dec 60

After https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

In [1]:
import conx as cx
Using TensorFlow backend.
Conx, version 3.6.0

For this experiment, we will use the monthly counts of international airline passengers on tickets between 1949 and 1961, in thousands:

In [2]:
data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 115,
        126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140, 145, 150,
        178, 163, 172, 178, 199, 199, 184, 162, 146, 166, 171, 180, 193,
        181, 183, 218, 230, 242, 209, 191, 172, 194, 196, 196, 236, 235,
        229, 243, 264, 272, 237, 211, 180, 201, 204, 188, 235, 227, 234,
        264, 302, 293, 259, 229, 203, 229, 242, 233, 267, 269, 270, 315,
        364, 347, 312, 274, 237, 278, 284, 277, 317, 313, 318, 374, 413,
        405, 355, 306, 271, 306, 315, 301, 356, 348, 355, 422, 465, 467,
        404, 347, 305, 336, 340, 318, 362, 348, 363, 435, 491, 505, 404,
        359, 310, 337, 360, 342, 406, 396, 420, 472, 548, 559, 463, 407,
        362, 405, 417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390,
        432]

Plotting the data shows a regular, but varying, cyclic pattern:

In [4]:
cx.plot(["", data],
        title="International airline passengers: monthly totals in thousands. Jan 49 – Dec 61",
        xlabel="year",
        ylabel="counts (thousands)",
        xs=cx.frange(1949, 1961, 1/12))
_images/LSTM_5_0.png

Let’s scale the counts into the range 0 - 1:

In [5]:
def scale(data):
    """
    Scale data to between 0 and 1
    """
    minv = min(data)
    maxv = max(data)
    span = maxv - minv
    return [(v - minv)/span for v in data]
In [6]:
scaled_data = scale(data)
In [8]:
cx.plot(["Scaled Data", scaled_data])
_images/LSTM_9_0.png

For our dataset, we will contruct a history sequence. First, we need to put each scaled value into a list. This is the list of features. In our case, we just have the one feature:

In [9]:
sequence = [[datum] for datum in scaled_data]

We wish that the inputs -> targets are constructed as follows:

  1. [S0] -> S1
  2. [S1] -> S2

where Sn is a list of features in the sequence.

We need to inform the network of the shape of the sequence. We need the:

  • time_steps - the length of the history
  • batch_size - how many vectors are the inputs composed of?
  • features - the length of each input bank vector
In [10]:
time_steps = 1  # history
batch_size = 1  # how many to load at once
features = 1    # features (length of input vector)
In [11]:
def create_dataset(sequence, time_steps):
    dataset = []
    for i in range(len(sequence)-time_steps-1):
        dataset.append([sequence[i:(i+time_steps)],
                       sequence[i + time_steps]])
    return dataset
In [12]:
dataset = create_dataset(sequence, time_steps)
In [13]:
print(dataset[0])
print(dataset[1])
[[[0.015444015444015444]], [0.02702702702702703]]
[[[0.02702702702702703]], [0.05405405405405406]]

Now we construct the network giving the batch_shape in terms of (look_back, banks, width):

In [14]:
net = cx.Network("LSTM")
net.add(cx.Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net.add(cx.LSTMLayer("lstm", 4))
net.add(cx.Layer("output", 1))
net.connect()
net.compile(error="mse", optimizer="adam")
In [15]:
net.dataset.load(dataset)
In [19]:
dash = net.dashboard()
dash
In [17]:
net.dataset.split(.33)
In [22]:
dash.propagate([[1]])
Out[22]:
[0.127245232462883]
In [23]:
outputs = [net.propagate(i) for i in net.dataset.inputs]
cx.plot([["Network", outputs], ["Training data", net.dataset.targets]])
_images/LSTM_24_0.png
In [24]:
if net.saved():
    net.load()
    net.plot_results()
else:
    net.train(100, batch_size=batch_size, shuffle=False, save=True)
_images/LSTM_25_0.svg
========================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  100 |   0.00200 |   0.97895 |   0.00900 |   0.72340
Saving network... Saved!
In [20]:
outputs = [net.propagate(i) for i in net.dataset.inputs]
plot([["Network output", outputs], ["Training data", net.dataset.targets]])
_images/LSTM_26_0.png

NOTE: Even though the above plot of the Network output appears to closely track the Training data, don’t be fooled! As can be seen in the accuracy plot after training, the trained network has about 70% accuracy. Why does it look so good in the plot? Take a moment to consider how it can look so good, and yet be so bad.

3.14.1. LSTM with Window

In [25]:
time_steps = 3
In [26]:
dataset = create_dataset(sequence, time_steps)
In [27]:
print(dataset[0])
print(dataset[1])
[[[0.015444015444015444], [0.02702702702702703], [0.05405405405405406]], [0.04826254826254826]]
[[[0.02702702702702703], [0.05405405405405406], [0.04826254826254826]], [0.032818532818532815]]
In [28]:
net2 = cx.Network("LSTM with Window")
net2.add(cx.Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net2.add(cx.LSTMLayer("lstm", 4))
net2.add(cx.Layer("output", 1))
net2.connect()
net2.compile(error="mse", optimizer="adam")
In [29]:
net2.dataset.load(dataset)
net2.dataset.split(.33)
In [33]:
dash2 = net2.dashboard()
dash2
In [35]:
dash2.propagate([[0.1], [0.2], [0.8]])
Out[35]:
[-0.07787566632032394]
In [37]:
outputs = [net2.propagate(i) for i in net2.dataset.inputs]
cx.plot([["Network output", outputs], ["Training data", net2.dataset.targets]])
_images/LSTM_36_0.png
In [38]:
if net2.saved():
    net2.load()
    net2.plot_results()
else:
    net2.train(100, batch_size=batch_size, shuffle=False, save=True)
_images/LSTM_37_0.svg
========================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  100 |   0.00220 |   0.96774 |   0.00993 |   0.72340
Saving network... Saved!
In [40]:
outputs = [net2.propagate(i) for i in net2.dataset.inputs]
cx.plot([["Network output", outputs], ["Training data", net2.dataset.targets]])
_images/LSTM_38_0.png

3.14.2. LSTM with State

In [41]:
net3 = cx.Network("LSTM with Window and State")
net3.add(cx.Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net3.add(cx.LSTMLayer("lstm", 4, stateful=True))
net3.add(cx.Layer("output", 1))
net3.connect()
net3.compile(error="mse", optimizer="adam")
In [42]:
net3.dataset.load(dataset)
net3.dataset.split(.33)
In [44]:
dash3 = net3.dashboard()
dash
In [45]:
if net3.saved():
    net3.load()
    net3.plot_results()
else:
    net3.train(100, batch_size=batch_size, shuffle=False, save=True)
_images/LSTM_43_0.svg
========================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  100 |   0.00199 |   0.97849 |   0.01015 |   0.68085
Saving network... Saved!
In [39]:
outputs = [net3.propagate(i) for i in net3.dataset.inputs]
plot([["Network output", outputs], ["Training data", net3.dataset.targets]])
_images/LSTM_44_0.png

3.14.3. LSTM - Stacked

In [40]:
net4 = Network("LSTM with Window and State and Stacked")
net4.add(Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net4.add(LSTMLayer("lstm-1", 4, stateful=True, return_sequences=True))
net4.add(LSTMLayer("lstm-2", 4, stateful=True))
net4.add(Layer("output", 1))
net4.connect()
net4.compile(error="mse", optimizer="adam")
In [41]:
net4.dataset.load(dataset)
net4.dataset.split(.33)
In [42]:
net4
Out[42]:
LSTM with Window and State and StackedLayer: output (output) shape = (1,) Keras class = DenseoutputWeights from lstm-2 to output output/kernel has shape (4, 1) output/bias has shape (1,)Layer: lstm-2 (hidden) Keras class = LSTM stateful = Truelstm-2Weights from lstm-1 to lstm-2 lstm-2/kernel has shape (4, 16) lstm-2/recurrent_kernel has shape (4, 16) lstm-2/bias has shape (16,)Layer: lstm-1 (hidden) Keras class = LSTM stateful = True return_sequences = Truelstm-1Weights from input to lstm-1 lstm-1/kernel has shape (1, 16) lstm-1/recurrent_kernel has shape (4, 16) lstm-1/bias has shape (16,)Layer: input (input) shape = (1,) Keras class = Input batch_shape = (1, 3, 1)input
In [43]:
net4.propagate([[0.1], [-0.2], [0.8]])
Out[43]:
[0.0064370655454695225]
In [44]:
if net4.saved():
    net4.load()
    net4.plot_results()
else:
    net4.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)
_images/LSTM_50_0.png
In [45]:
outputs = [net4.propagate(i) for i in net4.dataset.inputs]
plot([["Network output", outputs], ["Training data", net4.dataset.targets]])
_images/LSTM_51_0.png