3.13. LSTM - Long Short-Term Memory

http://data.is/1bKs2mG

International airline passengers: monthly totals in thousands. Jan 49 – Dec 60

After https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

In [1]:
from conx import Network, Layer, LSTMLayer, plot, frange
Using Theano backend.
conx, version 3.5.4

For this experiment, we will use the monthly totals of international airline passengers, in thousands, from January 1949 through December 1960:

In [2]:
data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 115,
        126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140, 145, 150,
        178, 163, 172, 178, 199, 199, 184, 162, 146, 166, 171, 180, 193,
        181, 183, 218, 230, 242, 209, 191, 172, 194, 196, 196, 236, 235,
        229, 243, 264, 272, 237, 211, 180, 201, 204, 188, 235, 227, 234,
        264, 302, 293, 259, 229, 203, 229, 242, 233, 267, 269, 270, 315,
        364, 347, 312, 274, 237, 278, 284, 277, 317, 313, 318, 374, 413,
        405, 355, 306, 271, 306, 315, 301, 356, 348, 355, 422, 465, 467,
        404, 347, 305, 336, 340, 318, 362, 348, 363, 435, 491, 505, 404,
        359, 310, 337, 360, 342, 406, 396, 420, 472, 548, 559, 463, 407,
        362, 405, 417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390,
        432]
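
As a quick sanity check (our own addition, not from the original notebook), the list holds 144 monthly values, one per month from January 1949 through December 1960:

assert len(data) == 144  # 12 years x 12 months, Jan 1949 - Dec 1960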

Plotting the data shows a regular yearly cycle whose level and amplitude grow over time:

In [3]:
plot(["$", data],
     title="International airline passengers: monthly totals in thousands. Jan 49 – Dec 61",
     xlabel="year",
     ylabel="dollars (thousands)",
    xs=[x for x in frange(1949, 1961, 1/12)])
_images/LSTM_5_0.png

Let’s scale the passenger counts into the range 0 - 1:

In [4]:
def scale(data):
    """
    Scale data to between 0 and 1
    """
    minv = min(data)
    maxv = max(data)
    span = maxv - minv
    return [(v - minv)/span for v in data]
In [5]:
scaled_data = scale(data)
In [6]:
plot(["Scaled Data", scaled_data])
_images/LSTM_9_0.png
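
Since we train on scaled values, it is handy to be able to map network outputs back into the original units. A minimal inverse helper (our own sketch; unscale is not part of conx):

def unscale(scaled, original):
    """
    Map values scaled to [0, 1] back to the original units
    """
    minv = min(original)
    span = max(original) - minv
    return [v * span + minv for v in scaled]

For example, unscale(scaled_data, data) recovers the original monthly totals.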

For our dataset, we will construct a history sequence. First, we need to put each scaled value into a list. This is the list of features; in our case, we have just the one feature:

In [7]:
sequence = [[datum] for datum in scaled_data]
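
Each element of sequence is now a length-1 feature vector. For example (an illustrative check of the first element):

sequence[0]   # -> [0.015444015444015444], the scaled value of 112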

We want the input -> target pairs to be constructed as follows:

  1. [S0] -> S1
  2. [S1] -> S2
  ...

where Sn is the feature vector at position n in the sequence.

We need to inform the network of the shape of the sequence. We need the:

  • time_steps - the length of the history (how many past steps each input covers)
  • batch_size - how many input patterns are processed at a time
  • features - the length of each input vector
In [8]:
time_steps = 1  # history
batch_size = 1  # how many to load at once
features = 1    # features (length of input vector)
In [9]:
def create_dataset(sequence, time_steps):
    """
    Build (input, target) pairs: each input is a window of
    time_steps consecutive feature vectors, and the target is
    the feature vector that follows the window.
    """
    dataset = []
    for i in range(len(sequence) - time_steps - 1):
        dataset.append([sequence[i:(i + time_steps)],
                        sequence[i + time_steps]])
    return dataset
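
Note that the loop stops one window short of the very end of the sequence (the extra - 1 in the range), mirroring the tutorial this notebook follows. With time_steps = 1, the 144 points therefore yield 142 pairs:

len(create_dataset(sequence, 1))   # -> 142
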
In [10]:
dataset = create_dataset(sequence, time_steps)
In [11]:
print(dataset[0])
print(dataset[1])
[[[0.015444015444015444]], [0.02702702702702703]]
[[[0.02702702702702703]], [0.05405405405405406]]

Now we construct the network, giving the batch_shape in terms of (batch_size, time_steps, features):

In [12]:
net = Network("LSTM")
net.add(Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net.add(LSTMLayer("lstm", 4))
net.add(Layer("output", 1))
net.connect()
net.compile(error="mse", optimizer="adam")
In [13]:
net.dataset.load(dataset)
In [14]:
net.dashboard()
In [15]:
net.dataset.split(.33)
In [16]:
net.propagate([[.02]])
Out[16]:
[-0.0006034541293047369]
In [17]:
outputs = [net.propagate(i, visualize=False) for i in net.dataset.inputs]
plot([["Network", outputs], ["Training data", net.dataset.targets]])
_images/LSTM_24_0.png
In [19]:
if net.saved():
    net.load()
    net.plot_loss_acc()
else:
    net.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)
_images/LSTM_25_0.svg
========================================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  100 |   0.00203 |   0.97895 |   0.00815 |   0.72340
Saving network... Saved!
In [20]:
outputs = [net.propagate(i) for i in net.dataset.inputs]
plot([["Network", outputs], ["Training data", net.dataset.targets]])
_images/LSTM_26_0.png

3.14. LSTM with Window
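
Rather than predicting the next value from a single previous value, we now give the network a window of the three preceding values as its history: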

In [21]:
time_steps = 3
In [22]:
dataset = create_dataset(sequence, time_steps)
In [23]:
print(dataset[0])
print(dataset[1])
[[[0.015444015444015444], [0.02702702702702703], [0.05405405405405406]], [0.04826254826254826]]
[[[0.02702702702702703], [0.05405405405405406], [0.04826254826254826]], [0.032818532818532815]]
In [37]:
net2 = Network("LSTM with Window")
net2.add(Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net2.add(LSTMLayer("lstm", 4))
net2.add(Layer("output", 1))
net2.connect()
net2.compile(error="mse", optimizer="adam")
In [38]:
net2.dataset.load(dataset)
net2.dataset.split(.33)
In [39]:
net2
Out[39]:
LSTM with Window
Layer: output (output), shape = (1,), Keras class = Dense
    Weights from lstm to output: output/kernel has shape (4, 1); output/bias has shape (1,)
Layer: lstm (hidden), Keras class = LSTM
    Weights from input to lstm: lstm/kernel has shape (1, 16); lstm/recurrent_kernel has shape (4, 16); lstm/bias has shape (16,)
Layer: input (input), shape = (1,), Keras class = Input, batch_shape = (1, 3, 1)

Note that each LSTM kernel has 16 columns: Keras concatenates the weights for the four gates (input, forget, cell, and output) of the 4 units.
In [40]:
net2.propagate([[0.1], [0.2], [0.3]])
Out[40]:
[0.020247535780072212]
In [41]:
outputs = [net2.propagate(i, visualize=False) for i in net2.dataset.inputs]
plot([["Network", outputs], ["Training data", net2.dataset.targets]])
_images/LSTM_35_0.png
In [44]:
if net2.saved():
    net2.load()
    net2.plot_loss_acc()
else:
    net2.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)
_images/LSTM_36_0.svg
========================================================================
       |  Training |  Training |  Validate |  Validate
Epochs |     Error |  Accuracy |     Error |  Accuracy
------ | --------- | --------- | --------- | ---------
#  100 |   0.00264 |   0.93548 |   0.01089 |   0.63830
Saving network... Saved!
In [31]:
outputs = [net2.propagate(i, visualize=False) for i in net2.dataset.inputs]
plot([["Network", outputs], ["Training data", net2.dataset.targets]])
_images/LSTM_37_0.png

3.14.1. LSTM with State
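
Setting stateful=True makes the LSTM carry its internal state forward from one batch to the next, rather than resetting it after every batch; because the batches are presented in sequence order (shuffle=False), the state can accumulate information across the whole series: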

In [32]:
net3 = Network("LSTM with Window and State")
net3.add(Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net3.add(LSTMLayer("lstm", 4, stateful=True))
net3.add(Layer("output", 1))
net3.connect()
net3.compile(error="mse", optimizer="adam")
In [62]:
net3.dataset.load(dataset)
net3.dataset.split(.33)
In [63]:
net3
Out[63]:
LSTM with Window and State
Layer: output (output), shape = (1,), Keras class = Dense
    Weights from lstm to output: output/kernel has shape (4, 1); output/bias has shape (1,)
Layer: lstm (hidden), Keras class = LSTM, stateful = True
    Weights from input to lstm: lstm/kernel has shape (1, 16); lstm/recurrent_kernel has shape (4, 16); lstm/bias has shape (16,)
Layer: input (input), shape = (1,), Keras class = Input, batch_shape = (1, 3, 1)
In [64]:
if net3.saved():
    net3.load()
    net3.plot_loss_acc()
else:
    net3.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)
_images/LSTM_42_0.svg
========================================================================
       |  Training |  Training
Epochs |     Error |  Accuracy
------ | --------- | ---------
#  100 |   0.00376 |   0.90714
Saving network... Saved!
In [84]:
outputs = [net3.propagate(i, visualize=False) for i in net3.dataset.inputs]
plot([["Network", outputs], ["Training data", net3.dataset.targets]])
_images/LSTM_43_0.png

3.14.2. LSTM - Stacked
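
Finally, we stack two LSTM layers. The first is given return_sequences=True so that it emits its output at every time step; the second LSTM then receives a full sequence of shape (time_steps, 4) rather than only the final 4-unit output (see the shape check after the next cell):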

In [75]:
net4 = Network("LSTM with Window and State and Stacked")
net4.add(Layer("input", features, batch_shape=(batch_size, time_steps, features)))
net4.add(LSTMLayer("lstm-1", 4, stateful=True, return_sequences=True))
net4.add(LSTMLayer("lstm-2", 4, stateful=True))
net4.add(Layer("output", 1))
net4.connect()
net4.compile(error="mse", optimizer="adam")
In [76]:
net4.dataset.load(dataset)
net4.dataset.split(.33)
In [77]:
net4
Out[77]:
LSTM with Window and State and Stacked
Layer: output (output), shape = (1,), Keras class = Dense
    Weights from lstm-2 to output: lstm-2/output/kernel has shape (4, 1); lstm-2/output/bias has shape (1,)
Layer: lstm-2 (hidden), Keras class = LSTM, stateful = True
    Weights from lstm-1 to lstm-2: lstm-2/lstm-2/kernel has shape (4, 16); lstm-2/lstm-2/recurrent_kernel has shape (4, 16); lstm-2/lstm-2/bias has shape (16,)
Layer: lstm-1 (hidden), Keras class = LSTM, stateful = True, return_sequences = True
    Weights from input to lstm-1: lstm-2/lstm-1/kernel has shape (1, 16); lstm-2/lstm-1/recurrent_kernel has shape (4, 16); lstm-2/lstm-1/bias has shape (16,)
Layer: input (input), shape = (1,), Keras class = Input, batch_shape = (1, 3, 1)
In [78]:
net4.propagate([[0.1], [-0.2], [0.8]])
Out[78]:
[0.010831544175744057]
In [79]:
if net4.saved():
    net4.load()
    net4.plot_loss_acc()
else:
    net4.train(100, batch_size=batch_size, shuffle=False, plot=True, save=True)
_images/LSTM_49_0.svg
========================================================================
       |  Training |  Training
Epochs |     Error |  Accuracy
------ | --------- | ---------
#  100 |   0.00320 |   0.92143
Saving network... Saved!
In [82]:
outputs = [net4.propagate(i, visualize=False) for i in net4.dataset.inputs]
plot([["Network", outputs], ["Training data", net4.dataset.targets]])
_images/LSTM_50_0.png