Datasets

A dataset is a list of (input, target) pairs that can be further split into training and testing lists.
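As a plain-Python illustration of the idea, and not conx's implementation, a dataset can be held as a list of (input, target) pairs and split at an index into training and testing lists. The helper `split_pairs` is hypothetical, and putting the training pairs first is an assumption of this sketch.

```python
# A minimal sketch of a dataset as (input, target) pairs; not the conx implementation.
from typing import List, Tuple

Pair = Tuple[List[float], List[float]]


def split_pairs(pairs: List[Pair], index: int) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: the first `index` pairs train, the rest test."""
    return pairs[:index], pairs[index:]


pairs = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
train, test = split_pairs(pairs, 3)
print(len(train), len(test))  # 3 1
```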

Let’s make an example network to use as a demonstration:

In [1]:
from conx import Network, Layer

net = Network("Odd Network")
net.add(Layer("input", 5))
net.add(Layer("hidden", 2, activation="relu"))
net.add(Layer("output", 1, activation="sigmoid"))
net.connect()
net.compile(error="mse", optimizer="adam")
net.summary()
Using Theano backend.
Network Summary
---------------
Network name: Odd Network
    Layer name: 'input' (input)
        VShape: None
        Dropout: 0
        Connected to: ['hidden']
        Activation function: None
        Dropout percent: 0
    Layer name: 'hidden' (hidden)
        VShape: None
        Dropout: 0
        Connected to: ['output']
        Activation function: relu
        Dropout percent: 0
    Layer name: 'output' (output)
        VShape: None
        Dropout: 0
        Activation function: sigmoid
        Dropout percent: 0

As a list of (input, target) pairs

The most straightforward way to provide input and target vectors for training is a list of (input, target) pairs, like so:

In [27]:
patterns = []

for i in range(2 ** 5):
    inputs = [int(s) for s in ("00000" + bin(i)[2:])[-5:]]
    targets = [int((i % 2 == 1))]
    patterns.append((inputs, targets))
In [30]:
patterns[5]
Out[30]:
([0, 0, 1, 0, 1], [1])
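The expression ("00000" + bin(i)[2:])[-5:] left-pads the binary form of i to five digits, and the target is 1 exactly when i is odd. The sketch below uses Python's standard format(i, "05b") as an equivalent, arguably clearer, spelling of the same encoding; the names to_bits and make_patterns are hypothetical.

```python
def to_bits(i: int, width: int = 5) -> list:
    """Encode i as a list of `width` bits, most significant bit first."""
    return [int(b) for b in format(i, f"0{width}b")]


def make_patterns(width: int = 5) -> list:
    """All 2**width (input, target) pairs; the target is 1 when the number is odd."""
    return [(to_bits(i, width), [i % 2]) for i in range(2 ** width)]


patterns = make_patterns()
# The original padding trick gives the same encoding for every i in range.
assert all(
    to_bits(i) == [int(s) for s in ("00000" + bin(i)[2:])[-5:]] for i in range(32)
)
print(patterns[5])  # ([0, 0, 1, 0, 1], [1])
```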
In [28]:
net.set_dataset(patterns)
In [29]:
net.dataset.summary()
Input Summary:
   count  : 32 (32 for training, 0 for testing)
   shape  : (5,)
   range  : (0.0, 1.0)
Target Summary:
   count  : 32 (32 for training, 0 for testing)
   shape  : (1,)
   range  : (0.0, 1.0)
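The fields in this report can be approximated in plain Python: the number of patterns, the shape of one vector, and the minimum and maximum values across all vectors. This is only an illustrative sketch that assumes flat (one-dimensional) vectors; summarize is a hypothetical name, not a conx function.

```python
def summarize(vectors: list) -> dict:
    """Count, per-pattern shape, and value range of a list of flat vectors."""
    flat = [float(v) for vec in vectors for v in vec]
    return {
        "count": len(vectors),
        "shape": (len(vectors[0]),),
        "range": (min(flat), max(flat)),
    }


inputs = [[int(b) for b in format(i, "05b")] for i in range(32)]
targets = [[i % 2] for i in range(32)]
print(summarize(inputs))   # {'count': 32, 'shape': (5,), 'range': (0.0, 1.0)}
print(summarize(targets))  # {'count': 32, 'shape': (1,), 'range': (0.0, 1.0)}
```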

Dataset.add()

You can use the default dataset and add one pattern at a time. Consider the task of training a network to determine whether the 5-bit binary number given as input is even (0) or odd (1). We could add patterns one at a time:

In [3]:
net.dataset.clear()
In [2]:
net.dataset.add([0, 0, 0, 0, 1], [1])
net.dataset.add([0, 0, 0, 1, 1], [1])
net.dataset.add([0, 0, 1, 0, 0], [0])
In [ ]:
net.dataset.clear()
In [4]:
for i in range(2 ** 5):
    inputs = [int(s) for s in ("00000" + bin(i)[2:])[-5:]]
    targets = [int((i % 2 == 1))]
    net.dataset.add(inputs, targets)
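Conceptually, add() appends one aligned input/target pair and clear() empties the dataset. The following is a minimal stand-in written under that assumption; PairDataset and its methods are hypothetical and only mirror the calls used in these cells. Values are stored as floats, matching how the inputs read back below.

```python
class PairDataset:
    """Hypothetical stand-in for a dataset that grows one pattern at a time."""

    def __init__(self):
        self.inputs = []
        self.targets = []

    def add(self, inputs, targets):
        # Store float copies, keeping each input aligned with its target.
        self.inputs.append([float(x) for x in inputs])
        self.targets.append([float(t) for t in targets])

    def clear(self):
        self.inputs.clear()
        self.targets.clear()

    def __len__(self):
        return len(self.inputs)


ds = PairDataset()
for i in range(2 ** 5):
    ds.add([int(b) for b in format(i, "05b")], [i % 2])
print(len(ds), ds.inputs[13], ds.targets[13])  # 32 [0.0, 1.0, 1.0, 0.0, 1.0] [1.0]
```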
In [5]:
net.dataset.summary()
Input Summary:
   count  : 32 (32 for training, 0 for testing)
   shape  : (5,)
   range  : (0.0, 1.0)
Target Summary:
   count  : 32 (32 for training, 0 for testing)
   shape  : (1,)
   range  : (0.0, 1.0)
In [6]:
net.dataset.inputs[13]
Out[6]:
[0.0, 1.0, 1.0, 0.0, 1.0]
In [7]:
net.dataset.targets[13]
Out[7]:
[1.0]
In [8]:
net.train(epochs=5000, accuracy=.75, tolerance=.2, report_rate=100)
Training...
Epoch #  100 | train error 0.25526 | train accuracy 0.40625 | validate% 0.00000
Epoch #  200 | train error 0.22707 | train accuracy 0.71875 | validate% 0.00000
Epoch #  300 | train error 0.20312 | train accuracy 0.78125 | validate% 0.00000
Epoch #  400 | train error 0.18106 | train accuracy 0.81250 | validate% 0.03125
Epoch #  500 | train error 0.15291 | train accuracy 0.84375 | validate% 0.06250
Epoch #  600 | train error 0.11817 | train accuracy 0.96875 | validate% 0.09375
Epoch #  700 | train error 0.08046 | train accuracy 1.00000 | validate% 0.25000
Epoch #  800 | train error 0.05537 | train accuracy 1.00000 | validate% 0.43750
Epoch #  900 | train error 0.03762 | train accuracy 1.00000 | validate% 0.65625
========================================================================
Epoch #  954 | train error 0.03062 | train accuracy 1.00000 | validate% 0.75000
In [9]:
net.test(tolerance=.2)
Testing on training dataset...
# | inputs | targets | outputs | result
---------------------------------------
0 | [0.0, 0.0, 0.0, 0.0, 0.0] | [0.0] | [0.3] | X
1 | [0.0, 0.0, 0.0, 0.0, 1.0] | [1.0] | [0.8] | X
2 | [0.0, 0.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | X
3 | [0.0, 0.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | correct
4 | [0.0, 0.0, 1.0, 0.0, 0.0] | [0.0] | [0.2] | X
5 | [0.0, 0.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
6 | [0.0, 0.0, 1.0, 1.0, 0.0] | [0.0] | [0.2] | X
7 | [0.0, 0.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
8 | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0] | [0.2] | correct
9 | [0.0, 1.0, 0.0, 0.0, 1.0] | [1.0] | [0.9] | correct
10 | [0.0, 1.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | correct
11 | [0.0, 1.0, 0.0, 1.0, 1.0] | [1.0] | [0.9] | correct
12 | [0.0, 1.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
13 | [0.0, 1.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
14 | [0.0, 1.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
15 | [0.0, 1.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
16 | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0] | [0.2] | X
17 | [1.0, 0.0, 0.0, 0.0, 1.0] | [1.0] | [0.7] | X
18 | [1.0, 0.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | correct
19 | [1.0, 0.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | X
20 | [1.0, 0.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
21 | [1.0, 0.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
22 | [1.0, 0.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
23 | [1.0, 0.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
24 | [1.0, 1.0, 0.0, 0.0, 0.0] | [0.0] | [0.1] | correct
25 | [1.0, 1.0, 0.0, 0.0, 1.0] | [1.0] | [0.8] | correct
26 | [1.0, 1.0, 0.0, 1.0, 0.0] | [0.0] | [0.1] | correct
27 | [1.0, 1.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | correct
28 | [1.0, 1.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
29 | [1.0, 1.0, 1.0, 0.0, 1.0] | [1.0] | [0.8] | correct
30 | [1.0, 1.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
31 | [1.0, 1.0, 1.0, 1.0, 1.0] | [1.0] | [0.8] | correct
Total count: 32
Total percentage correct: 0.75
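The tolerance argument decides which rows are marked correct. A reasonable reading, stated here as an assumption rather than conx's documented rule, is that a pattern counts as correct only when every output unit lies strictly within tolerance of its target, and the reported percentage is the fraction of correct patterns. Because outputs in the table are rounded to one decimal place, rows that show the same output (0.8 against a target of 1.0 in rows 1 and 3) can still receive different results. The helpers is_correct and percent_correct are hypothetical.

```python
def is_correct(outputs, targets, tolerance):
    """Assumed rule: every output unit must lie strictly within tolerance."""
    return all(abs(o - t) < tolerance for o, t in zip(outputs, targets))


def percent_correct(all_outputs, all_targets, tolerance):
    hits = sum(is_correct(o, t, tolerance) for o, t in zip(all_outputs, all_targets))
    return hits / len(all_targets)


# Hypothetical raw outputs: the first two would both display as 0.8 after rounding.
outputs = [[0.79], [0.81], [0.30], [0.10]]
targets = [[1.0], [1.0], [0.0], [0.0]]
print(percent_correct(outputs, targets, tolerance=0.2))  # 0.5
```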

Dataset()

You can also create, and switch between, independent datasets.

In [10]:
from conx import Dataset
In [11]:
ds = Dataset()
In [12]:
for i in range(2 ** 5):
    inputs = [int(s) for s in ("00000" + bin(i)[2:])[-5:]]
    targets = [int((i % 2 == 1))]
    ds.add(inputs, targets)

As before, you can set the dataset, but this time with the Dataset object rather than a list of (input, target) pairs:

In [13]:
net.set_dataset(ds)
In [14]:
net.test(tolerance=.2)
Testing on training dataset...
# | inputs | targets | outputs | result
---------------------------------------
0 | [0.0, 0.0, 0.0, 0.0, 0.0] | [0.0] | [0.3] | X
1 | [0.0, 0.0, 0.0, 0.0, 1.0] | [1.0] | [0.8] | X
2 | [0.0, 0.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | X
3 | [0.0, 0.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | correct
4 | [0.0, 0.0, 1.0, 0.0, 0.0] | [0.0] | [0.2] | X
5 | [0.0, 0.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
6 | [0.0, 0.0, 1.0, 1.0, 0.0] | [0.0] | [0.2] | X
7 | [0.0, 0.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
8 | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0] | [0.2] | correct
9 | [0.0, 1.0, 0.0, 0.0, 1.0] | [1.0] | [0.9] | correct
10 | [0.0, 1.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | correct
11 | [0.0, 1.0, 0.0, 1.0, 1.0] | [1.0] | [0.9] | correct
12 | [0.0, 1.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
13 | [0.0, 1.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
14 | [0.0, 1.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
15 | [0.0, 1.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
16 | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0] | [0.2] | X
17 | [1.0, 0.0, 0.0, 0.0, 1.0] | [1.0] | [0.7] | X
18 | [1.0, 0.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | correct
19 | [1.0, 0.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | X
20 | [1.0, 0.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
21 | [1.0, 0.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
22 | [1.0, 0.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
23 | [1.0, 0.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
24 | [1.0, 1.0, 0.0, 0.0, 0.0] | [0.0] | [0.1] | correct
25 | [1.0, 1.0, 0.0, 0.0, 1.0] | [1.0] | [0.8] | correct
26 | [1.0, 1.0, 0.0, 1.0, 0.0] | [0.0] | [0.1] | correct
27 | [1.0, 1.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | correct
28 | [1.0, 1.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
29 | [1.0, 1.0, 1.0, 0.0, 1.0] | [1.0] | [0.8] | correct
30 | [1.0, 1.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
31 | [1.0, 1.0, 1.0, 1.0, 1.0] | [1.0] | [0.8] | correct
Total count: 32
Total percentage correct: 0.75
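A setter that accepts either a list of (input, target) pairs or an existing dataset object can normalize both forms, and passing in a separate dataset object is what makes switching between independent datasets possible. This is a minimal sketch of that pattern; ToyNet, ToyDataset, and their methods are hypothetical and only mirror the calls shown above, not conx's internals.

```python
class ToyDataset:
    """Hypothetical dataset holding aligned input and target lists."""

    def __init__(self):
        self.inputs, self.targets = [], []

    def add(self, inputs, targets):
        self.inputs.append(list(inputs))
        self.targets.append(list(targets))


class ToyNet:
    """Hypothetical network that only keeps a reference to its dataset."""

    def __init__(self):
        self.dataset = ToyDataset()

    def set_dataset(self, data):
        if isinstance(data, ToyDataset):
            self.dataset = data  # switch to an existing, independent dataset
        else:
            ds = ToyDataset()  # otherwise treat data as (input, target) pairs
            for inputs, targets in data:
                ds.add(inputs, targets)
            self.dataset = ds


net = ToyNet()
net.set_dataset([([0, 1], [1]), ([1, 1], [0])])
other = ToyDataset()
other.add([1, 0], [1])
net.set_dataset(other)
print(net.dataset is other)  # True
```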

Dataset inputs and targets

Inputs and targets in the dataset are represented, both when accessed and when assigned, in the same format in which they were given (as lists, or lists of lists). These formats are automatically converted into an internal format.

In [15]:
ds.inputs[17]
Out[15]:
[1.0, 0.0, 0.0, 0.0, 1.0]
In [16]:
ds.inputs[17] = [0.9, 0.1, 0.1, 0.1, 0.9]
In [17]:
net.test(tolerance=.2)
Testing on training dataset...
# | inputs | targets | outputs | result
---------------------------------------
0 | [0.0, 0.0, 0.0, 0.0, 0.0] | [0.0] | [0.3] | X
1 | [0.0, 0.0, 0.0, 0.0, 1.0] | [1.0] | [0.8] | X
2 | [0.0, 0.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | X
3 | [0.0, 0.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | correct
4 | [0.0, 0.0, 1.0, 0.0, 0.0] | [0.0] | [0.2] | X
5 | [0.0, 0.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
6 | [0.0, 0.0, 1.0, 1.0, 0.0] | [0.0] | [0.2] | X
7 | [0.0, 0.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
8 | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0] | [0.2] | correct
9 | [0.0, 1.0, 0.0, 0.0, 1.0] | [1.0] | [0.9] | correct
10 | [0.0, 1.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | correct
11 | [0.0, 1.0, 0.0, 1.0, 1.0] | [1.0] | [0.9] | correct
12 | [0.0, 1.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
13 | [0.0, 1.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
14 | [0.0, 1.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
15 | [0.0, 1.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
16 | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0] | [0.2] | X
17 | [0.9, 0.1, 0.1, 0.1, 0.9] | [1.0] | [0.7] | X
18 | [1.0, 0.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | correct
19 | [1.0, 0.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | X
20 | [1.0, 0.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
21 | [1.0, 0.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
22 | [1.0, 0.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
23 | [1.0, 0.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
24 | [1.0, 1.0, 0.0, 0.0, 0.0] | [0.0] | [0.1] | correct
25 | [1.0, 1.0, 0.0, 0.0, 1.0] | [1.0] | [0.8] | correct
26 | [1.0, 1.0, 0.0, 1.0, 0.0] | [0.0] | [0.1] | correct
27 | [1.0, 1.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | correct
28 | [1.0, 1.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
29 | [1.0, 1.0, 1.0, 0.0, 1.0] | [1.0] | [0.8] | correct
30 | [1.0, 1.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
31 | [1.0, 1.0, 1.0, 1.0, 1.0] | [1.0] | [0.8] | correct
Total count: 32
Total percentage correct: 0.75
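One way to present internal storage as ordinary lists, while still supporting item assignment, is a small wrapper with __getitem__ and __setitem__. The sketch below assumes float32 storage through the standard-library array module; RowView is a hypothetical name, and conx's actual internal format is the NumPy array described next. Values read back after assignment are float32 approximations, so 0.9 returns as roughly 0.8999999762.

```python
from array import array


class RowView:
    """Hypothetical list-style view over float32 rows stored internally."""

    def __init__(self, rows):
        self._rows = rows  # list of array("f") rows: the "internal" format

    def __getitem__(self, index):
        # Convert the internal float32 row back into a plain Python list.
        return list(self._rows[index])

    def __setitem__(self, index, values):
        # Convert a plain list into the internal float32 representation.
        self._rows[index] = array("f", values)

    def __len__(self):
        return len(self._rows)


rows = [array("f", [float(b) for b in format(i, "05b")]) for i in range(32)]
inputs = RowView(rows)
print(inputs[17])  # [1.0, 0.0, 0.0, 0.0, 1.0]
inputs[17] = [0.9, 0.1, 0.1, 0.1, 0.9]
```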

To see or access the internal format, put an underscore before inputs or targets (_inputs, _targets). The result is a NumPy array. conx is designed so that you need not use NumPy for most network operations.

In [18]:
ds._inputs[17]
Out[18]:
array([ 0.89999998,  0.1       ,  0.1       ,  0.1       ,  0.89999998], dtype=float32)

Built-in datasets

In [19]:
mnist = Dataset.get_mnist()
In [20]:
cifar10 = Dataset.get_cifar10()
In [21]:
cifar100 = Dataset.get_cifar100()
Downloading data from http://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz

Dataset operations

Dataset.split() divides the dataset into training and testing sets. You can pass split() either an integer, to divide at a specific position, or a floating-point value, to divide by a fraction (for example, .5 for half).

In [ ]:
ds.split(20)
In [ ]:
ds.split(.5)
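The integer-or-fraction convention can be sketched as follows. This sketch makes two assumptions that conx may not share: a fraction is turned into a count by truncation, and the patterns before the split point form the training set. split_index and split_dataset are hypothetical names.

```python
def split_index(n: int, split) -> int:
    """Turn an int (a position) or a float (a fraction) into a split position."""
    if isinstance(split, float):
        if not 0.0 <= split <= 1.0:
            raise ValueError("fractional split must be between 0 and 1")
        return int(n * split)  # assumption: truncate toward zero
    if not 0 <= split <= n:
        raise ValueError("integer split must be between 0 and the dataset size")
    return split


def split_dataset(pairs, split):
    k = split_index(len(pairs), split)
    return pairs[:k], pairs[k:]  # assumption: training first, testing second


pairs = [([i], [i % 2]) for i in range(32)]
train, test = split_dataset(pairs, 20)
print(len(train), len(test))  # 20 12
train, test = split_dataset(pairs, 0.5)
print(len(train), len(test))  # 16 16
```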
In [ ]:
ds.slice(10)
In [ ]:
ds.shuffle()

Additional operations

These functions may later be replaced by a more general API:

In [ ]:
ds.set_targets_from_inputs()
In [ ]:
ds.set_inputs_from_targets()
In [ ]:
ds.set_targets_from_labels()
In [ ]:
ds.reshape_inputs(shape)

Dataset direct manipulation

You can also set the internal format directly, provided that the data is already in the correct format:

  • use a list of columns for multi-bank inputs or targets
  • use np.array(vectors) for single-bank inputs or targets
In [31]:
import numpy as np

inputs = []
targets = []

for i in range(2 ** 5):
    inputs.append([int(s) for s in ("00000" + bin(i)[2:])[-5:]])
    targets.append([int((i % 2 == 1))])

ds.load_direct(np.array(inputs), np.array(targets))
In [32]:
net.set_dataset(ds)
In [33]:
net.test(tolerance=.2)
Testing on training dataset...
# | inputs | targets | outputs | result
---------------------------------------
0 | [0.0, 0.0, 0.0, 0.0, 0.0] | [0.0] | [0.3] | X
1 | [0.0, 0.0, 0.0, 0.0, 1.0] | [1.0] | [0.8] | X
2 | [0.0, 0.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | X
3 | [0.0, 0.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | correct
4 | [0.0, 0.0, 1.0, 0.0, 0.0] | [0.0] | [0.2] | X
5 | [0.0, 0.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
6 | [0.0, 0.0, 1.0, 1.0, 0.0] | [0.0] | [0.2] | X
7 | [0.0, 0.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
8 | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0] | [0.2] | correct
9 | [0.0, 1.0, 0.0, 0.0, 1.0] | [1.0] | [0.9] | correct
10 | [0.0, 1.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | correct
11 | [0.0, 1.0, 0.0, 1.0, 1.0] | [1.0] | [0.9] | correct
12 | [0.0, 1.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
13 | [0.0, 1.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
14 | [0.0, 1.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
15 | [0.0, 1.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
16 | [1.0, 0.0, 0.0, 0.0, 0.0] | [0.0] | [0.2] | X
17 | [1.0, 0.0, 0.0, 0.0, 1.0] | [1.0] | [0.7] | X
18 | [1.0, 0.0, 0.0, 1.0, 0.0] | [0.0] | [0.2] | correct
19 | [1.0, 0.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | X
20 | [1.0, 0.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
21 | [1.0, 0.0, 1.0, 0.0, 1.0] | [1.0] | [0.9] | correct
22 | [1.0, 0.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
23 | [1.0, 0.0, 1.0, 1.0, 1.0] | [1.0] | [0.9] | correct
24 | [1.0, 1.0, 0.0, 0.0, 0.0] | [0.0] | [0.1] | correct
25 | [1.0, 1.0, 0.0, 0.0, 1.0] | [1.0] | [0.8] | correct
26 | [1.0, 1.0, 0.0, 1.0, 0.0] | [0.0] | [0.1] | correct
27 | [1.0, 1.0, 0.0, 1.0, 1.0] | [1.0] | [0.8] | correct
28 | [1.0, 1.0, 1.0, 0.0, 0.0] | [0.0] | [0.1] | correct
29 | [1.0, 1.0, 1.0, 0.0, 1.0] | [1.0] | [0.8] | correct
30 | [1.0, 1.0, 1.0, 1.0, 0.0] | [0.0] | [0.1] | correct
31 | [1.0, 1.0, 1.0, 1.0, 1.0] | [1.0] | [0.8] | correct
Total count: 32
Total percentage correct: 0.75
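The loop in In [31] can also be written with vectorized NumPy operations. The sketch below builds the same 32 x 5 input matrix with bit shifts and casts it to float32 to match the dtype shown in Out[18]. It also shows one plausible multi-bank layout: a list holding one array per bank. Splitting the five input columns into two banks is purely hypothetical and serves only to illustrate that form.

```python
import numpy as np

n_bits = 5
numbers = np.arange(2 ** n_bits)
# Shift each number right by 4, 3, 2, 1, 0 and keep the lowest bit (MSB first).
shifts = np.arange(n_bits - 1, -1, -1)
inputs = ((numbers[:, None] >> shifts) & 1).astype("float32")
targets = (numbers % 2).reshape(-1, 1).astype("float32")

print(inputs.shape, targets.shape)  # (32, 5) (32, 1)
print(inputs[5], targets[5])        # [0. 0. 1. 0. 1.] [1.]

# Hypothetical two-bank layout: a list holding one array per input bank.
multi_bank_inputs = [inputs[:, :2], inputs[:, 2:]]
print([bank.shape for bank in multi_bank_inputs])  # [(32, 2), (32, 3)]
```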