Introduction to TensorFlow and Keras


TensorFlow

TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks. It is currently used for both research and production by different teams in many commercial Google products, such as speech recognition, Gmail, Google Photos, and search. TensorFlow was originally developed by the Google Brain team for Google's research and production purposes and later released under the Apache 2.0 open source license on November 9, 2015.

Why TensorFlow

  • Supported by Google
  • Works well on Windows, Linux, and Mac
  • Excellent GPU support
  • Has a Python API
  • Python is extremely popular in the data science community

What version of TensorFlow do you have?

TensorFlow changes rapidly, so it is worth checking which version you have installed.

In [1]:
import tensorflow as tf
print("Tensor Flow Version: {}".format(tf.__version__))
Tensor Flow Version: 2.0.0

Deep Learning Tools

TensorFlow also has some competition. The biggest competitor to TensorFlow/Keras is PyTorch. Listed below are some of the more popular deep learning libraries that are actively supported:

  • TensorFlow: Google's deep learning API, and the most popular one.
  • Keras: Also from Google, a higher-level framework that allows TensorFlow, MXNet, and Theano to be used interchangeably.
  • PyTorch: PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It is primarily developed by Facebook's AI Research lab.

Other deep learning tools:

  • MXNet: The Apache Software Foundation's deep learning API. It can be used through Keras.
  • Torch: Used by Google DeepMind, the Facebook AI Research Group, IBM, Yandex, and the Idiap Research Institute. It has been used for some of the most advanced deep learning projects in the world. However, it requires the Lua programming language. It is very advanced, but it is not mainstream.

Using TensorFlow Directly

TensorFlow is a low-level mathematics API, similar to NumPy. However, unlike NumPy, TensorFlow is built for deep learning: it can compile computations into compute graphs that run as highly efficient C++/CUDA code.
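In TensorFlow 2.x, operations run eagerly by default (as in the examples that follow), and tf.function is the standard way to opt into graph compilation. The sketch below wraps a small, made-up function (weighted_sum is a hypothetical name used only for illustration) with tf.function and lists the operation types in the traced graph.

```python
import tensorflow as tf

@tf.function  # trace the Python function into a TensorFlow graph
def weighted_sum(a, b):
    # Element-wise multiply, then sum all elements.
    return tf.reduce_sum(a * b)

a = tf.constant([1.0, 2.0, 3.0])
b = tf.constant([4.0, 5.0, 6.0])

result = weighted_sum(a, b)   # first call traces the graph, then runs it
print(result.numpy())         # 32.0  (1*4 + 2*5 + 3*6)

# Inspect the traced graph: print the type of each operation it contains.
graph = weighted_sum.get_concrete_function(a, b).graph
print([op.type for op in graph.get_operations()])
```

The graph, not the Python code, is what TensorFlow's C++ runtime executes on later calls with tensors of the same shape and dtype.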

TensorFlow Linear Algebra Examples

TensorFlow can be used directly as a library for linear algebra. Keras is a higher-level abstraction for neural networks that you build upon TensorFlow. Here, we will do some basic linear algebra that employs TensorFlow directly and does not make use of Keras. First, we will multiply a row matrix by a column matrix.

In [2]:
# Create a constant tensor holding a 1x2 (row) matrix.
# TensorFlow 2.x executes eagerly, so the op runs immediately
# and 'matrix1' holds its result.
matrix1 = tf.constant([[3.0, 3.0]])

# Create another constant holding a 2x1 (column) matrix.
matrix2 = tf.constant([[2.0],[2.0]])

# Multiply the two matrices. 'product' holds the 1x1 result
# of the matrix multiplication.
product = tf.matmul(matrix1, matrix2)

print(f'Matrix1 shape: {matrix1.shape}')
print(f'Matrix2 shape: {matrix2.shape}')

print('\n',product)
# Index the single element so float() receives a scalar tensor.
print('\nProduct =',float(product[0, 0]))
Matrix1 shape: (1, 2)
Matrix2 shape: (2, 1)

 tf.Tensor([[12.]], shape=(1, 1), dtype=float32)

Product = 12.0
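Why is the result a single number? A (1, 2) matrix times a (2, 1) matrix gives a (1, 1) matrix whose entry is 3*2 + 3*2 = 12. Reversing the order gives a (2, 2) outer product instead. The NumPy sketch below, using the same values, shows both cases; NumPy's @ operator follows the same shape rules as tf.matmul for 2D inputs.

```python
import numpy as np

row = np.array([[3.0, 3.0]])      # shape (1, 2), same values as matrix1
col = np.array([[2.0], [2.0]])    # shape (2, 1), same values as matrix2

inner = row @ col   # (1, 2) @ (2, 1) -> (1, 1): 3*2 + 3*2 = 12
outer = col @ row   # (2, 1) @ (1, 2) -> (2, 2): every entry is 2*3 = 6

print(inner.shape, inner[0, 0])   # (1, 1) 12.0
print(outer)
```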

Here, we will see how to subtract a constant from a variable.

In [3]:
x = tf.Variable([1.0, 2.0])
a = tf.constant([3.0, 3.0])

# Subtract 'a' from 'x' element-wise.
sub = tf.subtract(x, a)
print(sub)
print(sub.numpy())
tf.Tensor([-2. -1.], shape=(2,), dtype=float32)
[-2. -1.]

Unlike constants, variables can be changed after they are created. You update a variable in place by calling its assign method.

In [4]:
print(x)
x.assign([4.0, 6.0])
print('\n',x)
<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([1., 2.], dtype=float32)>

 <tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([4., 6.], dtype=float32)>

Now we can perform the subtraction with this new value.

In [5]:
sub = tf.subtract(x, a)
print(sub)
print(sub.numpy())
tf.Tensor([1. 3.], shape=(2,), dtype=float32)
[1. 3.]
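Besides assign, tf.Variable also provides assign_add and assign_sub for in-place updates, which is how optimizers update weights during training. The short sketch below starts from the same values used above; note that the new value must keep the variable's shape, here (2,).

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0])
x.assign([4.0, 6.0])        # replace the whole value (shape stays (2,))
x.assign_add([1.0, 1.0])    # in place: x += [1, 1]  -> [5, 7]
x.assign_sub([2.0, 2.0])    # in place: x -= [2, 2]  -> [3, 5]
print(x.numpy())            # [3. 5.]
```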

Introduction to Keras

Keras is a layer on top of TensorFlow that makes it much easier to create neural networks. Rather than writing the individual tensor operations yourself, as in the examples above, you specify the layers of the network through a much higher-level API. Unless you are researching entirely new deep neural network structures, it is unlikely that you need to program TensorFlow directly.
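To make the idea of a layer concrete: a Keras Dense layer computes activation(dot(input, kernel) + bias). The NumPy sketch below reproduces that formula by hand. dense_forward is a hypothetical helper, and the weights are made-up numbers, not taken from any trained model.

```python
import numpy as np

def dense_forward(x, kernel, bias, activation=None):
    """Compute activation(x @ kernel + bias), the formula a Dense layer applies."""
    z = x @ kernel + bias
    if activation == "relu":
        return np.maximum(z, 0.0)   # ReLU zeroes out negative values
    return z                        # no activation: linear output

x = np.array([[1.0, 2.0]])              # one sample with two features
kernel = np.array([[0.5, -1.0,  2.0],   # made-up weights: 2 inputs -> 3 neurons
                   [1.0,  0.25, -1.0]])
bias = np.array([0.0, 0.0, 1.0])

out = dense_forward(x, kernel, bias, activation="relu")
print(out)   # values 2.5, 0.0, 1.0 (the second neuron's -0.5 is clipped by ReLU)
```

Keras learns kernel and bias during training; stacking several such layers is exactly what the Sequential models below do.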

Basic TensorFlow Keras Regression: MPG

This example shows how to encode the MPG dataset for regression. This dataset requires some preprocessing because:

  • The input has both numeric and categorical columns
  • The input has missing values

The "helpful functions" defined earlier in this notebook build the feature vector for a neural network. Consider the following:

Predictors/Inputs:

  • Fill any missing inputs with the median for that column. Use missing_median.
  • Encode textual/categorical values with encode_text_dummy.
  • Encode numeric values with encode_numeric_zscore.

Output:

  • Discard rows with missing outputs.
  • Encode textual/categorical values with encode_text_index.
  • Do not encode output numeric values.
  • Produce final feature vectors (x) and expected output (y) with to_xy.

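Use the input functions listed above to encode categorical values that are part of the feature vector, and use encode_text_index when the categorical value is the target. The MPG code in this section takes a simpler route: it fills missing horsepower values with the median, drops the text name column, and feeds the remaining numeric columns to the network unencoded.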
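For readers without the helper functions at hand, here is a minimal pandas sketch of the three input steps listed above (median fill, z-score, dummy encoding). It is written directly with pandas rather than with missing_median, encode_numeric_zscore, or encode_text_dummy, whose exact implementations are not shown here, and the small DataFrame is made-up data.

```python
import numpy as np
import pandas as pd

# Made-up sample data: a numeric column with a gap and a categorical column.
df = pd.DataFrame({
    "horsepower": [130.0, np.nan, 150.0, 90.0],
    "origin": ["usa", "europe", "usa", "japan"],
})

# 1. Fill missing numeric inputs with the column median (here 130.0).
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].median())

# 2. Z-score a numeric column: (value - mean) / standard deviation.
hp = df["horsepower"]
df["horsepower"] = (hp - hp.mean()) / hp.std()

# 3. One-hot (dummy) encode the categorical column as 0/1 integers.
dummies = pd.get_dummies(df["origin"], prefix="origin", dtype=int)
df = pd.concat([df.drop(columns="origin"), dummies], axis=1)

print(df)
```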

In [6]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics
In [7]:
url = "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv"
df = pd.read_csv(url, na_values=['NA', '?'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
mpg             398 non-null float64
cylinders       398 non-null int64
displacement    398 non-null float64
horsepower      392 non-null float64
weight          398 non-null int64
acceleration    398 non-null float64
year            398 non-null int64
origin          398 non-null int64
name            398 non-null object
dtypes: float64(4), int64(4), object(1)
memory usage: 28.1+ KB
In [8]:
df.head()
Out[8]:
    mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin  name
0  18.0          8         307.0       130.0    3504          12.0    70       1  chevrolet chevelle malibu
1  15.0          8         350.0       165.0    3693          11.5    70       1  buick skylark 320
2  18.0          8         318.0       150.0    3436          11.0    70       1  plymouth satellite
3  16.0          8         304.0       150.0    3433          12.0    70       1  amc rebel sst
4  17.0          8         302.0       140.0    3449          10.5    70       1  ford torino
In [9]:
cars = df['name']

# Handle missing values
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Convert from pandas to NumPy
x = df.drop(columns=['name','mpg']).values

# regression target
y = df['mpg'].values 
In [10]:
# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x,y,verbose=1,epochs=100)
Train on 398 samples
Epoch 1/100
398/398 [==============================] - 0s 966us/sample - loss: 6606.7766
Epoch 2/100
398/398 [==============================] - 0s 33us/sample - loss: 931.7477
Epoch 3/100
398/398 [==============================] - 0s 32us/sample - loss: 332.6816
Epoch 4/100
398/398 [==============================] - 0s 30us/sample - loss: 168.5406
Epoch 5/100
398/398 [==============================] - 0s 32us/sample - loss: 109.4436
Epoch 6/100
398/398 [==============================] - 0s 28us/sample - loss: 75.5215
Epoch 7/100
398/398 [==============================] - 0s 32us/sample - loss: 62.5215
Epoch 8/100
398/398 [==============================] - 0s 29us/sample - loss: 60.1478
Epoch 9/100
398/398 [==============================] - 0s 31us/sample - loss: 58.2836
Epoch 10/100
398/398 [==============================] - 0s 30us/sample - loss: 57.2885
Epoch 11/100
398/398 [==============================] - 0s 32us/sample - loss: 56.8358
Epoch 12/100
398/398 [==============================] - 0s 30us/sample - loss: 56.4329
Epoch 13/100
398/398 [==============================] - 0s 29us/sample - loss: 55.3485
Epoch 14/100
398/398 [==============================] - 0s 29us/sample - loss: 55.7711
Epoch 15/100
398/398 [==============================] - 0s 35us/sample - loss: 52.7060
Epoch 16/100
398/398 [==============================] - 0s 36us/sample - loss: 52.2015
Epoch 17/100
398/398 [==============================] - 0s 30us/sample - loss: 52.1720
Epoch 18/100
398/398 [==============================] - 0s 31us/sample - loss: 49.5357
Epoch 19/100
398/398 [==============================] - 0s 31us/sample - loss: 49.7982
Epoch 20/100
398/398 [==============================] - 0s 33us/sample - loss: 47.8882
Epoch 21/100
398/398 [==============================] - 0s 33us/sample - loss: 47.8509
Epoch 22/100
398/398 [==============================] - 0s 26us/sample - loss: 45.6986
Epoch 23/100
398/398 [==============================] - 0s 31us/sample - loss: 45.3614
Epoch 24/100
398/398 [==============================] - 0s 31us/sample - loss: 44.8686
Epoch 25/100
398/398 [==============================] - 0s 32us/sample - loss: 43.8773
Epoch 26/100
398/398 [==============================] - 0s 32us/sample - loss: 43.1763
Epoch 27/100
398/398 [==============================] - 0s 29us/sample - loss: 42.4327
Epoch 28/100
398/398 [==============================] - 0s 31us/sample - loss: 43.7433
Epoch 29/100
398/398 [==============================] - 0s 32us/sample - loss: 44.1538
Epoch 30/100
398/398 [==============================] - 0s 26us/sample - loss: 40.8085
Epoch 31/100
398/398 [==============================] - 0s 31us/sample - loss: 43.2298
Epoch 32/100
398/398 [==============================] - 0s 34us/sample - loss: 44.3997
Epoch 33/100
398/398 [==============================] - 0s 31us/sample - loss: 39.0829
Epoch 34/100
398/398 [==============================] - 0s 30us/sample - loss: 38.5530
Epoch 35/100
398/398 [==============================] - 0s 29us/sample - loss: 38.8678
Epoch 36/100
398/398 [==============================] - 0s 33us/sample - loss: 40.2063
Epoch 37/100
398/398 [==============================] - 0s 30us/sample - loss: 38.0803
Epoch 38/100
398/398 [==============================] - 0s 29us/sample - loss: 35.8147
Epoch 39/100
398/398 [==============================] - 0s 26us/sample - loss: 35.9085
Epoch 40/100
398/398 [==============================] - 0s 28us/sample - loss: 37.0778
Epoch 41/100
398/398 [==============================] - 0s 33us/sample - loss: 36.3134
Epoch 42/100
398/398 [==============================] - 0s 29us/sample - loss: 36.2765
Epoch 43/100
398/398 [==============================] - 0s 27us/sample - loss: 33.8937
Epoch 44/100
398/398 [==============================] - 0s 30us/sample - loss: 32.7919
Epoch 45/100
398/398 [==============================] - 0s 32us/sample - loss: 32.8527
Epoch 46/100
398/398 [==============================] - 0s 33us/sample - loss: 33.3897
Epoch 47/100
398/398 [==============================] - 0s 30us/sample - loss: 31.5147
Epoch 48/100
398/398 [==============================] - 0s 32us/sample - loss: 30.6892
Epoch 49/100
398/398 [==============================] - 0s 31us/sample - loss: 29.8219
Epoch 50/100
398/398 [==============================] - 0s 32us/sample - loss: 30.1172
Epoch 51/100
398/398 [==============================] - 0s 30us/sample - loss: 30.4699
Epoch 52/100
398/398 [==============================] - 0s 28us/sample - loss: 30.8613
Epoch 53/100
398/398 [==============================] - 0s 35us/sample - loss: 30.8735
Epoch 54/100
398/398 [==============================] - 0s 30us/sample - loss: 28.7385
Epoch 55/100
398/398 [==============================] - 0s 29us/sample - loss: 29.1882
Epoch 56/100
398/398 [==============================] - 0s 24us/sample - loss: 29.5343
Epoch 57/100
398/398 [==============================] - 0s 26us/sample - loss: 31.0591
Epoch 58/100
398/398 [==============================] - 0s 33us/sample - loss: 30.7792
Epoch 59/100
398/398 [==============================] - 0s 28us/sample - loss: 27.3902
Epoch 60/100
398/398 [==============================] - 0s 26us/sample - loss: 25.7218
Epoch 61/100
398/398 [==============================] - 0s 29us/sample - loss: 25.3512
Epoch 62/100
398/398 [==============================] - 0s 29us/sample - loss: 26.0253
Epoch 63/100
398/398 [==============================] - 0s 28us/sample - loss: 26.2373
Epoch 64/100
398/398 [==============================] - 0s 28us/sample - loss: 23.9487
Epoch 65/100
398/398 [==============================] - 0s 30us/sample - loss: 23.5959
Epoch 66/100
398/398 [==============================] - 0s 26us/sample - loss: 23.0724
Epoch 67/100
398/398 [==============================] - 0s 26us/sample - loss: 23.5263
Epoch 68/100
398/398 [==============================] - 0s 27us/sample - loss: 23.9986
Epoch 69/100
398/398 [==============================] - 0s 25us/sample - loss: 24.1033
Epoch 70/100
398/398 [==============================] - 0s 29us/sample - loss: 25.1242
Epoch 71/100
398/398 [==============================] - 0s 27us/sample - loss: 23.3761
Epoch 72/100
398/398 [==============================] - 0s 25us/sample - loss: 21.4543
Epoch 73/100
398/398 [==============================] - 0s 27us/sample - loss: 21.0430
Epoch 74/100
398/398 [==============================] - 0s 28us/sample - loss: 21.4034
Epoch 75/100
398/398 [==============================] - 0s 30us/sample - loss: 20.4156
Epoch 76/100
398/398 [==============================] - 0s 29us/sample - loss: 20.3481
Epoch 77/100
398/398 [==============================] - 0s 29us/sample - loss: 19.8204
Epoch 78/100
398/398 [==============================] - 0s 27us/sample - loss: 20.6510
Epoch 79/100
398/398 [==============================] - 0s 25us/sample - loss: 20.2274
Epoch 80/100
398/398 [==============================] - 0s 30us/sample - loss: 21.8183
Epoch 81/100
398/398 [==============================] - 0s 31us/sample - loss: 19.2361
Epoch 82/100
398/398 [==============================] - 0s 25us/sample - loss: 18.7114
Epoch 83/100
398/398 [==============================] - 0s 26us/sample - loss: 18.4457
Epoch 84/100
398/398 [==============================] - 0s 27us/sample - loss: 18.3212
Epoch 85/100
398/398 [==============================] - 0s 24us/sample - loss: 18.7687
Epoch 86/100
398/398 [==============================] - 0s 29us/sample - loss: 18.7056
Epoch 87/100
398/398 [==============================] - 0s 27us/sample - loss: 23.0535
Epoch 88/100
398/398 [==============================] - 0s 25us/sample - loss: 17.1449
Epoch 89/100
398/398 [==============================] - 0s 26us/sample - loss: 17.1896
Epoch 90/100
398/398 [==============================] - 0s 36us/sample - loss: 18.1976
Epoch 91/100
398/398 [==============================] - 0s 30us/sample - loss: 17.5546
Epoch 92/100
398/398 [==============================] - 0s 27us/sample - loss: 16.9299
Epoch 93/100
398/398 [==============================] - 0s 28us/sample - loss: 17.3063
Epoch 94/100
398/398 [==============================] - 0s 28us/sample - loss: 16.1572
Epoch 95/100
398/398 [==============================] - 0s 29us/sample - loss: 16.4370
Epoch 96/100
398/398 [==============================] - 0s 30us/sample - loss: 16.1726
Epoch 97/100
398/398 [==============================] - 0s 28us/sample - loss: 17.1707
Epoch 98/100
398/398 [==============================] - 0s 28us/sample - loss: 19.8813
Epoch 99/100
398/398 [==============================] - 0s 27us/sample - loss: 16.3860
Epoch 100/100
398/398 [==============================] - 0s 26us/sample - loss: 15.7857
Out[10]:
<tensorflow.python.keras.callbacks.History at 0x7f2bbe835278>

Intro to Neural Network Hyperparameters

In the above code, the neural network contains $4$ layers: an input layer, two hidden layers, and an output layer. Keras creates the input layer from the input_dim parameter of the first Dense layer, which sets the number of inputs that the dataset has. The network needs one input neuron for every column in the dataset (including dummy variables).

The $2$ hidden layers have $25$ and $10$ neurons, respectively. You might be wondering how to choose these numbers. Selecting a hidden neuron structure is one of the most common questions about neural networks, and unfortunately there is no single right answer. These numbers are hyperparameters: settings that can affect neural network performance, yet have no clearly defined means of setting them.
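One concrete consequence of these choices is the number of trainable weights. A Dense layer with n inputs and m neurons has n*m weights plus m biases. The MPG feature matrix x has 7 columns (the 9 columns in the CSV minus name and mpg), so the 7 -> 25 -> 10 -> 1 network works out as below; calling model.summary() on the model above should report the same total.

```python
def dense_params(n_inputs, n_neurons):
    """Weights (n_inputs * n_neurons) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

# Layer sizes of the MPG model: 7 inputs -> 25 -> 10 -> 1 output.
sizes = [7, 25, 10, 1]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]

print(per_layer)       # [200, 260, 11]
print(sum(per_layer))  # 471
```

Doubling a hidden layer roughly doubles the weights that connect to it, which is one reason larger networks train more slowly and overfit more easily.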

More hidden neurons generally mean more capacity to fit complex problems. However, too many neurons can lead to overfitting and lengthy training times, while too few can lead to underfitting and sacrifice accuracy. The number of layers is another hyperparameter. More layers allow the neural network to perform more of its own feature engineering and data preprocessing, but this comes at the expense of longer training times and a higher risk of overfitting. You will often see neuron counts start larger near the input layer and shrink towards the output layer, in a roughly triangular shape.

Some techniques use machine learning to optimize these values.

Controlling the Amount of Output

The program produces one line of output for each training epoch. You can reduce or eliminate this output with the verbose parameter of the fit method:

  • verbose=0 - No progress output (use with Jupyter if you do not want output)
  • verbose=1 - Display progress bar, does not work well with Jupyter
  • verbose=2 - Summary progress output (use with Jupyter if you want to know the loss at each epoch)
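With verbose=0 you can still inspect the loss afterwards, because fit returns a History object whose history dictionary records the loss for every epoch. The sketch below trains a tiny model on synthetic data (the data, layer sizes, and epoch count are arbitrary assumptions) and declares the input size with an Input layer, an equivalent alternative to the input_dim parameter used above.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Synthetic regression data: 64 samples, 3 features, linear target.
rng = np.random.default_rng(42)
x_demo = rng.normal(size=(64, 3))
y_demo = x_demo @ np.array([1.0, -2.0, 0.5])

model_demo = Sequential([Input(shape=(3,)),
                         Dense(8, activation='relu'),
                         Dense(1)])
model_demo.compile(loss='mean_squared_error', optimizer='adam')

# verbose=0: no per-epoch output, but the losses are still recorded.
history = model_demo.fit(x_demo, y_demo, verbose=0, epochs=5)
print(len(history.history['loss']))          # 5, one entry per epoch
print(f"Final loss: {history.history['loss'][-1]:.4f}")
```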

Regression Prediction

Next, we will perform actual predictions and assign them to the pred variable. These are the neural network's MPG predictions. Notice that pred is a 2D array; you can always check the dimensions of what Keras returns by printing pred.shape. Because a neural network can return multiple values per input, the result is always an array with one row per input and one column per output. Here the neural network returns only one value per prediction (there are $398$ cars, so $398$ predictions), but the result is still 2D because the network could, in general, return more than one value.

In [11]:
pred = model.predict(x)
print(f"Shape: {pred.shape}")
print(pred[0:10])
Shape: (398, 1)
[[14.891457 ]
 [13.78644  ]
 [15.051363 ]
 [15.794383 ]
 [15.2123   ]
 [ 9.941297 ]
 [ 9.4377165]
 [ 9.810662 ]
 [ 9.589317 ]
 [12.046617 ]]
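The 2D shape matters when you compare predictions with the target yourself. pred has shape (398, 1) while y has shape (398,), and NumPy broadcasting turns a direct subtraction into a full matrix of pairwise differences. The sketch below reproduces the issue with three made-up values and shows the fix: flatten pred first.

```python
import numpy as np

pred_demo = np.array([[14.0], [13.0], [15.0]])  # shape (3, 1), like model.predict output
y_demo = np.array([18.0, 15.0, 18.0])           # shape (3,), like the target column

wrong = pred_demo - y_demo            # broadcasts to shape (3, 3), not element-wise
right = pred_demo.flatten() - y_demo  # shape (3,): one error per sample

print(wrong.shape)  # (3, 3)
print(right)        # [-4. -2. -3.]
```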

We would like to see how good these predictions are. We know what the correct MPG is for each car, so we can measure how close the neural network was.

In [12]:
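# Measure the RMSE (root mean squared error), a common metric for regression.
# scikit-learn expects (y_true, y_pred); it reshapes y to match pred's (398, 1).
score = np.sqrt(metrics.mean_squared_error(y, pred))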
print(f"RMSE: {score}")
RMSE: 3.8875174720463037

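The number printed above is in the same units as the target (MPG) and gives a typical size of the prediction errors. Because the errors are squared before they are averaged, large misses count more heavily than they would in a simple average of the errors. We can also print out the first ten cars, with predictions and actual MPG.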

In [13]:
# Sample predictions
for i in range(10):
    print(f"{i+1}. Car name: {cars[i]}, MPG: {y[i]}, predicted MPG: {pred[i]}")
1. Car name: chevrolet chevelle malibu, MPG: 18.0, predicted MPG: [14.891457]
2. Car name: buick skylark 320, MPG: 15.0, predicted MPG: [13.78644]
3. Car name: plymouth satellite, MPG: 18.0, predicted MPG: [15.051363]
4. Car name: amc rebel sst, MPG: 16.0, predicted MPG: [15.794383]
5. Car name: ford torino, MPG: 17.0, predicted MPG: [15.2123]
6. Car name: ford galaxie 500, MPG: 15.0, predicted MPG: [9.941297]
7. Car name: chevrolet impala, MPG: 14.0, predicted MPG: [9.4377165]
8. Car name: plymouth fury iii, MPG: 14.0, predicted MPG: [9.810662]
9. Car name: pontiac catalina, MPG: 14.0, predicted MPG: [9.589317]
10. Car name: amc ambassador dpl, MPG: 15.0, predicted MPG: [12.046617]
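To see what the RMSE computed above actually measures, the sketch below computes it by hand, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2}$, next to the mean absolute error (MAE) for four made-up predictions. One large miss (-4) pulls the RMSE well above the MAE.

```python
import numpy as np

y_true = np.array([18.0, 15.0, 18.0, 16.0])   # made-up actual MPG values
y_pred = np.array([17.0, 15.0, 20.0, 12.0])   # made-up predictions

errors = y_pred - y_true               # [-1, 0, 2, -4]
rmse = np.sqrt(np.mean(errors ** 2))   # sqrt((1 + 0 + 4 + 16) / 4) = sqrt(5.25)
mae = np.mean(np.abs(errors))          # (1 + 0 + 2 + 4) / 4

print(f"RMSE: {rmse:.4f}")  # RMSE: 2.2913
print(f"MAE:  {mae:.4f}")   # MAE:  1.7500
```

np.sqrt(metrics.mean_squared_error(y_true, y_pred)) from scikit-learn gives the same RMSE.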

Basic TensorFlow Keras Classification: Iris

Classification is the process by which a neural network attempts to classify the input into one or more classes. The simplest way of evaluating a classification network is to track the percentage of training set items that were classified incorrectly. We typically score human results in this manner. For example, you might have taken multiple-choice exams in which you had choices A, B, C, or D. If you chose the wrong letter on one question of a $10$-question exam, you would earn a $90\%$. In the same way, we can grade an algorithm; however, most classification algorithms do not merely choose A, B, C, or D. Computers typically report a classification as their percent confidence, or an approximate probability of class membership, for each choice. The figure below shows how a computer and a human might both respond to question number $1$ on an exam.

Classification Neural Network Output


As you can see, the human test taker marked the first question as "B." However, the algorithm test taker had an $80\%$ $(0.8)$ confidence in "B" and was also somewhat sure, at $10\%$ $(0.1)$, about "A." The computer then distributed the remaining $10\%$ across the other two choices. In the simplest sense, the machine would get $80\%$ of the points for this question if the correct answer were "B," but only $5\%$ $(0.05)$ of the points if the correct answer were "D."
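The sketch below scores that answer in two ways: the simple "points" scheme described above, and the log loss, $-\ln p$ of the probability given to the correct choice, which is the idea behind Keras's categorical_crossentropy loss used for Iris below. The probabilities follow the figure; the value for C (0.05) is inferred as whatever remains after A, B, and D.

```python
import numpy as np

choices = ["A", "B", "C", "D"]
probs = np.array([0.10, 0.80, 0.05, 0.05])   # C is inferred: 1 - 0.10 - 0.80 - 0.05

for correct in ["B", "D"]:
    p = probs[choices.index(correct)]
    print(f"Correct={correct}: points={p:.2f}, log loss={-np.log(p):.3f}")
# Correct=B: points=0.80, log loss=0.223
# Correct=D: points=0.05, log loss=2.996
```

Log loss grows sharply as the probability of the correct answer approaches zero, so confident wrong answers are penalized far more than hesitant ones.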

That is the idea behind scoring a classifier. Next, we will perform classification with TensorFlow on the Iris dataset.

In [14]:
url = "https://data.heatonresearch.com/data/t81-558/iris.csv"

df = pd.read_csv(url, na_values=['NA', '?'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_l    150 non-null float64
sepal_w    150 non-null float64
petal_l    150 non-null float64
petal_w    150 non-null float64
species    150 non-null object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
In [15]:
df.head()
Out[15]:
   sepal_l  sepal_w  petal_l  petal_w  species
0      5.1      3.5      1.4      0.2  Iris-setosa
1      4.9      3.0      1.4      0.2  Iris-setosa
2      4.7      3.2      1.3      0.2  Iris-setosa
3      4.6      3.1      1.5      0.2  Iris-setosa
4      5.0      3.6      1.4      0.2  Iris-setosa
In [16]:
x = df.drop('species', axis=1).values
# dtype=int keeps 0/1 integers (newer pandas versions default to booleans)
dummies = pd.get_dummies(df['species'], dtype=int)
species = dummies.columns
y = dummies.values

print(f'Species {species}')
print(f'\nTarget:') 
print(y[:5])
Species Index(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype='object')

Target:
[[1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]]
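pd.get_dummies performs one-hot encoding: each species becomes a column, and each row has a single 1 in the column of its species. The NumPy sketch below does the same by hand for four example labels, using alphabetical class order to match the column order shown above. Taking argmax of each one-hot row recovers the class index, which is how predictions are decoded later.

```python
import numpy as np

labels = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]
classes = sorted(set(labels))                      # alphabetical class order
index = np.array([classes.index(s) for s in labels])
one_hot = np.eye(len(classes), dtype=int)[index]   # pick one identity row per label

print(classes)   # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(one_hot)   # rows: [1 0 0], [0 0 1], [0 1 0], [1 0 0]
```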
In [17]:
# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1],activation='softmax')) # Output

model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x,y,verbose=1,epochs=100)
Train on 150 samples
Epoch 1/100
150/150 [==============================] - 0s 1ms/sample - loss: 2.5935
Epoch 2/100
150/150 [==============================] - 0s 31us/sample - loss: 2.0179
Epoch 3/100
150/150 [==============================] - 0s 31us/sample - loss: 1.4899
Epoch 4/100
150/150 [==============================] - 0s 41us/sample - loss: 1.0991
Epoch 5/100
150/150 [==============================] - 0s 30us/sample - loss: 0.8531
Epoch 6/100
150/150 [==============================] - 0s 428us/sample - loss: 0.8095
Epoch 7/100
150/150 [==============================] - 0s 32us/sample - loss: 0.8007
Epoch 8/100
150/150 [==============================] - 0s 39us/sample - loss: 0.7524
Epoch 9/100
150/150 [==============================] - 0s 34us/sample - loss: 0.6931
Epoch 10/100
150/150 [==============================] - 0s 33us/sample - loss: 0.6589
Epoch 11/100
150/150 [==============================] - 0s 43us/sample - loss: 0.6384
Epoch 12/100
150/150 [==============================] - 0s 38us/sample - loss: 0.6140
Epoch 13/100
150/150 [==============================] - 0s 34us/sample - loss: 0.5915
Epoch 14/100
150/150 [==============================] - 0s 35us/sample - loss: 0.5677
Epoch 15/100
150/150 [==============================] - 0s 30us/sample - loss: 0.5482
Epoch 16/100
150/150 [==============================] - 0s 38us/sample - loss: 0.5295
Epoch 17/100
150/150 [==============================] - 0s 42us/sample - loss: 0.5134
Epoch 18/100
150/150 [==============================] - 0s 31us/sample - loss: 0.4965
Epoch 19/100
150/150 [==============================] - 0s 42us/sample - loss: 0.4813
Epoch 20/100
150/150 [==============================] - 0s 40us/sample - loss: 0.4681
Epoch 21/100
150/150 [==============================] - 0s 35us/sample - loss: 0.4541
Epoch 22/100
150/150 [==============================] - 0s 48us/sample - loss: 0.4434
Epoch 23/100
150/150 [==============================] - 0s 35us/sample - loss: 0.4314
Epoch 24/100
150/150 [==============================] - 0s 37us/sample - loss: 0.4201
Epoch 25/100
150/150 [==============================] - 0s 38us/sample - loss: 0.4111
Epoch 26/100
150/150 [==============================] - 0s 36us/sample - loss: 0.4012
Epoch 27/100
150/150 [==============================] - 0s 49us/sample - loss: 0.3911
Epoch 28/100
150/150 [==============================] - 0s 45us/sample - loss: 0.3822
Epoch 29/100
150/150 [==============================] - 0s 48us/sample - loss: 0.3739
Epoch 30/100
150/150 [==============================] - 0s 39us/sample - loss: 0.3652
Epoch 31/100
150/150 [==============================] - 0s 40us/sample - loss: 0.3569
Epoch 32/100
150/150 [==============================] - 0s 38us/sample - loss: 0.3487
Epoch 33/100
150/150 [==============================] - 0s 33us/sample - loss: 0.3353
Epoch 34/100
150/150 [==============================] - 0s 35us/sample - loss: 0.3233
Epoch 35/100
150/150 [==============================] - 0s 49us/sample - loss: 0.3140
Epoch 36/100
150/150 [==============================] - 0s 44us/sample - loss: 0.3078
Epoch 37/100
150/150 [==============================] - 0s 44us/sample - loss: 0.2995
Epoch 38/100
150/150 [==============================] - 0s 35us/sample - loss: 0.2911
Epoch 39/100
150/150 [==============================] - 0s 48us/sample - loss: 0.2842
Epoch 40/100
150/150 [==============================] - 0s 39us/sample - loss: 0.2795
Epoch 41/100
150/150 [==============================] - 0s 36us/sample - loss: 0.2702
Epoch 42/100
150/150 [==============================] - 0s 42us/sample - loss: 0.2668
Epoch 43/100
150/150 [==============================] - 0s 37us/sample - loss: 0.2605
Epoch 44/100
150/150 [==============================] - 0s 41us/sample - loss: 0.2524
Epoch 45/100
150/150 [==============================] - 0s 32us/sample - loss: 0.2462
Epoch 46/100
150/150 [==============================] - 0s 36us/sample - loss: 0.2410
Epoch 47/100
150/150 [==============================] - 0s 45us/sample - loss: 0.2360
Epoch 48/100
150/150 [==============================] - 0s 36us/sample - loss: 0.2299
Epoch 49/100
150/150 [==============================] - 0s 35us/sample - loss: 0.2309
Epoch 50/100
150/150 [==============================] - 0s 39us/sample - loss: 0.2215
Epoch 51/100
150/150 [==============================] - 0s 34us/sample - loss: 0.2161
Epoch 52/100
150/150 [==============================] - 0s 33us/sample - loss: 0.2118
Epoch 53/100
150/150 [==============================] - 0s 46us/sample - loss: 0.2071
Epoch 54/100
150/150 [==============================] - 0s 35us/sample - loss: 0.2035
Epoch 55/100
150/150 [==============================] - 0s 46us/sample - loss: 0.1985
Epoch 56/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1957
Epoch 57/100
150/150 [==============================] - 0s 39us/sample - loss: 0.1907
Epoch 58/100
150/150 [==============================] - 0s 35us/sample - loss: 0.1886
Epoch 59/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1836
Epoch 60/100
150/150 [==============================] - 0s 34us/sample - loss: 0.1803
Epoch 61/100
150/150 [==============================] - 0s 30us/sample - loss: 0.1765
Epoch 62/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1737
Epoch 63/100
150/150 [==============================] - 0s 35us/sample - loss: 0.1726
Epoch 64/100
150/150 [==============================] - 0s 34us/sample - loss: 0.1664
Epoch 65/100
150/150 [==============================] - 0s 34us/sample - loss: 0.1680
Epoch 66/100
150/150 [==============================] - 0s 31us/sample - loss: 0.1609
Epoch 67/100
150/150 [==============================] - 0s 33us/sample - loss: 0.1629
Epoch 68/100
150/150 [==============================] - 0s 42us/sample - loss: 0.1597
Epoch 69/100
150/150 [==============================] - 0s 35us/sample - loss: 0.1539
Epoch 70/100
150/150 [==============================] - 0s 34us/sample - loss: 0.1518
Epoch 71/100
150/150 [==============================] - 0s 43us/sample - loss: 0.1500
Epoch 72/100
150/150 [==============================] - 0s 35us/sample - loss: 0.1483
Epoch 73/100
150/150 [==============================] - 0s 35us/sample - loss: 0.1447
Epoch 74/100
150/150 [==============================] - 0s 37us/sample - loss: 0.1417
Epoch 75/100
150/150 [==============================] - 0s 32us/sample - loss: 0.1397
Epoch 76/100
150/150 [==============================] - 0s 38us/sample - loss: 0.1381
Epoch 77/100
150/150 [==============================] - 0s 38us/sample - loss: 0.1372
Epoch 78/100
150/150 [==============================] - 0s 32us/sample - loss: 0.1344
Epoch 79/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1326
Epoch 80/100
150/150 [==============================] - 0s 35us/sample - loss: 0.1326
Epoch 81/100
150/150 [==============================] - 0s 71us/sample - loss: 0.1278
Epoch 82/100
150/150 [==============================] - 0s 38us/sample - loss: 0.1271
Epoch 83/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1253
Epoch 84/100
150/150 [==============================] - 0s 30us/sample - loss: 0.1260
Epoch 85/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1224
Epoch 86/100
150/150 [==============================] - 0s 31us/sample - loss: 0.1208
Epoch 87/100
150/150 [==============================] - 0s 34us/sample - loss: 0.1194
Epoch 88/100
150/150 [==============================] - 0s 34us/sample - loss: 0.1172
Epoch 89/100
150/150 [==============================] - 0s 30us/sample - loss: 0.1212
Epoch 90/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1160
Epoch 91/100
150/150 [==============================] - 0s 31us/sample - loss: 0.1136
Epoch 92/100
150/150 [==============================] - 0s 34us/sample - loss: 0.1134
Epoch 93/100
150/150 [==============================] - 0s 40us/sample - loss: 0.1114
Epoch 94/100
150/150 [==============================] - 0s 30us/sample - loss: 0.1117
Epoch 95/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1087
Epoch 96/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1079
Epoch 97/100
150/150 [==============================] - 0s 38us/sample - loss: 0.1064
Epoch 98/100
150/150 [==============================] - 0s 36us/sample - loss: 0.1063
Epoch 99/100
150/150 [==============================] - 0s 43us/sample - loss: 0.1042
Epoch 100/100
150/150 [==============================] - 0s 32us/sample - loss: 0.1039
Out[17]:
<tensorflow.python.keras.callbacks.History at 0x7f2bbc1c3eb8>

Now that we have a trained neural network, we can use it. Exactly like before, the following code generates predictions. Notice that three values come back for each of the $150$ iris flowers, one for each of the three iris species (Iris-setosa, Iris-versicolor, and Iris-virginica).

In [18]:
pred = model.predict(x)
print(f"Shape: {pred.shape}")
print(pred[0:10])
Shape: (150, 3)
[[9.9695468e-01 3.0451624e-03 1.4113792e-07]
 [9.9220347e-01 7.7957255e-03 8.1985411e-07]
 [9.9503493e-01 4.9646357e-03 4.9762201e-07]
 [9.9029142e-01 9.7070076e-03 1.4767769e-06]
 [9.9726498e-01 2.7349552e-03 1.2947523e-07]
 [9.9609226e-01 3.9075725e-03 1.3927595e-07]
 [9.9425197e-01 5.7472931e-03 6.7194486e-07]
 [9.9524057e-01 4.7591310e-03 3.0676313e-07]
 [9.8798639e-01 1.2010939e-02 2.7187311e-06]
 [9.9322379e-01 6.7756008e-03 5.9396223e-07]]

If you would like to turn off scientific notation, the following line can be used:

In [19]:
np.set_printoptions(suppress=True)

Printing the predictions again, the same values now appear in fixed-point notation instead of scientific notation.

In [20]:
print(pred[0:5])
[[0.9969547  0.00304516 0.00000014]
 [0.9922035  0.00779573 0.00000082]
 [0.99503493 0.00496464 0.0000005 ]
 [0.9902914  0.00970701 0.00000148]
 [0.997265   0.00273496 0.00000013]]
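A minimal sketch of what `suppress=True` changes. It uses `np.printoptions`, the context-manager form of `np.set_printoptions`, so each setting applies only inside its `with` block instead of for the rest of the session. The two array values are illustrative, modeled on one large and one tiny prediction above:

```python
import numpy as np

# One large and one tiny value, like a row of the iris predictions
a = np.array([0.9969547, 1.4e-07])

# Scientific notation allowed: the tiny value switches the whole array to e-notation
with np.printoptions(suppress=False):
    default_text = np.array2string(a)

# Scientific notation suppressed: fixed-point output
with np.printoptions(suppress=True):
    suppressed_text = np.array2string(a)

print(default_text)
print(suppressed_text)
```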

Usually, the column with the highest value is taken as the neural network's prediction, which makes it easy to convert the predictions to the expected iris species. The argmax function, called with axis=1, finds the index of the maximum value in each row. The same function converts the expected values y back to class indices.
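A small stand-alone sketch of the row-wise argmax. The probability rows are hypothetical values shaped like the iris output, and `y` is a hypothetical one-hot target matrix, assumed to be encoded the same way as the notebook's `y` (one column per species):

```python
import numpy as np

# Hypothetical predicted probabilities: one row per flower, one column per species
pred = np.array([
    [0.997, 0.003, 0.000],  # setosa most likely
    [0.012, 0.900, 0.088],  # versicolor most likely
    [0.001, 0.262, 0.737],  # virginica most likely
])

# Hypothetical one-hot encoded targets for the same three flowers
y = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# axis=1: for each row, return the column index holding the largest value
predict_classes = np.argmax(pred, axis=1)
expected_classes = np.argmax(y, axis=1)

print(predict_classes)   # [0 1 2]
print(expected_classes)  # [0 1 2]
```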

In [21]:
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y,axis=1)
print(f"Predictions: {predict_classes}")
print(f"\nExpected: {expected_classes}")
Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1
 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

Expected: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

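It is straightforward to turn these indexes back into iris species names using the `species` index that we created earlier. The slice `predict_classes[1:10]` selects the nine flowers at positions 1 through 9, because Python slices exclude the end index.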

In [22]:
print(species[predict_classes[1:10]])
Index(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa'],
      dtype='object')
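The lookup works because indexing a pandas `Index` with an integer array returns the labels at those positions. A stand-alone sketch, where `species` is a hypothetical stand-in assumed to be ordered setosa, versicolor, virginica so that positions 0, 1, 2 match the model's output columns (as the outputs above suggest):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's species index
species = pd.Index(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

predict_classes = np.array([0, 0, 1, 2, 1])

# Integer-array indexing returns the label at each position
names = species[predict_classes]
print(list(names))
# ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor']
```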

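Here we score with accuracy, which works like a test score: of all the iris predictions, what fraction were correct? Because these predictions are made on the same 150 flowers the network was trained on, this measures how well the network fits the training data rather than how it performs on new flowers. The downside of accuracy is that it does not consider how confident the neural network was in each prediction.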
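`metrics.accuracy_score` (presumably scikit-learn's `sklearn.metrics`, imported earlier in the notebook) returns the fraction of matching labels, which equals the mean of an elementwise comparison. With 150 flowers, 0.98 means 147 were classified correctly. The sketch below, using made-up probabilities and a hypothetical `accuracy` helper, computes accuracy that way and shows the limitation just described: very confident and very hesitant predictions get the same score when their argmax agrees.

```python
import numpy as np

expected = np.array([0, 1, 2, 1])

# Two hypothetical sets of probabilities with the same argmax per row
confident = np.array([[0.98, 0.01, 0.01],
                      [0.02, 0.96, 0.02],
                      [0.01, 0.01, 0.98],
                      [0.05, 0.90, 0.05]])
hesitant = np.array([[0.40, 0.30, 0.30],
                     [0.30, 0.40, 0.30],
                     [0.30, 0.30, 0.40],
                     [0.34, 0.36, 0.30]])


def accuracy(expected_classes, probabilities):
    """Fraction of rows whose argmax matches the expected class."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == expected_classes))


print(accuracy(expected, confident))  # 1.0
print(accuracy(expected, hesitant))   # 1.0
```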

In [23]:
correct = metrics.accuracy_score(expected_classes,predict_classes)
print(f"Accuracy: {correct}")
Accuracy: 0.98

The code below performs two ad hoc predictions: the first on a single iris flower, the second on two iris flowers. In both cases `model.predict` returns a 2D array with one row per flower. For the single flower, `np.argmax` without an axis still works, because flattening a single row leaves the column positions unchanged. For two flowers, we must specify which axis to take the argmax over: the value axis=1 returns the index of the maximum column for each row.
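The difference is easy to see with the prediction values printed by the two cells below. Without an axis, argmax flattens the array and returns one flat index; for the two-flower array that index is 3, which is not even a valid species index:

```python
import numpy as np

# Prediction values copied from the outputs of the two ad hoc cells
one = np.array([[0.00137724, 0.26192608, 0.73669666]])  # shape (1, 3)
two = np.array([[0.00137724, 0.2619261, 0.7366966],
                [0.98810416, 0.01189453, 0.00000135]])  # shape (2, 3)

# No axis: flat index over all values
print(np.argmax(one))          # 2 (a single row, so this is also the column)
print(np.argmax(two))          # 3 (position of 0.988 among all six values)

# axis=1: best column for each row
print(np.argmax(one, axis=1))  # [2]
print(np.argmax(two, axis=1))  # [2 0]
```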

One sample flower

In [24]:
sample_flower = np.array( [[5.0,3.0,4.0,2.0]], dtype=float)
pred = model.predict(sample_flower)
print(f'Predictions: \n\n{pred}')
pred = np.argmax(pred)
print(f"\nPredict that {sample_flower} is: {species[pred]}")
Predictions: 

[[0.00137724 0.26192608 0.73669666]]

Predict that [[5. 3. 4. 2.]] is: Iris-virginica

Two sample flowers

In [25]:
sample_flower = np.array( [[5.0,3.0,4.0,2.0],[5.2,3.5,1.5,0.8]], dtype=float)
pred = model.predict(sample_flower)
print(f'Predictions: \n\n{pred}')
pred = np.argmax(pred,axis=1)
print(f"\nPredict that these two flowers: \n\n{sample_flower} \n\nare: {list(species[pred])}")
Predictions: 

[[0.00137724 0.2619261  0.7366966 ]
 [0.98810416 0.01189453 0.00000135]]

Predict that these two flowers: 

[[5.  3.  4.  2. ]
 [5.2 3.5 1.5 0.8]] 

are: ['Iris-virginica', 'Iris-setosa']

Saving and Loading a Keras Neural Network

Complex neural networks can take a long time to fit/train. It is helpful to save these neural networks so that they can be reloaded later; a reloaded neural network does not require retraining. Keras provides three formats for saving a neural network:

  • YAML: Stores the neural network structure (no weights) in the YAML file format.
  • JSON: Stores the neural network structure (no weights) in the JSON file format.
  • HDF5: Stores the complete neural network (with weights) in the HDF5 file format. Do not confuse HDF5 with HDFS (the Hadoop Distributed File System). They are different.

Usually you will want to save in HDF5.

In [26]:
import os

save_path = "."  # save to current directory

# save neural network structure to JSON (no weights)
model_json = model.to_json()
with open(os.path.join(save_path, "network.json"), "w") as json_file:
    json_file.write(model_json)

# save neural network structure to YAML (no weights)
# Note: to_yaml() works in the TensorFlow 2.0 used here, but raises a RuntimeError
# in TensorFlow 2.6 and later, so skip this step on newer versions.
model_yaml = model.to_yaml()
with open(os.path.join(save_path, "network.yaml"), "w") as yaml_file:
    yaml_file.write(model_yaml)

# save entire network to HDF5 (save everything, suggested)
model.save(os.path.join(save_path, "network.h5"))

The code below does not build or fit a new neural network. It reloads the saved HDF5 file, which contains both the structure and the weights from the previous fit, and uses it to make predictions.

If the neural network was really saved and reloaded, its accuracy should match the previous accuracy exactly.

In [27]:
from tensorflow.keras.models import load_model

saved_model = load_model(os.path.join(save_path,"network.h5"))

pred_saved = saved_model.predict(x)

predict_classes = np.argmax(pred_saved,axis=1)
correct_saved = metrics.accuracy_score(expected_classes,predict_classes)

print(f"Saved Model Accuracy: {correct_saved}")
print(f"Before Saving Accuracy: {correct}")
Saved Model Accuracy: 0.98
Before Saving Accuracy: 0.98
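Matching accuracy is a good sign, but two different networks could also both reach 0.98. A stricter check compares the raw prediction arrays, for example `model.predict(x)` from the original model against `pred_saved`, within a small tolerance. The helper name `same_predictions` and the tolerance `1e-6` are illustrative choices, not part of Keras. As a demo it uses the two printouts of the same flower from the ad hoc prediction cells, which differ only in the last printed digits:

```python
import numpy as np


def same_predictions(pred_a, pred_b, atol=1e-6):
    """True if both arrays have the same shape and all values agree within atol."""
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    return bool(pred_a.shape == pred_b.shape and
                np.allclose(pred_a, pred_b, atol=atol, rtol=0.0))


# The flower [5.0, 3.0, 4.0, 2.0] as printed by the two ad hoc cells
first = [[0.00137724, 0.26192608, 0.73669666]]
second = [[0.00137724, 0.2619261, 0.7366966]]

print(same_predictions(first, second))             # True
print(same_predictions(first, [[0.5, 0.3, 0.2]]))  # False
```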