Deep Learning Performance 2: Weight/Activity Regularization, Weight Constraints, Early Stopping and Checkpoints

Weight Regularization

The longer we train the network, the more specialized the weights become to the training data, overfitting it. The weights grow in size in order to handle the specifics of the examples seen in the training data.

Large weights make the network unstable. Although the weights will be specialized to the training dataset, minor variation or statistical noise in the expected inputs will result in large differences in the output.

The learning algorithm can be updated to encourage the network toward using small weights. One way to do this is to change the calculation of loss used in the optimization of the network to also consider the size of the weights.

In calculating the loss between the predicted and expected values in a batch, we can add the current size of all weights in the network, or in a given layer, to this calculation. This is called a penalty because we are penalizing the model proportional to the size of the weights in the model.

Larger weights result in a larger penalty, in the form of a larger loss score. The optimization algorithm will then push the model to have smaller weights, i.e. weights no larger than needed to perform well on the training dataset.

Smaller weights are considered more regular or less specialized and as such, we refer to this penalty as weight regularization.


Types of weight regularization:

  • l1 (Lasso): Sum of the absolute weights $$ \lambda \sum_{i=1}^{k} |w_{i}| $$
  • l2 (Ridge): Sum of the squared weights $$ \lambda \sum_{i=1}^{k} w_{i}^{2} $$
  • l1 + l2 (Elastic Net): Sum of the absolute and the squared weights. $$ \frac{\sum_{i=1}^{n}(y_{i} - x_{i}^{T}\hat{\beta})^{2}}{2n} + \lambda \left( \frac{1 - \alpha}{2} \sum_{j=1}^{m} \hat{\beta}_{j}^{2} + \alpha \sum_{j=1}^{m} |\hat{\beta}_{j}| \right) $$

Each requires a hyperparameter that must be configured.
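To make the penalty terms concrete, the following is a minimal numpy sketch of how each penalty could be computed for a single weight vector; the weight values and the lambda of $0.01$ are made-up illustrations, not values from this lesson.

# illustrative calculation of the l1, l2 and elastic-net style penalties
import numpy as np

w = np.array([0.5, -1.2, 0.03, 2.0])     # example weight vector
lam = 0.01                               # regularization hyperparameter (lambda)

l1_penalty = lam * np.sum(np.abs(w))     # lasso-style penalty
l2_penalty = lam * np.sum(w ** 2)        # ridge-style penalty
l1_l2_penalty = l1_penalty + l2_penalty  # elastic-net-style combination

print(l1_penalty, l2_penalty, l1_l2_penalty)

Each penalty is added to the data loss, so larger weights directly increase the value the optimizer is trying to minimize.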

Weight Regularization API in Keras

Keras provides a weight regularization API that allows you to add a penalty for weight size to the loss function.

By default, no regularizer is used in any layers.

A weight regularizer can be added to each layer when the layer is defined in a Keras model.

This is achieved by setting the kernel_regularizer argument on each layer. A separate regularizer can also be used for the bias via the bias_regularizer argument, although this is less often used.

The regularizers are provided under keras.regularizers and have the names l1, l2 and l1_l2. Each takes the regularizer hyperparameter as an argument. For example:

In [29]:
import keras

keras.regularizers.l1(0.01)
keras.regularizers.l2(0.01)
keras.regularizers.l1_l2(l1=0.01, l2=0.01)
Out[29]:
<keras.regularizers.L1L2 at 0x7f3ad992de48>

Weight Regularization for Dense Layers

In [30]:
# example of l2 on a dense layer
from keras.regularizers import l1, l2, l1_l2
from keras.layers import Dense

Dense(32, kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01))
Out[30]:
<keras.layers.core.Dense at 0x7f3ad98f02e8>

Weight Regularization for Convolutional Layers

Like the Dense layer, the Convolutional layers (e.g. Conv1D and Conv2D) also use the kernel_regularizer and bias_regularizer arguments to define a regularizer.

The example below sets an l2 regularizer on a Conv2D convolutional layer:

In [31]:
# example of l2 on a convolutional layer
from keras.layers import Conv2D

Conv2D(32, (3,3), kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01))
Out[31]:
<keras.layers.convolutional.Conv2D at 0x7f3ad9848e80>

Weight Regularization for Recurrent Layers

Recurrent layers like the LSTM offer more flexibility in regularizing the weights.

The input, recurrent, and bias weights can all be regularized separately via the kernel_regularizer, recurrent_regularizer, and bias_regularizer arguments.

The example below sets an l2 regularizer on an LSTM recurrent layer:

In [32]:
# example of l2 on an lstm layer
from keras.layers import LSTM

LSTM(32, kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01), bias_regularizer=l2(0.01))
Out[32]:
<keras.layers.recurrent.LSTM at 0x7f3ad9837198>

Weight Regularization for Embedding Layers

Embedding layers use the embeddings_regularizer argument to define a regularizer.

The example below sets an l2 regularizer on an Embedding layer:

In [33]:
# example of l2 on an embedding layer
from keras.layers import Embedding

Embedding(input_dim=10, output_dim=5, embeddings_regularizer=l2(0.01))
Out[33]:
<keras.layers.embeddings.Embedding at 0x7f3ad97cc7b8>

Examples of Weight Regularization

It can be helpful to look at some examples of weight regularization configurations reported in the literature.

It is important to select and tune a regularization technique specific to your network and dataset, although real examples can also give an idea of common configurations that may be a useful starting point.

MLP Weight Regularization

  • The most common type of regularization is L2, also called simply weight decay, with values often on a logarithmic scale between $0$ and $0.1$, such as $0.1, 0.001, 0.0001$, etc.

CNN Weight Regularization

  • Weight regularization does not seem widely used in CNN models, or if it is used, its use is not widely reported.

  • L2 weight regularization with very small regularization hyperparameters (e.g. $0.0005$ or $5 \times 10^{-4}$) may be a good starting point.

LSTM Weight Regularization

  • It is common to use weight regularization with LSTM models.

  • An often used configuration is L2 (weight decay) with very small hyperparameters (e.g. $10^{-6}$). It is often not reported which weights are regularized (input, recurrent, and/or bias), although one would assume that only the input and recurrent weights are regularized. An example reflecting this configuration is shown below.
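For example, reflecting the configuration described above, an LSTM layer with a small L2 penalty on the input and recurrent weights might be defined as follows; the layer width of $32$ units is an arbitrary illustration.

# illustrative LSTM configuration with the very small L2 penalty noted above
from keras.layers import LSTM
from keras.regularizers import l2

LSTM(32, kernel_regularizer=l2(1e-6), recurrent_regularizer=l2(1e-6))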

Binary Classification Regularization Example

We will use a standard binary classification problem that defines two semi-circles of observations: one semi-circle for each class.

Each observation has two input variables with the same scale and a class output value of either $0$ or $1$. This dataset is called the “moons” dataset because of the shape of the observations in each class.

We will use the sklearn make_moons() function to generate observations for this problem. We will add noise to the data and seed the random number generator so that the same samples are generated each time the code is run.

In [34]:
# generate two moons dataset
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt  
import pandas as pd

# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)

# scatter plot, dots colored by class value
df = pd.DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue'}
grouped = df.groupby('label')

fig, ax = plt.subplots(figsize=(10,5))

for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key], s=50, alpha=0.5)
plt.show()

This is a good test problem because the classes cannot be separated by a line, i.e. they are not linearly separable, and so require a nonlinear method such as a neural network to address.

We have only generated $100$ samples, which is small for a neural network, providing the opportunity to overfit the training dataset and have higher error on the test dataset: a good case for using regularization. Further, the samples have noise, giving the model an opportunity to learn aspects of the samples that don’t generalize.

Overfit Multilayer Perceptron Model

The model will have one hidden layer with more nodes than may be required to solve this problem, providing an opportunity to overfit. We will also train the model for longer than is required to ensure the model overfits.

Before we define the model, we will split the dataset into train and test sets, using $30$ examples to train the model and $70$ to evaluate the fit model’s performance.

In [35]:
# overfit mlp for the moons dataset
from keras.models import Sequential

# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)

# plot history
plt.plot(history.history['accuracy'], label=f'train, accuracy {round(train_acc, 3)}')
plt.plot(history.history['val_accuracy'], label=f'test, accuracy {round(test_acc, 3)}')
plt.legend()
plt.show()

We can see the expected shape of an overfit model, where test accuracy increases to a point and then begins to decrease again.

MLP Model With Weight Regularization

We can add weight regularization to the hidden layer to reduce the overfitting of the model to the training dataset and improve the performance on the holdout set.

We will use the L2 vector norm, also called weight decay, with a regularization parameter (called alpha or lambda) of $0.001$, chosen arbitrarily.

This can be done by adding the kernel_regularizer argument to the layer and setting it to an instance of l2.

In [36]:
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', kernel_regularizer=l2(0.001)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)

# plot history
plt.plot(history.history['accuracy'], label=f'train, accuracy {round(train_acc, 3)}')
plt.plot(history.history['val_accuracy'], label=f'test, accuracy {round(test_acc, 3)}')
plt.legend()
plt.show()

As expected, we see the learning curve on the test dataset rise and then plateau, indicating that the model may not have overfit the training dataset.

Grid Search Regularization Hyperparameter

Once you can confirm that weight regularization may improve your overfit model, you can test different values of the regularization parameter.

It is good practice to first grid search through some orders of magnitude between $0.0$ and $0.1$ and then, once a promising level is found, to grid search more finely around that level.

We can grid search through the orders of magnitude by defining the values to test, looping through each and recording the train and test performance.

In [37]:
# grid search values
values = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]

all_train, all_test = list(), list()
for param in values:
    
    # define model
    model = Sequential()
    model.add(Dense(500, input_dim=2, activation='relu', kernel_regularizer=l2(param)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    # fit model
    model.fit(trainX, trainy, epochs=4000, verbose=0)
    
    # evaluate the model
    _, train_acc = model.evaluate(trainX, trainy, verbose=0)
    _, test_acc = model.evaluate(testX, testy, verbose=0)
    print('Param: %f, Train: %.3f, Test: %.3f' % (param, train_acc, test_acc))
    all_train.append(train_acc)
    all_test.append(test_acc)
Param: 0.100000, Train: 0.967, Test: 0.829
Param: 0.010000, Train: 1.000, Test: 0.929
Param: 0.001000, Train: 1.000, Test: 0.943
Param: 0.000100, Train: 1.000, Test: 0.929
Param: 0.000010, Train: 1.000, Test: 0.914
Param: 0.000001, Train: 1.000, Test: 0.914
In [38]:
# plot train and test means
plt.semilogx(values, all_train, label='train', marker='o')
plt.semilogx(values, all_test, label='test', marker='o')
plt.xlabel('regularization strength')
plt.ylabel('accuracy')
plt.legend()
plt.show()

The results suggest that $0.01$ or $0.001$ may be sufficient and may provide good bounds for further grid searching.
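As a follow-up, a finer grid could then be searched within those bounds. The sketch below reuses the same training loop with arbitrarily chosen values between $0.001$ and $0.01$; it is illustrative and was not part of the original run.

# hypothetical finer grid search between the bounds suggested above
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2

fine_values = [0.001, 0.002, 0.005, 0.008, 0.01]

for param in fine_values:
    # define model with the candidate regularization strength
    model = Sequential()
    model.add(Dense(500, input_dim=2, activation='relu', kernel_regularizer=l2(param)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    # fit and evaluate exactly as in the coarse search
    model.fit(trainX, trainy, epochs=4000, verbose=0)
    _, train_acc = model.evaluate(trainX, trainy, verbose=0)
    _, test_acc = model.evaluate(testX, testy, verbose=0)
    print('Param: %f, Train: %.3f, Test: %.3f' % (param, train_acc, test_acc))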

Activity Regularization

Deep learning models are capable of automatically learning a rich internal representation from raw input data.

This is called feature or representation learning. Better learned representations, in turn, can lead to better insights into the domain, e.g. via visualization of learned features, and to better predictive models that make use of the learned features.

A problem with learned features is that they can be too specialized to the training data, or overfit, and not generalize well to new examples. Large values in the learned representation can be a sign of the representation being overfit. Activity or representation regularization provides a technique to encourage the learned representations, the output or activation of the hidden layer or layers of the network, to stay small and sparse.

Problem With Learned Features

There is a field of study focused on the efficient and effective automatic learning of features, often investigated by having a network reduce an input to a small learned feature before using a second network to reconstruct the original input from the learned feature. Models of this type are called auto-encoders, or encoder-decoders, and their learned features can be useful to learn more about the domain (e.g. via visualization) and in predictive models.

The learned features, or encoded inputs, must be large enough to capture the salient features of the input but also focused enough to not over-fit the specific examples in the training dataset. As such, there is a tension between the expressiveness and the generalization of the learned features.

Encourage Small Activations

The loss function of the network can be updated to penalize models in proportion to the magnitude of their activation.

This is similar to weight regularization where the loss function is updated to penalize the model in proportion to the magnitude of the weights. The output of a layer is referred to as its activation, as such, this form of penalty or regularization is referred to as activation regularization or activity regularization.
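As a rough sketch of the idea (not how Keras implements it internally), the penalty added to the loss is proportional to the magnitude of the layer's activations; the activation values and lambda below are made up for illustration.

# illustrative calculation of an l1 activity penalty on a layer's output
import numpy as np

activations = np.array([0.0, 0.7, 0.0, 2.3, 0.1])  # hypothetical layer output
lam = 0.001                                         # regularization hyperparameter

activity_penalty = lam * np.sum(np.abs(activations))  # added to the data loss
print(activity_penalty)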

Tips for Using Activation Regularization

Use With Autoencoders and Encoder-Decoders

  • Activity regularization may be best suited to those model types that explicitly seek an efficient learned representation.

  • These include models such as autoencoders (i.e. sparse autoencoders) and encoder-decoder models, such as encoder-decoder LSTMs used for sequence-to-sequence prediction problems.

Use Rectified Linear

  • Unlike classical activation functions such as tanh (hyperbolic tangent function) and sigmoid (logistic function), the relu function allows exact zero values easily. This makes it a good candidate when learning sparse representations, such as with the l1 vector norm activation regularization.

Use an Overcomplete Representation

  • Configure the layer chosen to be the learned features, e.g. the output of the encoder or the bottleneck in the autoencoder, to have more nodes than may be required.

  • This is called an overcomplete representation; on its own it will encourage the network to overfit the training examples, but this can be countered with strong activation regularization that encourages a rich learned representation that is also sparse (see the sketch below).
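A minimal sketch of this idea is shown below, assuming a toy autoencoder whose bottleneck is deliberately wider than the input and is regularized with an l1 activity penalty; the layer sizes and penalty value are illustrative assumptions rather than part of the lesson's examples.

# sketch: overcomplete autoencoder bottleneck with l1 activity regularization
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l1

autoencoder = Sequential()
# encoder: bottleneck wider than the 8-dimensional input, pushed toward sparsity by the penalty
autoencoder.add(Dense(64, input_dim=8, activation='relu', activity_regularizer=l1(0.001)))
# decoder: reconstruct the original input from the learned representation
autoencoder.add(Dense(8, activation='linear'))
autoencoder.compile(loss='mse', optimizer='adam')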

Activity Regularization in Keras

Keras supports activity regularization.

Just like weight regularization it accepts l1, l2 and l1_l2 regularizers.

Activity Regularization on Layers

Activity regularization is specified on a layer in Keras.

This can be achieved by setting the activity_regularizer argument on the layer to an instantiated and configured regularizer class.

The regularizer is applied to the output of the layer, but you have control over what the “output” of the layer actually means. Specifically, you have flexibility as to whether the layer output means that the regularization is applied before or after the activation function.

For example, you can specify the function and the regularization on the layer, in which case activation regularization is applied to the output of the activation function, in this case, rectified linear activation function or ReLU.

In [39]:
Dense(32, activation='relu', activity_regularizer=l1(0.001))
Out[39]:
<keras.layers.core.Dense at 0x7f3b401efcc0>

Alternatively, you can specify a linear activation function (the default, which does not perform any transform), which means the activation regularization is applied to the raw outputs; the activation function can then be added as a subsequent layer.

In [40]:
from keras.layers import Activation

Dense(32, activation='linear', activity_regularizer=l1(0.001))
Activation('relu')
Out[40]:
<keras.layers.core.Activation at 0x7f3ad87b94a8>

The latter is the preferred usage of activation regularization as described in “Deep Sparse Rectifier Neural Networks” in order to allow the model to learn to take activations to a true zero value in conjunction with the rectified linear activation function. Nevertheless, the two possible uses of activation regularization may be explored in order to discover what works best for your specific model and dataset.

MLP Activity Regularization

The example below sets l1 norm activity regularization on a Dense fully connected layer.

In [41]:
Dense(32, activity_regularizer=l1(0.001))
Out[41]:
<keras.layers.core.Dense at 0x7f3ad87b9518>

CNN Activity Regularization

The example below sets l1 norm activity regularization on a Conv2D convolutional layer.

In [42]:
Conv2D(32, (3,3), activity_regularizer=l1(0.001))
Out[42]:
<keras.layers.convolutional.Conv2D at 0x7f3ad8d21f28>

RNN Activity Regularization

The example below sets l1 norm activity regularization on an LSTM recurrent layer.

In [43]:
LSTM(32, activity_regularizer=l1(0.001))
Out[43]:
<keras.layers.recurrent.LSTM at 0x7f3ad87b3a58>

Embedding Layer Activity Regularization

The example below sets l1 norm activity regularization on an embedding layer.

In [44]:
Embedding(10, 5, activity_regularizer=l1(0.001))
Out[44]:
<keras.layers.embeddings.Embedding at 0x7f3ad87b3dd8>

Activity Regularization Binary Classification Example

In this section, we will demonstrate how to use activity regularization to reduce overfitting of an MLP on the same binary classification problem in the previous example.

Although activity regularization is most often used to encourage sparse learned representations in autoencoder and encoder-decoder models, it can also be used directly within normal neural networks to achieve the same effect and improve the generalization of the model.

In [45]:
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='linear', activity_regularizer=l1(0.0001)))
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)

# plot history
plt.plot(history.history['accuracy'], label=f'train, accuracy {round(train_acc, 3)}')
plt.plot(history.history['val_accuracy'], label=f'test, accuracy {round(test_acc, 3)}')
plt.legend()
plt.show()

Model accuracy on both the train and test sets continues to increase to a plateau.

Weight Constraints

Weight constraints provide an approach to reduce overfitting. A weight constraint is an update to the network that checks the size of the weights, and if the size exceeds a predefined limit, the weights are rescaled so that their size is below the limit or within a range.

You can think of a weight constraint as an if-then rule checking the size of the weights while the network is being trained and only coming into effect and making weights small when required. Note, for efficiency, it does not have to be implemented as an if-then rule and often is not.

Unlike adding a penalty to the loss function, a weight constraint ensures the weights of the network are small, instead of merely encouraging them to be small.

There are multiple types of weight constraints, such as maximum and unit vector norms, and some require a hyperparameter that must be configured.

Vector Norm:

  • Calculating the size or length of a vector is often required either directly or as part of a broader vector or vector-matrix operation.

  • The length of the vector is referred to as the vector norm or the vector’s magnitude.
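To illustrate the mechanics, the sketch below computes the L2 norm of a single node's incoming weight vector and rescales the vector when the norm exceeds a maximum, which is roughly what a maximum norm constraint does; the weight values and limit are made up.

# sketch of a max-norm style constraint applied to one node's incoming weights
import numpy as np

w = np.array([2.0, -1.5, 0.5])    # hypothetical incoming weight vector
max_norm_limit = 2.0              # predefined limit on the vector norm

norm = np.sqrt(np.sum(w ** 2))    # L2 norm (the vector's magnitude)
if norm > max_norm_limit:
    w = w * (max_norm_limit / norm)  # rescale so the norm equals the limit

print(w, np.sqrt(np.sum(w ** 2)))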

Weight Constraints in Keras

The Keras API supports weight constraints.

The constraints are specified per-layer, but applied and enforced per-node within the layer.

Using a constraint generally involves setting the kernel_constraint argument on the layer for the input weights and the bias_constraint for the bias weights.

Generally, weight constraints are not used on the bias weights.

A suite of different vector norms can be used as constraints, provided as classes in the keras.constraints module. They are:

  • Maximum norm (max_norm), to force weights to have a magnitude at or below a given limit.
  • Non-negative norm (non_neg), to constrain weights to be non-negative.
  • Unit norm (unit_norm), to force weights to have a magnitude of $1.0$.
  • Min-Max norm (min_max_norm), to force weights to have a magnitude between a range.

For example, a constraint can be imported and instantiated:

In [46]:
# import norm
from keras.constraints import max_norm

# instantiate norm
norm = max_norm(3.0)

Weight Constraints on Layers

The weight norms can be used with most layers in Keras.

MLP Weight Constraint

The example below sets a maximum norm weight constraint on a Dense fully connected layer.

In [47]:
Dense(32, kernel_constraint=max_norm(3), bias_constraint=max_norm(3))
Out[47]:
<keras.layers.core.Dense at 0x7f3ad8e7a208>

CNN Weight Constraint

The example below sets a maximum norm weight constraint on a convolutional layer.

In [48]:
Conv2D(32, (3,3), kernel_constraint=max_norm(3), bias_constraint=max_norm(3))
Out[48]:
<keras.layers.convolutional.Conv2D at 0x7f3ad9f6f860>

RNN Weight Constraint

Unlike other layer types, recurrent layers allow you to set a weight constraint on both the input weights and the bias, as well as on the recurrent weights.

The constraint for the recurrent weights is set via the recurrent_constraint argument to the layer.

The example below sets a maximum norm weight constraint on an LSTM layer.

In [49]:
LSTM(32, kernel_constraint=max_norm(3), recurrent_constraint=max_norm(3), bias_constraint=max_norm(3))
Out[49]:
<keras.layers.recurrent.LSTM at 0x7f3ad9f6ff60>

Embedding Layer Weight Constraint

The example below sets a maximum norm weight constraint on an embedding layer.

In [50]:
Embedding(10, 5, embeddings_constraint=max_norm(3))
Out[50]:
<keras.layers.embeddings.Embedding at 0x7f3b401dcba8>

Weight Constraint Binary Classification Example

In this section, we will demonstrate how to use weight constraints to reduce overfitting of an MLP on the same binary classification problem in the previous example.

There are a few different weight constraints to choose from. A good, simple constraint for this model is to normalize the weights so that the norm is equal to $1.0$.

This constraint has the effect of forcing all incoming weights to be small.

We can do this by using the unit_norm in Keras. This constraint can be added to the first hidden layer.

In [51]:
from keras.constraints import unit_norm

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)

# plot history
plt.plot(history.history['accuracy'], label=f'train, accuracy {round(train_acc, 3)}')
plt.plot(history.history['val_accuracy'], label=f'test, accuracy {round(test_acc, 3)}')
plt.legend()
plt.show()

Model accuracy on both the train and test sets continues to increase to a plateau.

Early Stopping in Keras

A problem with training neural networks is in the choice of the number of training epochs to use.

Too many epochs can lead to overfitting of the training dataset, whereas too few may result in an underfit model. Early stopping is a method that allows you to specify an arbitrarily large number of training epochs and stop training once the model performance stops improving on a holdout validation dataset.

Keras supports the early stopping of training via a callback called EarlyStopping.

This callback allows you to specify the performance measure to monitor and the trigger; once triggered, it will stop the training process.

The EarlyStopping callback is configured when instantiated via arguments.

There are a number of parameters that are specified to the EarlyStopping object.

  • min_delta This value should be kept small. It simply means the minimum change in error to be registered as an improvement. Setting it even smaller will not likely have a great deal of impact.
  • patience How long should the training wait for the validation error to improve?
  • verbose How much progress information do you want?
  • mode In general, always set this to "auto". This specifies whether the monitored quantity should be minimized or maximized, and "auto" infers this from the metric. Consider accuracy, where higher numbers are desired, vs log-loss/RMSE, where lower numbers are desired.
  • restore_best_weights This should always be set to true. This restores the weights to the values from the epoch with the best performance on the validation set. A configuration sketch combining these arguments follows this list.
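Putting these arguments together, a typical configuration might look like the sketch below; the monitored metric and the specific values are illustrative assumptions, not the settings used in the runs that follow.

# illustrative EarlyStopping configuration combining the arguments above
from keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss',         # quantity watched on the validation data
                   min_delta=1e-4,             # smallest change counted as an improvement
                   patience=50,                # epochs to wait after the last improvement
                   verbose=1,                  # report when training is stopped
                   mode='auto',                # infer whether the metric should rise or fall
                   restore_best_weights=True)  # roll back to the best weights seen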

Early Stopping Classification Example

First we build a model without early stopping with a large number of epochs to encourage overfitting with the same dataset used in the previous examples.

In [52]:
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)

# plot history
plt.plot(history.history['loss'], label=f'train loss, accuracy {round(train_acc, 3)}')
plt.plot(history.history['val_loss'], label=f'test loss, accuracy {round(test_acc, 3)}')
plt.legend()
plt.show()

Reviewing the figure, we can also see flat spots in the ups and downs in the validation loss. Any early stopping will have to account for these behaviors. We would also expect that a good time to stop training might be around epoch $800$.

Overfit MLP With Early Stopping

In [53]:
from keras.callbacks import EarlyStopping

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# simple early stopping
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1)

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0, callbacks=[es])

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)

# plot history
plt.plot(history.history['loss'], label=f'train loss, accuracy {round(train_acc, 3)}')
plt.plot(history.history['val_loss'], label=f'test loss, accuracy {round(test_acc, 3)}')
plt.legend()
plt.show()
Epoch 00224: early stopping

Reviewing the line plot of train and test loss, we can indeed see that training was stopped at the point when validation loss began to plateau for the first time.

We can improve the trigger for early stopping by waiting a while before stopping.

This can be achieved by setting the “patience” argument.

In this case, we will wait $200$ epochs before training is stopped. Specifically, this means that we will allow training to continue for up to an additional $200$ epochs after the point that validation loss started to degrade, giving the training process an opportunity to get across flat spots or find some additional improvement.

In [54]:
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# patient early stopping
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=200)

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0, callbacks=[es])

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)

# plot training history
plt.plot(history.history['loss'], label=f'train loss, accuracy {round(train_acc, 3)}')
plt.plot(history.history['val_loss'], label=f'test loss, accuracy {round(test_acc, 3)}')
plt.legend()
plt.show()
Epoch 01078: early stopping

We can also see that test loss started to increase again in the last approximately $100$ epochs.

Model Checkpointing in Keras

Although the performance of the model has improved, we may not have the best performing or most stable model at the end of training. We can address this by using a ModelCheckpoint callback.

There are a number of parameters that are specified to the ModelCheckpoint object.

  • filepath Path to save the model file.
  • save_weights_only If true, then only the model's weights will be saved
  • verbose How much progress information do you want?
  • mode In general, always set this to "auto". This specifies whether the monitored quantity should be minimized or maximized, and "auto" infers this from the metric. Consider accuracy, where higher numbers are desired, vs log-loss/RMSE, where lower numbers are desired.
  • save_best_only This should always be set to true. This means the latest best model according to the quantity monitored will be saved.

In this case, we are interested in saving the model with the best accuracy on the test dataset. We could also seek the model with the best loss on the test dataset, but this may or may not correspond to the model with the best accuracy.

This highlights an important concept in model selection. The notion of the “best” model during training may conflict when evaluated using different performance measures. Try to choose models based on the metric by which they will be evaluated and presented in the domain. In a balanced binary classification problem, this will most likely be classification accuracy. Therefore, we will use accuracy on the validation set in the ModelCheckpoint callback to save the best model observed during training.

In [55]:
from keras.callbacks import ModelCheckpoint
from keras.models import load_model

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# simple early stopping
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=200)
mc = ModelCheckpoint('best_model.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0, callbacks=[es, mc])

# load the saved model
saved_model = load_model('best_model.h5')

# evaluate the model
_, train_acc = saved_model.evaluate(trainX, trainy, verbose=0)
_, test_acc = saved_model.evaluate(testX, testy, verbose=0)
Epoch 00001: val_accuracy improved from -inf to 0.62857, saving model to best_model.h5

Epoch 00002: val_accuracy improved from 0.62857 to 0.64286, saving model to best_model.h5

Epoch 00003: val_accuracy improved from 0.64286 to 0.67143, saving model to best_model.h5

Epoch 00007: val_accuracy improved from 0.67143 to 0.68571, saving model to best_model.h5

Epoch 00014: val_accuracy improved from 0.68571 to 0.71429, saving model to best_model.h5

Epoch 00035: val_accuracy improved from 0.71429 to 0.72857, saving model to best_model.h5

Epoch 00065: val_accuracy improved from 0.72857 to 0.74286, saving model to best_model.h5

Epoch 00102: val_accuracy improved from 0.74286 to 0.75714, saving model to best_model.h5

Epoch 00109: val_accuracy improved from 0.75714 to 0.77143, saving model to best_model.h5

Epoch 00112: val_accuracy improved from 0.77143 to 0.78571, saving model to best_model.h5

Epoch 00118: val_accuracy improved from 0.78571 to 0.80000, saving model to best_model.h5

Epoch 00119: val_accuracy improved from 0.80000 to 0.81429, saving model to best_model.h5

Epoch 00124: val_accuracy improved from 0.81429 to 0.82857, saving model to best_model.h5

Epoch 00282: val_accuracy improved from 0.82857 to 0.84286, saving model to best_model.h5

Epochs 00283-00500: val_accuracy did not improve from 0.84286

Epoch 00501: val_accuracy did not improve from 0.84286

Epoch 00502: val_accuracy did not improve from 0.84286

Epoch 00503: val_accuracy did not improve from 0.84286

Epoch 00504: val_accuracy did not improve from 0.84286

Epoch 00505: val_accuracy did not improve from 0.84286

Epoch 00506: val_accuracy did not improve from 0.84286

Epoch 00507: val_accuracy did not improve from 0.84286

Epoch 00508: val_accuracy did not improve from 0.84286

Epoch 00509: val_accuracy did not improve from 0.84286

Epoch 00510: val_accuracy did not improve from 0.84286

Epoch 00511: val_accuracy did not improve from 0.84286

Epoch 00512: val_accuracy did not improve from 0.84286

Epoch 00513: val_accuracy did not improve from 0.84286

Epoch 00514: val_accuracy did not improve from 0.84286

Epoch 00515: val_accuracy did not improve from 0.84286

Epoch 00516: val_accuracy did not improve from 0.84286

Epoch 00517: val_accuracy did not improve from 0.84286

Epoch 00518: val_accuracy did not improve from 0.84286

Epoch 00519: val_accuracy did not improve from 0.84286

Epoch 00520: val_accuracy did not improve from 0.84286

Epoch 00521: val_accuracy did not improve from 0.84286

Epoch 00522: val_accuracy did not improve from 0.84286

Epoch 00523: val_accuracy did not improve from 0.84286

Epoch 00524: val_accuracy did not improve from 0.84286

Epoch 00525: val_accuracy did not improve from 0.84286

Epoch 00526: val_accuracy did not improve from 0.84286

Epoch 00527: val_accuracy did not improve from 0.84286

Epoch 00528: val_accuracy did not improve from 0.84286

Epoch 00529: val_accuracy did not improve from 0.84286

Epoch 00530: val_accuracy did not improve from 0.84286

Epoch 00531: val_accuracy did not improve from 0.84286

Epoch 00532: val_accuracy did not improve from 0.84286

Epoch 00533: val_accuracy did not improve from 0.84286

Epoch 00534: val_accuracy did not improve from 0.84286

Epoch 00535: val_accuracy did not improve from 0.84286

Epoch 00536: val_accuracy did not improve from 0.84286

Epoch 00537: val_accuracy did not improve from 0.84286

Epoch 00538: val_accuracy did not improve from 0.84286

Epoch 00539: val_accuracy did not improve from 0.84286

Epoch 00540: val_accuracy did not improve from 0.84286

Epoch 00541: val_accuracy did not improve from 0.84286

Epoch 00542: val_accuracy improved from 0.84286 to 0.85714, saving model to best_model.h5

Epoch 00543: val_accuracy did not improve from 0.85714

Epoch 00544: val_accuracy did not improve from 0.85714

Epoch 00545: val_accuracy did not improve from 0.85714

Epoch 00546: val_accuracy did not improve from 0.85714

Epoch 00547: val_accuracy did not improve from 0.85714

Epoch 00548: val_accuracy did not improve from 0.85714

Epoch 00549: val_accuracy did not improve from 0.85714

Epoch 00550: val_accuracy did not improve from 0.85714

Epoch 00551: val_accuracy improved from 0.85714 to 0.87143, saving model to best_model.h5

Epoch 00552: val_accuracy did not improve from 0.87143

Epoch 00553: val_accuracy did not improve from 0.87143

Epoch 00554: val_accuracy did not improve from 0.87143

Epoch 00555: val_accuracy did not improve from 0.87143

Epoch 00556: val_accuracy did not improve from 0.87143

Epoch 00557: val_accuracy improved from 0.87143 to 0.88571, saving model to best_model.h5

Epoch 00558: val_accuracy did not improve from 0.88571

Epoch 00559: val_accuracy did not improve from 0.88571

Epoch 00560: val_accuracy did not improve from 0.88571

Epoch 00561: val_accuracy did not improve from 0.88571

Epoch 00562: val_accuracy did not improve from 0.88571

Epoch 00563: val_accuracy did not improve from 0.88571

Epoch 00564: val_accuracy did not improve from 0.88571

Epoch 00565: val_accuracy did not improve from 0.88571

Epoch 00566: val_accuracy did not improve from 0.88571

Epoch 00567: val_accuracy did not improve from 0.88571

Epoch 00568: val_accuracy did not improve from 0.88571

Epoch 00569: val_accuracy did not improve from 0.88571

Epoch 00570: val_accuracy did not improve from 0.88571

Epoch 00571: val_accuracy did not improve from 0.88571

Epoch 00572: val_accuracy did not improve from 0.88571

Epoch 00573: val_accuracy did not improve from 0.88571

Epoch 00574: val_accuracy did not improve from 0.88571

Epoch 00575: val_accuracy did not improve from 0.88571

Epoch 00576: val_accuracy did not improve from 0.88571

Epoch 00577: val_accuracy did not improve from 0.88571

Epoch 00578: val_accuracy improved from 0.88571 to 0.90000, saving model to best_model.h5

Epoch 00579: val_accuracy did not improve from 0.90000

Epoch 00580: val_accuracy did not improve from 0.90000

Epoch 00581: val_accuracy did not improve from 0.90000

Epoch 00582: val_accuracy did not improve from 0.90000

Epoch 00583: val_accuracy did not improve from 0.90000

Epoch 00584: val_accuracy did not improve from 0.90000

Epoch 00585: val_accuracy did not improve from 0.90000

Epoch 00586: val_accuracy did not improve from 0.90000

Epoch 00587: val_accuracy did not improve from 0.90000

Epoch 00588: val_accuracy did not improve from 0.90000

Epoch 00589: val_accuracy did not improve from 0.90000

Epoch 00590: val_accuracy did not improve from 0.90000

Epoch 00591: val_accuracy did not improve from 0.90000

Epoch 00592: val_accuracy did not improve from 0.90000

Epoch 00593: val_accuracy did not improve from 0.90000

Epoch 00594: val_accuracy did not improve from 0.90000

Epoch 00595: val_accuracy did not improve from 0.90000

Epoch 00596: val_accuracy did not improve from 0.90000

Epoch 00597: val_accuracy did not improve from 0.90000

Epoch 00598: val_accuracy did not improve from 0.90000

Epoch 00599: val_accuracy did not improve from 0.90000

Epoch 00600: val_accuracy did not improve from 0.90000

Epoch 00601: val_accuracy did not improve from 0.90000

Epoch 00602: val_accuracy did not improve from 0.90000

Epoch 00603: val_accuracy did not improve from 0.90000

Epoch 00604: val_accuracy did not improve from 0.90000

Epoch 00605: val_accuracy did not improve from 0.90000

Epoch 00606: val_accuracy did not improve from 0.90000

Epoch 00607: val_accuracy did not improve from 0.90000

Epoch 00608: val_accuracy did not improve from 0.90000

Epoch 00609: val_accuracy did not improve from 0.90000

Epoch 00610: val_accuracy did not improve from 0.90000

Epoch 00611: val_accuracy did not improve from 0.90000

Epoch 00612: val_accuracy did not improve from 0.90000

Epoch 00613: val_accuracy did not improve from 0.90000

Epoch 00614: val_accuracy did not improve from 0.90000

Epoch 00615: val_accuracy did not improve from 0.90000

Epoch 00616: val_accuracy did not improve from 0.90000

Epoch 00617: val_accuracy did not improve from 0.90000

Epoch 00618: val_accuracy did not improve from 0.90000

Epoch 00619: val_accuracy did not improve from 0.90000

Epoch 00620: val_accuracy did not improve from 0.90000

Epoch 00621: val_accuracy did not improve from 0.90000

Epoch 00622: val_accuracy did not improve from 0.90000

Epoch 00623: val_accuracy did not improve from 0.90000

Epoch 00624: val_accuracy did not improve from 0.90000

Epoch 00625: val_accuracy did not improve from 0.90000

Epoch 00626: val_accuracy did not improve from 0.90000

Epoch 00627: val_accuracy did not improve from 0.90000

Epoch 00628: val_accuracy did not improve from 0.90000

Epoch 00629: val_accuracy did not improve from 0.90000

Epoch 00630: val_accuracy did not improve from 0.90000

Epoch 00631: val_accuracy did not improve from 0.90000

Epoch 00632: val_accuracy did not improve from 0.90000

Epoch 00633: val_accuracy improved from 0.90000 to 0.91429, saving model to best_model.h5

Epoch 00634: val_accuracy improved from 0.91429 to 0.92857, saving model to best_model.h5

Epoch 00635: val_accuracy did not improve from 0.92857

Epoch 00636: val_accuracy did not improve from 0.92857

Epoch 00637: val_accuracy did not improve from 0.92857

Epoch 00638: val_accuracy did not improve from 0.92857

Epoch 00639: val_accuracy did not improve from 0.92857

Epoch 00640: val_accuracy did not improve from 0.92857

Epoch 00641: val_accuracy did not improve from 0.92857

Epoch 00642: val_accuracy did not improve from 0.92857

Epoch 00643: val_accuracy did not improve from 0.92857

Epoch 00644: val_accuracy did not improve from 0.92857

Epoch 00645: val_accuracy did not improve from 0.92857

Epoch 00646: val_accuracy did not improve from 0.92857

Epoch 00647: val_accuracy did not improve from 0.92857

Epoch 00648: val_accuracy did not improve from 0.92857

Epoch 00649: val_accuracy did not improve from 0.92857

Epoch 00650: val_accuracy did not improve from 0.92857

Epoch 00651: val_accuracy did not improve from 0.92857

Epoch 00652: val_accuracy did not improve from 0.92857

Epoch 00653: val_accuracy did not improve from 0.92857

Epoch 00654: val_accuracy did not improve from 0.92857

Epoch 00655: val_accuracy did not improve from 0.92857

Epoch 00656: val_accuracy did not improve from 0.92857

Epoch 00657: val_accuracy did not improve from 0.92857

Epoch 00658: val_accuracy did not improve from 0.92857

Epoch 00659: val_accuracy did not improve from 0.92857

Epoch 00660: val_accuracy did not improve from 0.92857

Epoch 00661: val_accuracy did not improve from 0.92857

Epoch 00662: val_accuracy did not improve from 0.92857

Epoch 00663: val_accuracy did not improve from 0.92857

Epoch 00664: val_accuracy did not improve from 0.92857

Epoch 00665: val_accuracy did not improve from 0.92857

Epoch 00666: val_accuracy did not improve from 0.92857

Epoch 00667: val_accuracy did not improve from 0.92857

Epoch 00668: val_accuracy did not improve from 0.92857

Epoch 00669: val_accuracy did not improve from 0.92857

Epoch 00670: val_accuracy did not improve from 0.92857

Epoch 00671: val_accuracy did not improve from 0.92857

Epoch 00672: val_accuracy did not improve from 0.92857

Epoch 00673: val_accuracy did not improve from 0.92857

Epoch 00674: val_accuracy did not improve from 0.92857

Epoch 00675: val_accuracy did not improve from 0.92857

Epoch 00676: val_accuracy did not improve from 0.92857

Epoch 00677: val_accuracy did not improve from 0.92857

Epoch 00678: val_accuracy did not improve from 0.92857

Epoch 00679: val_accuracy did not improve from 0.92857

Epoch 00680: val_accuracy did not improve from 0.92857

Epoch 00681: val_accuracy did not improve from 0.92857

Epoch 00682: val_accuracy did not improve from 0.92857

Epoch 00683: val_accuracy did not improve from 0.92857

Epoch 00684: val_accuracy did not improve from 0.92857

Epoch 00685: val_accuracy did not improve from 0.92857

Epoch 00686: val_accuracy did not improve from 0.92857

Epoch 00687: val_accuracy did not improve from 0.92857

Epoch 00688: val_accuracy did not improve from 0.92857

Epoch 00689: val_accuracy did not improve from 0.92857

Epoch 00690: val_accuracy did not improve from 0.92857

Epoch 00691: val_accuracy did not improve from 0.92857

Epoch 00692: val_accuracy did not improve from 0.92857

Epoch 00693: val_accuracy did not improve from 0.92857

Epoch 00694: val_accuracy did not improve from 0.92857

Epoch 00695: val_accuracy did not improve from 0.92857

Epoch 00696: val_accuracy did not improve from 0.92857

Epoch 00697: val_accuracy did not improve from 0.92857

Epoch 00698: val_accuracy did not improve from 0.92857

Epoch 00699: val_accuracy did not improve from 0.92857

Epoch 00700: val_accuracy did not improve from 0.92857

Epoch 00701: val_accuracy did not improve from 0.92857

Epoch 00702: val_accuracy did not improve from 0.92857

Epoch 00703: val_accuracy did not improve from 0.92857

Epoch 00704: val_accuracy did not improve from 0.92857

Epoch 00705: val_accuracy did not improve from 0.92857

Epoch 00706: val_accuracy did not improve from 0.92857

Epoch 00707: val_accuracy did not improve from 0.92857

Epoch 00708: val_accuracy did not improve from 0.92857

Epoch 00709: val_accuracy did not improve from 0.92857

Epoch 00710: val_accuracy did not improve from 0.92857

Epoch 00711: val_accuracy did not improve from 0.92857

Epoch 00712: val_accuracy did not improve from 0.92857

Epoch 00713: val_accuracy did not improve from 0.92857

Epoch 00714: val_accuracy did not improve from 0.92857

Epoch 00715: val_accuracy did not improve from 0.92857

Epoch 00716: val_accuracy did not improve from 0.92857

Epoch 00717: val_accuracy did not improve from 0.92857

Epoch 00718: val_accuracy did not improve from 0.92857

Epoch 00719: val_accuracy did not improve from 0.92857

Epoch 00720: val_accuracy did not improve from 0.92857

Epoch 00721: val_accuracy did not improve from 0.92857

Epoch 00722: val_accuracy did not improve from 0.92857

Epoch 00723: val_accuracy did not improve from 0.92857

Epoch 00724: val_accuracy did not improve from 0.92857

Epoch 00725: val_accuracy did not improve from 0.92857

Epoch 00726: val_accuracy did not improve from 0.92857

Epoch 00727: val_accuracy did not improve from 0.92857

Epoch 00728: val_accuracy did not improve from 0.92857

Epoch 00729: val_accuracy did not improve from 0.92857

Epoch 00730: val_accuracy did not improve from 0.92857

Epoch 00731: val_accuracy did not improve from 0.92857

Epoch 00732: val_accuracy did not improve from 0.92857

Epoch 00733: val_accuracy did not improve from 0.92857

Epoch 00734: val_accuracy did not improve from 0.92857

Epoch 00735: val_accuracy did not improve from 0.92857

Epoch 00736: val_accuracy did not improve from 0.92857

Epoch 00737: val_accuracy did not improve from 0.92857

Epoch 00738: val_accuracy did not improve from 0.92857

Epoch 00739: val_accuracy did not improve from 0.92857

Epoch 00740: val_accuracy did not improve from 0.92857

Epoch 00741: val_accuracy did not improve from 0.92857

Epoch 00742: val_accuracy did not improve from 0.92857

Epoch 00743: val_accuracy did not improve from 0.92857

Epoch 00744: val_accuracy did not improve from 0.92857

Epoch 00745: val_accuracy did not improve from 0.92857

Epoch 00746: val_accuracy did not improve from 0.92857

Epoch 00747: val_accuracy did not improve from 0.92857

Epoch 00748: val_accuracy did not improve from 0.92857

Epoch 00749: val_accuracy did not improve from 0.92857

Epoch 00750: val_accuracy did not improve from 0.92857

Epoch 00751: val_accuracy did not improve from 0.92857

Epoch 00752: val_accuracy did not improve from 0.92857

Epoch 00753: val_accuracy did not improve from 0.92857

Epoch 00754: val_accuracy did not improve from 0.92857

Epoch 00755: val_accuracy did not improve from 0.92857

Epoch 00756: val_accuracy did not improve from 0.92857

Epoch 00757: val_accuracy did not improve from 0.92857

Epoch 00758: val_accuracy did not improve from 0.92857

Epoch 00759: val_accuracy did not improve from 0.92857

Epoch 00760: val_accuracy did not improve from 0.92857

Epoch 00761: val_accuracy did not improve from 0.92857

Epoch 00762: val_accuracy did not improve from 0.92857

Epoch 00763: val_accuracy did not improve from 0.92857

Epoch 00764: val_accuracy did not improve from 0.92857

Epoch 00765: val_accuracy did not improve from 0.92857

Epoch 00766: val_accuracy did not improve from 0.92857

Epoch 00767: val_accuracy did not improve from 0.92857

Epoch 00768: val_accuracy did not improve from 0.92857

Epoch 00769: val_accuracy did not improve from 0.92857

Epoch 00770: val_accuracy did not improve from 0.92857

Epoch 00771: val_accuracy did not improve from 0.92857

Epoch 00772: val_accuracy did not improve from 0.92857

Epoch 00773: val_accuracy did not improve from 0.92857

Epoch 00774: val_accuracy did not improve from 0.92857

Epoch 00775: val_accuracy did not improve from 0.92857

Epoch 00776: val_accuracy did not improve from 0.92857

Epoch 00777: val_accuracy did not improve from 0.92857

Epoch 00778: val_accuracy did not improve from 0.92857

Epoch 00779: val_accuracy did not improve from 0.92857

Epoch 00780: val_accuracy did not improve from 0.92857

Epoch 00781: val_accuracy did not improve from 0.92857

Epoch 00782: val_accuracy did not improve from 0.92857

Epoch 00783: val_accuracy did not improve from 0.92857

Epoch 00784: val_accuracy did not improve from 0.92857

Epoch 00785: val_accuracy did not improve from 0.92857

Epoch 00786: val_accuracy did not improve from 0.92857

Epoch 00787: val_accuracy did not improve from 0.92857

Epoch 00788: val_accuracy did not improve from 0.92857

Epoch 00789: val_accuracy did not improve from 0.92857

Epoch 00790: val_accuracy did not improve from 0.92857

Epoch 00791: val_accuracy did not improve from 0.92857

Epoch 00792: val_accuracy did not improve from 0.92857

Epoch 00793: val_accuracy did not improve from 0.92857

Epoch 00794: val_accuracy did not improve from 0.92857

Epoch 00795: val_accuracy did not improve from 0.92857

Epoch 00796: val_accuracy did not improve from 0.92857

Epoch 00797: val_accuracy did not improve from 0.92857

Epoch 00798: val_accuracy did not improve from 0.92857

Epoch 00799: val_accuracy did not improve from 0.92857

Epoch 00800: val_accuracy did not improve from 0.92857

Epoch 00801: val_accuracy did not improve from 0.92857

Epoch 00802: val_accuracy did not improve from 0.92857

Epoch 00803: val_accuracy did not improve from 0.92857

Epoch 00804: val_accuracy did not improve from 0.92857

Epoch 00805: val_accuracy did not improve from 0.92857

Epoch 00806: val_accuracy did not improve from 0.92857

Epoch 00807: val_accuracy did not improve from 0.92857

Epoch 00808: val_accuracy did not improve from 0.92857

Epoch 00809: val_accuracy did not improve from 0.92857

Epoch 00810: val_accuracy did not improve from 0.92857

Epoch 00811: val_accuracy did not improve from 0.92857

Epoch 00812: val_accuracy did not improve from 0.92857

Epoch 00813: val_accuracy did not improve from 0.92857

Epoch 00814: val_accuracy did not improve from 0.92857

Epoch 00815: val_accuracy did not improve from 0.92857

Epoch 00816: val_accuracy did not improve from 0.92857

Epoch 00817: val_accuracy did not improve from 0.92857

Epoch 00818: val_accuracy did not improve from 0.92857

Epoch 00819: val_accuracy did not improve from 0.92857

Epoch 00820: val_accuracy did not improve from 0.92857

Epoch 00821: val_accuracy did not improve from 0.92857

Epoch 00822: val_accuracy did not improve from 0.92857

Epoch 00823: val_accuracy did not improve from 0.92857

Epoch 00824: val_accuracy did not improve from 0.92857

Epoch 00825: val_accuracy did not improve from 0.92857

Epoch 00826: val_accuracy did not improve from 0.92857

Epoch 00827: val_accuracy did not improve from 0.92857

Epoch 00828: val_accuracy did not improve from 0.92857

Epoch 00829: val_accuracy did not improve from 0.92857

Epoch 00830: val_accuracy did not improve from 0.92857

Epoch 00831: val_accuracy did not improve from 0.92857

Epoch 00832: val_accuracy did not improve from 0.92857

Epoch 00833: val_accuracy did not improve from 0.92857

Epoch 00834: val_accuracy did not improve from 0.92857

Epoch 00835: val_accuracy did not improve from 0.92857

Epoch 00836: val_accuracy did not improve from 0.92857

Epoch 00837: val_accuracy did not improve from 0.92857

Epoch 00838: val_accuracy did not improve from 0.92857

Epoch 00839: val_accuracy did not improve from 0.92857

Epoch 00840: val_accuracy did not improve from 0.92857

Epoch 00841: val_accuracy did not improve from 0.92857

Epoch 00842: val_accuracy did not improve from 0.92857

Epoch 00843: val_accuracy did not improve from 0.92857

Epoch 00844: val_accuracy did not improve from 0.92857

Epoch 00845: val_accuracy did not improve from 0.92857

Epoch 00846: val_accuracy did not improve from 0.92857

Epoch 00847: val_accuracy did not improve from 0.92857

Epoch 00848: val_accuracy did not improve from 0.92857

Epoch 00849: val_accuracy did not improve from 0.92857

Epoch 00850: val_accuracy did not improve from 0.92857

Epoch 00851: val_accuracy did not improve from 0.92857

Epoch 00852: val_accuracy did not improve from 0.92857

Epoch 00853: val_accuracy did not improve from 0.92857

Epoch 00854: val_accuracy did not improve from 0.92857

Epoch 00855: val_accuracy did not improve from 0.92857

Epoch 00856: val_accuracy did not improve from 0.92857

Epoch 00857: val_accuracy did not improve from 0.92857

Epoch 00858: val_accuracy did not improve from 0.92857

Epoch 00859: val_accuracy did not improve from 0.92857

Epoch 00860: val_accuracy did not improve from 0.92857

Epoch 00861: val_accuracy did not improve from 0.92857

Epoch 00862: val_accuracy did not improve from 0.92857

Epoch 00863: val_accuracy did not improve from 0.92857

Epoch 00864: val_accuracy did not improve from 0.92857

Epoch 00865: val_accuracy improved from 0.92857 to 0.94286, saving model to best_model.h5

Epoch 00866: val_accuracy did not improve from 0.94286

Epoch 00867: val_accuracy did not improve from 0.94286

Epoch 00868: val_accuracy did not improve from 0.94286

Epoch 00869: val_accuracy did not improve from 0.94286

Epoch 00870: val_accuracy did not improve from 0.94286

Epoch 00871: val_accuracy did not improve from 0.94286

Epoch 00872: val_accuracy did not improve from 0.94286

Epoch 00873: val_accuracy did not improve from 0.94286

Epoch 00874: val_accuracy did not improve from 0.94286

Epoch 00875: val_accuracy did not improve from 0.94286

Epoch 00876: val_accuracy did not improve from 0.94286

Epoch 00877: val_accuracy did not improve from 0.94286

Epoch 00878: val_accuracy did not improve from 0.94286

Epoch 00879: val_accuracy did not improve from 0.94286

Epoch 00880: val_accuracy did not improve from 0.94286

Epoch 00881: val_accuracy did not improve from 0.94286

Epoch 00882: val_accuracy did not improve from 0.94286

Epoch 00883: val_accuracy did not improve from 0.94286

Epoch 00884: val_accuracy did not improve from 0.94286

Epoch 00885: val_accuracy did not improve from 0.94286

Epoch 00886: val_accuracy did not improve from 0.94286

Epoch 00887: val_accuracy did not improve from 0.94286

Epoch 00888: val_accuracy did not improve from 0.94286

Epoch 00889: val_accuracy did not improve from 0.94286

Epoch 00890: val_accuracy did not improve from 0.94286

Epoch 00891: val_accuracy did not improve from 0.94286

Epoch 00892: val_accuracy did not improve from 0.94286

Epoch 00893: val_accuracy did not improve from 0.94286

Epoch 00894: val_accuracy did not improve from 0.94286

Epoch 00895: val_accuracy did not improve from 0.94286

Epoch 00896: val_accuracy did not improve from 0.94286

Epoch 00897: val_accuracy did not improve from 0.94286

Epoch 00898: val_accuracy did not improve from 0.94286

Epoch 00899: val_accuracy did not improve from 0.94286

Epoch 00900: val_accuracy did not improve from 0.94286

Epoch 00901: val_accuracy did not improve from 0.94286

Epoch 00902: val_accuracy did not improve from 0.94286

Epoch 00903: val_accuracy did not improve from 0.94286

Epoch 00904: val_accuracy did not improve from 0.94286

Epoch 00905: val_accuracy did not improve from 0.94286

Epoch 00906: val_accuracy did not improve from 0.94286

Epoch 00907: val_accuracy did not improve from 0.94286

Epoch 00908: val_accuracy did not improve from 0.94286

Epoch 00909: val_accuracy did not improve from 0.94286

Epoch 00910: val_accuracy did not improve from 0.94286

Epoch 00911: val_accuracy did not improve from 0.94286

Epoch 00912: val_accuracy did not improve from 0.94286

Epoch 00913: val_accuracy did not improve from 0.94286

Epoch 00914: val_accuracy did not improve from 0.94286

Epoch 00915: val_accuracy did not improve from 0.94286

Epoch 00916: val_accuracy did not improve from 0.94286

Epoch 00917: val_accuracy did not improve from 0.94286

Epoch 00918: val_accuracy did not improve from 0.94286

Epoch 00919: val_accuracy did not improve from 0.94286

Epoch 00920: val_accuracy did not improve from 0.94286

Epoch 00921: val_accuracy did not improve from 0.94286

Epoch 00922: val_accuracy did not improve from 0.94286

Epoch 00923: val_accuracy did not improve from 0.94286

Epoch 00924: val_accuracy did not improve from 0.94286

Epoch 00925: val_accuracy did not improve from 0.94286

Epoch 00926: val_accuracy did not improve from 0.94286

Epoch 00927: val_accuracy did not improve from 0.94286

Epoch 00928: val_accuracy did not improve from 0.94286

Epoch 00929: val_accuracy did not improve from 0.94286

Epoch 00930: val_accuracy did not improve from 0.94286

Epoch 00931: val_accuracy did not improve from 0.94286

Epoch 00932: val_accuracy did not improve from 0.94286

Epoch 00933: val_accuracy did not improve from 0.94286

Epoch 00934: val_accuracy did not improve from 0.94286

Epoch 00935: val_accuracy did not improve from 0.94286

Epoch 00936: val_accuracy did not improve from 0.94286

Epoch 00937: val_accuracy did not improve from 0.94286

Epoch 00938: val_accuracy did not improve from 0.94286

Epoch 00939: val_accuracy did not improve from 0.94286

Epoch 00940: val_accuracy did not improve from 0.94286

Epoch 00941: val_accuracy did not improve from 0.94286

Epoch 00942: val_accuracy did not improve from 0.94286

Epoch 00943: val_accuracy did not improve from 0.94286

Epoch 00944: val_accuracy did not improve from 0.94286

Epoch 00945: val_accuracy did not improve from 0.94286

Epoch 00946: val_accuracy did not improve from 0.94286

Epoch 00947: val_accuracy did not improve from 0.94286

Epoch 00948: val_accuracy did not improve from 0.94286

Epoch 00949: val_accuracy did not improve from 0.94286

Epoch 00950: val_accuracy did not improve from 0.94286

Epoch 00951: val_accuracy did not improve from 0.94286

Epoch 00952: val_accuracy did not improve from 0.94286

Epoch 00953: val_accuracy did not improve from 0.94286

Epoch 00954: val_accuracy did not improve from 0.94286

Epoch 00955: val_accuracy did not improve from 0.94286

Epoch 00956: val_accuracy did not improve from 0.94286

Epoch 00957: val_accuracy did not improve from 0.94286

Epoch 00958: val_accuracy did not improve from 0.94286

Epoch 00959: val_accuracy did not improve from 0.94286

Epoch 00960: val_accuracy did not improve from 0.94286

Epoch 00961: val_accuracy did not improve from 0.94286

Epoch 00962: val_accuracy did not improve from 0.94286

Epoch 00963: val_accuracy did not improve from 0.94286

Epoch 00964: val_accuracy did not improve from 0.94286

Epoch 00965: val_accuracy did not improve from 0.94286

Epoch 00966: val_accuracy did not improve from 0.94286

Epoch 00967: val_accuracy did not improve from 0.94286

Epoch 00968: val_accuracy did not improve from 0.94286

Epoch 00969: val_accuracy did not improve from 0.94286

Epoch 00970: val_accuracy did not improve from 0.94286

Epoch 00971: val_accuracy did not improve from 0.94286

Epoch 00972: val_accuracy did not improve from 0.94286

Epoch 00973: val_accuracy did not improve from 0.94286

Epoch 00974: val_accuracy did not improve from 0.94286

Epoch 00975: val_accuracy did not improve from 0.94286

Epoch 00976: val_accuracy did not improve from 0.94286

Epoch 00977: val_accuracy did not improve from 0.94286

Epoch 00978: val_accuracy did not improve from 0.94286

Epoch 00979: val_accuracy did not improve from 0.94286

Epoch 00980: val_accuracy did not improve from 0.94286

Epoch 00981: val_accuracy did not improve from 0.94286

Epoch 00982: val_accuracy did not improve from 0.94286

Epoch 00983: val_accuracy did not improve from 0.94286

Epoch 00984: val_accuracy did not improve from 0.94286

Epoch 00985: val_accuracy did not improve from 0.94286

Epoch 00986: val_accuracy did not improve from 0.94286

Epoch 00987: val_accuracy did not improve from 0.94286

Epoch 00988: val_accuracy did not improve from 0.94286

Epoch 00989: val_accuracy did not improve from 0.94286

Epoch 00990: val_accuracy did not improve from 0.94286

Epoch 00991: val_accuracy did not improve from 0.94286

Epoch 00992: val_accuracy did not improve from 0.94286

Epoch 00993: val_accuracy did not improve from 0.94286

Epoch 00994: val_accuracy did not improve from 0.94286

Epoch 00995: val_accuracy did not improve from 0.94286

Epoch 00996: val_accuracy did not improve from 0.94286

Epoch 00997: val_accuracy did not improve from 0.94286
Epoch 00997: early stopping
In [56]:
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
Train: 1.000, Test: 0.943

In this case, we do not see any further improvement in model accuracy on the test dataset. Nevertheless, checkpointing the best model during training, rather than relying on whatever weights the network holds when training stops, remains good practice.
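
Because the checkpoint callback writes the best-performing weights to best_model.h5, that model can always be recovered later, no matter where early stopping halted training. The cell below is a minimal sketch of reloading and evaluating it; trainX, trainy, testX and testy are assumed to be the same splits used to fit the model, and the model is assumed to have been compiled with metrics=['accuracy'].

In [ ]:
# a minimal sketch: recover the checkpointed model and evaluate it
# (trainX, trainy, testX, testy are assumed to be the splits used to fit the model,
#  and the model is assumed to have been compiled with metrics=['accuracy'])
from keras.models import load_model

saved_model = load_model('best_model.h5')

_, train_acc = saved_model.evaluate(trainX, trainy, verbose=0)
_, test_acc = saved_model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))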

Why not monitor validation accuracy for early stopping?

The main reason is that accuracy is a coarse measure of model performance during training; for classification problems, the validation loss provides a more fine-grained signal for early stopping. For regression problems, the same measure, such as mean squared error, can be used for both early stopping and model checkpointing.
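
In Keras this amounts to giving the two callbacks different monitor arguments: early stopping watches the validation loss, while the checkpoint keeps the weights with the best validation accuracy. The cell below is a sketch under the assumption that model, trainX, trainy, testX and testy are defined as above; the patience value is illustrative.

In [ ]:
# a minimal sketch of the two callbacks monitoring different metrics
# (model, trainX, trainy, testX, testy are assumed to be defined as above;
#  the patience value is illustrative)
from keras.callbacks import EarlyStopping, ModelCheckpoint

# stop training when the finer-grained validation loss stops improving
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=200)

# but keep the weights from the epoch with the best validation accuracy
mc = ModelCheckpoint('best_model.h5', monitor='val_accuracy', mode='max',
                     verbose=1, save_best_only=True)

history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=1000, verbose=0, callbacks=[es, mc])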
