Image Classifier Using Tensorflow and Keras

Image classification refers to the task of assigning an input image one label from a fixed set of categories. This is one of the core problems in Computer Vision that, despite its simplicity, has a large variety of practical applications.

In this post we will learn how one of the most influential innovations in the field of computer vision (Convolutional Neural Network) can understand a natural image well-enough to solve image classification problem.

We will build an image classifiers from scratch, starting from JPEG images files on disk, without leveraging pre-trained weights or a pre-made keras application model, you can find all the code in my Github Repository

The Convolutional Classifier

A Convolutional Network used for image classification has two parts: a convolutional base and a dense head.

The convolutional classifier

The base is used to extract the features from an image. It is formed primarily of layers performing the convolution operation, but often includes other kinds of layers as well.

feature could be a line, a color, a texture, a shape, a pattern or some complicated combination.

The head is used to determinate the class of the image. It is formed primarily of dense layers, but might include other layers like dropout.

Example of the whole process

Setup and Data preparation

Import TensorFlow and other libraries

import matplotlib.pyplot as plt
import numpy as np

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

Download the data

We are using a custom dataset created for this post to classificate Identification Cards, we will clone the dataset repository that contains 3 sub-directories inside training folder, one per class.

!git clone https://github.com/edinsonjim/ids-dataset.git
ids-dataset
├── readme.md
├── test
   ├── 0001.jpeg
   ├── 0002.jpg
   ├── 0003.jpg
   ├── 0004.jpg
   └── 0005.png
└── training
    ├── italy
   ├── 0000.jpg
   ├── 0001.jpg
   ├── 0003.jpg
   ├── 0004.jpg
   ├── 0005.jpg
   ├── 0006.jpg
   ├── 0007.jpg
   └── 0008.jpg
    ├── peru
   ├── 0001.jpg
   ├── 0002.jpg
   ├── 0003.jpg
   ├── 0004.jpg
   ├── 0005.jpg
   ├── 0006.jpg
   ├── 0007.jpg
   ├── 0008.jpg
   ├── 0009.jpg
   ├── 0010.jpg
   └── 0011.jpg
    └── spain
        ├── 0001.jpg
        ├── 0002.jpg
        ├── 0003.jpg
        ├── 0004.jpg
        ├── 0005.jpg
        └── 0006.jpg

Generate a dataset

After clone the repository, we can load the dataset using the helpful image_dataset_from_directory utility and create our training and validation dataset.

Let’s define some parameters for the loader.

WORK_DIR = "./ids-dataset/training"

IMAGE_HEIGHT = 180
IMAGE_WIDTH = 180

IMAGE_SIZE = (IMAGE_HEIGHT, IMAGE_WIDTH)
BATCH_SIZE = 10
  • WORK_DIR: Path to training data.
  • IMAGE_SIZE: Size to resize images to after they are read from disk.
  • BATCH_SIZE: Number of samples that will be passed through to the network at one time.

Training dataset

Machine Learning algorithms require training data to achieve an objective. The algorithm will analyze this training dataset, classify the inputs and outputs. These operations allow the model to learn the relationship between the data.

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    WORK_DIR,
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
)
Found 25 files belonging to 3 classes.
Using 20 files for training.

Validation dataset

A trained algorithm will essentially memorize all of the inputs and outputs in a training dataset, this becomes a problem when it need to consider data from other sources.

Here is where validation data is useful. Validation data provides an initial check that the model can return useful predictions in a real-world setting.

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    WORK_DIR,
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
)
Found 25 files belonging to 3 classes.
Using 5 files for validation.

It’s a good practice to use a validation split when developing your model. Let’s use 80% of the images for training, and 20% for validation.

We can find the class names in the class_names attribute on these datasets, that correspond to the directory names in alphabetical order.

class_names = train_ds.class_names
print(class_names)
['italy', 'peru', 'spain']

Visualize the data

Let’s visualize the first images in the training dataset.

plt.figure(figsize=(10, 10))

for images, labels in train_ds.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

First images in the training dataset

Configure the dataset for performance

Let’s make sure to use buffered prefetching so we can yield data from disk without having I/O becoming blocking. These are two important methods you should use when loading data.

Dataset.cache() keeps the images in memory after they’re loaded off disk during the first each. This will ensure the dataset does not become a bottleneck while training your model. If your dataset is too large to fit into memory, you can also use this method to create a performant on-disk cache.

Dataset.prefetch() overlaps data preprocessing and model execution while training.

If you are interested, you can learn more about how to cache data in the data performance guide.

AUTOTUNE = tf.data.AUTOTUNE

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

Data augmentation

When you don’t have a large image dataset, it’s a good practice to artificially introduce sample diversity by applying random yet realistic transformations to the training images, such as random horizontal flipping or small random rotations. This helps expose the model to different aspects of the training data while slowing down overfitting.

COLOR_CHANNELS = 3 # 3 channels (RGB)

data_augmentation = keras.Sequential(
    [
     layers.experimental.preprocessing.RandomFlip("horizontal",
        input_shape=(IMAGE_HEIGHT,
                    IMAGE_WIDTH,
                    COLOR_CHANNELS)),
     layers.experimental.preprocessing.RandomRotation(0.01),
     layers.experimental.preprocessing.RandomZoom(0.01)
    ]
)
  • input_shape: Initial tensor that we send to the first hidden layer. This tensor should have the same shape as our training data.

Let’s visualize what the augmented samples looks like, by applying data_augmentation repeatedly to the first images in the dataset.

plt.figure(figsize=(10, 10))

for images, _ in train_ds.take(1):
  for i in range(9):
    augmented_images = data_augmentation(images)
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(augmented_images[0].numpy().astype("uint8"))
    plt.axis("off")

First images in the dataset with data augmentation

Standardizing the data

Our image are already in a standard size (180x180), as they are being yielded as contiguos float32 batches by our dataset. However, their RGB channel values are in the [0, 255] range. This is not ideal for a neural network; in general you should seek to make your input values small. Here, we will standardize values to be in the [0, 1] by using a Rescaling layer at the start of our model.

normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)

Build a model

Now we’ll define a small model. See how our model consist of six blocks of Conv2D and MaxPool2D layers(the base) followed by a head of Dense layers.

Model architecture diagram

Notice in this definition how the number of filters doubled block by block: 8, 16, …, 512. This is a common pattern. Since the MaxPool2D layer is reducing the size of the feature maps, we can afford to increase the quantity if we wish.

We have three ways to create Keras models, for this post we use the Sequential model, wich is limited to single-input and single-output layer stacks. To know other methods, you can visit Keras Models API.

num_classes = len(class_names)

model = Sequential([
    # preprocessing
    data_augmentation,
    normalization_layer,

    # first convolutional block
    layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu'),
    layers.MaxPool2D(),

    # second convolutional block
    layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu'),
    layers.MaxPool2D(),

    # third convolutional block
    layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'),
    layers.MaxPool2D(),

    # fourth convolutional block
    layers.Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'),
    layers.MaxPool2D(),

    # fifth convolutional block
    layers.Conv2D(filters=128, kernel_size=3, padding='same', activation='relu'),
    layers.MaxPool2D(),

    # sixth convolutional block
    layers.Conv2D(filters=256, kernel_size=3, padding='same', activation='relu'),
    layers.MaxPool2D(),
    layers.Dropout(rate=0.2), # apply 20% dropout to the next layer to prevent overfitting

    # classifier Head
    layers.Flatten(),
    layers.Dense(units=512, activation='relu'),
    layers.Dense(units=num_classes)
])

Note that:

  • We start the model with the data_augmentation preprocesor, followed by a normalization layer previously detailed.
  • We include a Dropout layer before the final classification layer, used to simulate different network architectures by randomly removing nodes during training, helping to prevent overfitting.

Compile the model

Choose the optimizers.Adam optimizer and losses.SparseCategoricalCrossentropy loss function. To view training and validation accuracy for each training epoch, pass the metrics argument.

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
  • optimizers.Adam: Is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments. Used to perform optimization and is one of the best optimizer at preset. You can read this paper for more details.
  • losses.SparseCategoricalCrossentropy: Use this function when there are two or more label classes. We expect labels to be provided as integers.

Model summary

If you want to view all the layers of the network, you can use the model’s summary method.

model.summary()
Model: "sequential_95"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
sequential_90 (Sequential)   (None, 180, 180, 3)       0
_________________________________________________________________
rescaling_8 (Rescaling)      (None, 180, 180, 3)       0
_________________________________________________________________
conv2d_352 (Conv2D)          (None, 180, 180, 8)       224
_________________________________________________________________
max_pooling2d_400 (MaxPoolin (None, 90, 90, 8)         0
_________________________________________________________________
conv2d_353 (Conv2D)          (None, 90, 90, 16)        1168
_________________________________________________________________
max_pooling2d_401 (MaxPoolin (None, 45, 45, 16)        0
_________________________________________________________________
conv2d_354 (Conv2D)          (None, 45, 45, 32)        4640
_________________________________________________________________
max_pooling2d_402 (MaxPoolin (None, 22, 22, 32)        0
_________________________________________________________________
conv2d_355 (Conv2D)          (None, 22, 22, 64)        18496
_________________________________________________________________
max_pooling2d_403 (MaxPoolin (None, 11, 11, 64)        0
_________________________________________________________________
conv2d_356 (Conv2D)          (None, 11, 11, 128)       73856
_________________________________________________________________
max_pooling2d_404 (MaxPoolin (None, 5, 5, 128)         0
_________________________________________________________________
conv2d_357 (Conv2D)          (None, 5, 5, 256)         295168
_________________________________________________________________
max_pooling2d_405 (MaxPoolin (None, 2, 2, 256)         0
_________________________________________________________________
dropout_68 (Dropout)         (None, 2, 2, 256)         0
_________________________________________________________________
flatten_87 (Flatten)         (None, 1024)              0
_________________________________________________________________
dense_177 (Dense)            (None, 512)               524800
_________________________________________________________________
dense_178 (Dense)            (None, 3)                 1539
=================================================================
Total params: 919,891
Trainable params: 919,891
Non-trainable params: 0

Train the model

epochs = 50

history = model.fit(
    train_ds,
    epochs=epochs,
    validation_data=val_ds,
)

An epoch is one pass through an entire dataset. This can be in random order.

Visualize training results

The keras fit() method returns an object containing the training history, including the value of metrics at the end of each epoch.

Let’s plot the metrics(loss and accuracy) from the training and validation sets.

# training and validation accuracy
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

# training and validation loss
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

# plot settings for Accuracy
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label="Training Accuracy")
plt.plot(epochs_range, val_acc, label="Validation Accuracy")
plt.legend(loc="lowe right")
plt.title("Training and Validation Accuracy")

# plot settings for Loss
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label="Training Loss")
plt.plot(epochs_range, val_loss, label="Validation Loss")
plt.legend(loc="upper right")
plt.title("Training and Validation Loss")
plt.show()

Training and validation metrics

Predict on new data

Finally, let’s use our model to classify an image that wasn’t included in the training or validation sets, but first we create a show_and_predict helper method to download, prepare and show the model predictions.

def show_and_predict(img_path):
  """
  Download, prepare and show model predictions
  """
  img = keras.preprocessing.image.load_img(
      img_path, target_size=IMAGE_SIZE
  )

  img_array = keras.preprocessing.image.img_to_array(img)
  img_expanded = tf.expand_dims(img_array, 0) # create a batch

  predictions = model.predict(img_expanded)
  score = tf.nn.softmax(predictions[0])

  plt.imshow(img_array/255)
  plt.axis("off")

  class_name_predicted = class_names[np.argmax(score)]
  score_predicted = 100 * np.max(score)

  print(
      f"This image is most likely a document from {class_name_predicted} with a {score_predicted:.2f} percent confidence."
  )

Now, we can use our helper method and classify new images.

show_and_predict('./ids-dataset/test/0001.jpeg')

First prediction example

show_and_predict('./ids-dataset/test/0005.png')

Second prediction example