
Machine Learning for Image Classification

Wednesday, February 13, 2019 / Reading Time: 8 minutes / By Jason Solano

We are living in a digital era in which billions of people around the world make transactions every second, transferring more data than we can comprehend. Simply put, this data contains information about people's decisions. Beyond the data we transfer, we also possess enormous amounts of data from sources ranging from medical images and videos to scientific datasets such as images of space and Earth.


In order to analyze this data effectively and efficiently, it is important to implement algorithms that detect patterns in datasets. On one end of the spectrum, enterprises can discover which products are most attractive to a specific group of people; on the other, these algorithms can help doctors obtain better interpretations of medical images. With the right algorithms, the possibilities are limitless.

Deep Learning

This field has existed since the 1940s, when it was known as cybernetics; in the 1990s it became known as connectionism, and it wasn't until 2006 that the name Deep Learning was officially adopted. But what does Deep Learning actually mean? It refers to a family of mathematical models inspired by biological learning, known as artificial neural networks (ANNs), so named because of the neuroscientific perspective behind this breed of machine learning models.

This article will demonstrate how to classify means of transportation using a convolutional neural network built with the Keras framework in Python. The chosen means of transportation are motorcycles, cars (sedans), and buses. The dataset is based on images extracted from the ImageNet website.

Clean the Dataset

When you have a dataset, the first step is to prepare it as input for the machine learning model. In this case, the model is a neural network, which means all images must have the same shape (for example, 300x300) and must be processed with the same segmentation technique.

The segmentation technique used in this article is Otsu's method. This segmentation transforms a grayscale image into a binary image, meaning the image has only two colors, black and white. What is important about this segmentation is the thresholding step, which turns each pixel black or white according to the threshold computed by Otsu's method. Black and white are represented by 0 and 1, so each 300x300 image becomes a matrix of ones and zeros. This makes it easier for the neural network to detect the patterns that distinguish a bus, a car, or a motorcycle.
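Since the article's own code appears only as screenshots, here is a minimal NumPy sketch of Otsu's method (not the article's exact implementation): it picks the threshold that maximizes the between-class variance of the grayscale histogram, then binarizes the image to ones and zeros.

```python
import numpy as np

def otsu_threshold(gray):
    """Find the threshold that maximizes between-class variance
    for a grayscale image with values in 0..255."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0 = hist[:t].sum()          # background pixel count
        w1 = total - w0              # foreground pixel count
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (hist[:t] * np.arange(t)).sum() / w0
        mu1 = (hist[t:] * np.arange(t, 256)).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def binarize(gray):
    """Turn a grayscale image into a 0/1 matrix using Otsu's threshold."""
    return (gray >= otsu_threshold(gray)).astype(np.uint8)

# Toy example: a dark square on a bright background.
img = np.full((300, 300), 220, dtype=np.uint8)
img[100:200, 100:200] = 30
binary = binarize(img)
```

In practice, a library implementation such as OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag does the same job in one call.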


[Figure: sample image before Otsu segmentation]


[Figure: sample image after Otsu segmentation]

Training Data

Every neural network needs to be trained on a dataset. The training data has labels represented by one-hot encoding. For this example, we classify images into three different classes: motorcycles, cars (sedans), and buses. Each class is represented by the following code:

[Code screenshot: one-hot label arrays for each class]

Each np.array (a NumPy array) with shape (1,3) represents the number assigned to the specific label. For example:

  • Car [1,0,0] = 0
  • Bus [0,1,0] = 1
  • Motorcycle [0,0,1] = 2
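The label scheme above can be sketched in NumPy as follows; the helper name `one_hot` is illustrative, not taken from the original code:

```python
import numpy as np

# One-hot encodings for the three classes, as listed above.
CAR        = np.array([[1, 0, 0]])  # class 0
BUS        = np.array([[0, 1, 0]])  # class 1
MOTORCYCLE = np.array([[0, 0, 1]])  # class 2

def one_hot(class_index, num_classes=3):
    """Return a (1, num_classes) one-hot row for the given class index."""
    label = np.zeros((1, num_classes), dtype=np.int64)
    label[0, class_index] = 1
    return label
```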

The following function builds the training dataset for the neural network:

[Code screenshot: function that builds the training dataset]
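The original function survives only as a screenshot, so the sketch below makes two assumptions: it takes already-loaded grayscale arrays rather than reading image files from disk, and it binarizes with a fixed threshold of 128 where the article uses Otsu's method.

```python
import numpy as np

def build_training_set(images, class_indices, num_classes=3, size=300):
    """Build (X, y) for the network from grayscale images and class indices.

    images: iterable of 2D uint8 arrays, assumed already resized to size x size
    class_indices: 0 = car, 1 = bus, 2 = motorcycle
    """
    X, y = [], []
    for img, cls in zip(images, class_indices):
        assert img.shape == (size, size)
        # Fixed threshold here for brevity; the article picks the
        # threshold per image with Otsu's method.
        binary = (img >= 128).astype(np.float32)
        X.append(binary.reshape(size, size, 1))   # add channel axis for Keras
        label = np.zeros(num_classes, dtype=np.float32)
        label[cls] = 1.0                          # one-hot encode the class
        y.append(label)
    return np.stack(X), np.stack(y)

# Example with random data standing in for the real ImageNet crops.
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, (300, 300), dtype=np.uint8) for _ in range(4)]
X, y = build_training_set(imgs, [0, 1, 2, 0])
```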

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are used to classify images and to detect objects within them; for instance, to detect the words present in a given image. Another important aspect of CNNs is the use of filters and layers, each with its own activation function, that detect when certain patterns are present in an image.

For example:

[Figure: example of a CNN applied to an image]

  • Pooling:
Pooling consists of taking the highest values from the filter output (the 3x3 matrix, fig:4) using a window with a 2x2 shape that moves two columns or two rows at a time, depending on the stride indicated by the parameters. The single depth slice represents the filter output, and the 2x2 matrix is the pooling window (fig:3).

[Figure: max pooling over a single depth slice]

[Figure: 2x2 pooling window moving across the matrix]
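The pooling step can be sketched in NumPy; the 4x4 single depth slice used here is a common illustrative example, not taken from the article's figures:

```python
import numpy as np

def max_pool(mat, size=2, stride=2):
    """2x2 max pooling with stride 2 over a single depth slice."""
    h, w = mat.shape
    out = np.empty((h // stride, w // stride), dtype=mat.dtype)
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            # Keep only the highest value inside each window.
            out[i // stride, j // stride] = mat[i:i + size, j:j + size].max()
    return out

slice_ = np.array([[1, 1, 2, 4],
                   [5, 6, 7, 8],
                   [3, 2, 1, 0],
                   [1, 2, 3, 4]])
pooled = max_pool(slice_)   # [[6, 8], [3, 4]]
```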

  • Feature maps:

Feature maps (fig:2) are created from the results produced by the applied filters. With these filters and the pooling window, the CNN attempts to learn patterns in the image: for example, that the wheels of a motorcycle differ from the wheels of buses and cars, that car windows differ from those of buses, and which edges correspond to each label.

  • Convolutional layer:

The 3x3 matrix, or filter, represents 9 weights that are updated during training of the CNN. For example, the matrix (fig:4) that moves across the blue matrix represents these weights, and the results it produces pass through the ReLU activation function.

Suppose that the blue matrix (fig:4) is the input I of size 300x300. At each position, the 3x3 filter F is multiplied element-wise with the patch of I beneath it, and the green matrix G of size mxm holds the results. Each cell of G is computed as G[x][y] = Σi Σj F[i][j] · I[x+i][y+j] + B.

Bias: this feature of neural networks is very important because it allows the result of the equation to shift before entering the activation function. The bias starts at zero, and how it changes depends on the training; in this example the bias is represented by B.
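A worked NumPy sketch of the convolution step described above (3x3 filter F, input I, scalar bias, ReLU activation); the identity-like filter is chosen only to make the result easy to verify:

```python
import numpy as np

def conv2d_single(I, F, b=0.0):
    """Valid convolution (really cross-correlation, as in most CNN
    libraries) of input I with filter F plus bias b, followed by ReLU."""
    h, w = I.shape
    k = F.shape[0]
    G = np.empty((h - k + 1, w - k + 1))
    for x in range(G.shape[0]):
        for y in range(G.shape[1]):
            # G[x][y] = sum_i sum_j F[i][j] * I[x+i][y+j] + b
            G[x, y] = (F * I[x:x + k, y:y + k]).sum() + b
    return np.maximum(G, 0.0)   # ReLU activation

I = np.arange(25, dtype=float).reshape(5, 5)   # small stand-in for 300x300
F = np.zeros((3, 3)); F[1, 1] = 1.0            # identity-like filter
G = conv2d_single(I, F)                        # picks out each window's center
```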
  • Fully-Connected:

As previously stated, each layer detects different patterns in the image, but we need a final layer to decide whether the input image matches the pattern of a bus, car, or motorcycle. For this step, the final feature maps are flattened into an array of their cell values and passed through the softmax activation function, which decides which label corresponds to the input image.

In this article, we use two extra layers: a fully-connected layer with the ReLU activation function and a final layer with the softmax activation function.

Implement CNN with Keras

Keras is a powerful library with a TensorFlow back end that allows us to set up a CNN with just a few lines of code.


[Code screenshot: Keras imports and model setup]
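The setup screenshot is unavailable, so here is a minimal sketch of the imports and the empty Sequential model, assuming the tf.keras API; layers are then added one at a time:

```python
# Imports assumed from the tf.keras API; the original screenshot is lost,
# so the exact import style is an assumption.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                                     Flatten, Dense, Dropout)

model = Sequential()  # an empty model; layers are added with model.add(...)
```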

Layers and Pooling

[Code screenshot: Conv2D, MaxPooling2D, and BatchNormalization layers]
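Since the original layer code is only a screenshot, the following is a sketch of a convolutional base matching the description; the filter counts (32 and 64) are assumptions, not the article's exact values:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                 input_shape=(300, 300, 1)))   # binarized 300x300 images
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(BatchNormalization())
```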

Each Conv2D means one layer for the CNN. The parameters represent the following information:

  1. The first parameter refers to the number of filters used; each filter produces one feature map.
  2. kernel_size: refers to the filter shown in (fig:2) and (fig:4).
  3. activation: refers to the activation function used in each convolutional layer (fig:2).
  4. input_shape: the shape of the input image that is analysed; it is only required for the first layer.

Each MaxPooling2D refers to the pooling technique applied to the feature map matrix:

  1. pool_size: the first value is the shape of the pooling window. For example, 2 means a square 2x2 matrix. The second parameter is the stride; for example, 2 means the window moves two columns or rows at a time.

The BatchNormalization method normalizes the activations of the previous layer at each batch, applying a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1.


[Code screenshot: Flatten, Dense, and Dropout layers]
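A sketch of the classification head described below, on top of a deliberately small convolutional base; the layer sizes here are assumptions standing in for the lost screenshot:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

model = Sequential()
model.add(Conv2D(8, (3, 3), activation='relu', input_shape=(300, 300, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(8, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())                       # matrix -> flat vector
model.add(Dense(128, activation='relu'))   # fully-connected layer
model.add(Dropout(0.5))                    # randomly drop neurons in training
model.add(Dense(3, activation='softmax'))  # one probability per class
```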

Flatten: transforms the final matrix into a flat array so it can be evaluated by the fully-connected layers.

Dense: adds a new layer with the given number of neurons and activation function. Note: the matrix needs to be flattened before this layer is added.

Dropout: with this method, some neurons are ignored during the training step; the selection of neurons is random. It is used to avoid overfitting.

[Code screenshot: model.compile with loss, optimizer, and metrics]
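The compile step uses exactly the choices described below (categorical_crossentropy, Adam, accuracy); the tiny model here is only a stand-in so the snippet runs on its own:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Placeholder model; in the article this would be the CNN built above.
model = Sequential([Dense(3, activation='softmax', input_shape=(10,))])

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```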

Loss: the loss function is very important because we use it to check whether the predictions of our model are good. Keras offers different loss functions; in this case, we use categorical_crossentropy.

Optimizer: in simple words, an optimizer is an algorithm that finds the best way to reduce the total loss. There are different types of optimizers; in this article we use Adam.

Metrics: metrics measure how well the model predicts. For this example, we only use the accuracy metric, but other metrics exist that can help determine whether the predictions are right.

Train the CNN

[Code screenshot: model.fit call]
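A sketch of the training call with the hyperparameters described below; the random data and the tiny model are placeholders for the real 300x300 dataset and CNN:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Placeholder model standing in for the CNN built earlier.
model = Sequential([Flatten(input_shape=(300, 300, 1)),
                    Dense(3, activation='softmax')])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

X = np.random.rand(8, 300, 300, 1)         # 8 dummy binarized images
y = np.eye(3)[np.random.randint(0, 3, 8)]  # 8 dummy one-hot labels

history = model.fit(X, y, batch_size=4, epochs=5, verbose=1)
```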

Batch Size: the number of samples that will be used for training in one iteration.

Epoch: the number of times all of the training data is used. For example, in this CNN the training dataset is passed through the neural network 5 times.

Verbose: controls how much information is printed during each epoch.

Keras reports the accuracy and loss in each epoch. It is very important to pay attention to this information because it tells us whether the hyperparameters explained in the Layers and Pooling section are correct. High accuracy and low loss mean the model is training well.

Test Model

Load test data

[Code screenshot: loading the test data]

Separate test images and their labels

[Code screenshot: separating test images and labels]

Evaluate Keras function

[Code screenshot: Keras evaluate call]
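An end-to-end sketch of the three test steps above: obtain the held-out data, separate images from labels, and call evaluate(). The random data and the tiny model are assumptions standing in for the article's real 100-image test set and trained CNN.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Placeholder model standing in for the trained CNN.
model = Sequential([Flatten(input_shape=(300, 300, 1)),
                    Dense(3, activation='softmax')])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Stand-in for loading test data from disk; in practice these arrays
# would come from the held-out ImageNet images and their labels.
test_images = np.random.rand(10, 300, 300, 1)
test_labels = np.eye(3)[np.random.randint(0, 3, 10)]

loss, accuracy = model.evaluate(test_images, test_labels, verbose=0)
print(f"test loss={loss:.3f}, accuracy={accuracy:.2%}")
```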

For this case, we tested 100 images labeled as buses, motorcycles, and cars. The model predicted 82 of them correctly and 18 incorrectly. This test is very important because it verifies whether the model can predict images that were not part of the training dataset: in other words, images that are new to our model.


The model developed in this article shows, in an abstract way, how a CNN can be implemented. Although the model could be improved by adding or removing layers, changing the number of neurons in each layer, or tuning other hyperparameters, this article focused on how to implement a CNN. Another important aspect is the training dataset: we used only 2541 images of buses, cars (sedans), and motorcycles. For better training, more images are recommended; for instance, the MNIST dataset for classifying handwritten digits contains 70,000 images.

Check and test all the algorithms presented in this article; you can download the repository from the GitHub link. You can also use this model to classify other image categories. The model is based on the architecture from the article “Image Classification Keras Tutorial: Kaggle Dog Breed Challenge” by Connor Shorten. Also, feel free to check out the references of this article to better understand CNNs and their features.

About Avantica

If you are looking for a software partner who will work towards your own business goals and success, then Avantica is your solution. We offer dedicated teams, team augmentation, and individual projects to our clients, and are constantly looking for the best methodologies in order to give you the best results.



