Image classification is a basic and important task in computer vision. It is a supervised learning problem: given a set of target classes, we train a model to recognize them using labeled training data. The training set may be large, or it may be small relative to the complexity of the problem. What if we do not have enough data to train our image classifier? Transfer learning is one solution for limited training data.
This paper gives a brief overview of how an image classifier works. We then design and implement a convolutional neural network for image classification on a dataset of 10 animals. Next, we implement a second image classifier based on transfer learning, using the pretrained AlexNet model.
Finally, both implementations are evaluated based on parameters such as accuracy, number of images used in training, and network architecture.
Index Terms – Deep Learning, Image Classification, Transfer Learning, Convolutional Neural Network
Image classification is a well-known problem in computer vision. The task is to assign a digital image, based on its pixels, to one of a set of predefined classes. It forms the basis for other computer vision tasks such as localization, detection, and segmentation. Although the task is second nature for humans, it is much more challenging for an automated system.
It can be addressed with machine learning algorithms, which may be supervised or unsupervised; in this work we use supervised learning. In supervised learning we are given input images together with their labels, and we have to build a function that learns from the features of the input images to predict the label.
One family of algorithms that can achieve this task with good accuracy is the neural network, a biologically inspired programming paradigm that enables a computer to learn from observational data. Deep learning is a set of techniques for learning in neural networks with many layers. In recent years, deep learning models have been used for nonlinear information processing, feature extraction and transformation, and pattern analysis and classification.
One deep learning architecture that has become popular for most image recognition, classification, and detection tasks is the convolutional neural network (CNN). A CNN is a multilayer neural network that comprises several convolutional layers followed by fully connected layers. It can learn to perform tasks without being programmed with task-specific rules. For example, a CNN might learn to identify images that contain dogs by analyzing input images labeled as "dog" or "no dog". It does this without any prior knowledge of the features of a dog, such as its face structure, eye size, or color.
The main problem with CNNs is that they require large amounts of data and processing power, which not everybody has. Additionally, an image classifier built from a CNN can classify images only into the categories on which it was trained. If we want the classifier to work on other categories, we have to train it from scratch, which might take many hours or days. To overcome this problem we use transfer learning: instead of building a model from scratch, we take a pretrained model that someone else has trained for a similar task, replace its last few layers, and tweak the model for our use. Examples of such pretrained models are AlexNet (2012), GoogLeNet (2014), and Inception-V3 (2015).
In this paper we design and implement an image classifier using a convolutional neural network on a dataset of 10 animals ('dog', 'cat', 'butterfly', 'chicken', 'cow', 'elephant', 'horse', 'sheep', 'spider', 'squirrel'). We then apply transfer learning using AlexNet to improve our classification results.
CONVOLUTIONAL NEURAL NETWORK (CNN)
BASICS OF CNN
Neural networks are mathematical models used to solve an optimization problem. The basic computation units in a neural network are called neurons. A neuron takes some inputs and performs a computation on them to produce a value. This value is passed to a function called the activation function to produce the final output of the neuron. There are many kinds of activation function; one of them is the sigmoid, and neurons that use it are called sigmoid neurons. Other common activation functions include ReLU and tanh.
One neuron can be connected to multiple neurons.
Fig. 1 One neuron (and its function)
Xi represents the i-th input and Wi represents the weight of the corresponding input connection. Each connection has its own weight value, while the bias (B) is a property of the neuron.
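The computation of a single neuron can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the helper names (`sigmoid`, `relu`, `neuron_output`) and the numeric values are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """ReLU activation: passes positive values, clamps negatives to zero."""
    return max(0.0, z)

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Illustrative values (assumed, not taken from the paper).
x = [1.0, 2.0, 3.0]    # inputs Xi
w = [0.5, -1.0, 0.25]  # one weight Wi per input connection
b = 0.75               # bias B of the neuron

print(neuron_output(x, w, b))        # output of a sigmoid neuron
print(neuron_output(x, w, b, relu))  # output of a ReLU neuron
```

Swapping the `activation` argument is all that distinguishes a sigmoid neuron from a ReLU neuron in this sketch.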
Layers are made up of stacks of neurons. The layers are organized into three main parts:
- Input layers
- Hidden layers
- Output layers
Deep learning implies that there can be n hidden layers.
Fig. 2 Layer structure of a CNN
TYPES OF LAYERS
Neurons in a given layer all perform a similar mathematical operation, which is how each layer gets its name. A few commonly used layers are described below.
Convolution is an operation used in signal processing to filter signals, find patterns in signals, and so on. It is the main building block of a CNN and is performed in the convolutional layer. In this layer we apply convolution to the input image using a convolution filter to produce a feature map.
Fig. 3 Convolution operation; I * K is the feature map
In the figure above, we slide the window by 1 pixel at a time. In some cases it can be more than 1 pixel; the number of pixels by which we slide the window at each step is called the stride.
After convolution, the feature map is smaller than the input. To maintain the same dimensionality, we pad the input image. If we apply convolution to the padded image, the result has the same dimensions as the input.
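The effect of stride and padding on the size of the feature map can be checked with a small pure-Python sketch. For an N × N input, a K × K filter, padding P, and stride S, the output side length is floor((N + 2P - K) / S) + 1. The function names and the 4 × 4 example image are assumptions for illustration; as in most CNN libraries, the filter is not flipped.

```python
def pad2d(image, p):
    """Surround a 2-D list with p rows and columns of zeros."""
    width = len(image[0])
    blank = [0] * (width + 2 * p)
    padded = [list(blank) for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend(list(blank) for _ in range(p))
    return padded

def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over the (padded) image and sum the products."""
    img = pad2d(image, padding)
    k = len(kernel)  # assumes a square k x k kernel
    out_h = (len(img) - k) // stride + 1
    out_w = (len(img[0]) - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride
            row.append(sum(img[r + a][c + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

# Illustrative 4 x 4 image and 3 x 3 filter (assumed values).
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]

valid = conv2d(image, kernel)            # no padding: 2 x 2 feature map
same = conv2d(image, kernel, padding=1)  # padding 1: 4 x 4, same as input
print(len(valid), len(valid[0]))
print(len(same), len(same[0]))
```

With a 3 × 3 filter and stride 1, a padding of 1 pixel is exactly what keeps the output the same size as the input.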
The pooling layer is used after a convolutional layer to reduce the spatial size (i.e., height and width) of the feature map. This reduces the number of parameters and thus the computation. The most common pooling technique is max pooling, where we take a filter of size N × N and apply the maximum operation over each N × N region of the image.
Fig. 4 Max pooling example
In the figure above, we used a 2 × 2 filter for max pooling and obtained the output shown.
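Max pooling with a 2 × 2 filter can be sketched as follows. The feature-map values are made up for the example and are not those of Fig. 4; the function name `max_pool2d` and the default of a stride equal to the filter size are assumptions.

```python
def max_pool2d(feature_map, size=2, stride=None):
    """Take the maximum over each size x size window of a 2-D list."""
    stride = stride or size  # non-overlapping windows by default
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

# Illustrative 4 x 4 feature map (assumed values).
fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 0],
        [1, 8, 3, 4]]

pooled = max_pool2d(fmap)
print(pooled)  # [[6, 4], [8, 9]]
```

The 4 × 4 input becomes a 2 × 2 output, so height and width are both halved while the strongest response in each region is kept.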
FULLY CONNECTED LAYERS
These are the layers that follow the convolution and pooling layers. In a fully connected layer, every neuron of the previous layer is connected to each neuron of the fully connected layer. The layer combines the features learned by the previous convolution and pooling layers and feeds them to the next fully connected layer. The last fully connected layer combines all learned features to classify the images. Therefore, the last layer has the same number of neurons as the number of classes.
Fig. 5 Fully Connected Layers
Image source – http://machinethink.net/images/mps-matrix-multiplication/FullyConnectedLayer.png
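A fully connected layer amounts to a matrix-vector product plus a bias per output neuron. The sketch below assumes a 3-element feature vector and 10 output neurons, one per animal class; the weights are arbitrary integers chosen so the result is easy to verify by hand, not learned values.

```python
def fully_connected(inputs, weights, biases):
    """out[j] = sum_i weights[j][i] * inputs[i] + biases[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

NUM_CLASSES = 10  # one output neuron per class

# Illustrative flattened features and weights (assumed values).
features = [1, 2, 3]
weights = [[j, 0, 1] for j in range(NUM_CLASSES)]  # one row per output neuron
biases = [0] * NUM_CLASSES

scores = fully_connected(features, weights, biases)
print(len(scores))  # one score per class
print(scores)
```

Every output neuron reads every input feature, which is what "fully connected" means here.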
Batch Normalization Layer: A batch normalization layer normalizes the activations and gradients propagating through a neural network, making network training an easier optimization problem. We use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers, to speed up network training and reduce the sensitivity to network initialization.
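A minimal sketch of the normalization step, assuming the activations of one channel over one mini-batch are given as a flat list. The scale `gamma`, shift `beta`, and stabilizer `eps` are the usual parameters of batch normalization; their default values here are assumptions, and the running statistics used at inference time are omitted.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

# Illustrative activations of one channel over a mini-batch of 4 (assumed).
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)

out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
print(out_mean, out_var)  # close to 0 and 1
```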
Softmax Layer: The softmax activation function normalizes the output of the fully connected layer. The output of the softmax layer consists of positive numbers that sum to one, which can then be used as classification probabilities by the classification layer.
The final layer is the classification layer. This layer uses the probabilities returned by the softmax activation function for each input to assign the input to one of the mutually exclusive classes and compute the loss.
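The softmax and classification steps can be sketched together. The class names are those of the dataset; the raw scores are made up, and the use of cross-entropy as the loss is an assumption (it is the usual choice for mutually exclusive classes, but the paper does not name the loss).

```python
import math

CLASS_NAMES = ["dog", "cat", "butterfly", "chicken", "cow",
               "elephant", "horse", "sheep", "spider", "squirrel"]

def softmax(scores):
    """Turn raw scores into positive numbers that sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(probs, class_names):
    """Assign the input to the class with the highest probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best]

def cross_entropy(probs, true_index):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(probs[true_index])

# Illustrative output of the last fully connected layer (assumed values).
scores = [0.0] * len(CLASS_NAMES)
scores[CLASS_NAMES.index("butterfly")] = 3.0

probs = softmax(scores)
label = predict(probs, CLASS_NAMES)
loss = cross_entropy(probs, CLASS_NAMES.index("butterfly"))
print(label, sum(probs), loss)
```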
We build the image classifier for 10 different classes of animals ('dog', 'cat', 'butterfly', 'chicken', 'cow', 'elephant', 'horse', 'sheep', 'spider', 'squirrel'). The images are preprocessed before being given to the neural network. We gathered the data from Kaggle; the images are of arbitrary size, so we resize them to 100 × 100 pixels so that all images have the same dimensions. There are 1500 images per category, i.e., 15,000 training images in total.
Fig. 6 Training Set
A few parameters need to be set for training the neural network. We use stochastic gradient descent with momentum (SGDM) as the optimizer.
- Learning rate = 0.01
- Number of epochs = 10
We shuffle the training data at every epoch.
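The SGDM update rule can be sketched on a toy problem. The learning rate of 0.01 is the one reported above; the momentum value of 0.9 is an assumption (the paper does not state it), and the quadratic objective and the number of steps are purely illustrative.

```python
def sgdm_step(weights, grads, velocity, learning_rate=0.01, momentum=0.9):
    """One SGDM update: keep a fraction of the previous step, add the new gradient step."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy objective (assumed): f(w) = (w - 3)^2, with gradient 2 * (w - 3).
weights = [0.0]
velocity = [0.0]
for _ in range(500):
    grads = [2.0 * (weights[0] - 3.0)]
    weights, velocity = sgdm_step(weights, grads, velocity)

print(weights[0])  # approaches the minimum at w = 3
```

The momentum term reuses part of the previous update, which smooths the trajectory compared with plain gradient descent.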
Transfer learning means reusing the knowledge gained while solving one problem and applying it to a different but related problem. It is very popular in deep learning because it enables deep neural networks to be trained faster and with much less data. This is very useful for real-world problems, where we often do not have enough labeled data to train complex models.
There are two approaches to transfer learning:
- Develop-model approach: we develop a predictive model for a first task and then reuse that model for the second or subsequent tasks.
- Pre-trained model approach: we take a model that has already been trained on a large dataset and fine-tune some of its parameters for our own predictive model.
We use the second approach.
Fine-tuning a network with transfer learning is usually much faster and easier than training a network from scratch with randomly initialized weights. AlexNet is a neural network model trained on more than a million images that can classify images into 1000 categories, such as person, road, and coffee mug. It is less deep than many other available models.
Fig. 7 Diagram of transfer learning
NEW NETWORK ARCHITECTURE
In the new model we replace the last three layers of the network and set the new fully connected layer to have the same size as the number of classes in the new dataset. To learn faster, we increase the learning rate of the new fully connected layer.
We build the image classifier for the same 10 classes of animals ('dog', 'cat', 'butterfly', 'chicken', 'cow', 'elephant', 'horse', 'sheep', 'spider', 'squirrel'). The images are preprocessed before being given to the neural network. We gathered the data from Kaggle; the images are of arbitrary size, so we resize them to 227 × 227 pixels so that all images have the same dimensions. There are 500 images per category, i.e., 5000 training images in total. Since we use a pretrained model, we expect better accuracy with fewer images.
A few parameters need to be set for training the neural network. We again use stochastic gradient descent with momentum (SGDM) as the optimizer.
- Learning rate = 0.0001. The learning rate here is deliberately small because we need to slow down learning in the transferred layers. We have already increased the learning rate of the new fully connected layer, and this combination leads to faster learning in the new layer and slower learning in the other layers.
- Number of epochs = 6. We keep the number of epochs (an epoch is one full training cycle over the whole dataset) low because we do not want to change the weights of the pretrained network too much.
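The combination of a small global learning rate and a faster-learning new layer can be illustrated with a per-layer multiplier. The global rate of 0.0001 is the one reported above; the multiplier of 20 for the new fully connected layer and the layer names are hypothetical, since the paper does not state the factor that was used.

```python
# Global learning rate reported for the fine-tuning run.
GLOBAL_LEARNING_RATE = 0.0001

# Hypothetical per-layer multipliers; the actual factor is not given in the paper.
LEARN_RATE_FACTOR = {
    "transferred_layers": 1.0,
    "new_fully_connected": 20.0,
}

def effective_learning_rates(global_rate, factors):
    """Effective rate of each layer group = global rate x its multiplier."""
    return {layer: global_rate * factor for layer, factor in factors.items()}

rates = effective_learning_rates(GLOBAL_LEARNING_RATE, LEARN_RATE_FACTOR)
for layer, rate in rates.items():
    print(f"{layer}: {rate:g}")
```

Under these assumed values the new layer learns 20 times faster than the transferred layers, which change only slowly from their pretrained weights.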
- Iteration vs. accuracy and iteration vs. loss for the CNN model.
Fig. 8 MATLAB plot of number of iterations vs. accuracy and loss for the CNN model trained from scratch
- Iteration vs. accuracy and iteration vs. loss for transfer learning using AlexNet.
Fig. 9 MATLAB plot of number of iterations vs. accuracy and loss for the transfer learning model
In the plot above we can see how the training and validation accuracy increase with every epoch.
- Predicted output of the CNN model.
Fig. 10 CNN model predictions
We can see from the image above that the CNN model made a few wrong predictions.
- Predicted output from transfer learning.
4.5 Confusion Matrix for CNN Validation Data
Fig. 12 Confusion matrix of the CNN model
In the Fig 12 horizontal
- This model learns the features of the butterfly class more accurately than those of the other classes (95.5%, the highest true positive rate).
- The model has difficulty learning the features of the cat class (highest error rate, 38.6% false positive rate).
4.6 Confusion Matrix for Fine-tuned Network
Fig. 13 Confusion matrix for the fine-tuned model
There are 150 images in the validation set.
- This model learned the chicken category most accurately (highest true positive rate, 95.7%).
- The model has difficulty learning the features of the sheep class (highest error rate, 25.0% false positive rate).
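Per-class rates such as those quoted above are read off a confusion matrix. The sketch below shows one way to compute the true positive rate of each class. The labels are made up for illustration and are not the paper's validation data, and the convention that rows are true classes and columns are predicted classes is an assumption.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """matrix[i][j] counts examples of true class i predicted as class j."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def true_positive_rates(matrix, classes):
    """Fraction of each true class that was predicted correctly."""
    rates = {}
    for i, name in enumerate(classes):
        total = sum(matrix[i])
        rates[name] = matrix[i][i] / total if total else 0.0
    return rates

# Illustrative labels (assumed, not the paper's data).
classes = ["cat", "dog", "sheep"]
true = ["cat", "cat", "cat", "cat", "dog", "dog", "sheep", "sheep"]
pred = ["cat", "cat", "dog", "cat", "dog", "dog", "sheep", "cat"]

matrix = confusion_matrix(true, pred, classes)
tpr = true_positive_rates(matrix, classes)
print(matrix)
print(tpr)
```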
Fine-tuning the pretrained model gives better accuracy than the model that we built and trained from scratch using several convolution layers. A few statistics for both models are given below. It can be deduced from these experimental results that transfer learning is an optimization: a shortcut that saves time and gives better performance.
Parameter                         CNN Training                                  Fine-tuned Model
Validation accuracy               55.57%                                        85.27%
Training accuracy                 76.26%                                        96.71%
Number of images used per class   1500                                          500
Image size                        100 × 100 × 3 (size with highest accuracy)    227 × 227 × 3
Learning rate                     0.01                                          0.0001
Number of epochs                  10                                            6
6.
In this paper, we developed two methods of image classification using neural networks to classify 10 different animals. In the first method we implemented a deep convolutional neural network of 25 layers, from the input layer to the final classification layer.
In the second method we used a pretrained neural network, AlexNet, and trained it on our dataset after fine-tuning a few parameters. We then evaluated the two methods based on parameters such as accuracy and the number of images used in training. A confusion matrix for each method gives insight into how the two methods affected the classification results on the given dataset. If you have abundant data and resources to develop a model, the ideal approach to a classification problem is to create your own neural network model with your own parameters. If you do not, you can use an available pretrained model as the starting point for your model.