
An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification (1712.03541v2)

Published 10 Dec 2017 in cs.CV, cs.LG, cs.NE, and stat.ML

Abstract: Convolutional neural networks (CNNs) are similar to "ordinary" neural networks in the sense that they are made up of hidden layers consisting of neurons with "learnable" parameters. These neurons receive inputs, perform a dot product, and then follow it with a non-linearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) conducted to challenge this norm. The cited studies introduce the usage of a linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, and is inspired by (Tang, 2013). Empirical data show that the CNN-SVM model was able to achieve a test accuracy of ~99.04% on the MNIST dataset (LeCun, Cortes, and Burges, 2010). On the other hand, the CNN-Softmax was able to achieve a test accuracy of ~99.23% on the same dataset. Both models were also tested on the recently published Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is supposed to be a more difficult image classification dataset than MNIST (Zalandoresearch, 2017). This proved to be the case, as CNN-SVM reached a test accuracy of ~90.72%, while CNN-Softmax reached a test accuracy of ~91.86%. These results may be improved if data preprocessing techniques were employed on the datasets, and if the base CNN model were relatively more sophisticated than the one used in this study.

Citations (154)

Summary

  • The paper demonstrates that substituting Softmax with an SVM in a CNN can yield competitive image classification results on MNIST and Fashion-MNIST.
  • It details a CNN model with two convolutional layers, ReLU activations, pooling, and a final fully connected layer whose outputs are trained with an SVM (hinge) loss in place of Softmax.
  • Results reveal marginal differences between CNN-SVM and CNN-Softmax, suggesting further research on preprocessing and network sophistication.

An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification

Introduction

This paper examines the integration of Convolutional Neural Networks (CNNs) with Support Vector Machines (SVMs) to improve image classification performance. The common practice in CNNs is to use the Softmax function as the final classifier; the paper explores the potential advantages of replacing it with an SVM, motivated by previous research suggesting that SVMs can improve artificial neural network architectures. The MNIST and Fashion-MNIST datasets are used to evaluate the proposed CNN-SVM hybrid model.

Methodology

CNN-SVM Architecture

The proposed architecture draws from both CNN and SVM methodologies. It employs a traditional CNN with two convolutional layers and ReLU activation functions (Figure 1). Its distinctive characteristic is the substitution of the Softmax layer, typically used in classification tasks, with an SVM. This configuration aims to leverage the margin-based decision boundaries of SVMs, positing a competitive advantage over Softmax in certain contexts.
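
For reference, the margin objective in question is the standard L2-SVM (squared hinge) loss that Tang (2013) substitutes for cross-entropy; this is the textbook form, not an equation quoted from the paper. For labels $y_i \in \{-1, +1\}$:

$$\min_{\mathbf{w}}\; \frac{1}{2}\|\mathbf{w}\|_2^2 \;+\; C \sum_{i=1}^{N} \max\!\left(0,\; 1 - y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right)\right)^2$$

Multiclass problems such as MNIST are handled one-vs-rest, with one such margin per class.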

Figure 1: The Rectified Linear Unit (ReLU) activation function outputs 0 when x < 0 and a linear response with slope 1 when x > 0.
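
In symbols, the activation plotted in Figure 1 is

$$\mathrm{ReLU}(x) = \max(0, x) = \begin{cases} 0, & x \le 0 \\ x, & x > 0 \end{cases}$$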

The CNN architecture encompasses:

  1. Two convolutional layers each followed by ReLU and pooling layers.
  2. A final fully connected layer outputting ten classes.
  3. An SVM loss function in place of Softmax to handle classification (see the sketch after this list).
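
A minimal tf.keras sketch of such a model follows. It is written for this summary, not taken from the authors' code: the layer sizes (32 and 64 filters with 5×5 kernels, a 1024-unit fully connected layer), the dropout rate of 0.5, and the Adam learning rate of 1e-3 are assumptions based on common MNIST baselines. The essential point is the linear 10-way output trained with a squared hinge (L2-SVM) loss instead of Softmax cross-entropy.

```python
import tensorflow as tf

# Hypothetical CNN-SVM sketch: a standard two-conv-layer CNN whose final
# layer emits raw linear scores, trained with a squared hinge (L2-SVM)
# loss rather than Softmax cross-entropy.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),  # MNIST / Fashion-MNIST input
    tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="linear"),  # raw scores, no softmax
])

# SquaredHinge expects targets in {-1, +1}, so one-hot labels must be
# remapped before training (see the training sketch further below).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.SquaredHinge(),
    metrics=[tf.keras.metrics.CategoricalAccuracy()],
)
```

Swapping tf.keras.losses.SquaredHinge() for tf.keras.losses.CategoricalCrossentropy(from_logits=True) (with one-hot rather than ±1 targets) recovers the CNN-Softmax baseline with no other changes, which is what makes the two models directly comparable.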

Datasets

The research employs two datasets:

  • MNIST: A standard dataset of handwritten digits, with 60,000 training examples and 10,000 test examples.
  • Fashion-MNIST: A more challenging drop-in replacement that keeps MNIST's image format and train/test split but depicts clothing items across ten classes (a loading snippet follows this list).
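
Because the two datasets share the same format, swapping one for the other is a one-line change. A sketch using the tf.keras dataset loaders (the loaders are standard TensorFlow utilities; the authors' actual data pipeline is not shown in this summary):

```python
import tensorflow as tf

# Both loaders return 28x28 grayscale images with a 60,000/10,000 split.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Drop-in replacement for the harder benchmark:
# (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
```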

Experiments

Hyperparameters

Experiments were carried out using fixed hyperparameters across both models. CNN-SVM and CNN-Softmax were each trained for 10,000 steps with identical batch sizes, learning rates, and dropout rates. The experiments were implemented in TensorFlow and run on a machine with an NVIDIA GPU.
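
A hedged training sketch under this setup, reusing the `model` compiled in the architecture section; the batch size of 128 is an illustrative assumption, since the summary does not state the concrete value:

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0  # add channel dim, scale to [0, 1]
x_test = x_test[..., None].astype("float32") / 255.0

# The squared hinge loss expects targets in {-1, +1}, not one-hot {0, 1}.
y_train_pm = 2.0 * tf.one_hot(y_train, 10) - 1.0
y_test_pm = 2.0 * tf.one_hot(y_test, 10) - 1.0

# ~469 steps per epoch at an assumed batch size of 128 on 60,000 examples,
# so 21 epochs roughly matches the 10,000 training steps reported above.
model.fit(x_train, y_train_pm, batch_size=128, epochs=21,
          validation_data=(x_test, y_test_pm))
```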

Results and Analysis

Empirical results indicate:

  • MNIST: CNN-Softmax outperformed CNN-SVM marginally with test accuracies of 99.23% and 99.04% respectively.
  • Fashion-MNIST: Again, CNN-Softmax demonstrated superior performance with 91.86%, compared to CNN-SVM’s 90.72%.

    Figure 2: Training accuracy of CNN-Softmax and CNN-SVM on image classification using MNIST.


    Figure 3: Training loss of CNN-Softmax and CNN-SVM on image classification using MNIST.

Discrepancies with prior studies that report SVM superiority may stem from the absence of data preprocessing and the relatively simple base CNN model used here.

Conclusion

The experimental results do not substantiate previous claims of CNN-SVM superiority over CNN-Softmax, and the performance gap between the two models is small. The observed results may also reflect the absence of advanced preprocessing and the simplicity of the CNN architecture implemented. The authors suggest that future work explore stronger preprocessing techniques, more sophisticated CNN architectures, and alternative hyperparameter configurations to re-evaluate the potential benefits of SVMs as classifiers in neural networks.
