The remote sensing community has adopted a variety of classification methods and algorithms, including support vector machines (SVMs), k-means clustering, Gaussian processes (GPs), random forests (RFs), extreme learning machines (ELMs) and deep neural network classifiers. However, the accuracy of these models remains unsatisfactory.
A hyperspectral classification model needs to deal with two main challenges: the high complexity of the data, which arises from a high spectral resolution spanning hundreds of bands, and the limited availability of labeled training data, which is expensive and time-consuming to obtain. An ideal model should be able to characterize the spectral-spatial features of hyperspectral images. Other approaches that have been adopted, such as regular stacked auto-encoders (SAE), sparse auto-encoders (SSA) and deep belief networks (DBN), suffer from spatial information loss due to their two dimensional nature.
Subsequently, CNN-based classifiers outperformed the previous implementations, and Chen et al. set the benchmark for classifying remotely sensed hyperspectral image (HSI) data. Since then, this CNN-based model has been improved in various ways through novel architectures and embedding algorithms. Even though these methods improved performance, they still struggle with data complexity and limited training samples.
CNNs fared well in comparison with the earlier models because convolutional layers act as detectors of the spectral-spatial features present in highly complex data. The early convolutional layers capture simple features of the image, while the later layers capture higher-level representations. However, CNN models fail to exploit the relations between features detected at different positions within the image.
The introduction of max pooling layers helps in detecting higher-order features over a larger area, but it does not take into account the relation between the simple and complex features. Moreover, by downsampling the feature space, the max pooling layer discards vital information. As a result, CNNs underperform when the input data contain rotations, tilts or other changes in orientation, and they cannot map the spatial relationships.
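The information loss caused by max pooling can be illustrated with a minimal sketch. It assumes non-overlapping 2x2 pooling on a single-channel feature map; the helper `max_pool_2x2` and the two toy arrays are hypothetical and are not taken from any model discussed here.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2-D array with even height and width."""
    h, w = x.shape
    # Split the map into (h/2, w/2) windows of size 2x2 and keep each window's maximum.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two hypothetical 4x4 feature maps: the same activation values,
# but located at different positions inside each 2x2 window.
a = np.array([[9, 0, 0, 0],
              [0, 0, 0, 7],
              [0, 0, 5, 0],
              [0, 3, 0, 0]])
b = np.array([[0, 0, 0, 7],
              [0, 9, 0, 0],
              [0, 3, 0, 0],
              [0, 0, 0, 5]])

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)

# The inputs differ, yet the pooled maps are identical: the position of each
# activation within its window is no longer recoverable after pooling.
print(np.array_equal(a, b))                # False
print(np.array_equal(pooled_a, pooled_b))  # True
```

Any layer placed after the pooling step sees the same input for both maps, so it cannot distinguish the two spatial arrangements.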
Deeper CNN architectures help mitigate this problem, but such architectures require a large amount of data to converge and may suffer from the vanishing gradient problem due to poor propagation of gradients. ResNet- and DenseNet-based CNN extensions, both of which use deeper architectures, have been proposed to learn the spectral-spatial features.
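The vanishing gradient effect can be sketched numerically with a toy model. The example assumes a chain of scalar sigmoid layers with a fixed weight, which is far simpler than any real CNN. The function `input_gradient` and its parameters are hypothetical, and the skip-connection variant only mimics the general idea behind residual architectures, not the actual ResNet design.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(depth, x=0.5, w=1.0, residual=False):
    """Derivative of the output w.r.t. the input for a chain of `depth` scalar layers.

    Plain layer:    h -> sigmoid(w * h)
    Residual layer: h -> h + sigmoid(w * h)   (toy skip connection)
    """
    grad = 1.0
    h = x
    for _ in range(depth):
        s = sigmoid(w * h)
        local = w * s * (1.0 - s)  # derivative of sigmoid(w * h) w.r.t. h
        if residual:
            grad *= 1.0 + local    # the identity path contributes the constant 1
            h = h + s
        else:
            grad *= local          # each factor is at most 0.25 when w = 1
            h = s
    return grad

depths = (2, 10, 30)
plain = [input_gradient(d) for d in depths]
skip = [input_gradient(d, residual=True) for d in depths]

for d, p, s in zip(depths, plain, skip):
    print(f"depth={d:2d}  plain={p:.3e}  with skip={s:.3e}")
```

In this toy setting the gradient of the plain chain shrinks geometrically with depth, because every layer multiplies it by a factor no larger than 0.25, whereas the identity path keeps the gradient of the skip-connection chain at or above 1.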
Considering the underperformance of the earlier models and the limitations of CNN-based models described above, this paper proposes a novel CNN-based model built on Hinton's Capsule Networks (CapsNets). The model achieves high accuracy for hyperspectral image classification while significantly reducing the overall complexity of the network.