- The paper introduces TEX-Nets, a novel framework integrating LBP encoding with CNNs to enhance texture and remote sensing scene classification.
- The paper compares early and late fusion strategies, finding that late fusion consistently outperforms both standalone RGB CNNs and the early fusion approach.
- The paper demonstrates significant performance gains across multiple datasets, notably improving large-scale aerial scene classification.
Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification
The paper under discussion explores an innovative approach to texture recognition and remote sensing scene classification by leveraging the strengths of binary pattern encoding and deep neural network architectures. This research introduces a novel methodology termed TEX-Nets, which integrates Local Binary Patterns (LBP) within Convolutional Neural Networks (CNNs) to enhance the robustness of texture description against varying imaging conditions such as scale, illumination, and viewpoint.
Methodological Approach
The core contribution of the paper lies in augmenting the traditional CNN paradigm with pre-processed input data crafted from LBP codes. Rather than feeding raw LBP codes into the network, the authors map them into a 3D metric space using Multi-Dimensional Scaling (MDS), so that Euclidean distances between the embedded codes approximate the dissimilarities between the codes themselves. The resulting three-channel texture-coded image can then be combined with the standard RGB channels within a single network architecture.
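The mapping can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes bitwise Hamming distance as the code-to-code dissimilarity (the paper's exact metric may differ) and uses scikit-image's `local_binary_pattern` and scikit-learn's `MDS`:

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.manifold import MDS

def texture_coded_image(gray, P=8, R=1):
    """Map per-pixel LBP codes into a 3-channel image via MDS.

    Sketch only: Hamming distance between codes is an assumed
    stand-in for the paper's code dissimilarity measure.
    """
    codes = local_binary_pattern(gray, P, R, method='default').astype(int)
    n_codes = 2 ** P  # 256 possible codes for P=8

    # Pairwise Hamming distance between all 8-bit LBP codes.
    ids = np.arange(n_codes)
    xor = ids[:, None] ^ ids[None, :]
    dist = np.unpackbits(xor.astype(np.uint8)[..., None], axis=-1).sum(-1)

    # Embed the 256 codes into 3D so that Euclidean distance
    # approximates code dissimilarity.
    emb = MDS(n_components=3, dissimilarity='precomputed',
              random_state=0).fit_transform(dist.astype(float))

    # Replace each pixel's code with its code's 3D coordinates.
    return emb[codes]  # shape (H, W, 3)
```

Each pixel's LBP code is replaced by the 3D coordinates of that code's embedding, yielding a texture-coded image with the same spatial layout as the input, ready to be fused with the RGB channels.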
The proposed TEX-Nets are realized through two primary fusion strategies (a minimal sketch of both follows the list):
- Early Fusion: RGB and texture-coded images are concatenated at the network's input layer, yielding a six-channel input from which a single CNN learns joint features across both modalities.
- Late Fusion: The RGB and texture-coded networks are trained separately, and their features are combined at higher layers of the network, such as the fully connected (FC) layers.
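In PyTorch-like form, the two strategies differ only in where the modalities are joined. The tiny convolutional stacks below are placeholders for the deep backbones used in the paper, and `num_classes=19` is illustrative (e.g., the 19 scene classes of WHU-RS19):

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Early fusion: stack RGB and texture-coded images into a 6-channel input."""
    def __init__(self, num_classes=19):
        super().__init__()
        # Toy backbone; the paper uses deep CNN architectures instead.
        self.features = nn.Sequential(
            nn.Conv2d(6, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, rgb, tex):
        x = torch.cat([rgb, tex], dim=1)  # (N, 6, H, W): joint input
        return self.classifier(self.features(x))


class LateFusionNet(nn.Module):
    """Late fusion: separate RGB and texture streams, joined at the FC stage."""
    def __init__(self, num_classes=19):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.rgb_stream = stream()
        self.tex_stream = stream()
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, rgb, tex):
        # Each stream processes its own modality; features meet only here.
        f = torch.cat([self.rgb_stream(rgb), self.tex_stream(tex)], dim=1)
        return self.classifier(f)
```

The late-fusion variant roughly doubles the feature extraction cost but lets each stream specialize on its own modality, which is consistent with the paper's finding that late fusion performs best.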
Experimental Validation
The empirical evaluation of TEX-Nets spans four widely used texture recognition benchmarks—DTD, KTH-TIPS-2a, KTH-TIPS-2b, and Texture-10—and four remote sensing scene classification datasets—UC-Merced, WHU-RS19, RSSCN7, and AID. The experiments demonstrate that the late fusion strategy consistently outperforms both the standalone RGB CNNs and the early fusion approach, with the advantage most pronounced on complex textures and varied natural environments.
Crucially, TEX-Nets deliver substantial improvements over existing state-of-the-art techniques, particularly on the AID dataset, which poses a diverse, large-scale challenge with aerial scenes drawn from multiple geographic locations.
Implications and Future Directions
The introduction of LBP-encoded inputs into CNN frameworks highlights the potential for hand-crafted features to complement deep learning methodologies. The enhanced classification performance in both texture recognition and remote sensing illustrates the practical significance of this integration for applications where robustness to imaging conditions is paramount, such as environmental monitoring and land resource management.
Theoretically, this research prompts a re-examination of the role that deterministic feature descriptors can play in end-to-end learning systems traditionally dominated by purely data-driven approaches. The adaptive versatility introduced by fusion strategies like those in TEX-Nets could inform future developments in multimodal learning systems across various domains.
Looking forward, potential avenues for advancement include exploring other robust handcrafted descriptors alongside LBP and refining the fusion strategies themselves. Additionally, extending TEX-Nets to full-sized satellite imagery with spectral bands beyond RGB, such as near-infrared, could substantially benefit remote sensing analytics.
In conclusion, the synthesis of binary pattern encoding with CNNs presented in this paper significantly strengthens texture and scene classification methods, setting a foundation for future work that combines handcrafted and deep learning methodologies.