- The paper presents a novel debiased training method that leverages cue-conflict images to balance shape and texture features in CNNs.
- It employs adaptive instance normalization to blend the shape of one image with the texture of another, resulting in improved performance with a 1.2% top-1 accuracy gain on ResNet-152.
- The method significantly enhances model robustness across benchmarks, including ImageNet-A and adversarial FGSM attacks, outperforming standard techniques.
Evaluation of Shape-Texture Debiased Neural Network Training Techniques
The paper explores the biases of deep convolutional neural networks (CNNs) toward shape and texture representations, a pertinent concern for the robustness of image recognition systems. It offers a comprehensive analysis of how CNNs, particularly those trained on datasets such as ImageNet, tend to favor either shape or texture features depending on the dataset's characteristics. This bias can significantly degrade recognition performance, motivating an investigation into training methods that encourage more balanced feature learning.
Methodological Approach
The authors introduce a shape-texture debiased training methodology that leverages style transfer to create images with conflicting shape and texture information, termed "cue-conflict images." The intent is to push CNNs to use both cues effectively rather than relying predominantly on one. Each training image is created by blending the shape of one image with the texture of another via adaptive instance normalization (AdaIN) style transfer, and both the shape source's and the texture source's labels are provided as supervision signals through soft-label assignment.
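The two ingredients above can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: `adain` re-normalizes the per-channel statistics of a content (shape) feature map to match those of a style (texture) feature map, and the label-mixing weight `w_shape` is illustrative rather than the paper's actual hyperparameter.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization on (C, H, W) feature maps:
    shift/scale the content features so each channel's mean and std
    match those of the style features."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    return s_std * (content_feat - c_mean) / c_std + s_mean

def debiased_soft_label(shape_label, texture_label, num_classes, w_shape=0.5):
    """Soft target that supervises both cues: mass w_shape on the shape
    source's class and the remainder on the texture source's class.
    (w_shape=0.5 is a placeholder, not the paper's setting.)"""
    y = np.zeros(num_classes)
    y[shape_label] += w_shape
    y[texture_label] += 1.0 - w_shape
    return y
```

A cue-conflict training pair would then be the decoded AdaIN-stylized image together with `debiased_soft_label(shape_cls, texture_cls, 1000)` as its target.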
Key Findings
The paper presents substantial empirical evidence demonstrating that models trained using the proposed debiased approach exhibit enhanced performance across several benchmarks. Specific findings include:
- Performance Gains: Utilizing the debiased method, ResNet-152 achieves a 1.2% improvement in top-1 accuracy on ImageNet compared to its baseline counterpart. Notable improvements were also reported for ResNet-50 and ResNet-101 models.
- Robustness Enhancement: The shape-texture debiased models significantly outperform the standard models on robustness benchmarks, including ImageNet-A, ImageNet-C, Stylized-ImageNet, and in adversarial settings with FGSM attacks.
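For context, the FGSM attack used in the adversarial evaluation perturbs each pixel by one fixed-size step in the direction of the sign of the loss gradient. A minimal sketch (the `eps` default here is illustrative, not the paper's evaluation setting):

```python
import numpy as np

def fgsm_attack(x, loss_grad, eps=8 / 255):
    """Fast Gradient Sign Method: add a signed-gradient step to the
    input and clip back to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(loss_grad), 0.0, 1.0)
```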
Comparative Analysis and Integration
The proposed debiased training approach is compatible with, and complementary to, existing data augmentation strategies such as Mixup and CutMix; combining them yields further performance gains. An ensemble of a shape-biased and a texture-biased model showed marginal improvement but was computationally more expensive than the proposed method.
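For comparison, Mixup, one of the augmentation strategies the debiased method is combined with, interpolates both inputs and labels with a Beta-distributed weight. A minimal sketch, assuming one-hot label vectors:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: a convex combination of two examples and their labels,
    with the mixing weight drawn from Beta(alpha, alpha)."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Note the structural similarity to the debiased soft labels: both supervise a single blended image with probability mass on two classes, which is why the two techniques compose naturally.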
Implications and Future Directions
This paper underscores the importance of balanced feature representation learning in CNNs. By counteracting inherent shape and texture biases, models generalize more effectively across varied visual domains and become more resilient to adversarial attacks. The method can also be extended to other tasks, such as semantic segmentation, with modifications to accommodate task-specific constraints.
The implications suggest a promising avenue for further research into data generation methodologies that promote debiased learning in neural networks. Future work could explore alternative transfer techniques, applications across different domains, and integration with broader architectural innovations.
In summary, this paper contributes valuable insights and methodologies to the field of computer vision, fostering the development of more versatile and robust neural networks.