- The paper introduces RICAP, a novel technique that creates composite images via random cropping and patching, significantly reducing CNN overfitting.
- It achieves a 2.19% test error on CIFAR-10 with the Shake-Shake model, outperforming conventional augmentation methods.
- RICAP’s versatility extends to tasks like image-caption retrieval and object recognition, enhancing overall model generalization across benchmarks.
Data Augmentation using Random Image Cropping and Patching for Deep CNNs
This paper investigates a novel data augmentation technique designed to enhance the training of deep convolutional neural networks (CNNs). The proposed method, Random Image Cropping And Patching (RICAP), tackles overfitting, a persistent problem when training CNNs, especially as model capacity continues to grow with architectural advances.
RICAP augments datasets by selecting, cropping, and patching multiple images to create a new composite image. This not only increases the diversity of available training samples but also mixes class labels in proportion to the area contributed by each component image. The mixing yields soft labels, which encourage smoother class boundaries.
Key Contributions and Results
- RICAP Methodology: RICAP samples four images, randomly draws a boundary point that determines the size of each crop, and patches the four crops into a single cohesive training image. The randomness of both the boundary point and the crop locations injects substantial variety at the image-composition stage.
- Performance Enhancements: On established benchmarks such as CIFAR-10, CIFAR-100, and ImageNet, models employing RICAP demonstrated superior performance over both traditional augmentation techniques and contemporary methods like cutout and mixup. Notably, an impressive test error rate of 2.19% was achieved on CIFAR-10 using the RICAP-augmented Shake-Shake model, setting a new standard for this dataset.
- Broader Applicability: RICAP's benefits extend beyond classification tasks, improving performance metrics in image-caption retrieval with Microsoft COCO and achieving enhanced precision and recall in tasks involving object recognition and person re-identification. These results indicate that RICAP contributes positively to a model's ability to generalize, even in contexts where partial information might be presented during training.
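The crop-and-patch procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: following the paper, the boundary point is drawn from a Beta distribution, but the function name `ricap`, the `beta=0.3` default, and the batch layout are assumptions made here for concreteness.

```python
import numpy as np

def ricap(images, labels, num_classes, beta=0.3, rng=None):
    """Build one RICAP batch: patch four random crops into each image
    and mix labels in proportion to patch area.

    images: (N, H, W, C) float array; labels: (N,) int array.
    Returns the patched batch and (N, num_classes) soft labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, H, W, C = images.shape
    # Boundary point (h, w) drawn from a Beta distribution, as in the paper.
    w = int(np.round(W * rng.beta(beta, beta)))
    h = int(np.round(H * rng.beta(beta, beta)))
    # The boundary point splits the canvas into four quadrants.
    patches = [  # (patch_h, patch_w, y_offset, x_offset)
        (h, w, 0, 0),          # top-left
        (h, W - w, 0, w),      # top-right
        (H - h, w, h, 0),      # bottom-left
        (H - h, W - w, h, w),  # bottom-right
    ]
    out = np.empty_like(images)
    soft = np.zeros((n, num_classes))
    for ph, pw, y0, x0 in patches:
        idx = rng.permutation(n)            # source image for each quadrant
        cy = rng.integers(0, H - ph + 1)    # random crop location
        cx = rng.integers(0, W - pw + 1)
        out[:, y0:y0 + ph, x0:x0 + pw] = images[idx, cy:cy + ph, cx:cx + pw]
        # Mix labels in proportion to the area of this patch.
        soft[np.arange(n), labels[idx]] += (ph * pw) / (H * W)
    return out, soft
```

Because the four patch areas tile the full canvas, each row of the soft-label matrix sums to one, so the result can be trained against directly with a cross-entropy loss.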
Implications and Future Work
RICAP aligns with current trends in CNN design that emphasize increased parameterization and depth. By increasing dataset variation while also incorporating label smoothness, RICAP significantly reduces overfitting. This helps sustain model accuracy even as network complexity grows, addressing a primary concern for AI research and practical deployment.
A potential avenue for future exploration includes further understanding of how label mixing proportionally impacts learning dynamics across different classes and scenarios, particularly in environments with class imbalance or when dealing with noisy labels. Moreover, the implications of RICAP in other domains can be profound, potentially influencing methods for data augmentation in text or multimodal data analysis.
Conclusion
This paper presents a compelling case for adding RICAP to the arsenal of data augmentation strategies for deep learning. It demonstrates that strategic random image composition can significantly strengthen a CNN's capacity to generalize. As data collection remains costly, augmentation techniques that maximize the utility of existing datasets are of lasting importance, and RICAP offers a promising, versatile approach to meeting these challenges.