A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions (1705.02498v1)

Published 6 May 2017 in cs.CV

Abstract: Deep neural networks (DNNs) achieve excellent performance on standard classification tasks. However, under image quality distortions such as blur and noise, classification accuracy becomes poor. In this work, we compare the performance of DNNs with human subjects on distorted images. We show that, although DNNs perform better than or on par with humans on good quality images, DNN performance is still much lower than human performance on distorted images. We additionally find that there is little correlation in errors between DNNs and human subjects. This could be an indication that the internal representations of images are different between DNNs and the human visual system. These comparisons with human performance could be used to guide future development of more robust DNNs.

Authors (2)
  1. Samuel Dodge (6 papers)
  2. Lina Karam (10 papers)
Citations (404)

Summary

  • The paper demonstrates that human classification performance remains robust under noise and blur, unlike fine-tuned DNNs, which still lag behind.
  • The study employs controlled experiments using a subset of ImageNet dog breed images and evaluates models like VGG16, Inception, and ResNet.
  • The findings highlight the need for DNN architectures to incorporate principles of human visual processing to improve resilience in real-world applications.

Recognition Performance of Humans and DNNs Under Visual Distortions

The paper "A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions" by Samuel Dodge and Lina Karam investigates the performance robustness of deep neural networks (DNNs) compared to human subjects when faced with visual distortions such as noise and blur. This paper critically examines whether the current DNN models can replicate the visual robustness inherent to the human visual system.

The authors employ a comparative experimental approach, contrasting the classification abilities of DNNs and humans when subjected to varied levels of Gaussian noise and Gaussian blur. While DNNs have demonstrated superior classification performance on standard, undistorted images, often exceeding human capabilities, this advantage does not hold under distorted conditions. The research shows that humans outperform DNNs when images are compromised by noise and blur, highlighting a significant robustness gap between human visual processing and DNN-based models.

Methodology and Experiments

The authors used a subset of the ImageNet dataset, focusing on 10 classes of dog breeds to make the classification task non-trivial yet manageable. The testing set included images with incremental levels of both blur and noise distortions. The images were presented to 15 human subjects recruited via Amazon Mechanical Turk, who classified them after a comprehensive training phase. For DNN evaluation, the authors employed VGG16, Google's Inception-v3, and ResNet models, with modifications to suit the reduced class set. The DNNs were also fine-tuned on distorted datasets to potentially improve their recognition under these conditions.
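A distortion pipeline of this kind can be sketched as follows. This is a minimal NumPy-only illustration; the specific kernel construction, noise standard deviations, and blur levels shown here are placeholder assumptions, not the paper's exact parameters:

```python
import numpy as np

def add_gaussian_noise(image, sigma):
    """Add zero-mean Gaussian noise with standard deviation sigma (0-255 pixel scale)."""
    noisy = image.astype(np.float64) + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def gaussian_blur(image, sigma):
    """Blur via a separable 1-D Gaussian convolution along each spatial axis."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    blurred = image.astype(np.float64)
    for axis in (0, 1):  # convolve rows, then columns
        blurred = np.apply_along_axis(
            lambda m: np.convolve(m, kernel, mode="same"), axis, blurred)
    return np.clip(blurred, 0, 255).astype(np.uint8)

# Incremental distortion levels (illustrative values only)
noise_levels = [10, 40, 80, 120]  # noise standard deviations
blur_levels = [1, 2, 4, 6]        # blur standard deviations
```

Sweeping each test image through such level lists yields the progressively degraded stimuli shown to both the human subjects and the networks.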

In the experiments, the authors observed that human subjects maintained consistent classification accuracy even under severe visual distortions. Conversely, the fine-tuned DNNs improved on their non-tuned counterparts under distortion but still fell short of human levels. Notably, human classification errors correlated only weakly with DNN errors, suggesting fundamentally different mechanisms of image representation and processing.
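The low human–DNN error correlation can be quantified roughly as below. The per-image error vectors here are hypothetical stand-ins, and Pearson correlation on binary error indicators is one plausible measure; the paper's exact analysis may differ:

```python
import numpy as np

def error_correlation(errors_a, errors_b):
    """Pearson correlation between two binary per-image error vectors
    (1 = misclassified, 0 = correct)."""
    a = np.asarray(errors_a, dtype=np.float64)
    b = np.asarray(errors_b, dtype=np.float64)
    if a.std() == 0 or b.std() == 0:
        return 0.0  # undefined when one observer's errors are uniform
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical per-image outcomes for humans vs. a DNN on distorted images
human_errors = [0, 0, 1, 0, 1, 0, 0, 1]
dnn_errors = [1, 0, 0, 1, 1, 1, 0, 0]
r = error_correlation(human_errors, dnn_errors)  # low |r|: little overlap in mistakes
```

A value of r near zero indicates the two observers tend to fail on different images, which is the pattern the paper reports.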

Results and Implications

The findings underscore a critical deficiency in current DNNs: their lack of robustness to realistic image distortions. While DNNs perform admirably on clean, standard images, their performance significantly degrades on distorted ones, pointing to a gap in their ability to generalize across varying image qualities—something the human visual system accomplishes inherently.

This gap has profound practical implications for deploying AI in environments where image quality varies unpredictably, such as automated surveillance, self-driving cars, or medical imaging, where occasional distortions are inevitable. It suggests the necessity for new approaches in DNN architecture or training regimens that incorporate principles from human visual processing to strengthen recognition robustness.

Future Directions

The paper advocates for leveraging insights from the human visual system to inform the development of more resilient DNN architectures. It suggests that exploring structural mimicry of biological visual pathways, or incorporating noise and distortion robustness explicitly into training, might yield networks that better align with human capabilities. Additionally, further research could investigate whether related mechanisms, such as transfer learning or cross-domain adaptation, could enhance robustness to distortions.

This research opens a pivotal conversation on improving DNN robustness, emphasizing that surpassing human-level performance on standard datasets is insufficient; robustness comparable to that of humans under imperfect conditions is equally crucial for the widespread usability of AI technologies.