Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future (2001.07092v2)
Abstract: Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition.
Explain it Like I'm 14
Overview: What this paper is about
This paper explains how a popular kind of computer model, called a Convolutional Neural Network (CNN), can act as a stand‑in for the brain’s visual system. It looks at where CNNs came from, how well they match what happens in real brains, what we can learn by experimenting on them, and how they might help scientists study vision beyond basic tasks like recognizing objects.
The big questions the paper asks
- Can CNNs be good “mechanistic” models of human and animal vision? In other words, not just getting the same answers, but working in similar ways inside.
- How do we test whether a CNN really behaves like the brain’s visual areas?
- What can we discover about biological vision by training and tweaking CNNs?
- How can CNNs be pushed beyond simple object recognition to study attention, memory, learning, and more?
- What are CNNs’ limits, and where should vision research go next?
How the research approach works (in everyday language)
Think of vision like a factory assembly line:
- Early stations detect simple things (edges and colors).
- Later stations combine those into parts (corners, textures).
- Final stations recognize whole objects (faces, bikes, dogs).
CNNs are built the same way. Here are the key parts, with simple analogies:
- Convolution: Imagine sliding a small stencil over a picture to find edges or spots. Each stencil is a “filter.” As you slide it everywhere, you get a “feature map” that shows where that pattern appears.
- Pooling: Now you shrink each feature map by keeping the strongest responses in small regions, like summarizing a neighborhood by its tallest building. This makes the model less sensitive to tiny shifts in the image.
- Layers: You repeat these steps many times so later layers can detect more complex patterns.
- Training (backpropagation): The model guesses a label (e.g., “cat”), checks if it’s right, and nudges its filters to do better next time—like a student correcting mistakes after seeing the answer key. A minimal code sketch of these pieces follows this list.
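To make these pieces concrete, here is a minimal, illustrative sketch in PyTorch (a toy example, not code from the paper): two convolution-plus-pooling stages followed by a classifier, and a single backpropagation step on dummy data. The layer sizes, learning rate, and image size are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

# A tiny CNN with the ingredients described above: convolutional filters that
# slide over the image and produce feature maps, pooling that keeps the
# strongest local responses, repeated layers, and a final classifier.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 16 learned "stencils" over an RGB image
    nn.ReLU(),
    nn.MaxPool2d(2),                               # shrink each feature map, keep local maxima
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # later filters combine earlier features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                     # scores for 10 categories (for 32x32 inputs)
)

# One training step by backpropagation: guess labels for a batch, measure the
# error, and nudge every filter in the direction that reduces it.
images = torch.randn(8, 3, 32, 32)                 # dummy batch standing in for real photos
labels = torch.randint(0, 10, (8,))                # dummy "correct answers"
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()                                    # backpropagation computes the nudges
optimizer.step()                                   # apply them
```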
How do scientists check if CNNs are brain-like?
- Neural comparison: Show the same images to animals and to a CNN, then see if activity in certain CNN layers predicts the activity of real neurons in specific brain areas (like V1, V4, IT). This often works surprisingly well.
- Representational similarity: Build a “difference map” (a representational dissimilarity matrix) that shows how differently a population (brain area or network layer) responds to each pair of images, then compare those maps. Similar maps suggest similar internal representations. (This comparison, along with the neural-prediction check above, is sketched in code after this list.)
- Behavior comparison: Compare what kinds of mistakes humans and CNNs make on the same images, how both handle noise or blur, and what features (shape vs. texture) they rely on.
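To show what the first two checks look like in practice, here is a small illustrative sketch using numpy, scipy, and scikit-learn, with simulated arrays standing in for real neural recordings and CNN activations (the sizes and noise levels are made up): a cross-validated ridge "encoding model" that predicts each neuron from a CNN layer, and a comparison of the two systems' dissimilarity matrices.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Simulated stand-ins: responses of 50 "neurons" and 200 "CNN units" to the
# same 100 images, sharing some latent structure plus noise.
n_images = 100
shared = rng.normal(size=(n_images, 10))
neural = shared @ rng.normal(size=(10, 50)) + 0.5 * rng.normal(size=(n_images, 50))
cnn_layer = shared @ rng.normal(size=(10, 200)) + 0.5 * rng.normal(size=(n_images, 200))

# 1) Neural comparison ("encoding model"): predict each neuron from the CNN
#    layer with ridge regression, scoring on held-out images.
pred = cross_val_predict(Ridge(alpha=1.0), cnn_layer, neural, cv=5)
per_neuron_r = [np.corrcoef(pred[:, i], neural[:, i])[0, 1] for i in range(neural.shape[1])]
print("median held-out prediction r:", np.median(per_neuron_r))

# 2) Representational similarity: build each system's image-by-image
#    dissimilarity matrix (1 - correlation) and compare their upper triangles.
def rdm(responses):
    return 1.0 - np.corrcoef(responses)        # (n_images, n_images) dissimilarities

iu = np.triu_indices(n_images, k=1)
rho, _ = spearmanr(rdm(neural)[iu], rdm(cnn_layer)[iu])
print("RDM similarity (Spearman rho):", rho)
```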
The paper also discusses “experimenting on models” by:
- Changing the data (e.g., training on scenes instead of objects).
- Changing the wiring (e.g., adding feedback loops like the brain’s).
- Changing the learning style (e.g., unsupervised or reinforcement learning).
- Probing what the network “likes” using visualizations and “unit ablations” (turning parts off to see what breaks); an ablation sketch follows this list.
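As an example of the “turning parts off” idea, here is an illustrative PyTorch sketch that silences one channel in a mid-level layer of a pretrained ResNet-18 and measures how much the output shifts. It assumes torchvision’s pretrained weights are available; the layer, channel, and input are arbitrary stand-ins.

```python
import torch
import torchvision.models as models

# "Unit ablation": zero out one feature map in a middle layer and see how
# much the network's output changes for the same input.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def ablate_channel(channel):
    # Forward hook that silences one channel wherever this layer is used.
    def hook(module, inputs, output):
        output[:, channel] = 0.0
        return output
    return model.layer3.register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224)               # stand-in for a real image
with torch.no_grad():
    baseline = model(image)
    handle = ablate_channel(7)                    # turn off channel 7 (arbitrary)
    ablated = model(image)
    handle.remove()                               # restore the intact network

# How much did the output move when this unit was silenced?
print("mean output change:", (baseline - ablated).abs().mean().item())
```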
Main findings and why they matter
Here are the key takeaways, explained simply:
- CNNs echo brain organization: Early CNN layers act like early visual areas (detecting edges), and later layers act like higher areas (recognizing objects). Activity in deeper layers predicts activity in higher visual areas (like IT) better than older models.
- They often match behavior—but not perfectly: CNNs can recognize objects very well, sometimes even better than people, but they can be more fragile to noise or blur and often rely too much on texture rather than shape. These mismatches point to brain features CNNs may be missing.
- Visualizations make sense: Early filters look like edge detectors (similar to V1 neurons). Later ones respond to object parts or whole categories, aligning with what we see in the ventral “what” pathway.
- Tweaks reveal insights:
  - Data matters: Training on scenes helps model brain areas for places; training with varied textures reduces CNNs’ texture bias.
  - Architecture matters: Adding brain-like “recurrence” (sideways and feedback connections) improves handling of hard images and better matches time‑evolving neural responses.
  - Learning style matters: Supervised learning currently best matches neural data for object recognition; unsupervised and reinforcement learning are promising but not yet as brain‑like for these tasks.
- Tools for understanding:
  - “Ablation” (turning off units) and gradient‑based methods show that what a unit “likes” (its tuning) isn’t always the same as what the network uses it for—warning us not to over‑interpret single‑neuron tuning in brains. A gradient‑based visualization sketch appears after this list.
  - “Untangling” is a helpful idea: through layers, the network separates mixed-up visual information into clear clusters so categories are easier to tell apart—likely similar to what the brain does.
- Beyond object labels: CNNs can help study attention, memorability, and learning; and they can be combined with more biological details (like spiking neurons or eye movements) to explore how vision works in richer, more realistic settings.
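As an example of the gradient-based visualization mentioned above, here is an illustrative PyTorch sketch of activation maximization: starting from noise, the input image is adjusted so that one arbitrarily chosen channel of a pretrained VGG-16 responds as strongly as possible. It assumes torchvision’s pretrained weights are available; the layer index, channel, step count, and learning rate are arbitrary example choices.

```python
import torch
import torchvision.models as models

# Gradient-based visualization ("activation maximization"): optimize an input
# image so that one chosen channel responds strongly. The result hints at
# what that unit "likes" (its tuning).
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)                        # only the image will be optimized
target_layer, target_channel = model.features[10], 42   # arbitrary illustrative choices

activations = {}
def save_activation(module, inputs, output):
    activations["value"] = output                  # keep this layer's response
target_layer.register_forward_hook(save_activation)

image = (0.1 * torch.randn(1, 3, 224, 224)).requires_grad_(True)   # start from noise
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    model(image)
    # Maximize the chosen channel's mean activation (minimize its negative).
    loss = -activations["value"][0, target_channel].mean()
    loss.backward()
    optimizer.step()

# `image` is now an input that strongly drives the chosen unit.
```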
Why it matters: These results suggest CNNs are not just good at computer vision—they’re useful scientific models that help explain how biological vision might be organized, what it computes, and why certain brain wiring patterns are helpful.
What this means for the future
- Better brain models: By carefully matching datasets, wiring, and training to biology, CNNs can become stronger stand‑ins for real visual systems, helping us design smarter experiments.
- Filling the gaps: Mismatches (like texture bias, fragility to noise, and simplified wiring rules) point to what to add next—such as feedback, attention, memory, and more realistic learning rules.
- More natural tasks: To truly mirror the brain, models should move beyond picture labeling to tasks like navigation, object manipulation, and reasoning—things animals do in the real world.
- Rethinking “understanding”: Instead of seeking one‑line labels for neurons (“this is a face cell”), we may need compact descriptions of the entire system (its architecture, learning goals, and training data) and new math tools to summarize complex computations.
In short, CNNs began as brain‑inspired tools and have grown into powerful models for studying vision. They don’t replace neuroscience, but they give us a controllable, testable playground. By cycling between models and experiments—improving each using the other—we can get closer to explaining how seeing really works.