Layerwise complexity-matched learning yields an improved model of cortical area V2 (2312.11436v3)

Published 18 Dec 2023 in q-bio.NC, cs.CV, and cs.LG

Abstract: Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages, compared to traditional hand-engineered models, or models optimized for coding efficiency or prediction. Moreover, the gradient backpropagation used in end-to-end learning is generally considered to be biologically implausible. Here, we overcome both of these limitations by developing a bottom-up self-supervised training methodology that operates independently on successive layers. Specifically, we maximize feature similarity between pairs of locally-deformed natural image patches, while decorrelating features across patches sampled from other images. Crucially, the deformation amplitudes are adjusted proportionally to receptive field sizes in each layer, thus matching the task complexity to the capacity at each stage of processing. In comparison with architecture-matched versions of previous models, we demonstrate that our layerwise complexity-matched learning (LCL) formulation produces a two-stage model (LCL-V2) that is better aligned with selectivity properties and neural activity in primate area V2. We demonstrate that the complexity-matched learning paradigm is responsible for much of the emergence of the improved biological alignment. Finally, when the two-stage model is used as a fixed front-end for a deep network trained to perform object recognition, the resultant model (LCL-V2Net) is significantly better than standard end-to-end self-supervised, supervised, and adversarially-trained models in terms of generalization to out-of-distribution tasks and alignment with human behavior.
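The layerwise objective described in the abstract — maximizing feature similarity between two locally-deformed views of the same patch while decorrelating features across patches from other images — can be sketched with a Barlow-Twins-style cross-correlation loss. The function below is a minimal illustration, not the paper's implementation; the function name, the weighting factor `lam`, and the use of NumPy are all assumptions made for the sketch.

```python
import numpy as np

def lcl_style_loss(z1, z2, lam=0.005):
    """Illustrative layerwise objective (hypothetical sketch, not the paper's code).

    z1, z2: (batch, dim) feature arrays from two locally-deformed views of the
    same image patches, produced by the layer being trained. In the paper's
    scheme, the deformation amplitude used to generate the views would scale
    with the layer's receptive field size (complexity matching).
    """
    # Standardize each feature dimension across the batch.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    n, d = z1.shape

    # Cross-correlation matrix between the two views (d x d).
    c = z1.T @ z2 / n

    # Similarity term: push diagonal toward 1 (matched views agree).
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    # Decorrelation term: push off-diagonals toward 0 (features across
    # different patches/dimensions are decorrelated).
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()

    return on_diag + lam * off_diag
```

Because the loss depends only on the current layer's outputs, each layer can be trained to convergence on its own, without backpropagating gradients from later stages — the property the abstract highlights as more biologically plausible than end-to-end training.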

