
Neural population geometry and optimal coding of tasks with shared latent structure (2402.16770v2)

Published 26 Feb 2024 in q-bio.NC, cond-mat.dis-nn, cond-mat.stat-mech, cs.LG, and cs.NE

Abstract: Humans and animals can recognize latent structures in their environment and apply this information to efficiently navigate the world. However, it remains unclear what aspects of neural activity contribute to these computational capabilities. Here, we develop an analytical theory linking the geometry of a neural population's activity to the generalization performance of a linear readout on a set of tasks that depend on a common latent structure. We show that four geometric measures of the activity determine performance across tasks. Using this theory, we find that experimentally observed disentangled representations naturally emerge as an optimal solution to the multi-task learning problem. When data is scarce, these optimal neural codes compress less informative latent variables, and when data is abundant, they expand these variables in the state space. We validate our theory using macaque ventral stream recordings. Our results therefore tie population geometry to multi-task learning.


Summary

  • The paper develops an analytical theory linking neural population geometry to the generalization performance of linear readouts on tasks that share a latent structure.
  • It quantifies neural activity with four geometric measures (neural-latent correlation, signal-signal factorization, signal-noise factorization, and neural dimension) that predict performance across artificial and biological systems.
  • It validates the theoretical predictions on both multilayer perceptrons and macaque ventral stream recordings, showing that disentangling and signal-noise separation improve along these processing hierarchies.

Neural Population Geometry and Optimal Coding of Tasks with Shared Latent Structure

Introduction

This paper develops an analytical theory connecting neural population geometry to generalization performance in multi-task learning. The authors show that four geometric properties of neural activity determine the ability to generalize across tasks that share a latent structure. Within this framework, they identify the mechanisms by which disentangled neural representations emerge as optimal codes, allowing latent environmental variables to be reused efficiently across tasks.

Theory of Multi-Task Learning

The paper examines how neural population geometry supports tasks that rely on a shared latent structure. Specifically, it develops a model in which stimuli are generated from latent variables and each binary classification task is defined by a linear separation of the latent space (Figure 1).

Figure 1: Schematic of the task and model setup using images from the d-sprites dataset as an example.
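As an illustration of this setup, the following sketch (not the authors' code; the latent dimension, stimulus count, and task count are placeholder assumptions) generates a family of binary tasks, each defined by a random hyperplane through a shared latent space:

```python
# Minimal sketch of tasks sharing a latent structure: each task labels a
# stimulus by the sign of a random linear functional of its latent variables,
# so all tasks depend on the same K latent dimensions.
import numpy as np

rng = np.random.default_rng(0)

K = 3          # number of latent variables (e.g. position, scale, orientation)
P = 500        # number of stimuli
T = 20         # number of tasks sharing the latent structure

latents = rng.standard_normal((P, K))          # latent description of each stimulus
task_vectors = rng.standard_normal((T, K))     # one separating direction per task
task_vectors /= np.linalg.norm(task_vectors, axis=1, keepdims=True)

# Binary labels: each task is a linear classification in latent space.
labels = np.sign(latents @ task_vectors.T)     # shape (P, T), entries in {-1, +1}
```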

The generalization error of a linear readout tasked with classifying these stimuli is linked to the statistical properties of the neural activity. The analytical theory uses a Gaussian simplification of the neural responses to approximate the generalization error, and the resulting predictions are validated against nonlinear neural activations in artificial and biological data (Figures 2, 3).

Figure 2: Schematic of the geometric terms. Geometric patterns elicited by stimuli are shown with varying levels of correlation and factorization.

The theory expresses generalization error in terms of four geometric quantities: neural-latent correlation, signal-signal factorization, signal-noise factorization, and neural dimension. Together these measures predict the empirical generalization error with high fidelity.
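The snippet below gives rough, self-contained proxies for these four quantities; the exact estimators in the paper may differ, so treat these as illustrative definitions only:

```python
# Illustrative estimators (simplified proxies, not the paper's exact definitions)
# of the four geometric quantities named above.
import numpy as np

def geometry_summary(responses, latents, noise):
    """responses: (P, N) trial-averaged activity; latents: (P, K); noise: (P, N) residuals."""
    # Signal directions: least-squares map from latents to mean responses.
    beta, *_ = np.linalg.lstsq(latents, responses, rcond=None)     # (K, N)
    signal_dirs = beta / np.linalg.norm(beta, axis=1, keepdims=True)

    # Neural-latent correlation: fraction of response variance the latents explain.
    pred = latents @ beta
    r2 = 1.0 - np.var(responses - pred) / np.var(responses)

    # Signal-signal factorization: 1 means different latent variables map to
    # orthogonal neural directions (a disentangled code).
    overlaps = signal_dirs @ signal_dirs.T
    ss_fact = 1.0 - np.abs(overlaps[np.triu_indices_from(overlaps, k=1)]).mean()

    # Signal-noise factorization: 1 means noise variance lies orthogonal
    # to the signal directions.
    noise_cov = np.cov(noise, rowvar=False)
    sn_align = np.einsum('kn,nm,km->k', signal_dirs, noise_cov, signal_dirs)
    sn_fact = 1.0 - (sn_align / np.trace(noise_cov)).mean()

    # Neural dimension: participation ratio of the response covariance spectrum.
    eigvals = np.linalg.eigvalsh(np.cov(responses, rowvar=False))
    dim = eigvals.sum() ** 2 / (eigvals ** 2).sum()

    return dict(neural_latent_r2=r2, signal_signal=ss_fact,
                signal_noise=sn_fact, dimension=dim)
```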

Optimal Representation of Latent Variables

An intriguing result of the theory is that disentangled representations emerge as the optimal codes for multi-task learning. Optimal neural representations assign each latent variable to its own orthogonal direction in neural state space (Figure 3).

Figure 3: Optimal representational geometry as a function of training samples and latent structure.

These optimal representations adaptively compress less informative latent variables when training samples are scarce and expand them in state space when samples are abundant, so that higher-dimensional neural activity yields better generalization as more data become available. The eigenspectrum of the neuron-neuron covariance therefore evolves with sample size, showing significant spectral flattening under optimal, data-rich conditions (Figure 4).

Figure 4: Theory predicts empirical generalization error in Gaussian model with power law covariance spectra.
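A toy numerical experiment in the spirit of this result (my own construction, not the paper's code) draws Gaussian responses with a power-law covariance spectrum and traces how the empirical error of a least-squares readout falls with the number of training samples:

```python
# Gaussian responses with a power-law covariance spectrum, plus the empirical
# generalization error of a least-squares linear readout on a binary task.
import numpy as np

rng = np.random.default_rng(1)
N, alpha = 200, 1.5                              # neurons, spectral decay exponent
eigvals = np.arange(1, N + 1, dtype=float) ** -alpha
Q = np.linalg.qr(rng.standard_normal((N, N)))[0]
cov_sqrt = Q * np.sqrt(eigvals)                  # covariance square root: Q diag(sqrt(eig))

w_true = rng.standard_normal(N)                  # ground-truth task direction

def gen_error(n_train, n_test=2000):
    X_tr = rng.standard_normal((n_train, N)) @ cov_sqrt.T
    X_te = rng.standard_normal((n_test, N)) @ cov_sqrt.T
    y_tr, y_te = np.sign(X_tr @ w_true), np.sign(X_te @ w_true)
    w_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)   # least-squares readout
    return np.mean(np.sign(X_te @ w_hat) != y_te)

for n in (10, 50, 200, 800):
    print(f"n_train={n}: error={gen_error(n):.3f}")
```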

Geometry of Multi-Task Learning in Artificial Networks

The study extends its analysis to nonlinear MLPs, demonstrating the theory's robustness under non-Gaussian conditions. Validation on both random and trained MLPs shows consistent agreement between theoretical predictions and empirical generalization error (Figures 5, 6).

Figure 5: Theory predicts generalization error in random and trained MLPs.
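To make the validation procedure concrete, here is a hedged sketch (an assumed setup with placeholder widths and sample sizes, not the published code) that pushes latent-derived inputs through a random ReLU MLP and measures held-out linear readout error layer by layer:

```python
# Random ReLU MLP on latent-derived inputs; a least-squares readout is fit on
# each layer's activations and evaluated on held-out stimuli.
import numpy as np

rng = np.random.default_rng(2)
K, widths = 3, [64, 64, 64]

def random_mlp(x):
    """Return the ReLU activations of each layer for input x of shape (P, K)."""
    acts, h, d_in = [], x, x.shape[1]
    for d_out in widths:
        W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        h = np.maximum(h @ W, 0.0)               # ReLU layer
        acts.append(h)
        d_in = d_out
    return acts

latents = rng.standard_normal((400, K))
w_task = rng.standard_normal(K)
labels = np.sign(latents @ w_task)               # one task sharing the latent structure

for layer, h in enumerate(random_mlp(latents), start=1):
    h_tr, h_te = h[:200], h[200:]
    y_tr, y_te = labels[:200], labels[200:]
    w_hat, *_ = np.linalg.lstsq(h_tr, y_tr, rcond=None)
    err = np.mean(np.sign(h_te @ w_hat) != y_te)
    print(f"layer {layer}: generalization error {err:.3f}")
```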

Trained networks exhibit progressively better geometric organization across layers: signal-noise and signal-signal factorization improve while dimensionality increases, particularly through the ReLU layers, in line with the optimal spectral strategies identified by the theory (Figure 6).

Figure 6: Evolution of generalization error through training stages in MLPs.

Predicting Readout Performance in Biological Systems

Further empirical validation uses macaque V4 and IT recordings, with task labels formed from latent variables tied to the visual stimulus categories. The theoretical framework accurately predicts generalization error from these biological neural responses (Figure 7).

Figure 7: Theory predicts multi-task error in macaque V4 and IT neural data.

Comparing raw pixels, V4, and IT shows that both brain regions generalize better than the raw pixel representation, with IT factorizing signal directions more strongly than V4, consistent with progressively improved signal-noise separation along the ventral stream.
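Procedurally, this comparison amounts to applying the same multi-task readout protocol to each candidate representation and averaging held-out error over many random latent tasks. The sketch below uses random placeholder arrays in place of the actual pixel, V4, and IT data purely to show the calling convention:

```python
# Same multi-task linear readout protocol applied to several representations;
# the arrays named pixels, v4, it here are random stand-ins for real data.
import numpy as np

rng = np.random.default_rng(3)

def multitask_error(X, latents, n_tasks=50, n_train=100):
    """Average held-out error of least-squares readouts over random latent tasks."""
    P, errs = X.shape[0], []
    for _ in range(n_tasks):
        w = rng.standard_normal(latents.shape[1])
        y = np.sign(latents @ w)                 # task label from latent hyperplane
        idx = rng.permutation(P)
        tr, te = idx[:n_train], idx[n_train:]
        w_hat, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
        errs.append(np.mean(np.sign(X[te] @ w_hat) != y[te]))
    return float(np.mean(errs))

latents = rng.standard_normal((300, 4))
pixels, v4, it = (rng.standard_normal((300, d)) for d in (784, 120, 160))
for name, X in [("pixels", pixels), ("V4", v4), ("IT", it)]:
    print(name, multitask_error(X, latents))
```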

Conclusion

This research delineates an analytical pathway from neural population geometry to multi-task learning performance. By analyzing the geometry of neural activity, the study articulates a coherent mechanism underlying the ability of both artificial and biological systems to generalize across tasks with shared latent structure. The results explain why disentangled representations emerge as optimal codes for learning and offer predictive tools for linking population geometry to readout performance, with implications for more data-efficient artificial systems and for targeted computational models of cognition.
