
Meta predictive learning model of languages in neural circuits (2309.04106v2)

Published 8 Sep 2023 in cs.CL and q-bio.NC

Abstract: LLMs based on self-attention mechanisms have achieved astonishing performance not only on natural language itself, but also on a variety of tasks of a different nature. However, when processing language, the human brain may not operate on the same principle, which raises a debate about the connection between brain computation and the artificial self-supervision adopted in LLMs. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize the prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike-and-slab distribution, and that only the distribution, rather than the specific weights, is trained. This meta predictive learning is successfully validated on classifying handwritten digits, where pixels are fed to the network in sequence, and further on toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections retain a higher level of variability. The performance of the resulting network ensemble changes continuously with data load, improving further with more training data, in analogy with the emergent behavior of LLMs. Our model therefore provides a starting point for investigating the connection among brain computation, next-token prediction, and general intelligence.
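To make the core idea concrete, below is a minimal, hypothetical sketch of a layer whose weights follow a spike-and-slab distribution and where only the distribution parameters are trained, with the forward pass propagating the mean-field moments of the weights rather than fixed values. All names, the moment-propagation scheme, and the toy usage are illustrative assumptions on our part, not the authors' implementation; the reference code is at https://github.com/Qjbtiger/Meta-predictive-coding.

```python
# Hypothetical sketch (not the authors' code): each weight follows
#   w ~ pi * N(mu, sigma^2) + (1 - pi) * delta(0),
# and only the distribution parameters (pi, mu, sigma) would be trained.
import numpy as np

rng = np.random.default_rng(0)

class SpikeSlabLayer:
    def __init__(self, n_in, n_out):
        # Trainable distribution parameters, one triple per connection.
        self.pi = np.full((n_in, n_out), 0.5)          # probability the weight is "on" (slab)
        self.mu = rng.normal(0.0, 0.1, (n_in, n_out))  # slab mean
        self.log_sigma = np.full((n_in, n_out), -2.0)  # slab log standard deviation

    def moments(self):
        # Mean-field first and second moments of each spike-and-slab weight:
        # E[w] = pi * mu,  Var[w] = pi * (sigma^2 + mu^2) - (pi * mu)^2.
        sigma2 = np.exp(2.0 * self.log_sigma)
        mean = self.pi * self.mu
        var = self.pi * (sigma2 + self.mu ** 2) - mean ** 2
        return mean, var

    def forward(self, x):
        # Instead of sampling every weight, treat the pre-activation as Gaussian
        # (central-limit argument over many inputs) and sample it once per pass.
        mean, var = self.moments()
        pre_mean = x @ mean
        pre_var = (x ** 2) @ var
        eps = rng.standard_normal(pre_mean.shape)
        return np.tanh(pre_mean + np.sqrt(pre_var + 1e-12) * eps)

# Toy usage: a two-layer ensemble-style forward pass on a random input vector.
layer1, layer2 = SpikeSlabLayer(32, 64), SpikeSlabLayer(64, 10)
x = rng.standard_normal((1, 32))
out = layer2.forward(layer1.forward(x))
print(out.shape)  # (1, 10)
```

Under this reading, training would adjust pi, mu, and sigma (e.g. by predictive-coding-style local updates), and the observation that "most connections become deterministic" corresponds to pi approaching 0 or 1 with small sigma, while output connections keep higher variability.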
