Language Modeling Using Tensor Trains (2405.04590v1)
Abstract: We propose a novel tensor network language model based on the simplest tensor network (i.e., tensor trains), called the `Tensor Train Language Model' (TTLM). TTLM represents sentences in an exponential space constructed by the tensor product of words, while computing the probabilities of sentences in a low-dimensional fashion. We demonstrate that the architectures of Second-order RNNs, Recurrent Arithmetic Circuits (RACs), and Multiplicative Integration RNNs are, essentially, special cases of TTLM. Experimental evaluations on real language modeling tasks show that the proposed variants of TTLM (i.e., TTLM-Large and TTLM-Tiny) outperform vanilla Recurrent Neural Networks (RNNs) when the number of hidden units is small. (The code is available at https://github.com/shuishen112/tensortrainlm.)
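To make the abstract's central idea concrete, below is a minimal NumPy sketch of a tensor-train-style language model step: the sentence conceptually lives in the exponential space e(w_1) ⊗ … ⊗ e(w_T), but contracting a shared tensor-train core with one word embedding at a time keeps only a rank-sized hidden vector in memory. This is an illustrative reconstruction based on the abstract, not the authors' released implementation; all names (`G`, `E`, `W_out`, `sentence_log_prob`) and dimensions are assumptions.

```python
# Illustrative sketch of a tensor-train (TT) language model step.
# NOT the authors' code: names, shapes, and normalization are assumptions.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim, rank = 100, 16, 8   # hypothetical sizes

E = rng.normal(scale=0.1, size=(vocab_size, emb_dim))   # word embeddings
G = rng.normal(scale=0.1, size=(rank, emb_dim, rank))   # shared TT core
W_out = rng.normal(scale=0.1, size=(rank, vocab_size))  # output projection

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sentence_log_prob(token_ids):
    """Score a sentence left to right.

    The hidden state never leaves R^rank, even though the implicit
    representation is the tensor product of all word embeddings."""
    h = np.ones(rank)          # assumed initial boundary vector
    log_p = 0.0
    for w in token_ids:
        # log P(w_t | w_<t) from the current rank-sized hidden state
        log_p += np.log(softmax(h @ W_out)[w])
        # contract the TT core with the embedding of w and the state:
        # h_j <- sum_{i,k} h_i * G[i,k,j] * E[w,k]
        h = np.einsum('i,ikj,k->j', h, G, E[w])
        h = h / (np.linalg.norm(h) + 1e-9)  # keep the contraction stable
    return log_p

print(sentence_log_prob([3, 17, 42, 7]))
```

The multiplicative interaction between the hidden state and the word embedding inside the `einsum` is the structural reason the abstract can claim Second-order RNNs, RACs, and Multiplicative Integration RNNs as special cases: each corresponds to a particular factorization or restriction of the core tensor.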