Quantized Approximately Orthogonal Recurrent Neural Networks (2402.04012v2)

Published 5 Feb 2024 in cs.NE, cs.LG, eess.SP, math.ST, stat.TH, and cs.AI

Abstract: In recent years, Orthogonal Recurrent Neural Networks (ORNNs) have gained popularity due to their ability to manage tasks involving long-term dependencies, such as the copy-task, and their linear complexity. However, existing ORNNs use full-precision weights and activations, which prevents their deployment on compact devices. In this paper, we explore the quantization of the weight matrices in ORNNs, leading to Quantized approximately Orthogonal RNNs (QORNNs). The construction of such networks remained an open problem, acknowledged for its inherent instability. We propose and investigate two strategies to learn QORNNs by combining quantization-aware training (QAT) and orthogonal projections. We also study post-training quantization of the activations for pure-integer computation of the recurrent loop. The most efficient models achieve results similar to those of state-of-the-art full-precision ORNNs, LSTMs and FastRNN on a variety of standard benchmarks, even with 4-bit quantization.
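
The abstract describes combining quantization-aware training with orthogonal projections of the recurrent weight matrix. Below is a minimal sketch of one way such a combination can look, assuming a straight-through estimator for the quantizer and a polar-decomposition projection onto the orthogonal group; the helper names (`quantize_ste`, `project_orthogonal`) and the training-loop structure are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, not the authors' code: quantization-aware training (STE)
# of a recurrent weight matrix combined with projection onto the orthogonal
# group. All names and the loop structure are illustrative assumptions.
import torch

def quantize_ste(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Uniform symmetric quantization; gradients pass through unchanged (STE).
    q_max = 2 ** (n_bits - 1) - 1
    scale = w.detach().abs().max() / q_max
    w_q = torch.round(w / scale).clamp(-q_max - 1, q_max) * scale
    return w + (w_q - w).detach()  # forward: quantized values; backward: identity

def project_orthogonal(w: torch.Tensor) -> torch.Tensor:
    # Closest orthogonal matrix in Frobenius norm: polar factor U V^T from the SVD.
    u, _, vh = torch.linalg.svd(w, full_matrices=False)
    return u @ vh

hidden = 128
W = torch.nn.Parameter(project_orthogonal(torch.randn(hidden, hidden)))
opt = torch.optim.Adam([W], lr=1e-3)

x_t = torch.randn(32, hidden)      # toy input at a single time step
h = torch.zeros(32, hidden)

W_q = quantize_ste(W, n_bits=4)    # quantized weights used in the forward pass
h = torch.tanh(h @ W_q.T + x_t)    # one step of the recurrent loop
loss = h.pow(2).mean()             # placeholder loss

opt.zero_grad()
loss.backward()
opt.step()
with torch.no_grad():              # keep the float weights approximately orthogonal
    W.copy_(project_orthogonal(W))
```

In this sketch the full-precision weights are the trainable parameters, the quantized copy is used in the forward pass, and the weights are re-projected onto the orthogonal group after each gradient step; post-training quantization of the activations, as mentioned in the abstract, would be applied on top of such a trained model.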

