
conv_einsum: A Framework for Representation and Fast Evaluation of Multilinear Operations in Convolutional Tensorial Neural Networks (2401.03384v1)

Published 7 Jan 2024 in cs.LG and cs.CV

Abstract: Modern ConvNets continue to achieve state-of-the-art results over a vast array of vision and image classification tasks, but at the cost of increasing parameters. One strategy for compactifying a network without sacrificing much expressive power is to reshape it into a tensorial neural network (TNN), which is a higher-order tensorization of its layers, followed by a factorization, such as a CP-decomposition, which strips a weight down to its critical basis components. Passes through TNNs can be represented as sequences of multilinear operations (MLOs), where the evaluation path can greatly affect the number of floating point operations (FLOPs) incurred. While functions such as the popular einsum can evaluate simple MLOs such as contractions, existing implementations cannot process multi-way convolutions, resulting in scant assessments of how optimal evaluation paths through tensorized convolutional layers can improve training speed. In this paper, we develop a unifying framework for representing tensorial convolution layers as einsum-like strings and a meta-algorithm conv_einsum which is able to evaluate these strings in a FLOPs-minimizing manner. Comprehensive experiments, using our open-source implementation, over a wide range of models, tensor decompositions, and diverse tasks, demonstrate that conv_einsum significantly increases both computational and memory-efficiency of convolutional TNNs.
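
The abstract's key computational point is that the order in which a multilinear operation is evaluated can change the FLOP count dramatically. As a minimal sketch of that idea, using only standard NumPy (this is not the paper's conv_einsum API, which additionally handles multi-way convolution indices), the snippet below asks np.einsum_path for a FLOPs-minimizing contraction order and compares it against unoptimized evaluation:

    import numpy as np

    # Illustrative shapes only: for this chain, contracting (A B) first and
    # then C costs roughly an order of magnitude fewer multiplications than
    # contracting (B C) first, because the dimensions are skewed.
    A = np.random.rand(10, 100)
    B = np.random.rand(100, 5)
    C = np.random.rand(5, 50)

    # np.einsum_path reports the contraction order it would use and the
    # estimated speedup over unoptimized evaluation of the same expression.
    path, report = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')
    print(report)

    # The chosen path can be reused when the expression is actually evaluated.
    out = np.einsum('ij,jk,kl->il', A, B, C, optimize=path)

conv_einsum extends this kind of path optimization to einsum-like strings whose indices may also be convolved rather than only summed, which, as the abstract notes, plain einsum implementations cannot express.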

Authors (6)
  1. Tahseen Rabbani (13 papers)
  2. Xiaoyu Liu (139 papers)
  3. David Chan (24 papers)
  4. Geoffrey Sangston (3 papers)
  5. Furong Huang (150 papers)
  6. Jiahao Su (19 papers)
