
On the Expressive Power of Deep Learning: A Tensor Analysis (1509.05009v3)

Published 16 Sep 2015 in cs.NE, cs.LG, cs.NA, and stat.ML

Abstract: It has long been conjectured that hypotheses spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical networks than with shallow ones. Despite the vast empirical evidence supporting this belief, theoretical justifications to date are limited. In particular, they do not account for the locality, sharing and pooling constructs of convolutional networks, the most successful deep learning architecture to date. In this work we derive a deep network architecture based on arithmetic circuits that inherently employs locality, sharing and pooling. An equivalence between the networks and hierarchical tensor factorizations is established. We show that a shallow network corresponds to CP (rank-1) decomposition, whereas a deep network corresponds to Hierarchical Tucker decomposition. Using tools from measure theory and matrix algebra, we prove that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be realized (or even approximated) by a shallow network. Since log-space computation transforms our networks into SimNets, the result applies directly to a deep learning architecture demonstrating promising empirical performance. The construction and theory developed in this paper shed new light on various practices and ideas employed by the deep learning community.

Citations (457)

Summary

  • The paper shows that deep networks, using hierarchical tensor decompositions, capture complex functions with polynomial resources, contrasting dramatically with shallow models.
  • It employs arithmetic circuits and measure theory to rigorously prove that nearly all functions realizable by deep architectures need exponential resources in shallow counterparts.
  • The findings suggest practical neural design strategies, such as small pooling windows and convolutional architectures, to optimize efficiency and expressiveness.

On the Expressive Power of Deep Learning: A Tensor Analysis

The paper by Cohen, Sharir, and Shashua examines the expressive efficiency of deep learning models through the lens of tensor analysis. The authors focus on the empirical success of deep hierarchical networks for compositional data, such as text and images, and aim to provide a theoretical grounding for this phenomenon by employing arithmetic circuits and tensor decompositions.

Expressive Power of Depth

The crux of the research lies in establishing an equivalence between certain deep networks and hierarchical tensor factorizations, specifically contrasting shallow networks, which correspond to CP (rank-1) decompositions, with deep networks that align with Hierarchical Tucker decompositions. The paper posits that deep networks, which utilize locality, sharing, and pooling—features inherent to convolutional networks—can represent complex functions with polynomial size, while shallow networks would require exponentially larger structures to achieve the same expressive power.
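To make the two parameterizations concrete, the following is a minimal NumPy sketch, written for this summary rather than taken from the paper, of the coefficient tensors realized by a shallow CP-style model and a two-level Hierarchical-Tucker-style model for N = 4 inputs. The names (cp_tensor, ht_tensor) and all dimensions are illustrative choices.

```python
# Illustrative sketch (not the authors' code): coefficient tensors of a shallow
# (CP) model versus a two-level hierarchical (HT-style) model for N = 4 inputs
# and a representation of dimension M. The score of either network on an input
# (x_1, ..., x_4) is the inner product of its coefficient tensor with the
# rank-1 tensor f(x_1) ⊗ f(x_2) ⊗ f(x_3) ⊗ f(x_4).
import numpy as np

N, M, r = 4, 3, 2          # number of inputs, representation size, rank
rng = np.random.default_rng(0)

def cp_tensor(factors, weights):
    """Shallow model: a sum of rank-1 tensors, sum_z w_z * a_{z,1} ⊗ ... ⊗ a_{z,N}."""
    T = np.zeros((M,) * N)
    for z in range(weights.shape[0]):
        outer = factors[0][z]
        for i in range(1, N):
            outer = np.multiply.outer(outer, factors[i][z])
        T += weights[z] * outer
    return T

def ht_tensor(leaf, mid, top):
    """Deep model for N = 4: merge input pairs (1,2) and (3,4), then merge the pairs."""
    pair12 = np.einsum('ra,rb->rab', leaf[0], leaf[1])    # (r, M, M)
    pair34 = np.einsum('rc,rd->rcd', leaf[2], leaf[3])
    mixed12 = np.einsum('sr,rab->sab', mid[0], pair12)    # mix with level-1 weights
    mixed34 = np.einsum('sr,rcd->scd', mid[1], pair34)
    return np.einsum('s,sab,scd->abcd', top, mixed12, mixed34)

A_shallow = cp_tensor([rng.standard_normal((r, M)) for _ in range(N)],
                      rng.standard_normal(r))
A_deep = ht_tensor([rng.standard_normal((r, M)) for _ in range(N)],
                   [rng.standard_normal((r, r)) for _ in range(2)],
                   rng.standard_normal(r))
print(A_shallow.shape, A_deep.shape)   # both (M, M, M, M)
```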

Theoretical Insights and Results

Using measure theory and matrix algebra, the authors rigorously prove that, apart from a set of measure zero, every function implemented by a deep network of polynomial size requires exponential size to be realized, or even approximated, by a shallow network. The conclusion extends to models with shared coefficients, illustrating the intrinsic advantage of depth even under weight-sharing constraints.
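For concreteness, an informal paraphrase of the separation result is given below; the notation and the exact form of the bound are reconstructed for this summary rather than quoted from the paper, with M the representation dimension, r_0 the rank of the lowest hierarchical level, N the number of inputs, and [A^y] the matricization of the realized coefficient tensor that pairs odd-indexed against even-indexed modes.

```latex
% Informal paraphrase (notation and the exact bound reconstructed, not verbatim).
\textbf{Separation result (informal).}
For every weight assignment of the deep network outside a set of Lebesgue
measure zero, the realized coefficient tensor $\mathcal{A}^{y}$ satisfies
\[
  \operatorname{rank}\big[\mathcal{A}^{y}\big] \;\ge\; \min\{r_0, M\}^{N/2}.
\]
A shallow (CP) model with $Z$ hidden channels realizes tensors whose
matricizations have rank at most $Z$, so matching, or even approximating, the
deep model requires $Z$, and hence the shallow network's size, to be
exponential in $N$.
```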

Key Theorems

  • Theorem of network capacity: establishes an almost-everywhere separation between deep and shallow models in terms of expressive efficiency (a toy numeric illustration follows this list).
  • Generalization: broadens the depth analysis, quantifying the exponential resource cost incurred as layers are progressively removed.
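As a toy numeric illustration of the almost-everywhere separation (again a sketch for this summary, not code from the paper): for N = 4 and random parameters, the odd/even matricization of a hierarchically parameterized tensor generically reaches rank min(r, M)^2, while every matricization of a CP model with r terms has rank at most r.

```python
# Toy check of the rank separation for N = 4 (illustrative only).
import numpy as np

M, r = 3, 2
rng = np.random.default_rng(1)

# Deep (HT-style) tensor: merge inputs (1,2) and (3,4), then merge the two pairs.
leaf = [rng.standard_normal((r, M)) for _ in range(4)]
mid = [rng.standard_normal((r, r)) for _ in range(2)]
top = rng.standard_normal(r)
pair12 = np.einsum('sr,ra,rb->sab', mid[0], leaf[0], leaf[1])
pair34 = np.einsum('sr,rc,rd->scd', mid[1], leaf[2], leaf[3])
A_deep = np.einsum('s,sab,scd->abcd', top, pair12, pair34)

# Shallow (CP) tensor with r rank-1 terms.
w = rng.standard_normal(r)
f = [rng.standard_normal((r, M)) for _ in range(4)]
A_shallow = np.einsum('z,za,zb,zc,zd->abcd', w, f[0], f[1], f[2], f[3])

def odd_even_rank(T):
    """Rank of the matricization pairing modes (1,3) against modes (2,4)."""
    return np.linalg.matrix_rank(T.transpose(0, 2, 1, 3).reshape(M * M, M * M))

print(odd_even_rank(A_deep))     # generically min(r, M)**2 = 4
print(odd_even_rank(A_shallow))  # at most r = 2
```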

Practical and Theoretical Implications

The findings underscore the architectural merits of deep models, particularly convolutional arithmetic circuits and their log-space counterparts, SimNets, giving theoretical backing to their strong empirical performance. Beyond informing the depth-versus-width debate, the work suggests concrete design choices for network architectures, such as small pooling windows, that preserve the efficiency benefits of depth.
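One possible reading of these design points, sketched below under my own assumptions about layer widths and window size, is the basic block of a convolutional arithmetic circuit: a 1x1 channel-mixing step followed by product pooling over a small 2x2 window. Replacing the products with sums of logarithms would give a SimNet-style computation, as noted in the abstract.

```python
# Minimal sketch of a convolutional-arithmetic-circuit block (illustrative,
# not the authors' implementation): 1x1 channel mixing followed by product
# pooling over small non-overlapping 2x2 windows.
import numpy as np

def conv1x1(feat, weights):
    """Mix channels at every spatial location: (H, W, C_in) -> (H, W, C_out)."""
    return np.einsum('hwc,oc->hwo', feat, weights)

def product_pool(feat, window=2):
    """Multiply activations over non-overlapping window x window patches."""
    H, W, C = feat.shape
    return feat.reshape(H // window, window, W // window, window, C).prod(axis=(1, 3))

rng = np.random.default_rng(2)
x = rng.random((8, 8, 16))          # representation-layer output, e.g. f_d(x_i)
for _ in range(3):                   # spatial size 8 -> 4 -> 2 -> 1
    w = rng.standard_normal((16, x.shape[-1]))
    x = product_pool(conv1x1(x, w))
print(x.shape)                       # (1, 1, 16): one score per output channel
```

Keeping the pooling window small (here 2x2) maximizes the number of pooling levels for a given input size, which is the regime in which the depth-efficiency results apply.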

Future Directions and Impact

The interplay between tensor decompositions and neural network expressiveness opens new avenues for advancing AI capabilities. Future work could explore alternative tensorial frameworks and extend the application of these theoretical insights to address practical challenges, such as model interpretability and efficiency.

In conclusion, the paper offers a compelling analysis of why depth in neural networks proves to be significantly more expressive and efficient, providing a solid mathematical foundation for the observed successes of deep learning in handling complex, structured data.