- The paper shows, via an equivalence with hierarchical tensor decompositions, that deep networks can capture complex functions with polynomial resources, in sharp contrast with shallow models.
- It employs arithmetic circuits and measure theory to prove rigorously that nearly all functions realizable by deep architectures require exponentially many resources when implemented by shallow counterparts.
- The findings suggest practical network design strategies, such as convolutional architectures with small pooling windows, for balancing efficiency and expressiveness.
On the Expressive Power of Deep Learning: A Tensor Analysis
The paper by Cohen, Sharir, and Shashua examines the expressive efficiency of deep learning models through the lens of tensor analysis. The authors focus on the empirical success of deep hierarchical networks for compositional data, such as text and images, and aim to provide a theoretical grounding for this phenomenon by employing arithmetic circuits and tensor decompositions.
Expressive Power of Depth
The crux of the research lies in establishing an equivalence between certain deep networks and hierarchical tensor factorizations: shallow networks correspond to CP decompositions (sums of rank-1 terms), whereas deep networks align with Hierarchical Tucker decompositions. The paper shows that deep networks exploiting locality, sharing, and pooling (features inherent to convolutional networks) can represent complex functions with polynomial size, while shallow networks require exponentially larger structures to achieve the same expressive power.
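For concreteness, here is a sketch of that correspondence in a lightly simplified version of the paper's notation: the score function for class y over input patches x_1, ..., x_N is governed by a coefficient tensor A^y, and the network architecture determines how A^y is decomposed.

```latex
% Score function: coefficient tensor A^y combined with representation functions f_theta
h_y(x_1,\dots,x_N) = \sum_{d_1,\dots,d_N=1}^{M}
    \mathcal{A}^y_{d_1,\dots,d_N} \prod_{i=1}^{N} f_{\theta_{d_i}}(x_i)

% Shallow (single hidden layer) network  <=>  CP decomposition with Z terms
\mathcal{A}^y = \sum_{z=1}^{Z} a^y_z \;
    \mathbf{a}^{z,1} \otimes \mathbf{a}^{z,2} \otimes \cdots \otimes \mathbf{a}^{z,N}

% Deep network  <=>  Hierarchical Tucker decomposition: modes are merged two at a
% time, level by level, e.g. at the first level
\phi^{1,j,\gamma} = \sum_{\alpha=1}^{r_0} a^{1,j,\gamma}_{\alpha} \;
    \mathbf{a}^{0,2j-1,\alpha} \otimes \mathbf{a}^{0,2j,\alpha}
```

A shallow model thus needs its number of terms Z to grow with the rank of A^y, whereas the hierarchical form assembles high-rank tensors from small per-level factors.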
Theoretical Insights and Results
Using measure theory and matrix algebra, the authors rigorously prove that, apart from a set of measure zero, any function implemented by a deep network of polynomial size requires exponential size to be realized, or even approximated, by a shallow network. The conclusion extends to shared-coefficient models, illustrating the intrinsic advantage of deep architectures even under sharing constraints.
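The matrix-algebra side of the argument can be illustrated numerically. A standard fact is that the number of terms Z in a CP decomposition is bounded below by the rank of any matricization of the tensor. The sketch below is not taken from the paper; it uses a hypothetical pairwise-matching tensor to show how a tensor with an obvious hierarchical structure has matricization rank, and hence required shallow size, that grows exponentially with the number of modes.

```python
import numpy as np

# Order-N tensor over N inputs with M values each:
# A[d1,...,dN] = 1 exactly when d1 == d2, d3 == d4, ..., d_{N-1} == d_N.
# It is built from N/2 small "pair" factors, i.e. a depth-friendly structure.
M, N = 3, 6                                   # mode dimension, tensor order (N even)
pair = np.eye(M)                              # delta(d_a, d_b) for one pair of modes
A = pair
for _ in range(N // 2 - 1):
    A = np.multiply.outer(A, pair)            # append two more modes per step

# Matricize: odd modes index the rows, even modes index the columns.
perm = list(range(0, N, 2)) + list(range(1, N, 2))
mat = A.transpose(perm).reshape(M ** (N // 2), M ** (N // 2))

# The matricization is an identity matrix of size M**(N/2), so its rank -- and
# therefore the number of hidden terms Z any shallow (CP) model needs -- is
# exponential in N.
print(np.linalg.matrix_rank(mat))             # -> 27 == M ** (N // 2)
```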
Key Theorems
- Fundamental theorem of network capacity: Establishes an almost-everywhere separation between deep and shallow models in terms of expressive efficiency.
- Generalization: Provides a broader perspective on network depth, quantifying the exponential resource cost when layers are progressively removed.
Practical and Theoretical Implications
The findings underscore the architectural merits of deep models, particularly convolutional arithmetic circuits (SimNets), backing their empirical success with theoretical support. This work not only sheds light on the depth-versus-width debate but also suggests practical design strategies for network architectures, such as adopting small pooling windows to preserve the efficiency that depth provides.
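As a rough illustration of the kind of architecture the analysis covers, below is a minimal NumPy sketch of a convolutional arithmetic circuit with size-2 product pooling. The shapes, the purely linear representation layer, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def conv_arithmetic_circuit(x, rep_w, conv_ws, out_w):
    """Hypothetical 1-D sketch: 1x1 'convolutions' (channel mixing) alternated
    with product pooling over adjacent pairs, halving the number of positions
    at every level until a single vector remains."""
    h = x @ rep_w                       # representation layer: (N, s) -> (N, M)
    for W in conv_ws:                   # one (M, M) weight matrix per level
        h = h @ W                       # 1x1 conv: mix channels at each position
        h = h[0::2] * h[1::2]           # size-2 product pooling (the "deep" step)
    return h[0] @ out_w                 # linear output layer -> class scores

# Toy usage: N = 8 patches of dimension s, M channels, K classes, log2(N) levels.
N, s, M, K = 8, 5, 4, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(N, s))
conv_ws = [rng.normal(size=(M, M)) for _ in range(3)]
scores = conv_arithmetic_circuit(x, rng.normal(size=(s, M)), conv_ws,
                                 rng.normal(size=(M, K)))
print(scores.shape)                     # -> (3,)
```

Note the design choice the sketch highlights: enlarging the pooling window to cover all N positions at once would collapse the model to a single hidden layer, which is exactly the shallow (CP) case that the separation result penalizes.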
Future Directions and Impact
The interplay between tensor decompositions and neural network expressiveness opens new avenues for advancing AI capabilities. Future work could explore alternative tensorial frameworks and extend the application of these theoretical insights to address practical challenges, such as model interpretability and efficiency.
In conclusion, the paper offers a compelling analysis of why depth makes neural networks significantly more expressive and efficient, providing a solid mathematical foundation for the observed successes of deep learning on complex, structured data.