- The paper assembles a curated dataset of 123 milestone ML systems to trace compute evolution over distinct historical eras.
- It reports varying doubling times: approximately 21.3 months in the Pre Deep Learning era, 5.7 months in the Deep Learning era, and 9.9 months for large-scale models.
- The study underscores compute’s pivotal role in ML progress and highlights strategic challenges in sustaining large-scale training infrastructures.
Analysis of Compute Trends in Machine Learning: A Synthesis of Historical Trajectories
The paper "Compute Trends Across Three Eras of Machine Learning" offers an incisive examination of compute as a foundational parameter that has historically accelerated advancements in machine learning. The authors disaggregate the evolution of training compute into three distinguished epochs: the Pre Deep Learning Era (1952-2010), the Deep Learning Era (2010-2022), and the emergent Large-Scale Era (starting around 2015-2016). These epochs highlight shifts in the doubling time of training compute, with significant implications on the structuring and resource allocation of ML systems over time.
Key Findings
- Curated Dataset: The authors assembled a dataset comprising 123 milestone ML systems, each annotated with its respective training compute. This dataset, both comprehensive and rigorously vetted, forms the backbone of the paper. It uncovers trends that delineate the Pre Deep Learning, Deep Learning, and Large-Scale eras.
- Distinct Eras and Trends:
- Pre Deep Learning Era: During this period, training compute doubled approximately every 21.3 months, roughly tracking Moore's law.
- Deep Learning Era: Marked by the resurgence and proliferation of deep learning methods, this era saw the doubling time shrink to approximately 5.7 months.
- Large-Scale Era: Emerging around 2015-2016, this era reflects a split towards compute-intensive, large-scale models, trained chiefly by corporate labs able to make substantial compute investments; these models exhibit a doubling time of approximately 9.9 months.
- Validation and Comparative Analysis: Compared with earlier estimates such as Amodei and Hernandez (2018), which reported a faster doubling time of 3.4 months between 2012 and 2018, the paper tells a more nuanced story. Post-2016, the overall trend splits into separate trajectories for large-scale and regular-scale models, with overlaps and divergences that prior analyses did not capture (the brief numerical sketch after this list illustrates how these doubling times translate into annual growth rates).
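As a rough illustration rather than code from the paper, the Python sketch below converts the reported doubling times into implied annual growth factors and shows one simple way a doubling time could be estimated from (year, training FLOP) pairs via a log-linear fit; the data points at the end are hypothetical, not entries from the authors' dataset.

```python
import numpy as np

def annual_growth_factor(doubling_time_months: float) -> float:
    """Convert a doubling time in months into the implied yearly growth factor."""
    return 2.0 ** (12.0 / doubling_time_months)

# Doubling times (in months) as reported in the summarized sources.
eras = {
    "Pre Deep Learning (1952-2010)": 21.3,
    "Deep Learning (2010-2022)": 5.7,
    "Large-Scale (from ~2015-2016)": 9.9,
    "Amodei and Hernandez (2018), 2012-2018": 3.4,
}
for label, months in eras.items():
    print(f"{label}: ~{annual_growth_factor(months):.1f}x training compute per year")

def fit_doubling_time_months(years, flops) -> float:
    """Estimate a doubling time (in months) from a log-linear fit of compute
    against time, mirroring the kind of trend analysis the paper performs
    (the authors' exact methodology differs)."""
    # Slope of log2(compute) vs. year gives doublings per year.
    slope, _ = np.polyfit(np.asarray(years), np.log2(np.asarray(flops)), deg=1)
    return 12.0 / slope

# Hypothetical (year, training FLOP) points -- NOT values from the paper's dataset.
example_years = [2012.0, 2014.0, 2016.0, 2018.0]
example_flops = [1e17, 1e19, 1e21, 1e23]
print(f"Fitted doubling time: {fit_doubling_time_months(example_years, example_flops):.1f} months")
```

Running this prints roughly 1.5x, 4.3x, 2.3x, and 11.6x per year for the four doubling times, and a fitted doubling time of about 3.6 months for the hypothetical points.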
Implications and Speculative Outlook
- Strategic Importance of Compute Resources: The progression of advanced ML systems underscores the need for extensive compute infrastructure and specialized engineering expertise to stay competitive in machine learning. As compute requirements soar, organizations with substantial engineering resources or access to large computing clusters will remain at the forefront of ML research.
- Saturation and Challenges in Large-Scale Models: The slower growth (longer doubling time) observed for large-scale models hints at possible saturation, attributable not merely to financial constraints but also to infrastructural and engineering complexity. These challenges may mark an inflection toward optimizing computational efficiency or pursuing alternative strategies, such as systems designed to perform well under constrained compute budgets.
- Future Prospects with Data and Algorithms: While compute remains pivotal, data and algorithmic improvements also play a significant role. Future work could assess how the interplay of these factors constrains or accelerates model capabilities. Moreover, understanding dataset trends and optimizing data utilization will become increasingly valuable.
Conclusion
This paper contributes a crucial layer of understanding of how compute has shaped machine learning's trajectory. By drawing out nuanced distinctions between eras of ML development, the authors provide a broader perspective on the field's compute-centric growth and its implications. Such insights are invaluable for practitioners aiming to map out the sustainability and future directions of machine learning technology amid escalating demands and an evolving computational landscape.