The simple essence of automatic differentiation (1804.00746v4)

Published 2 Apr 2018 in cs.PL

Abstract: Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep understanding, improvement, and parallel execution. This paper develops a simple, generalized AD algorithm calculated from a simple, natural specification. The general algorithm is then specialized by varying the representation of derivatives. In particular, applying well-known constructions to a naive representation yields two RAD algorithms that are far simpler than previously known. In contrast to commonly used RAD implementations, the algorithms defined here involve no graphs, tapes, variables, partial derivatives, or mutation. They are inherently parallel-friendly, correct by construction, and usable directly from an existing programming language with no need for new data types or programming style, thanks to use of an AD-agnostic compiler plugin.
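The abstract's central idea, one general algorithm specialized by changing how derivatives are represented, can be illustrated with a small sketch. This is not the paper's implementation. It is a minimal Python illustration under stated assumptions: every name here (`compose_fwd`, `compose_rev`, `mul_fwd`, `sin_rev`, and so on) is hypothetical, and plain closures stand in for linear maps. Each differentiable function returns its result paired with its derivative at that point. In the forward variant the derivative is a linear map on tangents. In the reverse variant it is assumed to be the transposed map on cotangents, so composition runs in the opposite order.

```python
import math

def compose_fwd(f, g):
    """Compose g after f; derivatives are linear maps on tangents (chain rule)."""
    def h(x):
        y, df = f(x)
        z, dg = g(y)
        return z, lambda dx: dg(df(dx))
    return h

def compose_rev(f, g):
    """Compose g after f; derivatives are transposed linear maps on cotangents,
    so they compose in the opposite order."""
    def h(x):
        y, df_t = f(x)
        z, dg_t = g(y)
        return z, lambda dz: df_t(dg_t(dz))
    return h

# Primitives with tangent-map derivatives (forward representation).
def mul_fwd(p):
    a, b = p
    return a * b, lambda dp: b * dp[0] + a * dp[1]

def sin_fwd(x):
    return math.sin(x), lambda dx: math.cos(x) * dx

# The same primitives with transposed derivatives (reverse representation).
def mul_rev(p):
    a, b = p
    return a * b, lambda dz: (b * dz, a * dz)

def sin_rev(x):
    return math.sin(x), lambda dz: math.cos(x) * dz

# f(a, b) = sin(a * b), built once per representation.
f_fwd = compose_fwd(mul_fwd, sin_fwd)
f_rev = compose_rev(mul_rev, sin_rev)

value_fwd, tangent_map = f_fwd((2.0, 3.0))
df_da = tangent_map((1.0, 0.0))      # one directional derivative per call

value_rev, cotangent_map = f_rev((2.0, 3.0))
gradient = cotangent_map(1.0)        # both partial derivatives in one call
```

The sketch keeps no tape, graph, or mutable state: each composed function simply returns a new closure. The forward variant needs one call per input direction, while the reverse variant yields the whole gradient of a scalar-valued function in a single call.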

Citations (104)

Related Papers

Automatic differentiation in machine learning: a survey (2015)
Randomized Automatic Differentiation (2020)
You Only Linearize Once: Tangents Transpose to Gradients (2022)
Automatic Differentiation in Prolog (2023)
AD for an Array Language with Nested Parallelism (2022)

Tweets

https://twitter.com/dccsillag/status/1894529318107189746

https://twitter.com/dccsillag/status/1876398245389418870

https://twitter.com/qd_forall/status/1826731897155961156