Risks from Learned Optimization in Advanced Machine Learning Systems (1906.01820v3)

Published 5 Jun 2019 in cs.AI

Abstract: We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer. We refer to this situation as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be, how will it differ from the loss function it was trained under, and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and provide an overview of topics for future research.

Citations (127)

Tweets

https://twitter.com/joshuaac0716/status/1809163762970337355

https://twitter.com/shawnwillden/status/1773772116649021693

https://twitter.com/CFGeek/status/1752420640999780460
