An Overview of Catastrophic AI Risks (2306.12001v6)
Abstract: Rapid advancements in AI have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose catastrophic risks. Although numerous risks have been detailed separately, there is a pressing need for a systematic discussion and illustration of the potential dangers to better inform efforts to mitigate them. This paper provides an overview of the main sources of catastrophic AI risks, which we organize into four categories: malicious use, in which individuals or groups intentionally use AIs to cause harm; AI race, in which competitive environments compel actors to deploy unsafe AIs or cede control to AIs; organizational risks, highlighting how human factors and complex systems can increase the chances of catastrophic accidents; and rogue AIs, describing the inherent difficulty in controlling agents far more intelligent than humans. For each category of risk, we describe specific hazards, present illustrative stories, envision ideal scenarios, and propose practical suggestions for mitigating these dangers. Our goal is to foster a comprehensive understanding of these risks and inspire collective and proactive efforts to ensure that AIs are developed and deployed in a safe manner. Ultimately, we hope this will allow us to realize the benefits of this powerful technology while minimizing the potential for catastrophic outcomes.