On the Societal Impact of Open Foundation Models (2403.07918v1)
Abstract: Foundation models are powerful technologies: how they are released publicly directly shapes their societal impact. In this position paper, we focus on open foundation models, defined here as those with broadly available model weights (e.g. Llama 2, Stable Diffusion XL). We identify five distinctive properties (e.g. greater customizability, poor monitoring) of open foundation models that lead to both their benefits and risks. Open foundation models present significant benefits, with some caveats, that span innovation, competition, the distribution of decision-making power, and transparency. To understand their risks of misuse, we design a risk assessment framework for analyzing their marginal risk. Across several misuse vectors (e.g. cyberattacks, bioweapons), we find that current research is insufficient to effectively characterize the marginal risk of open foundation models relative to pre-existing technologies. The framework helps explain why the marginal risk is low in some cases, clarifies disagreements about misuse risks by revealing that past work has focused on different subsets of the framework with different assumptions, and articulates a way forward for more constructive debate. Overall, our work helps support a more grounded assessment of the societal impact of open foundation models by outlining what research is needed to empirically validate their theoretical benefits and risks.
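The marginal-risk framing in the abstract lends itself to a structured representation. Below is a minimal illustrative sketch, assuming the framework's components can be captured as fields of a record per misuse vector; the class and field names are assumptions chosen for illustration, not an artifact or API from the paper.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: a minimal data structure for recording a
# marginal-risk assessment of one misuse vector. Field names are
# assumptions for illustration, not the paper's own artifact.

@dataclass
class MarginalRiskAssessment:
    misuse_vector: str              # e.g. "spear phishing", "cyberattacks"
    threat_model: str               # who the attacker is and what they want
    existing_risk: str              # risk from pre-existing technologies
    existing_defenses: str          # defenses already deployed today
    evidence_of_marginal_risk: str  # what open weights add beyond the baseline
    ease_of_defense: str            # how readily any added risk can be mitigated
    assumptions: list[str] = field(default_factory=list)  # stated uncertainties

# Hypothetical placeholder entry showing how one vector might be recorded.
assessment = MarginalRiskAssessment(
    misuse_vector="spear phishing",
    threat_model="low-resource scammers crafting personalized emails",
    existing_risk="closed models and human-written templates already enable this",
    existing_defenses="spam filters and email sender authentication",
    evidence_of_marginal_risk="limited; few studies isolate the open-weight contribution",
    ease_of_defense="defenses operate downstream of text generation",
    assumptions=["attacker capability held fixed across scenarios"],
)
print(assessment.misuse_vector, "->", assessment.evidence_of_marginal_risk)
```

Recording assessments in a comparable structure like this makes explicit which subset of the framework a given study addresses, which is the source of the disagreements the abstract describes.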