
A Safe Harbor for AI Evaluation and Red Teaming

(2403.04893)
Published Mar 7, 2024 in cs.AI

Abstract

Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems. However, the terms of service and enforcement strategies used by prominent AI companies to deter model misuse also create disincentives for good-faith safety evaluations. This causes some researchers to fear that conducting such research or releasing their findings will result in account suspensions or legal reprisal. Although some companies offer researcher access programs, they are an inadequate substitute for independent research access, as they have limited community representation, receive inadequate funding, and lack independence from corporate incentives. We propose that major AI developers commit to providing a legal and technical safe harbor, indemnifying public interest safety research and protecting it from the threat of account suspensions or legal reprisal. These proposals emerged from our collective experience conducting safety, privacy, and trustworthiness research on generative AI systems, where norms and incentives could be better aligned with public interests, without exacerbating model misuse. We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.

Overview

  • The paper proposes establishing legal and technical safe harbors for researchers to conduct safety and security studies on generative AI systems without facing legal or account-related consequences.

  • It emphasizes the need for independent evaluation of AI systems to identify risks such as bias, privacy breaches, disinformation, and copyright infringement, which are currently hampered by AI companies' restrictive practices.

  • The proposed legal safe harbor would protect researchers engaged in good-faith vulnerability testing from legal retaliation, akin to practices in cybersecurity research.

  • The proposed technical safe harbor would create allowances that shield researchers from account suspension or other technical enforcement, possibly administered through verification and endorsement by third parties such as universities or nonprofits.

Proposing Safe Harbors for AI Evaluation and Red Teaming

Introduction to the Proposal

With the rapid deployment of generative AI systems and their significant societal impact, there is an urgent need for independent evaluation and red-team exercises to ensure the safety, security, and trustworthiness of these systems. Current practices and terms of service by AI companies, however, pose significant barriers to such essential research activities. To address these issues, this paper proposes the establishment of legal and technical safe harbors for public interest safety research on generative AI systems. These safe harbors aim to indemnify and technically protect researchers from potential legal and account-related repercussions that may arise from their work in identifying risks and vulnerabilities in AI systems.

The Necessity for Independent Evaluation

Generative AI systems have become pervasive, presenting risks ranging from bias and privacy breaches to disinformation and copyright infringement. Despite these concerns, transparent and independent evaluations of these systems remain scarce due to the limited access provided by AI companies and the fear of legal consequences among researchers. This lack of independent research into AI systems' safety and vulnerabilities undermines public trust and hampers efforts to mitigate potential harms.

Legal and Technical Barriers to Research

AI companies' terms of service and usage policies prohibit the kinds of unauthorized interaction with their systems that are often necessary to identify vulnerabilities and risks. These restrictions are intended to curb misuse, but they also inadvertently disincentivize, and can even expose to legal liability, necessary safety and security research. Researchers thus face a dilemma: the very activities essential for ensuring AI safety may violate terms of service, risking legal action or account suspension. This environment has created a chilling effect, stifling the advancement and dissemination of knowledge about AI systems' safety and limitations.

Proposed Safe Harbors

This paper advocates for two primary interventions by AI companies:

  • Legal Safe Harbor: Legally protect researchers engaging in good-faith efforts to test and identify vulnerabilities in AI systems. Such a safe harbor would align with established practices in cybersecurity research, providing a framework within which researchers can work without fear of legal retaliation.
  • Technical Safe Harbor: Establish mechanisms to protect researchers from account suspension or other forms of technical enforcement that could hinder their ability to conduct safety research. This could involve creating specific allowances or exemptions for verified researchers within the companies' enforcement protocols.

Implementation and the Role of Independent Third Parties

Realizing these safe harbors requires careful consideration to prevent misuse. The paper suggests involving reputable third parties such as universities or nonprofit organizations to vet and endorse researchers, ensuring that only qualified individuals benefit from these protections. This approach could facilitate broader, more inclusive research efforts without compromising the integrity and security of AI systems.
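To make the idea of a technical safe harbor concrete, the sketch below shows one way a provider's automated enforcement pipeline could exempt researcher accounts endorsed by a vetted third party from immediate suspension, escalating flagged activity to human review instead. This is purely illustrative: the paper does not prescribe an implementation, and every name here (ResearcherRegistry, EnforcementDecision, enforce) is hypothetical rather than any real provider's API.

```python
# Hypothetical illustration only: a minimal sketch of an enforcement pipeline
# with a safe-harbor carve-out for vetted researchers. All classes and
# functions here are invented for this example.

from dataclasses import dataclass
from enum import Enum, auto


class EnforcementDecision(Enum):
    SUSPEND = auto()        # default action for policy-violating traffic
    HUMAN_REVIEW = auto()   # softer path for endorsed safety researchers
    ALLOW = auto()


@dataclass
class Account:
    account_id: str
    flagged_for_violation: bool


class ResearcherRegistry:
    """Accounts endorsed by a trusted third party (e.g. a university or
    nonprofit) that has vetted the researcher's safety project."""

    def __init__(self, endorsed_ids: set[str]):
        self._endorsed_ids = endorsed_ids

    def is_endorsed(self, account_id: str) -> bool:
        return account_id in self._endorsed_ids


def enforce(account: Account, registry: ResearcherRegistry) -> EnforcementDecision:
    """Apply the usual enforcement policy, with a safe-harbor exemption."""
    if not account.flagged_for_violation:
        return EnforcementDecision.ALLOW
    if registry.is_endorsed(account.account_id):
        # Safe harbor: do not auto-suspend endorsed researchers; escalate
        # the flagged activity to a human trust-and-safety reviewer instead.
        return EnforcementDecision.HUMAN_REVIEW
    return EnforcementDecision.SUSPEND


if __name__ == "__main__":
    registry = ResearcherRegistry(endorsed_ids={"researcher-042"})
    print(enforce(Account("researcher-042", flagged_for_violation=True), registry))
    print(enforce(Account("anon-user-7", flagged_for_violation=True), registry))
```

In this sketch the carve-out changes only the enforcement action for endorsed accounts: flagged activity is still detected and reviewed, which is one way a provider could preserve abuse detection while reducing the risk that good-faith research triggers automatic penalties.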

Conclusion and Call for Action

By instituting legal and technical safe harbors, AI companies can foster a more collaborative and transparent research environment. Such steps would not only enhance the safety and reliability of AI systems but also reinforce public trust in these technologies. The proposal represents a necessary evolution in the governance of AI research, encouraging a proactive approach to understanding and mitigating the risks associated with generative AI systems.
