Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

(2405.10632)
Published May 17, 2024 in cs.CY, cs.AI, and cs.HC

Abstract

Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.

Figure: Taxonomy of human-LLM interaction modes, from instructions to goal-oriented or open-ended tasks.

Overview

  • The paper highlights the inadequacies of current AI evaluations that focus on controlled conditions and proposes Human Interaction Evaluations (HIEs) to better capture the intricacies of human-AI interactions in real-world scenarios.

  • Human Interaction Evaluations can offer richer and more generalizable data by including human users, assessing the immediate and societal impacts of AI, and tackling the sociotechnical gaps not covered by traditional static evaluations.

  • A three-stage framework for HIEs is introduced, involving the identification of risk areas, characterizing the use context, and choosing appropriate evaluation parameters, illustrated with examples on overreliance and persuasion risks.

Understanding the Importance of Human Interaction Evaluations for AI Models

Background and Context

When we talk about evaluating AI models, we're typically thinking of how they perform under clinical, controlled conditions, like running a car engine in a lab rather than on a busy highway. Evaluations usually focus on how well these models handle isolated tasks such as answering questions or identifying objects in images. But what about when the rubber meets the road, or in this case, when the model starts interacting with people in real-world applications?

The paper we're diving into discusses a gap in current AI evaluations and proposes a new paradigm to address it: Human Interaction Evaluations (HIEs). The authors argue that while current evaluations are informative, they fall short in capturing the intricacies of human-AI interactions. They aim to fill this void by introducing a framework for HIEs that specifically targets human-LLM (Large Language Model) interactions.

The Case for Human Interaction Evaluations

Defining HIEs

The term "Human Interaction Evaluations" might sound technical, but it's essentially about assessing how well AI models work when real humans are involved. This includes not just whether the models perform well in controlled conditions, but how they fare in the messy, unpredictable real world. The paper describes different ways HIEs can bring new insights:

  • Increasing Evaluation Validity: By including human users, HIEs offer richer data and context, ultimately leading to more accurate and generalizable evaluations.
  • Assessing Direct Human Impact: Unlike traditional evaluations, HIEs can assess the immediate effects of AI interactions on people, whether that means changing their beliefs, influencing their decisions, or causing them direct harm.
  • Guiding Societal Impact Assessments: By understanding individual-level impacts, we can better anticipate societal implications, helping to shape policies and regulations that mitigate AI risks.

Why Current Evaluations Fall Short

Traditional AI evaluations focus heavily on static benchmarks, checking for biases, harmful outputs, or other risks from a model in isolation. But this doesn't cover the "sociotechnical gap," which occurs because:

  1. Joint Performance Gaps: Many AI applications require human interaction, but most benchmarks do not account for this.
  2. Evaluation Task Misalignment: Real-world tasks often differ significantly from benchmark tasks.
  3. Human Impact: Static evaluations can't fully explore how AI affects its users.

A Framework for Conducting HIEs

The authors present a three-stage framework for designing HIEs that can help researchers more effectively evaluate AI models' safety and performance in real-world scenarios.

Stage 1: Identifying the Risk and/or Harm Area

The first step is to clearly define the real-world problem you want to address, whether that is bias in the hiring process or persuasion risks in political opinion shaping. The paper categorizes risks into three types (a rough numerical sketch follows the list):

  • Absolute Risks: Directly evaluating the likelihood and severity of harm from the AI model itself.
  • Marginal Risks: Comparing the risk from the AI model against some baseline (e.g., human decision-making).
  • Residual Risks: Assessing the risk that remains after safety mitigations are applied.
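
To make these distinctions concrete, here's a minimal numerical sketch (ours, not the paper's) that assumes each trial is simply scored 1 if the participant reached a harmful outcome and 0 otherwise:

```python
def harm_rate(outcomes):
    """Fraction of trials that ended in a harmful outcome (1 = harm, 0 = no harm)."""
    return sum(outcomes) / len(outcomes)

# Hypothetical trial outcomes for three conditions (illustrative data only).
ai_assisted    = [1, 0, 1, 0, 0, 1, 0, 0]   # participants working with the model
human_baseline = [0, 0, 1, 0, 0, 0, 0, 1]   # participants working without the model
ai_mitigated   = [0, 0, 1, 0, 0, 0, 0, 0]   # model condition after a safety mitigation

absolute_risk = harm_rate(ai_assisted)                               # harm from the model itself
marginal_risk = harm_rate(ai_assisted) - harm_rate(human_baseline)   # harm added relative to the baseline
residual_risk = harm_rate(ai_mitigated)                              # harm remaining after mitigations

print(f"absolute: {absolute_risk:.2f}, marginal: {marginal_risk:+.2f}, residual: {residual_risk:.2f}")
```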

Stage 2: Characterizing the Use Context

Once you know the risk area, the next step is to characterize a use context for the evaluation that closely mirrors real-world usage (a sketch of such a specification follows the list):

  • Harmful Use Scenarios: Define whether the risk comes from misuse, unintended personal impact, or unintended external impact.
  • User, Model, and System Dimensions: Consider who the users are (e.g., technical literacy), details about the model (e.g., size, datasets), and system architecture (e.g., supporting tools).
  • Interaction Modes and Tasks: Define how the human and model will interact. This could be collaboration, direction, assistance, cooperation, or exploration.
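
One way to keep these choices explicit is to write the Stage 2 context down as structured data. The sketch below is purely illustrative; the field names and the hiring example values are our assumptions, not a schema from the paper:

```python
from dataclasses import dataclass

@dataclass
class UseContext:
    """Illustrative Stage 2 specification; field names are ours, not the paper's schema."""
    harm_scenario: str        # misuse, unintended personal impact, or unintended external impact
    user_profile: dict        # who the users are, e.g., technical literacy, domain expertise
    model_details: dict       # e.g., model family, size, fine-tuning
    system_components: list   # supporting tools, retrieval, user interface
    interaction_mode: str     # e.g., direction, assistance, collaboration
    task: str                 # the concrete task participants perform

hiring_context = UseContext(
    harm_scenario="unintended personal impact",
    user_profile={"role": "hiring manager", "technical_literacy": "mixed"},
    model_details={"model": "an instruction-tuned LLM", "fine_tuned": False},
    system_components=["resume-screening interface"],
    interaction_mode="assistance",
    task="shortlist candidates with model-generated recommendations",
)
```

Writing the context down this way also makes it easier to compare evaluations that target the same risk under different use contexts.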

Stage 3: Choosing Evaluation Parameters

The final step involves selecting the evaluation targets and metrics (a small metric-computation sketch follows the list):

  • Evaluation Target: Decide whether to focus on the interaction process or the outcome.
  • Metrics: Use both subjective metrics (e.g., user satisfaction) and objective metrics (e.g., task accuracy) for comprehensive insights.
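
As a rough illustration of how the two kinds of metrics might sit side by side in an analysis, here's a minimal sketch over hypothetical per-trial records (the field names are ours):

```python
# Hypothetical per-trial records from an interaction study; fields are illustrative only.
records = [
    {"correct": True,  "time_sec": 95,  "satisfaction": 4},
    {"correct": False, "time_sec": 140, "satisfaction": 5},
    {"correct": True,  "time_sec": 80,  "satisfaction": 3},
    {"correct": False, "time_sec": 120, "satisfaction": 4},
]

# Objective metrics: task accuracy and mean completion time.
accuracy = sum(r["correct"] for r in records) / len(records)
mean_time = sum(r["time_sec"] for r in records) / len(records)

# Subjective metric: mean self-reported satisfaction on a 1-5 scale.
mean_satisfaction = sum(r["satisfaction"] for r in records) / len(records)

print(f"accuracy: {accuracy:.2f}, mean time: {mean_time:.0f}s, satisfaction: {mean_satisfaction:.2f}/5")
```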

Example Evaluations

To make things concrete, the paper provides two detailed examples:

  • Overreliance Risks: Examines how hiring managers use an LLM to support decision-making and whether doing so induces overreliance on the model's outputs.
  • Persuasion Risks: Looks at how AI can amplify the persuasive power of messages in political opinion pieces.

Both cases illustrate how detailed planning and context-specific strategies can lead to useful, actionable insights.
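
For the overreliance example, one natural process-level metric is how often participants accept the model's recommendation precisely when it is wrong. Here's a minimal sketch over assumed, hypothetical interaction logs (not data from the paper):

```python
# Hypothetical interaction logs from the hiring scenario; the format is illustrative only.
trials = [
    {"model_correct": False, "user_followed_model": True},
    {"model_correct": True,  "user_followed_model": True},
    {"model_correct": False, "user_followed_model": False},
    {"model_correct": False, "user_followed_model": True},
    {"model_correct": True,  "user_followed_model": False},
]

# Overreliance signal: how often users accepted the model's recommendation when it was incorrect.
wrong = [t for t in trials if not t["model_correct"]]
overreliance_rate = sum(t["user_followed_model"] for t in wrong) / len(wrong)

print(f"followed incorrect recommendations in {overreliance_rate:.0%} of cases")
```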

Practical Implications and Future Directions

The introduction of HIEs marks an important shift in how we evaluate AI safety and effectiveness. By simulating real-world interactions, these evaluations can highlight previously unseen risks and inform better design and regulatory practices.

Recommendations for the Field

  • Invest in HIE Development: More funds and efforts should go into creating and refining HIEs.
  • Leverage Established Methods: Utilize best practices from fields like Human-Computer Interaction (HCI) and experimental psychology to develop rigorous evaluations.
  • Broaden Representation: Ensure diverse user groups are included to make evaluations more representative.
  • Address Ethical Concerns: Careful design can mitigate ethical issues, such as ensuring participants are not exposed to harmful content unnecessarily.

Conclusion

Human Interaction Evaluations offer a promising way to bridge the gap between how AI models perform in isolation and their real-world applications. By incorporating the complexity of human interactions, these evaluations can provide a more holistic view of AI safety and impact, ultimately leading to better, safer AI systems.
