SafetyPrompts: A Systematic Review of Open Datasets for LLM Safety Evaluation and Improvement

Abstract

The last two years have seen a rapid growth in concerns around the safety of LLMs. Researchers and practitioners have met these concerns by introducing an abundance of new datasets for evaluating and improving LLM safety. However, much of this work has happened in parallel, and with very different goals in mind, ranging from the mitigation of near-term risks around bias and toxic content generation to the assessment of longer-term catastrophic risk potential. This makes it difficult for researchers and practitioners to find the most relevant datasets for a given use case, and to identify gaps in dataset coverage that future work may fill. To remedy these issues, we conduct a first systematic review of open datasets for evaluating and improving LLM safety. We review 102 datasets, which we identified through an iterative and community-driven process over the course of several months. We highlight patterns and trends, such as a trend towards fully synthetic datasets, as well as gaps in dataset coverage, such as a clear lack of non-English datasets. We also examine how LLM safety datasets are used in practice -- in LLM release publications and popular LLM benchmarks -- finding that current evaluation practices are highly idiosyncratic and make use of only a small fraction of available datasets. Our contributions are based on SafetyPrompts.com, a living catalogue of open datasets for LLM safety, which we commit to updating continuously as the field of LLM safety develops.

Figure: Annual LLM safety dataset publications by purpose, with examples, June 2018 to February 2024.

Overview

  • The paper conducts the first systematic review of open datasets aimed at evaluating and enhancing Large Language Model (LLM) safety, analyzing 102 datasets.

  • It outlines an iterative, community-driven process for dataset identification, focusing on text datasets covering various facets of LLM safety.

  • The review reveals significant growth in LLM safety dataset creation, primarily in English, with a notable lack of non-English datasets.

  • The findings suggest a need for standardization in LLM safety evaluations and highlight potential areas for future research, including addressing language inclusivity.

Introduction

Surging concerns around the safety of LLMs have prompted an influx of new datasets for evaluating and improving LLM safety. Because these datasets diverge widely in their goals and methods, researchers and practitioners need a structured overview to navigate the available resources effectively. To address this need, the paper presents the first systematic review of open datasets for LLM safety evaluation and improvement, analyzing 102 datasets identified through an iterative, community-driven discovery process.

Methodology

Criteria for Dataset Inclusion

The inclusion criteria cover open text datasets pertinent to LLM safety. These datasets span multiple facets of LLM safety, including representational, political, and sociodemographic bias; toxicity; malicious instructions; hazardous behaviors; and adversarial use. A total of 102 datasets, published between June 2018 and February 2024, were reviewed against these criteria.

Dataset Discovery

Dataset candidates were identified through a community-driven approach combined with snowball search. This iterative process began with a preliminary list of datasets compiled from prior work and expert knowledge of the LLM safety field, which was then expanded through community feedback and systematic citation tracking.
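
At its core, the snowball step can be viewed as a breadth-first traversal of the citation graph. The sketch below is a minimal illustration of that idea, not the authors' actual tooling; `get_related_papers` and `is_relevant_dataset` are hypothetical stand-ins for a bibliographic API and the manual check against the inclusion criteria.

```python
from collections import deque

def snowball_search(seeds, get_related_papers, is_relevant_dataset):
    """Breadth-first expansion over the citation graph: start from seed
    dataset papers and follow citations in both directions, keeping any
    newly found paper that introduces a relevant safety dataset."""
    catalogued = set(seeds)
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        paper = queue.popleft()
        # Hypothetical helper: returns papers citing or cited by `paper`,
        # e.g. via a bibliographic API such as Semantic Scholar.
        for candidate in get_related_papers(paper):
            if candidate in seen:
                continue
            seen.add(candidate)
            # Hypothetical helper: the (manual) inclusion check, i.e.
            # "is this an open text dataset for LLM safety?"
            if is_relevant_dataset(candidate):
                catalogued.add(candidate)
                queue.append(candidate)
    return catalogued
```

The search terminates at a fixed point: once a round of citation tracking surfaces no new relevant datasets, the catalogue is considered complete for that iteration.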

Structured Information Recording

For each dataset, 23 pieces of structured information were recorded, capturing the dataset's purpose, creation process, format, accessibility, licensing, and publication details. This structured record provides a detailed view of each dataset's development pipeline.
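
In code, such a record might look like the dataclass below. This is an illustrative sketch only: the field names are assumptions chosen to match the categories described above, not the paper's exact 23-field schema.

```python
from dataclasses import dataclass

@dataclass
class SafetyDatasetRecord:
    """Illustrative record for one catalogued dataset. Field names are
    hypothetical; the paper documents 23 fields in total."""
    name: str
    purpose: str            # e.g. "bias evaluation", "red-teaming"
    creation_process: str   # e.g. "human-written", "fully synthetic"
    language: str           # e.g. "English"
    size: int               # number of entries
    entry_format: str       # e.g. "prompts only", "prompt-response pairs"
    license: str            # e.g. "CC BY 4.0"
    access_url: str         # where the dataset is hosted
    publication_venue: str  # e.g. "ACL", "arXiv preprint"
    publication_date: str   # ISO date of first release
```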

Findings

Trends and Patterns

The review finds a marked acceleration in the creation of LLM safety datasets, with a significant share originating from academic and non-profit organizations. There is a discernible trend towards more specialized safety evaluations and towards synthetic data generation, and English is by far the dominant language across the datasets.

Gaps in Coverage

The most conspicuous gap is the scarcity of non-English datasets, which points to a clear avenue for future dataset development aimed at making LLM safety evaluations globally applicable.

Usage in Practice

An analysis of how these datasets are used in LLM release publications and popular benchmarks reveals highly idiosyncratic evaluation practices: only a small fraction of available datasets sees any use. This suggests room for standardization in LLM safety evaluations, which would enable comparative analysis and encourage safer LLM development.
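
One way to quantify this pattern is to cross-reference the catalogue against the datasets each publication evaluates on. The sketch below shows the idea with toy data; the input format and the reported figures are illustrative assumptions, not the paper's actual numbers.

```python
from collections import Counter

def dataset_coverage(catalogue, publications):
    """Cross-reference a dataset catalogue against the safety evaluations
    reported in a set of LLM release publications.

    `publications` maps a publication name to the set of catalogued
    datasets it evaluates on (hypothetical input format)."""
    usage = Counter()
    for datasets_used in publications.values():
        usage.update(datasets_used)
    used = set(usage) & set(catalogue)
    coverage = len(used) / len(catalogue)
    return coverage, usage.most_common()

# Toy example (not the paper's actual data):
catalogue = ["RealToxicityPrompts", "BBQ", "TruthfulQA", "AdvBench", "XSTest"]
publications = {
    "ModelA report": {"RealToxicityPrompts", "TruthfulQA"},
    "ModelB report": {"TruthfulQA", "BBQ"},
}
frac, ranking = dataset_coverage(catalogue, publications)
print(f"{frac:.0%} of catalogued datasets appear in any evaluation")
```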

Future Perspectives

The findings underscore the need for a more standardized approach to LLM safety evaluation. While the abundance and diversity of LLM safety datasets signal growing interest and effort in the area, current evaluation practices reveal a disjointed landscape that would benefit from more cohesive and comprehensive use of available resources. Addressing the language coverage gap would also significantly improve the inclusivity and relevance of LLM safety evaluations.

Conclusion

This systematic review of open LLM safety datasets represents a foundational step towards consolidating the rapidly expanding array of resources available for evaluating and improving LLM safety. By cataloguing these datasets and analyzing their characteristics and usage, this work not only aids in navigating the existing landscape but also identifies critical gaps and trends that could shape future research and standardization efforts in LLM safety.
