
AutoSlicer: Scalable Automated Data Slicing for ML Model Analysis (2212.09032v1)

Published 18 Dec 2022 in cs.LG and cs.DB

Abstract: Automated slicing aims to identify subsets of evaluation data where a trained model performs anomalously. This is an important problem for machine learning pipelines in production, since it plays a key role in model debugging and comparison as well as in the diagnosis of fairness issues. Scalability has become a critical requirement for any automated slicing system due to the large search space of possible slices and the growing scale of data. We present AutoSlicer, a scalable system that searches for problematic slices through distributed metric computation and hypothesis testing. We develop an efficient strategy that reduces the search space through pruning and prioritization. In our experiments, we show that our search strategy finds most of the anomalous slices by inspecting a small portion of the search space.
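
To make the idea of slicing concrete, here is a minimal single-machine sketch, not the AutoSlicer implementation. It assumes slices are single (feature, value) pairs, the metric is accuracy, and a slice is "anomalous" when a one-sided z-test shows its accuracy falls well below the overall accuracy. The function name `find_anomalous_slices` and the parameters `min_support` and `z_threshold` are hypothetical, and the test treats the overall accuracy as a fixed reference value, which is a simplification.

```python
import math
from collections import defaultdict


def find_anomalous_slices(rows, correct, min_support=5, z_threshold=2.0):
    """Flag (feature, value) slices whose accuracy is significantly below overall.

    rows:    list of dicts mapping feature name -> categorical value
    correct: list of bools, True where the model prediction was right
    """
    n = len(rows)
    overall = sum(correct) / n

    # Metric computation: one pass that accumulates per-slice counts.
    stats = defaultdict(lambda: [0, 0])  # slice -> [num_examples, num_correct]
    for row, ok in zip(rows, correct):
        for feature, value in row.items():
            entry = stats[(feature, value)]
            entry[0] += 1
            entry[1] += int(ok)

    flagged = []
    for slice_key, (count, num_correct) in stats.items():
        # Pruning (assumed rule): skip slices too small to test reliably.
        if count < min_support:
            continue
        accuracy = num_correct / count
        # Standard error of a proportion under the overall accuracy.
        std_err = math.sqrt(overall * (1.0 - overall) / count)
        if std_err == 0.0:
            continue  # overall accuracy is 0 or 1, so nothing to compare
        z = (accuracy - overall) / std_err
        # Hypothesis test: one-sided, flag only underperforming slices.
        if z <= -z_threshold:
            flagged.append((slice_key, accuracy, z))

    # Prioritization (assumed rule): most significant slices first.
    flagged.sort(key=lambda item: item[2])
    return flagged
```

Calling `find_anomalous_slices(rows, correct)` returns a list of `(slice_key, accuracy, z)` tuples, ordered from the most to the least significant drop. A real system would also consider conjunctions of features and correct for multiple comparisons, both of which this sketch omits.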

Authors (3)
  1. Zifan Liu (10 papers)
  2. Evan Rosen (3 papers)
  3. Paul Suganthan G. C (2 papers)
Citations (1)
