
Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers (2406.14325v2)

Published 20 Jun 2024 in cs.SE, cs.IR, and cs.LG

Abstract: Research in various fields is currently experiencing challenges regarding the reproducibility of results. This problem is also prevalent in ML research. The issue arises, for example, from unpublished data and/or source code and from the sensitivity of ML training conditions. Although different solutions have been proposed to address this issue, such as using ML platforms, the level of reproducibility in ML-driven research remains unsatisfactory. Therefore, in this article, we discuss the reproducibility of ML-driven research with three main aims: (i) identifying the barriers to reproducibility when applying ML in research and categorizing them by type of reproducibility (description, code, data, and experiment reproducibility), (ii) discussing potential drivers such as tools, practices, and interventions that support ML reproducibility, and distinguishing between technology-driven drivers, procedural drivers, and drivers related to awareness and education, and (iii) mapping the drivers to the barriers. With this work, we hope to provide insights and to contribute to the decision-making process regarding the adoption of different solutions to support ML reproducibility.

Authors (7)
  1. Harald Semmelrock
  2. Tony Ross-Hellauer
  3. Simone Kopeinik
  4. Dieter Theiler
  5. Armin Haberl
  6. Stefan Thalmann
  7. Dominik Kowald
