Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 56 tok/s
Gemini 2.5 Pro 39 tok/s Pro
GPT-5 Medium 15 tok/s Pro
GPT-5 High 16 tok/s Pro
GPT-4o 99 tok/s Pro
Kimi K2 155 tok/s Pro
GPT OSS 120B 476 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Characterizing Bugs in Python and R Data Analytics Programs (2306.08632v1)

Published 14 Jun 2023 in cs.SE

Abstract: R and Python are among the most popular languages used in many critical data analytics tasks. However, we still do not fully understand the capabilities of these two languages w.r.t. bugs encountered in data analytics tasks. What type of bugs are common? What are the main root causes? What is the relation between bugs and root causes? How to mitigate these bugs? We present a comprehensive study of 5,068 Stack Overflow posts, 1,800 bug fix commits from GitHub repositories, and several GitHub issues of the most used libraries to understand bugs in R and Python. Our key findings include: while both R and Python have bugs due to inexperience with data analysis, Python see significantly larger data preprocessing bugs compared to R. Developers experience significantly more data flow bugs in R because intermediate results are often implicit. We also found changes and bugs in packages and libraries cause more bugs in R compared to Python while package or library misselection and conflicts cause more bugs in Python than R. While R has a slightly higher readability barrier for data analysts, the statistical power of R leads to a less number of bad performance bugs. In terms of data visualization, R packages have significantly more bugs than Python libraries. We also identified a strong correlation between comparable packages in R and Python despite their linguistic and methodological differences. Lastly, we contribute a large dataset of manually verified R and Python bugs.

Citations (2)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube