Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Learning to Blame: Localizing Novice Type Errors with Data-Driven Diagnosis (1708.07583v2)

Published 25 Aug 2017 in cs.PL

Abstract: Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data -- pairs of ill-typed programs and their "fixed" versions -- to automatically learn a model of where the error is most likely to be found. Given a new ill-typed program, Nate executes the model to generate a list of potential blame assignments ranked by likelihood. We evaluate Nate by comparing its precision to the state of the art on a set of over 5,000 ill-typed OCaml programs drawn from two instances of an introductory programming course. We show that when the top-ranked blame assignment is considered, Nate's data-driven model is able to correctly predict the exact sub-expression that should be changed 72% of the time, 28 points higher than OCaml and 16 points higher than the state-of-the-art SHErrLoc tool. Furthermore, Nate's accuracy surpasses 85% when we consider the top two locations and reaches 91% if we consider the top three.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Eric L. Seidel (7 papers)
  2. Huma Sibghat (1 paper)
  3. Kamalika Chaudhuri (122 papers)
  4. Westley Weimer (18 papers)
  5. Ranjit Jhala (28 papers)
Citations (31)

Summary

We haven't generated a summary for this paper yet.