Full-ECE: A Metric For Token-level Calibration on Large Language Models (2406.11345v1)

Published 17 Jun 2024 in cs.CL and cs.AI

Abstract: Deep Neural Networks (DNNs) excel in various domains but face challenges in providing accurate uncertainty estimates, which are crucial for high-stakes applications. Large Language Models (LLMs) have recently emerged as powerful tools, demonstrating exceptional performance in language tasks. However, traditional calibration metrics such as Expected Calibration Error (ECE) and classwise-ECE (cw-ECE) are inadequate for LLMs due to their vast vocabularies, data complexity, and distributional focus. To address this, we propose a novel calibration concept called full calibration and introduce its corresponding metric, Full-ECE. Full-ECE evaluates the entire predicted probability distribution, offering a more accurate and robust measure of calibration for LLMs.
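
The abstract does not state the formula, so the following is only a sketch of one plausible reading of "evaluates the entire predicted probability distribution". It contrasts classic top-1 ECE with a full-distribution variant that bins every (token position, vocabulary entry) probability and compares the mean predicted probability in each bin with the fraction of entries that were the true token. The function names `top1_ece` and `full_distribution_ece`, the equal-width binning, and the default of 10 bins are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def _bin_index(p: float, n_bins: int) -> int:
    # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
    return min(int(p * n_bins), n_bins - 1)


def top1_ece(probs: Sequence[Sequence[float]], labels: Sequence[int],
             n_bins: int = 10) -> float:
    """Classic ECE: only the highest-probability class of each prediction is binned."""
    conf_sum = [0.0] * n_bins   # summed confidence per bin
    hit_sum = [0.0] * n_bins    # number of correct top-1 predictions per bin
    for dist, y in zip(probs, labels):
        k = max(range(len(dist)), key=dist.__getitem__)
        b = _bin_index(dist[k], n_bins)
        conf_sum[b] += dist[k]
        hit_sum[b] += 1.0 if k == y else 0.0
    # (|B| / N) * |mean confidence - accuracy| simplifies to |conf_sum - hit_sum| / N.
    return sum(abs(c - h) for c, h in zip(conf_sum, hit_sum)) / len(labels)


def full_distribution_ece(probs: Sequence[Sequence[float]], labels: Sequence[int],
                          n_bins: int = 10) -> float:
    """Assumed full-distribution variant: every class probability is binned,
    not just the top-1 one (hypothetical formulation, see the lead-in)."""
    conf_sum = [0.0] * n_bins   # summed predicted probability per bin
    hit_sum = [0.0] * n_bins    # entries per bin whose class is the true label
    total = 0
    for dist, y in zip(probs, labels):
        for k, p in enumerate(dist):
            b = _bin_index(p, n_bins)
            conf_sum[b] += p
            hit_sum[b] += 1.0 if k == y else 0.0
            total += 1
    return sum(abs(c - h) for c, h in zip(conf_sum, hit_sum)) / total


if __name__ == "__main__":
    # Two toy "token positions" over a 3-entry vocabulary.
    toy_probs = [[0.72, 0.23, 0.05], [0.72, 0.23, 0.05]]
    toy_labels = [0, 1]
    print("top-1 ECE:", top1_ece(toy_probs, toy_labels))
    print("full-distribution ECE:", full_distribution_ece(toy_probs, toy_labels))
```

In this sketch the top-1 variant ignores how probability mass is spread over the remaining vocabulary entries, while the full-distribution variant also penalizes miscalibration in the lower-probability entries, which is the motivation the abstract gives for moving beyond ECE and cw-ECE.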

Authors (5)

Han Liu (340 papers)
Yupeng Zhang (25 papers)
Bingning Wang (29 papers)
Weipeng Chen (56 papers)
Xiaolin Hu (99 papers)
