Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It (2403.14715v3)

Published 19 Mar 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Label smoothing (LS) is a popular regularisation method for training neural networks as it is effective in improving test accuracy and is simple to implement. "Hard" one-hot labels are "smoothed" by uniformly distributing probability mass to other classes, reducing overfitting. Prior work has suggested that in some cases LS can degrade selective classification (SC) -- where the aim is to reject misclassifications using a model's uncertainty. In this work, we first demonstrate empirically across an extended range of large-scale tasks and architectures that LS consistently degrades SC. We then address a gap in existing knowledge, providing an explanation for this behaviour by analysing logit-level gradients: LS degrades the uncertainty rank ordering of correct vs incorrect predictions by suppressing the max logit more when a prediction is likely to be correct, and less when it is likely to be wrong. This elucidates previously reported experimental results where strong classifiers underperform in SC. We then demonstrate the empirical effectiveness of post-hoc logit normalisation for recovering lost SC performance caused by LS. Furthermore, linking back to our gradient analysis, we again provide an explanation for why such normalisation is effective.
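
The two mechanisms the abstract names, label smoothing and post-hoc logit normalisation, are easy to make concrete. Below is a minimal NumPy sketch, not the authors' implementation: smooth_labels is the standard LS target transform with smoothing parameter alpha, and normalised_msp rescales each logit vector by its L2 norm before the softmax; the choice of L2 norm and the temperature-like tau are assumptions here, as the paper may use a different norm or scaling. The toy example illustrates why normalisation can matter for SC: softmax is shift-invariant and discards logit magnitude, whereas the paper's gradient analysis indicates that, under LS, magnitude carries information about whether a prediction is likely correct.

```python
import numpy as np

def smooth_labels(one_hot: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Standard label smoothing: move a fraction alpha of probability mass
    off the one-hot target and spread it uniformly over all classes."""
    num_classes = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha / num_classes

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability: the usual confidence score used to
    accept or reject predictions in selective classification."""
    return softmax(logits).max(axis=-1)

def normalised_msp(logits: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Post-hoc logit normalisation (illustrative, assumed form): divide
    each logit vector by its L2 norm, then take the max softmax
    probability. tau is a hypothetical temperature-like scale, not a
    value taken from the paper."""
    norms = np.linalg.norm(logits, axis=-1, keepdims=True)
    return softmax(logits / (tau * norms)).max(axis=-1)

# Label smoothing applied to a one-hot target for class 0 of 3 classes:
y = np.eye(3)[[0]]
print(smooth_labels(y, alpha=0.1))  # [[0.9333 0.0333 0.0333]]

# Softmax is shift-invariant, so these two logit vectors get identical
# MSP confidence even though their norms differ...
a = np.array([[4.0, 1.0, 1.0]])
b = a + 1.0  # same softmax output as a, but larger logit norm
print(msp(a), msp(b))  # ~0.909 and ~0.909
# ...but the norm-sensitive score separates them:
print(normalised_msp(a), normalised_msp(b))  # ~0.503 vs ~0.457
```

In the toy example, two predictions that plain MSP cannot distinguish receive different scores once logit magnitude is folded back into the confidence; this is, loosely, the kind of information the paper argues LS encodes in the logits and post-hoc normalisation recovers.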
