Perplexed: Understanding When Large Language Models are Confused

(2404.06634)
Published Apr 9, 2024 in cs.SE

Abstract

LLMs have become dominant in the NLP field, driving a huge surge in progress in a short amount of time. However, their limitations are still a mystery and have primarily been explored through tailored datasets that analyze a specific human-level skill such as negation, name resolution, etc. In this paper, we introduce perplexed, a library for exploring where a particular language model is perplexed. To show the flexibility and types of insights that can be gained with perplexed, we conducted a case study focused on LLMs for code generation using an additional tool we built to help with the analysis of code models, called codetokenizer. Specifically, we explore token-level success and failure cases of code LLMs under different scenarios pertaining to the type of coding structure the model is predicting, e.g., a variable name or operator, and how predicting internal versus external method invocations impacts performance. From this analysis, we found that our studied code LLMs had their worst performance on coding structures where the code was not syntactically correct. Additionally, we found the models to generally perform worse at predicting internal method invocations than external ones. We have open-sourced both of these tools to allow the research community to better understand LLMs in general and LLMs for code generation.
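The kind of token-level analysis the abstract describes can be sketched without the perplexed library itself (its API is not shown here): compute the per-token negative log-likelihood a causal LM assigns to a code snippet and inspect where it is highest. The snippet below is a minimal, hypothetical illustration using Hugging Face transformers; the model name "gpt2" is a placeholder, not one of the code LLMs studied in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder model; any causal (code) LM can be scored the same way.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

code = "def add(a, b):\n    return a + b\n"
input_ids = tokenizer(code, return_tensors="pt")["input_ids"]

with torch.no_grad():
    logits = model(input_ids).logits

# Shift so each position's logits predict the *next* token.
log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
targets = input_ids[:, 1:]
token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

# Per-token negative log-likelihood; higher values mark tokens the
# model is more "perplexed" by (e.g., identifiers vs. operators).
for tok_id, nll in zip(targets[0].tolist(), (-token_log_probs[0]).tolist()):
    print(f"{tokenizer.decode([tok_id])!r:>12}  nll={nll:.3f}")
```

Aggregating these per-token scores by syntactic category (variable names, operators, internal vs. external method calls) is the analysis the paper performs with perplexed and codetokenizer.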
