"What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces (2312.06147v1)

Published 11 Dec 2023 in cs.CL and cs.IR

Abstract: LLMs that have been trained on a corpus that includes a large amount of code exhibit a remarkable ability to understand HTML code. As web interfaces are primarily constructed using HTML, we design an in-depth study to see how LLMs can be used to retrieve and locate important elements for a user-given query (i.e., a task description) in a web interface. In contrast with prior works, which primarily focused on autonomous web navigation, we decompose the problem into an even more atomic operation: can LLMs identify the important information in the web page for a user-given query? This decomposition enables us to scrutinize the current capabilities of LLMs and uncover the opportunities and challenges they present. Our empirical experiments show that while LLMs exhibit a reasonable level of performance in retrieving important UI elements, there is still substantial room for improvement. We hope our investigation will inspire follow-up works in overcoming the current challenges in this domain.

Summary

The paper demonstrates that few-shot prompting with semantically similar examples improves recall rates for web element retrieval.
The study shows that intelligently truncating HTML content and adopting a Web Assistant persona significantly enhance LLM performance.
The paper highlights challenges such as LLMs occasionally not following directions and referencing nonexistent elements, indicating areas for refinement.

Introduction to LLMs and Web Interfaces

LLMs have shown a remarkable ability to comprehend a variety of data formats, including HTML. Given that web interfaces are built using HTML, it is crucial to understand how LLMs can be utilized to retrieve and interact with important elements on a web page in response to a user query. This understanding can potentially lead to more effective and efficient information retrieval from web interfaces, significantly improving user experience and productivity.

Experiment Design

The paper examines how well LLMs extract relevant web elements in response to user queries, using Anthropic's Claude 2, which stands out for its 100k-token context length. It explores four aspects that influence this process:

The impact of example selection in few-shot prompting,
The specificity of user queries,
Strategies for truncating HTML documents, and
The persona adopted by the LLM during interaction.
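
The four factors above can be pictured as knobs on a single prompt builder. The following is a minimal sketch under stated assumptions, not the paper's actual prompt: the persona wording, the `PromptConfig` fields, and the character-based cutoff are all hypothetical placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical persona texts; the paper's exact wording is not reproduced here.
PERSONAS = {
    "web_assistant": "You are a web assistant helping a user find things on a page.",
    "generic_user": "You are a user browsing a web page.",
    "ui_designer": "You are a UI designer reviewing a web page.",
}


@dataclass
class PromptConfig:
    persona: str = "web_assistant"
    examples: list = field(default_factory=list)  # (query, answer) pairs
    max_html_chars: int = 2000  # crude stand-in for a truncation strategy


def build_prompt(cfg, query, html):
    """Assemble persona, few-shot examples, truncated HTML, and the query."""
    parts = [PERSONAS[cfg.persona]]
    for ex_query, ex_answer in cfg.examples:
        parts.append(f"Example query: {ex_query}\nImportant elements: {ex_answer}")
    parts.append("HTML:\n" + html[: cfg.max_html_chars])
    parts.append(f"Query: {query}\nImportant elements:")
    return "\n\n".join(parts)
```

Varying one field of `PromptConfig` at a time while holding the others fixed mirrors the kind of controlled comparison the study describes; query specificity is varied by changing the `query` argument itself.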

Findings and Challenges

The findings suggest that while LLMs show reasonable effectiveness in retrieving web UI elements, improvements are necessary. A notable discovery is that the method of example selection in prompting can significantly impact LLM performance. Semantically similar few-shot examples tend to improve recall rates, but too many examples can hamper performance due to longer input sequences. Making queries less specific, to more closely mimic actual user behavior, did not consistently affect outcomes, whereas intelligently truncating HTML content led to substantial performance gains.
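
To illustrate what selecting semantically similar few-shot examples can look like, here is a small sketch. It assumes a bag-of-words cosine similarity as a stand-in for whatever similarity measure the paper uses (an embedding model would be the more likely choice in practice); the function names and the `pool` structure are hypothetical. The parameter `k` caps the number of examples, reflecting the observation that too many examples lengthen the input and can hurt performance.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_examples(query, pool, k=2):
    """Return the k pool entries whose 'query' field is most similar to query."""
    q = bag_of_words(query)
    ranked = sorted(
        pool, key=lambda ex: cosine(q, bag_of_words(ex["query"])), reverse=True
    )
    return ranked[:k]
```

For example, given a pool containing "Book a flight to Paris" and "Find a pizza restaurant nearby", the query "book a cheap flight" ranks the flight example first.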

Moreover, the role assumed by the LLM (e.g., a Web Assistant, Generic User, or UI Designer) also influenced outcomes, with the Web Assistant persona demonstrating superior performance. However, LLMs occasionally failed to follow directions or created references to nonexistent web elements, indicating areas where these models need refinement.
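
One way to catch references to nonexistent elements is to check the model's output against the page before scoring it. The sketch below assumes, purely for illustration, that the model returns element `id` values and that gold labels are given as ids too; the paper's actual output format and metric definition may differ.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute that appears in the document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def collect_ids(html):
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


def score_prediction(html, predicted_ids, gold_ids):
    """Separate predicted ids that exist on the page from those that do not,
    then compute recall of the gold ids over the existing ones."""
    present = collect_ids(html)
    valid = [i for i in predicted_ids if i in present]
    hallucinated = [i for i in predicted_ids if i not in present]
    gold = set(gold_ids)
    recall = len(gold & set(valid)) / len(gold) if gold else 0.0
    return {"recall": recall, "hallucinated": hallucinated}
```

Reporting the hallucinated ids separately from recall keeps the two failure modes distinct: missing an important element versus inventing one.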

Conclusion and Outlook

The paper concludes with implications for future research, emphasizing the importance of further exploring LLM responsiveness to user intent, regardless of the level of prompt specificity. Strategies to encode extensive HTML content within the limited context length of LLMs are considered vital for extending such capabilities. Researchers should focus not only on model enhancements but also on privacy and security concerns when integrating personal user data.
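
As a rough illustration of fitting HTML into a limited context, the sketch below compacts a page before applying a hard character budget. It is not the truncation strategy evaluated in the paper: the skipped tags, the attribute whitelist `KEEP_ATTRS`, and the `max_chars` budget are all assumptions chosen for the example.

```python
from html import escape
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}  # assumed to carry no user-visible content
KEEP_ATTRS = {"id", "name", "type", "href", "aria-label", "placeholder"}  # hypothetical whitelist


class Compactor(HTMLParser):
    """Re-emits the document without scripts, styles, comments,
    non-whitelisted attributes, or redundant whitespace."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = "".join(
            f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
            if k in KEEP_ATTRS and v is not None
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self.skip_depth:
            self.out.append(escape(text, quote=False))


def compact_html(html, max_chars=4000):
    parser = Compactor()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)[:max_chars]
```

The final slice is a blunt cutoff that can split a tag in half; a more careful approach would drop whole elements, ideally those least relevant to the query, until the budget is met.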

The advancement in this field promises the development of more reliable and intelligent systems capable of assisting users in navigating the increasingly complex digital world efficiently.
