
User-LLM: Efficient LLM Contextualization with User Embeddings

(arXiv:2402.13598)
Published Feb 21, 2024 in cs.CL, cs.AI, and cs.LG

Abstract

LLMs have revolutionized natural language processing. However, effectively incorporating complex and potentially noisy user interaction data remains a challenge. To address this, we propose User-LLM, a novel framework that leverages user embeddings to contextualize LLMs. These embeddings, distilled from diverse user interactions using self-supervised pretraining, capture latent user preferences and their evolution over time. We integrate these user embeddings with LLMs through cross-attention and soft-prompting, enabling LLMs to dynamically adapt to user context. Our comprehensive experiments on MovieLens, Amazon Review, and Google Local Review datasets demonstrate significant performance gains across various tasks. Notably, our approach outperforms text-prompt-based contextualization on long sequence tasks and tasks that require deep user understanding while being computationally efficient. We further incorporate Perceiver layers to streamline the integration between user encoders and LLMs, reducing computational demands.

Figure: Multimodal Autoregressive Transformer encoder and LLM contextualization with user embeddings for personalized interaction-history processing.

Overview

  • The paper introduces User-LLM, a framework enhancing LLM personalization by using user embeddings from interaction data.

  • User-LLM employs a Transformer-based encoder for embedding generation and integrates these embeddings with LLMs for personalized outputs.

  • The framework outperforms traditional text-prompt-based methods on tasks requiring deep user understanding and on long-sequence tasks.

  • Future directions include advancing user embedding generation methods, optimizing integration mechanisms, and expanding applicability across various tasks.

Enhancing LLM Personalization with User Embeddings: Introducing User-LLM

Introduction to User-LLM

Recent advancements in LLMs have paved new pathways in natural language processing, offering unprecedented capabilities in understanding and generating human language. However, leveraging these models to their full potential, especially in the context of personalized user experiences, presents unique challenges. Traditional methods, such as text-prompt-based personalization, struggle with the complexity, noise, and length of real-world user interaction data. Addressing these limitations, we introduce User-LLM, a framework designed to enhance LLMs' personalization capabilities by incorporating user embeddings derived from diverse user interactions.

Framework Overview

User-LLM operates in two primary phases: generating high-quality user embeddings from interaction data and contextualizing LLMs with these embeddings to produce personalized outputs. At its core, User-LLM employs a Transformer-based encoder for embedding generation, capturing latent user preferences across multiple interaction modalities through self-supervised pretraining. These embeddings are then integrated with LLMs through cross-attention mechanisms or soft-prompting, allowing for dynamic adaptation to the user's context. Notably, User-LLM demonstrates substantial performance gains across various tasks, significantly outperforming traditional text-prompt-based methods in scenarios involving long sequences or requiring deep user understanding.
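
To make the cross-attention variant of this integration concrete, the sketch below shows one plausible arrangement: a Transformer-based user encoder turns an interaction sequence into embeddings, and an adapter lets the LLM's hidden states attend to them. This is an illustrative assumption, not the paper's actual code; all module names, dimensions, and hyperparameters are made up for the example.

```python
# Illustrative sketch (not the paper's implementation): a Transformer user
# encoder produces one embedding per interaction, and an LLM decoder block
# attends to those embeddings via cross-attention. Shapes/names are assumed.
import torch
import torch.nn as nn

class UserEncoder(nn.Module):
    """Encodes a sequence of user interactions into user embeddings."""
    def __init__(self, num_items: int, d_user: int = 256, n_layers: int = 2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_user)
        layer = nn.TransformerEncoderLayer(d_model=d_user, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch, seq_len) -> embeddings: (batch, seq_len, d_user)
        return self.encoder(self.item_emb(item_ids))

class CrossAttentionAdapter(nn.Module):
    """Lets LLM hidden states attend to projected user embeddings."""
    def __init__(self, d_llm: int = 1024, d_user: int = 256, n_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(d_user, d_llm)        # align embedding widths
        self.cross_attn = nn.MultiheadAttention(d_llm, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_llm)

    def forward(self, llm_hidden: torch.Tensor, user_emb: torch.Tensor) -> torch.Tensor:
        kv = self.proj(user_emb)                     # (batch, seq_user, d_llm)
        attended, _ = self.cross_attn(llm_hidden, kv, kv)
        return self.norm(llm_hidden + attended)      # residual connection

# Usage: encode a toy interaction history, then fuse it into dummy LLM states.
enc, adapter = UserEncoder(num_items=1000), CrossAttentionAdapter()
user_emb = enc(torch.randint(0, 1000, (2, 50)))      # 50 past interactions
llm_hidden = torch.randn(2, 16, 1024)                # 16 prompt-token states
fused = adapter(llm_hidden, user_emb)                # (2, 16, 1024)
```

In this arrangement the LLM attends to a compact set of user embeddings instead of processing the raw interaction history as text, which is the essence of the contextualization step described above.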

Theoretical Contributions and Practical Implications

The introduction of User-LLM marks a significant step forward in addressing the computational and contextual challenges associated with personalizing LLMs. By distilling user interactions into dense embeddings, User-LLM mitigates the computational burden typically associated with processing long user histories, thus enabling more efficient and effective personalization across a range of applications. The framework's adaptability is further highlighted through its compatibility with various encoder architectures and its ability to handle multimodal user data effectively.
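
As a rough illustration of this efficiency argument, the sketch below shows a soft-prompting path under simplifying assumptions (a mean-pooled projection, which is not necessarily what the paper uses): a long interaction history is compressed into a handful of soft-prompt vectors prepended to the LLM's input embeddings, rather than serialized into thousands of text tokens.

```python
# Illustrative sketch (assumed, not the paper's code): soft-prompting compresses
# a long interaction history into k "soft tokens" in the LLM's embedding space.
import torch
import torch.nn as nn

class SoftPromptProjector(nn.Module):
    """Maps pooled user embeddings to k soft-prompt vectors in LLM space."""
    def __init__(self, d_user: int = 256, d_llm: int = 1024, k_tokens: int = 8):
        super().__init__()
        self.k, self.d_llm = k_tokens, d_llm
        self.proj = nn.Linear(d_user, k_tokens * d_llm)

    def forward(self, user_emb: torch.Tensor) -> torch.Tensor:
        pooled = user_emb.mean(dim=1)                         # (batch, d_user)
        return self.proj(pooled).view(-1, self.k, self.d_llm) # (batch, k, d_llm)

# A 1000-event history becomes 8 soft tokens instead of thousands of text tokens.
projector = SoftPromptProjector()
user_emb = torch.randn(2, 1000, 256)                          # encoder outputs
soft_prompt = projector(user_emb)                             # (2, 8, 1024)
prompt_tok_emb = torch.randn(2, 32, 1024)                     # ordinary prompt tokens
llm_inputs = torch.cat([soft_prompt, prompt_tok_emb], dim=1)  # (2, 40, 1024)
```

The point of the example is the sequence-length arithmetic: the LLM's attention cost now scales with the 40 fused positions rather than with however many tokens a verbatim text rendering of the history would require.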

Our comprehensive experiments across the MovieLens, Amazon Review, and Google Local Review datasets have demonstrated User-LLM's superior performance in improving personalized recommendations, user understanding, and content generation. These results underline not only the efficacy of User-LLM in current user modeling scenarios but also its potential to pave the way for new personalized applications and services powered by LLMs.

Future Directions

The exploration of User-LLM opens several avenues for future research. One promising direction lies in advancing the methods for generating user embeddings, potentially through more sophisticated self-supervised pretraining techniques or by exploring new modalities of user data. Additionally, further studies could focus on the integration mechanisms between user embeddings and LLMs, seeking to optimize this process for improved performance and efficiency. Finally, expanding User-LLM's applicability to a wider array of tasks and domains may reveal deeper insights into its versatility and potential as a tool for enhancing personalization in LLMs.

In conclusion, User-LLM represents a significant advance in the field of LLM personalization, offering a novel approach to overcoming the computational and contextual challenges inherent in utilizing LLMs for personalized user experiences. As we continue to explore and refine this framework, the potential for creating more engaging, context-aware, and personalized applications and services is immense.
