
Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation (2405.12119v1)

Published 20 May 2024 in cs.IR, cs.AI, and cs.CL

Abstract: LLMs are revolutionizing conversational recommender systems by adeptly indexing item content, understanding complex conversational contexts, and generating relevant item titles. However, controlling the distribution of recommended items remains a challenge. This leads to suboptimal performance due to the failure to capture rapidly changing data distributions, such as item popularity, on targeted conversational recommendation platforms. In conversational recommendation, LLMs recommend items by generating the titles (as multiple tokens) autoregressively, making it difficult to obtain and control the recommendations over all items. Thus, we propose a Reindex-Then-Adapt (RTA) framework, which converts multi-token item titles into single tokens within LLMs, and then adjusts the probability distributions over these single-token item titles accordingly. The RTA framework marries the benefits of both LLMs and traditional recommender systems (RecSys): understanding complex queries as LLMs do; while efficiently controlling the recommended item distributions in conversational recommendations as traditional RecSys do. Our framework demonstrates improved accuracy metrics across three different conversational recommendation datasets and two adaptation settings.


Summary

  • The paper introduces the RTA framework to realign LLM outputs with dynamic recommendation platforms by reindexing multi-token item titles into single tokens and adapting logits.
  • It employs trainable aggregators and adapts logits via bias adjustment or RecSys gating, achieving significant efficiency gains and reduced model sizes compared to traditional methods.
  • Experiments across multiple CRS datasets show improved recommendation accuracy (HIT@10 and NDCG@10) and up to 100× faster inference than generative retrieval.

Reindex-Then-Adapt: Enhancing LLMs for Conversational Recommendation

Introduction

The paper introduces the Reindex-Then-Adapt (RTA) framework to address the distribution misalignment between LLMs and target conversational recommendation platforms. LLMs, when used for conversational recommendation, exhibit strong capabilities in indexing item content and understanding complex conversational contexts. However, their generative retrieval paradigm leads to suboptimal control over the distribution of recommended items, particularly in dynamic environments where item popularity shifts rapidly. The RTA framework proposes a two-step solution: reindexing multi-token item titles into single tokens within LLMs, followed by adapting the probability distributions over these single-token item titles to better align with target data distributions.

LLMs as Differentiable Search Indexes in CRS

LLMs can be conceptualized as Differentiable Search Index (DSI) models, where item content and conversational context are mapped to item indices via Learn to Index (L2I) and Learn to Retrieve (L2R) tasks, respectively. Empirical analysis demonstrates that LLMs, even without fine-tuning, possess substantial knowledge of popular items, as evidenced by high HIT@5 scores for frequently occurring movies in the ReDIAL dataset (Figure 1).

Figure 1: HIT@5 accuracy of various LLMs on item indexing tasks using Wikipedia movie descriptions, stratified by item frequency in ReDIAL.

Despite this, LLMs' internal item popularity distributions often diverge from those of the target platform, both statically and dynamically. This misalignment is exacerbated by the rapid evolution of item popularity over time, as shown in the Reddit-Movie dataset.
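
To make this misalignment concrete, one simple way to quantify it (an illustrative choice on our part, not necessarily the measure used in the paper) is to compare the item-popularity distribution implied by an LLM's recommendations against the distribution observed in the platform's interaction log, for example via KL divergence:

```python
# Illustrative only: quantify popularity misalignment between LLM recommendations
# and a target platform. Item IDs and counts below are made up.
import numpy as np

def popularity_distribution(item_ids, num_items):
    """Empirical item-frequency distribution with add-one smoothing (avoids zeros)."""
    counts = np.ones(num_items)
    for i in item_ids:
        counts[i] += 1
    return counts / counts.sum()

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over the same item set."""
    return float(np.sum(p * np.log(p / q)))

llm_recs = [0, 0, 1, 2, 2, 2, 5]            # items an LLM recommended (hypothetical)
platform_clicks = [3, 3, 3, 4, 4, 5, 5, 5]  # items users engaged with (hypothetical)
p_llm = popularity_distribution(llm_recs, num_items=6)
p_platform = popularity_distribution(platform_clicks, num_items=6)
print("KL(platform || LLM) =", kl_divergence(p_platform, p_llm))
```

A large divergence, or one that grows over time, is exactly the situation the adapt step described next is designed to correct.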

The RTA Framework

The RTA framework consists of two main steps:

  1. Reindex Step: Multi-token item titles are aggregated into single-token embeddings using trainable aggregators (e.g., weighted pooling, RNN, transformer-based models). This enables efficient extraction of logit vectors for all items, facilitating direct control over recommendation distributions.
  2. Adapt Step: The logit vectors from the reindexed LLM are adjusted via bias terms or combined with traditional RecSys models using a gating mechanism. This step aligns the recommendation probabilities with the target data distribution, allowing for rapid adaptation to changing item popularity (Figure 2; a minimal end-to-end sketch of both steps follows the figure caption).

    Figure 2: The RTA framework: reindexing item titles as single tokens and adapting logits via bias adjustment or RecSys gating.
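
A minimal sketch of how the two steps combine at scoring time is shown below. The tensor names, sizes, and the per-item popularity bias are illustrative assumptions rather than the paper's implementation; the point is that a single matrix-vector product replaces autoregressive title generation.

```python
# Sketch of the RTA scoring path (assumed shapes and names, not the paper's code).
import torch

num_items, d_model = 5_000, 512  # toy sizes

# Reindex step output: one "single-token" embedding per item title,
# so the score of every item comes from one matrix-vector product.
item_table = torch.randn(num_items, d_model)

# Frozen-LLM hidden state at the recommendation position of one conversation.
context = torch.randn(d_model)

llm_logits = item_table @ context          # (num_items,) logits over all items

# Adapt step (simplest variant): shift the logits toward the target platform's
# distribution with a learned per-item bias; see the adaptation sketch below.
popularity_bias = torch.zeros(num_items)   # would be learned from platform data
adapted_logits = llm_logits + popularity_bias

top_10 = torch.topk(torch.softmax(adapted_logits, dim=-1), k=10).indices
```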

Implementation Details

Reindexing

The reindex step aggregates multi-token embeddings into single-token representations. The aggregation is learned using a contrastive loss, ensuring semantic preservation. Aggregators such as weighted pooling, GRU-based RNNs, and transformers are evaluated. The weighted pooling method, despite its simplicity, achieves competitive performance, indicating that existing LLM token embeddings are sufficiently expressive for this task (Figure 3; a sketch of the weighted-pooling aggregator and contrastive objective follows the figure caption).

Figure 3: Comparison of single-token embedding methods and their HIT@5 accuracy after reindexing.
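
A hedged sketch of such an aggregator is given below, pairing weighted pooling with an in-batch InfoNCE-style contrastive objective. The module and function names are ours, and the exact contrastive formulation in the paper may differ.

```python
# Sketch: collapse multi-token title embeddings into one vector and train the
# aggregator contrastively (our reading of the description, not released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedPoolingAggregator(nn.Module):
    """Learned weighted pooling over the token embeddings of an item title."""
    def __init__(self, d_model):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # per-token pooling weight

    def forward(self, token_emb, mask):
        # token_emb: (batch, seq_len, d_model); mask: (batch, seq_len), 1 = real token
        scores = self.scorer(token_emb).squeeze(-1)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return torch.einsum("bs,bsd->bd", weights, token_emb)  # (batch, d_model)

def info_nce(item_vec, context_vec, temperature=0.07):
    """In-batch contrastive loss: each context should retrieve its own item embedding."""
    item_vec = F.normalize(item_vec, dim=-1)
    context_vec = F.normalize(context_vec, dim=-1)
    logits = context_vec @ item_vec.t() / temperature  # (batch, batch)
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for frozen-LLM token embeddings.
batch, seq_len, d_model = 4, 6, 512
token_emb = torch.randn(batch, seq_len, d_model)
mask = torch.ones(batch, seq_len)
context_vec = torch.randn(batch, d_model)  # e.g. LLM hidden state of the conversation
aggregator = WeightedPoolingAggregator(d_model)
loss = info_nce(aggregator(token_emb, mask), context_vec)
loss.backward()
```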

Adaptation

The adapt step employs either bias term adjustment (affine transformation of logits) or RecSys gating (weighted combination of LLM and RecSys logits). The bias term approach is parameter-efficient and effective for small datasets, while RecSys gating leverages collaborative filtering for larger datasets with richer interaction data. Both methods are trained using maximum likelihood estimation on the target platform's data.
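
The two variants can be sketched as lightweight modules on top of the reindexed logits. The class names, the single scalar gate, and the toy training step below are our assumptions rather than the authors' code; the affine adapter reflects the multiplicative and additive bias terms discussed in the ablations.

```python
# Sketch of the two adaptation variants (assumed names, not the paper's code).
import torch
import torch.nn as nn

class BiasAdapter(nn.Module):
    """Affine adjustment of the reindexed LLM logits: per-item scale and bias."""
    def __init__(self, num_items):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_items))
        self.bias = nn.Parameter(torch.zeros(num_items))

    def forward(self, llm_logits):
        return self.scale * llm_logits + self.bias

class GatedAdapter(nn.Module):
    """Weighted combination of LLM logits with a traditional RecSys model's logits."""
    def __init__(self):
        super().__init__()
        self.gate_logit = nn.Parameter(torch.zeros(1))  # learned mixing weight

    def forward(self, llm_logits, recsys_logits):
        g = torch.sigmoid(self.gate_logit)
        return g * llm_logits + (1 - g) * recsys_logits

# Maximum likelihood training on the target platform: cross-entropy between the
# adapted logits and the ground-truth recommended item.
num_items, batch = 1_000, 8
llm_logits = torch.randn(batch, num_items)
recsys_logits = torch.randn(batch, num_items)     # e.g. from FISM or SASRec
targets = torch.randint(0, num_items, (batch,))
adapter = GatedAdapter()
loss = nn.functional.cross_entropy(adapter(llm_logits, recsys_logits), targets)
loss.backward()
```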

Experimental Results

Experiments are conducted on three CRS datasets: INSPIRED, ReDIAL, and Reddit-V1.5. The RTA framework consistently outperforms traditional RecSys models, zero-shot LLMs, dense retrievers, and prior CRS models in recommendation accuracy (HIT@10 and NDCG@10). Notably, the reindex step alone yields substantial efficiency gains, with the aggregator-based methods being approximately 10× smaller than OOV embedding tables and 233× smaller than the Llama2-7b base model. Recommendation inference is accelerated by 100× compared to generative retrieval.
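
For reference, HIT@K and NDCG@K with a single ground-truth item per conversation can be computed as below (standard metric definitions, not code from the paper):

```python
# Standard top-K metrics for a single relevant item per conversation.
import numpy as np

def hit_at_k(ranked_items, target, k=10):
    """1 if the ground-truth item appears in the top-k list, else 0."""
    return int(target in ranked_items[:k])

def ndcg_at_k(ranked_items, target, k=10):
    """With one relevant item, NDCG@k reduces to 1 / log2(rank + 1) for a 1-based rank."""
    if target in ranked_items[:k]:
        rank = ranked_items[:k].index(target) + 1  # 1-based rank
        return 1.0 / np.log2(rank + 1)
    return 0.0

ranked = [42, 7, 13, 99, 5]  # hypothetical ranked recommendations for one conversation
print(hit_at_k(ranked, target=13), ndcg_at_k(ranked, target=13))  # 1, 0.5
```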

The adapt step further improves accuracy, with bias term adjustment excelling on small datasets and RecSys gating providing superior results on larger datasets. Ablation studies confirm the effectiveness of both multiplicative and additive bias terms, as well as the choice of RecSys model (FISM vs. SASRec) depending on dataset size and complexity.

Qualitative Analysis

Case studies demonstrate that the RTA framework enables LLMs to generate recommendations that are both contextually relevant and aligned with platform-specific popularity distributions. The framework also supports seamless integration with natural language generation for user-facing responses (Figure 4).

Figure 4: Example recommendations from Llama2, Llama-R, and Llama-RTA (+SASRec) on a ReDIAL conversation, with corresponding natural language responses.

Implications and Future Directions

The RTA framework provides a scalable, architecture-agnostic solution for aligning LLM-based recommenders with dynamic target distributions. Its modularity allows for rapid adaptation to distribution shifts without retraining the base LLM, making it suitable for production environments with evolving item catalogs. The approach is compatible with proprietary models (e.g., GPT-3.5-turbo) and can be extended to larger LLMs as compute resources permit.

Theoretically, the framework bridges the gap between generative retrieval and traditional collaborative filtering, leveraging the strengths of both paradigms. Practically, it enables fine-grained control over recommendation outputs, supporting objectives such as fairness, diversity, and platform-specific constraints.

Future work may explore more sophisticated aggregation strategies, dynamic gating mechanisms, and integration with user modeling for personalized recommendations. Additionally, extending the framework to other domains (e.g., e-commerce, music) and evaluating its robustness to cold-start items remain promising directions.

Conclusion

The Reindex-Then-Adapt framework effectively addresses the distribution misalignment challenge in LLM-based conversational recommendation. By reindexing item titles as single tokens and adapting recommendation distributions via bias adjustment or RecSys gating, the framework achieves superior accuracy, efficiency, and controllability. Its design facilitates rapid adaptation to dynamic environments and provides a foundation for further advances in LLM-powered recommender systems.
