Abstract

Product review generation is an important task in recommender systems, as reviews can provide explanations and persuasiveness for recommendations. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generation ability, which can be applied to review generation. However, directly applying LLMs to review generation may be hindered by their "polite" tendency, failing to produce personalized reviews (e.g., negative reviews). In this paper, we propose Review-LLM, which customizes LLMs for personalized review generation. First, we construct the prompt input by aggregating the user's historical behaviors, including the corresponding item titles and reviews. This enables the LLM to capture the user's interest features and review writing style. Second, we incorporate ratings as indicators of satisfaction into the prompt, which further improves the model's understanding of user preferences and its control over the sentiment of generated reviews. Finally, we feed the prompt into the LLM and use Supervised Fine-Tuning (SFT) to make the model generate personalized reviews for the given user and target item. Experimental results on real-world datasets show that our fine-tuned model achieves better review generation performance than existing closed-source LLMs.

Figure: Example input prompt for Review-LLM.

Overview

  • The paper proposes Review-LLM, a framework that utilizes LLMs to generate personalized reviews in e-commerce settings by incorporating user-specific preferences, historical behaviors, and satisfaction ratings.

  • Review-LLM employs supervised fine-tuning with Low-Rank Adaptation (LoRA) to enhance the model's ability to generate reviews that reflect user sentiments and writing styles, outperforming state-of-the-art models such as GPT-3.5-Turbo and GPT-4o.

  • Experimental results, including human evaluations, demonstrate that Review-LLM produces more semantically consistent reviews, especially in scenarios requiring the generation of negative reviews, showcasing its potential for improving automated review systems.

Review-LLM: Harnessing LLMs for Personalized Review Generation

Introduction

The paper titled "Review-LLM: Harnessing LLMs for Personalized Review Generation" addresses the challenge of generating personalized reviews in e-commerce settings using LLMs. While LLMs like ChatGPT exhibit superior text modeling capabilities, applying them directly to review generation poses certain issues, such as a tendency to produce overly polite reviews and a failure to draw on personalized signals from the user's history. To tackle this, the authors propose Review-LLM, a system that customizes LLMs to account for user-specific preferences and sentiments, improving the quality and relevance of the generated reviews.

Methodology

The proposed Review-LLM framework reconstructs the prompt input by incorporating user historical behaviors, item titles, and corresponding reviews. By integrating this information, the model can better capture user interest features and review writing styles. Additionally, user ratings are included in the prompt to indicate satisfaction levels, thus influencing the sentiment of the generated reviews.

Review-LLM utilizes Supervised Fine-Tuning (SFT) with Low-Rank Adaptation (LoRA) for parameter-efficient training. This fine-tuning process allows the LLM to generate personalized reviews for a given user and target item. The input prompt for Review-LLM is composed of the following (a construction and fine-tuning sketch follows the list):

  1. Generation Instruction: Instructs the LLM to consider user preferences and historical behaviors to generate the review.
  2. Input: Contains the items previously interacted with by the user, along with their titles, reviews, and ratings.
  3. Target Item: Information about the newly purchased item and its rating.
  4. Response: The generated review for the target item.
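
As a concrete illustration, the sketch below shows how such a prompt might be assembled from a user's history and how LoRA-based supervised fine-tuning could be set up with the Hugging Face transformers and peft libraries. The instruction wording, the field layout, the base-model checkpoint id, and all LoRA hyperparameters are assumptions for illustration; the paper's exact prompt template and training configuration are not reproduced here.

```python
# Minimal sketch: prompt assembly + LoRA fine-tuning setup (illustrative only).
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer


def build_prompt(history, target_title, target_rating):
    """Assemble a Review-LLM-style prompt.

    history: list of (item_title, review_text, rating) tuples from the user's
    past purchases. Field names and wording are hypothetical.
    """
    lines = [
        "### Instruction:",
        "Based on the user's historical purchases (titles, reviews, ratings) "
        "and the rating given to a newly purchased item, write the review this "
        "user would write for the new item, matching their interests and style.",
        "### Input:",
    ]
    for title, review, rating in history:
        lines.append(f"Item: {title} | Rating: {rating} | Review: {review}")
    lines.append(f"Target item: {target_title} | Rating: {target_rating}")
    lines.append("### Response:")
    return "\n".join(lines)


# Base model: the paper fine-tunes Llama-3-8b; the exact checkpoint id is assumed.
model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Parameter-efficient fine-tuning with LoRA; rank/alpha/target modules are illustrative.
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# For SFT, each training example concatenates build_prompt(...) with the user's
# reference review; the model is trained with the standard causal-LM loss
# (typically masking the prompt tokens so only the response is supervised).
```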

Experimental Results

The authors conducted experiments on five Amazon review datasets and compared Review-LLM with several baselines, including GPT-3.5-Turbo, GPT-4o, and Llama-3-8b. Performance was evaluated using the metrics ROUGE-1, ROUGE-L, and BERTScore.
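
For reference, ROUGE and BERTScore are typically computed as in the minimal sketch below, using the rouge-score and bert-score Python packages; the example texts are hypothetical, and the paper's exact evaluation settings (e.g., the BERTScore backbone model) are not reproduced here.

```python
# Minimal sketch of the evaluation metrics (illustrative texts, not from the paper).
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Poor fit and the cheap fabric shrank after one wash."           # hypothetical
generated = "The fit was off and the thin fabric shrank in the first wash."  # hypothetical

# ROUGE-1 / ROUGE-L F1 between the generated and reference reviews.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, generated)
print(rouge["rouge1"].fmeasure, rouge["rougeL"].fmeasure)

# BERTScore F1 (semantic similarity via contextual embeddings).
_, _, f1 = bert_score([generated], [reference], lang="en")
print(f1.mean().item())
```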

Simple Evaluation

The experimental results indicate that Review-LLM significantly outperforms the baselines across all metrics, with the inclusion of user ratings in the prompt contributing to this performance. On the simple evaluation set, Review-LLM achieves:

  • ROUGE-1: 31.15
  • ROUGE-L: 26.88
  • BERTScore: 49.52

Negative Review Performance

To test the model's ability to generate negative reviews, a hard evaluation set composed of negative reviews was used. Review-LLM demonstrated superior performance in reflecting user dissatisfaction compared to the baselines, reaffirming the effectiveness of incorporating rating information. On this hard evaluation set, Review-LLM achieves:

  • ROUGE-1: 21.93
  • ROUGE-L: 16.63
  • BERTScore: 39.35

Human Evaluation and Case Study

Human evaluators confirmed that Review-LLM's generated reviews were more semantically consistent with the reference reviews than those of the baselines. A case study further illustrated that Review-LLM could produce reviews that better reflect the user's sentiment and writing style than GPT-3.5-Turbo and GPT-4o.

Implications and Future Work

The findings imply that personalized review generation can be significantly enhanced by aggregating rich user behavior data and integrating it into LLMs through supervised fine-tuning. Practically, this approach can improve the quality and relevance of automated reviews in e-commerce platforms, potentially enhancing user satisfaction and engagement.

Future research should focus on addressing the limitations of the current framework. Specifically, capturing the diversity of individual preferences and incorporating the temporal dynamics of user interactions could further refine the personalization aspect. Additionally, extending this approach to other domains where personalized content generation is critical could offer broader applicability.

Conclusion

The proposed Review-LLM framework successfully leverages LLMs for personalized review generation by integrating detailed user behavior data and ratings into the model inputs. The fine-tuning approach ensures that the generated reviews reflect user-specific preferences and sentiments, outperforming state-of-the-art models like GPT-3.5-Turbo and GPT-4o. This work underscores the potential of LLMs in enhancing personalized content generation in recommender systems, paving the way for future innovations in AI-driven personalization.
