Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention (2404.03637v2)
Abstract: In Recommender System (RS) applications, reinforcement learning (RL) has recently emerged as a powerful tool, primarily because of its proficiency in optimizing long-term rewards. Nevertheless, it suffers from instability in the learning process, stemming from the interactions among bootstrapping, off-policy training, and function approximation. Moreover, in multi-reward recommendation scenarios, it is difficult to design a reward setting that reconciles the inner dynamics of the various tasks. In response to these challenges, we introduce DT4IER, a decision transformer-based recommendation model engineered both to improve the effectiveness of recommendations and to balance immediate user engagement against long-term retention. DT4IER applies a multi-reward design that balances short-term and long-term rewards with user-specific attributes, which enrich the context of the reward sequence and support a more informed and personalized recommendation process. To enhance its predictive capabilities, DT4IER incorporates a high-dimensional encoder designed to identify and leverage the interrelations across tasks. Furthermore, we integrate a contrastive learning approach within the action embedding predictions, a strategy that significantly boosts the model's overall performance. Experiments on three real-world datasets demonstrate the effectiveness of DT4IER against state-of-the-art Sequential Recommender Systems (SRSs) and Multi-Task Learning (MTL) models in terms of both prediction accuracy and effectiveness in specific tasks. The source code is accessible online to facilitate replication.
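As a rough illustration of what a multi-reward sequence can look like in a decision-transformer setting, the sketch below computes per-step returns-to-go for two reward channels, one for immediate feedback and one for retention. It is a generic, undiscounted return-to-go computation under assumed inputs, not DT4IER's actual reward design, which the abstract does not specify; the function names `returns_to_go` and `multi_reward_rtg` and the toy click/return signals are hypothetical.

```python
from typing import List, Tuple


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted suffix sums: rtg[t] = rewards[t] + rewards[t+1] + ..."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step adds its own reward to the future total.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def multi_reward_rtg(
    immediate: List[float], retention: List[float]
) -> List[Tuple[float, float]]:
    """Pair the returns-to-go of two reward channels step by step.

    Assumption: both channels are aligned to the same interaction steps.
    """
    if len(immediate) != len(retention):
        raise ValueError("reward channels must have the same length")
    return list(zip(returns_to_go(immediate), returns_to_go(retention)))


# Toy session (hypothetical data): 1.0 marks a click / a later return visit.
clicks = [1.0, 0.0, 1.0, 1.0]
returned = [0.0, 0.0, 1.0, 0.0]
rtg_pairs = multi_reward_rtg(clicks, returned)
```

In a decision transformer, a vector like each entry of `rtg_pairs` would condition the action prediction at its step, so that a target for both engagement and retention can be specified at inference time.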