Understanding the Ranking Loss for Recommendation with Sparse User Feedback (2403.14144v2)
Abstract: Click-through rate (CTR) prediction is a crucial area of research in online advertising. While binary cross entropy (BCE) has been widely used as the optimization objective when CTR prediction is treated as a binary classification problem, recent work has shown that combining the BCE loss with an auxiliary ranking loss can significantly improve performance. However, why this combined loss is effective is not yet fully understood. In this paper, we uncover a new challenge associated with the BCE loss in scenarios where positive feedback is sparse: gradient vanishing for negative samples. We introduce a novel perspective on the effectiveness of the auxiliary ranking loss in CTR prediction: it generates larger gradients on negative samples, thereby mitigating the optimization difficulties that arise when only the BCE loss is used and improving classification ability. To validate our perspective, we conduct theoretical analysis and extensive empirical evaluations on public datasets. Additionally, we successfully integrate the ranking loss into Tencent's online advertising system, achieving notable lifts of 0.70% and 1.26% in Gross Merchandise Value (GMV) for two main scenarios. The code is openly accessible at: https://github.com/SkylerLinn/Understanding-the-Ranking-Loss.
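The gradient argument in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes a RankNet-style pairwise logistic loss as the auxiliary ranking loss, a hypothetical batch of 1,000 samples with 1% positives, and logits that all sit near the base rate (as they might early in training). All function and variable names are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce_grad_wrt_logits(logits, labels):
    """Gradient of the mean BCE loss w.r.t. each logit: (sigmoid(z) - y) / N."""
    return (sigmoid(logits) - labels) / logits.size


def pairwise_rank_grad_wrt_logits(logits, labels):
    """Gradient of the mean pairwise logistic loss (assumed ranking loss),
    -log(sigmoid(z_pos - z_neg)), averaged over all (pos, neg) pairs."""
    pos = labels == 1
    neg = ~pos
    # diff[i, j] = z_pos[i] - z_neg[j]
    diff = logits[pos][:, None] - logits[neg][None, :]
    # d/d(diff) of -log(sigmoid(diff)) equals -sigmoid(-diff)
    g = -sigmoid(-diff) / diff.size
    grad = np.zeros_like(logits)
    grad[pos] = g.sum(axis=1)    # positives are pushed up
    grad[neg] = -g.sum(axis=0)   # negatives are pushed down
    return grad


# Hypothetical sparse-feedback batch: 10 positives out of 1,000 samples.
rng = np.random.default_rng(0)
n, n_pos = 1000, 10
labels = np.zeros(n)
labels[:n_pos] = 1.0

# Logits near logit(0.01), so every prediction is close to the base rate.
base_logit = np.log(0.01 / 0.99)
logits = base_logit + 0.1 * rng.standard_normal(n)

bce_grad = bce_grad_wrt_logits(logits, labels)
rank_grad = pairwise_rank_grad_wrt_logits(logits, labels)

neg_mask = labels == 0
bce_neg = np.abs(bce_grad[neg_mask]).mean()
rank_neg = np.abs(rank_grad[neg_mask]).mean()

print(f"mean |grad| on negatives, BCE:      {bce_neg:.3e}")
print(f"mean |grad| on negatives, pairwise: {rank_neg:.3e}")
print(f"ratio pairwise / BCE:               {rank_neg / bce_neg:.1f}")
```

Under these assumptions, the BCE gradient on a negative sample is sigmoid(z)/N, which is small whenever the predicted probability is close to a low base rate. Each pairwise term sigmoid(z_neg - z_pos) instead stays near 0.5 while positives and negatives are not yet separated, so the ranking loss contributes a much larger gradient on negatives in this toy setting.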