Diversity-Aware $k$-Maximum Inner Product Search Revisited (2402.13858v1)
Abstract: The $k$-Maximum Inner Product Search ($k$MIPS) serves as a foundational component in recommender systems and various data mining tasks. However, while most existing $k$MIPS approaches prioritize the efficient retrieval of highly relevant items for users, they often neglect an equally pivotal facet of search results: diversity. To bridge this gap, we revisit and refine the diversity-aware $k$MIPS (D$k$MIPS) problem by incorporating two well-known diversity objectives, minimizing the average and maximum pairwise item similarities within the results, into the original relevance objective. This enhancement, inspired by Maximal Marginal Relevance (MMR), offers users a controllable trade-off between relevance and diversity. We introduce Greedy and DualGreedy, two linear scan-based algorithms tailored for D$k$MIPS. They both achieve data-dependent approximations and, when aiming to minimize the average pairwise similarity, DualGreedy attains an approximation ratio of $1/4$ with an additive term for regularization. To further improve query efficiency, we integrate a lightweight Ball-Cone Tree (BC-Tree) index with the two algorithms. Finally, comprehensive experiments on ten real-world data sets demonstrate the efficacy of our proposed methods, showcasing their capability to efficiently deliver diverse and relevant search results to users.
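The relevance/diversity trade-off described in the abstract can be illustrated with a minimal MMR-style greedy linear scan. This is only a sketch under stated assumptions, not the paper's Greedy or DualGreedy algorithm: the function name `greedy_dkmips`, the trade-off parameter `lam`, and the scoring rule (relevance minus the average or maximum inner product with already selected items) are hypothetical choices made for illustration, and the abstract does not give the exact objective.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def greedy_dkmips(
    items: List[Sequence[float]],
    query: Sequence[float],
    k: int,
    lam: float = 0.5,
    mode: str = "avg",
) -> List[int]:
    """Pick up to k item indices by an MMR-style greedy linear scan.

    Assumed (hypothetical) marginal score for a candidate p given the
    selected set S:
        lam * <p, q> - (1 - lam) * penalty(p, S)
    where penalty is the average ("avg") or maximum ("max") inner
    product between p and the items already in S.
    """
    if mode not in ("avg", "max"):
        raise ValueError("mode must be 'avg' or 'max'")

    relevance = [dot(p, query) for p in items]
    selected: List[int] = []
    remaining = set(range(len(items)))

    while remaining and len(selected) < k:
        best_idx, best_score = -1, float("-inf")
        # Scan candidates in index order so ties resolve deterministically.
        for i in sorted(remaining):
            if selected:
                sims = [dot(items[i], items[j]) for j in selected]
                penalty = max(sims) if mode == "max" else sum(sims) / len(sims)
            else:
                penalty = 0.0  # nothing selected yet, so no diversity penalty
            score = lam * relevance[i] - (1.0 - lam) * penalty
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)

    return selected


# Toy data: items 0 and 1 are near-duplicates, item 2 points elsewhere.
items = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.5]

print(greedy_dkmips(items, query, k=2, lam=1.0))  # relevance only
print(greedy_dkmips(items, query, k=2, lam=0.5))  # relevance and diversity
```

With `lam=1.0` the scan reduces to plain $k$MIPS and returns the two near-duplicate items; lowering `lam` to 0.5 swaps the second near-duplicate for the dissimilar item, which is the kind of controllable trade-off the abstract refers to.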