
Diversity-Aware $k$-Maximum Inner Product Search Revisited (2402.13858v1)

Published 21 Feb 2024 in cs.IR, cs.DB, and cs.DS

Abstract: The $k$-Maximum Inner Product Search ($k$MIPS) serves as a foundational component in recommender systems and various data mining tasks. However, while most existing $k$MIPS approaches prioritize the efficient retrieval of highly relevant items for users, they often neglect an equally pivotal facet of search results: diversity. To bridge this gap, we revisit and refine the diversity-aware $k$MIPS (D$k$MIPS) problem by incorporating two well-known diversity objectives (minimizing the average and maximum pairwise item similarities within the results) into the original relevance objective. This enhancement, inspired by Maximal Marginal Relevance (MMR), offers users a controllable trade-off between relevance and diversity. We introduce Greedy and DualGreedy, two linear scan-based algorithms tailored for D$k$MIPS. They both achieve data-dependent approximations and, when aiming to minimize the average pairwise similarity, DualGreedy attains an approximation ratio of $1/4$ with an additive term for regularization. To further improve query efficiency, we integrate a lightweight Ball-Cone Tree (BC-Tree) index with the two algorithms. Finally, comprehensive experiments on ten real-world data sets demonstrate the efficacy of our proposed methods, showcasing their capability to efficiently deliver diverse and relevant search results to users.
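
To make the relevance/diversity trade-off described in the abstract concrete, the following is a minimal sketch of an MMR-style greedy linear scan. It is not the paper's Greedy or DualGreedy algorithm: the abstract does not give the exact objective, so the scoring rule `lam * relevance - (1 - lam) * redundancy`, the function name `mmr_greedy`, and the parameters `lam` and `diversity` are assumptions chosen for illustration. Relevance is the inner product with the query, and redundancy is the average (or maximum) inner product with the items already selected.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mmr_greedy(
    items: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
    lam: float = 0.5,          # hypothetical trade-off weight in [0, 1]
    diversity: str = "avg",    # "avg" or "max" pairwise similarity (assumed names)
) -> List[int]:
    """Return indices of up to k items chosen by an MMR-style greedy scan.

    Assumed scoring rule (not taken from the paper):
        score(i) = lam * <item_i, query> - (1 - lam) * redundancy(i)
    """
    selected: List[int] = []
    remaining = set(range(len(items)))
    while remaining and len(selected) < k:
        best, best_score = -1, float("-inf")
        # Linear scan over all unselected items; ties go to the lowest index.
        for i in sorted(remaining):
            relevance = dot(items[i], query)
            if not selected:
                redundancy = 0.0
            else:
                sims = [dot(items[i], items[j]) for j in selected]
                redundancy = max(sims) if diversity == "max" else sum(sims) / len(sims)
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    query = [1.0, 0.5]
    print(mmr_greedy(items, query, k=2, lam=1.0))  # pure relevance
    print(mmr_greedy(items, query, k=2, lam=0.5))  # relevance traded against diversity
```

With `lam=1.0` the scan reduces to plain $k$MIPS and returns the two most relevant items; lowering `lam` penalizes items that are similar to those already chosen, so a less relevant but dissimilar item can be preferred. Each of the $k$ rounds scans all items, which is the cost an index such as the BC-Tree mentioned in the abstract is meant to reduce.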

