The Shapley Value in Database Management (2401.06234v1)
Abstract: Attribution scores can be applied in data management to quantify the contribution of individual items to conclusions drawn from the data, as part of explaining what led to those conclusions. In Artificial Intelligence, Machine Learning, and Data Management, several common scores are instantiations of the Shapley value, a formula for profit sharing in cooperative game theory. Since its invention in the 1950s, the Shapley value has been used to measure contribution in many fields, from economics to law, and most recently in modern machine learning. Recent studies have investigated the application of the Shapley value to database management. This article gives an overview of recent results on the computational complexity of the Shapley value for measuring the contribution of tuples to query answers and to the extent of inconsistency with respect to integrity constraints. More specifically, the article highlights lower and upper bounds on the complexity of computing the Shapley value, either exactly or approximately, as well as solutions for realizing the computation in practice.
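To make the two uses above concrete, here is a minimal Python sketch, not the article's algorithms. It treats every fact of a tiny database as a player (an assumption; the query, the functional dependency, and all function names are hypothetical illustrations) and applies the standard Shapley formula Sh(a) = Σ over S ⊆ P \ {a} of |S|!(n−|S|−1)!/n! · (v(S ∪ {a}) − v(S)). The first game v is a Boolean query, the second counts pairs of facts violating an FD (for a single FD these pairs are the minimal inconsistent subsets). The exact routine enumerates all subsets and is exponential; the sampling routine averages marginal contributions over random permutations, a generic approximation technique that may differ from those surveyed in the article.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
import random

# A fact is modeled as a tuple (relation_name, value1, value2, ...).
# Assumption for this sketch: every fact is a player in the game.

def shapley_exact(players, game):
    """Exact Shapley value of each player via the subset formula (exponential time)."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = Fraction(0)
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (game(s | {p}) - game(s))
        values[p] = total
    return values

def shapley_sampled(players, game, samples, seed=0):
    """Monte Carlo estimate: mean marginal contribution over random permutations."""
    rng = random.Random(seed)
    sums = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(samples):
        rng.shuffle(order)
        prefix = frozenset()
        prev = game(prefix)
        for p in order:
            prefix = prefix | {p}
            cur = game(prefix)
            sums[p] += cur - prev
            prev = cur
    return {p: s / samples for p, s in sums.items()}

def query_game(facts):
    """Hypothetical Boolean query q() :- R(x, y), S(y): 1 if it holds, else 0."""
    r_targets = {f[2] for f in facts if f[0] == "R"}
    s_values = {f[1] for f in facts if f[0] == "S"}
    return 1 if r_targets & s_values else 0

def mi_game(facts):
    """Number of fact pairs violating the hypothetical FD T: A -> B."""
    t = [f for f in facts if f[0] == "T"]
    return sum(1 for f, g in combinations(t, 2) if f[1] == g[1] and f[2] != g[2])

if __name__ == "__main__":
    db_q = [("R", 1, 2), ("R", 3, 2), ("S", 2)]
    print(shapley_exact(db_q, query_game))
    print(shapley_sampled(db_q, query_game, samples=2000))
    db_i = [("T", 1, "x"), ("T", 1, "y"), ("T", 1, "z"), ("T", 2, "w")]
    print(shapley_exact(db_i, mi_game))
```

On the query example, S(2) receives 2/3 and each R fact receives 1/6, because S(2) is needed for every witness of the query while the two R facts are interchangeable. On the inconsistency example, each of the three conflicting T facts receives 1 and T(2, "w") receives 0. In both cases the values sum to v(D) − v(∅), which is the efficiency property of the Shapley value.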