Explainable Representations for Relation Prediction in Knowledge Graphs (2306.12687v1)
Abstract: Knowledge graphs represent real-world entities and their relations in a semantically rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features best explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two highly complex real-world relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard representation learning methods while identifying both sufficient and necessary explanations based on shared semantic aspects.
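The idea of shared semantic aspects and of sufficient versus necessary explanations can be illustrated with a minimal Python sketch. This is not the SEEK implementation: the toy ontology, the class names, the `WEIGHTS` table, and the threshold-based `predict` function are all hypothetical stand-ins, and a simple weighted sum replaces the learned per-subgraph representations described in the abstract. The sketch also assumes one common reading of the two explanation types: an aspect is sufficient if it alone reproduces the prediction, and necessary if removing it changes the prediction.

```python
# Toy ontology (hypothetical): each class maps to its direct parents.
PARENTS = {
    "protein_binding": {"binding"},
    "binding": {"molecular_function"},
    "kinase_activity": {"molecular_function"},
    "signaling": {"biological_process"},
    "molecular_function": set(),
    "biological_process": set(),
}

def ancestors(cls):
    """Return cls together with all of its ancestors in the toy ontology."""
    seen, stack = set(), [cls]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen

def shared_aspects(annotations_a, annotations_b):
    """Classes that subsume at least one annotation of each entity."""
    anc_a = set().union(*(ancestors(c) for c in annotations_a))
    anc_b = set().union(*(ancestors(c) for c in annotations_b))
    return anc_a & anc_b

# Hypothetical per-aspect evidence; a stand-in for a trained model.
WEIGHTS = {"protein_binding": 0.6, "binding": 0.3, "molecular_function": 0.05}

def predict(aspects, threshold=0.5):
    """Predict a relation if the summed aspect evidence reaches the threshold."""
    return sum(WEIGHTS.get(a, 0.0) for a in aspects) >= threshold

# Two hypothetical proteins described by their ontology annotations.
protein_a = {"protein_binding", "signaling"}
protein_b = {"protein_binding", "kinase_activity"}

shared = shared_aspects(protein_a, protein_b)
baseline = predict(shared)

# Sufficient: the aspect alone yields the same prediction as all aspects.
sufficient = {a for a in shared if predict({a}) == baseline}
# Necessary: removing the aspect flips the prediction.
necessary = {a for a in shared if predict(shared - {a}) != baseline}

print(sorted(shared), sorted(sufficient), sorted(necessary))
```

In this toy setting the two proteins share three aspects, and only the most specific one, `protein_binding`, is both sufficient and necessary for the positive prediction. The general ancestors contribute too little evidence on their own and can be dropped without changing the outcome.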