From Ontologies to Repository Intelligence: A Review of Knowledge Graphs for Mining Software Repositories

Authors

Manuel Stoger, Mario Bernhart and Thomas Grechenig, Research Group for Industrial Software (INSO), Austria

Abstract

Software repositories constitute rich, heterogeneous data sources whose systematic exploitation is essential for understanding software evolution and assessing software quality. Knowledge graphs (KGs) and related graph-based representations have emerged as a promising paradigm for structuring, querying, and reasoning over repository data. To characterize how this paradigm has been adopted, this review examines 56 primary studies (2006–2025) and addresses the research question: “How have knowledge graphs been applied to mining, analyzing, and visualizing software repositories?” Following Wieringa's Design Science Methodology, the proposed treatments are classified, their validation strategies are assessed, and the results are synthesized into five application clusters: (1) ontology-based repository modeling, (2) code knowledge graph construction and querying, (3) developer and collaboration networks, (4) defect, maintenance, and traceability, and (5) software evolution and dependency analysis. The findings reveal a clear trajectory from early ontology-based approaches (2006–2012), through code knowledge graphs and deep-learning-augmented representations (2013–2019), to LLM-integrated repository graphs (2020–2025). Open challenges include scalability, the lack of standardized cross-tool ontologies, and the maturation of hybrid neuro-symbolic architectures for production software engineering tools.

Keywords

Knowledge Graph, Mining Software Repositories, Structured Literature Analysis, Ontology, Soft

Volume 16, Number 11