INK: Knowledge graph representation for efficient and performant rule mining

Tracking #: 3147-4361

Authors: 
Bram Steenwinckel
Filip De Turck
Femke Ongenae

Responsible editor: 
Guest Editors NeSy 2022

Submission type: 
Full Paper
Abstract: 
Semantic rule mining can be used for deriving both task-agnostic and task-specific information within a Knowledge Graph (KG). Underlying logical inferences to summarise the KG or fully interpretable binary classifiers predicting future events are common results of such a rule mining process. The current methods to perform task-agnostic or task-specific semantic rule mining operate, however, on completely different KG representations, making them unsuitable for performing both tasks or incorporating each other's optimizations. This also results in the need to master multiple techniques for both exploring and mining rules within KGs, as well as in losing time and resources when converting one KG format into another. In this paper, we present INK, a KG representation based on neighbourhood nodes of interest for improved decision support. By selecting one or two sets of nodes of interest, the rule miner designed on top of this INK representation will mine either task-agnostic or task-specific rules. In both subfields, the INK miner outperforms the current state-of-the-art semantic rule miners on 14 different benchmark datasets within multiple domains.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 20/Dec/2022
Suggestion:
Minor Revision
Review Comment:

The paper is about a new representation of knowledge graphs (KGs) for encoding task-specific and task-agnostic aspects in semantic rule-mining applications. The authors propose that encoding the neighborhood of nodes of interest in a binary matrix will improve the accuracy of Bayesian models. Furthermore, they show that the INK method can outperform two of the most popular rule-mining methods. This review can be divided into four major groups:

Writing:
The paper is well-written, and the most important concepts are described in detail. Besides minor mistakes (e.g., “were are” on page 8, column 2, line 15), there were no significant errors in the writing. Moreover, the organization of the paper seems reasonable and easy to follow.
In general, some parts of the paper could be combined (lots of details are provided, or a fact is mentioned more than once), which would make the paper shorter and easier to understand.

Proposed Method:
The idea of using neighborhood data to analyze a node in a graph is well established and studied. The proposed binary encoding for nodes of interest (T) in a KG (G(V, E)) finds all possible paths up to depth D that start from T. Then, by marking each path with its destination, a feature set is created for each node. By doing this, T determines the final task: for task-specific applications, T is limited, but for task-agnostic applications, T should be equal to V. As the paper itself notes, this embedding process needs a lot of memory (RAM) to keep all the possible paths and also requires many iterative computations before a task can start. As a result, the originality and novelty of the proposed method could be more substantial; it primarily lies in transferring a general graph-analysis method to knowledge graphs.
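To make this reading of the encoding concrete, below is a minimal sketch of such a depth-bounded, path-based feature extraction. The function name, the triple-list input, and the toy KG are my own assumptions for illustration, not the authors' actual API:

    from collections import defaultdict

    def ink_encode(triples, nodes_of_interest, depth=2):
        # Sketch: for every node of interest, enumerate outgoing paths
        # up to 'depth' hops and mark each path with its destination,
        # yielding 'rel§obj' and 'rel.rel§obj' style features.
        adj = defaultdict(list)          # subject -> [(relation, object)]
        for s, r, o in triples:
            adj[s].append((r, o))

        features = {}
        for t in nodes_of_interest:
            feats = set()
            frontier = [("", t)]         # (path prefix, current node)
            for _ in range(depth):
                nxt = []
                for prefix, node in frontier:
                    for r, o in adj[node]:
                        path = f"{prefix}.{r}" if prefix else r
                        feats.add(f"{path}§{o}")  # mark path with destination
                        nxt.append((path, o))
                frontier = nxt
            features[t] = feats
        return features

    # Toy KG in the spirit of the paper's page 7 example:
    kg = [("x", "BORN_IN", "England"), ("England", "PART_OF", "UK")]
    print(sorted(ink_encode(kg, {"x"})["x"]))
    # ['BORN_IN.PART_OF§UK', 'BORN_IN§England']

The binarisation step (one column per distinct feature across all nodes of interest) is then straightforward, which is also where the memory concern above originates: the number of columns grows with the number of distinct paths.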

Evaluation:
The idea has been comprehensively evaluated against different methods and datasets. As a result, the evaluation section strongly supports the claim that the proposed method can embed more information about the relations in different KGs, leading to better results. Still, there should be an analysis of how much time the method needs to embed the graph before each task and for different datasets. In other words, there should be a discussion of the overhead of the embedding method, from both time and memory perspectives, relative to the volume of the various datasets.
In addition, the discussion of Table 4 (on page 11, column 2, line 41 to page 12, column 2, line 51) needs to be clarified, as the reader may not be familiar with the different features in these datasets.

Code:
The GitHub repository seems well-maintained and complete.

Best Regards

Review #2
Anonymous submitted on 23/Dec/2022
Suggestion:
Major Revision
Review Comment:

Approach: The approach proposed in INK looks more like relational pattern mining. I would like to hear from the authors whether other frequent pattern mining techniques could also be applied in INK. An example could be the paper titled ‘hoPS: Probabilistic Subtree Mining for Small and Large Graphs’ or other works from the same group or others. In comparison to other rule mining tools, the contribution of INK lies more on the engineering side, by constructing those binary representations that obviously make things a lot more efficient. (The language bias is a different one.) DL-Learner has a more expressive target language and can be coupled with different reasoners, so it is clear that it can take more time. These points need to be discussed when reporting the results or the evaluation setup.

"INK does not have such a timing constraint, but is constrained in rule mining’s search space by limiting the neighbourhood’s depth. As shown in the performed experiments, high quality rules can already be found when limiting the neighbourhood depth to three." It hiddenly means that you are using a depth limit (which is 3 in their definition which depends a bit on how you define the hops). The binary representation is heavily optimised for max. 2 hop patterns whereas DL-Learner has no limit in how deep intersections can be. Of course, if you go for efficiency which you probably need for mining rules in an entire KG, it can make sense to compress it down this way. Basically, instead of needing a 3-dimensional tensor for a KG, they then just need a 2-dimensional structure. Please clarify these.

DL-Learner operates mostly on the schema level (it has some support for the instance level via owl:hasValue), whereas INK also supports the instance level, for example "BORN IN§England" on page 7. INK has a way to materialise those out and, e.g., get BORN IN.PART OF§UK. All of those neighbourhoods are pre-built in their approach, whereas DL-Learner needs to find them at runtime by, e.g., first searching bornIn.TOP (= the born property exists), then refining to bornIn.partOf.TOP, and from there potentially to bornIn.partOf hasValue "UK" (if "UK" is an instance). So the approaches are quite different. Such a comparison can only be accepted when the authors acknowledge the big differences between their approach and DL-Learner and AMIE. For example, for any deep patterns, DL-Learner is probably more suitable.

Evaluation:
The comparison only to AMIE and DL-Learner is very limited. There are several other rule mining tools that one would have expected to be considered, such as the full paper version of ‘Rule Learning over Knowledge Graphs with Genetic Logic Programming’ at ICDE:
https://ieeexplore.ieee.org/abstract/document/9835403/
The Evoda rule miner reports better results than INK on the same datasets.

Not all of the metrics considered in the evaluation are suitable. For example, the runtime comparison is somewhat irrelevant, as the CELOE algorithm used by the authors is "anytime", i.e. you can stop it at any point; that is why the runtimes are all similar there. In addition, DL-Learner is not optimised for the MCC score, which is why its results there are worse than for F1. These points should be openly discussed in the paper. The F-score usually already balances out a somewhat uneven distribution of positive and negative examples.

So, I am not sure whether using the MCC score makes sense in terms of fairness and suitability here. I have not seen other related works using it. If the authors consider it important, they can always change DL-Learner to optimise for it. For ‘nctrer’, DL-Learner has 0.59 accuracy and a 0.73 F-score, but 0.01 MCC. How would the authors explain this? There are also cases where DL-Learner outperforms in accuracy. Maybe it found some solutions that are outside of INK's target language. What is the opinion or observation of the authors on that?
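For intuition on how such a combination can arise (the confusion-matrix counts below are hypothetical, chosen only to roughly reproduce the reported pattern, and are not the actual nctrer results): a classifier that labels nearly everything positive on an imbalanced test set keeps accuracy and F1 respectable while MCC collapses towards zero.

    from math import sqrt

    # Hypothetical counts: 60 positives, 40 negatives,
    # classifier labels nearly everything positive.
    tp, fn, fp, tn = 55, 5, 36, 4

    accuracy = (tp + tn) / (tp + fn + fp + tn)          # 0.59
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # ~0.73
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # ~0.03
    print(f"acc={accuracy:.2f} f1={f1:.2f} mcc={mcc:.2f}")

Whether this is what happened on nctrer is exactly what the authors should verify and report.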
The results of AMIE on YAGO2 are very questionable: what are example rules that INK could extract but AMIE could not, and why?

This statement is only partially true: "The INK miner holds both a predictive and time advantage compared to DL-Learning in the context of task-specific rule mining, given a large enough positive and negative set of instances. DL-Learner typically traverses the search space until the predefined amount of time is reached. This is the first advantage of INK over DL-learner, as it doesn’t need to tune this time parameter." The DL-Learner approaches are "anytime", which means you always have a result; one could sell that as an advantage over INK.

Conclusion:
The original INK publication, titled ‘INK: knowledge graph embeddings for node classification’, seems to have been successful in providing vector representations of an underlying knowledge graph for the downstream task of node classification. The rule mining story built on top of INK is plausible, but no longer novel. A major part of the proposed solution, as well as the structure, example rules, and SPARQL query, overlaps with the original paper, except for the evaluation, which is of course compared against some, but not all, of the known rule mining tools. To be convinced to consider it for the SWJ journal, the mentioned points need to be clearly addressed.

Review #3
Anonymous submitted on 05/Jan/2023
Suggestion:
Reject
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

The paper proposes INK, a solution for mining both task-specific and task-agnostic rules from KGs. The main contribution consists in proposing an internal representation, based on neighbourhood nodes, which ultimately comes down to a binary matrix. Task-agnostic rules are then obtained via symbol manipulation, whilst task-specific rules are obtained by combining the INK representation with an existing Bayesian rule miner. A comparative experimental evaluation is provided.

The paper targets an important problem and proposes a solution of moderate novelty but with interesting results, particularly in terms of efficiency. The paper lacks a bit of clarity; this particularly applies to sections 3 and 4, which illustrate the core of the work. Careful proofreading is also suggested. The paper is rather weakly motivated, and some of the motivations that are provided are not fully precise and appropriate. The analysis of related work also requires improvement (detailed comments are reported below). The further discussion illustrated in section 7 was overall appreciated.

The source code for the proposed solution is publicly available on GitHub (this is highly appreciated) and the content appears clearly organized and accompanied by README files.

In the following, detailed comments are reported.

Reference [1] has been used to provide a definition of a KG. Some recent publications have targeted this aspect (among others); see, for instance: Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d'Amato, Gerard de Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, Antoine Zimmermann: Knowledge Graphs. Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers 2021, pp. 1-257.

On page 2, lines 12-14, the authors assert "ARM can be applied to semantic data or KGs by converting the internal representation to a set of transactions." This is quite a strong assertion. Indeed, to the best of my knowledge there are cases (listed just below) where this statement does not apply (as also reported by the authors in section 2.1). I'd suggest rephrasing it to present this as one possible approach rather than the only one.

- Luis Galárraga, Christina Teflioudi, Katja Hose, Fabian M. Suchanek: Fast rule mining in ontological knowledge bases with AMIE+. VLDB J. 24(6): 707-730 (2015)

- Claudia d'Amato, Steffen Staab, Andrea G. B. Tettamanzi, Duc Minh Tran, Fabien Gandon: Ontology enrichment by discovering multi-relational association rules from ontological knowledge bases. SAC 2016: 333-338

Additionally, the approaches reported above are significantly inspired by the ILP literature, but they do not target task-specific problems, even if they can also be applied to them. As such, the consideration concerning the second category of approaches introduced in section 1 is also not very precise.

The presentation of the rules provided in section 2.1 is not fully precise. The authors should specify that the atoms of the body are binary predicates, as is the case for the head. With the provided representation, it seems that atoms of the body can be of different types.
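For instance, in the Horn rules typically mined from KGs, every atom is a binary predicate over entities or variables, as in

    bornIn(x, z) ∧ partOf(z, y) ⇒ livesIn(x, y)

(the predicate names here are illustrative, not taken from the paper); the current presentation instead leaves room for body atoms of other arities.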

Section 2.3 should in principle discuss related work concerning rule mining (from KGs); however, this subsection actually refers to methods for performing concept learning, which is not exactly rule mining. Furthermore, when talking about concept learning, a more extensive literature should be taken into account. Some relevant related works are reported in the following:

- Nicola Fanizzi, Claudia d'Amato, Floriana Esposito: DL-FOIL Concept Learning in Description Logics. ILP 2008: 107-121

- Giuseppe Rizzo, Nicola Fanizzi, Claudia d'Amato: Class expression induction as concept space exploration: From DL-Foil to DL-Focl. Future Gener. Comput. Syst. 108: 256-272 (2020)

- Franco Alberto Cardillo, Umberto Straccia: Fuzzy OWL-Boost: Learning fuzzy concept inclusions via real-valued boosting. Fuzzy Sets Syst. 438: 164-186 (2022)

- Francesca A. Lisi, Umberto Straccia: A Logic-based Computational Method for the Automated Induction of Fuzzy Ontology Axioms. Fundam. Informaticae 124(4): 503-519 (2013)

- Stefan Heindorf, Lukas Blübaum, Nick Düsterhus, Till Werner, Varun Nandkumar Golani, Caglar Demir, Axel-Cyrille Ngonga Ngomo: EvoLearner: Learning Description Logics with Evolutionary Algorithms. WWW 2022: 818-828

Details concerning the implementation and optimization of related works (described in different subsections of section 2) could be omitted, whilst section 2.4, which is actually relevant to the purpose of the paper, could be extended a bit more.

The rationale behind the representation presented in section 3 (and related subsections) is reminiscent of the one proposed in Petar Ristoski, Heiko Paulheim: RDF2Vec: RDF Graph Embeddings for Data Mining. ISWC (1) 2016: 498-514. The authors should consider this aspect and briefly introduce the differences with the approach (and goal) presented in this paper. This comment also applies to the text at the end of section 1, where the authors present the adopted representation as one of the main novelties with respect to the state of the art.

The key parts of the paper (e.g. section 3.1) require improvements in terms of clarity. Some suggestions are reported below (and in the MINOR section).

- End of section 3.1: "tuple" should be specified.

- Additional details should be provided when presenting the example at the end of section 3.2.

- In section 3.3.2, the authors refer to a notion of similar relations. However, it seems that exactly the same relationship is considered here. If so, the notion of similarity is somewhat misleading.

In section 4, it would be appropriate to report formal and specific algorithms for extracting rules. Additionally, in this section the authors argue that INK can mine non-closed rules and that this leads to more, and more advanced, rules. This argumentation is a bit vague. A more precise explanation of why this is an improvement with respect to the state of the art, and what its actual added value is, should be provided. Furthermore, in the opening sections of the paper the authors argue that negative examples are lacking in KGs due to their incompleteness and the open world assumption, which motivates, e.g. for AMIE, different ways to compute metrics such as confidence and support. However, in section 4.2 it seems that the closed world assumption is adopted. The authors should discuss and motivate this difference with respect to the arguments reported in the initial sections of the paper. A limited discussion is reported only near the end of the paper, in section 7. This aspect is quite important and needs more extensive discussion and motivation.
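For reference, the contrast the authors would need to address is the one between standard, CWA-style confidence and the PCA confidence AMIE uses to cope with missing facts (definitions as in Galárraga et al.):

    conf(B ⇒ r(x, y))     = supp(B ⇒ r(x, y)) / #{(x, y) : B}
    conf_pca(B ⇒ r(x, y)) = supp(B ⇒ r(x, y)) / #{(x, y) : B ∧ ∃y′ : r(x, y′)}

i.e. under the partial completeness assumption, a missing fact r(x, y) only counts against the rule when x is already known to have some r-value. Adopting the CWA in section 4.2 silently drops this distinction.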

As for the experimental evaluation, some more detailed discussion and information should be provided in section 6.1. Some suggestions follow: a) the meaning of the numbers within the parentheses in Table 2 should be provided. b) The authors should provide a rough idea of the values of N. c) The number of rules discovered by INK is definitely greater than for AMIE. This figure needs to be accompanied by additional information; e.g., it would be useful to know whether a further analysis has been performed to understand if there are noisy/wrong rules generated by INK and/or AMIE and, if so, what the difference between the two systems is. One of the main reasons for the greater number of rules of INK with respect to AMIE is presumably that INK also returns non-closed rules. As already highlighted above, the authors should clarify whether this actually brings added value or, on the contrary, basically injects noise. Some considerations in this regard are reported in section 7.1. I'd recommend adding at least a short note in section 6.1 announcing that further analysis is discussed in section 7.1 (to my personal taste, I'd report the further discussion immediately after the comparative experimental evaluation, that is, after sections 6.1 and 6.2, but this is subjective).

The comparison between AMIE and INK should also be performed in terms of efficiency. Indeed, AMIE is well known for its efficiency. This aspect has not been considered in the experimental comparison with AMIE, whilst it has been shown when comparing to DL-Learner, which is well known to be less efficient since it basically adopts a purely symbol-based approach and makes use of a reasoner.

As regards section 6.2, the provided qualitative analysis is appreciated. Nevertheless, the authors should provide information concerning the criterion adopted for comparing the rules extracted by the two systems: e.g., for each dataset, is the rule with the highest accuracy compared, or a randomly picked rule, or the rules referring to the "Prediction of" column reported in Table 3? The Suramin dataset was particularly hard for INK; a possible explanation should be provided. Some comments in this regard are reported on page 23 (2nd column), but that is definitely too far away from the comparative analysis shown in section 6.2. A similar situation occurs for the lymphography dataset, even if, in this case, the values of the two systems are closer to each other. Table 3 is displayed, but no reference to it is made in the text; it should be introduced and briefly discussed at the beginning of section 6.2.

In section 7, the authors argue about the ability of DL-Learner to exploit background knowledge and reasoning capabilities. This aspect (targeted, for instance, in Claudia d'Amato, Steffen Staab, Andrea G. B. Tettamanzi, Duc Minh Tran, Fabien Gandon: Ontology enrichment by discovering multi-relational association rules from ontological knowledge bases. SAC 2016: 333-338) has not been considered at all by the authors when proposing their solution. Some considerations in this regard should be reported when introducing the solution; at least the reason why this is not taken into account should be provided.

MINOR:

- Beginning of section 2.1.1: "(with r(x, y) a the head atom as explained above)" --> "(with r(x, y) the head atom as explained above)"

- page 6, 2nd column, rows 30-35: this sentence should be rephrased. Additionally, the purpose/meaning of this part is not fully clear, nor is the example reported immediately below the text.

- page 8, 2nd column, row 15, 16: "While the relation§object columns were are by default extracted by INK" --> this sentence should be rephrased

- page 9, 1st column, row 39 - 41: "For the Bayesian Rule Set model, this results in a a smaller number of rules." --> "For the Bayesian Rule Set model, this results in a smaller number of rules."

- page 10, 1st column, row 7 - 9: "If target values are specified together with the nodes of interest, the INK miner assumes a task-specific mining operation can be executed" --> this sentence should be rephrased

- page 11, 1st column, row 12 - 16: "By incorporating INK within this framework, a fair comparison between INK and DL-Learner, which has already been used in many evaluations within the SMLBench framework." --> this sentence should be rephrased

- page 11, 1st column, row 33 - 35: "DL-learner version 1.5 was used with the by SML-Bench default parameters for each learning task:" --> "DL-learner version 1.5 was used with the SML-Bench default parameters for each learning task:"