Review Comment:
The paper describes a reasoning system based on deep learning. It trains a neural network to perform inference on RDF graphs, and it is shown that the proposed system can learn to mimic standard RDFS reasoners.
In general, the paper is clearly written and easy to follow, and it proposes an interesting solution. However, I miss some details in both the description and the experiments.
One of the main weaknesses (which I think the authors can easily fix) is the completeness of the related work section. I am usually not the kind of reviewer who abuses reviews to request citations of his own work, but we have done some work in the recent past on using machine learning for approximate reasoning [1,2]. Moreover, I do not agree that "all previous work in the literature about reasoning with noisy Semantic Web data focuses on type inference" - there is also a large body of work dealing with relation/link prediction and/or validation.
There are some places where I would appreciate more details in the description of the experiments. One of those issues concerns the train/validation/test splitting, which seems to be done based on resources. My question: suppose resource X ends up in the training set and resource Y in the test set. Given a triple X P Y, does it end up in train, in test, or in both? If the answer is "both", does that oversimplify the problem?
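To make the concern concrete, here is a minimal sketch (hypothetical resource names and split; the paper's actual splitting procedure may differ) of how a resource-based split can leave a triple straddling both sets:

```python
# Hypothetical illustration: resources are split into train/test,
# but a triple can connect a "train" resource to a "test" resource.
triples = [("X", "P", "Y"), ("X", "Q", "Z"), ("W", "P", "Y")]
train_resources = {"X", "Z"}
test_resources = {"Y", "W"}

def assign(triple):
    s, _, o = triple
    in_train = s in train_resources or o in train_resources
    in_test = s in test_resources or o in test_resources
    if in_train and in_test:
        return "both"  # potential leakage: triple is visible at training time
    return "train" if in_train else "test"

print([assign(t) for t in triples])
# → ['both', 'train', 'test']
```

If such "both" triples are counted in the test metrics, the reported numbers may overstate generalization.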
As for the DBpedia example, I wonder why the authors picked a specific class (i.e., Scientists) instead of a random sample. There are two concerns here: (1) the type and distribution of noise observed in the sample may not be representative of all of DBpedia, and (2) the model might overfit to particular inferences that hold for that sample, but not in general.
As far as the experiments are concerned, I would have liked to see a direct comparison to the baselines. Figures 6 and 7 stand side by side, but it is not clear where the proposed approach outperforms TransH etc., and where it does not. A more thorough comparison, together with a discussion of the cases in which each of those methods is superior, would strengthen the paper.
After definition 2, three cases of non-propagable noise are distinguished, and they are supposed to be mutually exclusive. I do not agree here. Specifically, regarding "when the property of a corrupted triple is corrupted to its super property or sibling property": there may be different domain/range definitions for those super and sibling properties. Also, there may be cases which are not purely the second or the third case: e.g., the original triple generated three triples A, B, and C, where A and B are also generated by other triples, while the new triple generates only C.
In the proof for definition 6, I cannot follow why exactly T' and T should have the same representation. As far as I understand, each property has its own layer in the representation; if T' has one property that T does not have, its layered representation should also have one extra layer, and therefore the two representations cannot be identical.
[1] Heiko Paulheim and Heiner Stuckenschmidt: Fast approximate A-box consistency checking using machine learning. In: ESWC 2016.
[2] Christian Meilicke, Daniel Ruffinelli, Andreas Nolle, Heiko Paulheim and Heiner Stuckenschmidt: Fast ABox consistency checking using incomplete reasoning and caching. In: RuleML+RR 2017.