Beyond efficiency: A systematic classification of RDFS-based Semantic Web reasoners and applications

Tracking #: 2483-3697

Authors: 
Simona Colucci
Francesco Maria Donini
Eugenio Di Sciascio

Responsible editor: 
Guilin Qi

Submission type: 
Survey Article
Abstract: 
In this paper, we present a systematic classification of 48 RDFS-based Semantic Web reasoners and applications, with the aim of evaluating their deductive capabilities. In fact, not all such applications show the same reasoning behavior w.r.t. the RDF data they use as information source and the ability of reasoning is not a binary quality: it can, e.g., consider or not blank nodes denotation, include different subsets of RDFS rules, provide or not explanation facilities. For classification purpose, we propose a maturity model made up by three orthogonal dimensions for the evaluation of reasoners and applications: blank nodes, deductive capabilities, and explanation of the results. For each dimension, we set up a progression from absence to full compliance. Each RDFS-based Semantic Web reasoner/application is then classified in each dimension, based on both its documentation and published articles. We did not consider efficiency from our evaluation on purpose, since efficiency could be compared only for systems providing an equal service in each of the above dimensions. Our classification can be used by implementers of RDFS-based Semantic Web applications, for choosing a suitable reasoning engine, or to decide at what level an in-sourced reasoning service could be implemented and documented.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 14/Sep/2020
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

Strong points:
1. This paper proposes a maturity model to evaluate reasoning engines. The model considers three orthogonal dimensions: whether blank-node denotation is considered, which subsets of the RDFS rules are included, and whether explanation facilities are provided.
2. This paper illustrates these three dimensions in detail and classifies 48 RDFS-based SW reasoning engines on the maturity model. Sorting and classifying reasoning engines in this way makes sense.
3. This paper also provides guidelines for extending the maturity model to other RDFS-based SW applications, which makes the model extensible.

Weak Points:
1. Language. It would help if you double-checked your grammar, commas, whitespace, etc.
In the abstract, "made up by" should be "made up of", and "e.g.," is missing a preceding comma: "e.g.," should be ", e.g.,".
In the introduction, the first sentence is missing a comma after the introductory phrase "In this paper". "this sources" should be "these sources". Consider adding a space before "Server by Intellimension". "we contribute to build" should be "we contribute to building".

2. This paper should include an introduction to the basic concepts, rather than stating "We assume the reader familiar to RDF/RDFS syntax and semantics".

3. Even though you've done some research, I don't think it's enough work, nor is it innovative enough.

From the perspective of workload:
When evaluating the deductive capability of a system, you should select some data and queries, actually run them on the system under test, and then measure the system's deductive capacity by comparing the query results.
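Such a test could, for instance, compare each system's output against the expected RDFS closure of a small input graph. Below is a minimal sketch in plain Python; the two-rule subset (rdfs9/rdfs11), the string encoding of triples, and the function names are illustrative assumptions, not the paper's method:

```python
# Illustrative string-encoded vocabulary (hypothetical, for the sketch only)
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples):
    """Compute the expected closure by naive forward chaining of two
    RDFS rules: rdfs11 (subClassOf transitivity) and rdfs9 (type
    propagation along subClassOf), until a fixpoint is reached."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))      # rdfs11
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))      # rdfs9
        if not new <= closure:
            closure |= new
            changed = True
    return closure

def deductive_score(system_output, input_triples):
    """Fraction of the expected entailments a system actually returned."""
    expected = rdfs_closure(input_triples) - set(input_triples)
    found = expected & set(system_output)
    return len(found) / len(expected) if expected else 1.0
```

For the graph {c1 rdfs:subClassOf c2, c2 rdfs:subClassOf c3, r1 rdf:type c1}, the expected inferred triples under these two rules are (c1 rdfs:subClassOf c3), (r1 rdf:type c2) and (r1 rdf:type c3); a system returning only one of them would score 1/3.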

"In 12 cases, such an ability is not even mentioned, and we cannot suppose it, so these applications are conservatively classified as discarding triples with blank nodes." Even though it is not mentioned, could it be verified experimentally?

From the perspective of innovation:
In addition to the three dimensions mentioned in this article, you could also consider measuring the memory consumption of each system, because memory consumption is one of the critical factors in choosing a reasoning engine.

Review #2
Anonymous submitted on 21/Sep/2020
Suggestion:
Minor Revision
Review Comment:


ORIGINALITY:
The authors present a classification system for RDFS-based reasoners and applications. The idea of the system comes from software-engineering maturity models. According to the maturity model in this paper, RDFS-based reasoners and applications are classified along three dimensions: handling of blank nodes, deductive capabilities, and explanation of reasoning. Each dimension is divided into different levels. The idea of applying a maturity model to the classification of RDFS-based reasoners and applications is relatively novel and interesting.
SIGNIFICANCE:
In this paper, 48 applications are selected from the lists of the Semantic Web Journal and the W3C. The authors have devoted considerable effort to verifying whether the selected reasoners and applications conform to the various levels of the proposed maturity model. This paper establishes a standard for evaluating inference engines and their applications, which makes it convenient for users to select reasoning tools and helps developers understand the current state of such applications.
The charts provided in the paper suggest that the authors have done a lot of experimental work. Were the data sets used by the different reasoners and applications the same or different? Please describe the data sets used in the evaluation.

Review #3
Anonymous submitted on 11/Oct/2020
Suggestion:
Minor Revision
Review Comment:

Intelligent and automatic information processing requires that data and knowledge be modeled in a structured and semantically related way. Thus, more and more knowledge and data are constructed and published based on Semantic Web technologies, especially RDF and RDFS. This has driven the development of many tools and systems that consume RDF(S) data sources and support Semantic Web technology-based applications. However, for end-users, too many choices usually mean no choice. This makes comparisons of the existing tools and systems much more important. Thus, the authors of this work have made a thorough comparison of 48 RDFS-based Semantic Web reasoners and applications along the dimensions of whether blank-node denotation is considered, the sets of RDFS rules supported, and the ability to explain results. These three dimensions constitute the novelty of this work. The reviewer has not seen other works that compare or classify the existing systems and tools from the perspectives provided by this work.

The paper is in general well written. It is also very easy to understand, partly because it contains neither theoretical results nor technical details. However, I have the following main questions about this paper.

----------------
1. In the Abstract, the authors concluded that their classification can be used by implementers of RDFS-based Semantic Web applications, for choosing a suitable reasoning engine, or to decide at what level an in-sourced reasoning service could be implemented and documented.

However, I think a more accurate conclusion is that their classification can be used for applications where the three dimensions they have considered are crucial to choosing a suitable reasoning engine, since the classification they provide cannot be relied upon for applications where efficiency and scalability are much more important.

----------------
2. For the systems and tools compared, the authors decided to include 32 applications presented in the Semantic Web Journal as well as 16 RDFS-based reasoners officially listed by W3C as of April 2020.

Considering just the systems and tools presented in the Semantic Web Journal sounds strange; I think other readers may have the same feeling.

Two points determine the success of a work that compares existing systems and tools: one is the dimensions or aspects considered, and the other is the set of systems and tools considered.

I think the authors should consider the systems and tools recommended by the W3C and, at the same time, those presented at JWS, ISWC, ESWC, VLDB, VLDB J., SIGMOD, and other high-quality venues in the Semantic Web and Database domains, such as the following (to mention just a few):

-. Yuan, P., Liu, P., et al.: TripleBit: a Fast and Compact System for Large Scale RDF Data. In: Proceedings of the VLDB Endowment, 2013.

-. Papailiou, N., Tsoumakos, D., et al.: H2RDF+: An Efficient Data Management System for Big RDF Graphs. In: Proceedings of SIGMOD, 2014.

-. Motik, B., et al.: Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems. In: Proceedings of AAAI, 2014.

-. Harbi, R., et al.: Evaluating SPARQL Queries on Massive RDF Datasets. In: Proceedings of the VLDB Endowment, 2015.

I understand that choosing the systems and tools to be included was not easy work; one may need to collect and read a lot of papers. However, if you decided to consider systems not listed at https://www.w3.org/2001/sw/wiki/Category:RDFS_Reasoner, then, in terms of fairness and comprehensiveness, you had better also consider the works presented in the journals and conferences mentioned above.

----------------
3. Undoubtedly, there already exist a lot of works comparing systems and tools that manage RDF(S) data sources, such as the following:

-. Bizer, C., Schultz, A.: The Berlin SPARQL Benchmark. International Journal on Semantic Web and Information Systems, 2009.

-. Özsu, M.T.: A Survey of RDF Data Management Systems. Frontiers of Computer Science, 2016.

-. Duan, S., et al.: Apples and Oranges: A Comparison of RDF Benchmarks and Real RDF Datasets. In: SIGMOD, 2011.

-. Banane, M., et al.: RDF Data Management Systems Based on NoSQL Databases: A Comparative Study. International Journal of Computer Trends and Technology, 2018.

-. Quoc, H., et al.: A Performance Study of RDF Stores for Linked Sensor Data. Semantic Web Journal, 2019.

Although these works compare RDF(S) systems from different aspects and views, the authors should have considered them and compared their own work with them.

----------------
4. The last problem concerns the dimensions or aspects that the authors have considered. The authors have emphasized multiple times that they compare and classify the chosen systems along the aspects of whether blank-node denotation is considered, the reasoning rules supported, and the ability to explain reasoning results.

Although the authors have explained why they consider these three dimensions, more compelling reasons should be provided to describe explicitly why these dimensions are crucial, or for how many applications they are crucial. I imagine that people are normally much more concerned with efficiency, scalability, and throughput, as well as the models used to store data and the dependence on main memory.

The system BigData mentioned in this paper loads RDF graphs very slowly, so I usually ignore it. Jena-TDB adopts a property table to manage RDF graphs, so it can usually evaluate star queries efficiently. Virtuoso, on the other hand, adopts a three-column triple table to store RDF graphs, so when the graphs are very large it may spend more time evaluating queries with many joins.

Next, let us talk about the dimensions/aspects this paper considered.

--
Blank nodes are used to describe the existence of unnamed entities. The usage of blank nodes is controversial, since they may exhibit strange behavior and they make reasoning problems NP-complete.

-. RDF 1.1 Semantics, https://www.w3.org/TR/rdf11-mt/, 2014.
-. Hogan, A., et al.: Everything You Always Wanted to Know About Blank Nodes. Journal of Web Semantics, 2014.
-. De Bruijn, J., and Heymans, S.: Logical Foundations of (e)RDF(S): Complexity and Reasoning. In: ISWC, 2007.
-. Ter Horst, H.: Combining RDF and Part of OWL with Rules: Semantics, Decidability, Complexity. In: ISWC, 2007.

For simplicity, applications usually treat RDF graphs with blank nodes as ground RDF graphs, by considering the blank nodes as new terms.
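This treatment is often called skolemization: each blank node is replaced by a fresh constant, after which the graph can be processed as a ground graph. A minimal sketch in Python, where the `genid:` prefix and the naming scheme are illustrative assumptions:

```python
import itertools

def skolemize(triples, prefix="genid:"):
    """Replace every blank node (identified here by a leading '_:')
    with a fresh, graph-wide unique constant so the graph is ground."""
    counter = itertools.count()
    mapping = {}  # blank-node label -> fresh constant

    def skolem(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"{prefix}b{next(counter)}"
            return mapping[term]
        return term

    return [(skolem(s), skolem(p), skolem(o)) for (s, p, o) in triples]
```

Note that each occurrence of the same blank-node label maps to the same constant, while distinct labels get distinct constants, preserving the co-reference structure of the original graph.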

--
Considering whether a system supports reasoning is very useful, since most systems pursue performance and scalability while usually ignoring reasoning.

--
Explanation. RDFS reasoning is relatively simple. E.g., in backward reasoning, an explanation can be realized by recording the rewriting procedure. Besides, end users can easily understand that results are obtained by extending or specializing some concepts or roles. For expressive languages, such as OWL 2 DL, or for machine-learning-based systems, the ability to explain results is much more important, since, for end users, the reasoning procedures are more like black boxes. The importance of explanation for RDFS reasoning should be justified better.
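The point that an RDFS explanation can be realized simply by recording derivation steps can be illustrated with a small sketch: during forward chaining, each inferred triple is stored together with the rule and the premises that produced it, and an explanation is obtained by unfolding that record. The rule names (rdfs9/rdfs11), the string-encoded triples, and the function names are illustrative assumptions:

```python
SUBCLASS, RDF_TYPE = "rdfs:subClassOf", "rdf:type"

def closure_with_explanations(triples):
    """Forward-chain rdfs9/rdfs11 while recording, for every inferred
    triple, the rule applied and the premises it was derived from."""
    closure = set(triples)
    why = {t: ("asserted", []) for t in triples}
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(closure):
            for (s2, p2, o2) in list(closure):
                derived = None
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    derived = ("rdfs11", (s, SUBCLASS, o2))
                elif p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    derived = ("rdfs9", (s, RDF_TYPE, o2))
                if derived and derived[1] not in closure:
                    rule, t = derived
                    closure.add(t)
                    why[t] = (rule, [(s, p, o), (s2, p2, o2)])
                    changed = True
    return closure, why

def explain(triple, why):
    """Unfold the recorded derivation of a triple into a rule sequence."""
    rule, premises = why[triple]
    if rule == "asserted":
        return [f"{triple} (asserted)"]
    lines = []
    for premise in premises:
        lines += explain(premise, why)
    lines.append(f"{triple} (by {rule})")
    return lines
```

Since every recorded premise was added before the triple it justifies, the recorded derivations form an acyclic structure and the unfolding always terminates, yielding exactly the kind of rule sequence discussed as an explainability level above.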

----
In summary, the authors should better argue that all these three dimensions are crucial.


Comments

Suggestion: Reject

The authors propose a maturity model for RDFS reasoners and RDFS-based applications. The maturity model is based upon three criteria: level of support for blank nodes, deductive capabilities and explainability. Each criterion is assessed according to the following levels of support:
1. Blank nodes support: a) discards blank nodes, b) support without multiple denotations, c) multiple denotations
2. Deductive capabilities: a) absent, b) limited, c) full RDFS support
3. Explainability: a) absent, b) distinguishing implicit and explicit triples, c) rules sequence, d) human-readable explanation

These criteria and levels of support are then used to assess an extensive list of RDFS reasoners and applications.

This survey is indeed needed by Semantic Web researchers and practitioners. However, both the criteria chosen in this survey and the assessment of some reasoners are questionable.

For instance, the authors rely mainly on the documentation of the tools to assess their capabilities, and assume, for example, that if the documentation does not mention blank nodes then the tool does not support them. This has yielded some misclassifications: Corese's support for blank nodes is assessed to be absent after a "thorough" study of technical reports and research papers describing the Corese reasoner. However, according to this link to the Corese API documentation, http://www-sop.inria.fr/acacia/soft/corese/querydoc/node12.html

"By default Corese does not return blank node ID, unless necessary. It is possible to request that blank node ID be returned by using the clause below. In this case, Corese generate a blank node ID that can be reused in another query to retrieve the blank node.
select display blank where"

This leads me to question the validity of an approach that relies solely on the documentation to assess the capabilities of RDFS reasoners. A more systematic approach for assessing RDFS reasoners and applications would be to design a set of TBoxes and ABoxes with the expected inference graph for each level of support, and to test each of the surveyed reasoners using this standardised set of RDF graphs. In addition, the authors could add a fourth dimension for usability, assessing each RDFS reasoner according to ease of use, availability of an open-source or research license, date of the last release, frequency of updates, size of the community, quality of the documentation, etc. Such a dimension could be very valuable for the community.

The different levels of support that the authors propose were built top-down rather than starting from the tools, leading to an empty category in one dimension and a coarse granularity in another. For example, no reasoner or application assessed by the authors supported blank nodes with multiple denotations, making the choice of this level of support impractical. On the other hand, the explainability level of providing the sequence of rules could be split into two levels: a summarized sequence of rules and a complete list of rules. For example, given the following input graph:

c1 rdfs:subClassOf c2 .
c2 rdfs:subClassOf c3 .
r1 rdf:type c1 .

and the inferred triple

r1 rdf:type c3 .

Jena provides the following summarized explanation:

r1 rdf:type c1 .
c1 rdfs:subClassOf c3 .

rather than a more detailed explanation that also contains the explanation of the implicit triple:

c1 rdfs:subClassOf c3 .

Jena was also misclassified by the authors as lacking support for providing the rule sequence. This is the default behaviour of Jena, but the logging of derivations can be toggled on using "PROPderivationLogging": https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/vocab...

This confirms that relying only on the documentation to assess a feature can be misleading, as the authors might have searched for "explanation" or "justification" but not "derivation". Trying to use Jena to generate an explanation would have led the authors to this Stack Overflow answer on how to use Jena to get the derivations:

https://stackoverflow.com/questions/49293163/whether-does-jena-derivatio...

These two misclassifications (Corese's support for blank nodes, and Jena's explanation support level) make me less confident that there are not many more misclassifications, especially since the methodology of relying solely on the documentation seems flawed.

This survey is indeed needed and could be a valuable resource for the community. The authors have made a good start and a great effort by including an extensive list of systems in their survey. The chosen criteria could be augmented with a usability dimension, and the assessment needs to be more systematic.