Survey of Tools for Linked Data Consumption

Tracking #: 1849-3062

Authors: 
Jakub Klimek
Petr Skoda
Martin Necasky

Responsible editor: 
Ruben Verborgh

Submission type: 
Survey Article
Abstract: 
There is a lot of data published as Linked (Open) Data (LOD/LD). At the same time, there is also a multitude of tools for publication of LD. However, potential LD consumers still have difficulty discovering, accessing and exploiting LD. This is because, compared to consumption of traditional data formats such as XML and CSV files, there is a distinct lack of tools for consumption of LD. The promoters of LD use the well-known 5-star Open Data deployment scheme to suggest that consumption of LD is a better experience once the consumer knows RDF and related technologies. This suggestion, however, falls short when the consumers search for appropriate tooling support for LD consumption. In this paper we define an LD consumption process. Based on this process and current literature, we define a set of 34 requirements a hypothetical Linked Data Consumption Platform (LDCP) should ideally fulfill. We cover those requirements with a set of 94 evaluation criteria. We survey 110 tools identified as potential candidates for an LDCP, eliminating them in 3 rounds until 16 candidates remain. We evaluate the 16 candidates using our 94 criteria. Based on this evaluation we show which parts of the LD consumption process are covered by the 16 candidates. Finally, we identify 8 tools which satisfy our requirements for being an LDCP. We also show that there are important LD consumption steps which are not sufficiently covered by existing tools. The authors of LDCP implementations may use our paper to decide on directions for future development of their tools. The paper can also be used as an introductory text to LD consumption.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Ruben Taelman submitted on 09/Apr/2018
Suggestion:
Accept
Review Comment:

I acknowledge the work done by the authors to address the comments from the reviews.
Other than the two minor comments noted below, I have no further comments on this article,
which is why I recommend acceptance.

The revised introduction now makes it sufficiently clear that the goal of an LDCP is not to accomplish _all_ the defined requirements.
Instead, the purpose of these requirements is to show the relevancy of tools for an LD consumption aspect.

My concern regarding the definition of evaluation criteria has been resolved in this revision.
In order to make the conformance to these criteria objective,
the authors added an appendix containing detailed information on when each criterion is passed and when not.

Minor comments:
* page 28: typo: "The entity constrain restricts"
* page 33: "in a form of a dump" -> "in the form of a dump"

Review #2
By Elena Demidova submitted on 25/Apr/2018
Suggestion:
Major Revision
Review Comment:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

Needs further work.

(2) How comprehensive and how balanced is the presentation and coverage.

Large amount of material, good coverage.

(3) Readability and clarity of the presentation.

Presentation needs improvement / further work.

(4) Importance of the covered material to the broader Semantic Web community.

Topic is important to the community.

Summary:

This survey article focuses on tools for linked data consumption. The authors define the concept of a Linked Data Consumption Platform. They propose a set of requirements that such a platform should satisfy, along with a set of evaluation criteria to verify whether these requirements are addressed by a tool. The authors differentiate among user groups for such a platform, such as LD experts and non-LD experts (e.g. data journalists).

Strong points:

The problem of making Linked Data available and usable, in particular for non-LD experts, is a timely and important problem.
The number of papers and tools reviewed in this survey article is impressive.

Weak points:

The authors build the discussion in the article and the evaluation of tools upon the assumption that one linked data consumption platform should satisfy all requirements along the entire data consumption pipeline. Effectively, the requirements discussed by the authors are a union of functionalities of existing tools. In my view this assumption does not seem realistic or desirable. Specialized tools for specific tasks are likely to be more effective.

Although the authors intend to evaluate the requirements from the point of view of non-expert users, no real input from the users of this group is collected or analysed. I strongly recommend that the authors seek explicit feedback on the proposed requirements from real users in the target group. The requirements that are currently collected bottom-up from the existing papers and system functionalities do not necessarily reflect requirements of the users in this user group.

Some of the requirements presented in the article mix up the desired functionalities and the concrete methods to implement them.

Detailed comments:

The authors of this survey article have an ambitious goal of designing requirements for an LD consumption platform that should support non-expert users. The authors conducted an impressive amount of work, both in terms of the number of surveyed papers and the details of the tool analysis. Still, at the moment, the results of this work seem to be preliminary and need substantial further development to make the findings really useful.

One of the main results of the analysis, although not particularly surprising, is that currently none of the surveyed tools fulfil all the requirements collected by the authors with respect to the LD consumption pipeline. On the positive side, I think that this result can be taken as a motivation and a first step to create a road map for development of better LD consumption tools in the future.

One of the main points of criticism is that the survey methodology does not fully support the described scenarios, in particular the involvement of non-LD users. Whereas, according to the authors, the goal should be to obtain tools that support different user groups, such as expert and non-expert users, the authors collect the requirements, or rather the existing functionalities, of tools developed by experts. Such requirements are not necessarily meaningful to the users (especially in the non-expert user group). Instead, to become useful for real non-expert users, the requirements should be collected by directly involving the users in discussions, and allowing them to evaluate existing functionalities and request additional ones.

Another problem is the assumption that one platform / tool should facilitate the whole LD consumption pipeline, which does not seem very realistic, especially if one takes the quality of the services into account. Furthermore, the requirements might have different priorities, depending on the target user group. Which of the requirements are the most crucial? For whom?

The criterion in the 3rd elimination round of tool selection is very restrictive (i.e. being able to load a large-scale dataset).

The beginning of Section 4 can be significantly shortened (similar processes described in the literature do not need to be repeated in full detail).

In Section 4.2.1, please state explicitly that you talk about dataset search. E.g. “Search user interface”, “Search query language”, etc. are ambiguous (one can think about searching data in a dataset).

“The ability of the users to precisely express their intent strongly depends on the user interface” – I disagree with this. The user interface is just a syntactic means to express the intent; what is also needed is knowledge of the data, schema, etc. Please rephrase.

Furthermore, there is an issue with the formulation of requirements; some of the requirements mix up the goals and the methods to achieve them.

Requirement 5: Query expansion is a method, not the goal. The goal should be improvement of recall.

Criterion 5.1 should rather talk about finding semantically related results (here again, query expansion is just one possible method to achieve this goal).

Requirement 7: Also mixes up functionalities (ranking) with features (links).

Requirement 8: A preview based on vocabularies is, in my view, very specific. While a preview is useful, vocabularies are just one specific way to provide it.

Criterion 8.3: Preview profile?

Criterion 9.1: “… based on automatic querying …” is too specific.

Requirement 10: Which quality indicators do you mean?
“which model a similar part of reality” – please rephrase.

The scores (in particular the averages) in the tool evaluation are not useful for the reasons discussed above.

Overall, the weaknesses in the requirement description again reflect the main weakness of the paper – it is, in its current form, a collection of specific functionalities of existing tools rather than user requirements of the users in the target group.

Review #3
By Daniel Garijo submitted on 13/May/2018
Suggestion:
Minor Revision
Review Comment:

The authors have thoroughly answered all the questions, comments and concerns that I raised in my review. I have reviewed the modified parts of the paper, and I very much like the changes they have made to improve them. The contribution is now clear, and Tables 2 and 3 provide a nice overview of how existing tools tackle requirements for an LDCP from both an expert and a non-expert perspective.

My only comment is regarding the response letter. The authors state that: "Our main message is the tools survey (the second part) which is based on the evaluation of the tools using the framework". I did not understand this very well. From the text, I get that the contribution is the description and comparison of tools. In my opinion, the takeaway message of the paper is that, despite many years of work providing tools for Linked Data consumption, there is still a gap between existing technologies and the basic needs of users.

None of the systems is close to a 10% coverage of the overall requirements described in the paper, on average. In fact, 2 of the requirement sections in Table 3 (displaying search results, analysis of semantic relationships) do not seem to be tackled at all. It would be nice to see a couple of sentences in the paper on why this could be the case, or additional ideas from the authors on how to bridge this gap.

Finally, there are still some typos. For example:
"A number of open data catalogs... It utilizes" --> They utilize
"It is not clear how the authors evaluated whether a given tools" --> given tool.

I recommend a final re-read before the camera-ready version.