Incremental Schema Integration for Data Wrangling via Knowledge Graphs

Tracking #: 3286-4500

Authors: 
Javier Flores
Kashif Rabbani
Sergi Nadal
Cristina Gómez
Oscar Romero
Emmanuel Jamin
Stamatia Dasiopoulou

Responsible editor: 
Aidan Hogan

Submission type: 
Full Paper
Abstract: 
Virtual data integration is the go-to approach for data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema and the mappings between them. Based on these constructs, virtual data integration systems enable fast and on-demand data exploration via query rewriting. Unfortunately, the generation of such constructs is currently performed in a largely manual manner, hindering its feasibility in real scenarios. This is aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental approach grounded on knowledge graphs to generate the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present NextiaDI, a tool implementing our approach. Finally, a comprehensive evaluation is presented to scrutinize our approach.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 30/Nov/2022
Suggestion:
Accept
Review Comment:

I want to thank the authors for their endeavors and clear responses to my comments (R1). The revised version of the paper has been enhanced over the prior version, so I recommend the manuscript for publication.

Review #2
By Andriy Nikolov submitted on 04/Dec/2022
Suggestion:
Minor Revision
Review Comment:

===============================================
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.
===============================================

The paper has been revised substantially in several respects, most importantly:
- It now clearly outlines that its scope is limited to the schema integration part of the process rather than the whole data integration problem, thus excluding aspects such as query processing that would otherwise have to be discussed.
- The user evaluation section was added to demonstrate the added value of the system from the perspective of practitioners.
In this way, I think, it covers most of my comments from the original review, either by resolving them or by delineating the intended scope. From my point of view, two remaining aspects are:
- Now that the scope of the paper has been reduced, the question remains whether the added value for the schema integration part alone constitutes a sufficient contribution. In my view, the added user evaluation section supports the claim and provides sufficient evidence for this, but this is something which might be further considered.
- The paper could benefit from another proofreading to fix some writing style issues and typos.
Some typos I noticed:
p. 2, line 21: “Thus, allowing fast and on-demand data exploration.” -> incomplete sentence
p. 2, line 23: “As result” -> “As a result”
p.3, line 32: “There,” -> “There”
p. 9: lines 6-7: “Is candidate” -> “is a candidate”
p. 24, lines 28-29: “Thus, providing an intuitive user interface to use Nextia_DI functionalities.” -> incomplete sentence

Review #3
By David Chaves-Fraga submitted on 03/Jan/2023
Suggestion:
Minor Revision
Review Comment:

First of all, I would like to thank the authors for the effort in providing a very detailed answer to all my comments and improving the paper considerably. It is now better motivated, provides a good overview of the state of the art and the evaluation has been extended with a user study that supports the proposed approach.

Two final comments that should be solved:
1) I still don’t understand why Squerall is included in the state of the art. Maybe I’m missing something, but from what I understood, it is an RML engine that performs mostly the same tasks as Ontario, Ontop, or Morph-RDB. So if the engine supports any other task (more related to what is proposed in the paper), please clarify it; if not, I would recommend removing it.

2) Ontop does not parse RML mappings. Declarative mapping rules such as R2RML (a W3C recommendation) or RML (its main extension) are independent of the engine. If Algorithm 10 is able to generate RML (or R2RML) mappings, I would understand that the instance-level integration can be performed by any [R2]RML-compliant engine (virtual or materialized). I would recommend generalizing Section 6.2 and presenting it in terms of the generation of declarative mapping rules. An up-to-date list of RML and R2RML engines can be found in the following links: https://w3id.org/kg-construct/r2rml-implementation-report, https://rml.io/implementation-report
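For illustration of the engine-independence point above, here is a minimal R2RML mapping sketch (the table name `PERSON`, its columns, and the `ex:` vocabulary are hypothetical examples, not taken from the paper). Any conformant [R2]RML engine, virtual or materializing, could consume the same rules unchanged:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .

# Map rows of a (hypothetical) PERSON table to ex:Person resources.
<#PersonMap>
  rr:logicalTable [ rr:tableName "PERSON" ] ;
  rr:subjectMap [
    rr:template "http://example.com/person/{ID}" ;
    rr:class ex:Person
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:name ;
    rr:objectMap [ rr:column "NAME" ]
  ] .
```

Because the mapping only references the logical table and the target vocabulary, swapping the execution engine (e.g. Ontop for virtualization or Morph-RDB for materialization) requires no change to the rules themselves.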