Review Comment:
This paper surveys the use of background knowledge in schema matching, in terms of types of background knowledge sources that have been used; the strategies for linking schema entities to background knowledge; and strategies for exploiting background knowledge within the schema matching process. For each of these dimensions, a classification is proposed: (i) type of background knowledge sources (domain-specific, general-purpose, structured -- lexical and taxonomical, factual database, semantic web dataset, pre-trained neural models, etc. and unstructured -- textual, non-textual); (ii) strategies for schema and background knowledge linking (given links, direct linking, fuzzy linking and WSD); and (iii) exploitation strategies (factual query, structure-based, statistical/neural, logic-based).
While the paper presents a comprehensive overview of the topic and covers an extensive body of literature, I have several main concerns.
First, with respect to the scope and positioning. The authors try to define the scope in several passages: "We introduce the reader to the schema matching problem and its abstraction, the ontology matching task"; "This includes papers that focus on schema matching in a different technological area such as DTD matching (e.g. [33]), XML Schema matching (e.g. [34]), WSDL matching (e.g. [35]), or relational database matching (e.g. [36])." "Nonetheless, most of the papers of this survey are from the ontology matching domain as an ontology can be seen as a universal representation of a schema (see Subsection 3.3)." Despite these efforts, the scope is still not clear: the paper mostly describes work on schema (TBox) matching of ontologies rather than schema matching in the broad sense (at least in the paper, there is no explicit description of work addressing relational schema matching -- works [33-36] appear only in Table 4, not in Tables 2 and 3, and are not described in the text). An ontology can indeed be seen as a universal representation of a schema, but these different kinds of schemas have very different levels of expressiveness, which has not been taken into account at all in the paper. In that sense, I do not agree with the statement that "even though the term ontology is used in this paper -- the presented methods can be equally applied to other matching problems such as database schema matching or XML schema matching [46]." Does the expressiveness of the schemas to be matched have no impact on the use of the external resource? These points have to be clarified in the paper.
Second, particular attention is given to OAEI (track descriptions, evaluation strategies, participating systems -- Figure 2, Figure 3, Figure 7, Table 1). It is true that OAEI is a reference in the field, but the survey, in line with the comment above, should go beyond OAEI in terms of schema matching (matching XML, relational schemas, etc., with the specificities of these different kinds of schemas). Again, there is no information about the specific kind of schema the non-OAEI systems are able to deal with, nor about the datasets on which they have been evaluated (in particular for the 14 systems in Table 4 that use WordNet). The paper reads more like a review of background knowledge in OAEI.
Third, the critical aspect of background knowledge selection has been mostly neglected in the paper. This is, however, an important point, and some guidelines on choosing "good" background knowledge should be provided in the paper, in particular in the discussion. Furthermore, the paper says nothing about the quality of background knowledge resources and how they have been constructed -- manually (WordNet), semi-automatically or automatically (BabelNet); this could also be included as a category in the classification. The quality can have an impact on the matching results. This is also the case for multilingual resources with different language coverage (for instance, the French lexicon in BabelNet has a lot of noise that does not appear in the English lexicon).
Fourth, the discussion has a clear OAEI bias, mostly covering the drawbacks and open challenges related to the OAEI tracks. The discussion should also address the challenges of re-using the solutions in real cases and industrial scenarios, and how the different levels of expressiveness of the different types of schemas affect the matching process and the selection of the background knowledge.
For those reasons, the recommendation is major revision.
---------------------------------------------
Minor comments:
1. Introduction
- A more explicit link is missing between the 4th and 5th paragraphs of the introduction (surveys on matching and background knowledge, and context-based approaches).
- "The matching techniques further studied in this survey can be broadly categorized as context-based approaches according to Euzenat and Shvaiko" vs. "Logic-based approaches apply reasoning on or together with the external resources. This class of approach is also referred to as context-based matching [11]"
2. About this survey
- "In this survey, we cover all matching systems that participated in the schema matching tracks of the OAEI from its inception in 2004 until 2020 [13–28]." ==> missing reference to OM 2020
3. Schema Matching and Ontology Matching
- Evaluation of Automated Schema Matching Systems => this subsection does not seem to be required
- Background Knowledge in Ontology Matching => in Schema Matching
- Background Knowledge in OM ==> Background Knowledge in OAEI ?
4. Categorization of Background Knowledge in Schema Matching
- Put the tables closer to their citation in the text (Table 2 indicates the kind of schema is used)
5. Categorization of Linking Approaches
- "Our analysis on how concepts are linked into the background knowledge source revealed that most matching systems do not perform elaborated linking approaches but use a direct string lookup".
This statement is quite surprising, given the number of matching systems that exploit WordNet, for which disambiguation is required. "We did not find matching systems that try to actually disambiguate the sense of a label through Word Sense Disambiguation – despite the heavy usage of WordNet (which is built around senses)" => Which similarities? What is "real" WSD?
- Lastly, (iv) logic based approaches => Lastly, logic based approaches
- Table 4: it would be interesting to indicate the kind of matched schema for the systems not participating in OAEI
- It is important to note that reasoning can also be applied across multiple ontologies: Locoro et al. [11] ==> . Locoro
- According to Figure 11, logic-based approaches can also be considered a form of indirect matching
- "However, we did not find broad usage of logic-based exploitation approaches in past and current (OAEI and non-OAEI) schema matching systems that go beyond singled out experiments". => Does LogMap not apply any reasoning involving UMLS?
- Pre-trained embedding-models and architectures, for instance, are so far rarely used but may be very promising given breakthroughs in other scientific communities. ==> These resources have been made fully available quite recently.
- Structural approaches are almost completely limited to WordNet and their exploration on multilingual datasets and in Semantic Web datasets may yield interesting results given good results on WordNet and given that this class of approaches is typically intuitive to understand and can be comprehended by humans (unlike neural models). ==> The multilingual aspect here should be clarified (do the authors refer to the different versions of WordNet in different languages?)
- If we take a closer look at the domain-specific knowledge sources used, it is striking that almost all datasets are from the biomedical domain. => OAEI bias ?
- Enterprise schema matching and integration challenges in the business world, for example, are not reflected at all in OAEI tracks. => what about the Process Model Matching at OAEI?
- While multiple automatic background knowledge selection approaches have been proposed (see Section 3.3) => very short section
References
Missing references on background knowledge selection and other surveys:
@inproceedings{tigrine:lirmm-01407888,
TITLE = {{Selecting Optimal Background Knowledge Sources for the Ontology Matching Task}},
AUTHOR = {Tigrine, Abdel Nasser and Bellahsene, Zohra and Todorov, Konstantin},
URL = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-01407888},
BOOKTITLE = {{EKAW: Knowledge Engineering and Knowledge Management}},
ADDRESS = {Bologna, Italy},
SERIES = {Knowledge Engineering and Knowledge Management},
VOLUME = {LNCS},
NUMBER = {10024},
PAGES = {651-665},
YEAR = {2016},
MONTH = Nov,
DOI = {10.1007/978-3-319-49004-5\_42},
PDF = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-01407888/file/Main.pdf},
HAL_ID = {lirmm-01407888},
HAL_VERSION = {v1},
}
@article{DBLP:journals/semweb/ThieblinHHT20,
author = {{\'{E}}lodie Thi{\'{e}}blin and
Ollivier Haemmerl{\'{e}} and
Nathalie Hernandez and
C{\'{a}}ssia Trojahn},
title = {Survey on complex ontology matching},
journal = {Semantic Web},
volume = {11},
number = {4},
pages = {689--727},
year = {2020},
url = {https://doi.org/10.3233/SW-190366},
doi = {10.3233/SW-190366},
timestamp = {Fri, 28 Aug 2020 15:32:46 +0200},
biburl = {https://dblp.org/rec/journals/semweb/ThieblinHHT20.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Some references are incomplete or outdated:
F.J.Q. Real, G. Bella, F. McNeill and A. Bundy, Using Domain Lexicon and Grammar for Ontology Matching, 2020, to appear ==> where ?
E. Thiéblin, O. Haemmerlé and C. Trojahn, Automatic evaluation of complex alignments: an instance-based approach (2020). ==> where ?
S. Hertling, J. Portisch and H. Paulheim, Supervised ONtology and Instance matching with MELT, in: OM@ISWC 2020, 2020, to appear.
D. Faria, C. Pesquita, T. Tervo, F.M. Couto and I.F. Cruz, AML and AMLC Results for OAEI 2020., OM@ISWC 2020 (2019), to appear.
(and all other OAEI 2020 papers)
Comments
Related work
Also check "Experiences from the anatomy track in the ontology alignment evaluation initiative" (https://doi.org/10.1186/s13326-017-0166-5), which has a section on the use of background knowledge in 10 years of the OAEI Anatomy track.