Towards a new approach for the semantic annotation of semi-structured documents

Tracking #: 2151-3364

Abioui Hasna
Ali Idarrou
Ali Bouzit
Driss Mammass

Responsible editor: 
Claudia d'Amato

Submission type: 
Full Paper
Ontologies remain the focal asset for the effective functioning of semantic search approaches, as they describe concepts using a uniform, common vocabulary in a machine-readable and shareable format. Nowadays, the challenge concerning ontologies goes beyond their design and creation, as a multitude of ontologies have been proposed across various domains of application. The effective challenge thus consists in evaluating those ontologies in order to choose the most suitable one. In this paper, we present a new approach to select convenient ontologies from a set of candidate ontologies by ranking them according to predefined criteria. Our approach takes into account not only the taxonomic structure but also the semantic aspect of the ontology. In addition, we emphasize both semantic relations and specific concepts, which must be favored since they reflect the semantic richness of the ontology. Compared with a concept-based method, our approach shows encouraging results regarding the final selection of ontologies for each document to annotate: the resulting ranking is more accurate and precise, since concept centrality and the types of relations linking a concept to the other concepts are the main factors that make the difference.
Solicited Reviews:
Review #1
By Francesco Corcoglioniti submitted on 06/Jun/2019
Review Comment:

The paper deals with the problem of ontology ranking and proposes a scoring method for quantifying the relevance of an ontology wrt. an input document, whose intended application scenario is supporting the selection of the ontologies to use in the semantic annotation of the document. The proposed scoring method is based on matching classes in the ontology to terms in the document, and the score is computed according to the following intuitions: (i) the score increases with the number and frequency (within the document) of classes matched; (ii) the score increases with the specificity (i.e., depth in rdfs:subClassOf hierarchy) of matched classes; (iii) the score increases with the number of relations (rdfs:subClassOf, object properties) involving matched classes. The paper presents some statistics about a use case where 10 ontologies are matched to 13 example documents. Restricting to 7 documents, the paper then discusses and analyzes the application of the proposed scoring method, compared to a baseline method [6] that considers only concepts and not relations.
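To make the three intuitions concrete, here is a minimal sketch of a score following them (all names, the data layout, and the exact combination of factors are illustrative assumptions of this review, not the authors' actual equations):

```python
from collections import Counter

def score(ontology, document_terms):
    """Toy relevance score combining the three intuitions:
    (i) match frequency, (ii) class specificity (depth in the
    rdfs:subClassOf hierarchy), (iii) number of relations."""
    freq = Counter(document_terms)
    total = 0.0
    # each class is a dict: {"label": str, "depth": int, "relations": int}
    for cls in ontology["classes"]:
        f = freq.get(cls["label"], 0)
        if f > 0:
            # frequency * depth * (1 + relation count): all three
            # factors increase the score, as in the paper's design
            total += f * cls["depth"] * (1 + cls["relations"])
    return total
```

A deeper, better-connected class matched more often thus dominates the score, which is the behavior the paper's discussion highlights.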

The paper deals with ontology ranking and is on topic with the journal. As a full paper, this review will focus on the dimensions of originality, significance of results, and quality of writing.

== Originality ==

The submitted paper is an incremental work. The proposed scoring method builds on a technique for assigning weights to classes and relations in ontologies (section 3.1) that was originally described in a prior work [2] by the same authors. The novel part consists in a proposal for using those class/relation weights to compute the score (section 3.2), and in the analysis of the proposed scoring method (section 4).

The ontology ranking problem tackled by the paper is well known in the literature, with many proposals advanced since the early 2000s, such as Swoogle, Watson, Falcons, LOV, the cited OntoMetric [19] and OntoQA [28], etc. A recent survey covering these works, although from a specific perspective (IoT), is [A]. Most of these state-of-the-art approaches rank ontologies based on intrinsic properties (e.g., quantifying popularity or informativeness) and on similarity to a keyword-based user query, as the envisioned scenario is typically that of an ontology search engine aimed at end users. The scenario considered here, where the user input is a document, has been tackled by comparatively fewer works (see, e.g., [B]) but is not dissimilar from the more common keyword-query scenario, as the document terms used for matching can be seen as a keyword query.

The ideas underlying the scoring method of the submitted paper are not particularly novel. For instance, the focus on relations in the proposed method can be seen as a way to take into account how much concepts matching query/document terms are well described in the ontology, which for instance is exactly what the Density Measure of AKTiveRank does [C,D].

Summing up, I don't see much novelty in the submitted paper, and the situation is worsened by the fact that a proper evaluation of the proposed method is basically absent (see next).

== Significance of results ==

My main criticism concerns the inadequacy of the evaluation, which alone is sufficient reason for rejecting the paper.

What the authors show with Figure 6 and the discussion in the preceding page is that the proposed scoring method behaves differently from a related method where relations are not accounted for [6]. In the discussion, the authors maintain that their method reaches "significant results" by showing that whenever two ontologies have the same number of matching concepts, the ontology with more relations (for those concepts) is scored higher. However, this tells only that the proposed method behaves as per the authors' design (cf. criterion (iii) about relations). It tells nothing about the effectiveness of the method in actually returning ontologies that are *relevant* from a user point of view.

If we rather consider state-of-the-art works (e.g., [B]), a sound evaluation methodology adopted there is to cast the problem as an Information Retrieval task where there is a gold standard of relevance judgments provided by/derived from users, and the ranking method is evaluated based on its capability to rank relevant ontologies higher (as per the gold standard), quantified through standard Information Retrieval measures. I urge the authors to adopt a similar evaluation approach. In that respect, they might find of possible interest the benchmark dataset CBRBench provided in [E].
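As a concrete illustration of the kind of measure meant here (the data and function names are hypothetical, and this is a generic textbook computation rather than any specific benchmark's scorer), mean average precision over such a gold standard can be computed as:

```python
def average_precision(ranked, relevant):
    """Average precision for one query: `ranked` is the system's
    ordered list of ontology ids, `relevant` the gold-standard set."""
    hits, total = 0, 0.0
    for i, onto in enumerate(ranked, start=1):
        if onto in relevant:
            hits += 1
            total += hits / i  # precision at each relevant hit
    return total / max(len(relevant), 1)

def mean_average_precision(runs):
    """`runs`: list of (ranked_list, relevant_set) pairs, one per document."""
    return sum(average_precision(r, g) for r, g in runs) / len(runs)
```

A method that ranks gold-relevant ontologies near the top scores close to 1; scoring against such judgments, rather than against the method's own design criteria, is what the evaluation currently lacks.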

Another shortcoming of the paper is that it does not illustrate the intuitions behind some of the equations defining the scoring method. For instance, where does Eq. 2 (with its exponentiation with exponent 4) come from? Or similarly, how did the authors come up with Eq. 4 or Eq. 5, or the second addend of Eq. 9? An answer like "they can be empirically shown to provide good results" may work, but in that case a proper evaluation framework has to be adopted so to support that answer with numbers.

Finally, another aspect that I find unclear, although central to the application of the proposed method, regards how occurrences of ontology concepts are detected in the input document (i.e., how are document terms matched to ontology concepts). Are concept labels in the ontology looked up in the document? And in that case, is the match exact or approximate? In that respect, I suggest providing a concrete example.
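To clarify what is being asked, here is a minimal sketch of the two options (exact vs. approximate label lookup); everything here, including the similarity measure and threshold, is a hypothetical illustration, since the paper does not specify its matching procedure:

```python
import re
from difflib import SequenceMatcher

def match_concepts(text, labels, threshold=0.85):
    """Look up ontology class labels in a document, both exactly and
    approximately (string similarity against each token)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = {}
    for label in labels:
        norm = label.lower()
        exact = tokens.count(norm)  # exact occurrences of the label
        # approximate match: best similarity ratio over all tokens,
        # so that e.g. a plural form can still be detected
        fuzzy = max((SequenceMatcher(None, norm, t).ratio() for t in tokens),
                    default=0.0)
        if exact or fuzzy >= threshold:
            matches[label] = {"exact": exact, "fuzzy": round(fuzzy, 2)}
    return matches
```

Whichever variant the authors use, stating it (and showing one worked document) would make the method reproducible.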

== Quality of writing ==

There are some typos in the paper, but overall the quality of English is adequate. That said, I think the introduction and the related work sections are too broad in scope with respect to the real contribution of the paper (i.e., the scoring method). For instance, I don't see as particularly useful to introduce ontologies in a paper submitted to this journal. At the same time, while I see the utility of the proposed approach for the semantic annotation of documents, I think too much emphasis and space are given to semantic annotation in sections 1 and 2. I suggest the authors to be more concise, and perhaps to use the recovered space to provide a more comprehensive background on ontology ranking.

== Minor comments ==

C1. The property asserted in Eq. 8 does not seem to hold, to me. I tried to derive it based on previous definitions, and I rather get that the two summations, added together, are equal to P_total(O_k) - \epsilon_k * (|O_k| * S_c(O_k) + |R_k| * S_r(O_k)). I.e., their sum is lower than P_total(O_k) by a very small delta (\epsilon_k is very small), and that could be checked when adding up the scores in Figure 4 of authors' prior work [2].

C2. In Eq. 9, the subscript 'i' in P(R_i, O_k) cannot be the same 'i' of C_i (concepts and relations form different sets). At the same time, index 'i' does not occur in any summation nor it is bounded otherwise. So, what relations R_i is this equation referring to?

C3. In Fig. 5, do we really have 6 documents out of 13 referring to concepts from the Pizza ontology? In general, it would have helped a lot to provide access to the 13 selected documents (so that I could have checked this myself).

C4. "Total des relations" in Table 4 is not English.

C5. This is a minor comment, but distinguishing between "structural" and "semantic" relations seems like implying that the former - which correspond to rdfs:subClassOf - are not semantic, which is definitely not the case.

C6. Another very minor point concerns the phrase "the formalization of the conceptualization means that the ontology format should be machine-readable and based on natural language" (section 2.1). I don't see how "natural language" could be a requirement, or how ontology languages such as OWL or RDFS can be related to "natural language".

== References ==

[A] N. Kolbe, S. Kubler, J. Robert, Y. Le Traon, A. Zaslavsky. Linked Vocabulary Recommendation Tools for Internet of Things: A Survey. ACM Comput. Surv. 2018.
[B] P. Buitelaar, T. Eigner. Evaluating Ontology Search. EON Workshop @ ISWC 2007.
[C] H. Alani, C. Brewster, N. Shadbolt. Ranking Ontologies with AKTiveRank. ISWC 2006.
[D] H. Alani, C. Brewster. Metrics for ranking ontologies. EON Workshop @ WWW 2006.
[E] A. S. Butt, A. Haller, L. Xie. Ontology Search: An Empirical Evaluation. ISWC 2014.

Review #2
Anonymous submitted on 21/Jun/2019
Major Revision
Review Comment:

This paper proposes an approach to rank and evaluate ontologies. The approach converts unweighted ontologies to weighted ontologies. Suitable weighted ontologies are selected and used to annotate semi-structured documents. The approach uses semantics as well as taxonomic structure. The following are a few suggestions to improve the paper:

Things should be properly explained. For example, how is the normalization in Eq. 1 performed, and why is the equation multiplied by N? What is the range of the equation? Similarly, the rationale for formulating all the equations should be provided, and the selection of parameters for the different equations should be justified.

Ambiguous and non-scientific phrases should be avoided. For example, for equation 8, instead of saying "We notice that the sum of component weights of an ontology is equal to the total weight already calculated by Eq. (1)", prove the equality.

Things should be formalized wherever possible; the descriptive text in Section 3.1 should be written as an algorithm with proper steps and variables.

Discuss how the lengths of documents and ontologies affect the ranking. What if a document is compared to a very relevant but short ontology and to a long but not very relevant ontology? The long ontology might contain certain concepts that appear frequently in the document.

The results need to be elaborated. For example, in Figure 4, *why* did documents 10 and 13 not match any ontology?

Why are the results of documents 8 to 13 not discussed in detail?

The proposed approach is compared with only a single baseline. The proposed approach should be compared with state-of-the-art and well-known similar approaches.

Some minor comments:

Columns are broken; see the first paragraph of page 4. The same occurs on page 7 and in other places.

There should be a space between "Fig" and its number, and similarly between "Doc" and its number.

Review #3
By Anna Lisa Gentile submitted on 18/Jul/2019
Review Comment:

The paper presents an approach to rank candidate ontologies for the task of annotating previously unseen textual documents.
While the topic is extremely relevant for the Semantic Web journal, the execution of the work is well below the quality standards of the journal.
Even assuming the validity of the approach, the results are not significant: the entire experimental setting is extremely weak and is not sufficient to support a scientific argument.
The authors compare and rank 10 ontologies with little or no overlap with one another (Musin, Bibliography, Pizza...) and perform the annotation of 13 documents.
In terms of novelty, the authors ignore robust and mature work solving a similar problem, such as the recommendation service offered by BioPortal, where the text is matched against 600+ ontologies.