Onto4AIR2: a simple ontology to represent theses from open repositories as products of academic collaboration

Tracking #: 2671-3885

Authors: 
Maria Auxilio Medina
Jorge de la Calleja Mora
Eduardo López Domínguez
Ismael Everardo Bárcenas Patiño
Delia Arrieta Díaz
Araceli Ortiz Carranco
Claudia Zepeda Cortés

Responsible editor: 
Stefan Schlobach

Submission type: 
Ontology Description
Abstract: 
This paper describes Onto4AIR2, a simple ontology to represent theses from open repositories as products of collaboration. The goal is the construction of machine-readable datasets that are semantically labeled for the further deployment of web services of shared interest to managers, developers, and users within educational organizations. The ontology is populated with sample data of theses from the National Repository of Mexico, an initiative promoted by the National Council of Science and Technology. The paper suggests practical applications derived from the formalisms of the ontology, and describes an assessment technique where participants were managers, developers, and potential users of the ontology. Developers followed a competency questions-based approach and determined that the ontology represents questions and answers using its terminology, whereas potential users participated in a satisfaction survey; the results showed a positive perception of the ontology. Onto4AIR2 is available in English and Spanish, which fosters unique and formal definitions of concepts from the Mexican repositories domain.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 24/Mar/2021
Suggestion:
Major Revision
Review Comment:

(1) Quality and relevance of the described ontology (convincing evidence must be provided).

The ontology describes a domain of great relevance for the management of open repositories of academic resources (e.g. theses). Furthermore, documenting the resources in two languages (Spanish and English) increases its quality. It is a potential tool for creating machine-readable datasets that are semantically labelled for the further deployment of web services of shared interest to managers, developers, and users within educational organizations.
However, in my opinion, the paper is still at an early stage and needs many improvements.
The paper does not describe the relevance of the ontology well. Or perhaps, as the authors themselves argue, the document is intended as a preliminary guide, in which case it does not completely satisfy the requirements of the Journal.
The authors should better emphasize their contribution: more detail is needed on the assessment of the quality and relevance of the ontology.

(2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.

- The paper needs to be improved in style to provide a clearer and more comprehensive description of the ontology.
- Illustrations are well done, but please give more detail in the captions of some figures and tables.
- English proofreading is required to improve readability.

More details:

Abstract, introduction, and conclusion
-------------------------------------
They need to be aligned so that they provide complementary information and an exhaustive overview of the paper's contribution. In particular:

Abstract. It repeats some sentences from the Introduction; please rephrase them.

Introduction. The introduction should be improved. The overview of the current state of institutional repositories for intellectual resources in Mexico is well described, but:
• The section that refers to the issue of metadata heterogeneity should be improved.
• The contribution of the paper with respect to the state of the art is not well highlighted.
• Figure 1 is too generic; it does not explain the specific context the proposed ontology refers to. I suggest removing it or replacing it with a meaningful representation of the Digital Archive context.

Conclusions. Provide more arguments about the quality and relevance of the ontology.

Related Works
-------------
• In general, most references for the mentioned repositories and their relevant features are missing.
• The first sentence of Related Works lists a set of technological platforms. The authors are advised to provide more information on this subject, highlighting the pros and cons. In particular, you refer to the OpenDOAR directory of academic open access repositories: provide more details.
• In the second paragraph, please add links to OWLIM, SEKT, TRIPCOM, SwiftOWLIM, BigOWLIM.
• Several papers in the references are not in English and few are recent. It is recommended to update the bibliography with at least 5 journal publications from the period 2015-2021 related to semantic technologies, metadata management, and standardization.

"Onto4AIR2 ontology" Section.
• Add a reference to IEEE standard 1074-1995.
• The description is too brief. For example, the task “Reuse of existing vocabularies” is just a list of vocabularies, without mentioning how they are actually used.
• In the step “Define classes and construct their hierarchy”, it is not clear what happens to the concepts that are not defined as classes. The authors provide an explanation, but it is not clear. I suggest reformulating the sentences.
• In Tables 2 and 3, increase the level of detail in the captions.

A few other remarks
- English proofreading is needed. I highlight only a few aspects of form.
- Add references whenever you mention a standard tool or platform.
#46, first column: please rewrite the sentence "this is the newer version of the ontology described in [9], the number 2 denotes a second version and two languages, English and Spanish."

Page 2
#41, second column:
ontologies -> ontology
#42, second column:
an -> the

Review #2
By Daniel Garijo submitted on 11/May/2021
Suggestion:
Reject
Review Comment:

This paper describes Onto4AIR2, an ontology for representing academic publications (theses) and the repositories holding them.

The paper is easy to follow, and relevant to the Semantic Web Journal. Having a proper organization of scholarly publications and of who was responsible for them is highly relevant for any institutional repository. Unfortunately, I don't think this work meets the quality criteria required by the journal.
I outline below the main weaknesses I found in my review:

- Originality: The authors state that their ontology has already been published in [9], and that this is a newer version. It's unclear what the new changes are that justify presenting this as a new contribution.

- Novelty: The scholarly domain has been covered by a wide range of vocabularies, which the authors do not mention in this paper. For example, BIBO (https://lov.linkeddata.es/dataset/lov/vocabs/bibo) already describes concepts for theses, and the SPAR vocabularies cover the role of a person in a publication (http://www.sparontologies.net/ontologies/pro). Wikidata has properties for describing a knowledge graph of supervisors in theses (https://www.wikidata.org/wiki/Property:P184). It is unclear what novel terms and use cases this vocabulary addresses.

- Methodology: The authors claim to address a series of competency questions, but it is not clear 1) how these CQs were collected (from users? by the authors?); 2) which use cases these CQs are supposed to support (are they representative of a general use case?). The competency questions seem incomplete, as no data properties are covered.

- Ontology accessibility: The ontology is not accessible. I tested the URL for "onto" in Figure 2, and I got a 403 access forbidden. The authors should look at best practices for publishing ontologies on the web, and ensure that the URI of the ontology is accessible. In addition, the ontology should have documentation accessible in a human-readable format. No URL is provided (besides Figure 2, which does not seem like a long-term URL), so I was not able to inspect the ontology further. The URL provided for "onto" is different in Fig. 2 and in the SPARQL snippet in Section 5.1 (by the way, the first 4 prefixes declared in that snippet are not used in the query, so they can be removed).
- Data is not accessible: where can I access the generated RDF? Is there a SPARQL endpoint for browsing annotations? No GitHub, Zenodo, or FigShare DOIs are provided for any of the resources of the paper.

- Extended vocabularies: I only realized when I got to the Conclusions that other vocabularies such as Schema.org had been reused. The paper does not explain how and why these vocabularies are extended.

- Related work: There are many vocabularies for representing scholarly communication artifacts (as I mentioned above), which are not mentioned in the article. The authors focus mostly on platforms that use semantic web technologies, but if the contribution is an ontology, the aim (in my opinion) should be about justifying why current vocabularies are not enough.

- Unsupported claims: The authors claim that their goal is to support the construction of machine readable datasets for deploying web services. But this is part of their related work, so it gives me the impression that the ontology has not been used with real data and examples.

- Minor points:
- How come worksIn is not irreflexive? a teacher cannot worksIn with itself, right?
- This is a semantic web journal, so I think that explaining what semantic repositories are is not necessary.
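
The remark above about unused prefixes in the Section 5.1 SPARQL snippet can be checked mechanically. The following is a minimal Python sketch, not part of the reviewed paper: it uses a simple regular-expression heuristic rather than a full SPARQL parser, and the sample query, the `http://example.org/onto4air2#` namespace, and the `hasAdvisor` property are hypothetical placeholders, since the actual query is not reproduced in this review.

```python
import re

# Matches declarations such as: PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# (heuristic only; the empty default prefix ":" is not handled)
PREFIX_DECL = re.compile(
    r"^\s*PREFIX\s+([A-Za-z][\w-]*):\s*<[^>]*>",
    re.IGNORECASE | re.MULTILINE,
)


def unused_prefixes(query: str) -> list[str]:
    """Return prefix labels that are declared but never used in the query body."""
    declared = [m.group(1) for m in PREFIX_DECL.finditer(query)]
    # Drop the declarations and any full IRIs so they do not count as usage.
    body = PREFIX_DECL.sub("", query)
    body = re.sub(r"<[^>]*>", "", body)
    return [
        label
        for label in declared
        if not re.search(rf"(?<![\w-]){re.escape(label)}:", body)
    ]


# Hypothetical query: the "onto" namespace and hasAdvisor are placeholders.
sample_query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX onto: <http://example.org/onto4air2#>
SELECT ?thesis ?advisor
WHERE { ?thesis onto:hasAdvisor ?advisor . }
"""

print(unused_prefixes(sample_query))
```

Running such a check before submission would flag declarations that can safely be dropped from a published query.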

Review #3
By Rinke Hoekstra submitted on 17/May/2021
Suggestion:
Reject
Review Comment:

This manuscript was submitted as 'Ontology Description' and should be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

==
This paper presents the Onto4AIR2 ontology, a revision of the Onto4AIR ontology. It is unclear how the two versions differ other than the support of both English and Spanish. The ontology is designed to support the registration and lookup of theses information in institutional registries in Mexico. The ontology was designed following a standard methodology, and has been represented in OWL, by extending Schema.org, Dublin Core and FOAF. The motivation for the ontology is semantic ambiguity of some of the standard DC fields supported by the OpenAIRE OAI-PMH protocol.
The paper is a bit hard to read in some places, which obscures the authors' message. Also, the paper conveys a lot of information through pictures, which is not the most concise way of explaining things and takes up a lot of space.

I have not been able to find a persistent link to the ontology itself.

* Related Work
The related work section contains a lot of references to unrelated work, and it's often not clear why the references are included. Why describe the OWLIM reasoner? Why refer to the LUBM benchmark without discussing its underlying ontology? Other references are missing, e.g. the extensive VIVO ontology. More importantly, a related work section serves the purpose of showing the novelty of the presented approach. In this case, there is no discussion of shortcomings of the related work, nor is there a comparison with the Onto4AIR2 ontology. How does Onto4AIR2 improve over the state of the art?

* The ontology

The methodology used is fairly standard (which is a good thing), but there is no discussion of why the competency questions are the right ones to ask. Also, they are a bit hard to understand: what does "how are the theses organized" mean? What kind of answer is expected? The discussion of the steps in the methodology is not thorough enough. For instance, the "define classes and construct their hierarchy" discusses rdfs:isDefinedBy, rdfs:seeAlso and rdfs:comment... which are useful to have, but provide no formal semantics. Also, the necessary and recommended properties are discussed in a table, but it is not shown how they are modeled (why 'identifier' if every instance already has an IRI?).

Table 3 lists properties, with their domain and range. Many of these are listed as functional or inverse functional (IF) where it is not clear that they should be, or vice versa. For instance, "isManagedBy" should not be IF: why can an IR manager not manage multiple repositories? Conversely, firstAuthorOf is not IF, which allows multiple first authors for one thesis... but it *is* defined as functional, which means that any student can only be the first author of a single thesis (can't one graduate in multiple subjects?). Similarly, students can only have one advisor... overall these definitions are overly restrictive and sometimes incorrect.
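
The effect of these property characteristics can be illustrated on sample data. The following is a minimal Python sketch, not the authors' tooling: the property names come from the review, while the individuals (`student:ana`, `manager:luis`, and so on) are hypothetical. It treats differently named individuals as distinct; an OWL reasoner makes no unique name assumption and would instead infer that the two values denote the same individual unless they are declared different.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Subjects that have more than one distinct value for `prop`."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: sorted(objs) for s, objs in values.items() if len(objs) > 1}


def inverse_functional_violations(triples, prop):
    """Objects that are the value of `prop` for more than one distinct subject."""
    subjects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects[o].add(s)
    return {o: sorted(subs) for o, subs in subjects.items() if len(subs) > 1}


# Hypothetical sample data mirroring the two situations raised in the review.
triples = [
    # One student graduating in two subjects.
    ("student:ana", "firstAuthorOf", "thesis:msc-physics"),
    ("student:ana", "firstAuthorOf", "thesis:msc-maths"),
    # One manager responsible for two repositories.
    ("repo:A", "isManagedBy", "manager:luis"),
    ("repo:B", "isManagedBy", "manager:luis"),
]

print(functional_violations(triples, "firstAuthorOf"))
print(inverse_functional_violations(triples, "isManagedBy"))
```

Both situations are plausible in practice, yet each one conflicts with the characteristic declared in Table 3, which is the sense in which the definitions are overly restrictive.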

* Results

The "description of a thesis" in Fig. 7 shows a number of things that are incorrect in the ontology: "Date" is not xsd:date but rdfs:Literal, the Spanish title has no language tag, but the English one does. The creator is of type rdfs:Literal... why not link to an instance of the Author class? Similarly for the dct:rights property, the knowledge field and the knowledgeArea (and what's the difference between the two?).
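
The modelling issues listed for Fig. 7 lend themselves to simple automated checks. The following is a minimal Python sketch under stated assumptions: the `Literal` and `IRI` classes, the dictionary layout, and all sample values are hypothetical stand-ins for an RDF description, not the data shown in the paper.

```python
from dataclasses import dataclass
from typing import Optional

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None  # datatype IRI, if any
    lang: Optional[str] = None      # language tag, if any


@dataclass(frozen=True)
class IRI:
    value: str


def check_thesis(description: dict) -> list[str]:
    """Return a list of modelling problems found in a thesis description."""
    problems = []
    date_value = description.get("date")
    if not (isinstance(date_value, Literal) and date_value.datatype == XSD_DATE):
        problems.append("date: expected a literal typed as xsd:date")
    for title in description.get("title", []):
        if not title.lang:
            problems.append(f"title {title.value!r}: missing language tag")
    if not isinstance(description.get("creator"), IRI):
        problems.append("creator: expected the IRI of an author instance")
    return problems


# Hypothetical description reproducing the issues named in the review.
thesis = {
    "date": Literal("2019-06-14"),                      # untyped date
    "title": [
        Literal("Un título de ejemplo"),                # no language tag
        Literal("An example title", lang="en"),
    ],
    "creator": Literal("Example Author"),               # literal, not an IRI
}

for problem in check_thesis(thesis):
    print(problem)
```

The same pattern would extend to dct:rights and the knowledge field properties if they were modelled as links to instances rather than plain literals.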

* Evaluation

The paper presents an evaluation of the ontology obtained through a survey, and through an NPS score. I would have preferred to see a more qualitative analysis of the ontology, comparing it to related ontologies and to see whether it can capture all metadata for the MSc theses.

Overall the work is not substantial, nor novel enough for publication. The language of the paper needs work, and the ontology itself is not really well defined. I encourage the authors to look at VIVO https://duraspace.org/vivo/ and its underlying ontology.