A Systematic Survey of Temporal Requirements of Bio-Health Ontologies

Tracking #: 1896-3109

Authors: 
Jared Leo
Nicolas Matentzoglu
Uli Sattler
Bijan Parsia

Responsible editor: 
Pascal Hitzler

Submission type: 
Full Paper
Abstract: 
The Description Logic SROIQ(D), as the logical core of the W3C standard Web Ontology Language (OWL 2), is a widely used formalism for ontologies in the life sciences. Bio-health applications including healthcare and life science domains commonly have a need to represent temporal information such as medication frequency or stage-based development. Different classes of temporal phenomena may generate different sorts of requirements on SROIQ(D) or extensions of SROIQ(D). In this paper, we deliver the first precise investigation into identifying exactly what kinds of temporal requirements are most important for bio-health ontologies. We conduct an empirical investigation of the OBO Foundry using a bespoke methodological approach by searching each of its ontologies for specific temporal features, and go on to calculate the importance of these features using a sophisticated set of measures. By doing so, we derive a formal set of Temporal Requirements which act as a set of guidelines which a language or logical extension to OWL 2 would need in order to meet the temporal requirements of bio-health ontologies.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 02/Jul/2018
Suggestion:
Minor Revision
Review Comment:

This paper presents an analysis of the requirements for temporal modeling within bio-health ontologies, specifically, the large suite of ontologies developed in a loosely coordinated way by members of the Open Biomedical Ontologies foundry. The paper is very well written and addresses an aspect of knowledge representation which will be of definite interest not only to the logic community but also to the developers of the scientific ontologies being surveyed. The example provided in the introduction (spermatid cells) clearly demonstrates some of the issues relating OWL modeling and time. Further, the overall approach is a good example of how to perform such a requirements assessment across a large suite of ontologies, and hopefully could be adapted to study requirements for other features in the future.

The results, which categorize and quantify the usage of different time-related property attributes across the ontologies in the OBO library, are clearly presented and will be useful for analysis of future proposals for temporal language extensions to OWL. Although I don't think it's required, I think it would be valuable if the paper explained more about how such language extensions would benefit the ontologies being studied and applications of those ontologies. For example, the introduction provides a clear presentation of some shortcomings in the modeling of fruit fly spermatid cells. But it would be improved with some description of the additional reasoning features that could be enabled by a more accurate model. What features of new query types or quality control capabilities would make development of temporal expressions worthwhile?

Likewise, the discussion could be expanded somewhat in terms of summarizing the results. There is a wide variety of temporal attributes being tallied, and the reader may be left wondering whether there is a core set of temporal features that would cover a broad range of the ontologies. They do say that "in order to meet the requirements, a language would have to be able to model a large set of temporal features". Perhaps the takeaway is simply that the temporal requirements of these ontologies are large and diverse.

Because I think this paper will be of interest to scientific ontology developers working with OBO ontologies, it would be worthwhile to consider providing more background for certain concepts. In particular, where the results describe computing the "Pareto frontier", it would be useful to briefly summarize what this means and how it relates to metrics.
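
As a rough illustration of the concept in question, the sketch below computes a Pareto frontier over two scores per item: an item is on the frontier if no other item is at least as good on both scores and strictly better on one. The feature names and score values are hypothetical and are not taken from the paper; which measures the authors actually combine is not restated here.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if a is at least as good as b on every score and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of all items that no other item dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Hypothetical (measure_1, measure_2) scores for four made-up temporal features.
scores = {
    "feature_a": (0.9, 0.2),
    "feature_b": (0.5, 0.5),
    "feature_c": (0.4, 0.4),  # dominated by feature_b on both measures
    "feature_d": (0.1, 0.8),
}

frontier = pareto_frontier(scores)
print(frontier)
```

Under this reading, the frontier is the set of features for which improving one measure would require giving up some of the other, which is the connection to the metrics that a short summary in the paper could make explicit.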

Some citations seem to be missing (there are empty brackets in the text). Also, it would be useful to point to where these ontologies can be obtained from the web, by providing the URL of the OBO Foundry website, and also the downloadable PURL for the Relation Ontology, which is heavily discussed.

Overall the writing is good, but there are a few grammatical issues that could be corrected:

1. Introduction
- "the underlying formalism for OWL ontologies come with many advantages" - change "come" to "comes"
2.1 The OBO Foundry
- "for which they could be used as" - change to something like "which could be used as"
2.2 Temporal Modelling in the OBO Foundry
- "Drosophila Melanogaster" - species names should be italicized, and have genus capitalized and specific epithet in lowercase: "Drosophila melanogaster"
- Page 6 "where as" - change to "whereas"
4.2.2 Temporal Attributes
- "Domain& Range" - change to "Domain & Range"
4.3.1 Analysis of temporal requirements
- "focus on one type of temporal phenomena" - change "phenomena" to "phenomenon"
5. Discussion
- "who's" - change to "whose"

Review #2
Anonymous submitted on 12/Aug/2018
Suggestion:
Major Revision
Review Comment:

The authors aim to describe the temporal modeling requirements of
ontologies in the healthcare and life science domain. To this end, they
consider ontologies in the OBO foundry, extract temporally relevant relations used in those ontologies according to BFO and RO, and classify those relations according to their temporal attributes in RO. The aim is then to obtain information about the importance of relations with temporal attributes, and on how widespread relations with certain (combinations of) temporal attributes are. According to the authors, the results can then be used to inform ontology language design.

I believe this is an important contribution to the empirical study of
ontologies. I agree with the authors that studies like this provide
important information for the design of future ontology languages.
I also believe that this study contains a lot of useful material which
should be published and made available to the community. The research presented here is not very original, but potentially very significant.
I have one major problem with the paper, however: I found it very hard to extract the relevant information from the paper because important notions are not defined in a sufficiently precise way and not
illustrated by examples.
So I suggest that the paper is accepted subject to a significant revision of the presentation.

My main suggestions for improvement are:

(a) Give a more precise definition of the temporal attributes you consider. Provide more discussion and more examples illustrating
which temporal attribute applies to which temporal relation and why.
For example,

(1) Currently your main example (which is essentially the only one (Sec 2.2)) almost exclusively deals with continuants. Give more examples with occurrents. Is it obvious that a "dynamic" semantics is also needed for occurrents? Give examples of relations (and axioms) using both continuants and occurrents.

(2) I'd like to know which combinations of temporal attributes are LOGICALLY possible. Being able to provide a logical analysis is also a good test for having sufficiently precise definitions of the attributes. You indicate at various places that the DOMAIN&RANGE attribute heavily influences what's possible for the remaining attributes. Make this much more explicit. Also, introduce formally
the abbreviations (IC, SDC, GDC, etc) used - and each category should come with at least one illustrating example (relation with axioms).

(3) Currently, starting with DOMAIN&RANGE, the paragraph introducing the categories states that there are 23 attributes, eight between C and O, 11 between C, etc. All this is not very useful. What is needed is an in-depth description of the 4-6 most important DOMAIN&RANGE combinations and how they occur in ontologies. The same applies to the remaining five categories.

(4) For TIME, it seems that there is an implicit distinction here between time points and time intervals. How does this work?

Without a very clear idea of the definition of the temporal attributes and their relationship, the empirical results presented later are impossible to interpret and apply. So, this should be a core part of the paper.

(b) Make clearer what you expect from an empirical analysis of the combinations of temporal attributes (rather than the single temporal attributes). I can fully follow the idea that one would want to know which temporal attributes of relations are important. This is less clear for the combinations. Do you believe that relations with the same combination of temporal attributes will be modeled in the same
way in an appropriate ontology language? If so, you need to give an argument. Again, examples could help.

(c) How to read Tables 9, 10 and 11? It should be possible to understand a table without consulting the appendix. So what is A68? What R19? If these tables are telling something important, one has to give instances of at least some of the combinations denoted by AX and RY. At the moment I find this discussion too detached from concrete
temporal relations.

Minor suggestions:

(1) Page 1: Reference missing for 2ExpTime.

(2) Page 2: Reference missing for the Relation Ontology.

(3) The contributions of the paper are not made sufficiently clear in the introduction. What is a temporal encoding of an ontology? What is an entity importance measurement system? A brief description of temporal
requirement sets is needed. Without this information the reader does
not learn what the contribution is.

(4) Page 3. The concepts of continuants and occurrents have been discussed extensively in the literature. A few pointers to the most relevant papers would be helpful here. It might also be useful to give pointers to the literature for temporal attributes.

(5) In Def 1, why union in front of Y?

Review #3
Anonymous submitted on 20/Aug/2018
Suggestion:
Major Revision
Review Comment:

% Summary:
In this survey, the authors have presented: 1) a temporal encoding of the Relation Ontology, 2) mechanisms for measuring the importance of an entity in a set of ontologies, and 3) a set of temporal requirements that show what sort of temporal information is utilized by existing ontologists and thus acts as a guideline for further temporal extensions to OWL. This work (the encoding and objective measurement of importance) is timely and necessary. It also examines a large number of modern ontologies that are currently in-use.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Questions: In this section, I am transcribing some of my notes that I took while reading the survey.

Page 2: If I read between the lines of this introduction, it seems to me that you're trying to say that by looking at all the ways the temporal features are used we can see the needs of a community of ontology (or both). This should probably be made more explicitly clear.

It is not immediately clear (using the information from the introduction) what a temporal requirement is. It also seems like the definition of a temporal requirement changes between the introduction and your explicit definition. Is this intentional?

Section 3: why is RO in parentheses?

Section 3.2:
par. 2: it is not clear how the temporal information was extracted. Nor is it clear that the second set of numerals are supposed to be talking about the first. The example is useful, but it could definitely use some improvement as to how it is presented to the reader.

D&R: what is IC? I could not find the abbreviation anywhere.

Time, States: I do not understand how these are different.

Identity: give an example

Last two paragraphs: it is not clear that these are supposed to be distinct from AHFAT.

Last sentence, second to last paragraph is unclear

Last paragraphs: how were these implications determined?

Section 3.3:
definition 1: what is the union before Y?
First sentence after the list: I can not figure out what you mean. Does it mean the "subsumptive closure" of implications? Completeness is ill-specified in this context.

Definition 2: \leq is what? Intuitively, I assume it means "falls lower in the subsumption hierarchy". Is this standard usage? If not, define or make explicit for the reader.

Figure 2: Give an appendix with all the acronyms. The figure provides no technical insight or ability to fact check without knowing what they mean. Are the boxes arranged in this order for a reason?

Section 3.4:
last sentence (the i.e. part): this does not follow based on previous information in the subsection. How do we know that exact matches refer to the correct usage? If this is an assumption, say so.

Section 3.5:
I am still unsure as to what the functional difference between a temporal feature and temporal attribute is

Section 3.6:
what are some other measures?
Give further explanation of the last sentence of paragraph 1.

"As previously discussed, neither..." was it? if so where?

Why are Cov and Nec written as such? why not continue overloading, as with importance?

The use of cov vs nec is not well-specified at an intuitive level.

The example for considering co-occurring temporal annotations is unclear to me. What else could \script{R}_1 be?

"On the flip-side" ... I do not understand the point being made here. Provide an example?

Section 4.1:
"In terms of the axioms the relations are used in" ... I do not think this statement is true? 30% of ontologies does not imply 30% of all axioms?

Table 2: CAT == category? If so, why not use notation

Section 5.1:
"Ontologies may exhibit..." such as what others?

-----------
Smart matching: how closely does "smart matching" relate to (complex) ontology alignment? Are there techniques from that subfield that can be applied here?

In general, the results seem useful, but I have some trouble drawing final conclusions from the data. Did you explicitly present a TR set for the OBO foundry?

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Errata
Missing citation 2exptime-complete

notation for cardinality of sets is inconsistent throughout definitions.

Section 5: w.r.t
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Overall

The work itself is valuable to the community, timely, and necessary. However, the presentation of the work makes it very difficult for the reader to draw conclusions from the data. Definitions should be clarified and the reader provided examples. The reader should be able to use the methods section to replicate this study on an arbitrary ontology, but many pieces of the method are abstracted to mathematical generalizations, leaving the reader to guess at the method of extraction used by the authors (in particular, the extraction example is not wholly helpful).

I suggest that the authors rework their presentation (in general) and discussion (in particular) to present the reader with concrete conclusions and examples.


Comments

There is a typo in the States hierarchy in Figure 2. The ordering should be:
Domain:Birth < Domain:Changed
Domain:Death < Domain:Changed
Range:Birth < Range:Changed
Range:Death < Range:Changed
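
For readers who want to check the corrected ordering mechanically, the four pairs above can be written down as a small relation and queried. This is only a sketch: it assumes that `<` means "strictly below in the States hierarchy", and the helper names are hypothetical rather than taken from the paper.

```python
# The four corrected pairs, read as (lower, upper) in the States hierarchy.
STATES_ORDER = {
    ("Domain:Birth", "Domain:Changed"),
    ("Domain:Death", "Domain:Changed"),
    ("Range:Birth", "Range:Changed"),
    ("Range:Death", "Range:Changed"),
}


def strictly_below(a: str, b: str) -> bool:
    """True if attribute a is listed strictly below attribute b."""
    return (a, b) in STATES_ORDER


def is_asymmetric(order: set) -> bool:
    """A strict order never contains both (a, b) and (b, a)."""
    return all((b, a) not in order for (a, b) in order)


print(strictly_below("Domain:Birth", "Domain:Changed"))
print(strictly_below("Domain:Changed", "Domain:Birth"))
print(is_asymmetric(STATES_ORDER))
```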