Comparison and Evaluation of Ontologies for Units of Measurement

Tracking #: 1708-2920

Authors: 
Jan Martin Keil
Sirko Schindler

Responsible editor: 
Boyan Brodaric

Submission type: 
Survey Article
Abstract: 
Measurement units and their relations like conversions or quantity kinds play an important role in many applications. Thus, many ontologies covering this area have been developed. As a consequence, for new projects aiming at reusing one of these ontologies, the process of evaluating them has become more and more time consuming and cumbersome. We evaluated eight major ontologies for units of measurement and the relevant parts of the Wikidata corpus. We automatically collected descriptive statistics about the ontologies and scanned them for potential errors, using an extensible collection of scripts. The computational results were manually reviewed, which uncovered several issues and misconceptions in the examined ontologies. The issues were reported to the ontology authors. In this paper we will present the evaluation results including statistics as well as an overview of detected issues. We thereby want to enable a well-founded decision upon the unit ontology to use. Further, we hope to prevent errors in future by describing some pitfalls in ontology development—not limited to the domain of units of measurement.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Steve Ray submitted on 14/Sep/2017
Suggestion:
Minor Revision
Review Comment:

Overall, I found the article fairly well written, and a thorough, balanced and interesting survey of the major units ontologies. It gives those starting in the field a good view of what significant options are out there. Sections 1-3 do a good job of reviewing the literature already published, and laying out the definitions they use later in the paper. There are some minor grammatical and language usage issues that at times were distracting – some of these are listed later in this review.

I congratulate the authors on sharing the scripts and queries they used. It took me a little while to track down references 21 and 22. It would have helped me a lot if the references section had included a URL, such as https://zenodo.org/record/823686 for 21. I, for one, wasn’t familiar with Zenodo, but I may just be out of date. It looks like a great service.

The authors claim that a full list of issues found was sent to the authors of the respective ontologies. I should state here that I have joined in the activity of creating version 2.0 of QUDT, which is more than an order of magnitude larger than version 1.1, and therefore am not completely unbiased. However, the principal author of QUDT tells me that he does not recall ever hearing from the authors regarding any issues reported in version 1.1. The good news is that I did not find the inconsistency error in version 2.0 (but have not gone back to double-check version 1.1). It would be good to clear up any miscommunication with the QUDT principal author before publishing.

In summary, this is a very useful, painstaking analysis that should be quite helpful to the Semantic Web community.

Tables and Figures
Sections 4 and 5 contain figures and tables that are very difficult to interpret. Here are some specifics:
Figure 2. There are three tables spread across the page, with different headers. Are these three different and unrelated pairs of properties and values? If so, it would be less confusing if they were listed one above the other rather than side-by-side, so that the reader doesn’t try to see some relation across each row for all three tables. If that is impractical, find some other way to visually separate the tables from each other to avoid this misinterpretation.
Similarly, Figures 3-6 all have three columns, also unrelated to each other, correct? This is similarly confusing. In the text, they talk about the ordering of terms when introducing Figure 4, but I don’t see that issue in the figure.
Table 1 needs more explanation for the reader to be able to interpret the numbers. For example, I was left guessing the meanings of the terms “app”, “dim” (dimensions?), “qk” (quantity kinds?), etc. A legend would help in this regard, perhaps making reference to the terminology in Section 3.
In Table 2, do these numbers represent the union of all the ontologies? This could be stated explicitly.
In Table 3, are the numbers 1-9 at the top representing the 9 ontologies studied? If so, why are they not labeled as such? Why are these numbers in the table different from the corresponding row of Table 1?

Grammar and language suggestions:
The authors use the terms “individuals” and “instances” to mean the same thing. Suggest they choose just one and stick with it.
Page 2, Column 1, last paragraph: “However, the following subset of ontologies was selected, as they seem to be the most promising candidates regarding the amount of individuals and concepts modeled” -> “However, the following subset of ontologies was selected, as it seems to contain the most promising candidates regarding the number of instances and concepts modeled”
Several places: “amount of instances/individuals” -> “number of instances/individuals”
Page 5, Column 1, Paragraph 1: “as a well enough substitute” -> “as a good enough substitute”
Page 5, Column 1, last paragraph: “its detection was automatized” -> “its detection was automated”
Page 5, Column 2, first paragraph: “with actual comparing ontologies” -> “with actually comparing ontologies”
Page 6, Column 1, Paragraph 2: “which in turn led to new checks” -> “which in turn leads to new checks”
Page 8, Column 1, Paragraph 4: “almost as much units” -> “almost as many units”
Page 8, Column 2, Paragraph 4: “seems to be little consent” -> “seems to be little consensus”
Page 10, Column 2, Paragraph 2: “did chose not to” -> “did not choose to”
Page 15, Column 1, Paragraph 3: “One occasion, where deciding upon the correct value can be difficult are conversion factors. Through it is” -> “One occasion where deciding upon the correct value can be difficult is that of conversion factors. Though it is”

Review #2
By Hajo Rijgersberg submitted on 22/Sep/2017
Suggestion:
Minor Revision
Review Comment:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

The paper is very suitable as introductory text. It gives an overview of the most important ontologies of units of measure presently available, and their quality.

(2) How comprehensive and how balanced is the presentation and coverage.

The presentation is very comprehensive and balanced: sufficient related work is described, both about ontologies of units and their evaluation; the terminology is described; there is a clear explanation of the performed method; and there are substantial results, which are being discussed.
All important ontologies of units of measure are included, as far as I’m aware. All important evaluations that have been performed so far are referred to.

(3) Readability and clarity of the presentation.

Very readable and clear.

(4) Importance of the covered material to the broader Semantic Web community.

This subject is very important. It is one of the basic subjects in representing quantitative knowledge in the Semantic Web, and has so far been underexposed. Units and quantities are almost as crucial as numbers, in expressing quantitative knowledge, so the Semantic Web should do something with this. The paper gives a clear motivation why this work is important: besides the importance to indicate units in computer representations, it is important for the ICT community to map out the existing vocabularies of units of measure (i.e., what has been done in this paper).

The paper is very good, in my opinion. I only have some minor issues.
I think it’s great that the authors created the scripts for evaluating unit ontologies at the instance level, and that the scripts are publicly accessible for review.

Issues that I have are:
1. Pages 1 and 2: It would be good to indicate in the abstract what the different ontology authors have done with the issues reported by the authors (as the authors do in section 5.3). This should also be added to the second bullet at the end of the introduction.
2. Page 1: The authors argue that the ontologies have been tailored to specific needs, but this does not hold for all of them. It is not the case for OM, which was meant to be generally applicable.
3. Page 11: It is stated by the authors that the existence of several units of the same name in different systems of units has been ignored. However, in OM this is taken into account, e.g. with ounce (Apothecaries’) and ounce (Troy).
4. Page 13: The authors correctly state that the issues reported for OM have been fixed in OM 2.0.3. They are also fixed in the previous version of OM: OM 1.8 (OM 1.8.3, to be precise).

The evaluation could be extended with how the different ontologies deal with absolute and relative temperatures. E.g., 1 °C can be converted to 1 K or 274.15 K. Is this supported in the ontologies, and how? This is a very important metrological issue.
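
To make the distinction concrete, the following is a minimal sketch in Python. The function names are hypothetical and are not taken from any of the evaluated ontologies; the only fact assumed is the standard offset of 273.15 between the Celsius and Kelvin scales.

```python
# Offset between the Celsius and Kelvin scales (0 °C corresponds to 273.15 K).
ZERO_CELSIUS_IN_KELVIN = 273.15


def celsius_point_to_kelvin(t_celsius: float) -> float:
    """Convert an absolute temperature (a point on the scale) to kelvin."""
    return t_celsius + ZERO_CELSIUS_IN_KELVIN


def celsius_interval_to_kelvin(delta_celsius: float) -> float:
    """Convert a temperature difference to kelvin.

    Both scales use the same step size, so the offset cancels out
    and only the (unit) scale factor remains.
    """
    return delta_celsius * 1.0


if __name__ == "__main__":
    # The same numeric input yields two different results,
    # depending on whether it denotes a point or an interval.
    print("absolute:", celsius_point_to_kelvin(1.0), "K")
    print("relative:", celsius_interval_to_kelvin(1.0), "K")
```

An ontology that stores only a single conversion factor and offset per unit cannot tell these two cases apart, which is why the question of how each ontology models them seems worth evaluating.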

Shouldn’t the ontologies be evaluated on coverage?

Would/should available support of the ontologies be a criterion?

Furthermore, I only have typos and the like:
- Last paragraph in introduction: ‘over’ should be ‘of’.
- Page 3, line 1: ‘oftentime’ is incorrect English, I think.
- Page 3, column 2: ‘several “incorrect information in the ontologies’. Incorrect grammar.
- Page 3, column 2: ‘the conversion of units and the modelling of their mathematical relation’ -> ‘the conversion of units and the modelling of their mathematical relations’
- In the last paragraph on page 3 the method OntoClean might be mentioned/described.
- Page 5: ‘is need along with’ -> ‘is needed along with’
- Page 5 and 8: ‘the amount of units’ -> ‘the number of units’
- Page 6, ‘it is important to note, that the results just contain lists of potential errors or a representation, which makes it easy to spot them.’ What exactly is meant by ‘a representation’?
- Page 6: ‘a RDF schema’ -> ‘an RDF schema’
- Page 6 and 7: ‘straight forward’ -> ‘straightforward’
- Page 6, ‘A missing query for a specific concept in an ontology is treated as though that concept is not present in that particular ontology.’ This does not seem fair to the particular ontology. Or do I perhaps misunderstand?
- Page 6, ‘the position of a potential system of units’: what is meant by ‘position’? And why ‘potential’?
- Page 7: ‘Beside’ -> ‘Besides’
- Page 7, column 2, paragraph 3: kilo is detected in kilogram, but will hecto be detected in hectare? (only ‘hect’, not ‘hecto’ is present in that term.)
- Page 7, column 2, paragraph 3 (= last paragraph of section 4.4): I have difficulties following this (entire) paragraph. Please rephrase or explain in more detail.
- Page 8, last paragraph, ‘(...) less than three quarters of the units of one ontology are part of the other.’ Shouldn’t that be ‘one quarter’?
- Page 10, ‘While on the one hand ontologies will include next to no prefixed units (...)’: ‘next to no’ is not correct English, I think.
- Page 10: ‘A unit also consist of (...)’ -> ‘A unit also consists of (...)’
- Page 11: in the heading of table 5, there is ‘...’ three times, which I think should be removed.
- Page 12: ‘a spaces’ -> ‘a space’
- Page 13: ‘(...) to be a prefix to.’ -> ‘(...) to be a prefix too.’
- Page 14: ‘a individual’ -> ‘an individual’
- Page 14, ‘We found for duplicates of units (...)’: should ‘for’ be ‘four’?
- Page 14: ‘this general issues’ -> ‘these general issues’
- Page 15, ‘One occasion, where deciding upon the correct value can be difficult are conversion factors.’ Incorrect or unfinished sentence.
- Page 15: ‘Through it is usually easy (...)’ -> ‘Though it is usually easy (...)’
- Page 15, ‘(...) helps both, users and developers, to (...)’: remove the commas.
- Page 16: ‘a extensible’ -> ‘an extensible’
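
The hectare question above (Page 7, column 2, paragraph 3) can be illustrated with a minimal sketch. This is a hypothetical name-based check written for this comment only; it is not the authors' script, and the prefix list is an illustrative subset.

```python
from typing import Optional

# Illustrative subset of SI prefixes; a real check would list all of them.
SI_PREFIXES = ("kilo", "hecto", "milli")


def detect_prefix(unit_name: str) -> Optional[str]:
    """Return the prefix a unit name starts with, or None if none matches.

    Naive approach: plain string matching on the beginning of the name.
    """
    name = unit_name.lower()
    for prefix in SI_PREFIXES:
        # Require a non-empty remainder so that a bare prefix is not a match.
        if name.startswith(prefix) and len(name) > len(prefix):
            return prefix
    return None


if __name__ == "__main__":
    for unit in ("kilogram", "hectopascal", "hectare"):
        print(unit, "->", detect_prefix(unit))
```

With such a check, "kilogram" and "hectopascal" are matched, but "hectare" is not, because the final "o" of "hecto" is elided before "are". A name-based detection would need an explicit exception for this case.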

Review #3
Anonymous submitted on 06/Oct/2017
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

This paper provides an evaluation and comparison of a set of ontologies for units of measure
that are currently in use in the Semantic Web.

Given the (surprising) plethora of such ontologies for units of measure, a paper such as this
is of critical importance for the field. Unfortunately, the authors fall short of this ambition.

The first part of the paper (Sections 1-4) provides only a superficial analysis and is
frustratingly short on details for a journal paper. This is especially the problem with Section 2,
where different approaches are briefly mentioned with a sentence or two of commentary.
The authors need to identify the key ideas that they want to use, and then focus on how these
ideas are being applied in the rest of the paper. In particular, it seems that Section 2 should
inform the methodology used by the authors in Section 4.

I don't see the point of the Wikipedia discussion halfway through Section 2.1.
It is not sufficiently connected to the rest of the paper.

I'm not sure about the purpose of Section 3. If it is simply a glossary, perhaps it is best to
appear as an Appendix. It is a poorly written section that consists of a series of Definitions
without any intervening text (even if these were formal definitions, which they are not).

Section 4 is muddled.

In Section 4.1, there is an odd mix of shallow metrics (number of individuals per concept)
and more significant ones (completeness of an ontology regarding certain relations between the individuals).
In some respects, the authors skip over some of the more interesting forms of evaluation,
such as the correctness of an ontology; ultimately, this kind of ontological analysis must be done.

A more serious problem arises with the following claim:
"In absence of such a reference corpus, we decided to use the union of all individuals used in the different available ontologies."
First, it is not clear what this notion of union means -- can it lead to inconsistencies?
Second, this alone does nothing to specify mappings, yet the authors seem to indicate in the next
paragraph that mappings among the ontologies are a byproduct of the union.
I cannot see how this can possibly be the case.

In Section 4.2, the authors state that
"A strictly manual assessment of the given ontologies is quite cumbersome and oftentimes lacks the advantages of an automated, systematic approach."

but no convincing argument is given.

A similar problem arises a few paragraphs later:

"Whereas for some concepts a mere comparison by name might be sufficient, others like the units themselves need a more sophisticated approach, which also takes the ordering of terms into account."

Exactly what do the authors have in mind here? Again, a frustrating lack of detail ...

In Section 5, the authors finally hit their stride, and the observations and results found
in this section are both interesting and significant. It is interesting to read how preliminary
results have already led to the revision of at least one of the ontologies being studied in the paper.
It would be more substantial to have a thorough ontological analysis, but regardless, the results
presented in this Section are eminently useful for ontology users and practitioners.

The benefits of this paper ultimately outweigh its flaws.
The paper should be accepted, albeit with major revisions to the first four sections.

Editorial Comments:

The paper is marred by numerous stylistic and grammatical errors that appear throughout the paper.
I have identified most of these below, but I suggest that any future version of this paper be given
to an editor who is a native English speaker.

Awkward Sentences:
Section 2.1:
However, the following subset of ontologies was selected, as they seem to be the most promising candidates regarding the amount of individuals and concepts modeled

Section 2.2
This analysis determined a lack of an ontology containing all important concepts of this domain.

Their approach requires relations between units themselves like, e.g., unit composition to be present in the ontologies, which unfortunately is not guaranteed for all ontologies.

Section 4.1
Although this probably does not reach the same quality level, it can serve as a well enough substitute

Section 5.1
Just looking for the ontology with the most units WD takes first place with almost as much units as all other ontologies combined.

A complete coverage in the discussed way is rarely achieved by any ontology.

Section 5.3
QUDT, which is also one of the medium sized ontologies, contained the following issues: Most important, the ontology is inconsistent.

The first two paragraphs of Section 6 need to be rewritten.

Stylistic Problems:
In Section 2.1, there is the following one sentence paragraph:
Besides these specialized domain ontologies, linked data initiatives also provide data on units of measurement.

At the end of Section 2.1, there is the following hanging sentence:
– Wikidata (WD); community driven repository of factual data for Wikipedia12

The third paragraph of Section 2.2 has conflicting tenses,
as does the first paragraph of Section 5.

Typos
Section 4.1:
One is coverage, which evolves around what concepts are modeled as well as how many and what kind of individuals are included.

evolves --> revolves

In absence of such a reference corpus,

- should be "the absence ..."

Section 5.1
Also notable is the fact that there seems to be little consent which units are essential to an ontology

consent --> consensus?