Review Comment:
The paper describes an ontology for documenting governmental decisions and actions, in order to support government transparency. The topic is interesting and relevant; however, the paper is unfortunately not very well written, its focus is unclear, and there are potential issues with the quality and reusability of the resource itself.
First of all, the paper is too long for an ontology paper. Such papers are recommended to be short, typically below 10 pages, whereas this one is 25 pages in total. Still, this is not the main problem, since quite a few parts of the paper could be cut (see the comments below).
What I would instead consider the main shortcoming of the paper is the unclear purpose and scope of the ontology, as well as the lack of a sufficient evaluation or application of it. As presented now, the paper makes clear neither exactly why this ontology is needed, i.e. what gap it fills and what the benefit of having an ontology for this data is, nor what exactly it is to be used for. The title of the paper is rather generic and suggests that the ontology covers all kinds of decisions and acts (does "acts" mean actions or official documents, by the way?), but in the end the focus seems to be solely on financial decisions, such as procurement and funding decisions. This delimitation is not made explicit anywhere in the paper; the reader can only infer it by studying the competency questions (CQs) of the ontology. Although some generic use cases are presented, the paper would benefit from a concrete application: how has this ontology actually been used? This would serve both to set the scope of the ontology and to evaluate it through its usage, with actual data. As it stands, the paper lacks an application-focused evaluation of the ontology. The ontology is debugged and analysed in terms of its characteristics, but the only parts that come close to an evaluation are Sections 5 and 6.2. Section 5, though, reads more like an example of structuring data according to the ontology, which is then loaded into a triple store and queried through SPARQL. It seems highly unlikely to me that the intended end usage is to visualise and query the data through the GraphDB GUI; I assume this is just a way to verify the CQs? Section 6.2 discusses some inferences of the ontology. However, neither section really describes the intended application that would use the ontology, why the ontology is useful and correct (with appropriate coverage, complexity, etc.), or how it will benefit the end users.
This also relates to the poor discussion of related work in the paper. Some related work is listed, but mainly in the form of ontologies reused by the proposed ontology, not alternative or “competing” ontologies. For instance, how does this ontology relate to other open government initiatives, such as data.gov and their vocabularies? Or to the Australian AGRIF ontology (https://raw.githack.com/agldwg/agrif-ont/master/agrif.html)? A search in LOV also reveals a number of other potentially related ontologies (see https://lov.linkeddata.es/dataset/lov/vocabs?&tag=Government ). From a related work section I expect not only a description of the reused ontologies and previous work but, more importantly, a discussion of related efforts and why they were or were not reused or taken into account: what are the gaps? Why is this new ontology needed? Particularly since this ontology seems quite specific, the contribution of publishing this paper needs to be considered. Such a contribution could exist if this ontology, although currently quite specific to Greece, modelled something that none of these other government data ontologies do (and which could then be useful to apply in other countries as well), but this is not clear from the paper. This in turn makes the contribution of the ontology paper quite unclear: who will benefit from reading this paper, or from reusing the ontology?
That said, the next thing to consider is the quality of the ontology itself, which is quite hard to judge based on the paper and the associated resources. The paper contains a link to the actual ontology, which does contain some comments, but partly in Greek. Other links in the paper also lead to pages in Greek, and Figure 3 is completely opaque to someone who does not read Greek; it would need more explanation of what the document actually is. Moreover, here it suddenly sounds as if the ontology is to be used for some kind of information extraction from text documents, or is it manual annotation? Proper online documentation of the ontology (e.g. an HTML page generated from the ontology), available at its URI, would also help greatly. Figure 4 is also quite confusing. Its notation is not explained, e.g. what the boxes and arrows correspond to in the actual implementation of the ontology, i.e. which OWL constructs. As far as I am aware, there is no standard UML notation for OWL. I assume that the boxes are classes and the arrows represent object properties? But why are the object properties called “connections”, both in the text and in the figure? I assume “existing” means imported and “new” means locally defined in the ontology? But what does “indirect” mean? Inferred? Some arrow types used in the figure are also missing from the legend, e.g. the long-dashed arrow between legalResource and Value. I further assume that the lists inside the boxes are datatype properties, but what does it mean that some are preceded by a bullet, some by a +, and some by nothing? Some modelling choices also seem a bit strange and might require explanation, e.g. why Document is a subClassOf Contract and not the other way around. There is no connection between organisations and agents: does this mean that you do not see organisations as agents, as the W3C Organization Ontology (org) does, or is the connection just omitted from the figure?
Figure 2 is even more confusing, both in terms of its legend and notation (completely different from Figure 4) and in terms of the relation between these ontologies. Is this a picture of the same ontology, or a different one? If the former: why have two figures describing the same ontology in two notations? If the latter: what is the relation between the two ontologies? My guess is that the presented ontology extends a previous ontology, as the text hints, but the paper needs to describe this relation in detail. Is the old ontology reused (imported?) and extended, or is it replaced and completely remodelled?
Some more detailed comments on parts of the paper, in addition to the discussion above:
- What is the name of the ontology? Just d2kg, or d2kg-OWL as in the title?
- Section 1, second-to-last paragraph: why is an ontology needed in this case? Does it have to do with data integration, i.e. the fact that the organisations represent decisions in different ways? Merely having to upload some data does not necessarily require an ontology.
- Introduction to Section 2.4: is it the Internet or the Web? And is it really 28 million decisions per day?
- Both the use cases on page 7 and the CQs on page 8 are quite vague and ambiguous, and I am particularly unsure about the terminology. What is an economic operator? Is it an organisation or some kind of measure/formula? And what does “top” refer to in CQ1: receiving the most funds? Most recent? Receiving funds most frequently? How does the notion of “contracting authority” in CQ4 relate to the general notion of “organization”, and to “economic operator”? What does “appointment” mean in CQ5: meeting appointments or something else? This CQ seems detached from the others; why is it interesting in terms of transparency of spending? Use case 2 seems to be about tenders, but CQ3 instead mentions decisions/acts; how are these related to tenders? Use case 3 also contains highly ambiguous CQs, with terms such as “most popular” (how can this be measured?) and “appointed” (are these employment contracts?).
- What does “document analysis” refer to in Section 4.1? Is this part of the ontology engineering methodology, i.e. the way the ontology has been built, or is it another use case of the ontology, i.e. annotating documents or performing information extraction? The same question applies to Section 4.2: is this really about building the ontology, or about annotating documents using the ontology?
- In Section 4.3.1 the authors mention importing other ontologies, but it is unclear to what extent this is actually done. The ontology file directly linked from the paper seems to contain no imports at all, only external elements referenced by their URIs. However, the version of the ontology in the GitHub repository seems to import one ontology. It should be made clear what the architecture of the ontology is and how it technically reuses the other ontologies.
- It is unclear what Figures 5 and 6 represent. They certainly do not show the URIs of the two resources; rather, they seem to show the results of a DESCRIBE query over a specific URI, for some reason displayed in XML rather than in the return format of a SPARQL query.
- What do you mean by “Semantic Graph Database” in Section 5.1?
- Figures 8-10: there is no point in showing screenshots of the query interface. If the authors want to show some example queries and their results, this may be relevant, but not in the form of screenshots. Similar arguments hold for Figures 11 and 12, which are screenshots from Protégé.
- Section 6.2 needs clarification. Do you mean that you somehow infer a domain restriction for the property “hasAwardCriterion”? Or are you using a domain restriction to perform the inferences? Similarly, I do not understand what you mean by “infer concepts” in the other bullet points. Please clarify.
- Please briefly introduce the OntoMetrics measures in Section 6.3, so that the table can be interpreted without looking up the reference. Also, I am not sure about the discussion in the bullet points: could it be made more specific? Statements such as “good coverage in the range of concepts” are quite vague and unclear.
- The conclusions section needs to be rewritten to state specific conclusions that can be drawn from the presented results. In some parts it currently reads more like a general discussion, and several paragraphs are unclear and ambiguous. Paragraph 2 of this section, for instance, is completely unclear to me. Paragraph 3 also contains unclear statements, such as “The benefit is evidently the scalability of the ontology…”, although no scalability assessment has been done in the paper.
- The reference list can be improved. In many cases the name of the conference is given instead of the full title of the proceedings volume, and volume numbers and series are often missing. The style is also inconsistent, e.g. the year is sometimes in parentheses and sometimes not.
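To make the question on Section 6.2 above concrete: under RDFS semantics, the usual direction is that a declared rdfs:domain is used to derive types for the subjects of a property (entailment rule rdfs2), rather than the domain itself being inferred. The following minimal Python sketch applies that rule to a set of triples. Only the rdf:type and rdfs:domain IRIs are real W3C terms; the ex: namespace, the class Decision and the individuals are hypothetical placeholders, not taken from the paper's ontology.

```python
# Minimal sketch of RDFS domain inference (entailment rule rdfs2).
# Only RDF_TYPE and RDFS_DOMAIN are real W3C IRIs; the ex: IRIs below,
# including the class "Decision", are hypothetical placeholders.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
EX = "http://example.org/"  # hypothetical namespace

Triple = tuple[str, str, str]


def infer_domain_types(triples: set[Triple]) -> set[Triple]:
    """Return the new rdf:type triples entailed by rdfs:domain axioms.

    Rule rdfs2: if (p rdfs:domain C) and (x p y), then (x rdf:type C).
    """
    # Collect the declared domain classes of each property.
    domains: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)

    # Type every subject that uses a property with a declared domain.
    inferred: set[Triple] = set()
    for s, p, _ in triples:
        for cls in domains.get(p, set()):
            inferred.add((s, RDF_TYPE, cls))

    # Keep only triples that were not already asserted.
    return inferred - triples


# Hypothetical schema axiom plus one instance triple; note that
# decision42 is never explicitly typed.
data: set[Triple] = {
    (EX + "hasAwardCriterion", RDFS_DOMAIN, EX + "Decision"),
    (EX + "decision42", EX + "hasAwardCriterion", EX + "lowestPrice"),
}

for triple in sorted(infer_domain_types(data)):
    print(triple)
```

Here decision42 is typed as a Decision even though it was never declared one, so rdfs:domain behaves as an inference rule rather than a validation constraint. Stating which of these two readings Section 6.2 intends would resolve the ambiguity.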
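Regarding the comment on Section 6.3 above: assuming the table reports the OntoQA-style schema metrics that OntoMetrics computes, each measure is a simple ratio of counts, so a one-line definition per measure would make the table readable. The sketch below encodes three of these measures under that assumption. The class SchemaCounts and the counts in the example are hypothetical and are not the paper's values.

```python
# Minimal sketch of three OntoQA-style schema metrics, assuming the
# definitions that OntoMetrics reports. All counts used below are
# hypothetical and are not taken from the paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaCounts:
    classes: int             # |C|: number of classes
    subclass_relations: int  # |H|: number of rdfs:subClassOf links
    relations: int           # |P|: non-inheritance relations (object properties)
    attributes: int          # |att|: attributes (datatype properties)


def attribute_richness(c: SchemaCounts) -> float:
    """AR = |att| / |C|: average number of attributes per class."""
    return c.attributes / c.classes


def inheritance_richness(c: SchemaCounts) -> float:
    """IR = |H| / |C|: average number of subclasses per class."""
    return c.subclass_relations / c.classes


def relationship_richness(c: SchemaCounts) -> float:
    """RR = |P| / (|H| + |P|): share of relations that are not subclass links."""
    return c.relations / (c.subclass_relations + c.relations)


example = SchemaCounts(classes=20, subclass_relations=15, relations=25, attributes=30)
print(f"AR = {attribute_richness(example):.3f}")     # AR = 1.500
print(f"IR = {inheritance_richness(example):.3f}")   # IR = 0.750
print(f"RR = {relationship_richness(example):.3f}")  # RR = 0.625
```

A statement such as "an RR of 0.625 means that five out of eight relations in the schema are non-taxonomic" is the kind of specific interpretation that could replace phrases such as “good coverage in the range of concepts”.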