Referring to multiple unspecified objects of a type: multi-instance fact pattern family

Tracking #: 693-1903

Authors: 
Vojtěch Svátek
Martin Homola
Ján Kluka
Miroslav Vacura

Responsible editor: 
Krzysztof Janowicz

Submission type: 
Other
Abstract: 
We introduce the problem of capturing multi-instance facts: modeling situations in which a given object is related to multiple unspecified objects of a certain type. We describe the situation in an abstract way (using the PURO ontological background modeling method) and provide three alternative patterns for representing it in OWL. The alternatives make use of different ontology pattern structures: logical structures, naming conventions and annotations; pattern-based invention of new ontology terms is considered alongside reuse of existing terms. The multi-instance fact problem is also aligned with the closest one among the popular logical pattern families, namely, CPV.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Enrico Daga submitted on 28/Aug/2014
Suggestion:
Reject
Review Comment:

The authors introduce the modelling issue of referring to multiple unspecified objects of a type and propose a “multi-instance” pattern family (MIF).
The article starts from describing the problem, referring to the PURO modelling method, and then proposes three potential solutions in RDFS/OWL.
Each solution makes use of different modelling constructs: logical structures, naming conventions and annotations. Finally, the authors discuss the relation between the proposed pattern and the Class as Property Value pattern (CPV).

The problem is described as follows:
“There is a distinguished real-world entity that is related, by the same kind of relationship, to (possibly even one, but usually) multiple undistinguished objects of a certain type.”
An example may be something like “Amazon sells Books”. Clearly the problem here is not to relate Amazon to the set of Books for sale, but to each of the undefined instances without materialising all of them (because they are not yet known, for example).
This is an interesting “encoding” problem, in the sense that it is obvious that modelling this in RDFS/OWL is not straightforward.
The authors propose three solutions, all extending existing partial solutions: 1) relying on an existential quantifier that restricts the type of the entity with the property and the target type; 2) the usage of an intermediate entity that acts as a placeholder for all the unspecified objects; and 3) the specialisation of the property so that it points to the type of the target entities directly.
All these preexisting solutions have some limitations, the authors say. For example, the usage of an existential quantifier, while it is sound at the logical level, does not reflect the semantics of multiplicity rigorously. This is true, but the authors say nothing about why expressing this semantic distinction is useful.

I think the main contribution of the article is the specification of the modelling issue. My main problem with this paper is that the usefulness (or potential usefulness) of the pattern is not clear. This is also a requirement in ontology pattern paper submissions.
In addition, reading the paper, it looks like there is the assumption that making a semantic distinction explicit in a model would be a value in itself, which I doubt - but this is my personal opinion.

Abstract:
Authors say that their problem is going to be solved in OWL, but two of the three solutions do not need OWL expressiveness.

1. Introduction:
The paper starts by saying that referring to multiple unspecified instances of a type is a common problem. While I agree that it is a common situation - as the authors demonstrated by reporting the presence of it in e-commerce datasets - I am less keen to see it as a problem without a clear view of the advantages of having it solved.
In other words, I am quite convinced that patterns should be (possibly optimal) solutions that bring some advantages. In this paper the authors discuss three ways of achieving their goal, but none of them is discussed in terms of concrete benefits. The lack of use cases where the raised limitations of existing solutions can cause problems might be the reason why, as a result, the reader cannot be convinced that this is a problem at all.

As part of the motivation, the main argument about the limitation of the existential quantifier in OWL is that its semantics is strictly “at least one”. The authors say that the interpretation of existential quantifiers in OWL is psychologically biased against multiplicity. This is true, because the specification of the existential quantifier is focused on the logical implications of it and not on the psychological ones.

It should be proved that this limitation has important implications before searching for a solution to it. I do not see it at the moment.

The other alternative solution is the usage of a blank node as existential quantifier. It is written that blank nodes are “considered as bad practice by a significant part of the linked data community and have no meaning to the description logic community”. These two statements are a bit shallow. The first needs a citation and the second is only the consequence of the fact that the authors focus on a solution at the OWL level. Blank nodes have no special meaning in OWL, and not for the DL community. The RDF, RDFS and OWL layers have different semantics, that’s all.

The characteristics of the multi-instance fact pattern do not seem to satisfy clear requirements (competency questions). The authors list a number of reasons why the proposal should be a contribution. The only one that I see as a concrete requirement is “possibilities to approximate the cardinality of the relationship considered”. However, this paper does not go beyond proposing an annotation property to specify the kind of multiplicity, leaving out the analysis of its possible values as future work. Having this discussion here would probably have reinforced the motivation for having the pattern.

2. Multi-instance fact: background model
This section is very clear, and specifies very well the modelling issue.

3. Pattern modeling inventory
In this section the authors summarise different aspects of pattern-based modelling that have a role in the proposed solution: logical constructs, naming conventions, annotations, entity reuse.
This is useful.

4. MIF pattern family
This section of the paper is focused on the description of the MIF pattern family. There are three alternative concrete solutions:

4.1. Existential restriction with annotation
This solution extends the OWL-based existential restriction. The authors insist that “the existential restriction pattern does not allow to express the multiplicity of the relationship to anonymous instances at the logical level”. It is not clear to me whether this makes any sense. The value of the multiplicity property should be a fuzzy value - ‘many’^^xsd:string, for example. Why should we desire it? In addition, I do not see how adding an annotation property that specifies the multiplicity as a string value would express the multiplicity at the logical level, going beyond the simple OWL-based solution.
4.2. Linking to placeholder individual
The second option is extracted from Good Relations, and makes use of a placeholder individual as the range of the property, representing the unspecified multiplicity of entities.
This method has also been proposed as the “Template Instance” pattern [1] in a very similar situation. There, the problem was to collapse a number of equivalent entities to a single one to reduce the space of the data. The authors might also want to discuss their work in relation to it. Similarly to the previous solution, the authors discuss the need to specify the kind of multiplicity and the various options for encoding this in RDFS/OWL.

4.3. Shortcut property with name and annotation
This last option is based on a special property that points directly to the type of the unspecified objects. This solution is also extracted from Good Relations. However, the paper does not say much about the benefit of it with respect to the other two.

5. Overview and selection criteria and 6. Implicit pattern usage
Here the pros and cons of the three solutions are discussed. I think this should be extended, or a real comparison of the three solutions performed, so as to select a single optimal solution. This is another reason why I feel the work is not mature enough for publication in a journal.

7. Relationship to the CPV family
In this section the authors discuss the relation with CPV, and again the main advancement of the proposed method (section 4.3) is that it specifically accounts for the multiplicity of the relationship between the source object and target class.

---

As a summary:

On Quality of the pattern:
The problem is very interesting, and I believe it is a hard one in RDFS/OWL.
I do not like much the three alternative solutions. One of the motivations of having patterns is that they provide good practices. From the overview of the alternative patterns in Table 1 it seems that the OWL-based solution does not have special limitations (the consequence is a requirement instead).

On Usefulness (or potential usefulness) of the pattern
The proposed pattern does not seem to provide any significant advantage with respect to existing solutions.

On Clarity and completeness of the descriptions
The paper is well written and clear.

[1] Nyulas, Csongor, Tania Tudorache, and Samson W. Tu. "The Template Instance Pattern." WOP. 2012.

Review #2
By Ronald Denaux submitted on 12/Sep/2014
Suggestion:
Major Revision
Review Comment:

The paper introduces the Multi-instance fact (MIF) pattern and
describes 3 alternatives to represent this pattern. The 3 alternatives
are easy to understand as they are accompanied by diagrams, textual
descriptions and OWL axiomatizations (as well as some examples from
the GoodRelations vocabulary). Also, design decisions are mentioned.
However, the paper lacks a convincing discussion of use-cases for the
pattern and thus fails to motivate the need for the pattern. Another
issue is that some comparison to related work is missing or has not
been included as part of this paper. Furthermore, the comparison
between the 3 proposed alternatives is missing some important details.
Addressing these issues would require a major revision of the paper.

- Summary of paper and contribution

The paper focuses on a data modelling problem of representing a fuzzy
number of undistinguished (but typed) instances related via a property
to a distinguished instance. It argues that current OWL and RDF(S) are
insufficient for modelling this situation faithfully. It then
describes the problem using an abstract schema and relates it to an
existing pattern (classes as property values). Next, the paper
provides an overview of currently available data modelling resources
and trends in the community: OWL and RDF pattern collections (some of
them reusable as mini ontologies), naming conventions and annotations
vocabularies. Then, the 3 alternative patterns for modelling the
pattern are presented in detail. A short comparison of the approaches
is given and some usage data is provided for the pattern usage within
the GoodRelations vocabulary. Finally, the work is compared to previous
related work.

The MIF pattern addresses a combination of 2 data modelling
problems: relating an instance to a subset of a class and assigning a
fuzzy number for the size of that subset. The first problem is fairly
straightforward to model in OWL, but the second subpattern cannot be
expressed within OWL (and hence neither is the combination). For the
second problem, the paper suggests using an annotation property with a
fuzzy value, which can be attached to either an axiom or a placeholder
entity.

- Lack of convincing use-cases and motivation

The main issue with the paper is that it does not provide a good
motivation for the pattern. It simply states that (i) it is common to
need to model a set of multiple unspecified entities, (ii) that it
typically occurs when modelling something possibly happening in the
future and (iii) gives a rough example about "offers to sell". It then
moves on to describe basic modelling approaches in OWL and RDF and
argues why these are not sufficient. The next sections then go into
details about what is being modelled and alternatives for representing
the data. Only on page 5 do we see a specific example of a data
modelling problem in the GoodRelations ontology which requires part of
this pattern (it does not require the fuzzy multiplicity value).
Giving specific use-cases and examples early in the paper would
greatly help to motivate the need for the pattern.

- Incomplete discussion of related work

The discussion about related patterns focuses on the CPV work by Noy,
Uschold and Welty. However it refers to a previous work by the authors
stating that MIF only relates to a subcategory of CPV and draws a
relation between alternative 1 in the paper to Approach 4 in CPV. It
would be better to summarise the analysis in [5] to make this easier
to understand within this paper. Also, at first glance, Approach 2
in CPV seems closely related to the placeholder alternative while the
shortcut property (alternative 3 in the paper) seems like a special
case of Approach 1 in CPV. Obviously in all of these cases you are
also introducing the multiplicity annotation, but this is only an
extension.

The discussion about related patterns does not include a discussion
about representing fuzzy multiplicity. I am not an expert in this
area, although there were some approaches trying to integrate fuzzy
logic with OWL ontologies, which may be worth comparing to. Since the
paper does not go into detail about actual use-cases and requirements
for such fuzzy numeric values for cardinality, it is difficult to
mention suitable alternatives. But simple OWL cardinality approaches
may work, for example by combining an existential restriction (i.e. at
least one) with a maximum cardinality restriction with a high enough
number (e.g. at most 10K?).

- Section 5 provides a rather sparse comparison and selection
criteria.
+ you say that alternative 1 is best-suited for OWL, but all 3
options produce valid OWL-DL, it's just that alternative 1 more
closely captures the intended meaning without relying on
human (or additional non-standard) reasoning, like special
consideration for placeholder and punned individuals.

+ You mention that alternative 2 has the advantage of allowing you
to declare additional features for the undistinguished
individuals. However, this is also possible with alternative 1,
since the existential restriction does not have to use T as a
filler, but can use an anonymous class with further restrictions.

+ Some of the consequences in Table 1 are not mentioned in section 5
and vice versa.

+ All 3 alternatives use the same mechanism for capturing the fuzzy
multiplicity problem

+ Since you've found example usages in the GoodRelations vocabulary,
it would be good to discuss why the GoodRelations designers chose
the placeholder and shortcut property alternatives rather than
the existential restriction (especially since they did not have to
deal with the fuzzy multiplicity requirement). This suggests the
latter 2 alternatives have some practical advantages over
alternative 1, which are not clarified in the paper.

- Minor details
- p2 "blank nodes... considered as bad practice" would benefit from
a reference to a paper discussing this
- p3 at the end of section 2: affairs (as in 1) -> as in Figure 1
- section 5 mentions Table 5, but should be Table 1.
- Section 4, the patterns are described following a common structure:
+ motivation of approach
+ figure, details and syntactical form
+ examples of pattern usage (if available).

It would aid readability to keep this format for the 3 patterns,
which is not occurring: the first pattern does not show a usage
example (you could mention that no usages have been found, after
following some search approach). Section 4.3 provides the example
before providing the syntactical form.

Review #3
Anonymous submitted on 14/Oct/2014
Suggestion:
Accept
Review Comment:

The paper discusses the problem of capturing multi-instance facts (MIFs), which are described as modeling situations when a given object is related to multiple unspecified objects of a certain type.
The authors tackle this problem by describing three different approaches for modeling MIFs that differ from the baseline of using existential restrictions or blank nodes, i.e.:
* the first based on the extension of existential restrictions by means of annotations aimed at capturing the arity of the relations on which the restrictions are formalized;
* the second based on the usage of named individuals used as placeholders for capturing the MIF;
* the third based on punning.

These three approaches characterize the MIF pattern family.
The use case supporting the need of this pattern family is convincing: modeling something happening in the future, such as a situation that involves a company that offers to sell an unspecified number of physical products of some type. Additionally, two of the solutions proposed (i.e., the second and the third) are adopted for modeling real data in the Linked Open Commerce dataset with the GoodRelations ontology.

Each solution is well discussed by using appropriate terminology, examples, diagrams and formal notations. Additionally, they are compared to each other. A few lines about the DL expressivity deriving from each solution would probably be helpful to the reader.

The relationship to the CPV family is clearly explained, but a more detailed description of CPV is needed.

