BimSPARQL: A case of extending SPARQL functions for querying linked building data

Tracking #: 1628-2840

Authors: 
Chi Zhang
Jakob Beetz

Responsible editor: 
Guest Editors ST Built Environment 2017

Submission type: 
Full Paper
Abstract: 
In this paper, we propose to extend SPARQL functions for querying building data. Building models represented by the Industry Foundation Classes data model are the target data sources to develop extended functions. By extending these functions, we attempt to 1) simplify writing queries according to requirement checking use cases, and 2) retrieve useful information implied in 3D geometry data with an open and extensible approach. Extended functions are modelled as RDF vocabularies and classified into groups for further extensions. We combine declarative rules used in the Semantic Web field with procedural programming to implement extended functions. Compared with query techniques developed in the conventional Building Information Modeling domain, we show the added value of such an approach by providing an example of querying building and regulatory data, where spatial and logic reasoning can be applied and data from multiple sources are required. It demonstrates an approach that can be extended and applied for many other use cases. Based on the development, we discuss the applicability of the proposed approach, current issues and future challenges.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 24/May/2017
Suggestion:
Accept
Review Comment:

(1) Originality

This is an extended version of a paper published at the ECPPM 2016 conference by the same authors. Taking this into
consideration, this is definitely original research.

Moreover, it addresses a major problem of linked building data: the complexity of the IFC data model. Now that BIM
models (in the IFC format) can be translated to RDF graphs based on the ifcOWL ontology, the huge stock of existing BIM
models can be brought to the realm of Linked Data. There are many ways to utilise those models in design coordination,
supply-chain management, facility management, asset management, indoor navigation, and so on. However, the utilisation
of that data is significantly hampered by the complexity of the IFC models.

(2) Significance of the results

The approach presented in the paper is to extend SPARQL to support easier and more powerful queries to models based on
ifcOWL. The simplification problem has been addressed in other ways such as with simplified ontologies, but the
advantage of the approach in the paper is that it is directly applicable to any existing models. There is mostly no
materialisation needed (with the exception of some geometry data) since the reasoning works in backward-chaining
manner.

The simplifications are shortcuts implemented in different ways. There are shortcuts defined at the level of IFC schema,
and also for property sets and quantity sets. Geometrical information is treated with its own functions. All these
extensions address well-known and relevant problems, and they have been implemented in the work.

The paper contains many examples that are well-chosen and interesting. The validation of the approach presented in the
paper is based on a non-trivial building code checking example that is detailed in the appendix.

(3) Quality of writing

The paper is well-written. It is easy to read, the line of reasoning is easy to follow, and it is almost without grammatical
errors or typos.

Here are some minor issues I spotted in the text:
- Abstract
  - Could you make the abstract a little bit more compact?
- 1. Introduction
  - "impossible to retrieve" - remove the word "impossible"; things in computer science are only impossible if you can prove undecidability or intractability
- 2. Background
  - "The conversion program between IFC and ifcOWL is almost a one-to-one process" - I understand what you are aiming at, but this sentence does not communicate it properly. The conversion of IFC data to RDF is straightforward, but the conversion of IFC schema to ifcOWL is nothing but
  - "they are usually not obligated and not always reliable due to lack of rigidness in IFC data model and AEC domain" - I don't understand this sentence
- 4. Vocabularies
  - "a) domain semantics that usually explicitly represented" - verb is missing
  - Table 1: "Properties for single product based on geometry data (see section 4.3))" - two closing parentheses
  - Table 2: "schm:hasMatrial" - e is missing
  - "However, many geometry types to describe a Body geometry in IFC including e.g. Boundary Representation (Brep), Constructive Solid Geometry (CSG) or Non Uniform Rational B-Splines (NURBS)." - verb is missing
  - Listing 4 (and some other listings as well) - Can you format the listings so that line breaks do not fall in the middle of variable names?
  - Tables 3 and 4: You could adjust the widths of the columns to save some space
- 5. Implementation
  - Listing 8: Describe that the argument to schm:isContainedIn is passed to the query of Listing 8 as arg1.
  - I would prefer to have some explanation of portability already here. This should run wherever Jena runs, am I correct?
- 6. Application example
  - Table 6 reference is wrong - should be Table 8
- Appendix
  - The separate listings could be combined into one or two listings
- The article is a few lines too long to fit into 16 pages

Review #2
Anonymous submitted on 04/Jun/2017
Suggestion:
Major Revision
Review Comment:

I have enjoyed reading the article and the authors presented an interesting approach to extending SPARQL for linked building datasets. There are, however, a few issues that I deem need to be assessed.

There is no evaluation. The authors provided a demonstration that I found not compelling (which I will explain later on). One of the aims was to simplify the SPARQL queries that one needs to formulate. The authors, however, did not report on an evaluation involving users, which makes this aspect difficult to assess. I can see how that is hypothetically the case, but users need to learn a new set of functions (in different namespaces) next to understanding the different models.

Unlike GeoSPARQL, which provides a vocabulary, a set of topological relations, and a set of functions for extending SPARQL, the approach in this article is intended to be extensible. This was demonstrated in Section 6 where additional rules for IBC needed to be defined prior to executing the query in Listing 9 that uses the functions and relations (backed by SPIN) of Section 4. This poses, for me, several problems. First, in order for one to fire a similar query to other BimSPARQL enabled endpoints, those additional rules need to be installed as well.

Second, stakeholders need to go through some knowledge engineering activities for every use case or query they need to support. This demonstration shows how the BimSPARQL ontology (let's call it that) and rule base can be extended to support a wider set of queries, but I feel that the authors should also have demonstrated or evaluated the BimSPARQL "core" that has been defined. The evaluation of extensions and the creation thereof is different from the evaluation of the core functions and relations proposed by the authors.

As a side note: the extension is focused on IFC. So, in my opinion, IfcSPARQL seems a more appropriate name than the more generic sounding BimSPARQL.

The title implies the use of Linked Data, yet the article does not provide examples of federated queries, nor does it explicitly state that one needs to centralize different datasets. This will make the problem harder, especially if some patterns and functions need to be processed by other services that are not BimSPARQL enabled. However, I do feel that the authors should elaborate more on this aspect, or rework the title a bit to mention RDF building data (for instance).

The article needs careful proofreading. In various places, articles were omitted, sentences were difficult to parse, some informal speech was added, or verbs were not correctly conjugated. I will provide a few examples in the remainder of the review.

Section 1

The importance of integrating data in AEC should be explained in a few sentences, or at least a pointer to a reference provided.

I suggest replacing "more and more" by "increasingly more" as the former is rather informal. (observed in various places in the article).

Remove the adverb "highly" in "highly limited". Adverbs such as "very", "highly", etc. should be avoided. There is a limitation, period.

In "RDF equivalent of [the] IFC data model", use "counterpart" instead of "equivalent" as the latter can be interpreted differently.

The statement "The extended functions in this research are compatible with existing SPARQL environments" is not true. Perhaps the authors mean that the approach does not extend SPARQL's grammar. But even that does not mean it is compatible, as other non-BimSPARQL enabled endpoints may throw an error if they do not recognize a particular filter function. The authors should carefully rephrase this.

Regarding the sentence ending with "... hence many platforms": that depends on whether the platform allows one to easily incorporate functions, e.g., by extending Jena's base classes or by relying on SPIN. I suggest putting a forward pointer to your specific implementation.

Figure 1 does not provide any information; I think it can be omitted.

Section 2

Explain proxy elements. The others are used later on in the paper or are self-explanatory.

"This extent" --> "This extension"

Rephrase "still way beyond", as it is informal.

"Hardcoding" instead of "hard-coding".

NL-sfb code is not defined here.

"Table X" is missing the actual reference to a table.

In "... building models formatted in e.g[.] IFC", I suggest replacing "formatted" with "represented".

The following is a comment for most of the queries in the article. Listing 1 retrieves resources that fulfill particular conditions; it does not check. The authors should carefully rephrase the description of the queries, or rewrite the queries. The result of the query can be subsequently used to check something, e.g., an empty result implies there are no violations. Or an ASK query can be formulated to do a bunch of checks on a dataset. The Data Cube vocabulary [1], for instance, prescribes a set of SPARQL ASK queries to check datasets for potential violations.

Can you add a reference to the construction of the ifcOWL ontology? This would strengthen the rather vague statement that it is "almost a one-to-one mapping".

What do you define to be "domain end users", or do you have a couple of examples?

In the last paragraph, you talk about wrapping relations as functions. I think it would be worthwhile to dedicate a subsection or paragraph (not in Section 2, but elsewhere, with a forward pointer in Section 4) to explaining the difference and formalizing how you create functions for relations and/or formulate new filter functions. This would help the reader understand that you (implicitly) state that rules are functions. A lot of your functions are actually rules, but the use of "function" might be due to the adoption of SPIN as a rule language in which functions can be defined. In other words, an implementation might be seeping through the design of the approach. Though the authors did state later on that other rule languages can be adopted, a careful revision of the text could make design and implementation less tightly coupled.

Section 3 – "Related Work" instead of "Related [R]esearch"

Section 3.1

What do you mean by "full" in "full CRUD"? Why not state that it provides CRUD functionality?

Put the reference for R-Trees behind R-Tree instead of at the end of the sentence.

"... and door etc." needs a comma after "door".

Why is BimRL "technically" a domain specific query language? Explain in one sentence or omit "technically".

In the last paragraph, the authors state: "Although some of them have provided programming interfaces for further extensions, the development work needs significant efforts and are usually limited by the data captured in its internal data model." I feel that this is, to some extent, also the limitation of the approach adopted by the authors. Except for the "core" of BimSPARQL, one needs to create additional rules to support more queries. This involves knowledge engineering (limited, yes, but still), getting familiar with SPIN (the internal data model of the authors' approach), being able to code the function, etc. So at the end of the article, the authors should put things in perspective with respect to that statement.

Section 3.2

"in that paper" instead of "this"

The article before "EYE reasoning engine" is missing.

"alternative to" instead of "alternative of"

Section 3.3

This section is missing some related work. [2] presented an approach to adding custom functions to SPARQL queries. Though not for SPARQL, both [3] and [4] presented an approach for representing and using functions in generating RDF from non-RDF data sources. The former contains the description in the mapping, and the latter retrieves the function via Linked Data principles.

Recently, [5] presented an approach to extend a Triple Pattern Fragments [6] client with GeoSPARQL. In this approach, it is the client that is extended and the approach thus makes it backward compatible with any server. That allows one to avail of said functions if the data is available on servers. One can also connect to multiple servers, hence formulating – in effect – federated queries. A similar approach can be undertaken for BIM models.

Section 4

The first sentence needs to be rephrased; "target source" is probably the focus, and "extended functions" should be "extension", as it otherwise implies functions to be extended (rather than the set of functions).

Typo in "implemented"

I would appreciate seeing how the authors went about extracting properties and relationships from the sources. Is there a dataset available with some sort of matrix (properties vs. source) where cells are checked and an explanation provided? Can the authors say something about the coverage? Are they sure they have provided a complete set of properties and relationships? Were the use cases complete? This also sounds like a knowledge engineering exercise, as they will provide the input for the different functions in subsequent paragraphs. How was that set validated or evaluated?

Towards the end of the first paragraph, the authors already hint at the extensibility of the approach. I feel that this is where the difference between the BimSPARQL "core" and its extensions should be brought up. As mentioned above, this distinction will make the contribution clearer, as well as its extensibility (and when it is necessary or appropriate to extend).

"independent from" instead of "independent with"

The end of Section 4.1 does not highlight a potential downside of RDF property functions, nor does Section 7.3. The downside is that people might have the impression that they are retrieving triples that have been explicitly asserted, while the use of filter functions gives them more control. And how would one deal with inconsistencies between things that have been asserted and inferred? Finally, the authors state that materialization can improve performance, but they have not conducted an experiment demonstrating that. So I suggest the authors rephrase that bit.

Section 4.1

Second half of the first paragraph needs to be rephrased. And, again, the query does not check, but retrieves resources fulfilling certain conditions.

The authors should provide URIs for the namespaces in Table 1.

What is the meaning of the different arrows in Figure 2? I see a mixture of different formalisms, and hence the meaning of the arrows is difficult to comprehend, even though it is "intuitive".

Section 4.2

"[The] semantics of these..." misses the "The"

I feel that there is an implicit criticism of a) IFC OWL being complex, and b) a lot of the semantics being stored in literals (strings) rather than URIs. Am I correct?

Section 4.3

In this study you transform geometries into a particular representation for analysis purposes. Would that be something you would prescribe for all BimSPARQL implementations, or is that left up to the implementation? This is something that should be made clear. Figure 5 does *not really* explain the sentence it is put in.

Section 4.4.1

There is no indication of related work or comparing the coverage with other vocabularies/standards such as GeoSPARQL. Table 4 only highlights four, please indicate how many you have and how broad your coverage is. The descriptions in Table 4 need to be rewritten and the one for intersects contains a mistake (copy/paste error of disjoints).

In Listing 5, walls touch a door. As a naïve reader, I would think that walls contain a door. But walls have void elements that contain doors, windows, and whatnot. It might be worth clarifying that to the reader.

Section 4.4.2

You state you currently offer the function in Table 5, but the caption of that table indicates these are examples. Is that table complete or not?

Section 4.5

You choose to materialize RDF, but later on allude to processing geometry data on the fly at query runtime. I wonder whether this really matters at this point in the article, as these could be considered implementation strategies. But I do understand why geometries are treated differently from the other functions. That said, it might be worthwhile referring to other related work looking at a particular problem in GeoSPARQL (duplication of representation of geometries): storing the literal and the geometry representation for indexing. This has been touched upon in [7].

Section 5

I suggest calling this section "an implementation", emphasizing that it is a way to implement the system.

The section misses the aforementioned related work on representing and reusing functions in RDF, which facilitates interoperability, reuse, transparency and even traceability.

It also refers back to Listing 2, but it might be good to list the full SPIN listing of that function here (as it is how it is implemented). Having at least one SPIN listing gives the reader an idea how the other queries in the appendices are wrapped.

Since you mention here that SPIN rules can call other SPIN rules, it might be worthwhile mentioning the possible problem of tractability and recursion: how can you ensure that your system terminates (in a reasonable amount of time) if it relies on your extended functions, on functions that have been created for particular use cases (see Section 6), and on other libraries (e.g., those for geometries)?

Section 6

I already elaborated on the issues with this demonstration; the need to create additional functions for the use case. These extensions require additional knowledge engineering and coding activities, as well as deploying those.

Section 7.1

This section discusses the interoperability and flexibility of the approach. I agree to some degree, but from the moment one creates custom functions for use cases in one deployed system, these won't run on other systems, as they need to be deployed first. The representation of functions in [3] and [4] in a Linked Data context might be a suitable path for fetching functions from the Linked Data cloud, but that will not be of use if the functions also rely on the deployment of specific libraries.

The authors could spend some more time elaborating on the limitation of class instantiation.

For Section 7 (and other parts of the paper) the authors could refer to [8]; they present an overview of weaknesses and challenges of geospatial semantic web data that is relevant for this paper, of which one is query performance.

References

Be careful, some references contain mistakes in terms of capitalization (2), formatting (15), spacing (19 and 44), and so on.

As for the appendix, again make sure the descriptions match the queries, and I suggest using suitable long-term preservation platforms for the vocabularies and rules.

[1] https://www.w3.org/TR/vocab-data-cube/
[2] Blake Regalia, Krzysztof Janowicz, Song Gao: VOLT: A Provenance-Producing, Transparent SPARQL Proxy for the On-Demand Computation of Linked Data and its Application to Spatiotemporally Dependent Data. ESWC 2016: 523-538
[3] Christophe Debruyne, Declan O'Sullivan: R2RML-F: Towards Sharing and Executing Domain Logic in R2RML Mappings. LDOW@WWW 2016
[4] Ben De Meester, Wouter Maroy, Anastasia Dimou, Ruben Verborgh, Erik Mannens: Declarative Data Transformations for Linked Data Generation: The Case of DBpedia. ESWC (2) 2017: 33-48
[5] Christophe Debruyne, Eamonn Clinton, Declan O'Sullivan: Client-side Processing of GeoSPARQL Functions with Triple Pattern Fragments. LDOW@WWW 2017
[6] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem. 37-38: 184-206 (2016)
[7] Blake Regalia, Krzysztof Janowicz, Grant McKenzie: Revisiting the Representation of and Need for Raw Geometries on the Linked Data Web. LDOW@WWW 2017
[8] Kostas Patroumpas, Giorgos Giannopoulos, Spiros Athanasiou: Towards GeoSpatial semantic data management: strengths, weaknesses, and challenges ahead. SIGSPATIAL/GIS 2014: 301-310

Review #3
Anonymous submitted on 07/Jun/2017
Suggestion:
Reject
Review Comment:

General Comments.
This paper describes a general framework to extend functions in SPARQL to query IFC-based building data. These data are stored in a triplestore in an ifcOWL ontology structure. The extension concerns functions for schema level semantics, functions for instance level semantics, functions for product geometry and functions for spatial reasoning.

In my opinion, the proposal is too general to fit the focus and the quality required for the journal.
I suggest that the authors develop the idea, implement the different functions, and resubmit the paper with a more detailed description and results.

Specific comments
This approach is original in the "IFC environment". But in other domains, the use of specific data types (e.g., 2D shape types in geographical information systems) is usual. So the idea is good, but I don't believe that all the extensions are clearly required given the nature of the ontology. For example, the authors propose to convert the various geometry descriptions into triangulated boundaries. Then some analysis algorithms will be developed. What type of algorithms? Some made with SPARQL?
On the abstract: Please, could you explain why extended functions are required and not only the goal of this extension? Is it related to the nature of the IFC data model, which can be extended, or to the nature of ifcOWL, an ontology which can be linked to external resources?
I think it would be better to explain directly the value of your approach. Do you reach the goal you set out in the third sentence of this abstract?

The term “methodology” denotes the science that studies methods. Please check whether you are confusing it with the term “method”. Then the question is: if BIM is a method, why does the "M" stand for modeling? Nevertheless, I agree that IFC is an open canonical model defined to share building data between heterogeneous software developed for the AEC (and facility management) domain.
“Using BIM and IFC….”, I’m annoyed by this sentence. I can accept the term “BIM application”, but the term “IFC application” is not clear. IFC is a model. Applications export or import IFC files.
“As many researchers have discussed…”. The term “many researchers” is supported by only one paper, written by one of the authors of the present paper. Please be more precise (closed world nature of your sentence). Maybe you can place this sentence at the end of the paragraph (just before “Using the Ressource…”).
“However, BIM data models…”: the term “creation” is not really accurate, but I agree with the term “exchange”. One of the main problems of IFC is that people want to store IFC files, not only use them as an exchange data model. Moreover, people think that we can natively aggregate different IFC files by merging the lines of the files. As you present in your paper, it’s not so easy.
[38,3,33]: if the ordering is not important, please put the references in ascending order. Maybe you can delete reference 3, which was previously cited.
“The AEC domain [33]”. Already cited. Maybe you can choose one place or another reference.
“In this context…” The sentence is not clear. SPARQL seems to be one of the most popular languages to query RDF graphs (and thus ontologies expressed as RDF graphs). The term “more and more practical” is not well suited.
“in comparison with…” There is some ambiguity in this sentence. [6] develops a specific language to enable 3D topological analyses of a building. I’m not sure that SPARQL can do that. Maybe we can consider that the reasoning issues you target belong to one of the numerous domain-specific query languages. I think that the first part of the sentence, up to the references, can be deleted.
“By just using…AEC domain”. Please could you give us some use cases or functions that motivate this paper?
“For example…”. In the geospatial environment they used new concepts such as those defining the 2D description of a ground. They had to develop specific operators to manipulate geodata. Do you want to do the same thing? Please, rather than justifying your approach by explaining that others did it, present the types of operators you add in your BimSPARQL.
“There are currently…. 1)”: a set of functions for what?
“There are currently…. 2)”: What do you mean by lower-level constructs? Is it an issue?
Fig. 1: What is new in this figure compared with other approaches from the LDAC group?

“In the last two decades…”: IFC files are not used only in the AEC sector. Once the building is built, other stakeholders in facility management use IFC (see papers in this domain written since 2002, or reference [22] in your paper).

For the rest of the paper, the quality of writing is correct.
Unfortunately, sections 4 to 6 are not detailed enough to convince me.
The proposal is not complete, as it lacks a description of all the functions; it is a statement of research and development intent without benchmarks.