Review Comment:
I have enjoyed reading the article, and the authors present an interesting approach to extending SPARQL for linked building datasets. There are, however, a few issues that I believe need to be addressed.
There is no evaluation. The authors provided a demonstration that I did not find compelling (which I will explain later on). One of the aims was to simplify the SPARQL queries that one needs to formulate. The authors, however, did not report on an evaluation involving users, which makes this aspect difficult to assess. I can see how queries might hypothetically become simpler, but users need to learn a new set of functions (in different namespaces) in addition to understanding the different models.
Unlike GeoSPARQL, which provides a vocabulary, a set of topological relations, and a set of functions for extending SPARQL, the approach in this article is intended to be extensible. This was demonstrated in Section 6, where additional rules for IBC needed to be defined prior to executing the query in Listing 9 that uses the functions and relations (backed by SPIN) of Section 4. This poses, for me, several problems. First, in order for one to fire a similar query against other BimSPARQL-enabled endpoints, those additional rules need to be installed there as well.
Second, stakeholders need to go through some knowledge engineering activities for every use case or query they need to support. This demonstration shows how the BimSPARQL ontology (let's call it that) and rule base can be extended to support a wider set of queries, while I feel that the authors should also have demonstrated, or evaluated, the BimSPARQL "core" that has been defined. The evaluation of extensions and the creation thereof is different from the evaluation of the core functions and relations proposed by the authors.
As a side note: the extension is focused on IFC. So, in my opinion, IfcSPARQL seems a more appropriate name than the more generic sounding BimSPARQL.
The title implies the use of Linked Data, yet the article does not provide examples of federated queries, nor does it explicitly state that one needs to centralize different datasets. Federation will make the problem harder, especially if some patterns and functions need to be processed by other services that are not BimSPARQL-enabled. I do feel that the authors should either elaborate more on this aspect or rework the title a bit to mention RDF building data (for instance).
The article needs careful proofreading. In various places, articles were omitted, sentences were difficult to parse, some informal speech was added, or verbs were not correctly conjugated. I will provide a few examples in the remainder of the review.
Section 1
The importance of integrating data in AEC should be explained in a few sentences, or at least a pointer to a reference provided.
I suggest replacing "more and more" by "increasingly more" as the former is rather informal. (observed in various places in the article).
Remove the adverb "highly" in "highly limited". Adverbs such as "very", "highly", etc. should be avoided. There is a limitation, period.
In "RDF equivalent of [the] IFC data model", use "counterpart" instead of "equivalent" as the latter can be interpreted differently.
The statement "The extended functions in this research are compatible with existing SPARQL environments" is not true, unless the authors mean that the approach does not extend SPARQL's grammar. But even that does not mean it is compatible, as other non-BimSPARQL-enabled endpoints may throw an error if they do not recognize a particular filter function. The authors should carefully rephrase this.
Regarding the sentence ending with "... hence many platforms": that depends on whether the platform allows one to easily incorporate functions, e.g., by extending Jena's base classes, or relying on SPIN, or... I suggest putting a forward pointer to your specific implementation.
Figure 1 does not provide any information; I think it can be omitted.
Section 2
Explain proxy elements. The others are used later on in the paper or are self-explanatory.
"This extent" --> "This extension"
Rephrase "still way beyond", as it is informal.
"Hardcoding" instead of "hard-coding".
NL-sfb code is not defined here.
"Table X" is missing the actual reference to a table.
In "... building models formatted in e.g[.] IFC", I suggest replacing "formatted" with "represented".
The following is a comment on most of the queries in the article. Listing 1 retrieves resources that fulfill particular conditions; it does not check anything. The authors should carefully rephrase the descriptions of the queries, or rewrite the queries. The result of the query can subsequently be used to check something, e.g., an empty result implies there are no violations. Alternatively, an ASK query can be formulated to perform a set of checks on a dataset. The Data Cube vocabulary [1], for instance, prescribes a set of SPARQL ASK queries to check datasets for potential violations.
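To illustrate the distinction, here is a minimal sketch. The `ex:` namespace, the class `ex:Door`, the property `ex:width`, and the 0.8 threshold are hypothetical and are not taken from the article. A SELECT query retrieves the offending resources, and it is the caller who interprets an empty result as "no violations":

```sparql
PREFIX ex: <http://example.org/building#>

# Retrieval: returns every door narrower than the (assumed) threshold.
SELECT ?door ?width
WHERE {
  ?door a ex:Door ;
        ex:width ?width .
  FILTER (?width < 0.8)
}
```

An ASK query, by contrast, performs the check itself and returns a boolean:

```sparql
PREFIX ex: <http://example.org/building#>

# Check: true if at least one door violates the (assumed) threshold.
ASK {
  ?door a ex:Door ;
        ex:width ?width .
  FILTER (?width < 0.8)
}
```

Either form is fine, as long as the accompanying description matches what the query actually does.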
Can you add a reference to the construction of the ifcOWL ontology? This would strengthen the rather vague statement that it is "almost a one-to-one mapping".
What do you define to be "domain end users", or do you have a couple of examples?
In the last paragraph, you talk about wrapping relations as functions. I think it would be worthwhile to dedicate a subsection or paragraph (not in Section 2, but elsewhere, with a forward pointer in Section 4) to explaining the difference and formalizing how you create functions for relations and/or formulate new filter functions. This would help the reader understand that you (implicitly) state that rules are functions. A lot of your functions are actually rules, but the use of "function" might be due to the adoption of SPIN as a rule language in which functions can be defined. In other words, an implementation might be seeping through the design of the approach. Though the authors did state later on that other rule languages can be adopted, a careful revision of the text could make design and implementation less tightly coupled.
Section 3 – "Related Work" instead of "Related [R]esearch"
Section 3.1
What do you mean with "full" in "full CRUD"? Why not state that it provides CRUD functionality?
Put the reference for R-Trees behind R-Tree instead of at the end of the sentence.
"... and door etc." needs a comma after "door".
Why is BimRL "technically" a domain specific query language? Explain in one sentence or omit "technically".
In the last paragraph, the authors state: "Although some of them have provided programming interfaces for further extensions, the development work needs significant efforts and are usually limited by the data captured in its internal data model." I feel that this is, to some extent, also the limitation of the approach adopted by the authors. Except for the "core" of BimSPARQL, one needs to create additional rules to support more queries. This involves knowledge engineering (limited, yes, but still), getting familiar with SPIN (the internal data model of the authors' approach), being able to code the function, etc. So at the end of the article, the authors should put things in perspective with respect to that statement.
Section 3.2
"in that paper" instead of "this"
The article before "EYE reasoning engine" is missing.
"alternative to" instead of "alternative of"
Section 3.3
This section is missing some related work. [2] presented an approach to adding custom functions to SPARQL queries. Though not for SPARQL, both [3] and [4] presented an approach for representing and using functions in generating RDF from non-RDF data sources. The former contains the description in the mapping, and the latter retrieves the function via Linked Data principles.
Recently, [5] presented an approach to extend a Triple Pattern Fragments [6] client with GeoSPARQL. In this approach, it is the client that is extended and the approach thus makes it backward compatible with any server. That allows one to avail of said functions if the data is available on servers. One can also connect to multiple servers, hence formulating – in effect – federated queries. A similar approach can be undertaken for BIM models.
Section 4
The first sentence needs to be rephrased; "target source" is probably the focus, and "extended functions" should be "extension", as it otherwise implies functions to be extended (rather than the set of functions).
Typo in "implemented"
I would appreciate seeing how the authors went about extracting properties and relationships from the sources. Is there a dataset available with some sort of matrix (properties vs. source) where cells are checked and an explanation provided? Can the authors say something about the coverage? Are they sure they have provided a complete set of properties and relationships? Were the use cases complete? This also sounds like a knowledge engineering exercise, as they will provide the input for the different functions in subsequent paragraphs. How was that set validated or evaluated?
Towards the end of the first paragraph, the authors already hint at the extensibility of the approach. I feel that this is where the difference between BimSPARQL "core" and its extensions should be brought up. As mentioned above, this distinction will make the contribution clearer, as well as its extensibility (and when it is necessary/appropriate to extend).
"independent from" instead of "independent with"
The end of Section 4.1 does not highlight a potential downside of RDF property functions, nor does Section 7.3. The downside is that people might have the impression that they are retrieving triples that have been explicitly asserted, while the use of filter functions gives them more control. And how would one deal with inconsistencies between things that have been asserted and things that have been inferred? Finally, the authors state that materialization can improve performance, but they have not conducted an experiment demonstrating that. So I suggest the authors rephrase that bit.
Section 4.1
Second half of the first paragraph needs to be rephrased. And, again, the query does not check, but retrieves resources fulfilling certain conditions.
The authors should provide URIs for the namespaces in Table 1.
What is the meaning of the different arrows in Figure 2? I see a mixture of different formalisms, and hence the meaning of the arrows is difficult to comprehend, even though it is "intuitive".
Section 4.2
"[The] semantics of these..." misses the "The"
I feel that there is an implicit criticism of a) IFC OWL being complex, and b) a lot of the semantics being stored in literals (strings) rather than URIs. Am I correct?
Section 4.3
In this study you transform geometries into a particular representation for analysis purposes. Would that be something you would prescribe for all BimSPARQL implementations, or is that left up to the implementation? This is something that should be made clear. Figure 5 does *not really* explain the sentence it is put in.
Section 4.4.1
There is no indication of related work or comparing the coverage with other vocabularies/standards such as GeoSPARQL. Table 4 only highlights four, please indicate how many you have and how broad your coverage is. The descriptions in Table 4 need to be rewritten and the one for intersects contains a mistake (copy/paste error of disjoints).
In Listing 5, walls touch a door. As a naïve reader, I would think that walls contain a door. But walls have void elements that contain doors, windows, and whatnot. It might be worth clarifying that to the reader.
Section 4.4.2
You state that you currently offer the functions in Table 5, but the caption of that table indicates these are examples. Is that table complete or not?
Section 4.5
You choose to materialize RDF, but later on allude to processing geometry data on the fly at query runtime. I wonder whether this really matters at this point in the article, as these could be considered implementation strategies. But I do understand why geometries are treated differently from the other functions. That said, it might be worthwhile referring to other related work looking at a particular problem in GeoSPARQL (duplication of the representation of geometries): storing both the literal and the geometry representation for indexing. This has been touched upon in [7].
Section 5
I suggest calling this section "an implementation", emphasizing that it is a way to implement the system.
This section misses the aforementioned related work on representing and reusing functions in RDF, which facilitates interoperability, reuse, transparency and even traceability.
It also refers back to Listing 2, but it might be good to list the full SPIN listing of that function here (as it is how it is implemented). Having at least one SPIN listing gives the reader an idea how the other queries in the appendices are wrapped.
Since you mention here that SPIN rules can call other SPIN rules, it might be worthwhile mentioning the possible problem of tractability and recursion. How can you ensure that your system terminates (in a reasonable amount of time) if it relies on your extended functions, on functions that have been created for particular use cases (see Section 6), and on other libraries (e.g., those for geometries)?
Section 6
I already elaborated on the issues with this demonstration; the need to create additional functions for the use case. These extensions require additional knowledge engineering and coding activities, as well as deploying those.
Section 7.1
This section discusses the interoperability and flexibility of the approach. I agree to some degree, but from the moment one creates custom functions for use cases in one deployed system, these won't run on other systems as they need to be deployed there first. The representation of functions in [3] and [4] in a Linked Data context might be a suitable path for fetching functions from the Linked Data cloud, but that will not be of use if the functions also rely on the deployment of specific libraries.
The authors could spend some more time elaborating on the limitation of class instantiation.
For Section 7 (and other parts of the paper) the authors could refer to [8]; they present an overview of weaknesses and challenges of geospatial semantic web data that is relevant for this paper, one of which is query performance.
References
Be careful, some references contain mistakes in terms of capitalization (2), formatting (15), spacing (19 and 44), and so on.
As for the appendix, again make sure the descriptions match the queries. I also suggest using suitable long-term preservation platforms for the vocabularies and rules.
[1] https://www.w3.org/TR/vocab-data-cube/
[2] Blake Regalia, Krzysztof Janowicz, Song Gao: VOLT: A Provenance-Producing, Transparent SPARQL Proxy for the On-Demand Computation of Linked Data and its Application to Spatiotemporally Dependent Data. ESWC 2016: 523-538
[3] Christophe Debruyne, Declan O'Sullivan: R2RML-F: Towards Sharing and Executing Domain Logic in R2RML Mappings. LDOW@WWW 2016
[4] Ben De Meester, Wouter Maroy, Anastasia Dimou, Ruben Verborgh, Erik Mannens: Declarative Data Transformations for Linked Data Generation: The Case of DBpedia. ESWC (2) 2017: 33-48
[5] Christophe Debruyne, Eamonn Clinton, Declan O'Sullivan: Client-side Processing of GeoSPARQL Functions with Triple Pattern Fragments. LDOW@WWW 2017
[6] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem. 37-38: 184-206 (2016)
[7] Blake Regalia, Krzysztof Janowicz, Grant McKenzie: Revisiting the Representation of and Need for Raw Geometries on the Linked Data Web. LDOW@WWW 2017
[8] Kostas Patroumpas, Giorgos Giannopoulos, Spiros Athanasiou: Towards GeoSpatial semantic data management: strengths, weaknesses, and challenges ahead. SIGSPATIAL/GIS 2014: 301-310