# SPARQL with Property Paths on the Web

Authors:
Olaf Hartig
Giuseppe Pirrò

Guest Editors ESWC2015

Full Paper
Review #1
By Pedro Szekely submitted on 13/Jan/2016
Suggestion: Accept

Review Comment:

The paper discusses the use of SPARQL queries with path patterns in a distributed evaluation context. It first discusses the impossibility of evaluating, in a formal way, SPARQL queries against the Web. It then defines three different semantics for the evaluation of such path patterns and extends them to the usual SPARQL connectors. Finally, because some of these semantics are still not practical, the paper defines the notion of Web-safe queries. The paper ends with two experiments that are very illustrative of the properties of the different semantics.

The paper is an extension of a previous ESWC paper. This is a very welcome extension: I find it far clearer than the initial paper. This paper lays clean formal foundations for the distributed evaluation of SPARQL queries. The proposed semantics enable the implementation of evaluators along sensible, precise rules. Most decisions are convincing and well explained. The experimental evaluation is also surprisingly insightful. So, it definitely should be accepted.

I have a few reservations about form, listed hereafter. I would love it if the authors could take them into account, but they are no obstacle to the publication of the paper.

In several instances, the authors would do well to connect the concepts they develop to other ones that are well known in the domain and barely mentioned in the document:

- I would consider that the context-based semantics formalises the follow-your-nose way of traversing the Semantic Web. Why is this not mentioned?
- Dereferencing is only mentioned in the second part (after p16, if I read well). However, it seems to me that dom\not\bot is the set of dereferenceable IRIs. Why not say so?
- A similarly easy explanation would describe Web-safe queries as those queries in which any variable used in subject position is necessarily bound earlier during the evaluation of the query. No such simple explanation is given to the reader.

It is possible that the explanations I offer are incorrect. However, it would still be nice to explain explicitly why.

There are, in my opinion, other vocabulary problems:

- In the abstract, and later, the opposition is between "data sources available as linked data", "on the web", "at the data level", and "web-graph navigation". Nothing is clear here. People who query DBpedia through a SPARQL endpoint think, legitimately, that they are querying linked data. I appreciate the difficulty of adopting an adequate vocabulary, but we are not there yet. I feel that the three key oppositions are between distributed querying over non-specified data sources (or distributed open-ended querying), federated dataset querying (that is, distributed and specified), and centralised querying.
- Similarly, opposing "reachability-based" and "context-based" is misleading, because "context-based" is also "reachability-based", but in another way.
- Finally, the term Web-safe is not really appealing. There could be many ways to be Web-safe, and this is only a specific one. "Safe-navigational queries" could be an option (there are others below).

I have a few questions about the experimental setting:

- Is any cache enabled or not?
- Is it possible to display the proportion of duplicate answers (and, following the discussion below, several categories of those, if there are any)?

Suggestions:

- First, resorting to figures to express the most important part of the definitions is ugly. This may be due to the format, but it is ugly anyway.
- On page 3, it would be good to have a picture of the "web of documents" and the web of data illustrating minimally their features (i.e., less complex than Figure 4). I am also surprised that the Web of documents seems to exclude SPARQL endpoints.
- p4: "PP patterns with blank nodes can be simulated using fresh variables": not in predicate position.
- p5: the result contains THE solution mapping.
- p7, 4.2: The definition of ScP-reachable should be a definition.
Moreover, it is difficult to read. At least put "d' is ScP-reachable in W''" right after d'\in D''.
- p8, l4: "we defineD".
- p8: cPPMatch should also be a definition.
- Definition 10: This is ugly, because the reachability condition depends on P, while it reads as if a function [[.]]_Gr had been defined, which is not true.
- p10: "both patterns are semantically equivalent under context-based semantics": it seems that they are equivalent under all the semantics defined in this paper, no?
- "in separation" -> "in isolation"?
- p11 and following: it would be clearer to keep P for path patterns and to use the symbol G for graph patterns.
- One could use \phi-safeness, since this property actually depends on the semantics used. That would have the merit of recalling that Web-safeness is not an intrinsic property but depends on the given semantics.
- Table 1 contains redundancies. Indeed, 15 entails 16, 10 entails 11, and 10 entails 12. So the meaning of this table is the same if 15 and 10 are suppressed.
- Note 28: "recursive": recursively enumerable? Well, we are talking about a definition, not a function.
- For the experimental results, please use patterns, because in grayscale it is difficult to distinguish colors.
- p18: "duplicates which results from finding a greater number of alternative path"? I am surprised. I thought that duplicates could not be generated through paths in the SPARQL 1.1 semantics, only through the use of the operators UNION and AND (as presented on p5). How can duplicates arise through paths? It would be interesting to compare this with the set semantics instead of the pseudo-bag semantics.
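
The informal reading of Web-safeness suggested earlier in this review (every variable in subject position is bound before it is used) can be made concrete with a small check. This is only a sketch of that reading, not the paper's definition. The function name `subjects_bound_in_order`, the encoding of triple patterns as string tuples, the `?` prefix for variables, and the fixed left-to-right evaluation order are all assumptions made for the illustration.

```python
from typing import Iterable, Tuple

# Hypothetical encoding: (subject, path expression, object), all plain strings.
TriplePattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Treat any term starting with '?' as a query variable (assumption)."""
    return term.startswith("?")


def subjects_bound_in_order(patterns: Iterable[TriplePattern]) -> bool:
    """Return True if every subject variable is bound by an earlier pattern.

    Patterns are assumed to be evaluated left to right. A variable counts
    as bound once it has appeared in the subject or object position of a
    pattern that was already evaluated.
    """
    bound = set()
    for subject, _path, obj in patterns:
        # A subject variable that has not been bound yet gives the evaluator
        # no IRI to start from, so the informal condition fails.
        if is_var(subject) and subject not in bound:
            return False
        # After evaluating this pattern, its variables are bound.
        bound.update(term for term in (subject, obj) if is_var(term))
    return True
```

Under these assumptions, a query that starts from a constant IRI, such as `[("ex:alice", "foaf:knows", "?x"), ("?x", "foaf:name", "?n")]`, passes the check, whereas `[("?x", "foaf:name", "?n")]` does not, because `?x` appears in subject position without having been bound.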