Best Practices for Publishing, Retrieving, and Using Spatial Data on the Web

Tracking #: 1711-2923

Authors: 
Linda van den Brink
Payam Barnaghi
Jeremy Tandy
Ghislain A. Atemezing
Rob Atkinson
Byron Cochrane
Yasmin Fathy
Raúl García-Castro
Armin Haller
Andreas Harth
Krzysztof Janowicz
Şefki Kolozali
Bart van Leeuwen
Maxime Lefrançois
Josh Lieberman
Andrea Perego
Danh Le Phuoc
Bill Roberts
Kerry Taylor
Raphael Troncy

Responsible editor: 
Pascal Hitzler

Submission type: 
Survey Article
Abstract: 
Data owners are creating an ever richer set of information resources online, and these are being used for more and more applications. With the rapid growth of connected embedded devices, GPS-enabled mobile devices, and various organizations that publish their location-based data (e.g., weather and traffic services), maps and geographical and spatial information (e.g., GIS and open maps), spatial data on the Web is becoming ubiquitous and voluminous. However, the heterogeneity of the available spatial data, as well as some challenges related to spatial data in particular, make it difficult for data users, web applications and services to discover, interpret and use the information in large and distributed web systems. This paper summarizes some of the efforts that have been undertaken in the joint W3C/OGC Working Group on Spatial Data on the Web, in particular the effort to describe the best practices for publishing spatial data on the Web. This paper presents the set of principles that guide the selection of these best practices, describes best practices that are employed to enable publishing, discovery and retrieval (querying) of this type of data on the Web, and identifies some areas where a best practice has not yet emerged.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Xiaogang Ma submitted on 02/Oct/2017
Suggestion:
Minor Revision
Review Comment:

The manuscript #swj1711 “Best Practices for Publishing, Retrieving, and Using Spatial Data on the Web” is suggested for publication after minor revision. For comments and suggestions see below.

The manuscript gives a nice summary of best practices for publishing geospatial data on the Web. The manuscript covers three major parts: principles for selecting, evaluating and describing best practices; a list of key requirements and corresponding best practices; and a list of gaps and some emerging practices. The manuscript is informative and is written in fluent English, which makes the reading enjoyable. The manuscript will be of great value to researchers in the domain of the geospatial semantic web, and will also benefit the web science community in general. I only have a few small comments and suggestions.

First, I have a concern about the scope of ‘spatial data’ stated in Section 1.1 and the coverage of examples in the other sections. In Section 1.1, the definition of spatial data ‘Any data that has a location component can be viewed as spatial data’ and the explanation of location ‘A location component is a reference to a place on earth or within some other space (e.g., another planet, a shopping mall, or a person’s brain)’ indicate a very broad coverage. However, in Sections 3 and 4, most examples are relevant to georeferenced data. Will the OGC-W3C Spatial Data on the Web working group cover more non-geospatial examples in its future work? A few sentences in the text may help readers understand the scope.

Second, I suggest that the authors draw a diagram to show the relationships between the principles listed in Section 2 and the key requirements listed in Section 3. As mentioned above, this is a very informative manuscript and it covers many knowledge items in GIScience and web science. A conceptual diagram will help readers get a quick overview of the key messages.

Third, since the OGC-W3C Spatial Data on the Web working group has already published a note (https://www.w3.org/TR/2017/NOTE-sdw-bp-20170511), it would be better if the authors cited that document. This might help readers find more details if they are interested in a certain best practice or want to know more about the work of the working group.

A few small issues:
In the author affiliations: g and l are the same.

There are a few embedded web links in the text, such as in Section 1.2 and Section 2. Full URLs or citations could be given instead.

In the second paragraph of Section 3.3 there is a sentence “Yet a Spatial Thing is a real or conceptual phenomenon”. It is hard to understand it in the context of that paragraph.

The last sentence in Section 3.4: ‘… which is…’ should be ‘… which are…’.

The fourth paragraph of Section 3.9: The sentence ‘Google’s Structured Data testing tool shows if schema.org markup on a web page is recognized’ should be revised.

Review #2
By Christoph Schlieder submitted on 09/Oct/2017
Suggestion:
Major Revision
Review Comment:

The paper describes the best practice recommendations identified by the joint W3C/OGC working group on Spatial Data on the Web. Given the ubiquity of spatial data, the topic is of interest not just to those who are involved in creating and maintaining spatial data infrastructures but also to other data providers and users who are not experts on geospatial data.

It is difficult to think of a group of people who are in a better position for communicating the results of the working group. The first three authors of the paper have edited the OGC/W3C Working Group Note "Spatial Data on the Web Best Practices", a note that has been published as a W3C technical report in May 2017 (https://www.w3.org/TR/sdw-bp). The paper submitted to the SWJ shares almost half of its contributors with the technical report.

This raises the question of how the paper relates to the W3C document. As a reader of both texts who did not participate in the discussion process that led to the recommendations I have difficulties in figuring out what exactly the authors intend to add to the technical report by writing the paper. This is my main criticism of the otherwise very interesting and readable paper.

The abstract suggests that the paper provides a kind of digest, which abridges the 96 pages of the W3C document ("summarizes some of the efforts"). Not all readers of the SWJ might need that kind of assistance, though. W3C best practice recommendations are usually written in a quite accessible way. The technical report, for instance, numbers and lists the recommendations in an overview section. Unfortunately, such an overview is missing in the paper making it less readable in this respect than the W3C document. The fact that currently several of the paper's subsections refer to the recommendations by their numbering in the technical report creates confusion.

Besides abstracting the W3C document, the paper seems to pursue additional objectives, but they are less clear. The introduction mentions the objective of publishing the recommendations ("This paper is the first publication of the best practices"). I do not see a basis for this claim. After all, the technical report has already been published.

Another objective consists in providing background information ("presents the rationale underlying the selection of best practices"). While the technical report contains explanations of geospatial concepts, best practice examples, and a rationale for each recommendation, the paper contributes further material. This is where I see most of the value of this companion paper. For instance, the comment in section 2 on the "perceived RDF bias" of prior versions of the recommendations is interesting since it highlights a point of disagreement and helps to better understand that recommendation 3 on linking data is not necessarily one on linked data.

I assume that the authors endorse all 14 OGC/W3C recommendations, but they never say so explicitly. More importantly, they do not explain why they organize the recommendations differently from the technical report. This refers to section 3 of the paper and its mapping onto section 12 of the report, which lists the best practice recommendations. From the current version of the paper, it is difficult to identify the individual recommendations or even to tell their number. Some of the subsections (e.g. 3.6 Thematic layering and spatial semantics) seem to relate to several of the OGC/W3C recommendations while recommendation 9 on relative positioning is not covered at all.

The issues mentioned can be addressed by improving the paper's introduction and by providing additional information in some subsections of sections 3 and 4. Detailed remarks follow below.

In conclusion, I think that the paper deals with a timely and highly relevant topic but that it should undergo a major revision before being published.

---
Comments on individual sections
---

1 Introduction

The existence of the W3C document and the role of the submission as a companion paper should be mentioned in the introduction. Currently, the W3C document does not even appear in the list of bibliographic references.

The intention of the companion paper should be clearly stated: a digest? an update? a comment? As a reader, I should understand immediately whether with my specific interests I should read the paper, the W3C document or both.

To make the paper self-contained, provide a table with the 14 OGC/W3C best practice recommendations. It would help a lot if the table also shows how to map the subsections of section 3 onto the recommendations. Alternatively, the table could go into an appendix. In both cases, the subsections could continue to refer to the recommendations by the numbering used in the technical report.

3.1. Spatial Reasoning

"Spatial reasoning" is a misnomer since the section deals with how to best publish geometries.

Wherever best practices are identified which involve standards, like GML or GeoJSON in this section, references to the standards should be included.

This section mentions geospatial concepts, which are defined in the following section, e.g. "coordinate reference system". The text should inform the reader that definitions are going to follow. A definition of "reference geometry format" is missing altogether.

3. The Key Requirements and Best Practices

The introduction states as a criterion for the best practice cases that they are "linked to at least one example of a non-toy dataset". The subsections of this section should provide at least one reference to an example.

3.2. Coordinate Reference Systems and Projections

Curiously, the paper does not identify the most commonly used projected reference system, namely WGS 84 / Pseudo-Mercator (EPSG:3857). I am aware of the fact that part of the cartographic community still has some objections to Pseudo-Mercator, but it is used by virtually all online map providers (Google, Bing, OSM, etc.) and it is mentioned in the W3C document as "de facto Web-standard CRS".

Fig. 3: It should be mentioned that the tool shows the distortions for a specific projected CRS, very likely WGS 84 / Pseudo-Mercator (EPSG:3857). Another flattening of the sphere may produce rather different distortions.

The text first refers to figure 4 and then to figure 3. This could be avoided by renumbering the figures.

3.3 Spatial identifiers

Provide references (links) to the examples of resources (DBpedia, GeoNames) and, more importantly, to standards (R2RML) and tools (LDproxy).

3.6

"One of the best practices ..." Which one?

3.7. Temporal Dimension

The subsection repeatedly refers to individual recommendations of the W3C document by the numbering used in that document. As stated in the general comments, a table should list the recommendations together with their numbering to make the paper self-contained.

Why would an accuracy of 3m not be acceptable for street-level directions to a shop?

3.9. Crawlability

"Several examples of spatial things published in this way are provided in Best Practice 2." Include at least one example in the paper.

4.1. Representing geometry on the web

This repeats much of what has already been said in section 3.1: for instance, GML and GeoJSON as the most popular ways to publish geometries on the web or the explanation given for EPSG codes. While some of these redundancies are present also in the original technical report, they are more disturbing in the much shorter paper.

Note that not all common CRS have an EPSG code. Probably the most prominent case is the encrypted GCJ-02 datum, which causes the China GPS shift problem. This datum is massively used for feature geometries in that part of the world but it is very unlikely that it will ever be registered by EPSG because this would involve publishing the details of the encryption. For a different reason, it was far from clear that Pseudo-Mercator would obtain an EPSG code.

4.2. A spatial data vocabulary

Add references to the data set examples from the Netherlands, similar to those provided for the data sets from France.

4.3. Spatial aspects of metadata

Add a reference for the national general data portal of the Netherlands mentioned in the text.

Review #3
Anonymous submitted on 20/Oct/2017
Suggestion:
Major Revision
Review Comment:

Anonymous review for this paper (handled outside the system):

This paper describes a best practice established collaboratively by the W3C and OGC standards bodies. It is in this sense authoritative, representing the major practitioners in the field, and is in scope for SWJ by targeting the "application" subject area. The topic is also timely and should be of interest to all publishers and users of geospatial data on the web, which is a vast audience. The paper is generally well written and structured, and contains significant content worthy of publication, with its innovation being the selection and collation of optimal best practices into a single new body. However, several deficiencies prohibit publication in its current form, thus a recommendation to undertake a major revision is in order.

Major issues:

The main problem with the paper is its mode of presentation, as it is written more in the style of a technical report than of a journal article. To correct this, the paper will need to clarify its categorization, namely research or survey article, and then re-focus accordingly. As a research article (SWJ "Full Paper" or "Application Report"), which I recommend, it needs to focus better on innovation. For each sub-section of 3, this would include more on existing best practices (e.g. various national ones) and then more rationale for the selection of the optimal one, perhaps by starting each sub-section with the current state of practice, followed by the new recommendation to highlight its benefit. Each sub-section should, generally, also clarify what is special about geospatial vs non-geospatial data on the web. If the paper is to go the survey article route, then it needs a more comprehensive literature review for each section; as it is, referencing is targeted rather than comprehensive.

The second significant problem is the amount of content - the paper is somewhat bloated with unnecessary material and some repetition, which should be eliminated. It needs to concentrate on issues arising from web representation of geospatial entities, rather than general issues in geospatial representation. For example, it should eliminate lengthy descriptions of core GIS concepts, such as CRS in section 3.2 or scale in 3.4, which appear to be background. Likewise, it should avoid repetition of non-geospatial best practices, such as much of section 3.3 (spatial ids). In other cases, it needs to avoid repetition with the paper itself, such as in the WFS discussion in 3.3 or the geometry discussion in 4.1.

A final major problem is conceptual: the paper needs to better clarify what spatial means in 1.1. Is it limited to things located in physical space, or does it also include things located in abstract spaces, e.g. mathematical spaces or classification spaces, such as the linear space of the Celsius temperature scale? The paper is unclear here, i.e. "a Spatial thing is real or conceptual" (isn't a concept real too?), and geospatial seems to be used synonymously with spatial without being explicitly equated. Moreover, 1.1 also suggests that temporal data is somehow similar to spatial data ("Temporal data has similar characteristics"), without elaboration, whereas the nature of space and time, as well as the behavior of things in them, is usually seen as quite distinct. Finally, the paper should use these concepts and terms consistently throughout.

Minor issues:

- Language: there are a few typos and grammar problems sprinkled throughout. A good English proofread will catch these. A partial list includes:
  - 1.1: "Earth" not "earth"
  - 2: "helped a lot" - colloquial
  - 3.4: "for metadata" not "of metadata"
  - 3.7: "for enhancing" not "of enhancing"
  - 4.3: "discoverable" perhaps instead of "findable"
- References needed for all standards in Table 1.
- Fig 1: needs more explanation in the text about overlaps and non-overlaps.
- 1.2: contributions need to emphasize innovation, i.e. new authoritative collection of best practices
- 3: "publishing [geo?]spatial data [on the web]"
- 3.1: "Spatial Reasoning" - is mistitled, as it does not discuss inferencing, only geometry representation. That said, this is now a gap in the best practice - web reasoning with geospatial data is not covered and should be discussed in 4.
- 3.2: remove and integrate into 3.1, or condense significantly.
- 3.3: "exposing attribute [values]"
- 3.3: RESTful API not just for WFS, but could be for any OGC web service
- 3.7: is general, without anything particularly new for web application
- 3.9: clarify "Spatial Data services" = OGC web services and/or other APIs?
- 3.10: "0 to 3 or more dimensions", clarify, e.g. 0 to 3 spatial dimensions, and others such as temporal or thematic.
- 3.10: the discussion of "production environment" should be moved to the top in the guidelines
- 4.2: "geometrical descriptions of their boundaries" - boundaries of the datasets or of the features contained therein? Also, this para needs better explanation.

