Challenge-derived design principles for a semantic gazetteer for medieval and early modern places

Philipp Schneider
Jim Jones
Torsten Hiltmann
Tomi Kauppinen

Christoph Schlieder

Survey Article
In recent years gazetteers based on semantic web technologies were discussed as an effective way to describe, formalize and standardize place data by using contextual information as a method to structure and distinguish places from each other. While research concerning semantic gazetteers with regard to historical places has pointed out the importance of enabling the creation of a global and epoch-spanning gazetteer, we want to emphasize the importance of taking a domain oriented approach as well - in our case, focusing on places set in medieval and early modern times. By discussing the topic from the historians’ perspective, we will be able to identify a number of challenges that are specific to the semantic representation of places set in these time periods. We will then do a survey of existing gazetteer projects that are taking historical places into account. This will enable us to find out which technologies and practices already exist, that can meet the demands of a gazetteer that considers the time specific geographic, social and administrative structures of medieval and early modern times. Finally we will develop a catalogue of design principles for such a semantic gazetteer. Our recommendations will be derived from these existing solutions as well as from our epoch-specific challenges identified before.
Review #1
By Carsten Keßler submitted on 08/Aug/2018
This paper focuses on challenges in the design of historical gazetteers with a focus on medieval and early modern places. While the paper discusses an interesting topic and is generally well written (barring the occasional typo), I have several concerns about it, especially in relation to its submission to SWJ.

My main criticism is that the paper claims to provide design principles, but it really only summarizes design challenges, without clear guidance on how to address them in the design of a historic gazetteer. None of these challenges are new as such, and the paper does not really present any clear design principles to tackle them (even though the title suggests that, and despite the fact that clear principles already exist for some of them). For publication in SWJ, I would expect such a paper to clearly state (and ideally formalize, e.g. as ontology design patterns) the design principles that help tackle those challenges.

Moreover, the paper claims on several occasions that the requirements for a historical gazetteer focusing on medieval and early modern times are different from the requirements for other historical gazetteers, but fails to clearly state why. One example is the understanding of governance over people vs. a clearly defined region – isn't this also the case for other historic periods? The same goes for the argument that a domain-oriented focus is different from a federated system – IMO one of the main reasons to use semantic web technologies for gazetteers is integration across gazetteers (p. 2, right column). The authors seem to imply that a domain-oriented focus prohibits that.

Having said that, the paper can still be a very useful contribution, but I think it would be a better fit for a digital humanities outlet, rather than SWJ. The audience for a DH outlet would probably appreciate this overview to avoid pitfalls in historical gazetteer design, and also be able to better follow the numerous examples, some of which are hard to follow for someone without a history background. Finally, I'm not so sure the topic is well-suited for a survey paper. The authors seem to be making the point that what they want to do is different from what everybody else has done so far around historical gazetteers. That sounds more like the "motivation" chapter of a PhD thesis to me than the outset for a survey paper. I think one could build a nice survey paper around section 4, but then the paper would also need to add a substantial amount of references to existing modeling approaches (with an appropriate discussion of them).

Some more specific comments that may help to improve the paper:

- p.2: "Thus, to understand and work with places, it is crucial to distinguish them from other ones on a conceptual as well as a technical level." – Explain what you mean by the technical level.
- In section 2, it sounds like you are mixing the administrative hierarchy of place instances and the class hierarchy of place types. This should be rephrased/clarified that those are two different things.
- At the end of section 2, [1] would make a good reference. (In fact, I was a bit surprised that this is not cited, especially given that Jim is an author on both of them...)
- For the problem described in section 3.1, [2] could be useful
- p.4: "Physical objects are places that can be located in the actual world" – I would replace "located" with "observed" here – otherwise you are implying that fiat objects cannot be located
- The comparison of different models/ontologies in section 4 is really useful!
- p.12: "This open world assumption has the great advantage that Wikidata theoretically allows any given form of classification and structure for modelling historical place data. On the other hand, this makes querying the data much more difficult, since a user has no overview over all concepts, their meanings, and how they are related to each other." – this applies to ANY approach based on semantic web technologies and it is in fact a strength, not a weakness! And you even ask for it yourself in section 5.1, where you (correctly) argue for the use of both generic and specific place concepts!
- p. 17: "To capture these features in a gazetteer for the medieval and early modern world, a new model that understands ruling as an interconnection between places, non-governmental institutions and people has yet to be developed." – I was hoping to see an approach that shows how to tackle this in the paper.

[1] Jim Jones, Werner Kuhn, Carsten Keßler and Simon Scheider (2014) Making the Web of Data Available via Web Feature Services. In Joaquín Huerta, Sven Schade, Carlos Granell: Connecting a Digital Europe Through Location and Place. Springer Lecture Notes in Geoinformation and Cartography 2014: 341–361. DOI:10.1007/978-3-319-03611-3_20

[2] Johannes Trame, Carsten Keßler and Werner Kuhn (2013) Linked Data and Time – Modeling Researcher Life Lines by Events. In Thora Tenbrink, John Stell, Antony Galton, Zena Wood: Spatial Information Theory. 11th International Conference, COSIT 2013 Scarborough, UK, September 2013 Proceedings. Springer Lecture Notes in Computer Science Volume 8116: 205–223. DOI:10.1007/978-3-319-01790-7_12

Review #2
By Martin Doerr submitted on 06/Sep/2018
Minor Revision
This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

This paper is very insightful about the historical reality and the complexity of modelling place-related entities.
An overall excellent analysis, a good introductory text for researchers and practitioners. The coverage of aspects of complexity of place-related historical phenomena is very good, in particular of all things that do not fit well with most of the current practice of designing gazetteers, or with a naive application of logic. Its value lies in this deep understanding of the domain, more than in the analysis of the ontological representations.

Since there is currently not any satisfactory implementation for cultural spatiotemporal semantic gazetteers, all struggling with inadequate simplifications, this paper is quite important for the Semantic Web. It is well written.

The literature references are rich and quite comprehensive. I suggest that the authors add Papadakis, M., Doerr, M., & Plexousakis, D. (2014). Fuzzy Times on Space-Time Volumes. eChallenges e-2014, 2014 Conference, Belfast, 29-30 October 2014 (pp. 1-11). IEEE (978-1-905824-45-8). The latter may give the authors a further understanding of the possibilities of the CIDOC CRM and its extensions.

The paper could say a bit more about topological relations (spatial, temporal and spatiotemporal), and about how semantic relations and topological ones can justify each other, such as creation becoming a terminus ante quem for use etc.

I would like to comment briefly on some fundamentally alternative views that the authors may at least mention, since they are core ideas of the CIDOC CRM and, in my opinion, are suited to resolving a lot of the riddles about places:

The paper uses Yi-Fu Tuan's definition: "a center of meaning constructed by human experience". Personally, I regard this well-received definition as a major source of ontological confusion.
A clear-cut concept of identity must be based on an adequate definition of substance (see David Wiggins, "Sameness and Substance Renewed", ISBN-13: 978-0521456197, ISBN-10: 0521456193). Here we confuse the substance of (1) geometric extent, (2) the materiality present or activities going on within a geometric extent or the human intentions about a geometric extent, and (3) the relationships of meaning.

page 2: "The name as well as the location of a place is just a designation, while the cultural and temporal setting in which it has certain properties, make it unique as an entity".

The identity-providing substance is not that of "place", but that of the concepts under (2) above. What is observable are the latter. Place is where they happen to be, not what they are.

What makes them a "place" is a psychological-linguistic projection, like a pars pro toto, focussing on things having spatial projections and some spatial stability, and therefore calling them "place". Tuan confuses the identity of "having a place" with "being a place".

The authors should mention, when describing the CRM, that E53 Place has this different definition (dependent pure geometric extent), and because of that it can describe the relations to events and spatial ambiguity or indeterminacy in a consistent way:
The authors write: "Such a contextual approach offers a more accurate description of what distinguishes a place from another one." What is the substance of this context that provides an objective distinction?
and Page 3: "Because most places and territories prior to the 19th century lack clearly defined borders, this approach has many advantages for modelling the fuzziness of historical places."

Also, a lot of the subjectivity and domain-specific views the authors describe ("This paper will take a step back from this global perspective") would not be necessary if the substance of place were adequately defined. A medieval rulership is a quite objective fact, comparable to ruling all over the world. ("One should note, that such a conceptualization is not universal but represents a specific view on reality [14, p. 84].")

I suggest that the authors question a bit more the concept of "hierarchical relations" on page 3 and page 5. Firstly, the phenomena described are often not a tree structure, but rather a DAG. Secondly, what is called hierarchy is a semantic compression of geometric inclusion (missing overlaps), and part-of of material or social phenomena. The idea to press named places into a hierarchical structure simultaneously being geometric inclusion and semantic part-of, as if they were terminology, is a major cause of inadequacy of current gazetteer models. Semantic part-of can justify geometric inclusion, but not the other way round. Semantic part-of has different semantics for different phenomena, such as administration, building parts or geological areas. An IsA hierarchy of terms is not among them.
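The DAG point can be made concrete with a minimal sketch in Python. All place identifiers and the two part-of kinds below are hypothetical and serve only as illustration; the sketch assumes that each part-of statement carries its own semantics, so that a place can have several wholes at once without forcing them into one tree.

```python
from collections import defaultdict

# Hypothetical part-of statements: (part, whole, kind of part-of).
# The identifiers are invented for illustration, not taken from any gazetteer.
PART_OF = [
    ("parish_A", "deanery_B", "ecclesiastical"),
    ("deanery_B", "diocese_C", "ecclesiastical"),
    ("parish_A", "district_D", "administrative"),
    ("district_D", "duchy_E", "administrative"),
]


def build_parents(statements):
    """Map each part to the set of (whole, kind) pairs it belongs to."""
    parents = defaultdict(set)
    for part, whole, kind in statements:
        parents[part].add((whole, kind))
    return parents


def ancestors(node, parents, kind=None):
    """Collect all wholes reachable from node, optionally for one kind only."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for whole, whole_kind in parents.get(current, ()):
            if kind is not None and whole_kind != kind:
                continue
            if whole not in seen:
                seen.add(whole)
                stack.append(whole)
    return seen


def is_tree(parents):
    """A tree allows at most one direct whole per part."""
    return all(len(wholes) <= 1 for wholes in parents.values())


PARENTS = build_parents(PART_OF)
```

Here `is_tree(PARENTS)` is false because `parish_A` has two direct wholes, while filtering `ancestors` by kind recovers a separate chain for each part-of semantics.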

Page 4: "Firstly, one can distinguish between fiat objects and physical objects. Fiat objects are virtual spaces conceived by men [4, p. 135]." The authors should notice that the CRMgeo declarative places are "fiat" objects. The opposite are "bona fide" objects, which are the phenomenal places, including extents of physical objects.

I regard this as wrong: "But less clearly defined spaces such as the Finno-Ugric language group or ’the Christian world’ can be understood as fiat objects, too.". It confuses fuzziness with virtual conception. The extent of the Finno-Ugric language group is defined by observation, not by conception, albeit fuzzy and prone to subjective differences.

Page 4, third and last paragraph is confusing and should be rewritten: "Place concepts like duchy, republic, prince-bishopric, or parish are specific terms, and their meaning is related to a certain historical context, while general concepts can be understood as concepts of broader categories." and "The assumption that a duchy and a republic can be grasped by the same concept, like secular dominion, is a very broad historical simplification."

The authors confuse here the question of (1) IsA generalization with (2) the adequacy of specialization and (3) the use of terms in a system. All concepts have broader generalizations (IsA). No concept grasps the specificity of an individual. Even "duchy" doesn't. The question is the adequacy of the term for the intended documentation. "Duchy" and "republic" are secular dominions, regardless of all obvious differences. The idea that a classification makes a prejudice about missing more detailed features is wrong. It may only be insufficient for the purpose of a documentation system, but it is not a "historical simplification".

Page 5: "A more accurate way would be to model ruling as relations between agents, privileges and places."
In the CRM, this is modelled as an instance of either Period or Activity.

page 7: "Using GIS practices as an example, the easiest way to model time is to understand the whole dataset as a representation of the world at a certain point in time." In principle, this is really never known, not just hard to achieve. It will always be an interpolation of events and fuzzy boundaries.

Page 8: "Like the Julian and Gregorian calendars, these can exist simultaneously.": It is more practical to normalize all calendars with known relations to the modern one. Only Egyptian king lists etc. pose a problem of an unknown point of reference.
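As an illustration of such a normalization, the following is a minimal Python sketch that maps a Julian calendar date to the Gregorian calendar via the Julian Day Number, using the standard integer formulas. The function names are my own, and the sketch assumes positive (astronomical) year numbers and a year that starts on 1 January, which many medieval sources do not follow.

```python
def julian_calendar_to_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March-based month index 0..11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """(year, month, day) in the Gregorian calendar for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097       # completed 400-year cycles
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461         # completed years within the cycle
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153          # March-based month index 0..11
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year, month, day):
    """Normalize a Julian calendar date to the Gregorian calendar."""
    return jdn_to_gregorian(julian_calendar_to_jdn(year, month, day))
```

For example, `julian_to_gregorian(1582, 10, 5)` yields `(1582, 10, 15)`, the first day of the Gregorian calendar at its introduction.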

"Sometimes it may not be possible to state a date as a definite starting point of an event, but only a terminus
ante quem (or post quem);" : A date is not a point either, but both a terminus ante quem and a terminus post quem, from the beginning of the day to its end. Rather, all time information should be understood as approximations, never "definite". To regard a day as a point in time is an arbitrary restriction of precision.
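A minimal Python sketch of this reading follows: every time statement is a pair of bounds, and a calendar day is simply a narrow pair. The class and function names are hypothetical, and the sketch relies on Python's `datetime`, which uses the proleptic Gregorian calendar and supports years from 1 onwards only.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class TimeBounds:
    """Approximate time information; None means the bound is unknown."""
    not_before: Optional[datetime]  # terminus post quem
    not_after: Optional[datetime]   # terminus ante quem


def bounds_for_day(d: date) -> TimeBounds:
    """A day is an interval from its beginning to its end, not a point."""
    start = datetime(d.year, d.month, d.day)
    return TimeBounds(start, start + timedelta(days=1))


def may_overlap(a: TimeBounds, b: TimeBounds) -> bool:
    """False only if the known bounds prove that a and b cannot overlap."""
    if a.not_after is not None and b.not_before is not None:
        if a.not_after <= b.not_before:
            return False
    if b.not_after is not None and a.not_before is not None:
        if b.not_after <= a.not_before:
            return False
    return True
```

An event known only by a terminus ante quem is then `TimeBounds(None, some_datetime)`, and it is compared with fully dated events by the same `may_overlap` test.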

"This could be achieved by either simply adding a fictitious or mythological tag to the affected places, or
by using a copy of a class tree with an actual root for first and a fictitious root for the second." :
A tag is ontologically a bad solution, because fictitious items do not behave like real ones. They have no fixed topologies, they have unlimited conceptual variants, etc. Cases in which real places inspire fictitious ones should be modelled by linking.

See also: Theodoridou, M., Bruseker, G., Daskalaki, M., & Doerr, M. (2016). Methodological tips for mappings to CIDOC CRM. 44th Computer Applications and Quantitative Methods in Archaeology Conference (CAA 2016) "Exploring Oceans of Data", Oslo, Norway, March 29 - April 2, 2016.

page 10:
"The second one, SP6 Declarative Place, is the place as we represent it, based on historical source, archeological findings or guessed approximations."

This is not correct. It is not "THE place as...", SP6 is A place in its own right, a FIAT place in the proper sense. It can be used to approximate a phenomenal place. All geometric representations of Phenomenal Places are declarative approximations. In case of legal claims of land, the claim may be described in a purely declarative way. The words "declarative" and "fiat" are virtually synonymous.

Page 11, fig. 1: the P160 property does not go from E4 to E92, but only from E92 to E52; E4 is a subclass of E92.
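The correction can be checked mechanically. The following Python sketch encodes only the two statements made in this comment (the domain and range of P160, and E4 being a subclass of E92); it is not a complete rendering of the CIDOC CRM class hierarchy, and the helper names are hypothetical.

```python
# Only the statements from the comment above are encoded here.
SUBCLASS_OF = {"E4": {"E92"}}
PROPERTIES = {"P160": {"domain": "E92", "range": "E52"}}


def superclasses(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    result = {cls}
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def is_valid_edge(prop, source, target):
    """Check a figure edge against the property's domain and range."""
    spec = PROPERTIES[prop]
    return spec["domain"] in superclasses(source) and spec["range"] in superclasses(target)
```

With these definitions, an edge from E4 to E52 is valid through inheritance from E92, whereas the edge from E4 to E92 drawn in the figure is not.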

Page 16: "The distinction between fiat places and physical places as well as general and specific concepts is done only by the GOV ontology." See above: except for using the term "declarative" instead of "fiat", the CRMgeo makes exactly the same distinction, and even better, because it generalizes to phenomenal, which is more consistent with historical concepts.

"All ontologies introduced in this paper solve the problem of multiple names. However, only Pleiades takes into account that names retrieved from historical sources can be flawed data and should therefore be modeled as such."
The CRMinf extension of the CIDOC CRM describes rich belief states in general, and FRBRoo describes name use activities of the past.

"By separating the historical source from the editor of a dataset, only the Pleiades ontology allows in parts
a model of provenance considering the needs of academic research."
This is not correct. CRMInf describes provenance and inferencing. It relies on a Named Graph approach. CRM itself has the Attribute assignment construct to describe provenance, a sort of reification.

Page 18: "As shown with the Pleiades ontology, it is to be preferred to also represent the trustworthiness of a source." CRMinf introduces the concept of "Belief Adoption".

In general, Provenance of knowledge should not be integral to some domain specific ontologies, because the respective epistemology is generally applicable to any class and property of many domains. Adding provenance to place-specific classes confuses the situation, because it leaves others without provenance, which cannot be. There is, e.g. PROV-O as general purpose provenance ontology, but not necessarily geared for historical discourse.