Linking Women Editors of Periodicals to the Wikidata Knowledge Graph

Tracking #: 2741-3955

Katherine Thornton
Kenneth Seals-Nutt
Marianne Van Remoortel
Julie M. Birkholz
Pieterjan De Potter

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper
Stories are important tools for recounting and sharing the past. To tell a story one has to put together diverse information about people, places, time periods, and things. We detail here how a machine, through the power of the Semantic Web, can compile scattered and diverse materials and information to construct stories. Through the example of the WeChangEd research project on women editors of periodicals in Europe from 1710 to 1920, we detail how to move from an archive, to a structured data model and relational database, to a Linked Open Data model that makes this information available on Wikidata, and finally to the use of the Stories Services API to generate multimedia stories related to people, organizations and periodicals. As more humanists, social scientists and other researchers choose to contribute their data to Wikidata we will all benefit. As researchers add data, the breadth and complexity of the questions we can ask about the data we have contributed will increase. Building applications that syndicate data from Wikidata allows us to leverage a general purpose knowledge graph with a growing number of references back to scholarly literature. Using frameworks developed by the Wikidata community allows us to rapidly provision interactive sites that will help us engage new audiences. The process that we detail here may be of interest to other researchers and cultural heritage institutions seeking web-based presentation options for telling stories from their data.

Minor Revision

Solicited Reviews:
Review #1
By Lydia Pintscher submitted on 17/Apr/2021
Review Comment:

The paper describes the steps taken by the authors to add data about women editors of periodicals to Wikidata and then make use of that data in a web application called WeChangEd Stories.

The authors detail the steps taken from collecting the data, applying for a new property in Wikidata, getting approval for an import bot, importing data, augmenting and interconnecting the imported data and finally querying it to find interesting new information and make use of the data in their web application.

The paper stands out for two reasons.
First, the authors clearly explain the benefits of opening up research data in the humanities via Wikidata for Wikidata and the world at large. But equally, they highlight the benefits they themselves received from adding this data to Wikidata in the form of error correction by the Wikidata Community, wider reach of their research, augmentation of their data through a myriad of other data on Wikidata and being able to rely on an ecosystem of tools for further work with the data. This mutual benefit is often overlooked or not described with such clear examples as in this paper.
Second, the authors describe the individual steps of getting data into Wikidata, including often overlooked parts like using the EditGroups tool. This will make it easier for future researchers to follow similar processes for their own data.

Minor typo fixes:
* page 1 column 2 line 45: "in gap" -> "a gap"
* the concepts mentioned in the description of Fig.1 do not correspond to the concepts highlighted by the red and blue boxes in the image
* footnote 5 should be attached to "OpenRefine" and not "Wikidata" in the text
* page 6 column 2 line 43: "us" -> "use"
* page 6 column 2 line 44: "MediWiki" -> "MediaWiki"
* page 9 column 1 line 40: "four" -> "fourth"

Review #2
Anonymous submitted on 22/Apr/2021
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Originality and Significance of results

This paper is a case study of adding data about women editors of periodicals in Europe from 1710 to 1920 to Wikidata. The case study is very well motivated, the methods used are clearly described, and the final applications very compelling. The paper does not offer original computer science or Semantic Web contributions, but it offers a very compelling example for why and how to contribute data to Wikidata. The topic of the contribution is also important, and the contribution to Wikidata ensures that the information is recorded and accessible to a very wide audience for a very long time. I commend the authors for their thorough work, accurate description and the outcome. The paper provides a clear blueprint for others to follow.

The demonstration is very well done, and serves as a good showcase of the type of compelling applications that can be built using Wikidata.

Quality of writing

The quality of the writing is very good. There are a few typos that need to be corrected before publication.

I found the use of the term LOD to be misleading. The data is not being published as a dataset in the LOD cloud, but rather, it is integrated into Wikidata, a large public knowledge graph, and as such does not have many links to other LOD datasets (it has the links to external databases and resources, but most of those are not LOD datasets).

I question the need to state that the work publishes or uses the LOD cloud. The real contribution is in the curation and integration of the data into a large public KG. By reusing existing entities and adding new ones, Wikidata has become a better knowledge graph. The paper clearly articulates the benefits in the application from the data having become part of Wikidata. The LOD term is present everywhere in the paper, but the spirit is integration with a public KG. I think this is different and important. My suggestion is to cast the contribution in those terms.

Review #3
By Filip Ilievski submitted on 05/May/2021
Major Revision
Review Comment:

Inspired by the underrepresentation of women in Wikidata and the richness of information in social science archives, this paper proposes to link women editors of historical periodicals to Wikidata. It then uses a web interface to allow non-semantic-web users to navigate the data.

The strong points of the paper are that: 1) it addresses a real problem with representativeness of knowledge, in terms of both gender and time; 2) it establishes a natural two-way connection between Wikidata and social sciences, where each can benefit the other; 3) it sets an example for future projects that integrate social science and the semantic web; 4) the online demo is really nice.

My main complaint about this paper lies in its focus. A quarter of the paper (2/8 pages; or a third if not counting the intro/conclusion) describes the tools that have been used in great detail. Also, I find section 4 to be obsolete for an SWJ issue. Yet, a description of the key technical contributions from an SW perspective is critically missing. Specifically:

1) The data model is vaguely described. It would help to have a figure that depicts the schema, or an example subgraph.

2) The alignment between WeChangEd and Wikidata entities is vaguely described. Please clarify.

3) The storytelling interface is also briefly described, with the main focus on the tools used to build it. The paper needs to explain what the user input to the storytelling demo is, describe (or show a figure of) its appearance, and be specific about how it makes it easier for users to explore the data. In fact, the online demo answers many of these questions, but this should be presented better in the paper. It would also be useful to show which aspects of the interface are facilitated by the Wikidata integration as opposed to by the original data.

My second remark is that I am not fully convinced of the proposed benefits of this work. Various claims are made to support the integration of WeChangEd and Wikidata, and some of them are convincing (e.g., that Wikidata gives extra information about the publications), but others are less so. The authors say that 80% of the people in Wikidata are male; adding 1.5k women entities is unlikely to have changed this. Or am I missing something here?
A second example is the argument about formats: it seems that the original database already had exports to CSV and JSON, so integration with Wikidata is not necessarily beneficial here. I suggest that the authors clarify or remove these claims.
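
The gender-ratio point in this remark can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: the total number of person items is a hypothetical placeholder (the review does not give one), and treating everyone outside the 80% as female is a simplification. Only the 80% figure and the 1.5k added entities come from the review.

```python
def female_share_after_import(total_people: int, female_share: float, women_added: int) -> float:
    """Share of women among person items after adding `women_added` new women."""
    women_before = total_people * female_share
    return (women_before + women_added) / (total_people + women_added)


# Hypothetical baseline: this total is an illustrative assumption,
# not a figure taken from the paper or the review.
ASSUMED_TOTAL_PEOPLE = 5_000_000
# Simplification: the review cites roughly 80% male; the rest is treated as female.
ASSUMED_FEMALE_SHARE = 0.20
# "1.5k women entities", as stated in the review.
WOMEN_ADDED = 1_500

before = ASSUMED_FEMALE_SHARE
after = female_share_after_import(ASSUMED_TOTAL_PEOPLE, ASSUMED_FEMALE_SHARE, WOMEN_ADDED)
shift_pp = (after - before) * 100  # change expressed in percentage points

print(f"female share before: {before:.4%}")
print(f"female share after:  {after:.4%}")
print(f"shift: {shift_pp:.4f} percentage points")
```

Under these assumed numbers the shift is a small fraction of a percentage point. A different baseline total would change the exact value, but the order of magnitude stays small as long as the baseline is in the millions.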

Finally, I am not sure whether the code and the data of this paper are available for reproducibility purposes. I did not find a link to them in the paper.

Other comments and typos:
* I don't understand Figure 3 - what does it tell us? Are these many IDs for a small number of nodes, or the same identifier specified for many nodes?
* us to us -> us to use
* used the using the -> used the