Micro Analysis of Linked Open Data Quality and Graph Traversals for Cultural Heritage Research

Tracking #: 2296-3509

This paper is currently under review
Go Sugimoto

Responsible editor: 
Special Issue Cultural Heritage 2019

Submission type: Full Paper
In cultural heritage, many projects have generated large amounts of Linked Open Data (LOD) in the hope that it will transform scattered data into connected global graphs, which are expected to advance research through machine-assisted intelligent tools. However, investigation of the aggregation and integration of heterogeneous LOD remains limited, partly because of data quality issues. To address this gap, the author examines the “researchability” of LOD for end users, especially in terms of graph connectivity and traversability. Three W3C-recommended properties (owl:sameAs, rdfs:seeAlso, and skos:exactMatch), as well as schema:sameAs, are inspected for 80 instances/entities across ten widely known data sources in order to create traversal maps. In addition, data content (literals, rdf:about, rdf:resource, rdf:type, skos:prefLabel, skos:altLabel) is assessed to give an overview of data quantity and quality. The empirical micro study with network analyses reveals that the major LOD datasets provide a relatively low number of outbound links, proprietary RDF properties, and few reciprocal vectors. These quality issues suggest that the LOD may not be fully interconnected and centrally condensed, confirming the outcomes of previous studies. Their homogeneity thus casts doubt on the possibility of automatically identifying and accessing unknown datasets, which implies the need for traversal strategies to maximise research potential.
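As a rough illustration of what a traversal map over the four linking properties might look like, the following minimal Python sketch counts outbound links per data source and detects reciprocal links. It is not the author's implementation: the triples, the `*.example` URIs, the function names, and the choice to identify a data source by its host name are all assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse

# The four linking properties named in the abstract, written as prefixed names.
LINK_PROPERTIES = {"owl:sameAs", "rdfs:seeAlso", "skos:exactMatch", "schema:sameAs"}

# Hypothetical triples (subject, predicate, object); all URIs are placeholders.
TRIPLES = [
    ("http://a.example/entity/1", "owl:sameAs", "http://b.example/entity/9"),
    ("http://a.example/entity/1", "rdfs:seeAlso", "http://c.example/entity/4"),
    ("http://a.example/entity/1", "skos:prefLabel", "Example label"),
    ("http://b.example/entity/9", "owl:sameAs", "http://a.example/entity/1"),
    ("http://c.example/entity/4", "schema:sameAs", "http://b.example/entity/9"),
]


def source_of(uri):
    """Treat the host name as the data source of a URI (a simplifying assumption)."""
    return urlparse(uri).netloc


def traversal_map(triples):
    """Collect directed source-to-source edges created by the link properties."""
    edges = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in LINK_PROPERTIES:
            src, dst = source_of(subject), source_of(obj)
            if src != dst:  # ignore links that stay inside one data source
                edges[src].add(dst)
    return edges


def reciprocal_pairs(edges):
    """Return the pairs of sources that link to each other in both directions."""
    return {
        tuple(sorted((src, dst)))
        for src, targets in edges.items()
        for dst in targets
        if src in edges.get(dst, set())
    }


edges = traversal_map(TRIPLES)
outbound_counts = {src: len(targets) for src, targets in edges.items()}
pairs = reciprocal_pairs(edges)
```

In this toy data only one pair of sources links in both directions, which mirrors, in miniature, the kind of sparse reciprocity the abstract reports.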