Europeana Linked Open Data – data.europeana.eu

Paper Title: 
Europeana Linked Open Data – data.europeana.eu
Authors: 
Antoine Isaac, Bernhard Haslhofer
Abstract: 
Europeana is a single access point to millions of books, paintings, films, museum objects and archival records that have been digitized throughout Europe. The data.europeana.eu Linked Open Data pilot dataset contains open metadata on approximately 2.4 million texts, images, videos and sounds gathered by Europeana. All metadata are released under Creative Commons CC0 and therefore dedicated to the public domain. The metadata follow the Europeana Data Model and clients can access data either by dereferencing URIs, downloading data dumps, or executing SPARQL queries against the dataset. They can also follow the links to external linked data sources, such as the Swedish cultural heritage aggregator (SOCH), GeoNames, the GEMET thesaurus, or DBpedia. The latest dataset release was published in February 2012.
Submission type: 
Dataset Description
Responsible editor: 
Pascal Hitzler
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Revised manuscript after "accept with minor revisions" - now accepted. The reviews from the first round are below.

Solicited review by Francois Scharffe:

The paper presents the pilot Europeana dataset. The dataset is important, rich and complex. It is a pilot, as the lessons learnt will make it possible to revise the publication. The paper is well written and gives a good overview of the dataset structure.

Two minor remarks:
- In Section 2 it is said that semantic markup is available on Web pages. It would be good to cite the technology used for the markup: RDFa? schema.org?
- dereferencable -> dereferenceable

Solicited review by Dave Kolas:

This paper describes a prototype Linked Data version of the Europeana dataset.

* Quality of the dataset

The Europeana data on the museum / library resources is aggregated from a number of holders of the physical resources, thus the original providers have motivation to make the data accurate. It is possible that the aggregation of many sources means that some sources produce different subsets of data for the schema. The schema addresses the problem of multiple potentially conflicting records about a resource with proxies. It is not clear whether this is a better or worse approach than reification or named graphs for this purpose, but it appears sufficient. The other schema modeling is reasonable, though light on the interlinking (as noted in the paper). The authors do a good job of linking to other datasets, though it would be interesting to see percentages as well as raw links.

* Usefulness (or potential usefulness) of the dataset

This dataset could be potentially useful to a large number of people involved in or interested in the arts in Europe. It could also be combined with travel applications to know where to see particular works of interest. The prototype nature of the dataset leaves out much of the content currently in the non-linked-data Europeana dataset, somewhat mitigating its utility for the moment.

* Clarity and completeness of the descriptions

The paper is written clearly and concisely. The main classes in the data model are described well, and there is a good diagram of how these classes interact. An example record with properties might have been nice however.

Solicited review by Amit Joshi:

The paper is about the Europeana linked open data, which contains open metadata on more than 2.4 million texts, images, videos and sounds related to books, paintings, films, museum objects and archival objects throughout Europe. Data is gathered by Europeana from multiple data providers. Metadata is obtained from data providers, formatted according to the ESE XML Schema, and then converted to EDM to generate the linked data version. The dataset is live and can be accessed either by downloading data dumps or executing SPARQL queries against the dataset. The significance of such a unique dataset being open is, without any doubt, high. However, the paper has the following weaknesses:

1. Use of provider proxy and Europeana proxy is not clear. Is it even required?
2. It would be good to provide examples of the items/resources in a dataset that uses existing ontologies and connects to other LOD datasets.
3. The number of references is very low (only two). Please revisit earlier sections and add additional references (e.g., linked data principles).

Comments

We want to express our gratitude for the received reviews. Here are detailed answers directly addressing their comments. A revised manuscript has been produced, which implements the modifications we propose in these answers.

Reviewer #1 (Francois Scharffe):

Comment: Two minor remarks:
- In Section 2 it is said that semantic markup is available on Web pages. It would be good to cite the technology used for the markup: RDFa? schema.org?
- dereferencable -> dereferenceable

Response: We considered all comments and fixed all typos in the revised manuscript. A mention of Europeana’s current use of RDFa has been added to Section 2.

Reviewer #2 (Dave Kolas):

Comment: The schema addresses the problem of multiple potentially conflicting records about a resource with proxies. It is not clear whether this is a better or worse approach than reification or named graphs for this purpose, but it appears sufficient. The other schema modeling is reasonable, though light on the interlinking (as noted in the paper). The authors do a good job of linking to other datasets, though it would be interesting to see percentages as well as raw links.

Response: The proxy approach is inherited from adopting the OAI-ORE model, which introduced proxies because named graphs were not yet part of the RDF model when the specification was finalized. We are fully aware of the complexity introduced by this approach and will certainly investigate alternative approaches for future EDM releases. Meanwhile, we have emphasized the issue by adding a couple of sentences on proxies in Section 5. We have also added, for all enrichment categories mentioned in Section 3.2, the percentage of all objects that are enriched.
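
The difference between raw link counts and the percentage of enriched objects can be sketched as follows. This is a minimal illustration only: the object identifiers, link targets and resulting numbers are invented and are not Europeana statistics, and `link_stats` is a hypothetical helper name.

```python
def link_stats(object_ids, links):
    """Return (raw link count, percentage of objects with at least one link).

    `links` is a list of (object_id, external_target) pairs. One object may
    carry several links, so the raw count alone says little about coverage.
    """
    known = set(object_ids)
    # Count only links whose source object belongs to the dataset.
    raw = sum(1 for obj, _target in links if obj in known)
    # Distinct objects that have at least one outgoing link.
    linked = {obj for obj, _target in links if obj in known}
    pct = 100.0 * len(linked) / len(known) if known else 0.0
    return raw, pct


# Invented sample data: four objects, three links, two of them on the same object.
objects = ["o1", "o2", "o3", "o4"]
links = [("o1", "geonames:A"), ("o1", "geonames:B"), ("o2", "gemet:C")]

raw, pct = link_stats(objects, links)
print(raw, pct)
```

In this invented sample three links exist, yet only half of the objects are enriched, which is why a percentage complements the raw figure.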

Comment: This dataset could be potentially useful to a large number of people involved in or interested in the arts in Europe. It could also be combined with travel applications to know where to see particular works of interest. The prototype nature of the dataset leaves out much of the content currently in the non-linked-data Europeana dataset, somewhat mitigating its utility for the moment.

Response: Reuse of data for different use cases is a primary goal in Europeana and a motivation for organizing the so-called hackathons mentioned in the paper.

Comment: An example record with properties might have been nice however.

Response: We cannot add another figure, because of the page limitations in this journal. However, at the beginning of Section 4 we now refer to another, more technical paper [1], which includes the sample record.

Reviewer #3 (Amit Joshi):

Comment: 1. Use of provider proxy and Europeana proxy is not clear. Is it even required?

Response: Proxies allow Europeana to distinguish the original metadata for the object from the metadata that is created by Europeana. Provider proxies carry the original metadata supplied by the data providers, while Europeana proxies carry the item-specific metadata added by Europeana. Indeed some, including us, question the value of such a complex structure for a general linked data publication. We have emphasized the issue by adding a couple of sentences on proxies in Section 5.
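
To make the role of the two proxies concrete, here is a minimal sketch of the pattern using plain Python tuples as triples. It assumes the OAI-ORE properties `ore:proxyFor` and `ore:proxyIn`; every URI, literal and descriptive property choice below is a made-up placeholder for illustration and not an actual data.europeana.eu record, and `statements_by_source` is a hypothetical helper.

```python
ORE_PROXY_FOR = "ore:proxyFor"
ORE_PROXY_IN = "ore:proxyIn"

# Placeholder identifiers (hypothetical, not real Europeana URIs).
ITEM = "ex:item/1"
PROVIDER_AGG = "ex:aggregation/provider/1"
EUROPEANA_AGG = "ex:aggregation/europeana/1"
PROVIDER_PROXY = "ex:proxy/provider/1"
EUROPEANA_PROXY = "ex:proxy/europeana/1"

TRIPLES = [
    # The provider proxy carries the original metadata about the item.
    (PROVIDER_PROXY, ORE_PROXY_FOR, ITEM),
    (PROVIDER_PROXY, ORE_PROXY_IN, PROVIDER_AGG),
    (PROVIDER_PROXY, "dc:title", "Example title from the provider"),
    (PROVIDER_PROXY, "dc:creator", "Example creator"),
    # The Europeana proxy carries statements added by Europeana (illustrative).
    (EUROPEANA_PROXY, ORE_PROXY_FOR, ITEM),
    (EUROPEANA_PROXY, ORE_PROXY_IN, EUROPEANA_AGG),
    (EUROPEANA_PROXY, "dcterms:spatial", "ex:place/42"),
]


def statements_by_source(triples, item):
    """Group descriptive statements about `item` by the aggregation
    that the carrying proxy belongs to."""
    proxies = {s for s, p, o in triples if p == ORE_PROXY_FOR and o == item}
    context = {s: o for s, p, o in triples if p == ORE_PROXY_IN and s in proxies}
    grouped = {}
    for s, p, o in triples:
        if s in proxies and p not in (ORE_PROXY_FOR, ORE_PROXY_IN):
            grouped.setdefault(context.get(s, "unknown"), []).append((p, o))
    return grouped


views = statements_by_source(TRIPLES, ITEM)
for source, statements in views.items():
    print(source, statements)
```

Because each statement hangs off a proxy rather than the item itself, a client has to follow `ore:proxyFor` to collect everything said about one object, which is the extra complexity discussed above.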

Comment: 2. It would be good to provide examples of the items/resources in a dataset that uses existing ontologies and connects to other LOD datasets.

Response: data.europeana.eu currently makes use of OAI-ORE, Dublin Core, SKOS, and FOAF. The example shows resources with ore prefixes, while the elements in the sections show the other namespaces. We believe that this aspect is already sufficiently described in Section 4.7. On connections to other LOD datasets (i.e., at the instance level), we have added some statistics in Section 3.2.
We have also assembled two data examples that could be uploaded as additional material on the journal site, if the editors judge this appropriate and possible.

Comment: 3. Number of references is very few (only two). Please revisit earlier sections and add additional references (ex: linked data principles)

Response: We added a reference to Heath, Bizer, Linked Data: Evolving the Web into a Global Data Space in the introduction, and converted into a reference a footnote that pointed to a submission to the same SWJ, which has meanwhile been accepted for publication (with revisions).
We could transform some more footnotes into references, but the 6-page limitation is an issue. We would therefore wait for the editors to recommend that we do so, given the preferred policy of the Semantic Web Journal.