Meta-Data for a lot of LOD

Tracking #: 1282-2494

Authors: 
Laurens Rietveld
Wouter Beek
Rinke Hoekstra
Stefan Schlobach

Responsible editor: 
Aidan Hogan

Submission type: 
Dataset Description
Abstract: 
This paper introduces the LOD Laundromat meta-dataset, a continuously updated RDF meta-dataset describing the documents that are crawled, cleaned and (re)published by the LOD Laundromat. This meta-dataset of over 110 million triples contains structural information for more than 650,000 documents (and growing). Whereas traditional dataset meta-data is often missing, incomplete, or incomparable in the way it was generated, the LOD Laundromat meta-dataset provides a wide variety of structural dataset properties, including the number of triples in LOD Laundromat documents, the average degree in documents, and the number of distinct Blank Nodes, Literals and IRIs. This makes it a particularly useful dataset for data comparison and analytics, as well as for the global study of the Web of Data.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Juergen Umbrich submitted on 04/Apr/2016
Suggestion:
Minor Revision
Review Comment:

I would like to thank the authors for their effort in improving the submission.
In particular, the added visualisation of the dependency graph (Fig. 2), the metadata description (Fig. 3) and the provenance model (Fig. 4) make it easier to understand how the dataset can be explored and used.

In addition, the example queries provided in the use case section are nice and provide some ideas for further uses of the dataset.

However, some previous comments were not directly addressed:

*) I think it would be nice to have some idea of how people are currently using the dataset, e.g., by describing the types of the 20,606,194 logged SPARQL queries. Maybe the authors could inspect the WHERE clauses of the queries: how many queries use filters, how many triple patterns they contain, etc.
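A minimal sketch of such a log analysis, written in Python with rdflib, could look as follows. It assumes the logged queries are available one per line in a (hypothetical) file queries.txt, and it walks the parsed query algebra to count triple patterns and filter usage; this is only one possible approach, not the authors' method.

    from collections import Counter
    from rdflib.plugins.sparql import prepareQuery

    def walk(node):
        # Recursively yield all algebra nodes of a parsed query.
        if hasattr(node, 'name'):              # a CompValue algebra node
            yield node
            for value in node.values():
                yield from walk(value)
        elif isinstance(node, (list, tuple)):
            for value in node:
                yield from walk(value)

    stats = Counter()
    with open('queries.txt') as f:             # hypothetical query log, one query per line
        for line in f:
            try:
                query = prepareQuery(line)
            except Exception:
                stats['unparseable'] += 1      # skip malformed log entries
                continue
            nodes = list(walk(query.algebra))
            # Sum the triple patterns over all basic graph patterns (BGPs).
            stats['triple_patterns'] += sum(
                len(n.triples) for n in nodes if n.name == 'BGP')
            # Count queries that use at least one FILTER.
            if any(n.name == 'Filter' for n in nodes):
                stats['queries_with_filters'] += 1
    print(stats)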

*) In agreement with the review of Sebastian Hellmann, Section 4.5 is still not really about the statistics of the dataset itself, but about its usage.
It would be nice to have statistics about the dataset itself: the number of distinct properties, the number of classes, etc.
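For illustration, a minimal sketch of these schema-level statistics, using Python with SPARQLWrapper against the meta-dataset's public endpoint, is shown below. The endpoint URL is an assumption and may need adjusting.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Assumed public SPARQL endpoint of the LOD Laundromat meta-dataset.
    ENDPOINT = 'http://sparql.backend.lodlaundromat.org'

    QUERIES = {
        'distinct properties': 'SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }',
        'distinct classes':    'SELECT (COUNT(DISTINCT ?c) AS ?n) WHERE { ?s a ?c }',
    }

    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    for label, query in QUERIES.items():
        sparql.setQuery(query)
        result = sparql.query().convert()
        # Each COUNT query returns a single binding for ?n.
        print(label, result['results']['bindings'][0]['n']['value'])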

*) The dissemination process could also be further improved by providing insights into the update process for the statistics.
How well does the LOD Laundromat scale for the nightly builds? Is it possible to rerun the extraction of the statistics in less than 12 hours, and if not, what is the time span for this?
This would be crucial information for someone who uses the dataset and relies on up-to-date statistics.

*) Table 1: I would still add the URIs for the meta-data properties.

Considering the SWJ evaluation criteria:

(1) Quality and stability of the dataset: the data is available and can be considered stable.
The authors detail to some extent the process by which the data is generated.
One minor issue regarding quality is the lack of detail about how up to date the metadata is.
The authors claim to recompute the statistics every night, but it is not clear how long the process takes and whether the metadata is therefore up to date.

(2) Usefulness of the dataset:
There is little doubt that the dataset is useful. The authors provide a good motivation and show use cases in which the meta-dataset can be used (e.g., verifying claims in papers, finding datasets with specific features, etc.).

(3) Clarity and completeness of the descriptions:
The description of the dataset was significantly improved with the added figures of the vocabularies used, the schema, etc.
As such, it should be easy to explore and navigate the dataset based on the paper and the example queries.

Overall, I think the authors meet the requirements of SWJ for a dataset paper.
---------