Topic profiling benchmarks: issues and lessons learned

Tracking #: 1604-2816

This paper is currently under review
Blerina Spahiu
Andrea Maurino
Robert Meusel

Responsible editor: Guest Editors Benchmarking Linked Data 2017

Submission type: Full Paper
Topical profiling of the datasets contained in the Linking Open Data cloud diagram (LOD cloud) has been of interest for some time. Several automatic classification approaches have been proposed to replace the manual task of assigning topics to each new dataset. Although the quality of these automated approaches is reasonably good, it has been shown that in most cases a single topical label per dataset does not reflect the variety of topics covered by its content. In this study, we therefore present a machine-learning-based approach that assigns a single topic as well as multiple topics to a LOD dataset, and we evaluate the results. As part of this work, we present the first multi-topic classification benchmark for the LOD cloud, which is freely accessible, and we discuss the challenges and obstacles that need to be addressed when building such benchmark datasets.