Knowledge Base Quality Assessment Using Temporal Analysis

Tracking #: 1596-2808

This paper is currently under review

Authors: 
Rifat Rashid
Giuseppe Rizzo
Nandana Mihindukulasooriya
Oscar Corcho

Responsible editor: 
Guest Editors Benchmarking Linked Data 2017

Submission type: 
Full Paper
Abstract: 
Knowledge bases are now essential components of any task that requires automation with some degree of intelligence. The quality of a knowledge base can drastically affect the decisions taken by algorithms that rely on it, for instance the classification of an email or a policy maker's final choice. Establishing checks that ensure high quality of the knowledge base content (i.e., data instances, relations, and classes) is therefore of utmost importance. In this paper, we present a novel knowledge base quality assessment approach that relies on temporal analysis. The approach compares consecutive knowledge base releases to compute quality measures that allow quality issues to be detected. We consider four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness. We implemented a prototype using the R statistical platform and assessed the approach both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty Nice. The capability of the Persistency and Consistency characteristics to detect quality issues varies significantly between the two case studies. The Completeness characteristic is extremely effective, achieving 95% precision in error detection. Overall, the proposed approach performed well, and its measures are based on simple operations that make the solution both flexible and scalable.
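To make the idea of comparing consecutive releases concrete, the following is a minimal Python sketch of how persistency-style measures might be computed from per-release entity counts. The abstract does not give the formal definitions, so the formulas here are assumptions for illustration only: persistency is taken to be 1 when the latest release has at least as many entities of a class as the previous one, and historical persistency is taken to be the fraction of consecutive release pairs for which that holds. The function names and the sample counts are hypothetical and are not taken from the paper or its R prototype.

```python
from typing import Sequence


def persistency(counts: Sequence[int]) -> int:
    """Assumed definition: 1 if the latest release did not lose entities
    relative to the previous release, otherwise 0."""
    if len(counts) < 2:
        raise ValueError("at least two releases are required")
    return 1 if counts[-1] >= counts[-2] else 0


def historical_persistency(counts: Sequence[int]) -> float:
    """Assumed definition: fraction of consecutive release pairs in which
    the entity count did not decrease."""
    pairs = list(zip(counts, counts[1:]))
    if not pairs:
        raise ValueError("at least two releases are required")
    stable = sum(1 for previous, current in pairs if current >= previous)
    return stable / len(pairs)


# Hypothetical entity counts for one class across four releases
# (illustrative numbers, not data from DBpedia or 3cixty Nice).
example_counts = [100, 120, 115, 130]

latest_flag = persistency(example_counts)
history_score = historical_persistency(example_counts)
```

With these illustrative counts, the drop from 120 to 115 lowers the historical score while the latest release still passes the persistency check. Both functions use only counts and comparisons, which is in line with the abstract's remark that the measures are based on simple operations.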