Squerall: Virtual Ontology-Based Access to Heterogeneous and Large Data Sources

Tracking #: 1957-3170

This paper is currently under review
Mohamed Nadjib Mami
Damien Graux
Simon Scerri
Hajira Jabeen
Sören Auer

Responsible editor: Guest Editors Knowledge Graphs 2018

Submission type: Full Paper
During the last two decades, a huge leap has been made in terms of data formats, data modalities, and storage capabilities. As a consequence, dozens of storage techniques have been studied and developed. Today, it is possible to store cluster-wide data easily while choosing a storage technique that suits our application needs, rather than the opposite. If different data stores are interlinked and queried together, their data can generate valuable knowledge and insights. In this study, we present a unified architecture that uses Semantic Web standards to query heterogeneous Big Data stored in a Data Lake in a unified manner. In a nutshell, our approach consists of equipping the original heterogeneous data with mappings and offering a middleware able to aggregate the intermediate results in a distributed manner. Additionally, we devise an implementation, named Squerall, that uses both Apache Spark and Presto as underlying query engines. Finally, we conduct experiments to demonstrate the feasibility, efficiency, and scalability of Squerall in querying five popular data sources.