DIAERESIS: RDF Data Partitioning and Query Processing on SPARK

Submitted by Georgia Troullinou on 09/29/2023 - 15:17

Tracking #: 3554-4768

Authors:

Georgia Troullinou

Giannis Agathangelos

Haridimos Kondylakis

Kostas Stefanidis

Dimitris Plexousakis

Responsible editor:

Aidan Hogan

Submission type:

Full Paper

Abstract:

The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. Apache Spark is one of the most widely used engines for big data processing, and a growing number of systems adopt it for efficient query answering. Existing approaches that exploit Spark for querying RDF data adopt partitioning techniques to reduce the data that must be accessed and thereby improve efficiency. However, simplistic data partitioning fails, on the one hand, to minimize data access and, on the other hand, to group data that are usually queried together. This translates into limited efficiency gains in query answering. In this paper, we present DIAERESIS, a novel platform that accepts an RDF dataset as input and effectively partitions it, minimizing data access and improving query answering efficiency. To achieve this, DIAERESIS first identifies the top-k most important schema nodes, i.e., the most important classes, as centroids and distributes the other schema nodes to the centroid they mostly depend on. Then, it allocates the corresponding instance nodes to the schema nodes they are instantiated under. Our algorithm enables fine-tuning of data distribution, significantly reducing data access for query answering. We experimentally evaluate our approach using both synthetic and real workloads, showing that it strictly dominates the existing state of the art and improves query answering, in several cases by orders of magnitude.
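
The three steps named in the abstract (pick top-k centroids, attach the remaining schema nodes, then place instances) can be illustrated with a toy sketch. This is not the DIAERESIS implementation, and it does not use Spark. The abstract does not define how importance or dependence are measured, so the sketch makes two loud assumptions: importance is approximated by a class's degree in the schema graph, and dependence is approximated by hop distance to the nearest centroid. The function name `partition_schema` and its parameters are hypothetical.

```python
from collections import deque


def partition_schema(schema_edges, instance_types, k):
    """Toy three-step partitioning: centroids, schema nodes, instances."""
    # Build an undirected adjacency map of the schema graph.
    adj = {}
    for a, b in schema_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Step 1. ASSUMPTION: importance is approximated by node degree.
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    centroids = ranked[:k]

    # Step 2. ASSUMPTION: dependence is approximated by hop distance.
    # A multi-source BFS assigns every schema node to its nearest centroid.
    owner = {c: c for c in centroids}
    queue = deque(centroids)
    while queue:
        node = queue.popleft()
        for nxt in sorted(adj[node]):
            if nxt not in owner:
                owner[nxt] = owner[node]
                queue.append(nxt)

    # Step 3. Instances follow the class they are instantiated under.
    partitions = {c: {"classes": set(), "instances": set()} for c in centroids}
    for cls, centroid in owner.items():
        partitions[centroid]["classes"].add(cls)
    for inst, cls in instance_types.items():
        if cls in owner:
            partitions[owner[cls]]["instances"].add(inst)
    return partitions


# Hypothetical university-style schema, invented for illustration only.
edges = [
    ("Person", "Student"), ("Person", "Professor"), ("Person", "Course"),
    ("Course", "Department"), ("Department", "University"),
    ("Department", "Lab"),
]
types = {"alice": "Student", "cs101": "Course"}
result = partition_schema(edges, types, k=2)
```

With this input, a query that touches only `Student` and `Professor` instances would need to read a single partition, which is the kind of reduction in data access the abstract describes.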

Full PDF Version:

swj3554.pdf

Previous Version:

DIAERESIS: RDF Data Partitioning and Query Processing on SPARK

Tags:

Reviewed

Long-term Stable Link to Resources:

https://github.com/isl/DIAERESIS

Decision/Status:
