SPARQL2FLINK: Evaluation of SPARQL queries on Apache Flink

Tracking #: 2266-3479

This paper is currently under review
Oscar Ceballos
Carlos Ramirez
María-Constanza Pabón
Andres Mauricio Castillo
Oscar Corcho

Responsible editor: 
Ruben Verborgh

Submission type: 
Full Paper

Abstract: 
Increasingly large RDF datasets are being made available on the Web of Data, either as Linked Data, via SPARQL endpoints, or both. Existing SPARQL query engines and triple stores are continuously improving to handle larger datasets. However, there is an opportunity to explore the use of Big Data technologies for SPARQL query evaluation. Several approaches have been developed in this context, proposing the storage and querying of RDF data in a distributed fashion, mainly using the MapReduce programming model and Hadoop-based ecosystems. New trends in Big Data technologies have also emerged (e.g., Apache Spark, Apache Flink); they use distributed in-memory processing and promise to deliver higher-performance data processing. In this paper, we present an approach for transforming a given SPARQL query into an Apache Flink program for querying massive static RDF data. We describe an implementation of this approach and document a preliminary evaluation on an Apache Flink cluster. This is a first step towards our main goal; further work on optimizing the system remains to be done.
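To make the core idea of the abstract concrete, the sketch below illustrates (in plain Python, not in the paper's actual Flink code) how a SPARQL basic graph pattern can be mapped onto dataset-style operations: each triple pattern becomes a scan-and-filter that produces solution mappings, and patterns sharing variables are combined by a join keyed on those variables, which is the kind of operator a Flink DataSet program would use. All names and data here are hypothetical and for illustration only.

```python
# Hypothetical illustration of SPARQL-to-dataflow translation:
# a triple pattern -> filter producing variable bindings,
# shared variables between patterns -> join on those variables.

triples = [
    ("alice", "knows", "bob"),
    ("bob",   "knows", "carol"),
    ("alice", "age",   "30"),
]

def match(pattern, dataset):
    """Evaluate one triple pattern over the dataset.
    Terms starting with '?' are variables; others must match exactly.
    Returns a list of variable-binding dicts (solution mappings)."""
    results = []
    for triple in dataset:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

def join(left, right):
    """Combine two sets of solution mappings, keeping pairs that
    agree on all shared variables -- analogous to a distributed
    join keyed on the common variables."""
    out = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[v] == r[v] for v in shared):
                out.append({**l, **r})
    return out

# SPARQL: SELECT ?a ?c WHERE { ?a knows ?b . ?b knows ?c }
m1 = match(("?a", "knows", "?b"), triples)
m2 = match(("?b", "knows", "?c"), triples)
print(join(m1, m2))
```

In an actual Flink program, `match` would correspond to a `filter`/`map` over a distributed DataSet of triples and `join` to Flink's join operator keyed on the shared variables; the nested-loop join above is only for readability.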
Full PDF Version: 
Under Review