A Query Language for Semantic Complex Event Processing: Syntax, Semantics and Implementation

Tracking #: 1792-3005

Authors: 
Syed Gillani
Antoine Zimmermann
Gauthier Picard
Frédérique Laforest

Responsible editor: 
Oscar Corcho

Submission type: 
Full Paper
Abstract: 
The field of Complex Event Processing (CEP) relates to the techniques and tools developed to efficiently process pattern-based queries over data streams. The Semantic Web, through its standards and technologies, is constantly seeking to provide solutions for this paradigm while employing the RDF data model. The integration of Semantic Web technologies in this context can handle the heterogeneity, integration and interpretation of data streams at semantic level. In this paper, we propose and implement a new query language, called SPAseq, that extends SPARQL with new Semantic Complex Event Processing (SCEP) operators that can be evaluated over RDF graph-based events. The novelties of SPAseq includes (i) the separation of general graph pattern matching constructs and temporal operators; (ii) the support for RDF graph-based events and multiple RDF graph streams; and (iii) the expressibility of temporal operators such as Kleene+, conjunction, disjunction and event selection strategies; and (iv) the operators to integrate background information and streaming RDF graph streams. Hence, SPAseq enjoys good expressiveness compared with the existing solutions. Furthermore, we provide an efficient implementation of SPAseq using a non-deterministic finite automaton (NFA) model for the evaluation of SPAseq queries. We provide the syntax and semantics of SPAseq and based on this, we show how it can be implemented in an efficient manner. Moreover, we also present an experimental evaluation of its performance, showing that it improves over state-of-the-art approaches.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 05/Jan/2018
Suggestion:
Minor Revision
Review Comment:

The authors addressed all my comments, and I believe that the new version of
the paper is a clear improvement over the previous one and can be accepted for
publication after some minor changes (see below). I am particularly satisfied
with the changes in Sections 2, 3, and 8.

I have a final doubt regarding the background knowledge: the authors mention
in Section 8.2, question 2, that SPAseq is more efficient than EP-SPARQL
because it joins background KB triples only when a match occurs. Does this mean
that the information in these triples cannot be used for matching? For
instance, would it be possible to trigger a match only if two events come from
the same city, assuming that the name of the city can only be determined by
joining background information?

I suggest that the authors better discuss this aspect when presenting their
model.

Review #2
By Jean Paul Calbimonte submitted on 30/Jan/2018
Suggestion:
Minor Revision
Review Comment:

The authors produced a new and substantially enhanced version of the manuscript.
The main additions include the expanded experiments, the new use cases and queries, and a reworked presentation that puts more emphasis on the main contributions and moves additional material to the appendices. Several small details remain to be fixed; I note some of them in the detailed comments below.
However, there are still some major concerns. One of them relates to the experiments. As noted in the previous review, the datasets and queries chosen in the examples and in the evaluation do not really show why it is necessary to use RDF or semantics-based data models and query languages. Therefore, the decision to use SCEP instead of CEP is still weakly motivated. In fact, it seems quite feasible to solve the queries presented in the use cases and the evaluation with standard CEP. Although this question could be posed to the entire RSP and stream reasoning communities, this paper still provides no convincing answer.
Furthermore, Figure 18 makes it clear that the GPM (and even the parsing) incur a non-negligible overhead, which could be a key reason not to choose SCEP.
The other issue in the evaluation is the choice of CPU time as the only evaluation metric. Why not also look at throughput or latency (e.g., end-to-end response times)? No clear explanation is given.

The authors' new version has addressed several concerns raised previously, and the new manuscript is of much higher quality. Nevertheless, I think it is important to address or discuss issues such as the ones I just raised, given that there is (in general) insufficient motivation to justify the use of semantic technologies for streams (in this case, SCEP).
Even in the use cases presented, it seems that standard CEP engines could do the job.
Perhaps these remaining issues can be addressed with a discussion that takes these aspects into consideration.

* Abstract:
relates to -> CEP does not merely "relate to" these techniques; they are rather its main goal.
novelties … includes -> grammar error
and (iii) … and (iv) -> too many ands
keywords: add RDF stream processing?

* Introduction:
CEP definition -> cite an authoritative paper or book?
streams, and their proposed languages -> add comma
trend of using RDF as a unified data model -> reference work on general RDF data integration
It may be useful to reference well-known works on data integration with RDF, so that readers can be convinced that this is also a good idea for streams.
I disagree that RSP languages refer only to things like C-SPARQL, etc. To me, RSP refers to any type of processing, which could include CEP, even reasoning, etc. However, this is debatable.
efficeint -> efficient?
This result -> It results?
The most important feature … -> I am not sure that the separation is the most important feature of this work. However, it is only a personal opinion.
incoming streams by run id -> at this point in the introduction, this is meaningless information. Maybe it can be explained at a higher level.
The first contribution reads as follows "we present design/syntax of Spaseq through intuitive examples". It would be better to put as first contribution something more important. Like "we present Spaseq, a novel SCEP language, including a full description of its syntax and semantics".
We show …. show that they outperform -> two "shows"

* Motivation
involves in determining -> ??
to determine -> too many "determine"s in the same sentence
the financial loss -> financial loss
"Big -> bad quotes, use `` in latex
or or a series -> or a series
require RDF graph-based -> require an RDF graph-based
while in 5 -> while in UC 5
strategies determines -> determine
a static background knowledge -> remove "a"

* Related Work
cumbersome -> perhaps not the best wording
are evolved -> ??? consider rewording
only support -> only supports
os triples -> ??? of?
user is not able -> the user …
it provides simple formalism -> a simple formalism

* Section 4
NQaud ->Nquad?

* Formal semantics
The window-based semantics defined in Spaseq is a fundamental design choice. It would be appreciated if this were explicitly explained at this point. In fact, it differs in principle from what we see in Etalis.
an Bop expression -> a Bop expression

* Implementation
a prolog like languages -> rewording
However, consider -> considering?
non-determinism behavior -> non-deterministic? Maybe check the right term

* Optimizations
terms of GPM -> terms of the GPM
with an n events -> ??
Bop -> Is it Bop or Bop?
a new event arrive -> arrives?
technique of pushing window ->window pushing
either is complete-> are?
a new runs -> run

* Experiments
does not increases -> …
would remains -> would remain
cannot be consider -> considered