Review Comment:
The manuscript presents an application report on building a stream reasoning system for monitoring several activities in large geographical areas. The work extends the authors' previous paper [45], which uses their core technology from [6]. The authors report their experiences in overcoming performance issues when dealing with a complicated processing pipeline involving various data operations for which it is not trivial to build efficient indexes or materialised views.
The content has a good structure with an easy-to-follow storyline that helps the reader understand the technical changes and why the authors had to go through all the hurdles to build their system. However, judging the manuscript as an application report, the report must describe a deployed application rather than just a simulated lab test; therefore, the authors must provide a detailed real-world setup for such an application, or else the paper should go to the "full paper" track with longer content. The following are further comments for the next revision.
1. Definition 1: in "if p in enclosed in A" (presumably "is enclosed"), the meaning of "enclosed" needs to be defined more precisely.
2. Definition 3: the nearby relation can be defined directly on vessels V1 and V2; there is no need to put p and p' in the notation nearby(V1.p, V2.p', …). If we consider p to be a property of a vessel V, then we can write nearby(V1, V2, …), which is consistent with Definitions 1 and 2. Then consider consistently using V.p, and then p.x, p.y, p.t, throughout this section.
3. The process of cleaning expired points in the third paragraph of Section 3.3 needs a discussion of the locking implications caused by concurrency in the multi-core processing context evaluated in the paper. Moreover, this approach assumes that the incoming data arrives strictly in order, so this assumption needs to be stated explicitly.
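To illustrate both concerns, here is a minimal sketch (hypothetical names and structure, not the authors' implementation) of a position buffer that evicts expired points: the lock marks where concurrent insert/evict would otherwise race on shared state, and front-of-queue eviction is only correct under the in-order-arrival assumption:

```python
import threading
from collections import deque

class PositionBuffer:
    """Hypothetical sliding buffer of timestamped vessel positions.

    Illustrates the two concerns raised above: (1) eviction and insertion
    mutate shared state, so under multi-core execution a lock (or a
    lock-free alternative) is needed; (2) evicting only from the front of
    the buffer is correct only if positions arrive in timestamp order.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._points = deque()          # (timestamp, position) pairs
        self._lock = threading.Lock()   # guards _points against concurrent access

    def add(self, timestamp, position):
        with self._lock:
            # Front-eviction is only safe if arrivals are in order.
            if self._points and timestamp < self._points[-1][0]:
                raise ValueError("out-of-order position; eviction would be unsafe")
            self._points.append((timestamp, position))
            self._evict(timestamp)

    def _evict(self, now):
        # Drop expired points; the caller must hold the lock.
        while self._points and self._points[0][0] < now - self.window:
            self._points.popleft()

    def snapshot(self):
        with self._lock:
            return list(self._points)
```

If the authors' cleaning is likewise performed lazily on insertion, it inherits both constraints; discussing how the implementation avoids (or pays for) such a lock would strengthen Section 3.3.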
4. The first paragraph of Section 1 mentions the term "windowing", but there is no further discussion of it until the window parameter is introduced in the evaluation in Section 5.3. I think some definition or description of "window" should be introduced earlier.
5. Section 5.1: the paper does not compare against an external baseline but only against itself; please give some explanation of why.
6. Section 5.2.2: it is not clear to me what the temporal threshold of 30 s is used for, or how it relates to this experimental setting.
7. Section 5.3: the experiments use a slide step of 1 hour and a window of 8 hours (with 31k input events), while the processing time is less than 3.5 seconds; yet the reported throughputs range from 5k to 25k events per second. These figures paint a rather inconsistent picture of the workload to me. I think the authors need 1-2 paragraphs to explain the correlation between the input throughput, the number of items in a window when an execution step is triggered, and the role the slide-step parameter plays in the overall processing workload.
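To make the concern concrete, a back-of-the-envelope calculation using only the figures quoted above (the interpretation of each quantity is my assumption) shows how different readings of "throughput" yield very different numbers:

```python
# Illustrative check of the figures quoted above; interpretations are assumptions.
window_events = 31_000       # events in one 8-hour window
processing_time_s = 3.5      # reported processing time per execution step
window_hours = 8
slide_hours = 1

# Reading 1: window contents divided by per-step processing time.
per_step_throughput = window_events / processing_time_s        # ~8,857 events/s

# Reading 2: sustained input rate of the stream itself.
# Each 1-hour slide admits roughly window_events / window_hours new events.
events_per_slide = window_events * slide_hours / window_hours  # ~3,875 events
sustained_input_rate = events_per_slide / (slide_hours * 3600) # ~1.1 events/s
```

Reading 1 lands inside the reported 5-25k events/s range while Reading 2 is orders of magnitude below it, so stating explicitly which quantity the reported throughput measures would resolve the apparent inconsistency.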
8. I would suggest using the term "near real-time" or "online" instead of "realtime", which the authors use here and there.
9. The paper has several language issues; please invest considerable effort in polishing the next revision. Here are some of them I came across:
- Third paragraph of Section 1: typo: "illegal vessel behavior" -> "illegal vessel behaviors".
- First paragraph of Section 2: "e.g., a vessel is located within and area" -> "an" instead of "and"?; "various types of suspicious, dangerous or illegeal vessel activity." -> "illegal" and "activities"?
- Second paragraph of Section 3.3: "If no cleaning were performed, then too many (old) vessel positions would be retrieved that satisfy the spatial constraint, but would be eliminated due to the temporal constraint, leading to wasteful processing" -> the sentence structure is messy; please rephrase.
- First paragraph of Section 4: "to detect various types of suspicious, dangerous and illegal vessel activity" -> "activities"?
- Section 4.2: "Rule (3) is but one of the possible…" -> the phrasing is unclear; please reword.
- The authors use "some" with singular nouns in several places; please double-check.