Context-driven RDF Data Replication on Mobile Devices

Paper Title: 
Context-driven RDF Data Replication on Mobile Devices
Authors: 
Stefan Zander, Bernhard Schandl
Abstract: 
With the continuously growing amount of structured data available on the Semantic Web there is an increasing desire to replicate such data to mobile devices. This enables services and applications to operate independently of the network connection quality. Traditional replication strategies cannot be properly applied to mobile systems because they do not adapt to changing user information needs, and they do not consider the technical, environmental, and infrastructural restrictions of mobile devices. Therefore, it is reasonable to consider contextual information, gathered from physical and logical sensors, in the replication process, and replicate only data that are actually needed by the user. In this paper we present a framework that uses Semantic Web technologies to build comprehensive descriptions of the user's information needs based on contextual information, and employs these descriptions to selectively replicate data from external sources. In consequence, the amount of replicated data is reduced, while a maximum share of relevant data are continuously available to be used by applications, even in situations with limited or no network connectivity.
Submission type: 
Full Paper
Responsible editor: 
Guest Editors
Decision/Status: 
Accept
Reviews: 

Submission to http://www.semantic-web-journal.net/content/special-issue-cfp-real-time-...

This is a revision of a previous submission that was accepted for publication pending minor revisions. The current version has been accepted for publication. The reviews of the previous revision are below, followed by the reviews of the original submission.

Solicited review by Jerome Euzenat:

The revisions did not address all my comments satisfactorily, so some remarks still hold: in particular, the narrative and the title insist on "replication" while the larger part of the paper is about the framework and the experimental results do not concern replication.
However, in spite of this comment, this is now a great paper. The work is substantial, innovative and up-to-date.
The presentation has been largely improved: the paper is far better articulated and reads very easily.

Hence, I think that the paper is ready for publication as it is.

Remarks:
- My main criticism is about the graph plots: it is not possible (at least on the version I printed) to see the results for μjena. I suggest dropping the grey background and ensuring that the lines are black and distinguishable.
- p1: the last years -> the past years
- p1: heat generation is rather a problem than a quality, so it is strange to put it in line with memory capacity and CPU performance
- p2: thiscontext -> this context
- p2: data,context -> data, context
- p5: combing -> combining
- p8: Berkley -> Berkeley
- p11: exemplary is an adjective, use "sample" instead
- p12: it is not clear what mandatory and optional mean in ddesc
- p17: and represents the entry-level -> and now represents the entry-level (this was not the case when it was launched)
- p18 (6.2): start by explaining that the behavior on different machines varies quite a lot but that the order between systems is preserved across machines.
- p20: parsing performance "constantly drops" with increasing graph size. I am not sure what meaning is given to "constantly drops" (the x-scale is logarithmic). It would be good to have an explanation.

Reviews for the original submission:

Solicited review by Martin Raubal:

This paper presents an important and practically relevant idea regarding the handling of context on mobile devices. The paper is written and structured very clearly and the authors provide good arguments for their approach. However, the evaluation "only" focuses on computation time and ignores the evaluation of the reasoning mechanisms themselves. It would be beneficial to the paper to include more details on the reasoning mechanisms and provide a concrete example within the case study. E.g., What does the data provider concretely do in this example and how? Section 5 really just gives an overview (and it is a relatively simple example without any complexity) without providing details of what exactly happens at each stage (in terms of representation, reasoning, etc.).

Some more details:

p3, 2nd to last paragraph: For an example of dynamic interaction between context-, task-, and user models for mobile devices, which is extensible and based on activity theory, see Raubal, M. and I. Panov (2009). A Formal Model for Mobile Map Adaptation. Location Based Services and TeleCartography II - From Sensor Fusion to Context Models. Selected Papers from the 5th International Symposium on LBS & TeleCartography 2008, Salzburg, Austria. G. Gartner and K. Rehrl. Berlin, Springer: 11-34.

p4, col2, 1st para: You should explain why exactly a transformation into qualitative statements is needed.

p4, 2.3: "existing descriptions ..."; "information she / he is interested in";

p5, 3rd para: "prescribed vocabulary";

p6, 3.2: "Those frameworks..."; "processing large data amounts"; "we exclusively concentrate...";

p7, just before 4.: "interested in leading...";

p8, first "- ...": "enhance the overall";

p9, footnote 26: "software sensors";

p10, 1st para: "context provider"; col2, 2nd para: How exactly is the reasoning performed?

p11: "context provider to enrich..."; you need to provide more details on the "applies reasoning rules"!; "data provider that makes...";

p12, 3rd para: "content provider"?; 5.: "mobile user depend...";

p13, 3rd para: "task"; col2, 4th para: "these data";

p14, col2, 2nd para: delete "device" after "256 MB";

p15, 3rd para: "('Triples...";

ad Conclusions: You did not really show what the users' information needs are, how are they determined in general?

Solicited review by Jerome Euzenat:

The title of the paper accurately describes its content. It proposes a
framework for replicating RDF data on mobile devices based on the
present and future context of the device and device holder. Moreover,
the context itself is represented in RDF. It justifies the need to do
so, presents some related work, describes the design of such a
system, introduces the implementation on the Android platform and
evaluates the storage requirements of other triple stores.

First, I think that the work is original and significant. So the paper
deserves to be published. The justification for replication is clear
and performing context-based replication is a good idea; doing it with
a semantic framework for context management is great. The paper itself
is a good conceptual paper: it provides an innovative design for a
class of systems that achieve context-driven RDF data replication. It
would be more convincing with an implementation of the framework that
actually works. It is not clear that this is the case. The experiment
does not help, because it does not deal with the main task of the
system.

I have two main criticisms for the paper:

The organization could be greatly improved. Indeed, Sections 2 and 3
are not particularly about replication. Replication only appears at
the end of both sections. In Section 4, the description of the
component which implements replication is not found. It seems that the
replication strategy is left to the implementation of Data
providers. However, describing how such a strategy can be implemented
is necessary (the link between context and fetching, which kind of
reasoning can go there, what strategy is used, etc.). Given
the title of the paper and its originality, replication should be the
main guiding line of the presentation.

The second criticism is related. The experiment evaluates data
stores. It is interesting in its own right, but it has nothing to do
with replication. What I would expect, in addition, is a monitoring of
a three day use case showing that the size of the database with
replication is low relative to the size of the database without
it. This is obvious, but this is the goal. It would also be
interesting to see this size variation, the connection attempts, and
failed data accesses. That would better show the benefit of
replication.

I would advise that this paper be only published once there is a link
to where to retrieve/buy the system. It will only be interesting when
semantic web application developers can use it.

Otherwise, I have only minor comments and remarks:
- "Web 2.0" is rather a marketing term as far as I am concerned. So, I
do not think it has its place in an academic paper. It is better to
tell which feature of Web 2.0 applications matters.
- the introduction starts with the assertion that mobile devices are
always connected and the whole argumentation of the paper is based on
the idea that this is not true... it may be better to mitigate the
first statement.
- It would be useful to motivate the paper with an example. The first
paragraph of Section 5 could be put there with the description of what
the user expects. In addition, Section 5 should be more
precise about what data is actually replicated.
- Section 2, rather justifies the use of semantic technologies.
- 2.1: Many definitions have been proliferated. I think that
proliferation entails multiplication already.
- Is "Basically... employed" really useful?
- In Section 3.2, you may have a look at the two demonstrations that
won the best demo award at ISWC 2010. They are relevant to this topic.
- The ends of Sections 2 and 3 should summarise what has been found from
studying the literature: Applications would benefit from replication,
RDF mobile databases do not support replication, etc.
- It is OK that Section 4 does not commit to an implementation
target. However, it should be said earlier in the paper either what
platform is targeted or that this work is supposed to be target
independent.
- It would also be worth using the context for deciding when the next
connection will be available (something like what IYOUIT does by
learning user patterns). Of course, this is only a suggestion for
future work.
- Fig. 1: I would draw a line around those components which run on the
mobile (almost all) and those which do not. It is also strange to have
a figurative RDF graph for the context model and not for the triple
store.
- It is a pity that Section 4 is platform independent until the last
column which introduces Android-related features. Try to keep this
independent.
- Concerning the experimentation, it seems to me that there are two
limiting factors in Android systems: RAM and ROM. The RAM is used for
running the applications while the ROM is used for storing persistent
databases (at least the SQLite databases). I would be interested to
know what is the limiting factor in this case (given that there are
devices in which the ROM is relatively small). If ROM is relevant, it
would be worth telling how much ROM is left available before starting.
- Another suggestion is to consider dumping some data sources
to external memory (SD card), instead of fully discarding them, and
arbitrating between the various storage options available.


Comments

We thank the reviewers for their helpful comments. We have considered them in a revised version. Please find below our answers and comments to the reviews.

Stefan Zander and Bernhard Schandl

REVIEWER 2

"However, the evaluation "only" focuses on computation time and ignores the evaluation of the reasoning mechanisms themselves. It would be beneficial to the paper to include more details on the reasoning mechanisms and provide a concrete example within the case study. E.g., What does the data provider concretely do in this example and how?"
-> As described in the paper, we have presented a lightweight generic reasoning engine. This engine has to be customized according to the needs of the particular application, either using a rule-like language or by integrating new reasoning steps via Java classes. To make this clearer, we have added a more detailed description of the reasoning component plus concrete examples in Section 4. An evaluation of the reasoning component as such is, however, out of the scope of this work.
-> We also included detailed information regarding the role and tasks of data providers for RDF data replication and complemented our descriptions with a concrete example and a code snippet of a data provider that replicates location-based data from DBpedia.
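
What a location-based data provider might do can be illustrated with a minimal sketch. This is not the code snippet added to the paper: the function names, the bounding-box approximation, the radius, and the query shape are all assumptions made for illustration. Only the geo:lat and geo:long properties (W3C WGS84 vocabulary) and rdfs:label are standard terms. The sketch only builds the SPARQL query text; sending it to an endpoint and storing the result on the device are left out.

```python
import math


def bounding_box(lat, lon, radius_km):
    """Approximate a square box around a point (adequate for small radii)."""
    dlat = radius_km / 111.0  # roughly 111 km per degree of latitude
    dlon = radius_km / (111.0 * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def build_place_query(lat, lon, radius_km=5.0, limit=500):
    """Build a SPARQL CONSTRUCT query for labelled resources inside the box.

    Hypothetical helper: the query shape is an assumption, not the
    framework's actual replication query.
    """
    south, north, west, east = bounding_box(lat, lon, radius_km)
    return f"""
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {{ ?s geo:lat ?lat ; geo:long ?long ; rdfs:label ?label . }}
WHERE {{
  ?s geo:lat ?lat ; geo:long ?long ; rdfs:label ?label .
  FILTER (?lat >= {south:.4f} && ?lat <= {north:.4f} &&
          ?long >= {west:.4f} && ?long <= {east:.4f})
}}
LIMIT {limit}
""".strip()
```

A CONSTRUCT query is used here because its result is itself an RDF graph, which could be written into a local triple store without further conversion.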

"Section 5 really just gives an overview (and it is a relatively simple example without any complexity) without providing details of what exactly happens at each stage (in terms of representation, reasoning, etc.)."
-> We have added more examples of processed RDF data, as well as more detailed descriptions of the system components that are responsible for replication, including a code example, to Section 5.

"p3, 2nd to last paragraph: For an example of dynamic interaction between context-, task-, and user models for mobile devices, which is extensible and based on activity theory, see Raubal, M. and I. Panov (2009). A Formal Model for Mobile Map Adaptation. Location Based Services and TeleCartography II - From Sensor Fusion to Context Models. Selected Papers from the 5th International Symposium on LBS & TeleCartography 2008, Salzburg, Austria. G. Gartner and K. Rehrl. Berlin, Springer: 11-34."
-> This work has been integrated in Section 2.1 to illustrate the orientation of more recent context awareness approaches towards a flexible and integrated (holistic) processing of contextual information.

p4, col2, 1st para: You should explain why exactly a transformation into qualitative statements is needed.
-> Several reasons for transforming quantitative raw sensor data into qualitative statements are outlined in the penultimate paragraph of Section 2.2. Such reasons include expressing complex conceptual relationships and dependencies, applying classification-based reasoning techniques, unifying access to and use of contextual information among applications, and simplifying context sharing and exchange, since context consumers do not need to be familiar with low-level data processing and interpretation.
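
The transformation from a quantitative reading to a qualitative statement can be sketched as follows. The thresholds, the `ex:` prefix, and the property and class names are hypothetical and chosen only for illustration; they are not taken from the paper's vocabulary.

```python
def classify_speed(speed_kmh):
    """Map a numeric speed reading to a qualitative movement state.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if speed_kmh < 0:
        raise ValueError("speed must be non-negative")
    if speed_kmh < 1.0:
        return "Stationary"
    if speed_kmh < 7.0:
        return "Walking"
    return "InVehicle"


def to_statement(device, speed_kmh):
    """Express the reading as a (subject, predicate, object) triple."""
    return (device, "ex:hasMovementState", "ex:" + classify_speed(speed_kmh))
```

A context consumer can then match on `ex:Walking` without knowing how the raw speed value was measured or interpreted.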

p4, 2.3: "existing descriptions ..."; "information she / he is interested in";
-> corrected

p5, 3rd para: "prescribed vocabulary";
-> corrected

p6, 3.2: "Those frameworks..."; "processing large data amounts"; "we exclusively concentrate...";
-> corrected

p7, just before 4.: "interested in leading...";
-> corrected

p8, first "- ...": "enhance the overall";
-> corrected

p9, footnote 26: "software sensors";
-> corrected

p10, 1st para: "context provider"; col2, 2nd para: How exactly is the reasoning performed?
-> corrected

p11: "context provider to enrich..."; you need to provide more details on the "applies reasoning rules"!; "data provider that makes...";
-> We extended the description of reasoning about acquired contextual information in the Context Dispatcher paragraph since this component is responsible for creating a global context model. Additionally, we added a concrete example of a reasoning rule for context aggregation and consolidation that simplifies further context processing by merging multiple related resources.
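
One way to picture a consolidation step that merges related resources is the sketch below. It is an assumption-laden illustration, not the rule added to the paper: triples are plain tuples, resources are treated as related when linked by owl:sameAs, and the lexicographically smallest identifier is arbitrarily chosen as the representative.

```python
SAME_AS = "owl:sameAs"


def consolidate(triples):
    """Merge resources linked by owl:sameAs into one representative each.

    Hypothetical helper: returns a set of triples in which every merged
    resource is replaced by the smallest identifier of its group, and the
    owl:sameAs links themselves are dropped.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                keep, drop = sorted((a, b))
                parent[drop] = keep

    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}
```

For example, a calendar entry and a meeting resource declared equivalent would end up as a single subject carrying the statements of both, so later processing only has to look at one resource.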

p12, 3rd para: "content provider"?; 5.: "mobile user depend...";
-> corrected

p13, 3rd para: "task"; col2, 4th para: "these data";
-> corrected

p14, col2, 2nd para: delete "device" after "256 MB";
-> corrected

p15, 3rd para: "('Triples...";
-> corrected

"ad Conclusions: You did not really show what the users' information needs are, how are they determined in general?"
-> This is a valid question; however, it is one that (in our opinion) cannot be addressed in general. Users' information needs will always depend on their background, the tasks they have to perform, and their interests. It is not the goal of the presented framework to provide an all-in-one solution for every possible information need a user may encounter. Our framework has been designed as an infrastructure that provides a set of services to applications, in order to relieve application developers of these steps.

REVIEWER 3

"The organization could be greatly improved. Indeed, Sections 2 and 3 are not particularly about replication. Replication only appears at the end of both sections. In Section 4, the description of the component which implements replication is not found. It seems that the replication strategy is left to the implementation of Data providers. However, describing how such a strategy can be implemented is necessary (the link between context and fetching, which kind of reasoning can go there, what strategy is used, etc.). Given the title of the paper and its originality, replication should be the main guiding line of the presentation."
-> As stated above, we have added a more detailed description of the replication component, including detailed examples of RDF code as well as a code snippet that demonstrates how one particular data provider is implemented. We hope that these details will help to clarify the reader's picture of which steps are taken during replication.

"The second criticism is related. The experiment evaluates data stores. It is interesting in its own right, but it has nothing to do with replication. What I would expect, in addition, is a monitoring of a three day use case showing that the size of the database with replication is low relative to the size of the database without it. This is obvious, but this is the goal. It would also be interesting to see this size variation, the connection attempts, and failed data accesses. That would better show the benefit of replication."
-> To give the reader a clearer understanding of the amounts of data involved, we have added numbers about replicated data towards the end of Section 5. We compare the size of a base data set (in this example, information about places in DBpedia) with the number of triples that are replicated based on the information found in a user's calendar. A general discussion of this issue is, however, difficult, since it depends on the concrete application, the context that can be analyzed by the framework, and the data sets from which subsets are to be replicated.

"I would advise that this paper be only published once there is a link to where to retrieve/buy the system. It will only be interesting when semantic web application developers can use it."
-> The system is currently in the process of being transformed into an industrial solution. Due to intellectual property rights, we cannot fully disclose the source code of the system at the moment; however, we have added a relevant code fragment to Section 5 so that the reader is able to get an impression of the inner workings of the system. We have further added contact information (web link) to Section 1.

"'Web 2.0' is rather a marketing term as far as I am concerned. So, I do not think it has its place in an academic paper. It is better to tell which feature of Web 2.0 applications matters."
-> We have replaced this term and shifted the focus towards the Web of Data.

"the introduction starts with the assertion that mobile devices are always connected and the whole argumentation of the paper is based on the idea that this is not true... it may be better to mitigate the first statement."
-> We have mitigated this part of the introduction accordingly.

"It would be useful to motivate the paper with an example. The first paragraph of Section 5 could be put there with the description of what the user expects. In addition, Section 5 should be more precise about what data is actually replicated."
-> We have motivated the updated version of our paper through a typical use case of a knowledge worker who is on a business trip with several meetings and cannot rely on a stable network connection. Our motivating example describes parts of his/her daily working data and related Linked Data sources from which data that might become relevant in the near future can be replicated to the device. Section 5 further gives examples of data that are actually replicated, as well as code examples of the components that are responsible for replication.

"Section 2, rather justifies the use of semantic technologies."
-> We extended this part of Section 2 and included more specific details regarding the role and benefits of using Semantic Web technologies and languages for context-aware computing.

"2.1: Many definitions have been proliferated. I think that proliferation entails multiplication already."
-> Yes, we have corrected this statement.

"Is "Basically... employed" really useful?"
-> corrected

"In Section 3.2, you may have a look at the two demonstrations that won the best demo award at ISWC 2010. They are relevant to this topic."
-> We have included both papers in our related work section and discussed their relevance to our work.

"The end of Section 2 and 3 should summarize what has been found from studying the literature: Applications would benefit from replication, RDF mobile databases do not support replication, etc."
-> A discussion and a summary are now included in both sections.

"It is OK that Section 4 does not commit to an implementation target. However, it should be said earlier in the paper either what platform is targeted or that this work is supposed to be target independent."
-> Since the proposed framework is not bound to a specific mobile platform, we changed the corresponding parts in Section 4 and instead provided platform-relevant details in Section 5, which describes concrete implementation details and our case study.

"It would also be worth using the context for deciding when the next connection will be available (something like what IYOUIT does by learning user patterns). Of course, this is only a suggestion for future work."
-> Indeed, this fact has been addressed in our future work section.

"Fig. 1: I would draw a line around those components which run on the mobile (almost all) and those which do not. It is also strange to have a figurative RDF graph for the context model and not for the triple store."
-> Done.

"It is a pity that Section 4 is platform independent until the last column which introduces Android-related features. Try to keep this independent."
-> We have adapted the corresponding parts in Section 4 and provided the relevant information regarding platform-specific implementation in Section 5.

"Concerning the experimentation, it seems to me that there are two limiting factors in Android systems: RAM and ROM. The RAM is used for running the applications while the ROM is used for storing persistent databases (at least the SQLite databases). I would be interested to know what is the limiting factor in this case (given that there are devices in which the ROM is relatively small). If ROM is relevant, it would be worth telling how much ROM is left available before starting."
-> As is evident from our evaluation, the limiting factor is the internal memory. We have addressed this in Section 6.

"Another suggestion is to consider dumping some data sources to external memory (SD card), instead of fully discarding them, and arbitrating between the various storage options available."
-> We have extended the evaluation towards parsing and storing RDF data where we also consider external memory.