Survey of Model and Architectures for a Restricted and Local Mobile Access to the Web of Data

Tracking #: 2634-3848

Authors: 
Mahamadou Toure
Kaladzavi Guidedi
Fabien Gandon
Moussa Lo
Pascal Molli
Christophe Guéret

Responsible editor: 
Ruben Verborgh

Submission type: 
Survey Article
Abstract: 
Mobile Access to the Web of Data is currently a real challenge in developing countries, mainly characterized by limited Internet connectivity and high penetration of mobile devices with the limited resources (cache, memory, etc.). In this paper, we survey and compare proposed solutions (models, architectures, etc.) that could contribute to solve this problem of mobile access to the Web of Data with intermittent Internet access. These solutions are discussed in relation to the underlying network architectures and data models considered. We present a conceptual study of peer-to-peer solutions based on gossip protocols dedicated to design the connected overlay networks. In addition, we provide a detailed analysis of data replication systems generally designed to ensure the local availability of data on the system. We conclude with some recommendations to achieve a connected architecture that provides mobile contributors with local access to the Web of data.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Edgard Marx submitted on 01/May/2021
Suggestion:
Major Revision
Review Comment:

The paper presents a “survey of models and architectures for restricted and local access to the Web of data”. The authors attempt to enumerate and discuss existing and possible approaches that could be used to facilitate RDF access, particularly in remote communities with limited internet connectivity.

In general, the idea is valuable and the paper is well written.
However, there are several points that should be addressed before it is ready for publication.
I have grouped them by topic below.

== Title and Motivation

It is not clear to me in which sense the approaches listed in the work are particularly fit for “mobile” data access. In fact, I could not find a single approach designed exclusively for mobile applications.
I would rather remove the word “mobile” from the title.
I also would like to see other examples of “offline” Linked Data usage.
For instance, historical facts (such as birthdates or birthplaces) do not change and some knowledge bases are updated monthly or yearly.
It seems to me that there are several scenarios in which online query capabilities are not really necessary.

== Content

The first concern is that the authors mix peer-to-peer and client-server architectures. They are different and should be distinguished. For instance, approaches relying on SPARQL use the client-server architecture, and it seems to me that all RDF approaches cited in your paper are client-server based. I think the different types of architectures deserve a discussion, perhaps a subsection. You can later relate the architectures and protocols to their advantages and disadvantages. I provide a list of peer-to-peer RDF solutions later in this section.

I would strongly advocate changing your survey to something like “Survey of Models and Architectures for Linked Data Access” and adding the missing works, e.g. HDT (https://www.rdfhdt.org/publications/) and Linked Data Fragments (https://linkeddatafragments.org/publications/) (e.g. https://ieeexplore.ieee.org/abstract/document/8334472).

One of my biggest concerns is regarding the method that was used for selecting the works that were surveyed. There are a lot of works from the Semantic Web that were ignored and do deserve a citation and discussion.

I highlight some of them in the list below.
Please revise and check if I am not missing something else.
Create a systematic method for finding the related work and discuss it in an early section; it should look like chapter two of this work: https://www.researchgate.net/publication/284200058_Quality_assessment_fo....
It would also be helpful if you described the method used for finding and selecting the related work in the introduction or in a separate chapter.
I suggest organizing these works into different categories:

-> Cache

I would avoid citing cache systems; they are often part of distributed architectures, but they are not the same. Cache systems operate over tiny chunks of data mapped to a particular query. Thus, they cannot provide subsets over the cacheable data. There are also queries that never happen, i.e. users do not query whole datasets. However, if you think that these approaches are important, I suggest revising your survey to include the several missing works. Do not forget to discuss the relation of these systems to the architectures themselves.

A Survey of HTTP Caching Implementations on the Open Semantic Web
https://link.springer.com/chapter/10.1007/978-3-319-18818-8_18

Improving the performance of semantic web applications with SPARQL query caching
https://link.springer.com/chapter/10.1007/978-3-642-13489-0_21

A cache-based method to improve query performance of linked Open Data cloud
https://link.springer.com/article/10.1007%2Fs00607-020-00814-9

Graph-Aware, Workload-Adaptive SPARQL Query Caching
https://dl.acm.org/doi/10.1145/2723372.2723714

-> Architectures

KBox — Transparently Shifting Query Execution on Knowledge Graphs to the Edge
https://www.researchgate.net/publication/305410480_KBox_--_Transparently...

A decentralized architecture for SPARQL query processing and RDF sharing: A position paper
https://ieeexplore.ieee.org/abstract/document/8334472

A Decentralized Architecture for Sharing and Querying Semantic Data
https://link.springer.com/chapter/10.1007/978-3-030-21348-0_1

-> Other relevant works

A Demonstration of the Solid Platform for Social Web Applications
https://dl.acm.org/doi/10.1145/2872518.2890529

A Survey of Structured P2P Systems for RDF Data Storage and Retrieval
https://link.springer.com/chapter/10.1007/978-3-642-23074-5_2

Query Processing in RDF/S-Based P2P Database Systems
https://link.springer.com/chapter/10.1007/3-540-28347-1_4

== Format & Structure

I found it extremely difficult to evaluate the approaches that you listed in your work. You mixed models and architectures of the general Web with some specifically designed for the Semantic Web. Please organize it differently.
In chapters 3 and 4, give an overview of techniques for the Web and, in a different chapter, discuss the methods that are used in the Semantic Web, relating them to chapters 3 and 4. That will make your work much simpler to read.

== Writing

On several occasions, you do not put a space before citations, e.g. “protocols[13]”. The proper way is to add a space between the last word and the citation. Please use the macro “~\cite{}”.
That ensures there is a space and that the citation does not wrap onto the next line.
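
For illustration, a minimal sketch of the difference is given below. The citation key `placeholder` and the sample sentence are hypothetical and are not taken from the manuscript.

```latex
\documentclass{article}
\begin{document}
% Missing space: the citation is glued to the preceding word,
% which typesets like "protocols[1]".
Peers exchange views through gossip protocols\cite{placeholder}.

% Tie (~): a non-breaking space separates word and citation and
% keeps them on the same line, which typesets like "protocols [1]".
Peers exchange views through gossip protocols~\cite{placeholder}.

\begin{thebibliography}{9}
\bibitem{placeholder} Placeholder entry, for illustration only.
\end{thebibliography}
\end{document}
```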

In general, the text is well written except for a few words that seem to be written in archaic English. I would recommend using the modern form: amoung (among), signalling (signaling), behaviour (behavior), availabillity (availability).

I would also suggest using ‘z’, which is accepted in both American and British English, instead of ‘s’ in some words, e.g. decentralised.

The second suggestion is to remove “etc.” from your text; if you are just giving some examples, you really do not need it. A phrase like “models, architectures,etc.” can be written as “such as models and architectures”, which reads better.

The spelling and grammar mistakes include, but are not limited to:

- Title

“Survey of Model…” -> Survey of Models...

- Abstract

“could contribute to solve...” -> could contribute to solving

- Introduction

“Overcomed” -> overcome

“manipulate Web contents” -> manipulate Web content

“By analogy to traditional Web (of documents)” -> By analogy to traditional Web (of documents),

“responses to users needs” -> responses to users’ needs

“Furthermore” -> Furthermore,

“nodes failure” -> node failure

“In such model,” -> In such a model,

“the number of the smartphone “ -> the number of smartphone

“In recent years, peer-to-peer” -> In recent years, the peer-to-peer

“classes of the superposed networks” -> classes of superposed networks

“problem of the intermittent access” -> problem of intermittent access

“For each of these approaches” -> For each of these approaches,

“Section 3 discusses logical” -> Section 3 discusses the logical

- Section 3

“peer-to-peer networks have made” -> peer-to-peer networks has made

“gossip based” -> gossip-based

“the peers selection” -> the peers’ selection

- Section 3.1.1

“Ganesh et al. [14] presents” -> Ganesh et al. [14] present

“Jelasity et al. [11] introduces” -> Jelasity et al. [11] introduce (and many more from this mistake)

“that are pronned to malicious behaviour” -> that are prone to malicious behavior

- Section 3.2

“members of same cluster” -> members of the same cluster.

“a geographical, semantic, profile” -> a geographical-semantic profile

*I think “amoung” (used several times in your text) is not used anymore; the right spelling is “among”.

“This results on a structure named the target graph” -> This results in a structure named the target graph

“connected and of periodically providing” -> connected and periodically providing

“The sampling approch follows” -> The sampling approach follows

“relies on a attack-resilient” -> relies on an attack-resilient

“as an computational entity that” -> as a computational entity that

- Section 4

“In this section we” -> In this section, we

“applied to solve data intensive problems” -> applied to solve data-intensive problems

“devices/sensors (users) as an middle” -> devices/sensors (users) as a middle

“consists of both linux kernel” -> consists of both Linux kernel

“unit gathers raw sensory data and execute” -> unit gathers raw sensory data and executes

“and pass them on to the higher level fog unit” -> and passes them on to the higher level fog unit

- Section 5

“(models, architectures,etc.)” -> (models, architectures, etc.)

Review #2
By Carsten Keßler submitted on 19/Aug/2021
Suggestion:
Major Revision
Review Comment:

# Review: Survey of Model and Architectures for a Restricted and Local Mobile Access to the Web of Data

[SWJ 2634](http://www.semantic-web-journal.net/content/survey-model-and-architectur...)

Per the title, this article attempts to provide a survey of model (modelS, I guess) and architectures that enable participation in the web of data in conditions with restricted (e.g. low bandwidth, "patchy" connectivity) internet access. The work is motivated with a scenario where several participants in a Jazz festival access and modify data about the festival.

The article covers an area of work that is at the intersection of the two research areas "Web of Data" (Linked Data, Semantic Web, …) and Peer-to-Peer networks – specifically those based on gossip protocols. The title does not really reflect this in its current form, as "Restricted and Local Mobile Access" could also be realised using other approaches (as mentioned, but not further discussed, in the article). Generally, the article struggles to clearly outline what field it is about, as it "lingers" between these two areas of work. One can tell that the authors have struggled to decide what to include from either side, and what to leave out. In my opinion, the authors need to clearly delineate what this survey is about and revise it with a focus on this specific area (and a new title).

Some specific comments:

- Introduction: I'm sure there must be more recent stats about smartphone owners on the African continent. Moreover, and more importantly, the second promised contribution – a classification of approaches dedicated to designing data sharing systems adopting an RDF data model – needs to be improved; see below.
- P. 1, L. 44: "structure of unstructured architectures" seems contradictory to me.
- The Jazz festival scenario in section two helps motivate the work, but it stands a bit isolated from the rest of the article. I think the scenario is useful, but it would be good to come back to it in the remainder of the article and exemplify differences between different communication models based on the scenarios, for example.
- The quality of Fig. 1 needs to be improved.
- The connection between sections 3 and 4 – which is basically the borderline between the two areas of work covered here – is not clear. I think it would really benefit the article if you could create a clear connection here.
- Section 4: For this journal, I don't think it is necessary to introduce RDF to the extent done here.
- Section 4.1.2 left me a bit puzzled. Comparing an RDF serialisation (RDF/XML) to storage systems (Jena and Sesame) seems weird to me. They are two different things and there are many other (IMO more practical) RDF serialisations and also other graph databases and triple stores that can store RDF data (which are then read and written in those different serialisations).
- Section 4.1.3: RDF data are not necessarily XML documents (see above). More importantly, I was surprised to see no mention of SPARQL federated queries in this context.
- Same section: "Much work has been done to build infrastructures..." – if that's the case, please cite at least some examples.
- At the end of the section, please explain why you focus on graph replication and cloud/fog computing.

This manuscript was submitted as 'Survey Article', I am therefore following the journal's recommendations and structuring the remainder of the review according to the criteria for this type of article:

*(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic:*

Being new to this particular intersection of research areas (peer-to-peer computing and Web of Data), I had trouble finding out how the two areas are really connected and depend on each other. This needs to be more clearly defined and explained – otherwise, they just remain two separate technologies that are used together.

*(2) How comprehensive and how balanced is the presentation and coverage:*

Not being an expert in peer-to-peer computing, I cannot judge this part; however, there are several issues that need to be addressed in the parts on the Web of Data, as listed above.

*(3) Readability and clarity of the presentation:*

Readability is generally okay-ish, but the language needs to be thoroughly proof-read (particularly articles – many either missing or superfluous). The structure needs to be improved as explained above.

*(4) Importance of the covered material to the broader Semantic Web community:*

Besides the lack of focus discussed at the beginning of the review, this is probably the biggest weakness of the manuscript. I am not convinced that this very specific area of (admittedly very interesting!) research is of sufficient importance to the broader community to warrant a survey article.

Review #3
Anonymous submitted on 29/Aug/2021
Suggestion:
Major Revision
Review Comment:

In this paper, the authors survey and summarize existing solutions to the problem of mobile access to the Web of Data with intermittent Internet access. As a survey paper, it provides a relatively systematic overview of previous studies and compares different solutions. Its structure is very clear and easy to follow. Some suggestions and comments are listed below:
(1) The title and introduction indicate that the survey targets regions with restricted and local mobile access to the Web of Data (e.g., Africa). I am wondering how this survey, as well as the solutions mentioned in it, would help solve the problem. It seems to me that this survey is written in a more general sense, rather than specifically for the regions mentioned above.
(2) The motivating example is easy to follow. However, a pictorial illustration would be much more helpful for the audience, especially for readers who are new to yet interested in this field.
(3) Some typos need fixing. For instance, on page 6, lines 43 and 44, make sure the verb form after 'et al.' is consistent throughout the paper, in either singular or plural form, e.g., show -> shows if you want to align with similar cases elsewhere. Other typos, such as Amoung -> among, need to be fixed as well. Page 15, line 7: datas -> data.
(4) Tables 1, 2 and 3 compare the different approaches surveyed in this section from different aspects. I am wondering how you came up with the characteristics used in the comparison and whether they are representative, comprehensive and systematic. For example, in section 3.2.2, you chose Similarity metric, Semantic clustering and clustering by gossip. It seems that those terms partially overlap and are at different levels.
(5) The paper surveys four central concepts (Graph Replication, Distributed Graph and DHT, Distributed Graph and Semantic Overlay, Cloud and Fog Computing); however, the relationships among them are not well discussed.
(6) Optimistic and pessimistic replication approaches are mentioned, but a comparison of their advantages and disadvantages is missing.
(7) Page 12: a CRDT (Commutative Replicated Data Type) is first mentioned in line 49, but more details are introduced later in line 25. It would be more reasonable to introduce it where it is first mentioned.
(8) Page 14, line 15, the differences between fog computing and edge computing are not clearly discussed.
(9) Page 14, line 17, MEC was not introduced before while AMC was introduced but not discussed.