Review Comment:
This manuscript was submitted as 'Survey Article' and reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.
I was also asked to read the current re-submission "as a new one", as I was not involved in the review of the first submission.
Introduction:
The work is partly motivated by the importance of the government sector for Linked Open Data, with which I agree. However, in my view the following paragraphs do little to make the case for why we should study something as specific as methodologies for Linked Open Government Data, or how they differ from a general Linked Open Data publishing methodology (without government).
One paragraph is about how few RDF datasets have been published, providing a citation showing that open data publishing is evaluated from a legal perspective rather than in terms of usefulness. This raises the question: if the problem is how the evaluation is done from a government perspective, why should we care about the methodologies?
The next paragraph describes some problems identified with Linked Open Government Data methodologies. They are all fair and relevant problems, but how does this survey/mapping help to solve them? Does this drive the research questions in some way?
The next paragraph is about the quality of Linked Data, based on an overall LOD study. It is again unclear how this relates to publishing methodologies and this particular study. Is it that we want to know whether current methodologies leave out that step?
Finally, in terms of suitability as an introductory text, I'm missing a sentence establishing for whom this survey is targeted. What will this target reader learn about the topic? How is it going to be useful for them?
Some of these questions find an answer in a later section, but for clarity they should already be clear from the introduction.
Background: Overall fair; it is clear that OGD creates data dis-integration and Linked Data seems to be a good way to tackle its integration. Something that could be improved is a precise definition of what the authors consider a "publishing methodology".
Related works: Other systematic reviews are mentioned. At the end of this section there is an argument about the autonomy of public bodies to process and publish under their own norms. It looks like this was included to support the case for a LOGD methodology, but publishers' autonomy also occurs in a general Linked Open Data setting; government is not a special case here. It is in this section that we learn that one of the contributions is a general model based on the systematic mapping. This is relevant research, but it is missing from the introduction, along with an explanation of how it helps the target audience of a survey paper.
Methodology: Standard systematic mapping, no problem here. I do have a remark on one of the exclusion criteria: "The study focuses on the application of LD in a specific domain". It is mentioned in the introduction that adopters claim LOGD methodologies "are too generic for use". One solution to this would be to develop domain-specific methodologies, so aren't we missing relevant papers with that exclusion criterion? Smart Cities data (or at least a good subset of it) is managed by the public sector, yet such works seem to be excluded. I don't understand why genericity is highlighted as a problem in the introduction while some domain-specific works within a government context are then excluded.
It is in this section that we get an answer to one of my questions on the introduction: why quality is particularly highlighted.
Results:
The mapping work leading to the matrix in Figure 3 is very useful; however, the summary of each step lacks detail on how the descriptions of each step in the different papers were unified, or how they differ from each other. Most descriptions seem to cite only one of the selected papers, and in some cases none. A particularly confusing example of the latter is the RDF cleaning step, which is "sometimes regarded as a separate step" when, according to Fig. 3, only one paper treats it as such. We are also unsure how many of the 24 papers that mention RDF conversion as a step include cleaning as a subtask, and what they mean by cleaning. Without this, we can't really answer whether the steps are truly in common or not.
RQ2 on tools has a more comprehensive answer and a comparison of which tools are mentioned by which paper. My only remark on this subsection concerns the claim that most tools are discontinued. It would have been good to know how many of them are discontinued, and when they were discontinued.
For RQ3, what is presented is fair. It seems that no methodology has been evaluated as a case study, and there is no evaluation with end-users (though this latter point should be made explicit).
For RQ4, a problem I find is how the terms "step" and "task" are used. There is already some confusion in previous subsections, but it is here that we feel it the most. At the beginning of this section, it is said that "studies divided the tasks of publishing into phases, and in turn, in more atomic steps with clear outputs", hinting that a step is part of a task. The description of Fig. 3 in the text talks about "explicit tasks identified", but the caption and column name say "step". Some descriptions in the answer to RQ1 are referred to as "steps" and others as tasks (linking is referred to as a set of tasks, hinting at tasks as parts of steps, the opposite of the beginning of the section; cleaning is also referred to as a task). In the RQ4 subsection, things are explained in terms of phases and tasks. Assuming tasks are the finest granularity, I'm missing a structured account of the quality control tasks that were found.
Unified publishing model:
This is per se a valid contribution, but I'm not quite sure it is consistent with the purpose of a survey paper in this journal. Furthermore, if this is seen as a new "methodology" or as a "roadmap for LOGD initiatives", it has exactly the same problems as the works described in the mapping: it is quite generic (and we were told adopters don't like that), and it has not been evaluated beyond a logical argument. In this section it is said that "It can be used as a roadmap for LOGD initiatives and resource initiatives", hinting that this is targeted at practitioners. It is also said that managers may decide "the level of formalism according to their context". If this is the case, what is the minimum level of formalism? How were mandatory/optional steps decided: based on how they are labeled in the literature, or following your own logical argument? There was a spotlight on quality tasks, and a "validation step" is added after each phase, but this is not reflected in Fig. 5.
Discussion: In general I agree with it; however, there are still some unclear points about the proposed unified model. For example, "Some steps may be too expensive... in order to be implemented, a lean model is required. However, in our model we provide steps that should be considered in a formal initiative". I don't see the connection between step cost and the proposed model providing mandatory steps, especially as there is no estimation of the cost of the proposed steps.
Research directions: Mostly agree, especially with the part on longitudinal studies.
Conclusion: It is said that the paper is "deriving a unified methodology", which is again inconsistent with the goal of a survey paper.
Overall my assessment is:
Pros:
* Methodologically correct systematic mapping (with a caveat on one exclusion criterion).
* Valuable contribution in highlighting the fact that methodologies have not yet been properly evaluated.
* Valuable contribution on tools proposed for LOGD.
That covers the "How comprehensive and how balanced is the presentation and coverage" criterion. I also think the work is important for the Semantic Web community, with the caveat that it is not sufficiently motivated why we need to consider publishing methodologies for LOGD specifically, rather than for LOD in general.
Cons:
* Unclear suitability as an introductory text; it is not explicit for whom this survey is intended.
* The interchangeable use of the "step" and "task" concepts hurts clarity.
* The goals of the paper vary across sections: the introduction starts with a systematic mapping, while the conclusion talks about a "unified methodology".
* The answer to RQ1 is not satisfactory due to a lack of detail on how the differing definitions from each paper were harmonised, or, alternatively, the lack of an account of those differing definitions.
Recommendation: Major Revision