An information model for managing resources and their metadata
Solicited review by Sven Schade:
While the last revision could resolve all critiques in terms of content, this second round of modification largely succeeds in presenting the valuable findings. The authors again made a major effort in addressing all reviewers' comments. For me, only three minor revisions prevent this article from being published:
1) The first sentence of the abstract is not precise enough ('...where the same problems tend to occur during the development phase'). In fact, the first three sentences describe the issue at hand in an overly complicated fashion. They should be revised so as to clarify (i) the context - information management in and between web applications, (ii) the problem - development of applications that are capable of exchanging data and metadata, and (iii) the desired solution - a reusable and sufficiently generic information model.
2) The heading of section 1.1 might be simplified, e.g. to 'Open Issues' or 'Problem Statement'.
3) The last paragraph on the left column of the second page ('Parts of the presented solution in this article rely on the use of RDF and the concept of named graphs...') breaks the flow of the text, because it mentions the proposed solution too early. I would keep the statement on a more generic level, e.g. 'The concept of named graphs, and particularly the use of the Resource Description Framework (RDF) have already been suggested as a partial solution'.
Once the comments above have been addressed, I would recommend finally accepting this article for publication.
Solicited review by Tudor Groza:
The authors have addressed all my comments.
Revised manuscript after "accept with minor revisions". Reviews of the initial submission are below.
Solicited review by Sven Schade:
The initial comments of the three reviewers were sufficiently addressed in most respects. As a direct result, the newly submitted version provides the required improvements to consider this article for publication. In fact, I highly appreciate the authors' hard work on the topic in general and also in improving the presentation in a scientific article. The work provides valuable results for the Semantic Web community and I hope to see it continued and also taken up by others.
However, although I appreciate the intense revision as such, I see several needs for improvement, mainly related to the presentation style. It has already been noted on the previous manuscript that the authors tend to write in a reporting style. Unfortunately, this has in parts still carried over into the revision. More detailed comments for improving this aspect and several minor items are given below.
- The abstract already begins in a reporting style and as a whole is way too detailed. Many of the statements would fit better into the introduction, such that that part of the text becomes self-contained. I suggest completely rewriting the abstract in order to present a brief but complete overview of the whole contribution, including conclusions and future work.
- As already indicated above, the introduction should be self-contained, i.e. it should especially contain the brief context, motivation and problem statement, before indicating the intended solution and the structure of the remaining article. Parts of the current abstract should be used to further elaborate on these aspects in section 1. Special care should be taken with respect to the use of the English language; for example, the second and third sentences in the new version show a marked decline compared to the initial manuscript.
- Section 1.1 should be moved to the background section in order to increase readability of the overall article. The style might be changed to a more narrative description, as the current text reads almost like a glossary.
- Section 1.2 should become more prominent. It describes the real problems as requested above. Accordingly, this might be moved up and converted to bullet points.
- The last paragraph and listing of the current section 1.2 reflect partially the previously requested indication of the overall approach. This should be better integrated into the overall new revised section 1.
- Section 3 should begin with a few 'bridging' sentences, which take the reader from the given background to the following introduction of ReM3.
- The term 'publication' might in general be replaced with 'article' or 'paper'.
- Section 1, just after reference 7, 'for example' should be deleted.
- Section 1 should complement the goals of Organic.Edunet with those of Ariadne.
- Section 1, text about article organization should be revised.
- The first mention of 'Semantic Web' should be equipped with a reference.
- The above also holds for Dublin Core.
- The formatting is 'broken' in the paragraphs just below Figure 5.
- Section 6.3 again begins in a report kind of style. 'The authors participated in…' should be changed to something like 'Within the scope of the "Hack4europe!"… we …'.
- Section 9 should be called 'Future Work and Next Steps'.
Solicited review by Tudor Groza:
The authors have definitely improved the manuscript and have addressed well all the weak points mentioned in the previous review. While from a technical perspective the manuscript is on a good track, unfortunately, the presentation requires some additional work (as per the comments below). The major issues with respect to it are: a fairly incoherent introduction, the lack of examples in the model description and the inclusion of a series of background descriptions, which could have been easily left out as they represent common knowledge for the readers of this journal. Finally, a couple of aspects could be improved in the presentation and discussion of the evaluation.
* The introduction has been re-written, however, it is now more confusing than previously. It starts directly by discussing the contribution of the paper without any sort of background, context or motivation. The fact that it has been split into corresponding subsections that deal with some of these aspects (e.g., the motivation and goals of the framework) is perfectly fine, however, these need to follow a coherent argumentation thread, which is, in this case, absent. The very first sentence of the introduction raises confusion: "The focus of this paper […] to solve these kind of problems." - what kind of problems? It's true that the abstract describes these problems, nevertheless, the introduction should be coherent on its own, without reading the abstract.
* Other aspects that could be improved in the introduction are: (i) a justification of why this approach is better than other existing approaches; (ii) moving the terminology subsection to the end of the introduction; and (iii) moving the description of the structure of the paper after the terminology, at the very end of the section.
* The state of the art section still reads more like a background, and some parts could be left out, e.g., not sure if 2.4 and 2.5 are really needed. Also, access control and RDF is now mentioned, but just as 'note' and without analysing to what extent the current approach complements this state of the art. Finally, access control is also discussed (later in the paper) in the context of SPARQL. Hence, an overview and discussion on this aspect should have been included in the related work, especially since it's a fairly hot topic.
* The inclusion of the RDF-based formalisation is a great addition to the paper, however, the entire section 3 could be probably improved (w.r.t. readability) by adding some concrete examples, especially in 3.1.
* In 4.3: the authors have clearly adopted the easiest solution w.r.t. access control and SPARQL. However, this does raise the question of whether this solution is enough, and what happens if one does have the necessary credentials to access certain data but cannot, because this scenario is not supported.
* The authors have done a good job with the evaluation, especially w.r.t. the structured interviews and the discussion about the framework in the light of these interviews. The survey presentation (Sec. 7.2.1) could be improved by adding a bit of structure to it and including some concrete questions asked as part of the survey (if not the entire list - in an appendix). The scalability evaluation, on the other hand, raises a series of questions. For example, in a real-world deployment (which has happened according to the paper) are those 20 concurrent connections enough? Testing the system using synthetic 'data' is great as it allows one to push the boundaries of the system's scalability. However, to complement this, it would also be interesting to analyse the real usage data. Finally, how is this scalability affected by the backend repository? The authors do mention that the evaluation has been performed only with one backend, however, it could have been interesting to already delimit the latency induced by the actual system in the context of the entire response time.
Revised manuscript (as general full paper submission) after a "reject and resubmit". Previously submitted under the title "An Information Model for the Annotation of Resources with Heterogeneous Metadata" in response to http://www.semantic-web-journal.net/blog/special-issue-linked-data-scien...
Solicited review by Sven Schade:
The paper submission to this special issue on Linked Data for Science and Education presents an approach for annotating web resources with different forms of metadata and outlines a few show cases from the education domain. It focuses on the used information model and a related application for resource and metadata management.
In general, the contribution does not match well with the topics of the special issue. 'Science and Education' covers parts of the example application, but is not emphasized in the problem statement, state of the art and conclusions sections. Illustrative examples are introduced late in the paper, which makes the first parts hard to read and understand. However, the presented work has high potential and it would be valuable to include an improved (and more focused) version in the special issue. I therefore suggest a re-submission after major improvements. Corresponding detailed comments are listed below.
- The problem statement is not illustrative enough, e.g. why is the replacement of harvesting problems with linked data an issue at all? This whole section would clearly benefit from an example (from the science and education domain). Here, the example should show the issue. Later, the same example should be re-visited showing how the ReM3 approach addresses the central issues/problems. Some of the required information is available late in the paper, in the 'showcases' section.
- A more detailed problem description would improve the overall value of the contribution. This should include a paragraph about the particular relevance in the context of this special issue.
- The role of web service technology for solving the particular problem of the required information model remains unclear.
- Figure 1 is a central element of the paper. In its current form, it is confusing. Cardinalities might help. The difference between Entity Information, Local and External Metadata is unclear. The same holds for the difference between Entity and Resource. It might be useful to reduce the overall amount of expert terminology.
- Below Figure 1, it remains unclear why all the types of types (kind of types, i.e. meta-types) are required.
- The following text about named graphs etc. is very theoretical and would benefit from an illustrative example.
- The 'Representation type' is introduced too late.
- From 'ReM3 – An Information Model…' onwards, the paper becomes easier to read. Yet, the desire of using Web Technologies should be clarified early in the paper (see also comment above).
- Under 'additional interfaces' it is mentioned that SCAM supports harvesting, while the problem section suggests replacing harvesting. These two statements seem to contradict each other and should be clarified.
- The 'scalability' section addresses some fields which are relevant for this special issue. These might be exploited.
- The Confolio application and the related section are certainly interesting, but much of the presented information does not add value to the paper; rather, it shifts the focus. This section might be shortened.
- The relation of the section 'presenting and editing metadata' to ReM3 is unclear, especially because the types in the table do not seem to match any ReM3 element.
- The conclusions should directly refer to the problems mentioned at the beginning. It should be outlined how each of these problems has been addressed by the work presented in the paper.
- The paragraph about the structure of the paper should be moved up (from the problem section to the introduction).
- All headings should be followed by a text block. This should for example include 'State of the Art' and 'ReM3 – An Information Model…'.
- A definition of the 'resource' concept would be helpful.
- Why are "things" mentioned on page 4 and not resources?
- The statement that Linked Data extends the Semantic Web can be questioned. In fact, Linked Data operates 'below' the semantic layer.
- The State of Art should at least include some sentences and pointers to the most common, frequently used, metadata models, including DC, LOM etc., which are mentioned later in the paper.
- SCAM version 4 should be briefly introduced.
- A clear reference to Confolio is missing.
- The sentence starting with 'The web API of SCAM' does not need an extra paragraph. It can directly follow the text above.
- Acronyms are not used consistently throughout the text.
Solicited review by Tudor Groza:
The paper presents an information model for capturing a comprehensive set of provenance metadata for Web resources, with an accent on the integration of heterogeneous metadata exposed by different sources. It also discusses a reference implementation and several application use cases.
* the supporting service has a reference implementation (although the model's reference implementation is not described)
* the showcases presented in the paper, and also published online, are impressive
* the presentation of the paper is quite weak. For example, the introduction lacks a clear description of the context and of the problem that the authors try to solve, although they do depict clearly the problem in Sect. 2. Also, there are several places where the presentation could be improved by providing concrete examples.
* the paper presents the information model only at a high level and leaves the formalisation unspecified. There are no details (or links to external resources) about the actual grounding of the model into, for example, an ontology or vocabulary. Concrete examples of how the grounded model can be used, or of what possible SPARQL queries look like, are also missing.
* the related work is slightly out of context and tends to be a background description rather than an actual related work analysis. For example, what other approaches try to model access control using RDF?
* the scalability supported by the model has not been evaluated. It would have been interesting to see at least what is the amount of triples generated by the model to represent certain aspects, such as ACL, and how does it scale with the number of users / groups and resources.
* in general, the use of the term "annotation" is highly unclear in the context of this paper. The authors should provide a clear definition of what they mean by "annotation" at the very beginning of the paper.
* as already mentioned, the introduction is not setting properly the context and the problem addressed in the paper. What are the "learning repositories" or the "traditional repositories" you refer to?
* the intrinsic entailment from the first to the second paragraph (talking about bringing existing metadata into the Linked Data Cloud) is highly unclear.
* the use of past tense in the first paragraph is confusing, especially due to the lack of concrete examples.
* "Using Semantic Web technologies to annotate resources with metadata […]" -> since the claim is generic (with no grounding in a particular context or domain problem), a reference to some more foundational work in (semantic) annotation would be more appropriate, instead of the two self-citations.
* the short discussion on using triples to describe resources and the missing provenance information requires some examples and / or a proper reference.
* the problem statement does shed some light onto the issues addressed by the paper, yet it could profit, again, from some concrete examples. What kind of diverse information are you referring to? What is educational metadata? Please provide clear examples for these.
* "Metadata is copied …" - this phrase is unclear. What are the metadata instances you are referring to?
* next phrase: The need to provide links between related resources is probably rooted in some requirements, which should be specified. Otherwise this claim has no support.
* the formulation of the shortcomings of Named Graphs discussed at the end of Sect. 2 is out of context. Named Graphs are a generic representation mechanism and the way in which developers make use of them is application / domain dependent. Hence, there is no need for generic guidelines on the provenance or relations between Named Graphs as such. For example, if one models Persons via NGs, s/he would need to specify that Persons are related at a conceptual level (i.e., Person_URI knows Person_URI or Person_URI sameAs Person_URI), and not that the underlying representation as NGs of the Persons are related.
* the related work section should probably be renamed to Background (or Foundational work), because it discusses the building blocks that support the solution provided by the paper and not approaches that try to solve the same problem. The section also contains some statements that are not supported by any evidence or references (e.g., "There are situations where the conceptual model cannot be cleanly mapped […]"). Finally, the authors keep hinting towards the direction of technology-enhanced learning and the associated models, but without putting this information into a proper context or discussing the relation to their problem and solution.
* Sect. 4 presents a good overview of the conceptual aspects of the model but lacks in details about a possible formalisation. For example, would an Entry Information be a class? Would Location Type be a class? What kind of relation would exist between the two? Would any ontology design pattern be useful to implement such a relation? What external / widely adopted vocabularies would you recommend for modelling some of the provenance aspects? How can the ACL elements be formalised? All these things should have been described, to give a better picture not only on the conceptual model but also on the implications of following different implementation routes.
* The scalability discussion in Sect. 5.6 doesn't really make sense without proper experiments and numbers (especially the shallow comparisons between the performances of the different instances - e.g., "hardly noticeable" or "very low").
* In the same section, as part of the discussion about the free-text querying and the use of SOLR, it would be interesting to see how the ACL aspects were implemented.
Solicited review by Paul Groth:
This paper primarily summarizes the design and implementation of an information model, ReM3, designed for the management of information in learning repositories. The paper describes some experience with its implementation in the context of two projects and then describes an application, Confolio, for managing personal and organizational profiles that makes use of ReM3.
The paper reads more as a report of what the authors have done rather than a scientific article. To become a scientific article the authors would need to provide significant added detail and explanation in 4 areas: contextualizing the work, related work, identifying contributions, and evaluation. I now describe, in more detail, the concerns I have in each of these areas.
1) Contextualizing the work
The paper fails to orient the reader within the overall domain and scope of the work. It begins by stating "Several projects with focus on exchanging metadata between learning repositories has the same problem: how would it be possible to bridge the gap between "traditional" repositories and triple stories, taking advantage of the features that Semantic Web has to offer". The paper then goes on to describe general notions around Linked Data and Semantic Web technologies. What projects do the authors refer to? What are "learning repositories"? What do they contain? Why are they useful? Why do they need to exchange metadata? What features do the authors refer to? Without answers to questions such as these it's impossible to orient the work.
In Section 2, the problem statement consists of 5 statements that read as the information integration problem in general: integrating heterogeneous information, exposing this information, reduplication. I don't believe the authors are aiming to solve this problem in general; thus, it would be better to have a much more focused problem statement.
2) Related Work
The section on the state of the art essentially describes current practice in developing Semantic Web applications. It does not look at what current information models are for learning repositories and why those are insufficient. It does not discuss metadata exchange standards, and it fails to look in any way at the related work in provenance. Indeed, of the 19 references only 2 are to non-generic references about the area that are not self-citations. For a journal paper, I would expect much more.
I would suggest the authors look at the following survey as an entry point into the provenance literature:
- Luc Moreau. The foundations for provenance on the web. Foundations and Trends in Web Science, 2(2-3):99-241, November 2010.
For more recent Semantic Web provenance literature consult the recent special issue on Provenance in the Semantic Web in the Journal of Web Semantics (Volume 9, Issue 2, Pages 83-244 (July 2011)).
Some entry points for work on learning repositories and information models are:
- Tiropanis, Thanassis and Davis, Hugh and Millard, David and Weal, Mark (2009) Semantic Technologies for Learning and Teaching in the Web 2.0 era - A survey. In: Proceedings of the WebSci'09: Society On-Line
- Permanand Mohan, Christopher Brooks, "Learning Objects on the Semantic Web," Advanced Learning Technologies, IEEE International Conference on, p. 195, Third IEEE International Conference on Advanced Learning Technologies (ICALT'03), 2003
- S. Ternier, and E. Duval, "Interoperability of Repositories: The Simple Query Interface in Ariadne," Int'l J. E-Learning, vol. 5, no. 1, 2006, pp. 161–166.
3) Identifying Contributions
The key contributions of the paper are not identified. It is hard to determine what the particular information model adds to the discussion around how to appropriately model systems. In general, I have the feeling that it looks just like the database layout of an implementation. Indeed, this feeling is buttressed by the discussion of caching metadata on pg. 4. Also, I think the authors mix information model and implementation when they discuss Named Graphs in section 4.2. The discussion of provenance metadata is rather lightweight, essentially, listing properties already found in Dublin Core.
I also wonder how this paper differs from the following paper (not cited) about the Ariadne system discussed as a major implementation of the system.
- Stefaan Ternier, Katrien Verbert, Gonzalo Parra, Bram Vandeputte, Joris Klerkx, Erik Duval, Vicente Ordonez, Xavier Ochoa, "The Ariadne Infrastructure for Managing and Storing Metadata," IEEE Internet Computing, pp. 18-25, July/August, 2009
The authors need to clearly identify the contributions of the paper above the state of the art.
4) Evaluation
The authors provide no evaluation of the information model or its systems. They discuss scalability in section 5.6 but only provide anecdotal experiences with no hard numbers or comparisons. They discuss various projects that used the system but provide no way to judge whether their information model made a difference. That is, there is no feedback at a user experience level or at a developer performance level.
Given these four areas of concern, the paper is currently not in a position to be considered as a journal paper.
- I thought the URL design in section 5.2 was interesting; maybe some lessons learned could be drawn from it.
- The acronym SCAM has poor connotations in English. I would suggest finding a better one.