Injecting semantic annotations into (geospatial) Web service descriptions
Review of the final round of revision by Jacek Kopecky. Reviews of earlier rounds are further below.
The paper says it talks about the sapience API but it actually doesn't (as far as I could see); that promise should be removed.
The paper could use some cleaning: after the rounds of review and incremental changes, many paragraphs and sections now jump between unrelated points and have unwanted forward references, so it's hard to follow the points made by the paper.
Some terminology could still be fixed: perhaps say "path expressions" instead of XPath expressions (the first mention would note the similarity to XPath); and perhaps clarify earlier that you expect the client (in your terminology, apparently a system that uses some services with its own version of their semantic annotations) to provide the semantic annotations of the services it uses in its application(s), while the proxy merely facades the service and maintains the annotations across service description changes.
This is a revised manuscript following an accept with minor revisions. The reviews immediately below are for the previous round. Reviews of earlier rounds are further below.
Review 1 by Tudor Groza:
Accept as is.
Review 2 by Jacek Kopecky:
I'm still not happy with how the paper presents its contributions. Unless I'm missing something, the authors claim (in the paper and review rebuttal comments) two points of value:
1) the proxy maintains the metadata through (some) changes of the underlying service descriptions,
2) the proxy redirects to the service so the client only needs to know one location (the proxy).
On the first point, the paper still does not contain a useful discussion of the types of changes that the system will handle. The only sentence that could remotely be seen as useful is "Changing the metadata does not affect the injection procedure, as long as no elements are removed which are used within the extracted XPath expressions." There should be examples of which changes will work and which will not, with a focus on defining the line between the two. Such guidance in the paper will help service providers make their service description changes friendlier to the proxy, and will help clients manage their expectations. Then potential users of the proxy will be able to evaluate whether the proxy would have benefit with their pattern of usage and description changes.
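To make the requested line concrete, here is a toy illustration of the kind of example I mean (my own simplification - a bare path expression standing in for the proxy's extracted XPath, not the actual mechanism): additive changes to the description leave the annotation target findable, while removing the referenced element breaks injection.

```python
# Toy illustration of annotation robustness to description changes.
# LOC is a simplified stand-in for a stored location expression.
import xml.etree.ElementTree as ET

LOC = ".//attr"

v1 = ET.fromstring("<schema><attr name='depth'/></schema>")
v2 = ET.fromstring("<schema><doc/><attr name='depth'/></schema>")  # element added
v3 = ET.fromstring("<schema><doc/></schema>")                      # element removed

assert v1.find(LOC) is not None   # annotation target found
assert v2.find(LOC) is not None   # additive change: still found
assert v3.find(LOC) is None       # removal: injection would fail
```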
The second point is not very well demonstrated in the paper. On the one hand, the term "proxy" would indicate that the system will facade the service, but the authors say that it "redirects", meaning the client must interpret HTTP redirects and send the request to the service itself. In this behavior, the system is less of a proxy and more of a registry because it really only answers requests for service descriptions. On the other hand, even if the above was not an issue, the paper doesn't describe any limitations to the redirection of what would be invocation requests to the service. A few problems come to mind immediately: i) HTTP only allows automatic redirection for GET and HEAD requests (for other requests, the user should be involved); so the "proxying" behavior would only work for the method GET. ii) if the invocation request uses GET and consists of a single "sid" parameter, the proxy would interpret it as a request for metadata, not an invocation request, and not do any redirection; so the proxied service cannot use a parameter called "sid". iii) why would the client even send invocation requests to the proxy? Assuming the client gets the service description from the proxy, the endpoint information in the service description will be unchanged and still point to the original service's endpoint, right?
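To spell out point ii): the dispatch rule the paper seems to imply can be sketched as below (this is my guess at the rule from the description - the function and its behavior are hypothetical, not the authors' code). The collision follows directly: a request whose query is exactly one "sid" parameter can never reach the proxied service.

```python
# Hypothetical sketch of the implied dispatch rule: a GET whose query
# consists solely of "sid" is treated as a metadata request; anything
# else is redirected to the original service.
from urllib.parse import urlparse, parse_qs

def dispatch(request_uri):
    params = parse_qs(urlparse(request_uri).query)
    if set(params) == {"sid"}:
        return "metadata"   # proxy answers with the annotated description
    return "redirect"       # proxy redirects to the original service

assert dispatch("/api?sid=921e1da1") == "metadata"
assert dispatch("/api?sid=921e1da1&bbox=0,0,10,10") == "redirect"
```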
The authors put a lot of associated functionality out of scope of the paper, especially the discovery of the available annotations for known services. Without such functionality, however, the system looks incomplete because the only remaining difference to a static document repository with annotated service description copies would be the point 1) above, the adaptation of annotations to changed descriptions, which is not described in the paper to any extent that could let me consider this as a significant contribution worthy of a journal paper publication.
I'm not saying that you don't have journal paper material in your work, just that the manuscript doesn't show it.
Btw, sorry, at the first resubmission I didn't know about the point-by-point rebuttal comments on the journal server (different from the review-submission server), hence the request in my second review.
reference 30 is mangled (surely, the title is not "49"); reference 31 needs to be capitalized properly.
The manuscript is a revised submission. Below the reviews for the revision, followed by the reviews for the original submission.
Review 1 by Marinos Kavouras:
Although the paper has improved a lot, the semantic aspect per se is still not clearly presented. Nevertheless, since the character of the paper is fully technical, being a report on a system, it can be considered acceptable taking also into account the general readership of the journal.
Review 2 by Tudor Groza:
The paper has indeed been improved by the authors according to the received reviews. Most of the critical comments have been addressed. A couple of small remaining remarks:
* the part about the manual annotation of Web services is still not clear enough, i.e., who does it and when (or why). The reason I keep insisting on it is because, generally, the process of manual annotation always brings along the issue of incentive.
* the small experimental evaluation currently contained in the paper brings a certain plus; however, it could profit from a more detailed description, i.e., what do the axes represent, what were the factors considered during the experiments, etc.
Review 3 by Jacek Kopecky:
The paper presents a proxy-based storage and retrieval system for semantic annotations in Web service descriptions. It seems to be a workable system, and the current revision of the article is much improved - especially it presents more useful concrete information than the previous version I've reviewed - but the article overall still isn't very convincing.
The system description says that "the client application has to specify which annotations are to be injected"; and "depending on the client request, different annotations may be added to the metadata", which seems to promise something like the following features: 1) a client can specify multiple annotations to be combined, 2) a client can specify the ontologies that it is compatible with, and the proxy will inject the appropriate annotations, 3) the client may search for available annotations based on the original service metadata URI. These would be interesting, but the system does not seem to support any of them. As it is, the system is very simple and the only actual interesting contribution is that it can maintain the semantic annotations even through some (weakly specified, but at least discussed a bit in this revision) changes to the underlying metadata. That isn't much.
And quick testing based on Figure 2 and the returned data indicates that the system doesn't work for WSDL services. (see below) The authors should at least test the uses they want to publish.
Finally, the next update should be accompanied with a brief but point-by-point statement by the authors about how they handled the review comments.
Detailed comments follow:
The introduction says how "clients are not aware of [the proxy's] existence: the proxy takes the identity of the proxied service", which could be a nice feature except it's not explained anywhere; instead the system clearly changes the URIs of the service descriptions (or "the web service location" - whatever exactly that means - as said in Section 3).
In Section 4.1, change "Url (7) in Figure 2" to say Url (5).
Clarify the sentence "The nature of the reference (attribute, sibling, or child) is concatenated, and used to identify the location in the original metadata document." - the resulting XPath expressions in Figure 3 are somewhat ambiguous; in effect, the system changes here the meaning of XPath, so it should say how. For example, the normal interpretation of XPath (2) is "all attributes of an xsd:simpleType element that has a sawsdl:modelReference attribute". And how does the system interpret the "~25437290" part at the end of (4), which doesn't fit the syntax of XPath? (The article says a bit about this adding uniqueness to the location, but more should be said about its handling).
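Since the article doesn't say how the "~25437290" suffix is handled, here is one plausible interpretation as a sketch - purely my guess, with a hypothetical function name; the paper should state whether something like this is what actually happens:

```python
# Hypothetical parser: split the non-XPath uniqueness suffix ("~<hash>")
# off an extended path expression before evaluating the remainder as XPath.
def split_location_expr(expr):
    """Return (xpath, uniqueness_token); token is None if absent."""
    if "~" in expr:
        xpath, _, token = expr.rpartition("~")
        return xpath, token
    return expr, None

xpath, token = split_location_expr("/wsdl:definitions/wsdl:types~25437290")
# xpath == "/wsdl:definitions/wsdl:types", token == "25437290"
```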
Why is there a redirect to the service if request parameters do not match? What kind of scenario is this redirect designed to support? Is this intended for proxying actual interaction with Web services? If so, what are the limitations here? (I can think of two immediately - it only handles invocation through GET, and it only allows parameters other than "sid".)
In Figure 2, URI (2) doesn't work; in fact, none of the WSDL-based descriptions seem to work due to a bad (OGC-based?) XPath in the annotations. This goes directly against the article's statement that the system has been thoroughly tested.
In Figure 3, XPath (4) has a mismatched pair of quote characters and ends with an extra " - is it a real example with editing errors, or is it a made-up example?
Figure 5 needs a legend - for the vertical axis, the text says "average response times" - is that in seconds? for the horizontal axis, short names or IDs for the services would be useful (even service 1 ... service 4).
The measurements seem to indicate a speed-up for retrieving the description of a SAWSDL-Testsuite service through SAPR in contrast to direct retrieval; this is explained as "probably due to a more efficient implementation" - implementation of what? How can a proxy that has to access the original source of data perform quicker than a direct access to the original source of data? The only reason I could think of is a better network routing when it goes through the proxy server than when it goes "directly".
As before, the paper says the proxy is available as "free software" so it should include a link to download.
On Page 9, the use of the acronym SDI (end of first column) precedes its expansion (beginning of second column).
In Conclusions, "the client does have to manage the two separate sources" - I suspect you wanted to say "does not".
In references 11, 30 and 31, expand on the venue of the publications - 11 doesn't have anything, 30 and 31 only have an address.
Below the reviews for the original submission.
Review 1 by Marinos Kavouras
The paper presents an approach, implemented as a web service, for semantically annotating descriptions of geodata. I understand that the focus is at the explication level, attempting to account for this semantic enrichment while maintaining the standards followed. Separating such annotations from original metadata is a right way to go. I was not able to test the implemented service described in section 4; therefore I am not in a position to verify how well it works. It seems however that it is not something extremely difficult to worry about. The literature is well presented, and so is the paper. It is a paper focusing on technical details and tools, and not on new ways of achieving a high level of semantic communication. As such, it suits the purpose of the journal, several will find interest in it, and I recommend acceptance. Also, the language of the paper is acceptable. There are a few typos here and there (I noticed at least 2-4) which a careful reading will pin down.
Review 2 by Tudor Groza
The paper reports on the SAPR (Semantic Annotations Proxy) system that enables dynamic injection of semantic annotations into Web Services (WS) descriptions. The system was deployed and tested in the context of Geospatial Web Service descriptions.
Being a report on a system, the paper should have a clear emphasis on four aspects: detailed technical description, maturity, importance and impact of the system. On the positive side, one can quickly grasp the importance and great _prospective_ impact of the system (due to the very good application scenarios), which basically bridges the gap between legacy Web Services and today's semantically-aware clients without touching the original WS (similar to the legacy databases case). Also, the system is online, fully functional and free to use. On the negative side, the technical description is underspecified, and the system raises serious questions in the maturity and _actual_ impact categories: the evaluation section simply lists that different forms of evaluation have been performed, but not the actual results. Moreover, looking at the services currently registered in SAPR, one can find a total of 7 entries, of which two are duplicated and one appears three times (and it is the toy example presented in the paper). This clearly shows that the system is, in fact, not in use, and that it probably represents, at least for now, a proof-of-concept.
* Considering, again, the type of paper (i.e., system report), the first half of the introduction is probably dispensable. There is really no need to go into so much detail w.r.t. the research context, when the first and second paragraphs on page two (second column) explain nicely the problem and the motivation behind the system. You could reallocate the space to a more detailed technical description.
* The application scenarios are in general excellent. One possible flaw may exist in the argumentation w.r.t. the usefulness of the data, as exemplified in the last paragraph on column two, page 3: "Without semantic annotations potential clients have no means to find out what these attributes represent". However, in the current setting (without the existence of a concept-based WS search application) the clients would actually use the semantic annotations in exactly the same fashion the GIS applications use the data, i.e., direct hard-coded interpretation and use, with no discovery, or perhaps for disambiguation purposes (in reality, this part is also problematic - see comment below).
* The separation of concerns description is also great, but only from an analysis perspective, as it is missing the pragmatic side. The content discusses the potential benefits of having separation of concerns but does not mention clearly what is already there (i.e., implemented and usable). Reading the following section sheds light onto this question, and unfortunately reveals a very static / rigid system with respect to this aspect. The ad-hoc injection procedure discussed in the last paragraph on column two, page 5, would profit from a sequence or workflow diagram, showing different possible cases.
* The actual technical description of SAPR seems to be underspecified (it takes only 10% of the entire paper). Some key missing elements are:
-- a more detailed description of the manual annotation and registration process (here it would be really nice to include an annotation and XPath-based binding example - maybe directly the one present at http://semantic-proxy.appspot.com/api/list/references?sid=921e1da1);
-- a richer sequence / workflow diagram for the presented example, showing the calls that happen in the process (this part is well-enough described in text, but the figure seems a bit simplistic)
-- a discussion on what happens if multiple conflicting annotations are uploaded for the same WS (where conflicting annotations are annotations covering the same elements but pointing to completely different concepts)
-- a richer online example, as the toy example taken from the test suite has a single annotation, and it is on an operation rather than on an attribute (as one would have expected).
* The evaluation, as already mentioned, is superficial. It contains no actual data or results that show that a real evaluation was performed. The least that could have been mentioned, in terms of numbers, is how many clients/services/organizations use the system (or plan to use it). Also, although the authors claim that the overhead imposed by the injection process is negligible, it would still be nice to have some graphics showing how this overhead evolves with the size of the WS description and / or the number and complexity of the existing annotations.
* Finally, a section on future development plans is missing.
* A small remark w.r.t. references, listing 4 - 7 citations in one block is a bit too much, and should be avoided, especially since the authors do not go into details about the individual work presented in those references.
Review 3 by Jacek Kopecky
The paper describes a service for storage and retrieval of semantic annotations for third-party service descriptions.
From the description in the manuscript, it seems like a workable system, however, there are major gaps and problems in the description due to which it is hard to judge the quality, importance and potential impact of the described service. The manuscript must undergo a substantial update before being resubmitted for new review.
The following are the substantial problems, further down I've included comments on lesser details and some wording suggestions.
1) Most importantly, the paper does not describe how the solution is better than an approach where a description is locally copied and annotated with semantics.
Firstly, how does the system process uploaded annotations? It seems that it only allows upload of a complete annotated file, together with a URI for the original, and it appears that the system does some kind of difference comparison of the two files, and stores only the annotations. How is the difference computed? Section 4 seems to mention a line-by-line approach which may easily break on some harmless XML changes (such as namespaces, formatting, element reordering, character set recoding etc.), all of which can be done when the domain expert adds annotations.
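For illustration, here are two semantically identical descriptions that defeat any line-based diff (my own toy example, not taken from the paper): only a namespace prefix and whitespace differ, yet no line matches, while the parsed structures are equal.

```python
# Why a line-by-line diff is fragile: equivalent XML, disjoint lines.
import difflib
import xml.etree.ElementTree as ET

original  = '<d xmlns:w="urn:wsdl"><w:types/></d>'
annotated = ('<d xmlns:x="urn:wsdl">\n'
             '  <x:types/>\n'
             '</d>')

line_diff = list(difflib.unified_diff(original.splitlines(),
                                      annotated.splitlines()))
assert line_diff  # the textual diff flags every line as changed

# After parsing, the prefix and whitespace differences vanish:
def shape(e):
    return (e.tag, [shape(c) for c in e])

assert shape(ET.fromstring(original)) == shape(ET.fromstring(annotated))
```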
Secondly, after annotations are uploaded, can the system deal with any changes to the original descriptions? Same harmless XML changes as above can happen, but additionally the description can evolve in backwards-compatible and incompatible ways. The paper should contain a discussion of the robustness of the annotations to changes in the original description.
In conclusions, the manuscript says "dynamic injection relies on a reproducible way to identify the annotation's location" - this should not be in conclusions but very well detailed in the system description!
Thirdly, the new annotated description gets a new URI. Is there any functionality for clients to discover annotations based on original service URIs, other than by searching in the list of all services? Also, since the list of all services is not proper hypertext (because the service ID must be composed into a service retrieval URI), search engine crawlers cannot discover any of the annotated descriptions. It seems that RDF might be a much better match than JSON for the list of all services in the proxy.
Fourth, the paper doesn't really show how the system allows a "client application to specify what annotations are to be injected", or how different annotations can be combined if at all (for clients that specify multiple sets). In fact, the URI (2) in Figure 2 returns an error, so it is not clear that a client can retrieve the annotations at all.
All these points make the system, as described in the manuscript which is what I am judging, actually worse than a simple document repository that would host annotated copies of the original service descriptions.
2) Any kind of evaluation of the system is missing - only the last paragraph of the evaluation section states that evaluation was done, but doesn't describe it. In a Tools and Systems paper, the evaluation should at least talk about speed (how much processing time does the proxy processing add?), size (in what form are the annotations stored? Are there indices for quick search?), and the possibility of distributing the system, should the load become too large for a single machine.
3) According to the paper, there are well-established standards for spatial data, but none for its semantics, correct? One of the goals of the system is to separate potentially conflicting annotations from different sources. This needs to be better motivated because conflicts can also be viewed as very valuable: if two domain experts use the same semantic model to annotate the same service in conflicting ways, either the service or the model is ambiguous; in either case, identifying and resolving the ambiguity can lead to improvements in the quality of the descriptions. In general, on the semantic web, different sets of annotations should be able to coexist and be ignored by clients if unknown.
On a related note, in the system, "information about the uncertainty can simply injected during runtime, and only compatible clients can request this data if needed" - but in semantic systems, the compatibility can often be established when the client sees the data (and possibly discovers ontology mappings that help the client understand the data). In any case, how does the client indicate to your system the types of semantics with which it is compatible?
4) One of the major assumptions of the paper is that "delegation of the annotation from the data provider to the domain experts" is desirable. However, what are the incentives for domain experts to annotate services? If the domain expert is tied to the service provider, they don't need a proxy solution for providing semantic annotations; if the domain expert is tied to the client, they don't need a proxy either; if the domain expert is a third party, why would they annotate the service? I've simplified the situation here; it should be discussed in the manuscript, to demonstrate that the system is actually valuable and desired.
Details and wording suggestions:
- The first citation of  is unnecessary, it can probably simply be dropped.
- End of 2nd paragraph in the introduction: logics ensures better precision, not recall (cf beginning of section 3).
- The introduction links feature type 42001 to class Street, a "part of a globally shared ontology accessible on the Web" - where is this ontology? How well-established and standardized is it?
- In "semantic annotation techniques exist for ... media formats (e.g. photos)" I'd suggest to add a mention of EXIF or some such, otherwise the parentheses in that sentence do not match (they contain different kinds of examples).
- SA-WSDL is SAWSDL (no dash), reference  is inappropriate for SAWSDL, it should be .
- The paper should talk more about how "this focus [of SAWSDL] on WSDL does unfortunately impair its applicability for some scenarios" - where is its applicability impaired and how?
- In the introduction, reference  does not seem very appropriate for saying your system is a conceptually simple proxy-based solution.
- I'd suggest that the paper should discuss the relation of the presented Web service to HTTP proxies, because the authors call the service a proxy-based solution. This would help readers who are led by the title to expect an HTTP proxy.
- In section 2.1: "This distinction has been proven useful to capture the sometimes complex functional dependencies in between attributes of data models" - how and where has it been proven? How is this relevant to this paper? The proxy system doesn't seem to "distinguish between local and global semantic annotations" in any way!
- Do you have any real use cases for section 2.1? Any globally shared ontologies? (not the "(made-up) domain concept GeologicEra")
- In figure 2, URI 2 is broken and the results of URI 4 are quite opaque; the paper should discuss the formats.
- Around the data quality annotations: annotations for describing data quality are often property of actual data - different data from the same service may have different quality - e.g. mapping may be very precise in some areas but rather sketchy elsewhere. The paper could list examples of quality annotations that are understood to be global to the whole service.
- In "each measurement inhibits some sort of error" - change "inhibits" to "is done with" or something; the word inhibit means something else.
- Section 3.2: SAWSDL is not only about "instance identification", especially for WSDL elements. It is intentionally unconstrained in the semantics of the link annotations.
- Figure 3 needs a legend.
- In section 4, what are "the original parameters" with which a service identifier can be coupled? How can it be so coupled?
- Section 5 says the proxy is free software - where is the source available for download?
- By page 8, the acronym SDI (used only once in the introduction) is forgotten.
- Is there no related work on injecting of annotations? This would be the kind of related work that this paper should discuss.
- In the Conclusion section: the long-term vision of the SemWeb does *not* assume a complete shift towards semantic-enabled web resources - it expects a coexistence of many kinds of resources on the web, some of which would be annotated semantically, and some with direct semantic representations. For example, representing image bitmap data semantically would be virtually useless; so semantic annotations are here to stay, in one form or another.
- Conclusions should not repeat references; in fact, it often should not contain any references at all.
- References: take care to have the proper case (e.g. , not owl-dl reasoner but OWL-DL Reasoner);  doesn't have enough information to find and identify the document being cited; there are encoding problems, e.g.  has uppercase long umlaut U where it doesn't belong; there are issues with formatting: e.g.  says "International Conference on Semantic Computing 0" and  says "International Symposium on 0".  has duplicated authors.