Contextual Information Retrieval in Research Articles: Semantic Publishing Tools for the Research Community
Submission in response to http://www.semantic-web-journal.net/blog/special-issue-new-models-semant...
Revised manuscript after an accept with minor revisions - now accepted for publication.
Review 1 by Sudeshna Das
The authors present a sentence citation context ontology, supervised learning approaches to extract citation contexts, and a Linked Data application for articles from the European Semantic Web Conference (ESWC) series. They define their own set of citation contexts, stating that existing ones were hard to reuse or adapt, which is acceptable. The new Sentence Context Ontology is a positive contribution, and the machine learning approaches and Linked Data application are methodologically sound as well. The only issue I have is that, on the whole, there is no compelling use case for understanding the context of citation sentences.
Other minor issues that need to be corrected:
* typo on page 2 "While  report s on model-ing the contexts of sentences in related work sections of research articles and supervised learning experi-ments for context identification"
* page 8 - 4.2.9 is not following 4.2.8
* page 34 - 8.1.3 is mixed up
* page 35 - "View Contexts of citation sentences" is mingled with the Discussion section (this seems to occur every time a figure or table spans the page)
* Figures 9-18 are so blurry that we don't get a good sense of the Linked Data application.
Review 2 by Tim Clark
The authors present an interesting and potentially useful approach for classifying and organizing the contexts of scientific citations. Noting that it "is becoming increasingly difficult to keep abreast of research developments in one's field" due to a dramatic increase in output of published research, the authors suggest that their method of textual analysis and organization can provide "value added information services for the research community."
This is a sound motivation for the work reported upon. I have no issue with the computational methods used and believe the authors are correct to attempt to build an application for scientific readers based upon their methods. The authors used the CORESE semantic web engine and its associated SEWESE server at INRIA (Durville & Gandon 2007) to build an application which is able to query a SPARQL endpoint containing their results and present them in a fairly reasonable-looking way to scientific users via Simile's Exhibit interface (Huynh et al. 2007, Karger et al. 2009, etc.).
This would be a highly publishable and reusable result if the following were provided as well:
- URL to the application (we cannot see anything except screen shots provided by the authors) and some instructions on how to use it;
- URL to acquire, inspect, execute and potentially re-use the source code, with licensing terms - preferably open source;
- URL to the SPARQL endpoint where their results are available.
In addition, to validate the authors' claim that their application could provide improved approaches to "strategic reading" across large sets of scientific publications, it would be useful to see:
- a statement of the specific use cases they were trying to build for;
- some trial of the application with real scientific users other than the authors themselves; and
- an assessment of the outcome, with pluses and minuses, and conclusions.
Unfortunately, these are missing. If they were provided I would certainly recommend publication, but as it stands I cannot.
Some additional points to consider:
- the work is not as well-referenced as it ought to be.
For example, the references I pointed out above for SEWESE and for Exhibit are in published conference proceedings. In their place, the authors reference web pages for informal presentations and the like. This type of issue ought to be cleaned up when and if the authors resubmit a modified version, which I would recommend they do.