Eventseer: Calls for Papers as Linked Data

Paper Title: 
Eventseer: Calls for Papers as Linked Data
Authors: 
Thomas Brox Røst, Christophe Guéret, Amund Tveit, Pablo Mendes
Abstract: 
Finding relevant publication outlets is a necessity for all academics and researchers. The Eventseer web service was originally created to simplify this task by providing access to academic calls for papers in a semi-structured and searchable format. This paper describes the work being done to make the Eventseer data available as Linked Data, thereby further increasing its accessibility and usefulness to the scientific community. Details are given about the process of extracting necessary information such as event names and dates, deadlines and associated people, topics and organizations from the call for paper texts. The resulting mapping to a Linked Data RDF format and the modeling choices made are discussed. Examples of secondary use of Eventseer data is given; these include social network analysis of academic communities, altmetrics for measuring researcher impact, and automated modeling of topic hierarchies. Finally, a set of suggested improvements and known limitations are mentioned, along with plans for further improvement of the breadth and quality of the Linked Data.
Submission type: 
Dataset Description
Responsible editor: 
Pascal Hitzler
Decision/Status: 
Major Revision
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Revised manuscript after a reject and resubmit decision, now "accepted with major revisions". First round reviews are beneath the second round reviews.

Solicited review by Philippe Cudre-Mauroux:

The authors responded to my comments and have updated their contribution accordingly. At this stage, I still find it quite disappointing that neither a SPARQL end-point nor a dump is available [this is still the "revision" that I'd recommend]. Apart from this, the paper is now quite convincing imho.

Solicited review by Aba-Sah Dadzie:

The paper is well written and easy to understand. The snapshots of the linked data, along with the simple data model, help to illustrate the conversion of the original dataset and how this may be reused in other scenarios (with examples of use of the original dataset in the paper, and existing ontologies reused). The model/rationale followed is also described to a good degree of detail for the paper length.
The work reported has promise to be a useful resource.

However, a significant point of concern in the original paper was the lack of connections between obviously related events, including series of the same event and special issues of a journal. While the authors now provide an explanation for this, it seems more of an excuse for a rush job than a justification for disregarding a key principle of linked data. At the end of the paper the authors also state that it is left to the user to make these links. Browsing the data from the URL provided, each call is presented as an isolated snapshot of RDF. Is there any support provided for making these links? Continuing on from someone else's work is always several factors more difficult than continuing from your own. At the moment the best guess is to wait for the RDF dump or a SPARQL endpoint.

So while I would not reject this paper I'm a bit torn as to acceptance - this is clearly work in progress and the authors indicate long-term improvements to be made. However, where these are due to limitations in the original model and/or rationale I find it difficult to see how they will be carried out, not without having to start from scratch or redo a large part of the work. Maybe an illustration of how this would be resolved would help.
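
As one illustration of the kind of linking asked for above, the following is a minimal sketch only and not the authors' method. It assumes a hypothetical `http://example.org/eventseer/` namespace, made-up event identifiers and titles, a naive title normalisation (dropping years and ordinals), and `dcterms:isPartOf` as the linking property; none of these choices come from the Eventseer data or the paper.

```python
import re
from collections import defaultdict

# Hypothetical namespace; not taken from the Eventseer data.
BASE = "http://example.org/eventseer/"
# Assumed linking property (Dublin Core Terms); the paper does not prescribe one.
PART_OF = "http://purl.org/dc/terms/isPartOf"


def series_key(title):
    """Reduce an event title to a crude series key by dropping years and ordinals."""
    t = title.lower()
    t = re.sub(r"\b(19|20)\d{2}\b", " ", t)      # four-digit years such as 2011
    t = re.sub(r"\b\d+(st|nd|rd|th)\b", " ", t)  # ordinals such as "10th"
    return re.sub(r"[^a-z0-9]+", "-", t).strip("-")


def link_series(events):
    """Return N-Triples lines linking each event to a series shared by two or more events."""
    groups = defaultdict(list)
    for event_id, title in events.items():
        groups[series_key(title)].append(event_id)
    triples = []
    for key, ids in sorted(groups.items()):
        if len(ids) < 2:
            continue  # a single call gives no evidence of a series
        for event_id in sorted(ids):
            triples.append(
                f"<{BASE}event/{event_id}> <{PART_OF}> <{BASE}series/{key}> ."
            )
    return triples


# Illustrative input only; identifiers and titles are invented for this sketch.
events = {
    "e1": "10th International Semantic Web Conference 2011",
    "e2": "11th International Semantic Web Conference 2012",
    "e3": "Workshop on Linked Data 2012",
}
for line in link_series(events):
    print(line)
```

A normalisation this naive would wrongly merge unrelated events that share a generic title, so any real linking would need manual curation or stronger evidence (shared organisers, venues, or acronyms).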

OTHER POINTS

Wrt Table 1 - It would be useful to indicate what percentage of the original data set has been converted to linked data. And/or what criteria were used to decide which portions to extract - this would give a better idea of what is currently available.

The authors state researchers at "Vrije Universiteit Amsterdam have been responsible for the actual conversion to Linked Data and for making sure that best data publication practices have been followed." Either state which practices have actually been followed, or provide a reference that indicates the practices followed, because the authors themselves highlight limitations in their model that contradict that statement.

While the other ontologies listed in the related work section are in fairly wide use the ABC ontology is not - a brief explanation of its relevance is necessary.
The authors also cite two CFP ontologies/vocabularies - at first glance these appear to be the most directly related to the work presented - a (brief) discussion and/or explanation of why they were not used, and/or what the Eventseer model provides over them is necessary.

Solicited review by Fabien Gandon:

My comments were addressed for the part that could be addressed in the article.

First round reviews:

Solicited review by Fabien Gandon:

The paper describes the work done to make Eventseer data available as Linked Data.

The authors aim to increase the accessibility and usefulness of CFPs by publishing them as Linked Data.

Although the dataset is interesting, it is a bit too much "eat your own dog food" and not enough linked to the non-academic world.

The work on establishing links with external resources is in progress and links to Geonames for instance are not mentioned. In addition the alignment and linking from the schema to other existing schemas is not addressed.

Metrics and statistics on the dataset content and structure are not really detailed in the paper.
Licensing is not covered in the paper.

The paper does explain the extraction process very well, but not so much the modeling rationale and results.

A call to the URL http://redux.eventseer.net/ at the time of writing this review returns no answer: "504 Gateway Time-out".

The special case of collocated events, joint conferences, etc. is not discussed.

Authors might also want to keep an eye on the "Person Vocabulary"
http://philarcher.org/isa/person-v1.00.rdf
and
Latest version: http://www.w3.org/ns/person#
and
http://www.w3.org/TR/vocab-people/

Solicited review by Aba-Sah Dadzie:

The authors identify a key resource for academic and other research work - CfPs, and the potential for added value in converting the data to Linked Data. The paper is well written and makes for easy reading. While the work is still in its early stages, the quality of the original dataset, existing limitations in generating it, and what work is needed to manage these in the conversion are discussed and addressed (with one exception - see below).

The original data source and other relevant sources are described, with links to the latter illustrated with example of use. Usage of the existing dataset, including less obvious uses, such as in social networking, is discussed. The potential for greater reuse as a linked data set is indicated in the related work section. Curating a sub-set of the data for commercial purposes, in order to continue to fund the project, is being considered.

External and internal connectivity, and re-use of established vocabularies is discussed, in addition to added value obtained in the use of more specialised ontologies and corpora such as LODE and Semantic Web DogFood. The authors provide some detail about the process followed and the data model used in the conversion to Linked Data, and further work required to obtain added value over the current data structure. Known shortcomings of the dataset are discussed in the conclusion, due mainly to the original reason for developing eventseer. The authors propose solutions for these. However one key shortcoming - links between related events, such as conference and workshop series (mentioned also by the authors), still remains to be addressed.

DETAILED REVIEW

The authors write (p.2): "Each instance of events that occur periodically is usually categorized as a unique and independent event." This raises the obvious question "why?", simply because this defeats the whole point of Linked Data.
This is dealt with to some extent in the conclusion, but for related concepts, not the actual events. It would be useful to forward reference to this discussion, and also discuss whether and if so, what plans are in place to handle this issue.

I would suggest using quotation marks in the title to make it easier to interpret - it isn't immediately obvious it refers to "Calls for Papers" in general, and not the call for this special issue.

==============================

Figures & Tables

Table 2 is placed in the text before Table 1, although they are referenced in the correct order. This may be a formatting issue? - since the latter spans two columns - it does however introduce unnecessary confusion when scanning to match to the text.

Table 4 - I'd suggest putting a 'date collected' in the caption - more accurate for referencing/citation than expecting the reader to assume so based on publication date.

Fig.1 - recapture at higher resolution or use a vector image - text is fuzzy except at very high resolution on screen.

Citations & Bibliography

I'd suggest in-text "author name" citations move the citation number to just after the author name (from the end of the sentence). Simply to reduce reader load in locating the corresponding reference.

* [1] - is this Berners-Lee's online article? It should provide a "howPublished" - (available at) URL - if so, otherwise the publication it can be found in.

* [7, 12, 18] need a "howPublished" / "available at ..."

* [9] should include institution

* [19] - be consistent - this reference contains far more information/fields than all others of the same type

p.2
* Need references (a URL or e-mail address should do) for the mailing lists 'Dbworld' and 'SEWORLD'.

p.3
* Need reference for 'GeoWorldMap database' - URL or paper.

* Ditto for DBLP (URL as footnote should do). Also, saying what type of "source" it is - CS bibliographic - would be useful, especially to people new to the field, as it indicates why DBLP and not some other source. Which begs the question - does the CS bent not reduce recall for non-CS events? OR does eventseer focus on only this sub-set of academic events? This IS finally discussed in the conclusion - it might be worth doing so at the start of the paper as well.

p.4

* "Jeong et al. ..." and later in the paragraph "Their analysis ..." - this publication has only ONE author

p.5

* "In general, there is currently a growing interest in alternative metrics for science that also covers research activities that are currently not taken into account by classical metrics."
I'd suggest citing Priem et al.'s altmetrics manifesto here, especially as the term IS used in the paper. It is especially relevant as this led to two workshops on the topic discussing, among others, what the rest of the paragraph says.

J. Priem, D. Taraborelli, P. Groth, C. Neylon (2010), Altmetrics: A manifesto, (v.1.0), 26 October 2010. http://altmetrics.org/manifesto

* "The Semantic Web Conference Ontology [12] is an example of an ontology..." I'd suggest citing, in connection also with [18]:

K. Möller, T. Heath, S. Handschuh, J. Domingue (2007), Recipes for Semantic Web Dog Food - The ESWC and ISWC Metadata Projects, Proc. ISWC/ASWC 2007, 802-815

Language & Presentation

The paper, while very well written and easy to read, is a bit acronym heavy. While the more well known (NOTE WITHIN the domain, so this comes with a caveat) may be easily interpreted, they really should all be expanded at first use. Those that MUST be expanded (even within the domain they are quite specialised):
- EBNF grammar
- LODE (and link to reference as cited later, on p.5)
- LATC platform (should also be referenced - URL or paper)

Abstract
"Examples of secondary use of Eventseer data is given" -> "Examples of secondary use of Eventseer data ARE given"

Keywords
"call for paper" -> "call for paperS"

p.2
"Each instance of events that occur periodically is usually ..." -> "Each instance of events that occurS periodically is usually ..."

p.4
"A fundamental assumption was that most people mentioned in Eventseer CFPs would be program committee members and that they were therefore representative for the community." -> "... and that they were therefore representative OF the community."

"In a recent paper, Das et al. describes how Eventseer ..." - > "In a recent paper, Das et al. describe[no 's'] how Eventseer ..." - it IS one paper, but "et al." means MULTIPLE authors.

p.5
"This lead to some ad-hoc modelling choices ..." -> "This LED to some ad-hoc model[l]ing [** other instances use one 'l' - neither incorrect, just need consistency] choices ..."

Solicited review by Philippe Cudre-Mauroux:

This short paper describes the Eventseer service (which automatically parses and aggregates calls for papers) and the ongoing effort to make the underlying data available as Linked Data. Overall, I found the paper interesting though too centered on the Eventseer service itself: about two-thirds of the contents of the paper is about the service itself (how the data is extracted and cleaned, how the service has been used so far, etc.) while Section 3 only revolves around Linked Data. Furthermore, the Linked Data part is rather standard, and some important features seem to be missing at this point (SPARQL end-point, data dump, deadline taxonomy etc.) Also, looking at a few examples online (on http://redux.eventseer.net/ ), only little information is available in RDF (only the label and the seeAlso properties appeared in the few examples I looked into). Despite those limitations, I feel like the overall idea behind this project is strong and I can foresee many interesting applications built on top of Eventseer LOD data.
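
The observation about sparse RDF could be made reproducible with a simple predicate tally. The following is a minimal sketch, assuming the data for an event can be obtained as N-Triples; the sample triples and `example.org` URIs are hypothetical and are not taken from http://redux.eventseer.net/.

```python
import re
from collections import Counter

# Matches a simple N-Triples line with an IRI subject and captures the predicate IRI.
# This is a simplification of the N-Triples grammar (no blank-node subjects, no comments).
TRIPLE = re.compile(r"^\s*<[^>]*>\s+<([^>]*)>\s+.+\.\s*$")


def predicate_counts(ntriples):
    """Count how often each predicate IRI occurs in a block of N-Triples text."""
    counts = Counter()
    for line in ntriples.splitlines():
        match = TRIPLE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample; real data would be fetched from the published dataset.
SAMPLE = """
<http://example.org/event/1> <http://www.w3.org/2000/01/rdf-schema#label> "Example Workshop" .
<http://example.org/event/1> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://example.org/cfp/1> .
<http://example.org/event/2> <http://www.w3.org/2000/01/rdf-schema#label> "Another Workshop" .
"""

counts = predicate_counts(SAMPLE)
for predicate, n in counts.most_common():
    print(n, predicate)
```

Run over a full dump, a tally of this kind would also supply the dataset statistics requested elsewhere in these reviews.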
