Review Comment:
The article "eagle-i: biomedical research resource datasets" describes a two-year project to gather biomedical research data from 25 different institutions and make it available through a semantically enabled, federated search system and as linked data sets. The article covers the data's source, type, modeling, and availability, and presents three interesting use cases that illustrate the usefulness of the data.
The effort to, as the authors state, "make these 'invisible' research resources more discoverable" is commendable, considering the wide range of information made available from 25 different institutions. All of the data is important, and gathering it in one large repository and publishing it as linked data is definitely a step in the right direction. Moreover, the dataset already has users.
The authors have covered the majority of the points expected in a dataset description article. However, the following major aspects are lacking:
(i) Interlinks to other external data sources, including their quantity, quality, and purpose
(ii) Quality of the data itself: how accurate is the ETL process? More information about SWEET is also needed: who are its users, and how is it useful? A report of any quality issues that current users have encountered would also help.
(iii) Description (and an example screenshot) of the web-based search application mentioned in Section 1 - does it support keyword search?
(iv) Explicit licensing information, preferably as a VoID description of the dataset that also includes versioning information
(v) Related work or related initiatives, such as Bio2RDF
(vi) Performance of the SPARQL endpoints: it would be interesting to know how they perform, given the large amount of data being queried.
(vii) Updates and versioning: how often is the data updated, or how often will it be? Does it change frequently? Are older versions available?
The paper is well written throughout and I only have a few minor comments:
- In Tables 1 and 2, I would align the numbers by their units, or consistently to one side.
- Figure 1 is a bit unclear when printed. I recommend increasing the font size slightly.
- "soft stackware": did you mean "software stack"?
- The sentence "The lack of a single SPARQL query interface to search over all of the eagle-i datasets at once, but is easily overcome using programmatic access." is grammatically incomplete.
- Instead of referring to the blog post in reference 5, I would cite the paper directly: http://www.carlotorniai.net/docs/integrated_pipeline.pdf
As a side note, I would like to point the authors to this paper: http://www.ncbi.nlm.nih.gov/pubmed/19397794.
Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-call-2nd-s...