Semantic Web Machine Reading with FRED

Tracking #: 1007-2218

Authors: 
Aldo Gangemi
Valentina Presutti
Diego Reforgiato Recupero
Andrea Giovanni Nuzzolese
Francesco Draicchio
Misael Mongiovì

Responsible editor: 
Harith Alani

Submission type: 
Tool/System Report
Abstract: 
FRED is a machine reader for extracting RDF graphs that are linked to LOD and compliant with Semantic Web and Linked Data patterns. We describe the capabilities of FRED as a semantic middleware for Semantic Web applications. It has been evaluated on generic tasks (frame detection, type induction, event extraction, distant relation extraction), as well as on application tasks (semantic sentiment analysis, citation relation interpretation).
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Antoine Zimmermann submitted on 04/May/2015
Suggestion:
Minor Revision
Review Comment:

The paper describes FRED, a middleware for Semantic Web applications that applies NLP techniques to extract RDF data from text. As a middleware, it can be used for many tasks in many applications, several of which are presented in the paper. The paper argues that the number of applicative use cases, the quality and efficiency of the implemented applications based on FRED, and their relative success compared to other tools on the respective tasks validate the contribution.

Overall, the paper is satisfying in what it is trying to do: to demonstrate the value of the middleware FRED. This is difficult because it is hard to assess middleware without assessing a particular application, where the quality of the application may be due to features that are external to the middleware. In this paper, the authors chose to prove their point by showing that FRED was at least reasonably successful in tasks covering a broad range, therefore limiting the chance that the successes are only due to other factors.

However, the paper has some drawbacks, especially related to presentation. Unless the reader is already quite familiar with the topic, it is not clear what FRED is until Section 3. "FRED is a tool for automatically producing RDF/OWL ontologies and linked data from text" -> this should be explicit from the very beginning of the paper. Many acronyms are used that are not always explained. This makes the paper sound like it is written for the NLP community within the Semantic Web, in spite of a "Background" section that is supposed to introduce the concepts. Note that I am not an NLP expert at all.

Detailed comments:

Introduction:
NIF is mentioned with reference [19]. This seems to be an inappropriate reference. The main Web page about NIF (http://persistence.uni-leipzig.org/nlp2rdf/) says:

"""
If you refer to NIF in an academic context, please cite the recent paper published at the ISWC in Use track 2013:

Integrating NLP using Linked Data. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. 12th International Semantic Web Conference, 21-25 October 2013, Sydney, Australia, (2013)
"""

footnote 1: "..., e.g. ... etc.)" -> "e.g." can't go with "etc." Besides, there is a closing bracket but no opening one.

Sec.2:
"(e.g. DBpedia, YAGO, Freebase, etc.)" -> remove "e.g." or "etc."
NELL is associated with reference 17. It seems not the most appropriate. What about:

"""
Never-Ending Learning.
T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2015.
"""

"clearer practices are barely needed" -> is this really what you want to say? Are they not strongly needed?

"NIF [19]" -> again, choose a better reference

Sec.3:
footnote 10: the list of prefixes could be given in a table.

"Anyway, ..." -> this sounds like familiar / spoken language.

"from the termprogramming language" -> missing space

"DRT" -> what is it?

"Since Wikipedia is also rich in "conceptual" entities, TAGME results to be also a precise word sense disambiguator" -> "TAGME turns out to be"?

The names of the subsections / paragraphs (NER, WSD, etc) should rather have the full form, and the abbreviation be used inside the paragraphs.

Sec.4:
"Here formatted data are taken into account by K~ore" -> what is K~ore?

Sec.6:
"F1 = .92 for the type selection, F1 = .75 when WSD is added" -> so WSD degrades the results? Is this to be expected? It seemed to me that it should be the other way around; this should be explained.

Twice, there is "FREDÕS" instead of "FRED's".

Ref.:
[29]: the title is wrong; it should be "FaBiO and CiTO: Ontologies for describing bibliographic resources and citations". Also, the journal "Web Semant." would be clearer with its full name, "Journal of Web Semantics". Besides, the formatting of the references is not uniform across entries.

Review #2
By Philippe Cudre-Mauroux submitted on 06/May/2015
Suggestion:
Minor Revision
Review Comment:

This manuscript gives a high-level overview of FRED: a machine-reading software for extracting knowledge (i.e., RDF graphs) from text and linking it to the LOD cloud.

FRED is presented as a semantic web middleware, in the sense that it combines and improves several natural language components, and then interlinks and serializes the results using the NLP Interchange Format. The paper starts with some background in semantic web and NLP as well as in related issues in terms of interoperability. Following this, the capabilities, architecture, and implementation of FRED are introduced.

Overall, I found the paper interesting and the topic itself (i.e., FRED) definitely worth a publication in the Semantic Web Journal. However, I feel like the current version of the paper does not do justice to FRED, in the sense that the paper is often confusing and does not provide enough information about the tool, its architecture and its performance.

The examples in Section 3 (FRED Capabilities) are definitely helpful. After reading this section, however, I was still unclear about the tool's capabilities. A figure or a table summarizing the main capabilities of the tool would be helpful in that context.

I had similar issues with the following section (FRED Architecture). I found the overall architecture unclear; for example, it is very difficult to understand which components the text has to go through before the annotations get produced (some flow or process diagram would help in that context). Along similar lines, the exact function of the various tools that are mentioned is unclear (e.g., Apache Stanbol, which is cited but whose role is not specified). From my perspective, the architecture would be made clearer by only introducing the important components of FRED and their relationships in Section 4, and leaving the specifics (e.g., REST, TAGME, Stanbol) to the following section (implementation).

Section 5 (Quality, Importance, Impact) contains a lot of information about the extensions of the tool; I feel like the descriptions of those applications deserve their own subsections (5.1, 5.2, etc.). Then, some concise information about the end-to-end performance of those tools (both in terms of efficiency and effectiveness) should be summarized using tables and/or graphs in a different section.

The conclusions only give a short summary of FRED and of current developments. It would be very interesting to also include (either as part of the conclusions or as a short distinct section) a discussion on the lessons learnt, current limitations / challenges related to the tool, its implementation and its deployment.

Finally, the figures are generally very difficult to read. Figures 1, 2, and 3, for instance, are key but very difficult to decipher on paper.

Review #3
Anonymous submitted on 14/Jul/2015
Suggestion:
Major Revision
Review Comment:

FRED is a very interesting tool that renders entity extraction and text processing outputs into rich and well connected semantic graphs, thus increasing the processability and value of the output. It uses a suite of tools for the extraction of various elements of knowledge from sentences, which it processes and integrates and links to LOD. It is also freely available and accessible.

The paper, however, needs a far better structure. It wraps up FRED as a machine reader for the Semantic Web, which is vague and confusing. The only clear explanation of what FRED does is at the start of Section 3. It is unclear why this simple and clear explanation was not provided earlier, and why vague notions of NLP and machine reading were used instead. Even the abstract fails to explain what FRED really is and does.

Good to see so many graphs to provide examples, but they are all testing the readers' eyesight rather than helping them understand the work better. This is not helpful at all.

The workflow of FRED is missing. It would have helped a lot to see the order in which the various components are activated, and what they take and produce. The FRED architecture diagram could be replaced with such a workflow.

I guess that Alchemy API and the like, although they give DBpedia links as output, do not provide a graph of how this output (the extracted entities) is connected. I think this is what FRED adds to such tools, and more. Perhaps this could be made clearer in the related work section.

To illustrate impact, the authors give many examples of applications that use FRED and achieve good accuracies, which is fine since it shows value and reuse. However, I wonder whether these accuracies can truly be attributed entirely to the use of FRED. Do we know how accurate these applications were, or would be, without FRED? Is this even measurable? Without this, how can we scientifically state that FRED was useful? So please provide accuracies before and after the addition of FRED.

typos:
section 2 "attempts are happening since a while" is odd grammar. maybe "... for a while"
page 3: strange sentence "really worrying of any SW or just formal reuse of those data"
page 4: shouldn't start a paragraph with "Anyway". maybe "nevertheless" is more appropriate
page 9: what is FREDÕs? typo?

Overall, a nice tool and worth publishing for sure, but the paper needs a major facelift.