Natural Language Generation and Semantic Web Technologies

Paper Title: 
Natural Language Generation and Semantic Web Technologies
Authors: 
Nadjet Bouayad-Agha, Gerard Casamayor, Leo Wanner
Abstract: 
Natural Language Generation (NLG) is concerned with transforming some formal content input into a natural language output, given some communicative goal. Although this input has taken many forms and representations over the years, it is the semantic/conceptual representations that have always been considered as the “natural” starting ground for NLG. Therefore, it is natural that the semantic web with its machine-processable semantic data paradigm has attracted the interest of NLG practitioners from early on. We attempt to provide an in-depth overview of the approaches to NLG from semantic web data, emphasizing where robust, sustainable techniques have been used and pointing out weaknesses that still need to be addressed in order to improve both the performance and scalability of semantic web NLG.
Submission type: 
Survey Article
Responsible editor: 
Philipp Cimiano
Decision/Status: 
Major Revision
Reviews: 

Solicited review by John Bateman:

This paper presents a generally useful introduction both to NLG and to the particular application of NLG in the context of the Semantic Web. Since this is a growing area and will no doubt become increasingly important, it is worthwhile and timely to present the range of work and issues involved in NLG to the semantic web community. This is the function of the paper as I see it and on this basis I would certainly recommend that it be published. The paper does however need to be worked over thoroughly in several ways, which I detail below. The paper's overview of the relevant field is really very extensive and the authors are to be congratulated on putting this all together in one accessible place.

As a general style note, though, I find the form of referencing where only numbers appear to belong in the stone age and to be extremely reader-unfriendly: thus all references such as "[23] say that" should include the names of the authors. If this is a journal style policy, then the editors (and authors) should appeal to have it changed. There is no excuse for this form of reference these days. In addition, some of the references in the bibliography are incomplete. For example, Jarrar et al. [69] and White [122] have no publication details beyond year; [98], [99], [109], [116] and possibly others are missing page numbers. The references should therefore be checked for completeness throughout.

More minor points concerning the content and discussion, which the authors may want to respond to or adopt in their next version:

p1, §2 "rendered as a coherent narration": of course there are many other text types, or genres, that the information could be rendered as (report, description, etc.); the restriction here to 'narration' is inappropriate.

The issue of components for NLG has always been tricky; this is probably the main reason why RAGS has not found acceptance. For example, I would certainly have put lexicalization as a subtask of generating referring expressions rather than the other way round!

§2.1: single genre generators: although it might be worth remembering that this was the original goal of *general purpose* generators, such as Penman (KPML), Mumble, SURGE and the others. And actually this is still implicit even in components like SimpleNLG: the genre decision is a separate one.

I am not sure that writing "for somewhat outdated detailed reviews" is beneficial: if something is outdated, then it certainly should not be mentioned here (unless this were a historical report, and it isn't).

Some of the references to 'staggered' architectures are pretty old; are there no newer references that have tried to break down the pipeline view (e.g., Marciniak/Strube; Dethlefs; Belz 2007)?

p4, col.2 It may be relevant here to show more awareness of the limitations of Semantic Web formalizations as well: "OWL/RDF offers explicit support for the modularity..." This is limited to importing entire subontologies and so is only one kind of (quite restricted) modularity. There is an entire literature on more sophisticated modularities, as seen in the WoMo international workshop series (Workshop on modular ontologies: cf. http://www.informatik.uni-bremen.de/~okutz/womo5/) and some of these ideas may well be required for more advanced NLG capabilities and representations of resources in the future.

p5, col.1 here there is talk twice of "complete" something or other ("complete model about question answering", "complete ontological model"): it is unclear what this could mean. Either completeness (in the mathematical sense) has been proved, which does not appear to apply to either of these cases, or the claim of being 'complete' is absurd. There is an almost infinite range of other facets of users that could be modelled, so to say that a complete user model exists is just silly. Perhaps this is a language problem and some other word rather than 'complete' was actually intended.

It might also be well to reflect on the problem of similar formalisms encouraging merging things together that do not belong together: putting the lexicon in the knowledge base is, to my taste, a really bad decision (in general and on theoretical grounds, although of course it could be motivated on particular engineering grounds within single systems); thus the NaturalOWL approach messes up the entire modularity that could beneficially be enforced even within RDF/OWL.

p10, col.1 verbalization of ACE-sentences: it is not quite clear what happens to the first-order component; does this go into SWRL and then is *not* verbalised, or are only the components that have a translation in OWL (and so presumably falling within DL/SHOIN etc.) verbalised?

p11, col.1 "false implicatures that can be drawn from some of the ontological axioms": surely here you mean false implicatures can be drawn from the *verbalisations* of the ontological axioms! No implicatures follow from the axioms, just entailments within the logic they are couched in.

And in this entire discussion of verbalisation, shouldn't the work by Adam Pease on verbalisations of SUMO/MILO using the connection to WordNet and generation templates be mentioned: or is this ruled as not being Semantic Web because of the first-order expressiveness of SUMO's specification language?

p20, col.1 the mention of heterogeneous data sources is interesting but not taken very far. This relates to the discussion of modularity above; relating heterogeneous DL descriptions has been addressed in frameworks such as E-connections (Kutz et al. 2004, AI Journal): there are few NLG frameworks that have begun to consider these more sophisticated possibilities however. A sketch of something moving in this direction is the Hois/Kutz paper in the 2008 Formal Ontology in Information Systems conference book (FOIS 2008). But the linking per se and getting inferences to work across such data sources is *not* an NLG problem necessarily.

Finally, the English is in many places very problematic and unfortunately not of a publishable quality. I give a list of basic corrections below, which may well not be exhaustive. After correction, the paper should be checked through again by a competent English reader.

p1, §2.1: laymen --> lay people
p2, col.1: prognosticated only based --> predicted based only
"This is about to change." Delete (it has changed already). Better:

"But statistical realizations of different NLG tasks are now also popular."

"In the best of cases," Delete.

"best for these dependences" --> "these dependencies"

p2, col.2: "sufficient as a" --> "sufficient in many cases for a"

"has been increasingly given more prominence" -->
"has received increasing prominence"

p3, col.1: "relevant on the research" --> "relevant for the research"

"of the SW-technologies" --> "of SW-technologies"

p3, §2.2 "The design and realization of nearly each of ... of it." --> "Among the design and realization of the generation tasks listed at the beginning of Section 2, especially content selection, discourse structuring, and lexicalization depend on the type of semantic input structure used by the generator since they ... of it."

§2.2.1 "the so-called closed-planning" --> "so-called closed planning" (no determiner)

and wouldn't a reference to Nicolov be relevant here?

p3, col.2. "is the Rhetorical Structure Theory" no determiner.

p4, col.1 "The most prominent of them" --> "The most prominent of these"

"bridged the gap between LOOM representations and linguistic structures" : seems categorially confused. The gap bridged was, as far as I know, between domain representations and linguistic representations. LOOM refers to the formalisation language used to represent the information, not the status of the information itself.

"used in the context of systemic-functional generators" --> "used in the context of the systemic-functional generators"

"KMPL" --> "KPML"

Alongside [10], I suppose the rather more up-to-date article in the Artificial Intelligence journal on the Generalized Upper Model would be relevant here:

Bateman, J. A.; Hois, J.; Ross, R. J. & Tenbrink, T. A linguistic ontology of space for natural language processing. Artificial Intelligence, 2010, 174, 1027-1071.

col.2 "As a consequence, the need for a separation between different types of knowledge...": indeed, and this was, I take it, the principal motivation for the entire design of the Penman Upper Model, so a reference to this motivation here would be more appropriate.

p5, col.2 "In [86]", "[93,105] argue" yeeuch.

"Being appealing from" : not an English construction; try:
"As it is appealing from the perspective of..."

"to ensure a certain quality" --> "to ensure quality"

"On the other hand" should only be used in pairs; use:
"Alternatively", etc.

p6, col.2 "associated to their plan" --> "associated with their plan"

"KL-ONE or LOOM)" add comma after closing paren.

"not powerful enough to express all required meaning constellations": again, senseless hyperbole, nothing is that powerful! (apart from people and even that is unclear). Tone down appropriately for a scientific article.

"Semantic web offers" --> "The Semantic Web offers" (article obligatory)

"can deal" --> "should be made able to deal" (not good, but better: since this is a challenge, there needs to be some causation expressed to bring something about: this is not present in "can")

"As a matter of fact, it was extended by [55]." -->
", as shown in the extension provided by Name [55]."

"network and stopping" -->
"network, stopping"

"et al. cope with large scale SW repositories in content selection" -->
"et al. cope with content selection in large-scale SW
repositories"

p7, col.2 "developed so far and exploited" --> "developed so far have exploited"

p8, col.2 "category captures Natural" --> "category covers Natural"

p9, col.1 "the more needed and sophisticated is the NLG-technology" -->
"the more sophisticated is the NLG-technology ( ... ) required"

p9, col.2 "certain idiosyncratic" --> "idiosyncratic"

"She was then asked..." --> "The user was then asked to pick the one preferred based on the correctness..."

p10, col.1 "a the Attempto" --> "the Attempto"

p10, col.2 "and after having seen some examples" --> "and seeing some examples"

"(Part of Speech)" --> "(part of speech: PoS)"

p11, col.1 "where each minimal" --> "since each minimal"

"joining all the neighbourhood" --> "joins all the neighbourhood"

"structures is applied to form" --> "structures to form"

p11, col.1/2 "as introduction of more specific concepts before more general ones and evaluation of" --> "as introducing more specific concepts before more general ones and evaluating"

p11, col.2 "two small ontologies, and although" -->
"two small ontologies and, although"

p12, col.2 "The text planner that ensures" --> "The text planner then ensures"

p13, col.1 "folskonomies" --> "folksonomies"

paragraph beginning as follows is pretty awkward:

"The authors justify the use of KMPL by ...

Suggestion:

"The authors justify the use of KPML by virtue of KPML's acceptance of ontology-oriented logical specifications formulated in terms of the Sentence Plan Language [72] as input and the system's provision of large-scale NLG resources (including grammars and task- and domain-independent linguistically-motivated ontologies such as the Penman Upper Model [8]) for a number of languages. The system also offers a stable grammar development environment."

p13, col.1 "test ontologies, the generated" -->
"test ontologies the generated"

p14, col.1 move the possessive marker after the names Galanis and Androutsopoulos rather than after the [55]

I'd suggest making "Classes and individuals in the OWL ontology" a new paragraph.

"associated to individuals" --> "associated with individuals"

p14, col.2 "is provided such that" --> "is provided so that"

"2) model" --> "2) a model"

"3) model" --> "3) a model"

"or he might": be consistent! Avoid the generic 'he', use 'she' if you must, or do you want to argue that principal investigators (in opposition to the female 'user' adopted just after) are male? I'd repeat 'the investigator' personally and not use a pronoun at that point. Same for 'user'.

p15, col.1 "discourse (as" --> "discourse (such as"

p15, col.2 "(FOHM) model" remove 'model' as this is already in FOHM.

"They consist of ordered" --> "These templates consist of ordered"

"realized as sentence" --> "realized as a sentence"

"which implements [77]'s" : add name

p16, col.2 "As benefit" --> "As a benefit"

"The obtained scores were significantly high": in what sense? If statistical, say so; otherwise this is just noise and should be omitted.

p17, col.1 "we discussed in the previous sections" --> "we have discussed in the previous sections"

p18: not all of the systems mentioned in the text are in the table, e.g., Schütte 2009 (as far as I can see): is this deliberate? If this is a selection, the grounds for inclusion should be clearer: what is 'condensed' on what criteria?

p19, col.2 "discuss a few these isses" --> "discuss a few of these issues"

the number of footnote 26 in the text occurs between its own commas, move after the comma and delete the spurious additional comma.

p20, col.1 "RAGS C. Mellish" the referencing is messed up here and a comma should come after RAGS if the name really appears in the body of the text.

"recasting ... RAGS data" move the "RAGS data" immediately after "recasting" and move "quite naturally" after "remedied"!

"increase the performance of the generators": why does this follow? seems an independent dimension that could go either way depending on other issues.

p20, col.2 "large data set(" --> "large data set ("

"works on ontology" --> "work on ontology" ['work' in the plural is often clumsy, unless you are talking about paintings or something]

p21, col.2 "The NLG-component used ... is" --> "The NLG-components used ... are"

"data-driven approaches, and" no comma.

Solicited review by Philipp Cimiano:

This paper provides an overview of natural language generation approaches in the context of the Semantic Web. A review like this is clearly very useful and I would like to see it accepted for the journal.

However, there are three issues that need to be addressed before this review paper can be published in the journal:

1) Lack of examples: One problem of the paper is that it lacks examples. The authors could include here and there some examples of input and output of particular NLG systems. This would make the article more accessible to a broader audience.

2) Challenges involved in NLG from Semantic Web data: the article does not emphasize sufficiently the technical challenges involved in scaling NLG to the Semantic Web. A section giving an overview of the main challenges could be added.

3) Too many details / no systematic overview: The article lacks a systematic overview of different paradigms and approaches and an overview of the pros and cons of different NLG approaches. Instead, the article discusses many details of a number of systems. The authors clearly have a lot of knowledge in the field, but discussing particular systems in detail is not particularly useful for somebody who wants to get an overview of the main approaches in the field or as a starting point for research in the field. I would propose that the authors try to abstract from details of systems and focus more on providing an overview of approaches, then mentioning systems that implement these approaches.

So my suggestion is to request a major revision of the paper by the authors. The article should be reduced in size to half of its current size. This is possible by removing the detailed discussion of particular systems and trying to provide a systematic overview of approaches / paradigms rather than discussing concrete systems. The article as it stands is clearly too long and fails to give a systematic overview of different approaches to NLG in the context of the SW.

Some more detailed comments:

Section 3.2 “Use of SW technologies for reasoning in NLG”: it is really not clear at all what this section tries to convey. It talks about many systems, about OWL 2.0, OWL-DL, SPARQL-RDF, SPIN-SPARQL, the usage of SPARQL queries on OWL to get a content plan, etc.

There are several issues here: first, it does not become clear what kind of reasoning the authors talk about here and how it would help the NLG system. Is some sort of subsumption reasoning or another kind of reasoning in focus here? Second, SPARQL is a language for querying RDF and not OWL. Finally, it seems that what the authors mainly discuss is the use of SPARQL in the content planning step. But then it is not clear why the title of the section talks about reasoning if the authors really only refer to using SPARQL to query a dataset and extract relevant information from which to generate NL. This needs to be clarified and the whole section needs to be reworked and the main points made clearer.

Section 3.3 (Scaling up to large repositories of data): it is also not clear what this section aims at. The main point seems to be that “content selection strategies can deal with SW repositories”. In this sense there seems to be an overlap with Section 3.2 to an extent that it is not clear anymore what Section 3.2 vs. Section 3.3 try to say. There is no real discussion on scalability and what the challenges/problems involved are, so also here the title of the section is quite misleading. The section is a good example for the point I made above: the section discusses specific work of O’Donnell, Bouayad-Agha, Dai et al., but clearly fails to provide a systematic overview of the main approaches to content selection from the Semantic Web, abstracting from concrete systems as well as formalisms. For example, SPARQL is only a language for implementing queries over an RDF dataset and is not particularly interesting in itself. The interesting thing is what content is retrieved. Whether this is done by SPARQL, SQL or in some other way is not really crucial IMHO.
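
To make the last point concrete, here is a minimal sketch of content selection expressed without any query language at all. The triples, the `ex:` names and the function `select_content` are hypothetical illustrations, not taken from the paper or from any surveyed system; the same selection could equally be written as a SPARQL or SQL query.

```python
def select_content(triples, subject, wanted_predicates):
    """Return the triples about `subject` whose predicate is in `wanted_predicates`."""
    return [t for t in triples if t[0] == subject and t[1] in wanted_predicates]


# Hypothetical toy dataset of (subject, predicate, object) triples.
triples = [
    ("ex:Vase1", "ex:madeIn", "ex:Athens"),
    ("ex:Vase1", "ex:depicts", "ex:Warrior"),
    ("ex:Vase1", "ex:inventoryNo", "4711"),
    ("ex:Vase2", "ex:madeIn", "ex:Corinth"),
]

# The decision that matters for NLG is which predicates count as relevant.
selected = select_content(triples, "ex:Vase1", {"ex:madeIn", "ex:depicts"})
print(selected)
```

What a systematic overview would need to compare is the policy encoded in `wanted_predicates` (and how it is derived), not the mechanism used to evaluate it.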

I would also advise changing the title of the article to: “Natural Language Generation in the context of the Semantic Web”

Some stylistic issues:

The authors use references as nouns, i.e. as part of a sentence. This is bad style and should be avoided.

Some examples with proposed rewritings follow:

Section 2.2.1: “in [24,27]” => “in the systems proposed by Bouayad-Agha et al. [24] as well as by Bouttaz et al. [27]”

Section 2.2.2: “of [87]” => “of McKeown [87]”

Section 3.1

“done in [36,55]” => “in the approaches of Dannélls [36] and Galanis and Androutsopoulos [55]”

“[93,105] argue that” => “Mellish and Sun [93] as well as Power and Third [105] have argued that...”

etc. etc.

Section 2.2.3: what is “*semantemelexeme* association”???

Section 2.1: representations have been experimented with – including model-theoretic semantics -> model-theoretic semantics is not a representation, but a way to define the semantics of a logical language (e.g. first-order logic, modal logic, description logics, etc.). The authors probably mean first-order logics and related formalisms here.

Section 3.5

“However, most NLG-applications developed so far and exploited...” -> “However, most NLG-applications developed so far *have* exploited...”???

Section 5.5

“The vocabulary and data in the SW are often obtained automatically or semi-automatically from textual information in the Hyperdocument Web using Knowledge Acquisition, IE or Text Mining techniques”.

=> I do not agree with this. Most ontologies in the SW are modelled by hand and most data are generated automatically from existing databases. Could the authors provide evidence for this statement?

Solicited review by Ion Androutsopoulos:

This article is a survey of Natural Language Generation (NLG) research related to Semantic Web (SW) data and technologies.

According to the journal’s guidelines: “Survey articles should have the potential to become well-known introductory and overview texts. These submissions will be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.”

Starting from (4), the subject of the survey is particularly important for the SW community. As discussed in the article, NLG can be used during ontology engineering, to allow domain experts who are unfamiliar with SW formalisms to interact with natural language renderings of the ontologies; and it can also be used to present SW data (e.g., product descriptions) to end-users, possibly in different natural languages and tailoring the presentations to the users’ profiles. In both cases, NLG can help make SW data easier to understand for people who are not SW experts, and this may have a significant impact on the adoption of SW technologies.

Regarding (2), the article first provides an overview of established (largely pre-SW) concept-to-text NLG approaches, including a brief description of the most common processing stages of a typical pipeline architecture. The article then focuses on NLG especially for SW data and technologies, as well as “burning issues” that need to be addressed to maximize the positive interplay between NLG and SW. The overview of established NLG approaches is rather brief, and readers without an NLG background may get a limited view of the range of language phenomena and problems that NLG research has addressed over the last few decades; but sufficient pointers to relevant literature are provided, and interested readers could follow them to get a broader view of past NLG research. There is also very little coverage of more recent, mostly statistical approaches to NLG, though this may be justified by the fact that most NLG work for SW data does not yet employ statistical methods. The coverage of NLG work especially for SW data and technologies is very thorough, and this part of the article is also reasonably well organized and balanced. The “burning issues” section is potentially very useful for researchers wishing to get involved in NLG for the SW. By contrast, the concluding section is weak; a concise summary of the most important points discussed is needed.

Regarding (3), the article is generally well written. My only complaint is that the descriptions of some of the methods are difficult to follow in the absence of examples illustrating how the methods work. I realize that including examples would increase the article’s length, but I would still recommend including examples at least at points that are otherwise very difficult to follow; see detailed comments below. There are also several acronyms, names of components or resources etc. that do not seem to be absolutely necessary, and I would recommend removing them; again, see detailed comments below. The use of English is very good; some typos and minor comments are listed below. The bibliography entries and figures are sufficient.

Regarding (1), I would definitely recommend the survey as an introductory text to researchers, PhD students, practitioners etc., especially if the suggested improvements are taken into account. This article may well become a standard introductory text for researchers interested in NLG and SW technologies.

More detailed comments:

Section 1, lines 10-12, “linguistic surface-oriented structures already predetermine the linguistic form of the output beforehand”: Saying “… already predetermine, at least partially, the linguistic form…” might be more accurate.

Section 1, line 13: It would be worth providing a few pointers to prominent work on NLG from numeric data.

Page 1, column 2, line 1, “machine-processable semantic data”: What is “semantic data”? Maybe say “machine-processable data with formally defined semantics”.

Same paragraph, last line: What is “semantic NLG”?

Section 2, line 5, “discourse and information structuring”: “Text planning” or “discourse planning” would be more standard terminology.

Same paragraph, aggregation: It might be better to mention aggregation after lexicalization, since aggregation may also examine linguistic structures produced by lexicalization. Maybe also provide a brief example of aggregation, for the benefit of readers unfamiliar with NLG.
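
A brief example of the kind meant here could look like the following minimal sketch. The fact tuples and the function `aggregate` are hypothetical and are not taken from the paper; the sketch only shows the simplest form of aggregation, merging facts that share a subject and a verb into a single sentence.

```python
from collections import OrderedDict


def aggregate(facts):
    """Merge (subject, verb, object) facts that share subject and verb into one sentence each."""
    groups = OrderedDict()
    for subject, verb, obj in facts:
        groups.setdefault((subject, verb), []).append(obj)

    sentences = []
    for (subject, verb), objects in groups.items():
        if len(objects) == 1:
            joined = objects[0]
        else:
            # Coordinate the objects: "a, b and c"
            joined = ", ".join(objects[:-1]) + " and " + objects[-1]
        sentences.append(f"{subject} {verb} {joined}.")
    return sentences


# Hypothetical lexicalized facts about one object.
facts = [
    ("The vase", "depicts", "a warrior"),
    ("The vase", "depicts", "a horse"),
    ("The vase", "was made in", "Athens"),
]
print(aggregate(facts))
```

Without aggregation the first two facts would be rendered as two separate sentences ("The vase depicts a warrior. The vase depicts a horse."). Because the sketch operates on already lexicalized material, it also illustrates why aggregation may sensibly be mentioned after lexicalization.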

Same paragraph, “onto language-specific semantemes and lexemes”: I would recommend using simpler terms, for example say “onto language-specific sentence representations”.

Same paragraph, “morpho-syntactic realization and linearization”: “Surface realization” would be more standard terminology.

Same paragraph, (5), “projection of the discourse or sentence plan obtained from the preceding tasks onto the surface”: I recommend rephrasing this as “projection of the discourse and sentence specifications obtained from the preceding tasks onto the final (surface) form of the text”, since the meaning of “surface” may otherwise be unclear to some readers.

Section 2, paragraph 2: Aggregation is also (at least partially) concerned with semantics. Hence, it seems strange not to include aggregation in the following discussion.

Section 2.1, paragraphs 1-2: It might be worth including the discourse history, physical location and surroundings in the discussion of “context”.

Section 2.1, paragraph 3, “application such as Dialogue”: Calling Dialogue an information processing application sounds strange. Maybe refer to “dialogue systems” instead.

At several points throughout the article the authors use dashes at points where they do not seem necessary (e.g., “NLG-research”, “NLP-application”, “SW-standards”, “OWL-ontologies”).

Table 1, input, Type: Isn't a table or a template also structured information?

Same table, “Domain independence”: Cannot a “conceptual representation” be domain dependent?

Same table, “Request” and “Communicative goal”: The distinction between the two is unclear to me.

Same table, “Coherence”: I suggest replacing “coherent paragraph” by “coherent text”, since coherence is not restricted to paragraphs.

Same table, “Fluency: fluent NL, controlled NL, or telegraphic style”: Cannot controlled NL be also fluent? Cannot a controlled NL be also telegraphic?

Page 3, first paragraph under Table 1, reference [122]: It might be worth including a pointer to E. Krahmer, M. Theune (eds.), Empirical Methods in Natural Language Generation, Springer, 2010.

Same page, paragraph 2: The part from “For instance” to the end of the paragraph is a bit unclear. Some examples are needed to illustrate the problems of a typical pipeline architecture. For example, content selection may select information that is difficult for the discourse planner to structure coherently. “Syntacticization” and other uncommon terms or terms that have not been defined should be avoided.

Same page, paragraph 3, “document or text planning (sometimes also referred to as content determination or macro-planning)”: Content determination is usually considered part of document planning, it is not an alternative name for document planning.

Last paragraph of page 3 and first paragraph of page 4: The discussion here seems to assume that any evaluation that relies on human judges is qualitative. An evaluation with human judges, however, can be quantitative (e.g., if it collects scores from the judges, and applies a statistical analysis on the scores).
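
As a minimal sketch of what such a quantitative evaluation with human judges might involve, the snippet below computes a paired t statistic over judges' ratings of two systems. The ratings and the function name `paired_t` are invented for illustration only; a real study would also report degrees of freedom and a p-value, and might prefer a non-parametric test for ordinal ratings.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t statistic for two lists of ratings given by the same judges."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical 1-5 fluency ratings from five judges for two generators.
system_a = [4, 5, 4, 3, 5]
system_b = [3, 4, 2, 3, 4]

t_value = paired_t(system_a, system_b)
print(f"t = {t_value:.3f} with {len(system_a) - 1} degrees of freedom")
```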

Section 2.2.1, paragraph 1, last sentence: “Say everything there is to say” is also an explicit communicative goal. Maybe say “where content selection is guided by communicative goals that require particular subsets of the available information about an object”.

Section 2.2.1, paragraph 2: It might be worth mentioning that some systems use the RDF graph directly, i.e., a graph whose nodes correspond to classes and individuals, instead of using a graph whose nodes are facts and whose edges represent selection constraints.

Page 4, paragraph 2, “It has been argued…”: This paragraph is unclear to me. First, the distinction between local and global coherence should be defined more clearly. Secondly, if the selected semantic units (e.g., facts) are already linked with discourse relations, hasn't global coherence already been achieved? Should “to achieve global coherence” be replaced by “to obtain a total ordering of the semantic units (e.g., facts) selected”?

Same page, paragraph 3, “Another issue…”: Please explain more clearly what EDUs are, and how they can be formed using templates or via topological search.

Section 2.2.3: Simply mentioning terms like “discrimination networks” or “semantemelexeme association with additional variation” along with pointers to articles is not particularly useful. Please explain the main ideas.

Same section, paragraph 2: Exactly which gap did Penman UM bridge? Please explain more clearly how a Generalized Upper Model helps.

Section 3.1, paragraph 1, “enriched by the information how to”: I couldn’t parse this sentence.

Page 5, paragraph 2, “The layers… can produce”: I could not follow this sentence. Please break it into shorter, simpler sentences.

Same page, paragraph 5, “A complete ontological model”: Complete in what sense?

Page 5, column 2, paragraph 4, “[93, 105] argue”: At several points the authors use reference numbers as subjects of sentences, which does not look right.

Section 3.2, paragraph 2: An example of a concept that arises in human communication and is not modeled by domain ontologies would help.

Same section, paragraph 4: I could not understand how NLG can be used to “verbalize the query potential of an ontology in a non-standard query language”.

Section 3.4, paragraphs 1 and 2: These paragraphs are unclear to me. What does “task” mean here? Why is task-specific knowledge domain-independent? Also, how is the mapping facilitated by the UM?

Section 3.4, last sentence: What does “meta-model” mean here?

Section 3.5, paragraph 1, sentence 2: What is the “Problem Description Language”? Do we need to know its name?

Same paragraph, “However, most NLG applications developed so far and exploited”: I couldn’t parse this sentence.

Page 8, last paragraph of Section 3: What exactly is a “reason-able view”? Do we need to know this term?

Same paragraph, last sentence: This sentence contains nested brackets, which make it difficult to follow the sentence.

Page 9, paragraph 2, “round trip authoring”: It is worth explaining that the CNL is edited (e.g., by an ontology engineer) before being translated back into ontology axioms.

Same page, paragraph 3, “in the ROO-authoring tool for the Rabbit-CNL”: Is something missing here? Should the sentence say something like “as, for example, in the ROO-authoring tool…”?

Same paragraph, last sentence, “Unfortunately, only a few”: I couldn’t understand this sentence. Isn’t comparing systems that do or do not use NLG a way to compare the “underlying technology” (NLG vs. not-NLG)?

Page 9, paragraph 10, “interactive guided verbalization, i.e., conceptual authoring”: What is “interactive guided verbalization”? Is it a synonym of “conceptual authoring”, as the phrasing here seems to suggest?

Page 10, paragraph 2, “a the Attempto Parsing Engine”: Delete “a”.

Page 11, paragraph 1, “In order to achieve a more balanced… linearized into a sentence”: I couldn’t follow this sentence. Please break it into shorter, simpler sentences. An example might also help.

Page 11, paragraph 2, “if they can be inferred from previously selected axioms”: What are the “previously selected axioms”? Why should the (new) selected axioms be inferable from the previously selected axioms? Also, in “depth-first search of refined concepts (i.e., subsumers)”, should “refined” be “more general”?

Page 11, column 2, paragraph 2: What is a “top-level argument”?

Same paragraph, “However the result is dampened … and asked them which is”: The English of this sentence may need to be re-checked.

Page 11, last paragraph, “The paths are translated into NL statements”: Should “NL statements” be “NL questions”?

Page 12, first paragraph: I couldn’t follow this paragraph. Please explain in a clearer manner. An example might help.

Section 4.1.2, paragraph 1, “the user authors the concepts of the ontology schema to formulate a query…”: Does the user author the concepts of the ontology even when formulating a query?

Same paragraph: I am familiar with WYSIWYM, but I doubt that a reader unfamiliar with this approach would be able to understand how WYSIWYM works without an example.

Footnote 13: This URL has already been provided.

Page 13, last paragraph, “Argüello et al. [3]’s generator”: This should probably be “Argüello et al.’s [3] generator”.

Page 14, paragraph 2, “Galanis and Androustsopoulos [55]’s”: This should probably be “Galanis and Androutsopoulos’s [55]”. Also, delete the extra “s” in the second surname here and in footnote 15.

Same paragraph, “interest to the use”: This should be “interest to the user”.

Page 14, column 2, paragraph 2, (2) and (3): An article (“a”) is probably missing at the beginning of (2) and (3), i.e., “a model”, not “model”.

Footnote 21: This URL has already been provided.

Page 15, last paragraph: The discussion here is very difficult to follow. An example might help.

Page 16, column 2, last paragraph: Is it necessary to introduce the SQASM acronym? Is it even necessary to mention the name of the model (Semantic QA Structure Model)?

Page 17, column 2, paragraph 3: What are “pondered nodes”?

Footnote 24: I couldn’t follow the argument of this footnote.

Reference 56: “Protégé” needs fixing.
