QA3: a Natural Language Approach to Question Answering over RDF Data Cubes

Tracking #: 1660-2872

Authors: 
Maurizio Atzori
Giuseppe Mazzeo
Carlo Zaniolo

Responsible editor: 
Guest Editors ENLI4SW 2016

Submission type: 
Full Paper
Abstract: 
In this paper we present QA3, a question answering (QA) system over RDF data cubes. The system first tags chunks of text with elements of the knowledge base, and then leverages the well-defined structure of data cubes to create a SPARQL query from the tags. For each class of questions with the same structure, a SPARQL template is defined, to be filled in with SPARQL fragments obtained from the interpretation of the question. The correct template is chosen using an original set of regex-like patterns, based on both syntactic and semantic features of the tokens extracted from the question. Preliminary results obtained using a limited set of templates are encouraging and suggest a number of improvements. QA3 can currently provide a correct answer to 27 of the 50 questions in the test set of task 3 of the QALD-6 challenge, remarkably improving the state of the art in natural language question answering over data cubes.
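To make the template-filling step concrete, the following is a minimal sketch of how a tagged question could be turned into a SPARQL query over the W3C RDF Data Cube (QB) vocabulary. The template shape, the placeholder names, and all IRIs are hypothetical illustrations, not the authors' actual implementation.

# Hypothetical sketch of the template-filling step (not the authors' code):
# a question class maps to a SPARQL template, and the tags found in the
# question provide the fragments that fill its placeholders.

AGGREGATE_TEMPLATE = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ({agg}(?m) AS ?answer)
WHERE {{
  ?obs qb:dataSet {dataset} ;
       {measure} ?m ;
       {dimension} {value} .
}}
"""

def fill_template(tags):
    """Build a SPARQL query from the knowledge-base elements found by the tagger."""
    return AGGREGATE_TEMPLATE.format(
        agg=tags["aggregate"],        # e.g. "SUM" for "total ...", "AVG" for "average ..."
        dataset=tags["dataset"],      # IRI of the data cube matched to the question
        measure=tags["measure"],      # the qb:MeasureProperty matched in the text
        dimension=tags["dimension"],  # the qb:DimensionProperty matched in the text
        value=tags["value"],          # the dimension value, e.g. a city IRI
    )

# Example with made-up IRIs:
print(fill_template({
    "aggregate": "SUM",
    "dataset":   "<http://example.org/ds/city-spending>",
    "measure":   "<http://example.org/prop/amount>",
    "dimension": "<http://example.org/prop/city>",
    "value":     "<http://example.org/city/Cagliari>",
}))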
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 27/Jun/2017
Suggestion:
Minor Revision
Review Comment:

The authors have adequately addressed the majority of my comments. In order for the paper to be accepted for publication, some minor issues need to be further addressed, as explained in the following.

It is mentioned in the text that: "the system was able to provide the correct answer in the majority of the test questions". However, the precision and recall values provided in the paper do not justify this claim (both values are close to 60%). The authors need to further clarify this statement.

Also, I would suggest that the authors revise section 4, providing precision and recall values **per question**, in order to also illustrate what types of questions the system is able/not able to process.

One final comment has to do with the flexibility of the approach. The authors mention that “the template/pattern approach proven to be in fact very flexible, with 7 only patterns covering the majority of the 150 (training + test) QALD questions.”. Although this might be true (I have some doubts - see my previous comments about the performance), the fact that the system is able to handle the QALD questions (i.e. the questions of a single dataset) does not necessarily mean/prove that the system is flexible in the general case. In my opinion, in order to experimentally prove the flexibility of the approach, the evaluation needs to be extended with additional datasets and question types.

Review #2
By Oscar Corcho submitted on 15/Sep/2017
Suggestion:
Accept
Review Comment:

In this second round of review, the authors have clearly addressed the comments that were provided in the first round, and I am generally happy with the way in which they have addressed them.

In particular, I am now happier with section 3.1, which describes the initial tagging phase. This step was not clear in the initial submission and was the main reason for me to ask for major changes. The section now describes the tagging phase more clearly; I can see how the process is done, and it seems sensible to me.

All other less relevant recommendations have been addressed as well (mostly typos or ambiguities in the usage of terms), and I also like the fact that the trade-off between the introduction and the related work section has been better balanced. The paper now reads much better as well.

Finally, I also like the fact that now clear references to GitHub repositories are given, which I have been able to explore.

Only a few typos that I have detected:
They value --> Their value
values they are --> values are

Review #3
By John Bateman submitted on 08/Oct/2017
Suggestion:
Minor Revision
Review Comment:

The paper is now much improved and gives good detailed information
about what is done and how. Due to the additions, there are a host of
minor language problems that need to be fixed again: these are listed
below. There is also considerable repetition in the final review of
the state of the art, which needs to be removed or completely rephrased
so that the same things are not said twice! For example, we have
virtually the same information on p. 2 and p. 11 on Xser: this cannot
stay as it is. If the technical issues raised by the other reviewers
are now in their opinion well addressed and these problems are fixed,
then I'd support publication.

However, as a rather general point, and after testing out the provided link to
the web interface, I was left wondering how the reliability of
the answers can be supported. For example, I asked the following
question almost at random: 'how many countries are there in Europe?'
and got the answer 27. Now, 27 what? The UN lists 44 countries in
Europe, while the EU has 28 including the UK... seems that to make the
answers useful, one would need more than a bare figure. The SPARQL
indicates that what was being counted was

Now while this is clearly beyond the scope of the present paper, some
indication of how the system will support *explainability*, an
increasingly important property of intelligent systems, would add
considerable value. Perhaps further information about the measures
used would help here too. As the authors write:

"An interesting fact related to our approach is that each step we run
provides insights on how well it has been performed."

providing more of this information back to the user could then also be valuable.
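One lightweight way to surface this (a hypothetical sketch, not QA3's actual interface) would be an answer payload that carries the generated query and the matched cube elements alongside the bare figure; all names and IRIs below are made up for illustration:

answer = {
    "value": 27,
    # the generated SPARQL, so the user can see what was actually counted
    "sparql": "SELECT (COUNT(DISTINCT ?c) AS ?n) WHERE { ... }",
    # the matched dataset and the concept being counted
    "dataset": "<http://example.org/ds/countries>",
    "counted": "distinct values of the 'country' dimension",
    # the per-step quality signals the authors mention
    "tagging_confidence": 0.8,
}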

It is also very easy to form questions for which no template is found,
so claims that the 7 defined patterns already provide substantial
coverage should probably be scaled back against a more realistic
measure of the range of questions that might actually occur, together
with some indication of the effort involved in broadening coverage.
Alternatively, give a stronger argument that the QALD question
catalogue is sufficient for broader application. I would agree that
the results are promising for the small set of templates, but it is
important not to oversell. The authors begin to address this in the
discussion:

"but also that specific patterns may support
only few questions."

and this needs to be given due attention to convince the reader that
the method is scalable in a useful way. Reference could possibly be
made here to other work on learning connections between semantic
representations and natural language expressions of those semantics:
indeed, having a semantic representation of the questions might well
make the mapping to SPARQL easier, no? One would expect the use of
regular expressions at some point to come up against limitations...
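For concreteness, a pattern in the paper's sense might be approximated as a sequence of predicates over token features, roughly as follows. This is a hypothetical Python sketch; the feature names and the matching strategy are illustrative, not the paper's actual pattern language.

# Hypothetical sketch of a "regex-like" pattern over tagged tokens.
# Each token carries syntactic/semantic features; a pattern is a
# sequence of predicates that must fire, in order, on the token stream.

def is_aggregate(tok):   # semantic feature: token tagged as an aggregate cue
    return tok["tag"] == "AGG"        # e.g. "total", "average"

def is_measure(tok):     # token matched the label of a qb:MeasureProperty
    return tok["tag"] == "MEASURE"

def is_dim_value(tok):   # token matched a dimension value in some dataset
    return tok["tag"] == "DIMVALUE"

# Pattern for questions like "What is the total <measure> in <place>?"
AGGREGATE_PATTERN = [is_aggregate, is_measure, is_dim_value]

def matches(pattern, tokens):
    """Ordered subsequence match: each predicate must fire on some later token."""
    it = iter(tokens)
    return all(any(pred(t) for t in it) for pred in pattern)

tokens = [
    {"text": "total",    "tag": "AGG"},
    {"text": "spending", "tag": "MEASURE"},
    {"text": "in",       "tag": "OTHER"},
    {"text": "Cagliari", "tag": "DIMVALUE"},
]
assert matches(AGGREGATE_PATTERN, tokens)  # would select the aggregate template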

Language corrections:
-------------------

on a tailored user interface, that require --> that requires
2015 QALD challenges [2] the respectively : no 'the' before 'respectively'

This suggests that translating natural language
questions into SPARQL queries is a really hard task. : hardly a surprise!

As we will see in the followings --> As we will see in the following

They value can be both --> Their (?) value can be both

city in which the measured spend happened : 'spend' is not a noun, do you mean 'expenditure'?
this has to be corrected whenever 'spend' appears.

attribute values they are also used : attribute values are also used ?

using Stanford tokenizer --> using the Stanford tokenizer

and the structure of dataset. --> and the structure of the dataset.

right dataset is peculiar of the statistical --> right dataset is peculiar to the statistical

defined in a datasets --> defined in a dataset

reduces of orders of magnitudes --> reduces by orders of magnitude

thanks to the following trick: 'trick' is horribly informal and does not sound at all serious - I'd suggest phrasing more appropriately

we lookup the index for every n-grams --> we lookup the index for every n-gram

7-grams up to single words: *down* to single words surely

how much his city spent for public : do not use 'he' for generic reference!

the largest sum of spent amount of money. -> the largest sum of money spent.

namely the number 4, --> namely number 4,

our system need to manually --> our system needs to manually

can be also derived by --> can also be derived by

The online system allows to freely type questions -->
The online system allows the user to freely type questions

over the two set of questions : *sets*

The most performer in this comparison : *best* performer!

while givin up answering: *giving*

which enables to define grammars --> which enables the definition of
grammars

The adopted solution in literature --> The adopted solution in the
literature

Questions in the task 3 of QALD-6 testbed all refers : *refer*!!

Review #4
By Efstratios Kontopoulos submitted on 11/Oct/2017
Suggestion:
Accept
Review Comment:

The paper presents a system supporting statistical question answering over RDF data cubes.

I found the proposed work very promising: it tackles the problem of natural language question answering (often through some clever and efficient "tricks") and demonstrates some very encouraging results.

On the other hand, it is understood that the proposed solution is mostly effective in a specific context and not in more generic contexts, such as DBpedia. Still, this does not reduce the added value brought to the table.

Also, kudos to the authors for adequately tackling all the reviewers' remarks during the previous round of reviews. I personally don't have any other major comments to add on top.

Maybe a diagrammatic overview of the system/framework at the beginning of section 3 would help.

Overall, I found the paper very easy to follow, with just one exception: although the authors state that they have fixed most syntactic/grammatical errors, unfortunately several errors remain. Indicatively:
"As we will see in the followings" --> "As we will see in the following sections".
"participated at set of the task 3 of QALD-6 challenge" --> "participated in the set of task 3 of the QALD-6 challenge".
"They value can be both" --> "Their value can be both".
"attribute values they are also used" --> "attribute values are also used".
"questions posed in natural language into SPARQL queries" --> "natural language questions into SPARQL queries".