Review Comment:
The authors present a method for QA over Linked Data, which -- in contrast to existing work --
is mostly independent of the underlying knowledge base and natural language used for querying.
The presentation of the method is quite clear, and a plus of the article is the extensive evaluation
using different QA datasets, such as the various QALD datasets and SimpleQuestions.
The method for query generation is not overly complex, but it is original in its ambition to be KG- and (natural) language-agnostic.
So, in summary, the biggest advantages are that the approach is agnostic to both the natural language used and the knowledge base,
and that it can be used to query multiple KBs at the same time (although it currently does not support (runtime-)parallel querying of KBs).
The main downside is that the approach still requires manual configuration and training in some of its components; for
someone external trying to use the method on a new dataset, this might be an issue, especially given the lack of specific knowledge
on how to set up and train the system.
Some manual work is still necessary in steps like providing lexicalizations for entities/properties, tuning to new languages
(with stopwords/stemming), and training the ranking (step 3.4 in the paper).
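To illustrate what such per-language tuning amounts to (purely my own sketch, not the authors' code; it assumes NLTK's
stopword lists and Snowball stemmer as stand-ins for whatever resources the system actually expects):

```python
# Reviewer's illustration only -- not the paper's implementation. Shows the
# kind of per-language stopword/stemming resources meant above, using NLTK
# as a stand-in.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

nltk.download("stopwords", quiet=True)

def normalize(question, language="german"):
    """Lowercase, drop language-specific stopwords, and stem the rest."""
    stemmer = SnowballStemmer(language)
    stops = set(stopwords.words(language))
    tokens = re.findall(r"\w+", question.lower())
    return [stemmer.stem(t) for t in tokens if t not in stops]

# normalize("Wer ist der Bürgermeister von Berlin?", "german") drops
# function words like "ist", "der", "von" and stems the remaining tokens.
```

Supplying such lists and stemmers for each new target language is exactly the kind of manual effort meant here.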
Furthermore, the system currently only supports queries of low complexity and, for example, does not yet handle expressions
like superlatives in queries.
Finally, the approach is more general and does not always keep up with other state-of-the-art methods trained on the specific QA dataset, but
that is to be expected. Hopefully, future work (e.g., on errors coming from the lexical gap or the missing support for superlatives)
will improve the performance metrics.
In total, though, I appreciate the effort to move in the direction of providing a platform for potentially querying any dataset in the LOD cloud
in (theoretically) any natural language, and I see it as a significant contribution to QA on Linked Data. I therefore
recommend acceptance of the paper, given that a few minor issues (see below) are addressed.
Minor revision action points:
-----------------------------
Most of the issues here reflect a wish to see clearer ideas, or at least some discussion, on how to address the current downsides of the presented approach.
- Manual effort needed. Please provide some discussion:
Where in your approach do you see the possibility to further reduce manual effort (without harming QA performance), and how?
For areas where it is not possible to remove manual work, does your system provide a clear description
(e.g., on GitHub) for someone interested in adopting the system of where and how to provide this manual work, so that it is not a hindrance
to adoption of the system? Or is it at least planned to provide such a description at some point?
It might be good to add a paragraph to the paper that summarizes in which components manual tuning/training is necessary
and how much effort is to be expected.
- According to your evaluations, issues with the lexical gap between queries and the KB (labels) are the biggest source of errors (around 40%).
I would like to see some ideas (e.g., in future work) on how to address this issue, as it seems very important for performance.
Furthermore, there has been a lot of work in recent years based on distributional semantics (with word embeddings),
which might be useful to better align terms in queries and the datasets. Just as an example, fastText embeddings are currently available
for 294 languages, trained on Wikipedia (https://github.com/facebookresearch/fastText/blob/master/pretrained-vect...).
So this might be something to add to the system (see the sketch after this item for what I have in mind), although I am aware
that embeddings can make the system much "heavier".
To be clear, I am not necessarily advocating embeddings; I would just like to hear some ideas on how to address this (biggest)
source of errors in your system in future work.
(And, for example, the manual alignment of SimpleQuestions properties with lexicalizations in the SimpleQuestions questions is not elegant,
so some ideas are needed here.)
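As a rough illustration of the embedding idea (again purely my own sketch, not a proposal for the paper's actual implementation):
pretrained fastText vectors could be used to score candidate KB labels against a question term. The model file name and the
helper rank_labels below are my assumptions, not anything from the paper.

```python
# Reviewer's sketch only -- NOT the paper's system. Ranks candidate KB
# labels by embedding similarity to a question term, as one conceivable
# way to narrow the lexical gap. Assumes the Wikipedia-trained English
# model (wiki.en.bin) has already been downloaded.
import numpy as np
import fasttext

model = fasttext.load_model("wiki.en.bin")

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def rank_labels(question_term, kb_labels):
    """Order candidate KB labels by similarity to a query term."""
    qv = model.get_word_vector(question_term)
    scored = [(label, cosine(qv, model.get_sentence_vector(label)))
              for label in kb_labels]
    return sorted(scored, key=lambda s: s[1], reverse=True)

# rank_labels("spouse", ["wife", "birth place", "married to"]) should
# typically place "wife" and "married to" above "birth place".
```

One appeal of fastText in this setting is its subword model: it returns a vector even for labels and query terms never seen
during training, which matters for the long tail of KB labels.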
- You state that language/KB-agnostic QA has been "poorly" addressed so far.
Since this point is important to the paper, you should describe more clearly how exactly other systems have tried to address language/KB-agnostic QA
(and contrast them with your system where necessary and useful). If I missed this while reading, please point me to it.
- On p2, you state "our approach can be directly used by end-users." How? Does "end-user" refer to people asking QA questions, or to someone
applying your approach to their own LD dataset? Also, please add, for example, the GitHub URL of WDaqua.
- Style: The writing style is in general sometimes a bit too casual for my taste. For example, the Abstract starts with
"*Thanks* to the development of the Semantic Web ..."
and the Introduction section starts with:
"Question answering (QA) is an *old* research field ... "
IMHO, parts like these should be formulated in a more formal way, but I leave this to the other reviewers/the editor to decide.
Also, I think that in cases like the last sentence of the abstract,
"To summarize, we provided a conceptional solution for multilingual, KB-agnostic Question Answering over the Semantic Web.",
the use of the present tense ("we provide") would be more appropriate.
- [Optional:] If you think this question can be answered and is helpful to the reader: discuss how much effort is to be expected when integrating
this search functionality, e.g., into someone's local DBpedia/Wikidata endpoint or a tool that works with these datasets.
Typos:
p6 "create new training dataset" -> "create new training datasets"
p10 "multiple-languages" --> "multiple languages"
p10 "perform worst" --> "perform worse" ?
p12 "4 core of Intel .." --> "4 cores of Intel .."
p13 "an unified interface" --> "a unified interface"