Towards a Question Answering System over the Semantic Web

Tracking #: 1922-3135

Authors: 
Dennis Diefenbach
Andreas Both
Kamal Singh
Pierre Maret

Responsible editor: 
Axel Polleres

Submission type: 
Full Paper
Abstract: 
With the development of the Semantic Web, a lot of new structured data has become available on the Web in the form of knowledge bases (KBs). Making this valuable data accessible and usable for end-users is one of the main goals of question answering (QA) over KBs. Most current QA systems query one KB, in one language (namely English). The existing approaches are not designed to be easily adaptable to new KBs and languages. We first introduce a new approach for translating natural language questions to SPARQL queries. It is able to query several KBs simultaneously, in different languages, and can easily be ported to other KBs and languages. In our evaluation, the impact of our approach is proven using 5 different well-known and large KBs: Wikidata, DBpedia, MusicBrainz, DBLP and Freebase as well as 5 different languages namely English, German, French, Italian and Spanish. Second, we show how we integrated our approach, to make it easily accessible by the research community and by end-users. To summarize, we provide a conceptional solution for multilingual, KB-agnostic question answering over the Semantic Web. The provided first approximation validates this concept.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Gerhard Wohlgenannt submitted on 30/Jul/2018
Suggestion:
Accept
Review Comment:

All comments of my original review have been addressed, not always to 100% satisfaction on my side, but sufficiently.
Therefore, I suggest accepting the paper.

Review #2
By Svitlana Vakulenko submitted on 06/Aug/2018
Suggestion:
Accept
Review Comment:

The authors sufficiently addressed the issues raised by the reviewers.

Review #3
By John McCrae submitted on 02/Sep/2018
Suggestion:
Minor Revision
Review Comment:

This paper presents a new system for question answering over linked data, which focuses on ease of adaptation to new languages and new datasets. This is an admirable goal; however, my reservation is that the approach does not seem greatly novel. My understanding is that the approach involves an over-generation of queries based on known lexicalizations of properties (e.g., rdfs:label) in the datasets, followed by a 5-feature ranking procedure. This seems like a good generic procedure; however, the choice not to use syntactic tools such as parsers seems to affect performance notably.

Moreover, the authors propose this as an approach that is easily adapted to new languages because it does not rely on a syntactic parser. However, this does not necessarily make the approach more adaptable to new languages and, as the authors themselves note, performance on even major languages such as Italian quickly drops due to the lack of labels for terms. In addition, the lack of syntactic analysis may make the system perform much worse on tricky questions (for example, those involving negation), as shown in Section 5.2.1. (This was more clearly presented in the first version of this paper.)

The evaluation is presented mostly based on recent QALD benchmarks, and the results are quite mixed, even as presented. In fact, for some benchmarks there is quite a difference between the existing state of the art (F=0.72) and the proposed system (F=0.52), underlining the difference from approaches that use more linguistic analysis. I was surprised that some published results on some benchmarks are omitted. For example, QALD-7 reports systems with F=0.75, three times the value reported here, so it is unclear why they are not included in Table 3. It would be best if the authors tried to include some of these existing systems in their benchmark.

(p5) "weights were determined manually": Could you expand on this and perhaps provide a more principled reason for the selection of weights?

(p7) On HDT versus traditional databases: I wonder if this could be quantified, e.g., by implementing a similar search using an SPO-style triple store.

The paper is very well written in terms of language and quite clear; there were only a few minor issues:

p5. "examplary" => "example"

Tables should remain inside the margins. In particular, Table 3 should use the full page width and not flow onto the next page. In Table 6, the text crosses over the column line.
"reefication" => "reification"