FarsBase: The Persian Knowledge Graph

Tracking #: 2096-3309

Majid Asgari Bidhendi
Ali Hadian
Behrouz Minaei-Bidgoli

Responsible editor: 
Guest Editors Knowledge Graphs 2018

Submission type: 
Full Paper
Over the last decade, extensive research has been done on the automatic construction of knowledge graphs from Web resources, resulting in a number of large-scale knowledge graphs such as YAGO, DBpedia, BabelNet, and Wikidata. Although some of these knowledge graphs are multilingual, they contain little or no linked data in Persian and do not provide tools for extracting knowledge from Persian information sources. FarsBase is the first Persian multi-source knowledge graph, and it is specifically designed to support Persian knowledge in semantic search engines. FarsBase uses a diverse set of hybrid and flexible techniques to extract and integrate knowledge from various sources, such as Wikipedia, Web tables, and unstructured texts. It also supports entity linking, which allows integration with other knowledge graphs. To maintain high accuracy of triples, we adopt a low-cost mechanism in which human experts verify candidate knowledge, and the candidates for human verification are prioritized using different heuristics. FarsBase is being used as the semantic-search system of a Persian search engine and efficiently answers hundreds of semantic queries per second.
Minor Revision

Solicited Reviews:
Review #1
By Mozhdeh Gheini submitted on 25/Jan/2019
Minor Revision
Review Comment:

This full paper is the revised version of an earlier manuscript in which the authors introduced FarsBase, a knowledge graph constructed in Persian.
In their revision, the authors address the concerns mentioned in the reviews to a very large extent. The quality of the related work section and of the writing has improved significantly. However, a few grammatical mistakes, as exemplified below, remain in the text:
Introduction, paragraph 2: The past decade have witnessed --> The past decade has witnessed
Page 3, right column, at the top: an alternative approaches --> an alternative approach or alternative approaches
Page 4, right column: do not worth extraction --> are not worth extraction
Hence, I'm voting for a minor revision.

Review #2
Anonymous submitted on 08/Feb/2019
Review Comment:

Thanks to the authors for the major revision. I think the paper now has more structure and more scientific language. I am also impressed by the demo: http://farsbase.net/search/html/index.html
I would recommend putting the link to the demo, and perhaps the other links (e.g., the link to the endpoint), in the abstract.
However, I strongly request that the authors do another round of language checking, since some minor issues remain, such as:

Use the correct abbreviation consistently throughout the paper (e.g., YAGO).
The tense should be simple present or simple past; you often use "will" (e.g., "FarsBase will...").
Sometimes an opening or closing parenthesis is missing.
There should be a space between the last word and the reference.
Provide a reference for the Transformer.
It would be great if you improved the quality of the tables and figures.

Review #3
Anonymous submitted on 10/Feb/2019
Minor Revision
Review Comment:

(1) originality:
The authors propose a knowledge base construction framework targeted at a specific language, Parsi (Persian). They discuss the current state-of-the-art (SOTA) tools for Parsi and the challenges associated with them.

(2) significance of the results:

Raw-text extractors (RTEs):
The authors mention that there are four kinds of RTEs.
However, only in Section 5.3.5 (Dependency patterns) do they report a yield: “Using these patterns, we extracted 240320 triples from Wikipedia articles.”
How many triples were extracted by the other types of RTEs, and how useful were they?

(3) Quality of writing:
i) Verbosity could be reduced so that readers can follow along more easily and quickly.
For instance, consider the contributions listed on pages 1-2 and the challenges on page 3: the list items could have been enumerated more succinctly.
Suggestion: in general, the first sentence of each list item should state its main argument. Avoid going into full detail there; if needed, mention the section number where additional information can be found.

ii) Coherence could be improved so that the text is less confusing for readers to follow:
For instance, consider Section 5.3 on page 10.
The authors say: “FarsBase has four modules for raw-text triple extraction, ..... In the following, we briefly describe how different RTE methods pre-process and extract triples in FarsBase.”
However, the immediately following subsections, 5.3.1 and 5.3.2, cover “Entity linking” and “Coreference Resolution”. Many readers will be confused to find entity linking there when they expect to read about rule-based extraction.
Suggestion: Sections 5.3.1 and 5.3.2 should be moved either later in the narrative or earlier, to where preprocessing is mentioned.