Assessing deep learning for query expansion in domain-specific Arabic information retrieval

Tracking #: 1871-3084

This paper is currently under review
Wiem Lahbib
Ibrahim Bounhas
Yahya Slimani

Responsible editor: 
Guest Editors Semantic Deep Learning 2018

Submission type: 
Full Paper
In information retrieval (IR), user queries are generally imprecise and incomplete, which is especially challenging for complex languages such as Arabic. IR systems are limited by the term mismatch phenomenon, since they rely on models that compute relevance scores from exact matching between documents and queries. In this article, we propose integrating domain terminologies into the Query Expansion (QE) process in order to improve Arabic IR results. To this end, we investigate different semantic similarity models: word embeddings, Latent Semantic Analysis (LSA), and a probabilistic graph-based model. To evaluate our approaches, we conduct multiple experimental scenarios. All experiments are performed on a test collection called Kunuz, whose documents are organized into several domains. This allows us to assess the impact of domain knowledge on QE. According to multiple state-of-the-art evaluation metrics, the results show that incorporating domain terminologies into the QE process outperforms the same process without terminologies. The results also show that deep learning-based QE enhances recall.