Vecsigrafo: Corpus-based Word-Concept Embeddings - Bridging the Statistic-Symbolic Representational Gap in Natural Language Processing

Tracking #: 2074-3287

This paper is currently under review
José Manuel Gómez-Pérez
Ronald Denaux

Responsible editor: 
Guest Editors Semantic Deep Learning 2018

Submission type: 
Full Paper
The proliferation of knowledge graphs and recent advances in Artificial Intelligence have raised great expectations about the combination of symbolic and distributional semantics in cognitive tasks. This is particularly the case for knowledge-based approaches to natural language processing, since near-human symbolic understanding relies on expressive, structured knowledge representations. Engineered by humans, such knowledge graphs are frequently well curated and of high quality, but their construction can also be labor-intensive, brittle, or biased. The work reported in this paper aims to address these limitations by bringing together bottom-up, corpus-based knowledge and top-down, structured knowledge graphs, capturing the semantics of both words and concepts from large document corpora as embeddings in a joint space. To evaluate our results, we perform the largest and most comprehensive empirical study on this topic that we are aware of, analyzing and comparing the quality of the resulting embeddings against competing approaches. We include a detailed ablation study of the different strategies and components our approach comprises, and show that our method outperforms the previous state of the art on standard benchmarks.