Distributional methods for extracting common sense knowledge by ranking triples according to prototypicality

Tracking #: 1713-2925

This paper is currently under review
Soufian Jebbara
Valerio Basile
Elena Cabrio
Philipp Cimiano

Responsible editor: 
Guest Editors ML4KBG 2016

Submission type: 
Full Paper
In this paper we are concerned with developing information extraction models that support the extraction of common sense knowledge from unstructured datasets. Our motivation is to extract manipulation-relevant knowledge that can support robots' action planning. We frame the task as a relation extraction task and, as a proof of concept, validate our method on the extraction of two types of relations: locative and instrumental relations. The locative relation relates objects to the prototypical places where they are found or stored; the instrumental relation relates objects to their prototypical purpose of use. While we extract these relations from text, our goal is not to extract specific mentions but rather, given an object as input, to produce lists of locations and uses ranked by prototypicality.

We use distributional methods in embedding space, relying on the well-known skip-gram model to embed words into a low-dimensional distributional space and using cosine similarity to rank the candidates. In addition to embeddings computed with the skip-gram model, we present experiments that rely on the so-called NASARI vectors, which are computed over disambiguated concepts and are thus semantically aware. While this distributional approach has been published before, we extend our framework with additional methods relying on neural networks that learn a score judging whether a given candidate pair actually expresses the desired relation. The network thus learns a scoring function in a supervised fashion: although we use a ranking-based evaluation, the supervised model is trained on a binary classification task. The resulting neural network score and, in the case of the distributional approach, the cosine similarity are both used to compute a ranking. We compare the different approaches and their parameterizations on the task of extracting the above-mentioned relations.
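The cosine-based ranking described in the abstract can be sketched as follows. The word list and the 4-dimensional vectors below are toy stand-ins for skip-gram (or NASARI) embeddings, not values from the paper's trained models:

```python
import numpy as np

# Toy embedding table standing in for skip-gram vectors; the words and
# 4-d vectors are illustrative only, not taken from the paper's model.
embeddings = {
    "knife":    np.array([0.9, 0.1, 0.0, 0.2]),
    "kitchen":  np.array([0.8, 0.2, 0.1, 0.1]),
    "garage":   np.array([0.1, 0.9, 0.3, 0.0]),
    "bathroom": np.array([0.3, 0.4, 0.8, 0.1]),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_candidates(obj, candidates, emb):
    """Rank candidate fillers by cosine similarity to the object vector,
    most prototypical first."""
    scores = [(c, cosine(emb[obj], emb[c])) for c in candidates]
    return sorted(scores, key=lambda x: x[1], reverse=True)

ranking = rank_candidates("knife", ["kitchen", "garage", "bathroom"], embeddings)
print([c for c, _ in ranking])  # → ['kitchen', 'bathroom', 'garage']
```

With real skip-gram vectors the same interface applies unchanged; only the embedding table is swapped for the trained model's lookup.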
We show that the distributional similarity approach performs very well on this task. The best-performing parameterization achieves an NDCG of 0.913, a Precision@1 of 0.400 and a Precision@3 of 0.423. The supervised learning approach, despite having been trained on positive and negative examples of the relation in question, does not perform as well as expected, achieving an NDCG of 0.908, a Precision@1 of 0.454 and a Precision@3 of 0.387.
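The supervised alternative described in the abstract can be sketched as a small pair-scoring network whose sigmoid output doubles as the ranking score. The architecture below (one ReLU hidden layer over concatenated object and candidate embeddings) and its randomly initialized weights are hypothetical stand-ins; the paper trains such a scorer on positive and negative example pairs with a binary classification objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scorer: concatenated (object, candidate) embeddings -> (0, 1).
# Random weights stand in for parameters learned from labeled pairs.
DIM = 4
W1 = rng.normal(scale=0.1, size=(8, 2 * DIM))
b1 = np.zeros(8)
w2 = rng.normal(scale=0.1, size=8)
b2 = 0.0

def score_pair(obj_vec, cand_vec):
    """Score how likely the (object, candidate) pair expresses the relation."""
    h = np.maximum(0.0, W1 @ np.concatenate([obj_vec, cand_vec]) + b1)  # ReLU
    return float(1.0 / (1.0 + np.exp(-(w2 @ h + b2))))                  # sigmoid

# The score induces a ranking over candidates, mirroring the cosine setup.
obj = rng.normal(size=DIM)
cands = {name: rng.normal(size=DIM) for name in ["kitchen", "garage", "bathroom"]}
ranking = sorted(cands, key=lambda c: score_pair(obj, cands[c]), reverse=True)
```

Because the output lies in (0, 1), the same ranking-based evaluation (NDCG, Precision@k) can be applied to both the supervised and the cosine-based scores.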