A linguistic method for extracting production rules from scientific texts: Evaluation in the specialty of gynecology

Tracking #: 1782-2995

This paper is currently under review
Authors: Amina Boufrida, Zizette Boufaida

Responsible editor: Philipp Cimiano

Submission type: Full Paper
Due to the considerable increase in freely available data (especially on the Web), extracting relevant information from textual content is a critical challenge. Most of the available data is embedded in unstructured text and is not linked to formalized knowledge structures such as ontologies or rules. A potential solution is to acquire such knowledge using natural language processing (NLP) tools and text mining techniques. Prior work has focused on the automatic extraction of ontologies from texts, but the acquired knowledge is generally limited to simple hierarchies of terms. This paper presents a versatile framework for acquiring complex relationships from texts and encoding them as rules. Our approach starts from existing domain knowledge represented as an OWL ontology and applies NLP tools and text matching techniques to identify the atoms of a rule, such as classes and properties, and to capture deductive knowledge in the form of new rules. We evaluated the approach in the medical field, specifically in the specialty of gynecology, and the results show that it can automatically and accurately generate SWRL rules that represent the more formal knowledge needed for reasoning.
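To make the idea of mapping a sentence onto rule atoms concrete, the following is a minimal sketch in Python. It is not the authors' pipeline: it assumes a tiny hand-written lexicon in place of an OWL ontology, and a single "if ... then ..." regular expression in place of real NLP tools. The names `CLASS_TERMS`, `PROPERTY_TERMS`, `find_term`, and `sentence_to_swrl`, as well as the class and property names, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicon standing in for an OWL ontology:
# surface terms mapped to class and object-property names.
CLASS_TERMS = {"patient": "Patient", "fever": "Fever", "febrile case": "FebrileCase"}
PROPERTY_TERMS = {"has": "hasSymptom"}

# Assumed sentence shape: "If <condition>, then <conclusion>."
RULE_PATTERN = re.compile(r"^\s*if\s+(.+?),?\s+then\s+(.+?)\.?\s*$", re.IGNORECASE)


def find_term(text, lexicon):
    """Return (start, end, name) for the longest lexicon term found in text, else None."""
    for term in sorted(lexicon, key=len, reverse=True):
        match = re.search(r"\b" + re.escape(term) + r"\b", text)
        if match:
            return match.start(), match.end(), lexicon[term]
    return None


def sentence_to_swrl(sentence):
    """Build a SWRL-style rule string from one conditional sentence, or return None."""
    match = RULE_PATTERN.match(sentence.lower())
    if match is None:
        return None
    condition, conclusion = match.group(1), match.group(2)

    # The property phrase splits the condition into a subject side and an object side.
    prop = find_term(condition, PROPERTY_TERMS)
    if prop is None:
        return None
    start, end, prop_name = prop
    subject = find_term(condition[:start], CLASS_TERMS)
    obj = find_term(condition[end:], CLASS_TERMS)

    # Heuristic: the longest class term in the conclusion is the class to infer.
    target = find_term(conclusion, CLASS_TERMS)
    if subject is None or obj is None or target is None:
        return None

    body = f"{subject[2]}(?x) ^ {prop_name}(?x, ?y) ^ {obj[2]}(?y)"
    head = f"{target[2]}(?x)"
    return f"{body} -> {head}"


print(sentence_to_swrl("If a patient has fever, then the patient is a febrile case."))
```

Every variable in the rule head also appears in the body, which keeps the generated string in the shape SWRL expects. A realistic system would replace the regular expression with syntactic analysis and load the lexicon from the ontology itself.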