A Supervised Machine Learning Approach for Events Extraction out of Arabic Tweets

Tracking #: 1545-2757

This paper is currently under review
Mohammad ALSmadi
Omar Qawasmeh

Responsible editor: Guest Editors ML4KBG 2016

Submission type: Full Paper
Tweets hold a rich amount of information about daily events; however, they are noisy, personalized text that is challenging for machines to understand. Therefore, this research proposes a state-of-the-art supervised machine learning approach for extracting events from Arabic tweets. The proposed approach addresses four research tasks: Task 1 (T1), event trigger extraction; Task 2 (T2), event time expression extraction; Task 3 (T3), event type identification; and Task 4 (T4), temporal resolution for ontology population. The event arguments extracted by these tasks are used to populate an event ontology designed for this purpose. This ontology feeds a visualization tool (e.g., a calendar) that displays the extracted events live. The proposed approach was evaluated on a dataset of 2k Arabic tweets, and its performance was compared to that of an unsupervised rule-based approach from previous work on the same dataset. Results show that the proposed approach outperforms the unsupervised rule-based one in T1, event trigger extraction (F-1 = 92.6 vs. F-1 = 78.7), and in T2, event time expression extraction (F-1 = 92.8 vs. F-1 = 88.35), whereas it performs worse in T3, event type identification (accuracy = 80.1 vs. accuracy = 95.9).
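For readers unfamiliar with the F-1 measure reported for T1 and T2, the following is a minimal sketch of how such a score can be computed from gold and predicted annotations. It assumes exact matching on (tweet id, token index) pairs; the abstract does not state the matching criterion the authors used, and the function name `f1_score` and the toy data are hypothetical.

```python
def f1_score(gold: set, predicted: set) -> float:
    """Harmonic mean of precision and recall over exactly matching items."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Also covers empty gold or predicted sets, avoiding division by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations: (tweet_id, token_index) of event triggers.
gold = {(1, 3), (2, 0), (3, 5), (4, 2)}
predicted = {(1, 3), (2, 0), (3, 4)}

score = f1_score(gold, predicted)
print(f"F-1 = {score:.3f}")
```

The same function applies to time expressions if each annotation is represented as a hashable item, for example a (tweet id, start index, end index) triple.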
Full PDF Version: Under Review