Review Comment:
Summary of content:
This paper presents an open source framework for real-time semantic social media analysis. The framework is highly scalable and can be used for both off-line and live processing of social media data. It consists of several components, including semantic annotation, semantic search, dynamic result aggregation and visualization. The complete framework was applied to two different use cases, and it provides an information visualization interface that shows co-occurrence matrices, term clouds, tree-maps and choropleths.
#############
Review dimensions:
- Quality: This paper is of good quality. The authors start with a description of the framework and its components, and then explain the practical use of the tool through two case studies.
- Importance: The tool presented in this paper is important, since social media is a huge information source on which many real-life applications could rely.
- Impact of the tool: The tool is based on GATE, a widely used, open source framework for Natural Language Processing (NLP). It can perform all the steps in the analytics process, including collection, semantic annotation, indexing, search and visualization. Social media content has a distinctive nature: it is fast-growing, highly dynamic and high-volume, and it reflects both the ever-changing language used in today's society and the current societal views and sentiment fluctuations of its authors. As a result, existing NLP tools are limited in how well they handle social media data. This tool provides numerous components, either designed specifically for social media analysis or adapted from previously developed general-purpose tools, and integrates them into a single framework.
- Clarity, Illustration & Readability: This paper is well written and easy to follow.
#############
Overall assessment:
Overall, the paper's main contribution is that it showcases the various components of such a large framework with clear descriptions and citations. Most of the presented work can be reproduced with minimal effort and applied to real-world problems, which demonstrates its usefulness. The article is well written and covers almost all the important components of the framework with sufficient citations and examples. It can be accepted with minor revisions if the suggestions listed below are addressed.
#############
Brief comments and suggestions by section:
- Section 1: Although the paper is framed as general "social media analysis", this section highlights only one social media platform (Twitter). Readers would find it useful if the section also discussed problems from other social media platforms (e.g. Facebook, blogs, Reddit) and explained how such a framework could be leveraged to address them.
- Section 2: (1) A link to Twitter Hosebird should be added. (2) Isn't 50 tweets/second trivial for real-time analysis? (3) Is batch processing of tweets not supported? (4) It would help the reader if an example of Mimir columns were presented.
- Section 3.2: Showing a query beyond simple textual queries would be interesting.
- Section 6: It would be valuable to include an evaluation or user study of the framework as a whole for the two discussed scenarios: the analysis of the 2015 UK general election and the investigation of attitudes towards climate change. The results would also be easier to read if they were presented in a table instead of being embedded directly in the text.
- Section 7: For a paper on tools and systems, a comparison between the described framework and other tools for social media content is expected. Because many social media monitoring tools are available, the paper should clarify how the presented tool differs from them.
#############
General issues:
- Most of the links to toolkits are embedded in the text. The paper would be easier to read if these links were presented as footnotes.
- A running example tweet should be used throughout, so that readers can see how each component of the framework is applied to it.