Review Comment:
The article presents an overview of approaches at the intersection of two fields: the Semantic Web (SW) and Human Computation (HC). It analyses the approaches with respect to two questions: 1) how HC influences the SW, and 2) how the SW influences HC. The problem itself is very interesting and timely for both fields. The survey has the potential to be a good introduction to the intersection of those areas, and it is very relevant to the Special Issue.
The authors provide good motivation and background information on the problem in Sections 1 and 2. The list of approaches leveraging HC for the SW is comprehensive, and they are classified along multiple dimensions (task, genre, genome, etc.) in Sections 3 and 4. Next, the authors compare five of the systems in more detail in Section 5. Section 6 aims to present the Semantic Web approaches facilitating Human Computation; this section is a bit vague and should be extended. In Section 7, the authors attempt to highlight the prospective challenges and research questions.
While I believe the article can provide a very good introductory text for new researchers at the intersection of the two fields, there is still a lot of work to be done (details below) before this paper is ready for publication. Overall, the article is well structured at the top level, with some pitfalls at the lower levels (some sections are very short; details below). It is also very hard to digest: the sentences are quite long and the ideas are hard to follow, so readability and clarity should be improved. There are many missing references, as well as statements and assumptions without any support such as a reference or an example (details below). In addition, assumptions made about the background of the reader need to be addressed; the reader is expected to know the surveyed approaches (details below). Moreover, some editorial work is also needed, as there are typos, inconsistent capitalization of names, and naming inconsistencies (details below). The list of references also contains a lot of incorrect characters. From my point of view, a major revision is required.
To improve the article, the authors can follow the detailed comments below, and should also look for similar issues throughout the article; because many of the pitfalls are repeated, I did not list all of them here. The authors could also try to make the style lighter, so that an early PhD student does not get discouraged reading the survey.
Strong points:
S1) Has great potential to become a good introductory text on the covered topic.
S2) The survey appears to be quite comprehensive and suitable for the special issue.
S3) The approaches are classified along multiple dimensions.
Weak points:
W1) Very hard to digest; convoluted sentences.
W2) Many missing references and statements without any support.
W3) Very demanding for the reader, who has to know the surveyed approaches to understand the points made by the authors.
W4) Some subsubsections are very short (one sentence, 3-5 lines). Section 6 is vague.
W5) Many typos.
Detailed comments
missing references, statements and assumptions without any support
Section 1.1
- Semantic technologies have been deployed in the context of a wide range of information management tasks, for which machine driven algorithmic techniques aiming at full automation do not reach a level of accuracy and reliability to ensure usable systems.
-- A source (reference) of this information is needed.
- Researchers have started augmenting automatic techniques with human computation capabilities in an effort to solve the inherent problems.
-- A source (reference) of this information is needed. Some examples would make it more credible.
- The challenge for the semantic web community, is to rethink the original semantic web vision, which was largely built on the vision of computers populating the web of machines[10].
-- It is unclear whether the SW vision comes from the paper by A. Bernstein [10], or whether a reference to the original vision [2] by Tim Berners-Lee, James Hendler, and Ora Lassila is missing.
- The entrance barrier for many semantic applications is said to be high, given the dependence on expertise in knowledge engineering, logics and more. In short, semantic web lacks the sufficient user involvement in various aspects.
-- A source (reference) of this information is needed.
- Semantic web research can be seen as experiencing a shift from increasingly expert driven to one embracing the larger community and the users involved in the semantic content creation process.
-- A source (reference) of this information is needed.
- Two major genres of research may be seen emerging in the last few years, in an attempt to bring human computation methods to the semantic web:
-- A source (reference) of this information is needed.
- While the potential is clearly evident in going about such a synergy, effectively realizing the synergy of semantic web and human compution will bear its own set of challenges.
-- A reference supporting the statement is needed.
Section 1.2
- As the primary focus, we analyze how the semantic web domain has adopted the dimensions of human computation to solve the inherent problems.
-- What kind of “inherent problems”? Reference, explanation, and examples are needed.
- the two most common genres in human computation namely Games With A Purpose (GWAP) and Micro-Task Crowdsourcing
-- References and explanations for the terms are needed. The explanations can as well be provided in Section 1.1.
- Recent research in crowdsourcing and semantic web has also seen the emergence of some workflow systems designed to meet the need of providing a generic framework for automating human-machine computation workflows.
-- Reference is needed. Which “recent research”?
Section 2.11
- The problems fit the general paradigm of computation
-- Reference is needed. What is “the general paradigm of computation”?
Section 2.2
- Tim Berners-Lee envisioned a ’semantic web’
-- Reference is needed. [2]
- Tremendous amount of data is published on the Web according to the linked data principles.
-- More recent works focusing on such statistics are available [5, 7, 8].
Section 2.3.1
- A variety of tools are available
-- References and examples are needed.
- The notion of achieving an automated process of ontology evaluation generic enough to be applied across domains is hardly feasible
-- A reference supporting the statement is needed.
Section 2.3.2
- However the seamless consumption and integration of linked open data is challenged by the several quality issues and problems that the linked data paradigm is facing. As researchers remark, many of these quality issues are not possible to be fixed automatically rather, require manual human effort.
-- A reference supporting the statement is needed. Some specific examples would also be helpful.
- the LOD tends to emphasize the relationships and links between the entities, rather than classification of entities
-- A reference supporting the statement is needed.
Section 2.4
- After more than a decade of semantic web research, researchers remain challenged by the large scale adoption of the semantic technologies.
-- A reference supporting the statement is needed.
- content cannot be created automatically but requires to a significant degree, human contribution
-- A reference supporting the statement is needed.
- Research clearly indicates that combining human computation and semantic web is of mutual benefit to both domains
-- A reference supporting the statement is needed. Which research clearly indicates that? Why is it so clear?
Section 3.2.1
- Often, a common practice to allow assignments of the same task to multiple workers. Therefore the results may be aggregated using majority voting or other sophisticated techniques such as a probability distribution or by taking into account some estimate of the expertise and skills of the works.
-- Some references and examples are needed.
Section 3.1.3
- There are tradeoffs to both approaches.
-- What kind of tradeoffs? Who described them? Not specific enough. References are needed.
assumptions made about the background of the reader
Section 1.1
- 1) Mechanized Labour and 2) Games with a Purpose for the Semantic Web.
-- Assumption that the reader is familiar with those terms. A definition and a reference are needed.
Section 1.2
- collective intelligence genome
-- A definition and a reference are needed. The reference is provided too late.
Sections 3, 4, and 5
The authors talk about the surveyed approaches, but they assume the reader knows all of them. The reader has to read and understand all of the papers presented in the survey, before actually reading the survey. It would be helpful to have them briefly described in the article. Some level of technical details of the approaches could be provided.
Section 3.2.1
- PROTON ontology
-- Assumption that the reader is familiar with the ontology. A reference and explanation are needed.
Section 5.2.4
- The specific research questions that the authors attempt to address
-- Assumption that the reader knows the “research questions”. Details needed.
Section 5.2.5
- CrowdMap relies on CrowdFlower aggregation methods and uses Precision and Recall measures for evaluating the results.
-- Hard to understand without knowing the presented work. Details needed.
- Qualification questions were employed however according to [70] these did not affect the results much.
-- What kind of “qualification questions”? Again, without knowing the cited paper [70], this is difficult to understand.
typos, capitalization of names issues, naming inconsistency
Section 1.1
- computers or machines
-- What is the difference, how do you define the terms?
- semantic Web
-- The authors use several ways to capitalize the name in the article: “semantic Web”, “semantic web”, “Semantic Web”.
On the first page, right column, a period has been moved onto a new line.
- ’The Global Brain Semantic Web’
-- Incorrect quotation marks ([1], page 65). All quotation marks in the article are incorrect.
- and the fields of the like such
-- Probably a missing word.
- compution
-- “computation” (?)
Section 2.1.2
- Footnotes 1, 2, and 3 point to the same webpage, which seems to be a mistake.
Section 2.2.2
The term Linked Open Data seems to be used interchangeably with Linked Data, although the two are not exactly the same [2, 3, 4, 5, 6].
Section 2.3.1
- three key stages of Ontology Engineering stages described
-- One of the “stages” is probably redundant.
- Semantic Annotation Automation
-- a missing semicolon
Section 2.3.2
- doesnot
-- does not
Section 2.4
- useful semantic content as this content cannot
-- One occurrence of “content” is probably redundant.
- but requires to a significant degree, human contribution
-- significant degree of human (?)
Section 5.2.2
- illustrate some 14 distinct annotation
-- It looks like a typo, either “some” or “14”.
Section 5.2.3
- restrict ourselves to to combine
-- redundant “to”
- Dealing with Motivational, Cognitive and Error Diversity: Because people are involved
-- looks like a typo or some missing/redundant words
other
Some subsections are very short; it might be worth either expanding or merging them. Sections: 2.11; 7.1.13; 7.2.1; 7.2.2; 7.2.5
Section 3.2.1
Some of the elements are described in a very vague way: Verification of Domain Relevance, Annotation of Text and Multimedia, Annotation of Web Content, Domain Specific Vocabulary and Relation Building
- It is obvious from Table 1 and Figure 5
-- Why is it obvious? The statement is quite strong, yet not specific enough.
Section 3.2.2
All of the elements described in the section could be more detailed. They are difficult to understand without knowing the previous works.
Section 4.3
- oftentimes
-- “The adverb oftentimes is an unnecessary variant of often. While using it is not an error, exactly, the word always bears replacement with the shorter word.” [9] There are more words like this one in the article.
Section 5.2.1
- some automated decision making is applied using probabilistic models to reduce the candidate mappings that need verification from the crowd
-- What kind of models? What “decision making is applied”? It is not specific enough.
Section 5.2.4
Some parts are vague: ZendCrowd, CrowdLink, CrowdTruth.
Section 7.3
- The interleaving of human, machine, and semantics even have the potential to overcome some of the issues currently surrounding Big Data.
-- What kind of issues? Details needed.
Tables and Figures
The titles of all tables and figures could be more descriptive (self-contained); it is hard to understand them without analyzing the text of the article.
References
Strange characters in the references. Some titles have incorrect capitalization.
References
[1] Zobel, Justin. Writing for computer science. Vol. 8. New York NY: Springer, 2004.
[2] Berners-Lee, Tim, James Hendler, and Ora Lassila. "The semantic web." Scientific American 284.5 (2001): 28-37.
[3] Bizer, Christian, Tom Heath, and Tim Berners-Lee. "Linked data-the story so far." Semantic Services, Interoperability and Web Applications: Emerging Concepts (2009): 205-227.
[4] Heath, Tom, and Christian Bizer. "Linked data: Evolving the web into a global data space." Synthesis lectures on the semantic web: theory and technology 1.1 (2011): 1-136.
[5] Bizer, Chris, Anja Jentzsch, and Richard Cyganiak. "State of the LOD Cloud." Version 0.3 (September 2011) 1803 (2011).
[6] http://lod-cloud.net/
[7] Auer, Sören, et al. "LODStats–an extensible framework for high-performance dataset analytics." Knowledge Engineering and Knowledge Management. Springer Berlin Heidelberg, 2012. 353-362.
[8] Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the linked data best practices in different topical domains." The Semantic Web–ISWC 2014. Springer International Publishing, 2014. 245-260.
[9] http://grammarist.com/usage/oftentimes/
[10] Bernstein, Abraham. "The Global Brain Semantic Web–Interleaving Human-Machine Knowledge and Computation." International Semantic Web Conference. 2012.