Review Comment:
This paper presents a survey of knowledge-based computer vision, focusing on video analysis with prior knowledge. By giving a detailed introduction to video analysis using Semantic Web technologies, the authors intend to give researchers in the computer vision community useful information on exploiting prior knowledge expressed in Semantic Web form. The paper starts with a quick overview of Semantic Web ontology languages and of work on symbolic reasoning in the Semantic Web field. Subsequent sections discuss many relevant applications of the Semantic Web and the challenges in the field of video analysis.
The paper provides a survey of the use of ontologies for video analysis. However, the paper is not well organized, and the analysis of the relevant works lacks comparison, among other issues. Below, I give some suggestions on how to revise the paper.
#1 In the first section, the authors classify video analysis into three levels (low-level, mid-level and high-level) according to the depth of the analysis of the video, but Section 4 groups the surveyed works into four categories. Which level should “Video retrieval applications” and “Multimedia visual content annotation” be classified into? This is an inconsistency in the organization of the paper.
#2 This paper focuses only on the application of SW in video analysis. It would be interesting to look back toward the beginning and see which of the original ideas have blossomed in the computer vision field as a whole, rather than confining the survey to video analysis.
#3 The authors claim that “this is the first work focusing on semantic web technologies applied to video analysis problems”. However, there are related works that already survey this field, covering surveillance analysis and multimedia retrieval [1,2]. The authors should therefore revise this claim.
#4 Section 2 gives an overview of the Semantic Web. It would help readers if the authors provided references to relevant papers on applying SW technologies to computer vision.
#5 The paper should summarize how reasoning and ontologies are used in video analysis, for example with a flowchart, which gives a highly visual and easily understood representation of a system's flow of logic. For each method introduced, the authors should discuss in more detail how ontologies and reasoning are used in computer vision; some examples would be good here.
#6 A question to the authors: what is the main difference between image retrieval and video retrieval? If the authors consider video retrieval to be a conceptual extension of image retrieval into the video domain, then I think many knowledge-based works from traditional image fields should be considered in this survey.
#7 For some of the reviewed papers, it is not clear whether the proposed method performs better than the alternatives. Each class of relevant papers should come with a detailed comparison showing the motivation, the technology, the results and so on. A classification by usage type, aim, and purpose would be very helpful here.
#8 A fair comparison, under similar circumstances, with traditional methods, e.g., [7], is virtually absent. Giving a comparison with these traditional works would help to verify the effectiveness of the knowledge-based methods.
#9 Some references in the references section are incomplete. Some works are not mentioned in this paper, e.g., [3-9].
Overall, I think the authors should revise the paper to provide a comprehensive overview of knowledge-based video analysis.
1. P. Kannan, P. Shanthi Bala, and G. Aghila. A comparative study of multimedia retrieval using ontology for semantic web. In Advances in Engineering, Science and Management (ICAESM), 2012 International Conference on, pages 400–405, March 2012.
2. Chris Poppe, Gaëtan Martens, Pieterjan De Potter, and Rik Van De Walle. Semantic web technologies for video surveillance metadata. Multimedia Tools Appl., 56(3):439–467, February 2012.
3. G.C. Stein, J. Rittscher, A. Hoogs, "Enabling video annotation using a semantic database extended with visual knowledge", Multimedia and Expo 2003. ICME '03. Proceedings. 2003 International Conference on, vol. 1, pp. I-161, 2003.
4. Cees G. M. Snoek, Bouke Huurnink, Laura Hollink, Maarten de Rijke, Guus Schreiber, Marcel Worring, "Adding Semantics to Detectors for Video Retrieval", Multimedia IEEE Transactions on, vol. 9, pp. 975-986, 2007, ISSN 1520-9210.
5. A. Hoogs, R. Collins, "Object Boundary Detection in Images using a Semantic Ontology", Computer Vision and Pattern Recognition Workshop 2006. CVPRW '06. Conference on, pp. 111-111, 2006.
6. D. Wang, D. Song, "Video Captioning with Semantic Information from the Knowledge Base", Big Knowledge (ICBK), 2017 IEEE International Conference on, pp. 224-229, 2017.
7. Rohrbach, Marcus. "Attributes as semantic units between natural language and visual recognition." Visual Attributes. Springer, Cham, 2017. 301-330.
8. Xiao-Yong Wei, Chong-Wah Ngo, Yu-Gang Jiang, "Selection of Concept Detectors for Video Search by Ontology-Enriched Semantic Spaces", Multimedia IEEE Transactions on, vol. 10, pp. 1085-1096, 2008, ISSN 1520-9210.
9. Y. Yu, H. Ko, J. Choi, et al., "End-to-end concept word detection for video captioning, retrieval, and question answering", Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp. 3261-3269, 2017.