Review Comment:
This manuscript presents a meta-analysis of neuro-symbolic methods in NLP, with the goal of determining whether, and to what extent, such methods fulfill their intended goals. The goals discussed are out-of-distribution generalization, interpretability, a reduced amount of training data, transferability to new tasks, and reasoning. The article analyzes the papers along several dimensions, including the task, type of learning, format of symbolic knowledge, combination method, etc. The conclusion is that there is little work on neuro-symbolic methods in NLP, most of which incorporates symbolic knowledge in a shallow way (i.e., embedding it into the network), and that most methods don’t deliver results on these five goals. However, the few models that combine the neural and symbolic components in more interconnected ways perform better on most goals.
This was a very interesting read, and I appreciate the formalization of terminology in neuro-symbolic NLP. The findings were along the lines of what I intuitively expected, so it’s nice that this manuscript provides empirical evidence for this intuition. The main limitations of this paper are the small number of papers analyzed and the fact that the data is sometimes presented as is, without deeper analysis of the findings (see details below).
Dimensions of survey papers:
(1) Suitability as an introductory text - the manuscript was easy for me to read, but for the sake of a reader who knows little about neuro-symbolic methods, it may be worth examining 2-3 papers from the analysis in depth, including a figure showing the task, symbolic knowledge, neural component, and combination type.
(2) How comprehensive and how balanced is the presentation and coverage - unfortunately, a small number of papers was analyzed. I don’t know whether this is because the selection criteria were too restrictive or because there is indeed very little work on neuro-symbolic NLP; I believe it’s mostly the former.
(3) Readability and clarity of the presentation - overall very good. My only comment is that some of the graphs and tables provide raw data that the text doesn’t analyze in depth. It would be good to add a few sentences even when a particular experiment yields no signal or findings.
(4) Importance of the covered material to the broader Semantic Web community - neuro-symbolic NLP may have implications for reasoning over ontologies and knowledge graphs and information retrieval across the web.
(5) Supplementary code and data - looks complete and organized.
Specific comments:
1. Page 2, line 34 - it may be worth mentioning that Kahneman himself sees this as a misunderstanding of the systems, as he explained in the Montreal AI Debate 2020.
2. Page 4, reduced size of training data - the distinction between pre-training data and fine-tuning data is important (even if the argument holds for both).
3. Section 2 - please include the list of venues in the appendix. In particular, I looked at the supplementary code and couldn’t find many mentions of ACL. I was surprised by the ratio of conferences to journals, since most NLP work is published in conferences.
4. Section 3.1.1 - why is sentiment analysis separate from text classification? Isn’t it a form of text classification?
5. Section 3.1.1 - isn’t KB completion by definition neuro-symbolic (assuming the model is neural)? I.e., the training data is symbolic.
6. Section 3.1.1 - if I understand correctly, linguistic structure is considered a symbolic element. Did you search for specific terms to find works incorporating linguistic structure? I think it would be interesting to elaborate on that line of work, as opposed to most (I assume) other works that rely on knowledge graphs etc.
7. Section 3.1.1 - in the last paragraph, the discussion about supervised neural models gaming tasks should include citations to relevant papers, for example, on image captioning [Szegedy et al., 2015], visual question answering [Agrawal et al., 2016], reading comprehension [Jia and Liang, 2017], and natural language inference [Poliak et al., 2018; Gururangan et al., 2018].
8. Page 15, line 26 - what about the transformer architecture?
Minor comments and typos:
- Page 2, line 44 - a lot intuitive sense -> a lot of intuitive sense
- Page 3, line 21 - missing word “hand”
- Page 6, line 48 - one -> uni
- Fig 6 (a) is missing
- Page 9, line 44 - one-to-one -> many-to-one
- Table 4 appears too early, a few pages before it is referred to
- Figure 18 - are the columns the proposed terms?
- Page 18, line 14 - discussed Section -> discussed in Section
- Page 18, line 19 - missing closing brackets
- Page 23, line 11 - agree -> agree on
References:
[1] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
[2] Aishwarya Agrawal, Dhruv Batra, and Devi Parikh. Analyzing the behavior of visual question answering models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1955–1960, Austin, Texas, November 2016. Association for Computational Linguistics. doi: 10.18653/v1/D16-1203. URL https://aclanthology.org/D16-1203.
[3] Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031, Copenhagen, Denmark, September 2017. Association for Computational Linguistics. doi: 10.18653/v1/D17-1215. URL https://aclanthology.org/D17-1215.
[4] Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. Hypothesis only baselines in natural language inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/S18-2023. URL https://aclanthology.org/S18-2023.
[5] Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-2017. URL https://aclanthology.org/N18-2017.