Review Comment:
When a classification algorithm has to classify an example of a class it has never seen before, this is called zero-shot learning. This paper proposes using knowledge graphs as background knowledge for such a classification system. The idea is that this background knowledge can be used to transfer information known about the classes of seen examples to the class of an unseen example.
Overall, I do like the idea of the paper. It addresses an important problem and is suitable for the special issue it was submitted to. However, in its current state I recommend a major revision. The main reason is that some important details are either missing or I was otherwise unable to understand them from the text.
Also, while reading the manuscript several questions came up that are not answered in the text, and I think the article would benefit from a discussion of them. I provide a list of issues and questions below.
Note that I had not worked with zero-shot learning prior to reviewing this paper, so there may well be aspects I misunderstand that are obvious to the authors of this work. However, since I am perhaps a more typical reader of papers like this, I urge the authors to clarify the issues below nevertheless.
Finally, the paper would benefit from a very thorough language review. Small errors in grammar and formulation affect the fluency of the paper.
Main issues
===========
You create your own Attribute Graph. It is unclear why that is needed: is DBPedia not sufficient? Are you unable to use it for some reason? How would your system work without that extra graph?
Also, would it be possible to get insight into how much the coverage of the attributes affects the performance of the overall system?
You define the zero-shot learning problem very clearly. However, it should be noted that in any real system, it is not only important how it reacts to unseen classes, but also how it reacts to the seen ones.
My point here is that if you only test on unseen classes, you give the system an extra prior: it can never confuse them with classes that have been seen before. So your test cases should be a mix of examples from seen and unseen classes, and when reporting performance, the two should be reported separately (see the sketch below).
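To make this concrete, here is a minimal sketch of the evaluation I have in mind. The function names and data layout are my own, and the harmonic-mean summary follows common generalized ZSL practice rather than anything in the paper:

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, classes):
    """Mean of the per-class accuracies over the given set of classes."""
    accs = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(accs))

def generalized_zsl_report(y_true, y_pred, seen, unseen):
    """The test set mixes examples of seen and unseen classes, and the
    model may predict any class; the two accuracies are reported separately."""
    acc_seen = per_class_accuracy(y_true, y_pred, seen)
    acc_unseen = per_class_accuracy(y_true, y_pred, unseen)
    # Harmonic mean, a common single-number summary in generalized ZSL papers.
    h = 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)
    return acc_seen, acc_unseen, h
```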
Also, given your definition, it seems reasonable to use a held-out set to train specific parameters of your system. Did you do this?
I do not completely understand the architecture from the current description. The main point I do not get is the connection from the GCN/graph-attention part to the classifier. What exactly is the output of that module: a classifier (i.e., parameters for a classifier), or a classification?
I would then understand this to mean that the system creates one (binary?) classifier for each of the nodes (classes) in the GCN. If so, how exactly do you select which classifier to use? Do you interpret the weights as confidence levels? My current reading is sketched below.
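To illustrate my current, possibly wrong, reading with toy shapes and stand-in data (none of this is taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, feature_dim = 5, 8   # toy sizes, purely illustrative

# Reading (a): the GCN/graph-attention module outputs one weight vector
# per class node, i.e. the parameters of that class's classifier.
W = rng.normal(size=(num_classes, feature_dim))  # stand-in for the module's output
x = rng.normal(size=feature_dim)                 # stand-in for a CNN image feature

scores = W @ x                    # one score per class
predicted = int(scores.argmax())  # is this how the classifier is "selected"?
# Reading (b): each row is an independent binary classifier, the scores are
# confidences, and the highest-confidence class wins.
# The text should state which reading (if either) is correct.
```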
In section 4.3.1, your rationale for using association rule mining is that "the searching space is often large for finding common attribute set" (sic). While this could be true in general, does it really apply to your case? Including some statistics on your particular datasets would be useful (see the toy sketch below for the direct alternative I would compare against). Also, it is surprising that you did not use the DBPedia ontology for this, as it describes the domain and range of properties, which could have helped in this task.
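For comparison, if the per-class attribute sets are of moderate size, the common attribute set of a group of seen classes can be found by direct intersection. A toy sketch with made-up data:

```python
# Toy data: class -> attribute set (made up, not the authors' data).
# Unless the attribute vocabulary is very large, this intersection is
# cheap, which is why the "large search space" rationale needs
# supporting statistics.
attrs = {
    "zebra":  {"stripes", "tail", "hooves"},
    "horse":  {"mane", "tail", "hooves"},
    "donkey": {"long_ears", "tail", "hooves"},
}

seen = ["zebra", "horse", "donkey"]
common = set.intersection(*(attrs[c] for c in seen))
print(common)  # {'tail', 'hooves'}
```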
In your datasets, the seen and unseen classes are split such that there is only one hop (in WordNet) from a seen to an unseen class. This means you can be fairly certain that there is good coverage for a class one hop away. How does this affect your system? Would it still work if the needed information is two, or maybe five, hops away?
Relatedly, it would be interesting to analyze the distance between the IMSC and the predicted class. I expect this to be one most of the time (table 7 already hints at this).
DBPedia and your own graph with attributes are only used for 'after the fact' explanation. It seems to me that it would be even better to include these graphs in the actual classification model, which currently only has access to a much more limited taxonomy. To give an example, a graph might contain the information that a dromedary has only one hump, while a (Bactrian) camel has two. This would help a lot in transferring knowledge and would be impossible to derive from the WordNet tree.
Since the code for this submission does not seem to be available, it appears impossible to reproduce the results. Given that the setup is fairly complex, the authors should provide a setup for easy reproduction.
Minor issues
============
In the abstract, you write "Transferring of deep features learned from training classes (i.e., seen classes) are often used, but most current methods are black-box models without any explanations, especially to people without artificial intelligence expertise". I am not sure what you mean here.
1. Other systems do provide explanations, but not accessible by non-specialists.
2. None of the systems provide explanations, and this mostly affects non-specialists.
In fact, you never really stress why these explanations are important.
p2l16 I think your argument "but also disables the human-machine interaction which is important in machine learning model developing, configuration and debugging" is rather weak.
p2l32 "Moreover, its method is ad-hoc, only working for predefined class attributes". This seems a reather weak argument. If I predefine all attributes I can find in DBPedia and some other sources, I can just use this, right?
Moreover, if the attributes are embedded into a latent space, one could claim that only one attribute is really needed (ignoring the issue of explainability).
p2l43b "*extensive* experiments are conducted to evaluate the generated explanation and the ZSL learner, using *two* .. benchmarks". "Extensive" seems to contradict with just two benchmarks here.
You often refer to "common sense knowledge" without defining it clearly. If we look at some of the knowledge bases that claim to contain common-sense knowledge, the sources used in the current work come nowhere near.
While DBPedia Spotlight has had its time, it is not really state of the art any longer. Moreover, in the whole of section 4.3.2 it is very unclear which parts are done automatically and which manually (if any at all).
In section 4.3.3, you generate text for only 10 random attributes when a class has more than 10. This seems like an easy spot for improvement: it would make more sense to pick attributes that are clearly discriminative in comparison with other classes (see the sketch below).
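As a hypothetical illustration (toy data, not the authors' pipeline), one could rank a class's attributes by how rarely other classes share them and keep the top k:

```python
from collections import Counter

# Toy data: class -> attribute set (made up, not the authors' data).
attrs = {
    "camel": {"hump", "desert", "hooves", "tail"},
    "horse": {"mane", "hooves", "tail"},
    "zebra": {"stripes", "hooves", "tail"},
}

def top_discriminative(cls, k=10):
    """Prefer attributes that few other classes share (a crude IDF-style score)."""
    df = Counter(a for c, s in attrs.items() if c != cls for a in s)
    return sorted(attrs[cls], key=lambda a: df[a])[:k]

print(top_discriminative("camel", k=2))  # e.g. ['hump', 'desert']
```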
p12l12 You claim a graph of size 3969. How is that made from only 950 classes?