Review Comment:
I would like to thank the authors for their efforts to improve the paper. Unfortunately, I still have issues with major aspects of the paper.
(1) originality
The paper presents original work in image segmentation, combining a spatial reasoner (RCC-8 based) with a deep neural network (CAE). The reasoner assists by augmenting the classification data with three additional feature channels: shadow, elevation, and some notion of inconsistency. The paper claims to present the first system applied to image segmentation that explains the errors of the classifier using a spatial reasoner. The authors also claim to be the first to close the loop between the reasoner and the classifier.
(2) significance of the results
My main difficulty with the paper is accepting the results as significant. The approach is promising, but in the end its outcome in this particular paper and this particular experiment is shadow detection. The full power of adding knowledge is not investigated, and the results are at the level of a proof of concept. The paper still needs to provide stronger evidence for the power and performance of the proposed system.
Importantly, I would not be able to reproduce the results in this paper, as many crucial details are left out, especially about the part that is most important (feature augmentation).
I am particularly unsure about the results. One of the feature channels, elevation, could simply be added directly (without the reasoner). It is not clear to me whether the results are driven by this feature or by the other features (shadow) that do come from the reasoner.
(3) quality of writing
The paper is easy to follow and understand.
(4) General comments
a) Explanation
Comments on the previous version with respect to explanation have been addressed by the authors.
b) Related Work
The discussion of existing methods was expanded, and the corresponding comments on the previous version have been addressed.
c) Details are missing from Algorithm 1 (Section 3).
The authors addressed the comments on the previous version of the paper, but some additional issues remain.
The authors mention that additional constraints were added to OntoCity (Section 3.4.1). What is the impact of those constraints?
How are regions computed? What is the "geometrical process"?
Is your classification an argmax following the final softmax layer? If so, does low certainty refer to the output of the softmax layer? How are probabilities (or whatever the certainties are) aggregated for regions?
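To make the question above concrete, here is a minimal sketch of one possible reading, not the authors' method: each pixel's label is the argmax of the softmax output, its certainty is the corresponding maximum probability, and a region's certainty is the mean over its pixels. The function names, the toy logits, and the choice of the mean as the aggregate are all my own assumptions.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pixels(pixel_logits):
    """Return (label, certainty) per pixel: argmax and max of the softmax output."""
    results = []
    for logits in pixel_logits:
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        results.append((label, probs[label]))
    return results


def region_certainty(pixel_results, region_ids):
    """Mean per-pixel certainty for each region (an assumed aggregation rule)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (_, certainty), rid in zip(pixel_results, region_ids):
        sums[rid] += certainty
        counts[rid] += 1
    return {rid: sums[rid] / counts[rid] for rid in sums}


# Toy data (hypothetical): four pixels, three classes, two regions.
pixel_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.1, 0.0, 0.0]]
region_ids = [0, 0, 1, 1]

results = classify_pixels(pixel_logits)
per_region = region_certainty(results, region_ids)
print(per_region)
```

Other aggregations (minimum, median, area-weighted mean) would give different sets of "low certainty" regions, which is why the paper should state which one is used.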
I have a hard time understanding how the channels are added and encoded (except for elevation). Is shadow added as a binary signal? I do not understand the third channel. What is the definition of "suspicious", and what are the values of the third channel?
There are three channels added to the input, of which only the shadow channel comes from the reasoner, correct? In my understanding, elevation can be added without using the reasoner. If that is the case, you need to show that the actual gain in accuracy comes from the features for which you actually need a reasoner. Otherwise, your results may simply be achieved by adding elevation as a feature.
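The ablation I have in mind can be sketched as follows: train and evaluate one model per subset of the added channels, so that the contribution of each channel can be separated. This is only an illustration of the experimental design. The base channel names are a placeholder assumption on my part, and the three added channel names are taken from my reading of the paper.

```python
from itertools import combinations

# Placeholder assumption: the paper's actual base input channels may differ.
BASE_CHANNELS = ("R", "G", "B")
# The three augmented channels as I understand them from the paper.
ADDED_CHANNELS = ("elevation", "shadow", "inconsistency")


def ablation_configs(base, added):
    """Enumerate every input configuration: base channels plus each subset of added ones."""
    configs = []
    for k in range(len(added) + 1):
        for subset in combinations(added, k):
            configs.append(base + subset)
    return configs


configs = ablation_configs(BASE_CHANNELS, ADDED_CHANNELS)
for channels in configs:
    print(", ".join(channels))
```

Comparing the configuration with elevation only against the configurations that also include the reasoner-derived channels would show directly whether the reasoner contributes to the reported gain.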
d) Section 4
Comments for the previous version are addressed.
How does testing work? In training, you iterate region identification, followed by adding three channels, followed by retraining. How is this done in testing? In principle, in testing you also need to first identify the regions and then augment the pixels, correct? When do you stop iterating in testing?
Minor comments
It seems you are using "segment", "region", and "area" interchangeably. I find that confusing and would personally prefer a single term throughout.
There are some minor issues with long sentences. Consider splitting sentences that have too many adjuncts.
There are also a few issues with determiners (imho).
Algorithm 1: maybe use \operatorname to avoid typesetting and layout errors.
Algorithm 1: there is an issue with tabs and layout.