Review Comment:
Given a pair of SKOS taxonomies, this paper introduces two metrics for assessing the additional information that can be imported through a linkset from one taxonomy into the other. In particular, the authors assume that a complete mapping is provided, and look at how well one taxonomy can complement the other when importing information through skos:exactMatch links. The two metrics in question are "reachability" and "importing". The former takes as input a set of "relevant" properties, a source taxonomy, a target taxonomy, and a skos:exactMatch linkset, and measures the ratio of objects in the target taxonomy that are reachable by (i) following the skos:exactMatch links and thereafter (ii) following a given number of hops through the properties selected as relevant. The latter takes as input a property (e.g., skos:altLabel), a language tag (or wildcard), a linkset, and a source and target taxonomy, and defines the average ratio of increase in the number of values with that property and language tag for each node in the source taxonomy when complemented by the target taxonomy; for example, a score of 0.8 for the property skos:altLabel and language tag "en" would, in my understanding, mean that the number of skos:altLabel values found for nodes in the source taxonomy (counting only those nodes for which a skos:exactMatch link is present) increases on average by a factor of 5 after importing the values from the target nodes of the links (a gain of 1 - |a|/|a U b| = 0.8 corresponds to |a U b| = 5|a|).

The authors motivate and define these metrics and then present a "validation framework" that aims to evaluate the ability of these measures to indicate the "completeness" of the complemented taxonomy. This validation framework takes an existing taxonomy (in this case GEMET), makes a copy of it (changing the namespace), generates a skos:exactMatch linkset between the corresponding nodes, and then applies a variety of modifiers that (i) delete nodes and links in the source thesaurus to create paths, (ii) delete concepts in either thesaurus, and (iii) delete links from the linkset. These modifiers can be combined and their parameters varied to create an array of test-cases, each of which contains a source taxonomy, a target taxonomy, and an associated linkset. The original complete taxonomy then serves as a "gold standard" that can be used to gauge various measures of incompleteness for the test-case in question. These measures of incompleteness are compared with the metrics that the authors propose.

The authors then present various results relating to how the operations for generating the test-cases correlate with the metrics they propose, followed by some results from applying their measures to some real-world datasets, before presenting related work and concluding.
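For concreteness, and to check my own understanding, the following is a minimal Python sketch of how I read the two metrics on toy data. All names and data here are hypothetical, and details such as which nodes enter each denominator are my assumptions, not the authors' exact formulation:

    from collections import deque

    # A toy target taxonomy: node -> {property -> set of neighbouring nodes}.
    target = {"y1": {"skos:narrower": {"y2"}},
              "y2": {"skos:narrower": {"y3"}},
              "y3": {}}
    links = {"x1": "y1"}  # skos:exactMatch links, assumed correct and complete

    def reachability(links, target, relevant, max_hops):
        """Ratio of target nodes reachable by following an exactMatch link and
        then at most max_hops steps over the properties selected as relevant."""
        seen, frontier = set(), deque((y, 0) for y in links.values())
        while frontier:  # breadth-first traversal with a hop budget
            node, hops = frontier.popleft()
            if node in seen:
                continue
            seen.add(node)
            if hops < max_hops:
                for prop, nbrs in target.get(node, {}).items():
                    if prop in relevant:
                        frontier.extend((n, hops + 1) for n in nbrs)
        return len(seen) / len(target)

    def importing_gain(own, imported):
        """Per-node gain in the style of Definition 9: 1 - |a| / |a U b|."""
        a, b = set(own), set(imported)
        return 1 - len(a) / len(a | b)

    print(reachability(links, target, {"skos:narrower"}, max_hops=2))  # 1.0
    print(importing_gain({"soil"}, {"soil", "ground", "earth", "dirt", "terrain"}))  # 0.8

If something at roughly this level of directness is indeed what the metrics compute, then a presentation along these lines would serve readers far better than the matrix machinery of Section 2 (see my comments below).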
The paper deals (perhaps somewhat indirectly) with the important issue of Linked Data quality, and thus I believe it to be of relevance to the Special Issue. The authors motivate their work by stating that a lot of effort has been invested in interlinking thesauri such as GEMET, EARTh, AGROVOC, EUROVOC, UNESCO, RAMEAU, TheSoz, but that it is unclear precisely what the value of these linksets is: it is not clear, for example, how these thesauri complement each other. I think this is a solid motivation for looking into metrics to assess the value of a linkset. Indeed, I quite liked the results presented in Figure 9 (and, to a lesser extent, Figure 10), which provide an interesting overview of how the thesauri complement each other with terms in different languages, and hence of the value of the linkset. There is certainly some practical merit to this line of work.
Overall, however, I am afraid I must recommend a reject, for the following main reasons.
First of all, I found key parts of the paper very difficult to read. To be clear upfront: though the English is not perfect, it is quite okay and not the main problem (though it contributes in places to the difficulty). In the following, I want to give an idea of my experience of reading the paper, which I hope will give the authors a better impression of the problem from my perspective:
* To start with, by the end of the introduction it was entirely unclear to me what the paper was about. I could see that it is about two metrics, but I had little or no idea what the intuition behind these metrics is or what *problem* they aim to address. One thing I did get from the introduction was the motivation (to evaluate the quality of a linkset), but the list of contributions that follows is poorly written: e.g., "a metric ... which checks the linkset complementation potential for any SKOS property" -- I could not understand this at all the first time I read the paper. So by the end of the introduction, I had only a vague idea of what the paper was about.
* Having, in my opinion, failed to give a concrete idea of the metrics introduced, the paper begins with dense preliminaries in Section 2. First of all, trying to read through this section, I did not understand why these concepts were necessary and had no intuition as to what they would be used for. Having read the paper, I still feel they are unnecessarily complicated and messy. For example, Definition 1, "multi-relational network from an RDF triple-set", is (for me) already an extremely awkward representation of an RDF graph, and then we get into various matrix operations for reasons that were lost on me at the time, but that are ultimately used to define reachability in an RDF graph within a specific number of hops traversing only certain properties: something that could be defined and explained far more simply, directly, and intuitively (the toy sketch above gives the flavour of what I mean). Struggling to keep all the messy notation and its intuitive meaning together in my head, I found that Section 3 builds upon that notation with yet more messy notation. The one really valuable part of Section 3 is that we finally get to see an example using the metric; the examples in general are appropriate and easy to follow, even if the metrics they describe are not.
* The validation framework starts with two research questions, but I do not really understand intuitively what these research questions mean or why they are important. The authors then go into detail on the potential problems of using a synthetic benchmark, but this comes across as defensive and, in any case, not comprehensive. I think I understand why the authors present this discussion: (i) to highlight design choices in their benchmark, and (ii) to rebut possible criticism of the benchmark. But at this stage I still had no idea what the benchmark was supposed to do. To be clear, there is no problem with a benchmark being synthetic per se; the problem is interpreting the results of a synthetic benchmark too broadly, or not testing the claims of the paper (as with any benchmark). So this whole discussion is just strange to me. Eventually it became clear why that discussion is there: the benchmark design is not particularly clean, with various modifiers and parameters selected without any real justification that I could follow (a rough sketch of how I understand the generation step follows this list). Ultimately, I really could not follow the metrics for completeness and the corresponding results at all, nor how they answer the research questions. I was conceptually lost at this stage.
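For the record, here is a rough sketch of what I eventually understood the test-set generation to be doing. The function and parameter names are mine, and the treatment of dangling edges is a guess, so this should be read as my reconstruction rather than the paper's actual procedure:

    import copy
    import random

    def make_test_case(gold, concept_del_ratio, link_del_ratio, seed=0):
        """Clone the gold-standard taxonomy (e.g., GEMET) into source and
        target copies, build a complete exactMatch linkset between the
        corresponding concepts, then damage the copies with the modifiers."""
        rng = random.Random(seed)
        source, target = copy.deepcopy(gold), copy.deepcopy(gold)
        links = {node: node for node in gold}  # one exactMatch per concept

        # Modifier (ii): delete a fraction of concepts from the target thesaurus
        # (edges pointing at deleted concepts are left dangling for brevity).
        for node in rng.sample(sorted(target), int(concept_del_ratio * len(target))):
            del target[node]
            links.pop(node, None)  # links to deleted concepts disappear too

        # Modifier (iii): delete a fraction of the surviving exactMatch links.
        for node in rng.sample(sorted(links), int(link_del_ratio * len(links))):
            del links[node]

        return source, target, links

Even a figure or pseudocode at this level of abstraction, together with a justification of the chosen ratios (cf. my comment on "10% and 40%" below), would have made Sections 4 and 5 much easier to follow.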
Second of all, relating to the previous point, while I now more or less understand the actual metrics proposed, I have little understanding of what Sections 4 and 5 intend to show, or, more importantly, of the idea behind these sections. The authors have two metrics and they show how the metrics vary when the linkset/taxonomies vary in completeness. I really do not understand why this is interesting or useful, and I could not follow the results presented in Figures 4 and 5. With apologies to the authors, I somewhat gave up trying to understand the results of Section 5 in detail: I had no idea what the results were trying to show or what I should be looking for, and interpreting the graphs is difficult when there are all these different tests with all of these different modifiers, parameters, and so forth. I feel many readers would do likewise.
Third, I feel that the assumption that the linkset is correct and complete is unrealistic in many scenarios, which would limit the applicability of these metrics.
Perhaps to summarise: I believe the paper does have good motivation, and the metrics/tools developed do have practical merit (as suggested in Section 6). However, I feel that the preliminaries are unnecessarily dense, and I fail to understand the value of the experimental framework and results in Sections 4 and 5. Removing or simplifying the overly long or otherwise unclear parts of the paper, the remaining contributions feel quite minor to me (not at the level of a journal paper): two metrics to measure the amount of data imported by a SKOS linkset. For this reason, I am selecting a reject rather than a major revision.
In terms of comments to improve the paper, I think the authors could:
1) add a motivating example early in the introduction to give the idea of the metrics (e.g., using Figure 1),
2) clean up and simplify the preliminaries section and the definitions of the metrics,
3) I really don't know what to suggest for Sections 4 and 5 because I did not get the idea at all; maybe just remove them entirely? Otherwise these sections need a lot of work.
I think part of the problem may stem from the fact that the authors are trying to build a research paper from something that, at its very core, does not have much technical depth. The authors could maybe instead consider developing their tool further and presenting it as a tool paper, or perhaps doing some empirical analysis of real-world linksets and presenting those results.
Some minor comments:
= GENERAL
* "importing": This word does not feel right. "Importation" feels like a better noun to apply.
* "an RDF ..." (multiple ... the letter R has a vowel *sound*, like the word or, hence should use "an")
* "set of RDF triple[s]" (multiple)
* "the importing", "the reachabilty", "the importing and reachabilty" ... you should not have "the" here unless you also say something like "the importing and reachability *metrics*"
* cross-walking -> traversing?
* verteces -> vertices (or vertexes perhaps, keep consistent)
* "complement" Sometimes this can be used in a confusing manner since it can also mean set complement.
* spell-check
= ABSTRACT
* "In particular, {the} reachability and importing estimate"
= INTRODUCTION
* EU Governs -> EU Governments
* "most interesting promise{s} that Linked Data makes is [that] "Linked Data ...". Provide a reference for the quote.
* "in {the} Linked Data"
* List of contributions, particularly second item, not clear
* "Section 3 formalizes {the} importing and reachability"
= BASIC CONCEPTS
* "A[n] RDF triple"
* Do you ever use the distinction between RDFProp and OBJProp in the paper? Is it necessary?
* "Such type of linksets binds" -> "Such types of linkset bind"
* "[and] power matrix"
* Definition 1: very awkward. Also, an RDF triple set is most commonly called an RDF graph.
* Definition 2: E_q is a set of pairs of vertices, so not sure how it can contain z.
* "For each object propert*y* z"
* Definition 3: "weigh[t]ed adjacency matrix"
* Definition 5: the superscript k in S^k makes it seem like a power.
* "length minor or equal" -> "length less than or equal"
* "to T_o we define:" -> "to T_o. We define:" ... the first part that follows assumes k >= 1?
= LINKSET QUALITY
* "as good as" -> "as good if"
* "are special kind[s] of datasets"
* "which give indication[s] about"
* "user-specified metric[s]"
* "We *also* assume completeness"
* Definition 7, z is not quantified (\exists z?).
* "percentages normalized between 0 and 1." Percentages are values like 97%. Maybe ratio?
* Figure 1: where is y2, y4, x4, etc.? It's a little confusing when going through the examples.
* Definition 9: I realise the value would be different, but rather than doing 1 - (|a|/|a U b|), I was wondering why not simply use (|b|/|a U b|), which seems more intuitive to me. (Note that 1 - |a|/|a U b| equals |b \ a|/|a U b|, which credits only the strictly new values, whereas |b|/|a U b| would also credit values already present in a.)
* Definition 9: it took me a while to realise that "den" refers to the denominator, not the whole equation.
* "but are not direct object[s] of the links"
* "and the set of vertexes" The formula just before has a dangling ')'
* Example 5 ... there are no skos:narrower links. Need to discuss earlier that these are implied.
= VALIDATION FRAMEWORK
* "The validation [framework] aims at ..."
* "Definitions 10 and 11[,] to evaluate"
* "when {this}[it] is complemented"
* "We want to demonstrate{, the} importing as a good predictor for" ... better to take a neutral stance and say you wish to investigate if it is a good predictor (in any case, it, by definition, measure multilingual gain, so again I'm not sure what's the idea here).
* "we consider {the these} two set[s] of .."
* "in term[s] of completeness"
* "of {the} our measures"
* "created [by] altering"
* "a varied kind of" -> "a variety of"
* "Thus, our ground truth{,}"
* "affecting synthetic benchmark[s]"
* "since they are not enough difficult" -> "since they are not difficult enough"
* "correctness and complet[e]ness for [the] linkset"
* "Th*ese* assumptions seem reasonable{,} since{,}"
* "enabling {in} a ..."
* "does not provide{s}"
* "th*ese* modifiers"
* "[by] developing"
* "with the aim of fully cover{ing}"
* "alterators"? "alterer" or alternator" or simply "modifier" perhaps.
* "The Test Sets Generator module performs two ... First ... Second ..." The latter two sentences are not full sentences. Make it a list if it's a list.
* "on [the] subject thesaurus (test set 1), on [the] object thesaurus (test set 2)"
* "All [of] the importing modifiers"
* "The we {really} construct"
* "10% and 40%" Why these values? This question extends to other values in the section.
* "related each others" -> "related to each other"
[at this point, apologies but I stopped noting minor corrections]