Review Comment:
Overall evaluation
Select your choice from the options below and write its number below.
-2
== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject
Reviewer's confidence
Select your choice from the options below and write its number below.
4
== 5 (expert)
== 4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)
Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.
3
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
Novelty
Select your choice from the options below and write its number below.
2
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
Technical quality
Select your choice from the options below and write its number below.
2
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
Evaluation
Select your choice from the options below and write its number below.
3
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 not present
Clarity and presentation
Select your choice from the options below and write its number below.
2
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
Review
This paper describes an effort to link person records for digital libraries. The authors link person records from two different datasets (DBLP and the social science publication dataset Sowiport) to two different 'authority datasets': the GND (a German subset of VIAF) and DBpedia.
The authors' hypothesis seems to be that this linking improves when more structured information is available for the linking process. To test this, they describe a person matching approach which 1) finds string matches, 2) compares records (including related keywords, co-authors, etc.), and 3) performs some domain-specific filtering.
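To make my reading of the approach explicit, a pipeline along these lines is what I understand Section 3 to describe. This is a hypothetical illustration, not the authors' actual algorithm: all function names, weights, and thresholds below are my own assumptions.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    # Step 1: string matching on normalized (lowercased) names
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_score(rec_a: dict, rec_b: dict, w_name=0.6, w_ctx=0.4) -> float:
    # Step 2: compare records, using additional structured fields
    # (here: keywords) when they are present; Jaccard overlap as context score
    name = name_similarity(rec_a["name"], rec_b["name"])
    kw_a = set(rec_a.get("keywords", []))
    kw_b = set(rec_b.get("keywords", []))
    ctx = len(kw_a & kw_b) / len(kw_a | kw_b) if (kw_a or kw_b) else 0.0
    return w_name * name + w_ctx * ctx

def link(rec_a: dict, rec_b: dict, threshold=0.5,
         domains=("computer science",)) -> bool:
    # Step 3: domain-specific filtering, e.g. discard candidates
    # from domains the source dataset does not cover
    if rec_b.get("domain") and rec_b["domain"] not in domains:
        return False
    return record_score(rec_a, rec_b) >= threshold
```

If this reading is roughly correct, each step is a standard record-linkage component, which reinforces my concern below about the novelty of the contribution.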
Moreover, the authors investigate the amount of (overlapping) structured information (used in step 2) for the various sources. This results in a number of interesting tables reporting on the type of information available in those sources.
My main concern with this paper is that it is unclear what the contribution is. As a description of an approach or algorithm to link persons, it lacks much-needed detail about the specifics of the algorithm. From Section 3, I gather that there is not much more to it than standard record-linkage techniques, namely string matching and record comparison. It is unclear what the extension beyond the state of the art is.
On the other hand, it seems that the contribution could be a description of the amount of structured metadata in the various data sources, which could help matching algorithms. Here the authors find that there is 'currently very limited information beyond the author name'. But at the same time, they conclude that this is actually not that crucial. As the authors state: [This seems to] "suggest that the lack of information does not have a too negative effect on the performance of the person record linkage". I therefore do not understand what the contribution of the paper is.
In Section 5, the authors want to investigate "how much the name of a person and how much of the additional information (if available) on GND and DBpedia contributes to the correct matching of authors to their corresponding person records". The methodologically correct way of doing this would be to evaluate two versions of the algorithm, one with and one without the structured information, and measure the effect on the evaluation results. The way the authors do it now does not give a clear evaluation of the effects.
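Concretely, the ablation I am suggesting could look like the sketch below: run the same evaluation harness over a name-only matcher and a name-plus-context matcher, and report precision/recall for both. All names and the toy matchers here are placeholders of my own, not the authors' code.

```python
def evaluate(match_fn, recs_a: dict, recs_b: dict, gold_pairs):
    # Precision/recall of a matcher against a gold standard of true links.
    # recs_a / recs_b map record ids to record dicts.
    predicted = {(i, j) for i in recs_a for j in recs_b
                 if match_fn(recs_a[i], recs_b[j])}
    gold = set(gold_pairs)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

def name_only(a: dict, b: dict) -> bool:
    # Baseline configuration: exact name match only
    return a["name"] == b["name"]

def name_plus_context(a: dict, b: dict) -> bool:
    # Full configuration: name match plus overlapping structured metadata
    return a["name"] == b["name"] and bool(
        set(a.get("keywords", [])) & set(b.get("keywords", []))
    )
```

Comparing the two configurations on the same test set would directly isolate the contribution of the structured information beyond the name, which is exactly the question Section 5 poses.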
Also, how generalizable are the algorithm and the findings? Do the reported effects hold for scientific authors, for authors in general, or for all types of persons?
Some other issues:
- The paper contains many overly long sentences, which make the intended meaning hard to follow. For example, in the 2nd paragraph of Section 6, the first two sentences span 10 lines.
- p2: "Not all links are of equal value..." -> This paragraph is confusing. I would suggest rewriting it to clarify a) how the authors came to this conclusion (references or original research) and b) what they actually do with this conclusion. Did it influence the algorithm? The evaluation?
- In Table 2, what is the difference between a "0" value and "NA"?
- p6: The algorithm description is not very detailed. Regarding the preprocessing: what is the success rate of the name-ordering conversion? How are names in other languages (e.g., Chinese) handled?
- Sec 5.3: why is one test set created manually and the other at random? Why do they differ in size, and how do these variations influence the evaluation?
- Table 2 comes before Table 1 (very minor issue)