Review Comment:
Having read the authors' rebuttal and the new iteration of the paper, I have to say that the paper has improved, but sadly it is not yet ready for acceptance as-is.
I think some points raised by the other reviewers, with which I could not agree more, are not fully addressed yet (e.g., methodology, evaluation baselines, dataset construction), and I am sure my colleagues will elaborate further on those points.
In my opinion, the major flaw of the paper boils down to the clarity of the presentation, which I find convoluted most of the time and hard to parse.
The reader needs multiple passes to get the gist of the whole thing, which is undesirable; let alone delving into the full implementation details, which requires reading between the lines and connecting the dots.
Comments
- A running ambiguity between "(research) field" and "topic" persists throughout the paper. It emerges from statements such as "related topics representing the set of shared themes, or research fields.", "Researchers understand the topics by first reviewing a multitude of articles, internalising the evolution occurring within the researchers' fields of interest", or "the field of polysaccharide" (this being one of the 20 selected FoS). Is a field just a topic at a "higher level" in the taxonomy? Isn't the hierarchical topic/subtopic organisation enough? If not, what makes a topic a topic, and what makes a field a field and not a topic? I.e., is chemistry a topic, a field, or a domain? Why so? Alternatively, you also use "knowledge domain", "concept", and "theme", leaving to the reader's imagination what they stand for and what they do not. For the record, you explicitly define FoS (the ones you select, and the others contained in the topic networks) and knowledge domains "such as business, chemistry, law, and medicine.". Any time you use such terms differently, you increase ambiguity, which is bad. I would suggest reducing the jargon variability if these terms are all used interchangeably, or explaining the differences, if any, with respect to your application.
- Similarly, in Section 3.3, when you describe inter-domain and intra-domain, you refer to "pairs of domains". I assume you mean domain = FoS; is this the case? Please clarify.
- Also, "The field of topic evolution…" would read better as just "(Research) topic evolution…".
- In the introduction, you write, "The topic networks are first extracted from an open bibliographical dataset, with each network representing publications in a specific research journal with a focused set of research interests." You mention a journal (or a set of journals?), but it is unclear where these come into play. Later on, you write that you select 20 FoS seeds to generate the evolving topic networks; no journal seems to take part in this process. Please drop the mention if this is the case.
- Still on this note, in Section 4.1, you describe the MAG snapshot, then move on to the FoS selection process; then at the end, after having presented Table 2 (which should come last, IMHO), you go back to the MAG tables and their structure. Isn't this unnecessarily convoluted? Shouldn't you mention this earlier to support the FoS selection process in the first place? Also, you mention "filtered papers", which come out of the blue. How do you filter papers, and why? Is this by any chance where you use the journal(s) (see the point above), or something else? In the rebuttal, you mention at some point that "journal was supposed to refer to the SWJ", which, however, is not made explicit anywhere in the paper. Also, in that case, shouldn't the FoS all be relevant to the Semantic Web?
- Also, "from an open bibliographical dataset" -> I think you can say up front that you are using MAG; no mystery reveal is required. Finally, at the bottom of the same column, after having hinted at the binary classification process, you go back to topic network extraction, reiterating that the networks have been extracted from MAG. Isn't this an unneeded repetition? Couldn't this be moved up and integrated with the previous point? Please improve clarity.
- What happens if a new topic appears in year Y and promptly disappears in Y+1? Is new-topic stability/persistence addressed at all? Would this still count as a new topic in your analysis? In my mind, an emerging topic is something never seen before that is meant to stay; e.g., computational genomics.
- It is still unclear how you arrived at the prediction examples provided in the results section (e.g., "with possible invisibility using its photoluminescence properties"). Can your system make predictions such as "in year Y a new topic T was indeed flagged at the intersection of topic A and topic B"?
- Maybe a section describing the limitations of your approach could be added to the closing remarks of the paper.
- State of the art: better than before, but I noticed that [1] is mentioned in the introduction, and I did not get why it is not also mentioned in the "technology forecasting" paragraph of the related work.
- The text citing Table 2 does not cover the newly added columns.
- "to understand the topic in each document" -> I would say "topics", as one paper generally deals with more than one topic.
- "is then tried" -> better tested
- "hierarchical concepts are then tagged to the papers" -> I would rather say that papers are tagged with concepts, not the other way round.
- "tagged FoS": what do you mean? FoS used for tagging papers?
- Sentences throughout the paper sometimes feel overloaded with the article "the"; e.g., "Researchers understand the topics…". Please revise accordingly.