Review Comment:
Quality/importance/impact:
The paper describes the TheyBuyForYou platform and knowledge graph. It provides a specific use case of how knowledge graphs can be utilized in the public procurement domain. The use case can subsequently enhance the procurement data processing, enrichment and analysis pipeline, which can potentially improve public procurement transparency and accountability. In addition, the TheyBuyForYou project produces a large knowledge graph in which some popular ontologies are reused to model the procurement data and related corporate information. The authors also work with both public institutions and private companies that adopt and evaluate their platform. Evaluations of other aspects (e.g., data storytelling) are also provided, along with the platform’s current limitations.
Readability:
The paper is fairly easy to understand; however, some parts need to be reformulated, mostly because too much content is packed into a single lengthy sentence. For example:
- “These led to the emergence of national public procurement...” (Page 2)
- “Automatic storytelling technology available so far...” and subsequent sentences. (Page 14)
- “Results, see Table 2, were quite promising...” (Page 20)
- “During the data upload process, …” (Page 20; hard to follow because of the relative clauses, brackets, and commas that follow)
- “As the tool is unable…” (Page 21)
Such sentences should be reformulated.
Additional comments/feedback:
- On page 6, it is stated that external vocabularies and ontologies are reused where appropriate. Are there any criteria for determining whether a given vocabulary or ontology is deemed appropriate in this case?
- Figure captions could be made more explanatory to improve the readability. For example, the captions of Fig. 1 and Fig. 2 can be accompanied by short sentences regarding what the OCDS/euBusinessGraph ontology is for. The extra explanation would also be helpful for Fig. 7, explaining how the statements below each chart can be associated with the visuals of respective charts (the association is not yet intuitive).
- To achieve readability for a wider audience (e.g., government entities, often with limited background in data analysis), the charts and graph in Fig. 10 should be linked to the explanation provided on page 18 (e.g., it is not yet clear which part of Fig. 10a corresponds to the “large transaction” mentioned in Case 1 on page 18). It would also be interesting to show the result produced by the D-Tree algorithms once the step shown in the Fig. 5 screenshot is run.
- Is it correct that the reconciliation API in Fig. 3 is not connected to any other component (e.g., triple store, OC API, and OO API) in the architecture? This seems to contradict point 2 (reconcile suppliers) on page 9.
- (Minor) It would be helpful to briefly explain the technical tools mentioned, so that the paper is self-contained and readers do not have to open the links in the footnotes. One example is the mention of Velocity templates on page 9. Also, the current formulation leaves it ambiguous whether “allows specifying how the REST API will look like” refers to Velocity templates or to the R4R tool.
- A description of the actor who performs the data ingestion process is missing (except in the data storytelling section). To what extent are the buyers/companies publishing the data involved in the data ingestion pipeline? If they are involved, how well can they cope with the learning curve of semantic technologies?
- An explanation of the JSON-XML-RDF pipeline seems to be missing. If the initial JSON data does not contain the hierarchy (as implied in point 4 on page 9), how was the hierarchy obtained by transforming the JSON data into XML?
- (Minor) On page 18, a pipeline for document processing is mentioned (col. 2 line 25). A graphical representation of this process would be helpful. Lemmatization is also mentioned as part of the process; an explanation of how lemmatization is performed for non-English languages would be interesting.
- (Minor) On page 21 (section 9), the terms semantic web, knowledge graph, and linked data are mentioned. For a wider audience (i.e., readers from outside the semantic web community who might be interested in learning about semantic-based procurement tools), a brief explanation of these three terms would also be helpful.