Review Comment:
The paper describes a platform, called TIDAL, which integrates a set of existing technologies for the sharing of personal data in a privacy-preserving manner and for use in health research.
The platform consists of a set of components that make use of SOLID technology for requesting subsets of data stored in personal (distributed) data vaults and for offering a privacy-aware way to analyze the data.
Using this platform, health researchers can post participation requests, which can be viewed and approved by participants. The personal health data of the participants are then retrieved and analysed in a privacy-preserving manner. All data, including participation requests, approvals, analyses, etc., are expressed and stored in RDF using established data models.
The paper is generally well written and tackles a very interesting and timely topic. However, several claims are not justified, and several aspects need to be better motivated and explained.
Specifically:
As stated in the introduction, the authors aim at addressing the research question: "how to engage individuals to “donate” their personal data for health-related research with maximal control in data access, storage, and analysis".
However, the paper does not provide any evidence of this "engagement", nor any explanation of how citizens will be motivated to use the platform.
The authors state in the introduction: "The current personal data management technologies are mostly research-driven and in their early stages."
There is no evidence that the proposed platform is not itself research-driven and at an early stage, since it has not been used in a real environment.
Section 4 provides implementation details; however, it does not explain how participants can provide the data. In RDF directly? In other formats such as spreadsheets, using, for example, templates (which means that data transformation is needed afterwards)? What background knowledge do participants need? How easy is it for a participant to create a pod? How do you ensure that data is provided in the desired manner? What if important data/parameters are missing?
Although there is a short relevant discussion in section 6, these aspects are, in my opinion, very important and need to be clearly explained early in the paper. Also, the platform needs to contain mechanisms that can automate relevant processes (data entry, curation, etc.). Without a clear solution to this, it is difficult to judge whether the platform will be used in practice and actually *engage* individuals.
Other comments:
- Please provide example queries for each step of the pipeline: querying data request URIs, querying the signature and verification key of data requests, querying RDF data from the participants' pods, etc.
- Section 4.3: It is not clear how one can register a new analysis algorithm, and how such a registration/extension is implemented in the platform.
- Last paragraph of section 4: "The queried data is then fed into the data analysis model which is pre-defined in the Docker image" => What is the format/model of the input and the output?
- Evaluation/Experiments: The objective and motivation of the evaluation need to be explained. E.g., why is efficiency important in this context? (It does not seem to be.)
- Section 6 (Discussion): "TIDAL supports users to store and request personal data in a structured RDF format..." => There is no evidence of how users/citizens are *supported* in storing their personal data. E.g., is there a user interface for this?
- Section 6 (Discussion): "After the participants approve the data request, they can still update the data elements" => How?
- TIDAL emphasizes on engaging individuals in health research and connecting them with both researchers and data sources. => There is no evidence on how this engagement can happen.
About the resources provided and their sustainability:
There is a git repository explaining how to build and use a SOLID application. It was last updated two years ago (13-06-2020). There are lengthy video tutorials for each step of the process; however, there is no mention of TIDAL.
With respect to privacy preservation, there are two videos explaining the processes of participation and analysis.