A PROV-Compliant Approach for the Script-to-Workflow Process

Tracking #: 2047-3260

This paper is currently under review
Lucas Carvalho
Khalid Belhajjame
Claudia Bauzer Medeiros

Responsible editor: Guest Editors Semantic E-Science 2018

Submission type: Full Paper
Scientific discovery and analysis are increasingly computational and data-driven. Most scientists rely on scripting languages such as Shell, Python and R to encode and run their simulations and data analyses. Although widely used, scripts are hard to understand, adapt, reuse, and reproduce. Several approaches, such as YesWorkflow and noWorkflow, have been proposed to tackle these problems. However, they neither allow the experiment to be fully documented nor help third parties who want to reuse just part of the code. Scientific Workflow Management Systems (SWfMSs) are increasingly recognized as a way to mitigate these problems. They help to document and reuse experiments by supporting scientists in the design and execution of their experiments, which are specified and run as interconnected (reusable) workflow components (a.k.a. building blocks). Taking this into account, we designed W2Share, a novel approach for the management, reuse, and reproducibility of script-based experiments. W2Share transforms a script into an executable workflow that is accompanied by annotations, example datasets and provenance traces of its execution, all of which are encapsulated into a workflow research object. This allows third-party users to understand the data analysis encoded by the original script, run the associated workflow using the same or different datasets, or even repurpose it for a different analysis. W2Share also enables traceability of the script-to-workflow process, thereby establishing trust in this process. All processes in W2Share follow a methodology based on requirements that we elicited for this purpose. The methodology exploits tools and standards developed by the scientific community, in particular YesWorkflow, Research Objects and W3C PROV. This paper highlights the main components of W2Share, which is showcased through a real-world use case from Molecular Dynamics. We furthermore validate our approach by testing its ability to answer competency questions that address the script-to-workflow process.