Review Comment:
Review
The paper describes DockerPedia, which is an engine for designing reproducible experiments.
The authors designed DockerPedia following the conceptualisation presented in [3].
They aim to foster experiment reproducibility by providing physical and logical conservation,
stressing that little work has been done on the conservation of the experiment infrastructure (Section 1, paragraph 2, page 2).
The authors clarify that Virtual Machines (VMs) have been used for the same purpose. Nevertheless,
they require a lot of disk space, as they store the whole operating system.
In contrast, containerization techniques seem promising: they require less space
while still offering physical conservation.
Strong Points
- the paper touches on the critical problem of experimental reproducibility
- the authors try to combine conceptualisation with a realistic technological stack
- the software is available on GitHub
Weak Points
- there are many imprecisions about containerization;
- the authors still oversell their work a bit: it is unclear why DockerPedia does a better job than existing solutions;
- the paper structure can be improved;
- the evaluation is still not convincing: it does not quantify the value of the approach.
The paper shows two major drawbacks that convinced me not to accept it.
The first problem concerns the imprecisions related to Docker. Quoting from the paper:
Section 1
- "Docker container can be seen as lightweight virtual machines".
I think that a scientific paper should not state imprecise facts, even when the intent is to convey an intuition.
- "Docker containers are used intensively in both industry and science, mostly to preserve the execution environment of software applications and also to
preserve the physical environment of an experiment"
I think the authors mean "docker images". Moreover, I think the second part of the sentence is quite a strong claim.
Section 2
- "Docker is a solution that allows virtualizing a minimal version of an Operating System (OS), sharing the resources from the host machine by means of software images."
The Docker architecture includes the docker client, dockerD, containerD, and runC
cf https://blog.docker.com/2017/08/what-is-containerd-runtime/
None of these components does virtualization. Moreover, containers are isolated and constrained processes that run ON the host machine.
Software images are just software packages plus the necessary metadata that runC and containerD use to spawn containers.
An image is an immutable file system.
Sec 2.2.
- "A Dockerfile is a text file that contains all commands to build a Docker image and run a container using this image."
A Dockerfile only contains instructions to build an image. Some of these instructions (CMD and ENTRYPOINT) may affect the runtime behaviour of the container,
but this should be specified.
- "The first line in such files is the FROM keyword, which imports the base OS on which all software will be installed."
Sec 2.3.1
- "With Docker and AuFS that user can share the 1 GB data between all the containers. If that user has 1,000 containe"
Docker containers are mutable, ephemeral copies of the image file system. Therefore, each spawned container creates a 1 GB copy.
Section 3
- "Moreover, as their evolution can be tracked along
the development process, it is possible to rollback to
previous Docker images in case new dependencies or
modifications introduce errors."
Docker images are immutable. Modifying an image requires rebuilding it, which in practice results in another image. An image is divided into layers, so some of these layers may not be affected by a change. Nevertheless, the resulting image is a different one (different hash).
Rollback requires storing the previous version.
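To make the point about immutability concrete, here is a minimal sketch in Python. It is a toy model, not Docker's actual storage format: the helpers layer_digest and build_image are hypothetical, the Dockerfile lines are invented for illustration, and each layer is simply identified by a SHA-256 digest chained to its parent. It suggests why changing one instruction leaves the earlier layers untouched but still yields a different image, so the previous identifier has to be kept for a rollback to be possible.

```python
import hashlib


def layer_digest(parent: str, instruction: str) -> str:
    """Toy layer identifier: hash of the parent digest plus the instruction."""
    return hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()


def build_image(instructions: list[str]) -> list[str]:
    """Return the chain of layer digests; the last one stands in for the image ID."""
    digests = []
    parent = ""
    for instruction in instructions:
        parent = layer_digest(parent, instruction)
        digests.append(parent)
    return digests


# Two hypothetical Dockerfiles that differ only in their last instruction.
v1 = build_image(["FROM debian:9", "RUN apt-get install -y python3", "COPY run.py /app/"])
v2 = build_image(["FROM debian:9", "RUN apt-get install -y python3", "COPY run_v2.py /app/"])

# Layers before the change keep the same digest and could be reused.
shared_layers = [a for a, b in zip(v1, v2) if a == b]

# The final identifiers differ, so rolling back requires that the old one was stored.
history = {"v1": v1[-1], "v2": v2[-1]}
```

In this sketch the first two layers are shared between v1 and v2, while the final digests differ; nothing in v2 alone allows v1 to be recovered.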
Sec 3.2.1
- "Docker builds an image by either reading a set of instructions from a Dockerfile or just deploying that image on a host in case the Dockerfile is nor present."
The first line of a Dockerfile (FROM keyword) indicates the parent image that is used to start building the current one.
Docker images are built directly from a running container using the "docker commit" command.
"docker build" automates the process. It spawns a container from the base image; it applies the following line in the
FROM 'scratch' indicates that the build starts from an empty image.
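The build process described in the comment above can be sketched as a loop. This is a toy model in Python under stated assumptions, not Docker's implementation: images are modelled as read-only mappings from paths to contents, containers as plain mutable dictionaries, and the names commit, build and SCRATCH are hypothetical.

```python
from types import MappingProxyType


def commit(container_fs: dict) -> MappingProxyType:
    """Freeze a container's file system into a read-only image (toy 'docker commit')."""
    return MappingProxyType(dict(container_fs))


def build(base_image, steps) -> MappingProxyType:
    """Toy 'docker build': for each step, spawn a container, apply the step, commit."""
    image = base_image
    for step in steps:
        container = dict(image)  # mutable working file system started from the image
        step(container)          # apply one Dockerfile-like instruction
        image = commit(container)
    return image


# FROM scratch: start from an empty image.
SCRATCH = MappingProxyType({})

image = build(
    SCRATCH,
    [
        lambda fs: fs.update({"/bin/tool": "v1"}),
        lambda fs: fs.update({"/app/run.sh": "echo hi"}),
    ],
)
```

In this model every step produces a new image object and the base image is never modified; attempting to assign into the resulting image raises a TypeError.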
Sec 3.2.2
- ".. Docker image we are able to reproduce the Docker image only deploying it, and thus without modifying any parameter in the image. "
Docker images are immutable.
- "Each Docker image layer installs or removes software packages."
The second problem concerns the proposed evaluation.
I understand that it is hard to assess the validity of DockerPedia.
Nevertheless, the current proposal is not satisfactory.
First of all, it does not establish that DockerPedia works. What it "demonstrates" is that Docker can replace VMs as an execution environment. This is to be expected, as Docker is becoming the de facto standard.
I think the authors should measure how much DockerPedia impacts the experimental workflow of a researcher.
How much does it cost to use DockerPedia in place of previous solutions?
Moreover, many claims were made about the storage wasted by VMs, but I could not find the actual measurements for the DockerPedia solution.
Finally, a side note regarding the results. The authors claim they were successful in 100% of the experiments they ran. Although this is possible,
it usually raises the doubt that the wrong instrument is being used to measure success. Research must be falsifiable, which requires understanding
the exact context in which it can be applied.
Two final minor remarks:
- In Section 2, the authors explain how they extended the WICUS ontology to describe docker images and steps in the Dockerfiles.
How does this differ from [29] "Describing Docker file in RDF"?
- In Section 4, Table 1 can be omitted, as lines 1 and 3 are always the same.