Recompute analysis results in dockerized environment #24

Merged
mih merged 6 commits from docker into master 2023-10-12 17:06:29 +00:00

6 commits

Author SHA1 Message Date
57d2565ab7 [DATALAD RUNCMD] Recompute analysis results in dockerized environment
A containerized environment for the analysis became necessary because,
3+ years after the "final" results were originally computed, it is
getting difficult to recreate a matching computational environment.

Even with pinned versions of essential software dependencies,
incompatibilities with modern Python versions are slowly emerging.

The container setup used for this recomputation is the result of a
detailed exploration of the effect of software versions and deployment
methods. A report is provided at
https://github.com/psychoinformatics-de/paper-remodnav/issues/20

Importantly, the employed setup is NOT capable of yielding exactly
identical results. While all statistical scores reported in the paper
remain identical, there is a visually small change to one
histogram panel in Fig 4. The change is illustrated at
https://github.com/psychoinformatics-de/paper-remodnav/issues/20#issuecomment-1757462683

Given the overall state of reproducibility, and the anticipated
longevity of the containerized computation, we decided that this small
difference with respect to the journal publication is tolerable.

This changeset supports a DataLad-based re-execution (for verification):

```
datalad rerun <commitsha>
```

After this changeset, a complete manuscript can also be compiled
via DataLad:

```
datalad containers-run -n docker-make main.pdf
```

By default, this uses the local Python installation via `python` to
orchestrate Docker. If Python is available under a different name,
override it, for example, via:

```
datalad -c datalad.run.substitutions.python=python3 rerun <commitsha>
```
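
The effect of the `python` substitution on the recorded command can be
illustrated with a minimal sketch. It assumes that the `{python}`
placeholder behaves like a named Python format field; the `expand`
helper and the shortened command template are hypothetical and only
serve the illustration, they are not DataLad's implementation.

```python
# Shortened version of the recorded command; the real record also
# contains the `sh -c "..."` payload that runs make in the container.
CMD_TEMPLATE = "{python} -m datalad_container.adapters.docker run container/image"


def expand(template: str, **substitutions: str) -> str:
    """Fill named placeholders in a command template (illustrative helper)."""
    return template.format(**substitutions)


# Default: the interpreter is called `python`.
default_cmd = expand(CMD_TEMPLATE, python="python")

# Override: the interpreter is only available as `python3`.
override_cmd = expand(CMD_TEMPLATE, python="python3")

print(default_cmd)
print(override_cmd)
```

Only the interpreter name at the start of the command changes; the
rest of the recorded command stays the same.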

Closes #20

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "{python} -m datalad_container.adapters.docker run container/image sh -c \"mkdir /tmp/dockertmp; HOME=/tmp/dockertmp make -f Docker-Makefile clean results_def.tex && rm -rf /tmp/dockertmp\"",
 "dsid": "c5a79271-7d24-42aa-a0cf-38d84fd15eaa",
 "exit": 0,
 "extra_inputs": [
  "container/image"
 ],
 "inputs": [
  "remodnav/remodnav/tests/data/anderson_etal",
  "data/studyforrest-data-eyemovementlabels/sub-*/*.tsv",
  "data/raw_eyegaze/sub-*/ses-movie/func/*_recording-eyegaze_physio.tsv.gz",
  "data/raw_eyegaze/sub-*/beh/*_recording-eyegaze_physio.tsv.gz"
 ],
 "outputs": [
  "img",
  "results_def.tex"
 ],
 "pwd": "."
}
^^^ Do not change lines above ^^^
2023-10-12 15:58:02 +02:00
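
The run record embedded in a commit message like the one above is plain
JSON between two delimiter lines, so it can be inspected without
DataLad. The following is a minimal sketch under that assumption; the
`extract_run_record` helper and the abbreviated example message are
hypothetical and not part of this changeset.

```python
import json

# Delimiter lines that enclose the JSON record in the commit message.
START = "=== Do not change lines below ==="
END = "^^^ Do not change lines above ^^^"


def extract_run_record(message: str) -> dict:
    """Return the JSON run record found between the delimiter lines."""
    start = message.index(START) + len(START)
    end = message.index(END, start)
    return json.loads(message[start:end])


# Abbreviated, made-up commit message in the same layout as above.
example_message = """[DATALAD RUNCMD] Example

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "make results_def.tex",
 "exit": 0,
 "inputs": ["data"],
 "outputs": ["img", "results_def.tex"],
 "pwd": "."
}
^^^ Do not change lines above ^^^
"""

record = extract_run_record(example_message)
print(record["cmd"])
print(record["outputs"])
```

In a real clone, the message could be obtained with
`git log -1 --format=%B <commitsha>` and passed to the same helper.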
367bbeead0 [DATALAD] Configure containerized environment 'docker-make'
This can be used with `containers-run` and the normal "make" targets.
The targets are taken from the Docker-Makefile, and their execution
actually takes place inside the container. For example:

```
datalad containers-run -n docker-make main.pdf
```
2023-10-12 15:57:25 +02:00
3dd49e368d [DATALAD RUNCMD] Build docker image with analysis environment
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "sh -c 'rm -rf container/image; docker build -t remodnav:latest container && python -m datalad_container.adapters.docker save remodnav:latest container/image && echo '\"'\"'**/*json annex.largefiles=nothing\\nrepositories annex.largefiles=nothing\\n**/VERSION annex.largefiles=nothing'\"'\"' > container/image/.gitattributes'",
 "dsid": "c5a79271-7d24-42aa-a0cf-38d84fd15eaa",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [
  "container/Dockerfile"
 ],
 "outputs": [
  "container/image"
 ],
 "pwd": "."
}
^^^ Do not change lines above ^^^
2023-10-11 20:18:59 +02:00
92dbad28ed Avoid declaring an ENV override that lasts until container runtime
We only need it at installation time.
2023-10-11 20:11:16 +02:00
70c9de37ac Prevent undesired annexing of file content 2023-10-11 20:11:16 +02:00
5468b6a9cc [DATALAD] new dataset 2023-10-11 19:32:55 +02:00