distribits #1
4 changed files with 406 additions and 0 deletions
1
distribits-2025/figures/ddorf.jpg
Symbolic link
@@ -0,0 +1 @@
../../.git/annex/objects/Vm/GP/SHA256E-s367193--38caa10cc8ee13d55f9e6721600e55d90c2511befca574eec955afdbb55b2ea1.jpg/SHA256E-s367193--38caa10cc8ee13d55f9e6721600e55d90c2511befca574eec955afdbb55b2ea1.jpg
1
distribits-2025/figures/individual-vs-template.png
Symbolic link
@@ -0,0 +1 @@
../../.git/annex/objects/KF/GV/SHA256E-s111766--1f456cb2a78d0d8c35418deb81a256b7481a130fd87216203c60874904bfc277.png/SHA256E-s111766--1f456cb2a78d0d8c35418deb81a256b7481a130fd87216203c60874904bfc277.png
1
distribits-2025/figures/templateflow_fig-templates.png
Symbolic link
@@ -0,0 +1 @@
../../.git/annex/objects/6q/7f/SHA256E-s1629740--60c77d1998ae4920616921eb3c86ee99cad93e3acf77ddfac9e06c832c5d2231.png/SHA256E-s1629740--60c77d1998ae4920616921eb3c86ee99cad93e3acf77ddfac9e06c832c5d2231.png
403
distribits-2025/index.qmd
Normal file
@@ -0,0 +1,403 @@
---
title: Compute on demand
subtitle: an fMRIPrep use case
author: "[Michał Szczepanik](https://mszczepanik.eu)"
institute: Forschungszentrum Jülich
date: 2025-10-24
format:
  revealjs:
    footer: "{{< meta title >}} - <https://distribits.live>"
    code-annotations: hover
---

# Introduction

## Special remotes

> Don't envision a special remote as merely a physical place or
> location -- a special-remote is a protocol that defines the
> underlying transport of your files to and/or from a specific
> location.
>
> --- DataLad Handbook, p. 194

To `get` files:

- download from S3, Nextcloud, web...
- extract from archive
- (re)create?!

## Independent implementations

In this talk:

- [git-annex compute](https://git-annex.branchable.com/special_remotes/compute/) (built-in) by JoeyH
- [DataLad remake](https://github.com/datalad/datalad-remake/) (extension / unreleased) by PsyInf

Prior art:

- [DataLad getexec](https://github.com/matrss/datalad-getexec) (extension / unreleased) by Matrss

## Credit

- git-annex & git-annex compute:
  - Joey Hess
- DataLad remake:
  - the Psychoinformatics Group (INM-7, FZ Jülich)
  - Christian Mönch, Gosia Wierzba, Michael Hanke
  - [eBRAIN-Health (HORIZON-INFRA-2021-TECH-01-01, grant no. 101058516)](https://cordis.europa.eu/project/id/101058516)

## Use cases

"Storage is cheap", right?

- provide data in alternative (file) formats (store CSV, provide XLSX on demand)
- render partial data for specific purposes (cut source video into clips)
- apply edits to a photo (RAW to JPEG)
- apply spatial transformations to fMRI images

## Example task (tutorial / comparison)

::: {.callout-note .incremental}
we'll do fMRI later
:::



```
gmic input image.jpg map_clut kodak_kodachrome_64 output kodachromed.jpg
```

::: footer
Photo by [Nicolas Peyrol](https://unsplash.com/@nicolaspeyrol)
on [Unsplash](https://unsplash.com/photos/city-skyline-under-blue-sky-during-daytime-l2VmsBG8nPE)
:::

# git-annex compute

## Compute program

```{.python code-line-numbers="7-9|12,14|16,18|20,22-25" filename="~/.local/bin/git-annex-compute-clut"}
#!/usr/bin/env python3
import argparse
import subprocess
import sys

parser = argparse.ArgumentParser()
parser.add_argument("input")
parser.add_argument("output")
parser.add_argument("clut")
args = parser.parse_args()

sys.stdout.write(f"INPUT {args.input}\n")
sys.stdout.flush()
input_file = sys.stdin.readline().rstrip()

sys.stdout.write(f"OUTPUT {args.output}\n")
sys.stdout.flush()
output_tempfile = sys.stdin.readline().rstrip()

subprocess.run(
    [
        "gmic",
        "input", input_file,
        "map_clut", args.clut,
        "output", output_tempfile,
    ],
    check=True,
)
```
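
The exchange in the program above (write one request line, read one reply line) can be tried out without git-annex. The following is a minimal sketch, assuming only the two request types used on this slide; `answer_request`, `known_inputs` and `out_dir` are hypothetical names, and real git-annex does considerably more than this stand-in.

```python
import os
import tempfile


def answer_request(line: str, known_inputs: dict[str, str], out_dir: str) -> str:
    """Reply to one request line the way the slide's program expects.

    Hypothetical stand-in for the git-annex side, not its real logic.
    """
    verb, _, name = line.rstrip("\n").partition(" ")
    if verb == "INPUT":
        # Reply with a path from which the input's content can be read.
        return known_inputs[name]
    if verb == "OUTPUT":
        # Reply with a path the program should write its result to.
        return os.path.join(out_dir, name)
    raise ValueError(f"unsupported request: {line!r}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inputs = {"foo.jpg": "/path/to/annexed/content.jpg"}  # made-up path
        print(answer_request("INPUT foo.jpg\n", inputs, tmp))
        print(answer_request("OUTPUT foo_k64.jpg\n", inputs, tmp))
```

Feeding these replies to the program's stdin is enough to check which request lines it emits and in what order.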

## In action

prerequisites (to enable remote)

```
git config --global annex.security.allowed-compute-programs \
    git-annex-compute-clut
```

usage

```
git annex initremote clut type=compute program=git-annex-compute-clut
git annex addcomputed \
    [--fast] \
    [--reproducible] \
    --to clut \
    foo.jpg foo_k64.jpg kodak_kodachrome_64
```

# DataLad remake

## Compute template
```{.toml code-line-numbers="1|2-6" filename=".datalad/make/methods/clut.toml"}
parameters = ["in", "out", "clut"]
command = [
    "gmic",
    "input", "{in}",
    "map_clut", "{clut}",
    "output", "{out}"
]
```
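
How the `{in}`, `{out}` and `{clut}` placeholders turn into a concrete command can be illustrated with plain string formatting. This is a sketch under assumptions, not datalad-remake's actual implementation: the template is mirrored as a Python dict rather than parsed from TOML, and `render_command` is a hypothetical helper.

```python
# The template from the slide, mirrored as a Python dict (assumption:
# a TOML parser would yield the same structure).
TEMPLATE = {
    "parameters": ["in", "out", "clut"],
    "command": [
        "gmic",
        "input", "{in}",
        "map_clut", "{clut}",
        "output", "{out}",
    ],
}


def render_command(template: dict, values: dict[str, str]) -> list[str]:
    """Substitute parameter values into each element of the command list."""
    missing = set(template["parameters"]) - set(values)
    if missing:
        raise KeyError(f"missing parameters: {sorted(missing)}")
    # format_map accepts "in" as a key even though it is a Python keyword.
    return [part.format_map(values) for part in template["command"]]


if __name__ == "__main__":
    print(render_command(
        TEMPLATE,
        {"in": "foo.jpg", "out": "foo_k64.jpg", "clut": "kodak_kodachrome_64"},
    ))
```

Keeping the command as a list of separate arguments, as the template does, avoids shell quoting issues once values are filled in.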

## In action

prerequisites

```
git config --global --add datalad.make.trusted-keys <key-id>
```

make

```
datalad -c commit.gpgsign=true save -m "Add compute template"
datalad -c commit.gpgsign=true make \
    -i foo.jpg \
    -o foo_k64.jpg \
    -p in=foo.jpg \
    -p out=foo_k64.jpg \
    -p clut=kodak_kodachrome_64 \
    clut.toml
```

## What gets recorded?

```{.json filename=".datalad/make/specifications/06a6ca0708e839a5ecea95d6d1bed9a3"}
{
  "input": [
    "foo.jpg"
  ],
  "method": "clut.toml",
  "output": [
    "foo_k64.jpg"
  ],
  "parameter": {
    "clut": "kodak_kodachrome_64",
    "in": "foo.jpg",
    "out": "foo_k64.jpg"
  },
  "stdout": null
}
```

```
datalad-remake:///?label=clut.toml
&root_version=0dc52b9eeca5838144ad07c3766cdc4ef84c37cf
&specification=06a6ca0708e839a5ecea95d6d1bed9a3
&this=foo_k64.jpg
```
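
The URL above is an ordinary query string once its display line breaks are removed, so the standard library can take it apart. A small sketch; `parse_remake_url` is a hypothetical helper, and the only interpretation offered here is that the `specification` value matches the specification file name shown on the slide.

```python
from urllib.parse import parse_qs, urlsplit

# The URL from the slide, joined back into a single line.
URL = (
    "datalad-remake:///?label=clut.toml"
    "&root_version=0dc52b9eeca5838144ad07c3766cdc4ef84c37cf"
    "&specification=06a6ca0708e839a5ecea95d6d1bed9a3"
    "&this=foo_k64.jpg"
)


def parse_remake_url(url: str) -> dict[str, str]:
    """Split a datalad-remake URL into its query fields."""
    parts = urlsplit(url)
    if parts.scheme != "datalad-remake":
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # parse_qs returns a list per key; each key occurs once here.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}


if __name__ == "__main__":
    for key, value in parse_remake_url(URL).items():
        print(f"{key}: {value}")
```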

# Comparison

## compute & remake

|               | git-annex compute               | datalad remake                     |
|---------------|---------------------------------|------------------------------------|
| specification | program / protocol              | template / config                  |
| branch        | git-annex                       | main (git-annex URL)               |
| provision     | paths to (annex) objects        | secondary git worktree (in `/tmp`) |
| trust         | executable in PATH + git config | signed commit + git config         |
| submodules    | one repo                        | subdatasets                        |
| reproducible  | option                          | only                               |

## what about `datalad run` / `rerun`?

:::: {.columns}

::: {.column width="50%"}
Run record:

- stored in commit message
- used by `rerun`
- may commit
- uses branches (default: current HEAD)
- provenance capture
:::

::: {.column width="50%"}
Make spec:

- stored in file
- used by `get`
- never commits, "slow download"
- always temporary worktree, past state
- storage reduction
- more flexible
:::

::::

# fMRIPrep

## Motivation: spatial normalization

::: {layout-ncol=2}
{width=300}

{width=400}
:::

transforms: slow to compute ‧ small to store ‧ quick to apply

::: footer
left image: adapted from [fMRIPrep](https://fmriprep.org/en/stable/) docs;
right image: [TemplateFlow](https://www.templateflow.org/) docs
:::

## Enablers

:::: {.columns}

::: {.column width="50%"}

- BIDS
  - Brain Imaging Data Structure
  - standardized file names
  - sidecar metadata

:::

::: {.column width="50%"}

- fMRIPrep
  - state-of-the-art data preprocessing pipeline
  - made for BIDS
  - widely adopted
  - easy to select templates
  - modular (Nipype)

:::

::::

## Dataset

Output of `datalad run fmriprep ...`

``` {.txt code-line-numbers=false code-line-numbers="|2-3|5,7|8-10|8,11"}
[DS~0] /tmp/ds005479-remake-demo
├── inputs/
│   └── [DS~1] ds005479/
└── sub-01/
    ├── anat/
    │   ├── preproc_T1w.nii.gz
    │   ├── from-T1w_to-MNI152NLin2009cAsym_xfm.h5  # 90 MB
    └── func/
        ├── from-boldref_to-T1w_desc-coreg_xfm.txt  # 369 B
        ├── from-orig_to-boldref_desc-hmc_xfm.txt  # 84 kB
        └── space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz  # 432 MB 🖜
```

<https://hub.datalad.org/mslw/ds005479-remake-demo>

## Code

```{.python filename="code/resample.py"}
from fmriprep.workflows.bold.apply import init_bold_volumetric_resample_wf

# Step 1: figure out data dependencies, parameters

# Step 2: set up workflow

# Step 3: connect inputs

# Step 4: run the pipeline

# Step 5: tweak file header
```

ca. 360 LOC

<https://hub.datalad.org/mslw/fmriprep-resampling>

[with thanks to Chris Markiewicz for suggestions]{style="font-size: 50%;"}

## Compute template

```{.toml filename=".datalad/make/methods/shortcut.toml"}
parameters = ["target_file"]
command = [
    "python",
    "code/resample.py",
    "{target_file}",
    "inputs/ds005479",
    "."
]
```

::: {.callout-tip}
use pinned (locked) requirements / run inside container
:::

## Data dependencies

```{.txt code-line-numbers=false filename=".datalad/make/inputs/sub-01_task-MID_space-MNI152NLin2009cAsym"}
inputs/ds005479/sub-01/func/sub-01_task-MID_bold.json
inputs/ds005479/sub-01/func/sub-01_task-MID_bold.nii.gz
sub-01/anat/sub-01_from-T1w_to-MNI152NLin2009cAsym_mode-image_xfm.h5
sub-01/fmap/sub-01_fmapid-auto00000_desc-coeff_fieldmap.nii.gz
sub-01/fmap/sub-01_fmapid-auto00000_desc-epi_fieldmap.nii.gz
sub-01/fmap/sub-01_fmapid-auto00000_desc-preproc_fieldmap.json
sub-01/func/sub-01_task-MID_desc-hmc_boldref.nii.gz
sub-01/func/sub-01_task-MID_from-boldref_to-T1w_mode-image_desc-coreg_xfm.txt
sub-01/func/sub-01_task-MID_from-boldref_to-auto00000_mode-image_xfm.txt
sub-01/func/sub-01_task-MID_from-orig_to-boldref_mode-image_desc-hmc_xfm.txt
sub-01/func/sub-01_task-MID_space-MNI152NLin2009cAsym_boldref.nii.gz
sub-01/func/sub-01_task-MID_space-MNI152NLin2009cAsym_desc-brain_mask.nii.gz
sub-01/func/sub-01_task-MID_space-MNI152NLin2009cAsym_desc-preproc_bold.json
```

::: {.callout-note}
this file is temporary and need not be committed
:::

## Prospective instruction

create:
``` {.bash code-line-numbers=false}
TARGET=sub-01_task-MID_space-MNI152NLin2009cAsym

datalad make \
    --prospective-execution \
    --input-list .datalad/make/inputs/${TARGET} \
    --output sub-01/func/${TARGET}_desc-preproc_bold.nii.gz \
    --parameter target_file=sub-01/func/${TARGET}_desc-preproc_bold.nii.gz \
    shortcut.toml
```

then:
``` {.bash code-line-numbers=false}
datalad drop ...
datalad get -s datalad-remake-auto ...  # a few minutes
```
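
For more than one target, the `datalad make` call above could be assembled programmatically. A hedged sketch: it uses only the flags shown on this slide, while the `make_command` helper and the assumption that other subjects follow the same `<subject>/func/` layout are illustrative, not taken from the demo dataset.

```python
def make_command(subject: str, target: str, template: str = "shortcut.toml") -> list[str]:
    """Build the argument list for one prospective `datalad make` call."""
    # Assumption: outputs live under <subject>/func/, as for sub-01 above.
    output = f"{subject}/func/{target}_desc-preproc_bold.nii.gz"
    return [
        "datalad", "make",
        "--prospective-execution",
        "--input-list", f".datalad/make/inputs/{target}",
        "--output", output,
        "--parameter", f"target_file={output}",
        template,
    ]


if __name__ == "__main__":
    cmd = make_command("sub-01", "sub-01_task-MID_space-MNI152NLin2009cAsym")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the dataset root.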

## Worth it?

- kept 100 MB, dropped 430 MB --- 300 MB gain
- × 2--3 output spaces --- 0.5--1 GB gain
- × 2--3 runs --- 2--3+ GB gain
- × 50 subjects --- 0.1 TB for an average study
  - noticeable chunk of a project quota
  - even more for large projects
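
The estimate above can be reproduced as plain arithmetic. A rough sketch that starts from the slide's rounded figure of 300 MB per file and multiplies through the ranges in the list; the function name and the decimal MB-to-GB conversion are assumptions made for illustration.

```python
MB_PER_FILE = 300  # rounded per-file gain from the slide
SPACES = (2, 3)    # output spaces per run (low, high)
RUNS = (2, 3)      # runs per subject (low, high)
SUBJECTS = 50


def gain_range_gb(mb_per_file, spaces, runs, subjects):
    """Return the (low, high) storage gain in GB, with 1 GB = 1000 MB."""
    low = mb_per_file * spaces[0] * runs[0] * subjects / 1000
    high = mb_per_file * spaces[1] * runs[1] * subjects / 1000
    return low, high


per_subject = gain_range_gb(MB_PER_FILE, SPACES, RUNS, 1)
study = gain_range_gb(MB_PER_FILE, SPACES, RUNS, SUBJECTS)

if __name__ == "__main__":
    print(f"per subject: {per_subject[0]}-{per_subject[1]} GB")
    print(f"50 subjects: {study[0]}-{study[1]} GB")
```

The study-level range of 60 to 135 GB brackets the slide's figure of about 0.1 TB.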

# Coda

## Limitations

- hard to debug (code runs inside special remote)
- very situational
- many-to-many not efficient
- not tested at scale
- the remake that never was:
  - CWL
  - metadata instead of Git repo

## Key messages

- DataLad and git-annex now provide compute-on-demand
- room for further development
- praise fMRIPrep for reproducibility and modularity
- have fun with your examples