datalad-course/html/ohbm26-brainhack.html
Adina Wagner 86bf5012f4 fix year
2026-06-11 08:10:23 +02:00


<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
<!-- Edit me start! -->
<title>OHBM 2026 Traintrack</title>
<meta name="description" content=" OHBM 2026 Traintrack ">
<meta name="author" content=" Adina Wagner ">
<!-- Edit me end! -->
<link rel="stylesheet" href="../reveal.js/dist/reset.css">
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
<link rel="stylesheet" href="../css/main.css">
<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
</head>
<body>
<div class="reveal">
<div class="slides">
<!-- Start of slides -->
<section>
<section>
<h2>2026 OHBM Brainhack-Traintrack<br />👩‍💻👨‍💻<br />RDM & Version Control</h2>
<div style="margin-top:1em;text-align:center">
<table style="border: none;">
<tr>
<td>
Adina Wagner<br><small><img data-src="../pics/mastodon.svg" style="height:30px;margin:0px" /> <a href="https://mas.to/@adswa" target="_blank">@adswa@mas.to</a></small>
</td>
</tr>
<tr>
<td>
<img style="height:70px;margin-right:10px;vertical-align:middle" data-src="../pics/fzj_logo.svg" /><br>
</td>
<td style="vertical-align:middle">
<small><br> Institute of Neuroscience and Medicine (INM-7)<br>
<strong>Research Center Jülich</strong></small><br>
</td>
</tr>
</table>
</div>
<p style="z-index: 100;position: fixed;background-color:#ede6d5;font-size:35px;box-shadow: 10px 10px 8px #888888;margin-top:0px;margin-bottom:100px;margin-left:1000px">
<img src="../pics/ohbm26.png" height="200">
</p><small>
<br>
Slides: <a href="https://files.inm7.de/adina/talks/html/ohbm26-brainhack.html#/" target="_blank">files.inm7.de/adina/talks/html/ohbm26-brainhack.html</a>
</small>
</section>
</section>
<section>
<section>
<h2>RDM - what and why?</h2>
<div class="r-stack">
<p class="fragment fade-out" data-fragment-index="1">
<img width="1200" src="../pics/turingway_rdm.png">
<br><br>
... ideally, as <a href="https://www.go-fair.org/fair-principles/" target="_blank">F.A.I.R.</a> (<a href="https://doi.org/10.1038/sdata.2016.18" target="_blank">Wilkinson et al., 2016</a>) as possible
</p>
<img class="fragment fade-in-then-out" data-fragment-index="1" width="700" src="../pics/carrotsandsticks.png">
<img class="fragment fade-in-then-out" data-fragment-index="2" height="700" src="../pics/sidney_harris_miracle.jpg">
<p> <img class="fragment fade-in-then-out" data-fragment-index="3" src="../pics/reallifeexample.png" style="box-shadow: 10px 10px 8px #888888;height:200px" height="200"><br>
<img class="fragment fade-in-then-out" data-fragment-index="3" src="../pics/drive.png" height="300"><br></p>
<img class="fragment fade-in-then-out" data-fragment-index="4" src="../pics/ownlegacycode_phd.png" height="400">
<img class="fragment fade-in-then-out" data-fragment-index="5" src="../pics/frustration2.gif" height="200"><br>
</div>
<div class="r-stack">
<p class="fragment fade-in-then-out" data-fragment-index="1">
Funders & publishers require it <br></p>
<p class="fragment fade-in-then-out" data-fragment-index="2">
Scientific peers & the public increasingly expect it</p>
<p class="fragment fade-in-then-out" data-fragment-index="3">
Win over academic staff (librarians, system administrators)</p>
<p class="fragment fade-in-then-out" data-fragment-index="4">
Your future self will be grateful</p>
<p class="fragment fade-in-then-out" data-fragment-index="5">
Without good RDM, any project becomes dreadful.</p>
</div>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div>
You write a paper and stay up late to generate nice figures.
But you have to tweak parameters & display options to make it work AND look good.
<br>The next morning, you have <strong>no idea which parameters produced which figures, and which of the figures
matches what you report in the paper</strong>.<br>
<img height="400" src="../pics/turingway/findfiles.png">
<img height="400" src="../pics/turingway/projectstack.png"></div>
<imgcredit>Illustration adapted from Scriberia and The Turing Way</imgcredit>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div>
Your research project produces phenomenal results, but your laptop,
<strong>the only place that stores the source code </strong> for the results, is
<strong>stolen/breaks</strong> <br>
<img height="700" src="../pics/stolenlaptop.jpg"></div>
<imgcredit>https://co.pinterest.com/pin/551128073121451139/</imgcredit>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div>
A student approaches their supervisor and complains that the
research idea does not work. But <strong>oral communication doesn't help</strong> - the
student can't sufficiently explain their environment (data, algorithms,
...), and <strong>if the supervisor can't explore the student's project, there's no way to find a fix.</strong>
<br>
<img height="500" src="../pics/badsupervision.gif"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div>
A postdoc wrote a script during their PhD that applied a specific
method to a dataset. Now, with new data and a new project, they
try to reuse the script, but <strong>forgot how it worked</strong>.
<br>
<img height="500" src="../pics/frustration.jpg"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div>
You try to recreate results from another lab's published paper.
You base your <strong>re-implementation</strong> on everything reported in their paper,
but the results you obtain <strong>look nothing like the original</strong>.
<br>
<img height="500" src="../pics/turingway/ReadableCode.png"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>
<section>
<h2>Sounds familiar?</h2>
Show of hands: Have you encountered any of these in your work so far?
<br><br>
<div class="fragment fade-in">
All these problems were paraphrased from
<a href="https://sci-hub.se/https://link.springer.com/chapter/10.1007%2F978-1-4612-2544-7_5" target="_blank">
Buckheit & Donoho, <b>1995</b></a>
<br><br><br>
<img src="../pics/munafo_nathumbehav_screenshot.png" style="box-shadow: 10px 10px 8px #888888;height:400px" height="400"><br>
</div>
<!--<table>
<tr>
<td style="width:40%">
<ol>
<li>Forgot how own results were generated</li>
<li>Lost single source of data</li>
<li>Miscommunication about analysis with supervisor</li>
<li>Can't get previous code to run</li>
<li>Failure to reproduce other's work</li>
<li>Something else related to reproducibility</li>
</ol>
</td>
<td>
<iframe src="https://directpoll.com/r?XDbzPBd3ixYqg80MlfkonQrATgiLIMEH4Ji3DzR6Pqd3m",
style="border: 0" width="1500" height="700"></iframe>
</td>
</tr>
</table>-->
</section>
</section>
<section>
<section data-transition="None">
<h2>The road to reproducibility</h2>
<img src="../pics/reproduciblejourney.png">
<imgcredit>CC-BY Scriberia and <a href="https://the-turing-way.netlify.app/reproducible-research/rdm.html" target="_blank">
The Turing Way</a>
</imgcredit>
</section>
<section data-transition="None">
<dl>
<dt>The building blocks of a scientific result are rarely static</dt>
<table>
<tr>
<td style="vertical-align:middle">Analysis code evolves<br>
<small>(Fix bugs, add functions,
refactor, ...)</small></td>
<td>
<img src="../pics/final.png" height="500">
<imgcredit>Based on Piled Higher and Deeper
<a href="https://phdcomics.com/comics/archive_print.php?comicid=1531" target="_blank">
1531
</a> </imgcredit></td>
</tr>
</table>
</dl>
<img src="../pics/findfiles.png" height="400">
<img src="../pics/projectstack.png" height="350">
<imgcredit >Scriberia and <a href="https://the-turing-way.netlify.app">The Turing Way </a> (CC-BY)</imgcredit>
</section>
<section data-transition="None">
<h2>Version control</h2>
<table>
<tr>
<td>
<img src="../pics/turingway/ProjectHistory.png" width="500">
<imgcredit><a href="https://the-turing-way.netlify.app/reproducible-research/vcs/vcs-data.html" target="_blank">
CC-BY Scriberia & The Turing Way</a>
</imgcredit>
</td>
<td>
<ul style="font-size:35px">
<li>keep things organized</li>
<li>track changes, revert them or go <br>
back to previous states</li>
<li>collect and share digital provenance</li>
<li>collaborate and distribute</li>
<li>industry standard: <a href="https://git-scm.com" target="_blank">Git</a></li>
</ul>
</td>
</tr>
</table>
<img src="../pics/git.png" height="100px">
<img src="../pics/git-paper.png">
</section>
<section data-transition="None">
<dl>
<dt>The building blocks of a scientific result are rarely static</dt>
<table>
<tr>
<td style="vertical-align:middle">Data changes <br>
<small>(errors are fixed, data is extended,<br>
naming standards change, an analysis <br>
requires only a subset of your data...)</small></td>
<td>
<div class="r-stack">
<img src="../pics/phd052810s.png" height="400">
</div>
<imgcredit>Piled Higher and Deeper
<a href="https://phdcomics.com/comics/archive_print.php?comicid=1323" target="_blank">
1323
</a> </imgcredit></td>
</tr>
</table>
</dl>
<p class="fragment fade-in" data-fragment-index="2">
Large data version control (e.g., <a href="https://git-annex.branchable.com" target="_blank">git-annex</a>,
<a href="https://datalad.org" target="_blank">DataLad</a>)</p>
<div class="r-stack">
<img class="fragment fade-in" data-fragment-index="2" src="../pics/tigdata.png">
<img class="fragment fade-in" data-fragment-index="3" src="../pics/tigdata3.png">
<img class="fragment fade-in" data-fragment-index="4" src="../pics/tigdata2.png">
</div>
</section>
<section data-transition="None">
<h2>Leaving a trace </h2>
<div class="r-stack">
<p class="fragment fade-out" data-fragment-index="1">"Shit, which version of which script produced these outputs from which version
of what data?"</p>
<p class="fragment fade-in" data-fragment-index="1">
"Shit, which buttons did I click and in which order did I use all those tools?"</p>
</div>
<div class="r-stack">
<p>
<img class="fragment fade-in-then-out" data-fragment-index="1" src="../pics/manuallabor.png">
<img class="fragment fade-out" data-fragment-index="2" src="../pics/findfiles.png" height="300">
<img class="fragment fade-out" data-fragment-index="2" src="../pics/projectstack.png" height="300">
<imgcredit>CC-BY Scriberia and <a href="https://the-turing-way.netlify.app/reproducible-research/rdm.html" target="_blank">
The Turing Way</a>
</imgcredit>
</p>
<p>
<img class="fragment fade-in" data-fragment-index="2" height="200px" src="../pics/file-management-manual-with-text.png">
<img class="fragment fade-in" data-fragment-index="3" height="200px" src="../pics/documentation.png">
<img class="fragment fade-in" data-fragment-index="4" height="200px" src="../pics/turingway/MachineReadable.png">
</p>
</div>
<div style="font-size:30px">
<p class="fragment fade-in" data-fragment-index="2">1) Create an intuitive structure (<a href="https://bids-specification.readthedocs.io/en/stable/" target="_blank">BIDS</a>!!!), and </p>
<p class="fragment fade-in" data-fragment-index="3">2) write (plenty! of) documentation as you go, and<br></p>
<p class="fragment fade-in" data-fragment-index="4">
3) make your processes machine-readable <br><small>Tools and tricks: Perkel, 2020,
<a href="https://www.nature.com/articles/d41586-020-02462-7" target="_blank">
checklist for computational reproducibility
</a></small>
</p></div>
</section>
<section data-transition="None">
<h2>Methods documentation and provenance</h2>
Analytic flexibility leads to sizeable variations in results
<br><small>(see, e.g., Carp, 2012 and Botvinik-Nezer et al., 2020 for examples from neuroimaging)</small><br>
<img src="../pics/sidney_harris_miracle.jpg" style="box-shadow: 10px 10px 8px #888888;height:500px" height="500"><br>
<ul>
<li>provide information on how data came into existence</li>
<li>change data through documented code, not manually</li>
<li>relate changes in data to changes in code</li>
</ul>
</section>
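<section data-transition="None" data-markdown>
<textarea data-template>
## A provenance record, sketched

The three points on the previous slide can be illustrated with a minimal Python sketch. It is not how any particular tool implements provenance capture: `run_with_record`, the record layout, and the toy sorting step are assumptions made up for this example. The idea is only to store the command together with checksums of its inputs and outputs.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_with_record(cmd, inputs, outputs, record_file):
    """Run cmd, then store what ran on which inputs and what it produced."""
    record = {"cmd": cmd, "inputs": {str(p): sha256(p) for p in inputs}}
    subprocess.run(cmd, check=True)  # change data through code, not manually
    record["outputs"] = {str(p): sha256(p) for p in outputs}
    Path(record_file).write_text(json.dumps(record, indent=2))
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw, clean = Path(tmp, "raw.txt"), Path(tmp, "clean.txt")
        raw.write_text("b\na\n")
        # Hypothetical processing step: sort the lines of raw.txt into clean.txt
        code = "import sys; open(sys.argv[2], 'w').writelines(sorted(open(sys.argv[1])))"
        rec = run_with_record([sys.executable, "-c", code, str(raw), str(clean)],
                              [raw], [clean], Path(tmp, "record.json"))
        print(json.dumps(rec, indent=2))
```

Keeping such a record under version control next to the data is one way to relate a change in data to the code that caused it.
</textarea>
</section>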
</section>
<section>
<section data-transition="None">
<img style="height:150px;margin-bottom:30px" data-src="../pics/datalad_logo_wide.svg"><br>
<ul style="font-size:37px">
<li>Domain-agnostic <strong>command-line tool</strong> (+ <strong>graphical user interface</strong>),
built on top of <a href="https://git-scm.com/" target="_blank">Git</a>
& <a href="https://git-annex.branchable.com/" target="_blank">git-annex</a></li>
<li>10+ year open source project (100+ contributors), available for all major OS</li>
<li>Major features:</li>
<dt>Version-controlling arbitrarily large content </dt>
<dd>Version control data & software alongside code!</dd>
<dt>Transport mechanisms for sharing, updating & obtaining data </dt>
<dd>Consume & collaborate on data (analyses) like software</dd>
<dt>(Computationally) reproducible data analysis</dt>
<dd>Track and share provenance of all digital objects</dd>
<dt>(... and <i>much</i> more) </dt>
<br>
</ul>
</section>
<section data-transition="None">
<h3>Examples of what DataLad can be used for:</h3>
<ul>
<li>
Install a dataset to hack on from <a href="https://openneuro.org/" target="_blank">OpenNeuro</a>:
</li>
</ul>
<img height="500" src="../pics/openneuro_new_2.gif" alt="a screenrecording of browsing open neuro">
</section>
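<section data-transition="None" data-markdown>
<textarea data-template>
## Installing a dataset, sketched

A rough sketch of the same step with DataLad's Python API (`datalad.api.clone` and `Dataset.get`). The helpers `openneuro_url` and `install` are made up for this example, `ds000001` only serves as an example accession number, and the sketch assumes the dataset is mirrored under `github.com/OpenNeuroDatasets`.

```python
def openneuro_url(accession):
    """Build the GitHub mirror URL of an OpenNeuro dataset (e.g., 'ds000001')."""
    return f"https://github.com/OpenNeuroDatasets/{accession}.git"


def install(accession, path=None, get=None):
    """Clone a dataset; optionally retrieve file content for the paths in `get`."""
    # Imported here so that the helper above also works without DataLad installed
    import datalad.api as dl

    ds = dl.clone(source=openneuro_url(accession), path=path or accession)
    if get:
        ds.get(get)  # file content is only downloaded on demand
    return ds


if __name__ == "__main__":
    # Prints the URL only; call install("ds000001") to clone (needs network access)
    print(openneuro_url("ds000001"))
```

Cloning is fast because it transfers file names and history first; file content follows when you ask for it.
</textarea>
</section>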
<section data-transition="None">
<h3>Examples of what DataLad can be used for:</h3>
<ul>
<li>
Or <b>publish or consume datasets</b> from Git forges - or many other services
</li>
</ul>
<img height="500" src="../pics/getdata_studyforrest.gif" alt="a screenrecording of cloning studyforrest data from github">
</section>
<section data-transition="None">
<h3>Examples of what DataLad can be used for:</h3>
<ul>
<li>
<b>Create and share reproducible science</b>: Data, software, code, and provenance
</li>
</ul>
<img height="500" src="../pics/remodnavpaper_2.gif" alt="a screenrecording of cloning REMODNAV paper dataset from github">
</section>
<section>
<h2>Further Information</h2>
<ul>
<br>
<ul style="font-size:30px">
<dt>Reach out to the <b>DataLad</b> team via</dt>
<li>
<a href="https://matrix.to/#/!NaMjKIhMXhSicFdxAj:matrix.org?via=matrix.waite.eu&via=matrix.org&via=inm7.de" target="_blank">
Matrix</a> (free, decentralized communication platform; no app needed).
We run a weekly Zoom office hour (Monday, 2pm Berlin time) from this room as well.
</li>
<li>the development repositories on
<a href="https://github.com/datalad/datalad" target="_blank">
<s>GitHub</s></a> Codeberg: <a href="https://codeberg.org/datalad">codeberg.org/datalad</a>
</li>
<li><strong>Talk to me, Michael, Stephan or Yarik right here, or hack with us!</strong></li>
<br>
<dt>Reach out to the user community with</dt>
<li>A question on <a href="https://neurostars.org/" target="_blank">neurostars.org</a>
with a <code>datalad</code> tag</li>
<br>
<dt>Find more user tutorials or workshop recordings</dt>
<li>On DataLad's YouTube channel <a href="https://www.youtube.com/channel/datalad" target="_blank">
(www.youtube.com/channel/datalad) </a>
</li>
<li>
In the DataLad Handbook <a href="http://handbook.datalad.org/en/latest/" target="_blank">
(handbook.datalad.org)</a>
</li>
<li>In the DataLad RDM course <a href="https://psychoinformatics-de.github.io/rdm-course/" target="_blank">
(psychoinformatics-de.github.io/rdm-course)</a> </li>
<li>In the official API documentation <a href="http://docs.datalad.org" target="_blank">
(docs.datalad.org)</a> </li>
<br>
<li>On the advantages of decentralized research data management:
<a href="https://www.degruyter.com/document/doi/10.1515/nf-2020-0037/html" target="_blank">
doi.org/10.1515/nf-2020-0037
</a></li>
</ul>
</ul>
<br>
Install it on your own hardware: <a href="http://handbook.datalad.org/r.html?install" target="_blank">handbook.datalad.org/r.html?install</a>
</section>
</section>
<!-- End of slides -->
</div>
</div>
<script src="../reveal.js/dist/reveal.js"></script>
<script src="../reveal.js/plugin/notes/notes.js"></script>
<script src="../reveal.js/plugin/markdown/markdown.js"></script>
<script src="../reveal.js/plugin/highlight/highlight.js"></script>
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
hash: true,
// The "normal" size of the presentation, aspect ratio will be preserved
// when the presentation is scaled to fit different resolutions. Can be
// specified using percentage units.
width: 1280,
height: 960,
// Factor of the display size that should remain empty around the content
margin: 0.01,
// Bounds for smallest/largest possible scale to apply to content
minScale: 0.2,
maxScale: 2.0,
controls: true,
progress: true,
history: true,
center: true,
slideNumber: 'c',
pdfSeparateFragments: false,
pdfMaxPagesPerSlide: 1,
pdfPageHeightOffset: -1,
transition: 'slide', // none/fade/slide/convex/concave/zoom
// Learn about plugins: https://revealjs.com/plugins/
plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
});
</script>
</body>
</html>