<!doctype html>
|
|
<html>
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
|
|
|
|
<!-- Edit me start! -->
|
|
<title>Love your data</title>
|
|
<meta name="description" content=" Data & Reproducibility Management with DataLad ">
|
|
<meta name="author" content=" Adina Wagner ">
|
|
<!-- Edit me end! -->
|
|
|
|
<link rel="stylesheet" href="../reveal.js/dist/reset.css">
|
|
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
|
|
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
|
|
<link rel="stylesheet" href="../css/main.css">
|
|
<!-- Theme used for syntax highlighted code -->
|
|
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
|
|
</head>
|
|
<body>
|
|
<div class="reveal">
|
|
<div class="slides">
|
|
|
|
<!--...Datalad Basics...-->
|
|
|
|
<section>
|
|
|
|
|
|
<section>
|
|
<h2>Data and Reproducibility Management with DataLad</h2>
|
|
|
|
<div style="margin-top:1em;text-align:center">
|
|
<table style="border: none;">
|
|
<tr>
|
|
<td style="border: none;">Adina Wagner
|
|
<br><small>
|
|
<a href="https://mas.to/@adswa" target="_blank">
|
|
<img data-src="../pics/mastodon.svg" style="height:30px;margin:0px" />
|
|
mas.to/@adswa</a></small></td>
|
|
<td style="border: none;">
|
|
<br></td>
|
|
</tr>
|
|
<tr>
|
|
<td style="border: none; vertical-align:top">
|
|
<small><a href="http://psychoinformatics.de" target="_blank">Psychoinformatics lab</a>,
|
|
<br> Institute of Neuroscience and
|
|
Medicine, Brain & Behavior (INM-7)<br>
|
|
Research Center Jülich</small><br>
|
|
</td>
|
|
<td><img style="height:100px;margin-right:10px" data-src="../pics/fzj_logo.png" /></td>
|
|
</tr>
|
|
</table>
|
|
</div>
|
|
<p style="z-index: 100;position: fixed;background-color:#ede6d5;font-size:35px;box-shadow: 10px 10px 8px #888888;margin-top:0px;margin-bottom:100px;margin-left:1000px">
|
|
<img src="../pics/qr_lovedata.png" height="200">
|
|
</p>
|
|
<br><br><small>
|
|
|
|
Slides: <a href="https://doi.org/10.5281/zenodo.7627723" target="_blank">
|
|
DOI 10.5281/zenodo.7627723</a> (Scan the QR code)
|
|
</small>
|
|
|
|
|
|
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2>Logistics</h2>
|
|
<ul style="font-size:35px">
|
|
<li class="fragment fade-in">
|
|
Collaborative, public notes, networking, & anonymous questions at <a href="https://etherpad.wikimedia.org/p/love-your-data-datalad" target="_blank">
|
|
etherpad.wikimedia.org/p/love-your-data-datalad</a>
|
|
</li>
|
|
<br>
|
|
<li class="fragment fade-in">
|
|
We are using a JupyterHub at <a href="https://datalad-hub.inm7.de" target="_blank">datalad-hub.inm7.de</a>.
|
|
Your username is the part before the <code>@</code> of the email address you registered with, e.g., a.wagner@fz-juelich.de → a.wagner <br>
|
|
You can log in with a password of your choice.
|
|
</li>
|
|
<br>
|
|
|
|
<li class="fragment fade-in">
|
|
Format:
|
|
</li>
|
|
<ul class="fragment fade-in">
|
|
<li>Mostly hands-on: Watch me live-code, and try out the software
|
|
yourself in the browser. Conceptual wrap-up at the end.</li>
|
|
<li>Ask questions any time (but please mute yourself when you are not speaking, and
make use of the "Raise hand" feature)</li>
|
|
<li>Quick ☕-break after ~1 hour</li>
|
|
</ul>
|
|
</ul>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Further resources and stay in touch</h2>
|
|
<ul>
|
|
If you have questions after the workshop...
|
|
<br><br>
|
|
<ul style="font-size:35px">
|
|
<dt>Reach out to the <b>DataLad</b> team via</dt>
|
|
<li>
|
|
<a href="https://matrix.to/#/!NaMjKIhMXhSicFdxAj:matrix.org?via=matrix.waite.eu&via=matrix.org&via=inm7.de" target="_blank">
|
|
Matrix</a> (free, decentralized communication platform, no app needed).
|
|
We run a weekly Zoom office hour (Tuesday, 4pm Berlin time) from this room as well.
|
|
</li>
|
|
<li>
|
|
<a href="https://github.com/datalad/datalad" target="_blank">
|
|
The development repository on GitHub</a>
|
|
</li>
|
|
<br>
|
|
<dt>Reach out to the (Neuro-) user community with</dt>
|
|
<li>A question on <a href="https://neurostars.org/" target="_blank">neurostars.org</a>
|
|
with a <code>datalad</code> tag</li>
|
|
<br>
|
|
<dt>Find more user tutorials or workshop recordings</dt>
|
|
<li>On <a href="https://www.youtube.com/datalad" target="_blank">
|
|
DataLad's YouTube channel</a>
|
|
</li>
|
|
<li>
|
|
In the <a href="http://handbook.datalad.org/en/latest/" target="_blank">
|
|
DataLad Handbook </a>
|
|
</li>
|
|
<li>In the <a href="https://psychoinformatics-de.github.io/rdm-course/" target="_blank">DataLad RDM course</a> </li>
|
|
<li>In the <a href="http://docs.datalad.org" target="_blank">Official API documentation</a> </li>
|
|
<li> In an overview of most tutorials, talks, videos at
|
|
<a href="https://github.com/datalad/tutorials" target="_blank">github.com/datalad/tutorials</a> </li>
|
|
</ul>
|
|
</ul>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Acknowledgements</h2>
|
|
<table>
|
|
<tr style="vertical-align:middle">
|
|
<td style="vertical-align:middle">
|
|
<dl>
|
|
<dt style="margin-top:20px">DataLad software <br>
|
|
& ecosystem</dt>
|
|
<dd style="margin-left:5px!important">
|
|
<ul style="margin-left:5px!important">
|
|
<li>Psychoinformatics Lab, <br>
|
|
Research Center Jülich</li>
|
|
<li>Center for Open <br>
|
|
Neuroscience, <br>
|
|
Dartmouth College</li>
|
|
<li>Joey Hess (git-annex)</li>
|
|
<li><em>>100 additional contributors</em></li>
|
|
</ul>
|
|
</dd>
|
|
</td>
|
|
<td style="vertical-align:middle">
|
|
<div style="margin-bottom:-20px;text-align:center"><strong>Funders</strong></div>
|
|
<img style="height:150px;margin-right:50px" data-src="../pics/nsf.png" />
|
|
<img style="height:150px;margin-right:50px;margin-left:50px" data-src="../pics/binc.png" />
|
|
<img style="height:150px;margin-left:50px" data-src="../pics/bmbf.png" />
|
|
<div style="margin-top:-20px">
|
|
<img style="height:80px;margin-top:-40px;margin-left:40px" data-src="../pics/fzj_logo.svg" />
|
|
<img style="height:60px;margin-left:50px;margin-bottom:25px" data-src="../pics/dfg_logo.png" />
|
|
</div>
|
|
<div style="margin-top:-20px">
|
|
<img style="height:60px;margin-right:20px" data-src="../pics/erdf.png" />
|
|
<img style="height:60px;margin-right:20px" data-src="../pics/cbbs_logo.png" />
|
|
<img style="height:60px" data-src="../pics/LSA-Logo.png" />
|
|
</div>
|
|
<div style="margin-top:40px;margin-bottom:20px;text-align:center"><strong>Collaborators</strong></div>
|
|
<div style="margin-top:-20px">
|
|
<img style="height:100px;margin:20px" data-src="../pics/hbp_logo.png" />
|
|
<img style="height:100px;margin:20px" data-src="../pics/conp_logo.png" />
|
|
<img style="height:120px;margin:10px" data-src="../pics/openneuro_logo.png" />
|
|
</div>
|
|
<div style="margin-top:-40px">
|
|
<img style="height:100px;margin:20px" data-src="../pics/ebrains-logo.png"/>
|
|
<img style="height:100px;margin:0px" data-src="../pics/gin-logo.png" />
|
|
<img style="height:120px;margin:10px" data-src="../pics/sfb1451_logo.png" />
|
|
</div>
|
|
<div style="margin-top:-40px;align:middle">
|
|
<img style="height:140px;margin:10px" data-src="../pics/brainlife_logo.png" />
|
|
<img style="height:100px;margin:0px" data-src="../pics/cbrain_logo.png" />
|
|
<img style="height:100px;margin:20px" data-src="../pics/vbc_logo.png" />
|
|
</div>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Let's get to know each other</h2>
|
|
Please use your phone to scan the QR code, or open the link in a new browser window <br>
|
|
<iframe src="https://www.directpoll.com/r?XDbzPBd3ixYqg8VRC7Mz8FH4nJ3iEPxTGiMZyeyf"
|
|
style="border: 0" width="900" height="800"></iframe>
|
|
</section>
|
|
|
|
<section>
|
|
<h3>DataLad use cases</h3>
|
|
<div class="r-stack">
|
|
<li data-fragment-index="1" class="fragment fade-in-then-out"> <b>Publish or consume datasets</b>
|
|
via GitHub, GitLab, OSF, the European Open Science Cloud, or similar services</li>
|
|
<li data-fragment-index="2" class="fragment fade-in-then-out">
|
|
Behind-the-scenes <b>infrastructure component for data transport and versioning</b>
|
|
(e.g., used by <a href="https://openneuro.org/" target="_blank"> OpenNeuro</a>,
|
|
<a href="https://brainlife.io/" target="_blank"> brainlife.io </a>,
|
|
the <a href="https://conp.ca/" target="_blank">Canadian Open Neuroscience Platform (CONP)</a>,
|
|
<a href="https://mcin.ca/technology/cbrain/" target="_blank"> CBRAIN</a>)</li>
|
|
<li data-fragment-index="3" class="fragment fade-in-then-out"><b>Central data management</b> and archival system</li>
|
|
<li data-fragment-index="4" class="fragment fade-in-then-out"><b>Decentralized data and metadata catalog</b></li>
|
|
<li data-fragment-index="5" class="fragment fade-in-then-out"> <b>Creating and sharing reproducible, open science</b>: Sharing data, software, code, and provenance </li>
|
|
</div>
|
|
<div class="r-stack">
|
|
<img data-fragment-index="1" height="700" class="fragment fade-in-then-out" src="../pics/getdata_studyforrest.gif" alt="a screenrecording of cloning studyforrest data from github">
|
|
<img height="700" class="fragment fade-in-then-out" data-fragment-index="2" src="../pics/openneuro_new_2.gif" alt="a screenrecording of browsing open neuro">
|
|
<img height="700" data-fragment-index="3" class="fragment fade-in-then-out" src="../pics/centralmanagement2.gif">
|
|
<img height="1000" data-fragment-index="4" class="fragment fade-in-then-out" src="../pics/sfb-catalog.gif">
|
|
<img height="700" class="fragment fade-in" data-fragment-index="5" src="../pics/remodnavpaper_2.gif" alt="a screenrecording of cloning REMODNAV paper dataset from github">
|
|
</div>
|
|
</section>
|
|
</section>
|
|
|
|
|
|
|
|
<!-------Examples-------->
|
|
|
|
<section>
|
|
|
|
<section data-transition="None">
|
|
<h2>A common use case</h2>
|
|
<div style="margin-top:0.5em;">
|
|
<table style="border: none;table-layout: fixed;">
|
|
<tr>
|
|
<td width="60%"><img style="height:500px; margin-top: 0; margin-right:1px;vertical-align:middle;" data-src="../pics/comic_box1.svg" /></td>
|
|
<td>
|
|
<ul style="vertical-align:middle;">
|
|
<li class="fragment fade-in">
|
|
Alice is a PhD student in a research team.</li>
|
|
<li class="fragment fade-in">
|
|
She works on a fairly typical research project:
|
|
Data collection & processing.</li>
|
|
<li class="fragment fade-in">
|
|
First sample → final result = complex process</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</div><br>
|
|
<h3 class="fragment fade-in">How does Alice go about her daily job?</h3>
|
|
</section>
|
|
|
|
|
|
<section data-transition="None">
|
|
<h2>A common use case</h2>
|
|
<ul>
|
|
<li class="fragment fade-in">
|
|
In her project, Alice likes to have an automated record of:
|
|
<ul>
|
|
<li>when a given file was last changed</li>
|
|
<li>where it came from</li>
|
|
<li>what input files were used to generate a given output</li>
|
|
<li>why some things were done.</li>
|
|
</ul>
|
|
</li>
|
|
<br>
|
|
<li class="fragment fade-in">
|
|
Even if she doesn't share her work, this is essential for her future self</li>
|
|
<li class="fragment fade-in">
|
|
Her project is exploratory: Frequent changes to her analysis scripts</li>
|
|
<li class="fragment fade-in">
|
|
She enjoys the comfort of being able to return to a previously recorded state</li>
|
|
</ul>
|
|
<br><br>
|
|
<h3 class="fragment fade-in">This is: <em>local version control</em></h3>
|
|
</section>
|
|
|
|
|
|
<section data-transition="None">
|
|
<h2>A common use case</h2>
|
|
<ul>
|
|
<li class="fragment fade-in" data-fragment-index="1">
|
|
Alice's work is not confined to a single computer:
|
|
<ul>
|
|
<li>Laptop / desktop / remote server / dedicated back-up</li>
|
|
<li>Alice wants to automatically & efficiently synchronize</li>
|
|
</ul>
|
|
</li>
|
|
<br>
|
|
<li class="fragment fade-in" data-fragment-index="2">
|
|
Parts of the data are collected or analyzed by colleagues.
|
|
This requires:
|
|
<ul>
|
|
<li>distributed synchronization with centralized storage</li>
|
|
<li>preservation of origin & authorship of changes</li>
|
|
<li>effective combination of simultaneous contributions</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
<br><br>
|
|
<h3 class="fragment fade-in" data-fragment-index="3">This is: <em>distributed version control</em></h3>
|
|
</section>
|
|
|
|
|
|
<section data-transition="None">
|
|
<h2>A common use case</h2>
|
|
<ul>
|
|
<li class="fragment fade-in">
|
|
Alice applies local version control for her own work, and reproducibly records it
|
|
</li>
|
|
<li class="fragment fade-in">
|
|
She also applies distributed version control when working with colleagues
|
|
and collaborators
|
|
</li>
|
|
<li class="fragment fade-in">
|
|
She often needs to work on a subset of data at any given time:
|
|
<ul>
|
|
<li>all files are kept on a server</li>
|
|
<li>a few files are rotated into and out of her laptop</li>
|
|
</ul>
|
|
</li>
|
|
<li class="fragment fade-in">
|
|
Alice wants to publish the data at project's end:
|
|
<ul>
|
|
<li>raw data / outputs / both</li>
|
|
<li>completely or selectively</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
<br><br>
|
|
<h3 class="fragment fade-in">This is: <em>data management (with DataLad 😀)</em></h3>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h2>DataLad</h2>
|
|
<img style="height:300px; margin-top: 0; margin-right:1px;vertical-align:middle;" src="../pics/comic_box3.svg" alt="">
|
|
<br>
|
|
<ul style="font-size:37px">
|
|
<li>Domain-agnostic <strong>command-line tool</strong>
|
|
(+ <strong>graphical user interface</strong>),
|
|
built on top of <a href="https://git-scm.com/" target="_blank">Git</a>
|
|
& <a href="https://git-annex.branchable.com/" target="_blank">Git-annex</a></li>
|
|
<li>Major features:</li>
|
|
<dt>Version-controlling arbitrarily large content </dt>
|
|
<dd>Version control data & software alongside code!</dd>
|
|
<dt>Transport mechanisms for sharing & obtaining data </dt>
|
|
<dd>Consume & collaborate on data (analyses) like software</dd>
|
|
<dt>(Computationally) reproducible data analysis</dt>
|
|
<dd>Track and share provenance of all digital objects</dd>
|
|
<dt>(... and <i>much</i> more) </dt>
|
|
<br>
|
|
</ul>
|
|
|
|
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2>Let's try it out</h2>
|
|
<img src="../pics/jupyterhub-login.png">
|
|
<dl style="font-size:37px">
|
|
<a href="https://datalad-hub.inm7.de" target="_blank">datalad-hub.inm7.de</a>
|
|
<dt>username:</dt>
|
|
<dd>email without <code>@domain</code> (a.wagner@fz-juelich.de -> a.wagner)<br>
|
|
(must be the email you registered with for this workshop)</dd>
|
|
<dt>password:</dt>
|
|
<dd>Set at first login, at least 8 characters</dd>
|
|
</dl>
|
|
<p class="fragment fade-in"><strong>Important!</strong> The Hub is a shared resource. Don't fill it up :)</p>
|
|
</section>
|
|
|
|
<section style="text-align: left;">
|
|
<h3>Git identity setup</h3>
|
|
Check Git identity:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
git config --get user.name
git config --get user.email
</code>
|
|
</pre>
|
|
|
|
<div class="fragment">
|
|
Configure Git identity:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
git config --global user.name "Adina Wagner"
git config --global user.email "adina.wagner@t-online.de"
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Configure DataLad to use latest features:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
git config --global --add datalad.extensions.load next
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
</section>
|
|
|
|
<section style="text-align: left;">
|
|
<h3>Using DataLad in a terminal</h3>
|
|
|
|
Check the installed version:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad --version
|
|
</code>
|
|
<p id="displayArea"></p>
|
|
</pre>
|
|
|
|
<div class="fragment">
|
|
For help on using DataLad from the command line:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad --help
|
|
</code>
|
|
The help may be displayed in a pager - exit it by pressing "q"
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
For extensive info about the installed package, its dependencies, and extensions, use <code>datalad wtf</code>.
|
|
Let's find out what kind of system we're on:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad wtf -S system
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
</section>
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>Using DataLad via its Python API</h3>
|
|
Open a Python environment:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
ipython
|
|
</code>
|
|
</pre>
|
|
<div class="fragment">
|
|
Import and start using:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-python" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
import datalad.api as dl
dl.create(path='mydataset')
</code>
|
|
</pre>
|
|
</div>
|
|
<div class="fragment">
|
|
Exit the Python environment:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-python" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
exit
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h3 style="text-align: left;">DataLad datasets...</h3>
|
|
<img src="../pics/comic_box4.svg" alt="">
|
|
</section>
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...DataLad datasets</h3>
|
|
Create a dataset (here, with the <code>yoda</code> configuration, which adds
|
|
a helpful structure and configuration for data analyses): <br>
|
|
<img height="100px" src="../pics/yoda.png">
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad create -c yoda my-analysis
|
|
</code>
|
|
</pre>
|
|
|
|
<div class="fragment">
|
|
Let's have a look inside. Navigate using <code>cd</code> (change directory):
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
cd my-analysis
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
List the directory content, including hidden files, with <code>ls</code>:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
ls -la .
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h3 style="text-align: left;">Version control...</h3>
|
|
<img src="../pics/comic_box5.svg" alt="">
|
|
</section>
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...Version control</h3>
|
|
The yoda-configuration added a README placeholder in the dataset.
|
|
Let's add Markdown text (a project title) to it:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
echo "# My example DataLad dataset" > README.md
|
|
</code>
|
|
</pre>
|
|
|
|
<div class="fragment">
|
|
Now we can check the <code>status</code> of the dataset:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad status
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
We can save the state with <code>save</code>
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad save -m "Add project title into the README"
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Further modifications:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
echo "Contains a small data analysis for my project" >> README.md
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
You can also check what has changed:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
git diff
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Save again:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad save -m "Add information on the dataset contents to the README"
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
</section>
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...Version control</h3>
|
|
<div class="fragment">
|
|
Now, let's check the dataset history:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
git log
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
We can also make the history prettier:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
tig
|
|
</code>
|
|
(navigate with arrow keys and enter, press "q" to go back and exit the program)
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Convenience functions make downloads easier. Let's add code for a data analysis from an external source:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">datalad download-url -m "Add an analysis script" \
  -O code/classification_analysis.py \
  https://raw.githubusercontent.com/datalad-handbook/resources/master/classification_analysis.py
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Check out the file's history:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">git log code/classification_analysis.py</code>
|
|
</pre>
|
|
</div>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Local version control</h2>
|
|
|
|
<p>Procedurally, version control is easy with DataLad!</p>
|
|
<img class="fragment fade-in" src="../pics/local_wf.svg" height="500"> <!-- .element: class="fragment" -->
|
|
<br>
|
|
|
|
<b class="fragment fade-in">Advice:</b>
|
|
<ul>
|
|
<li class="fragment fade-in">Save <i>meaningful</i> units of change</li>
|
|
<li class="fragment fade-in">Attach helpful commit messages</li>
|
|
</ul>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h3 style="text-align: left;">Computationally reproducible execution I...</h3>
|
|
<img src="../pics/comic_box7.svg" width="65%" alt="">
|
|
<ul>
|
|
<li class="fragment fade-in-then-semi-out">which script/pipeline version</li>
|
|
<li class="fragment fade-in-then-semi-out">was run on which version of the data</li>
|
|
<li class="fragment fade-in-then-semi-out">to produce which version of the results?</li>
|
|
</ul>
|
|
</section>
|
|
|
|
<section style="text-align:left;">
|
|
<h3>... Computationally reproducible execution I</h3>
|
|
<div class="fragment">
|
|
A variety of processes can modify files. A simple example: Code formatting
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">black code/classification_analysis.py</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Version control makes changes transparent:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">git diff</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
But it's useful to keep track beyond that. Let's discard the latest changes...
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">git restore code/classification_analysis.py</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
... and record precisely what we did
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">datalad run -m "Reformat code with black" \
  "black code/classification_analysis.py"</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Let's take a look:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">git show</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
... and repeat!
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">datalad rerun</code>
|
|
</pre>
|
|
</div>
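<div class="fragment">
To make the record that <code>datalad run</code> leaves behind more concrete, here is a minimal Python sketch that extracts the machine-readable part from a commit message like the one <code>git show</code> prints. The message text is hand-written for illustration and modeled on the layout shown in the DataLad Handbook (a JSON block between two marker lines). The exact fields may differ between DataLad versions, the dataset ID is a made-up placeholder, and the helper name <code>parse_run_record</code> is hypothetical.

```python
import json

# Hand-written example of a commit message as created by `datalad run`.
# Assumption: the layout (marker lines + JSON) follows the DataLad Handbook;
# the dataset ID below is a made-up placeholder.
COMMIT_MESSAGE = """[DATALAD RUNCMD] Reformat code with black

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "black code/classification_analysis.py",
 "dsid": "00000000-0000-0000-0000-000000000000",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
"""

START = "=== Do not change lines below ==="
END = "^^^ Do not change lines above ^^^"


def parse_run_record(message: str) -> dict:
    """Return the JSON run record embedded between the two marker lines."""
    start = message.index(START) + len(START)
    end = message.index(END, start)
    return json.loads(message[start:end])


record = parse_run_record(COMMIT_MESSAGE)
print(record["cmd"])   # the command that a rerun would execute again
print(record["exit"])  # the exit code recorded for the original run
```

Because the command is stored as structured data rather than free text, it can be re-executed later without anyone having to remember what was run.
</div>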
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h3 style="text-align: left;">Data consumption & transport...</h3>
|
|
<img src="../pics/comic_box6_consumption.svg" alt="">
|
|
</section>
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...Data consumption & transport...</h3>
|
|
|
|
You can install a dataset from a remote URL (or local path) using <code>clone</code>.
|
|
Either as a stand-alone entity:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" >
# just an example:
datalad clone \
  https://github.com/psychoinformatics-de/studyforrest-data-phase2.git
</code>
|
|
</pre>
|
|
|
|
<div class="fragment">
|
|
Or as a linked dataset, nested in another dataset in a superdataset-subdataset hierarchy:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" >
# just an example:
datalad clone -d . \
  https://github.com/psychoinformatics-de/studyforrest-data-phase2.git
</code>
|
|
</pre>
|
|
<img src="../pics/linkage_subds.png" alt="">
|
|
</div>
|
|
<ul style="font-size:30px" class="fragment">
|
|
<li>Helps with scaling (see e.g. the <a href="https://github.com/datalad-datasets/human-connectome-project-openaccess" target="_blank">Human Connectome Project dataset</a> )</li>
|
|
<li>Version control tools struggle with >100k files</li>
|
|
<li>Modular units improve intuitive structure and reuse potential</li>
|
|
<li>Versioned linkage of inputs for reproducibility</li>
|
|
</ul>
|
|
</section>
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...Dataset nesting</h3>
|
|
|
|
Let's make a nest!
|
|
<div class="fragment">
|
|
Clone a dataset with analysis data into a specific
|
|
location ("input/") in the existing dataset,
|
|
making it a <em>sub</em>dataset:
|
|
<pre style="margin-left: 0;">
|
|
<code class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">datalad clone --dataset . \
  https://github.com/datalad-handbook/iris_data.git \
  input/</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Let's see what changed in the dataset, using the <code>subdatasets</code> command:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad subdatasets
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
<div class="fragment">
|
|
... and also <code>git show</code>:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
git show
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
</section>
|
|
|
|
<section style="text-align:left;">
|
|
<div class="fragment">
|
|
We can now view the cloned dataset's file tree:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
cd input
ls
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
...and also its history
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
tig
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Let's check the dataset size (with the <code>du</code> disk-usage command):
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
du -sh
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Let's check the <em>actual</em> dataset size:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad status --annex
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Let's try to print the file contents into the terminal (<code>cat</code>):
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
cat iris.csv
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...Data consumption & transport</h3>
|
|
|
|
We can retrieve actual file content with <code>get</code>:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad get iris.csv
|
|
</code>
|
|
</pre>
|
|
|
|
<div class="fragment">
|
|
If we don't need a file locally anymore, we can <code>drop</code> its content:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad drop iris.csv</code>
|
|
</pre>
|
|
</div>
|
|
<div class="fragment">
|
|
No need to store all files locally, or archive results with
gigabytes or terabytes of source data:
<pre><code class="python">import datalad.api as dl

dl.get('input/sub-01')
# [really complex analysis]
dl.drop('input/sub-01')</code></pre>
|
|
If data is published anywhere, your data analysis can carry an actionable link to it,
|
|
with barely any space requirements.
|
|
</div>
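<div class="fragment">
The get, analyze, drop pattern can be wrapped so that the content is dropped again even if the analysis fails. The following is a minimal sketch, not part of DataLad: the helper <code>temporarily_present</code> is hypothetical, and the two stand-in functions only log calls so that the example runs without a dataset. In a real dataset one would presumably pass <code>dl.get</code> and <code>dl.drop</code> instead.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_present(path, get, drop):
    """Fetch `path` on entry and drop it again on exit, even after errors.

    `get` and `drop` are callables taking a path. Assumption: in a DataLad
    dataset these would be `datalad.api.get` and `datalad.api.drop`.
    """
    get(path)
    try:
        yield path
    finally:
        drop(path)


# Stand-ins that only log calls, so the sketch runs without DataLad.
calls = []


def fake_get(path):
    calls.append(("get", path))


def fake_drop(path):
    calls.append(("drop", path))


with temporarily_present("input/sub-01", fake_get, fake_drop) as data:
    calls.append(("analyze", data))  # placeholder for the real analysis

print(calls)
```

The <code>try</code>/<code>finally</code> is the design choice here: disk space is freed whether or not the analysis step raises.
</div>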
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2>Git versus Git-annex</h2>
|
|
<dl>
|
|
<dt>Data in datasets is either stored in Git or git-annex</dt>
|
|
<dd>By default, everything is <i>annexed</i>, i.e., stored in a dataset annex by git-annex</dd><br>
|
|
<img height="500" src="../pics/artwork/src/publishing/publishing_gitvsannex.svg">
|
|
<br><br>
|
|
<li class="fragment fade-in-then-semi-out">With annexed data, only content identity (hash)
|
|
and location information is put into Git, rather than file content.
|
|
The annex, and transport to and from it, are managed with <b>git-annex</b>.</li>
|
|
</dl>
|
|
</section>
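<section data-markdown><script type="text/template">
## Git versus Git-annex: a sketch

To make "content identity (hash)" concrete, here is a minimal Python sketch that derives an identifier from a file's size and SHA-256 checksum. The key layout is modeled on git-annex's SHA256E backend; the function name `annex_style_key` and the example file are assumptions for illustration, and this is not how git-annex itself is implemented.

```python
import hashlib
import tempfile
from pathlib import Path


def annex_style_key(path: Path) -> str:
    """Build an identifier from file size, SHA-256 checksum, and extension."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    # Only this short string would need to be tracked in Git,
    # not the (possibly huge) file content itself.
    return f"SHA256E-s{len(data)}--{digest}{path.suffix}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        example = Path(tmp) / "iris.csv"  # hypothetical example file
        example.write_text("sepal_length,sepal_width\n5.1,3.5\n")
        print(annex_style_key(example))
```

The identifier stays the same size no matter how large the file is, which is why annexed files add so little to the Git history.
</script></section>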
|
|
|
|
<section>
|
|
<h2>Git versus Git-annex</h2>
|
|
<dl>
|
|
<dt>Configurations (e.g., YODA), custom <a href="http://handbook.datalad.org/en/latest/basics/101-123-config2.html" target="_blank">
|
|
rules</a>, or command parametrization determines if a file is annexed</dt>
|
|
<dd>Storing files in Git or git-annex has distinct advantages:</dd><br>
|
|
|
|
<br>
|
|
|
|
<table >
|
|
<tr style="font-size:35px">
|
|
<td><b>Git</b></td>
|
|
<td><b>git-annex</b></td>
|
|
</tr>
|
|
<tr style="font-size:30px">
|
|
<td>handles <b>small</b> files well (text, code)</td>
|
|
<td>handles <b>all</b> types and sizes of files well</td>
|
|
</tr>
|
|
<tr style="font-size:30px">
|
|
<td>file contents are in the Git history
|
|
and will be <b>shared</b> upon git/datalad push</td>
|
|
<td>file contents are in the annex. Not necessarily shared</td>
|
|
</tr>
|
|
<tr style="font-size:30px">
|
|
<td>Shared with every dataset clone</td>
|
|
<td><b>Can be kept private</b> on a per-file level when sharing the dataset</td>
|
|
</tr>
|
|
<tr style="font-size:30px">
|
|
<td>Useful: Small, non-binary, frequently modified, need-to-be-accessible (DUA, README) files </td>
|
|
<td>Useful: Large files, private files</td>
|
|
</tr>
|
|
</table>
|
|
<br><br>
|
|
<div style="text-align:center" class="fragment">YODA configures the contents of the <code>code/</code>
|
|
directory and the dataset descriptions (e.g., README files) to be in Git.
|
|
There are many other configurations, and you can also
|
|
<a href="http://handbook.datalad.org/en/latest/basics/101-124-procedures.html" target="_blank">
|
|
write your own</a>.<br>
|
|
<img height="100px" src="../pics/yoda.png">
|
|
</div>
|
|
</dl>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section style="text-align: left;">
|
|
<h3>...Computationally reproducible execution...</h3>
|
|
|
|
Try to execute the downloaded analysis script. Does it work?
|
|
<div><pre style="margin-left: 0;"><code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
cd ..
|
|
python code/classification_analysis.py</code></pre></div>
|
|
|
|
<ul class="fragment">
|
|
<li>
|
|
Software can be difficult or impossible to install (e.g. conflicts with existing software,
|
|
or on HPC) for you or your collaborators
|
|
</li>
|
|
<li>
|
|
Different software versions/operating systems can produce different results:
|
|
<a href="https://doi.org/10.3389/fninf.2015.00012" target="_blank">Glatard et al., doi.org/10.3389/fninf.2015.00012</a>
|
|
</li>
|
|
<li class="fragment fade-in">
|
|
<strong>Software containers</strong> encapsulate a software environment and isolate it from
|
|
a surrounding operating system. Two common solutions: Docker, Singularity
|
|
</li>
|
|
</ul>
|
|
</section>
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...Computationally reproducible execution...</h3>
|
|
<ul>
|
|
<li class="fragment fade-in-then-semi-out"><code>datalad run</code>
|
|
can run any command in a way that links the command or script to the
|
|
results it produces and the data it was computed from</li>
|
|
<li class="fragment fade-in-then-semi-out"><code>datalad rerun</code>
|
|
can take this recorded provenance and recompute the command</li>
|
|
<li class="fragment fade-in-then-semi-out"><code>datalad containers-run</code>
|
|
(from the extension "datalad-container") can capture software provenance in the form of software containers in addition to the provenance that datalad run captures</li>
|
|
</ul>
|
|
<br><br>
|
|
|
|
</section>
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...Computationally reproducible execution</h3>
|
|
|
|
<div class="fragment">
|
|
With the <code>datalad-container</code> extension, we can add software containers
|
|
to datasets and work with them.
|
|
Let's add a software container with Python software to run the script:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad containers-add python-env --url shub://adswa/resources:2
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<div class="fragment">
|
|
Inspect the list of registered containers:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad containers-list
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Now, let's try out the <code>containers-run</code> command:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad containers-run -m "run classification analysis in python environment" \
|
|
--container-name python-env \
|
|
--input "input/iris.csv" \
|
|
--output "pairwise_relationships.png" \
|
|
--output "prediction_report.csv" \
|
|
"python3 code/classification_analysis.py {inputs} {outputs}"
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
<div class="fragment">
|
|
What changed after the <code>containers-run</code> command has completed?
|
|
<br>
|
|
We can use <code>datalad diff</code> (based on <code>git diff</code>):
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad diff -f HEAD~1
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
We see that some files were added to the dataset!
|
|
<br>
|
|
And we have a complete provenance record as part of the git history:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
git log -n 1
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h3 style="text-align: left;">Publishing datasets...</h3>
|
|
<div style="margin-top:1em;">
|
|
<table style="border: none;">
|
|
<tr>
|
|
<td><img style="width: 800px; margin-right:1px;margin-bottom:10px;vertical-align:middle;" data-src="../pics/comic_box6_publishing.svg" /></td>
|
|
<td><img style="width: 1000px; margin-right:1px;margin-bottom:10px;vertical-align:middle;" data-src="../pics/comic_box9.svg" /></td>
|
|
</tr>
|
|
</table>
|
|
</div>
|
|
<br>
|
|
<div class="fragment">We will use GIN: <a href="https://gin.g-node.org/" target="_blank">gin.g-node.org</a>:</div>
|
|
<img class="fragment" src="../pics/artwork/src/publishing/startingpoint.svg">
|
|
</section>
|
|
|
|
<section>
|
|
<h3 style="text-align: left;">Publishing datasets...</h3>
|
|
<ul>
|
|
<li>Create a GIN user account and log in:
|
|
<a href="https://gin.g-node.org/user/sign_up" target="_blank">gin.g-node.org/user/sign_up</a> </li>
|
|
<li>
|
|
<a href="https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent?platform=linux" target="_blank">
|
|
Create</a> an SSH key </li>
|
|
<div>
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
ssh-keygen -t ed25519 -C "your-email"
|
|
eval "$(ssh-agent -s)"
|
|
ssh-add ~/.ssh/id_ed25519
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
<li> <a href="https://handbook.datalad.org/en/latest/basics/101-139-gin.html#prerequisites" target="_blank">
|
|
Upload</a> the SSH key to GIN</li>
|
|
<div class="fragment">
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
cat ~/.ssh/id_ed25519.pub
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
<img src="../pics/screenshot-gin3.png" height="400">
|
|
<li>Publish your dataset!</li>
|
|
</ul>
|
|
|
|
</section>
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>...Publishing datasets</h3>
|
|
|
|
DataLad has convenience functions to create <code>sibling</code>-repositories
|
|
on various infrastructure and third-party services (GitHub, GitLab, OSF, WebDAV-based services, DataVerse, ...),
to which data can then be published with <code>push</code>.
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad create-sibling-gin example-analysis --access-protocol ssh
|
|
</code>
|
|
</pre>
|
|
|
|
<div class="fragment">
|
|
You can verify the dataset's siblings with the <code>siblings</code> command:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad siblings
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
And we can push our complete dataset (Git repository and annex) to GIN:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad push --to gin
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
<img class="fragment" src="../pics/in_case_of_fire.png" style="border:20px; margin:0px; float:center; width:500px;"/>
|
|
</section>
|
|
|
|
|
|
<section style="text-align: left;">
|
|
<h3>Using published data...</h3>
|
|
|
|
Let's see what the analysis feels like to others:
|
|
<br><br>
|
|
<pre style="margin-left: 0;">
|
|
<code class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">cd ../
|
|
datalad clone \
|
|
https://gin.g-node.org/adswa/example-analysis \
|
|
myclone</code>
|
|
</pre>
|
|
|
|
<div class="fragment">
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
cd myclone
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Get results:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad get prediction_report.csv
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
<div class="fragment">
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad drop prediction_report.csv
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="fragment">
|
|
Or recompute results:
|
|
<pre style="margin-left: 0;">
|
|
<code data-trim class="language-bash" onmousemove="showHover(event)" onmousedown="clickCopy(event)" onmouseleave="leaveElement(event)">
|
|
datalad rerun
|
|
</code>
|
|
</pre>
|
|
</div>
|
|
</section>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<section>
|
|
<h2>How does this relate to reproducibility?</h2>
|
|
</section>
|
|
|
|
<section data-transition="None">
|
|
<h2>Exhaustive tracking</h2>
|
|
<dl style="font-size:35px">
|
|
<dt>The building blocks of a scientific result are rarely static</dt>
|
|
<table>
|
|
<tr>
|
|
<td style="vertical-align:middle">Data changes <br>
|
|
<small>(errors are fixed, data is extended,<br>
|
|
naming standards change, an analysis <br>
|
|
requires only a subset of your data...)</small></td>
|
|
<td><img src="../pics/phd052810s.png" height="500">
|
|
<imgcredit>Piled Higher and Deeper
|
|
<a href="https://phdcomics.com/comics/archive_print.php?comicid=1323" target="_blank">
|
|
1323
|
|
</a> </imgcredit></td>
|
|
</tr>
|
|
</table>
|
|
</dl>
|
|
</section>
|
|
|
|
|
|
<section data-transition="None">
|
|
<h2>Exhaustive tracking</h2>
|
|
"Shit, which version of which script produced these outputs from which version
|
|
of what data... and which software version?"<br>
|
|
<img src="../pics/manuallabor.png">
|
|
<img src="../pics/findfiles.png" height="400">
|
|
<img src="../pics/projectstack.png" height="350">
|
|
<imgcredit>CC-BY Scriberia and <a href="https://the-turing-way.netlify.app/reproducible-research/rdm.html" target="_blank">
|
|
The Turing Way</a>
|
|
</imgcredit>
|
|
</section>
|
|
|
|
|
|
<section data-transition="None">
|
|
<h3>Exhaustive tracking</h3>
|
|
Once you track changes to data with version control tools,
|
|
you can find out <em>why</em> it changed, <em>what</em> has changed, <em>when</em> it changed,
|
|
and <em>which version</em> of your data was used at which point in time.
|
|
<div class="r-stack">
|
|
<img height="450px" class="fragment fade-out" data-fragment-index="1" src="../pics/tigdata.png">
|
|
<img height="450px" class="fragment" data-fragment-index="1" src="../pics/tigdata3.png">
|
|
<img height="450px" class="fragment" src="../pics/tigdata2.png">
|
|
</div>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Digital provenance</h2>
|
|
<ul>
|
|
<p >
|
|
= <i>"The tools and processes used to create a
|
|
digital file, the responsible entity, and when and where the process
|
|
events occurred"</i>
|
|
</p>
|
|
<li class="fragment fade-in">
|
|
Have you ever saved a PDF to read later onto your computer, but forgot
|
|
where you got it from? Or did you ever find a figure in your project,
|
|
but forgot which analysis step produced it?
|
|
</li>
|
|
<img src="../pics/Provenance_alpha.png">
|
|
<imgcredit data-fragment-index="1" >Scriberia and <a href="https://the-turing-way.netlify.app">The Turing Way </a> (CC-BY)</imgcredit>
|
|
</ul>
|
|
</section>
|
|
|
|
<section data-transition="None">
|
|
<h3>Data transport: Security and reliability - for data</h3>
|
|
Decentralized version control for data integrates with a variety of services
|
|
to let you store data in different places - creating a resilient network for data
|
|
<img src="../pics/decentral_RDM_overview_left.png">
|
|
<small> <a href="https://doi.org/10.1515/nf-2020-0037" target="_blank">"In defense of decentralized Research Data Management", doi.org/10.1515/nf-2020-0037</a> </small>
|
|
</section>
|
|
|
|
<section data-transition="None">
|
|
<h3>Ultimate goal: Reusability</h3>
|
|
Team science on more than code:
|
|
<img src="../pics/teamscience.png">
|
|
<img class="fragment" src="../pics/datahistory.png">
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h3>The YODA principles</h3>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>DataLad Datasets for data analysis</h2>
|
|
|
|
<ul style="font-size:30px">
|
|
<li>A DataLad dataset can have <i>any</i> structure, and use as many or few
|
|
features of a dataset as required.</li>
|
|
|
|
<li>However, for <b>data analyses</b> it is beneficial to make
|
|
use of DataLad features and structure datasets according to the <b>YODA principles</b>:</li>
|
|
</ul>
|
|
|
|
<img style="" data-src="../pics/yoda.png" height="200">
|
|
<dl style="font-size:30px">
|
|
<dt>P1: One thing, one dataset</dt>
|
|
<dt>P2: Record where you got it from, and where it is now</dt>
|
|
<dt>P3: Record what you did to it, and with what</dt>
|
|
</dl><br><br><br>
|
|
<small>Find out more about the YODA principles in
|
|
<a href="http://handbook.datalad.org/en/latest/basics/101-127-yoda.html" target="_blank">
|
|
the handbook</a>, and more about structuring datasets at
|
|
<a href="https://psychoinformatics-de.github.io/rdm-course/02-structuring-data/index.html#example-structure-yoda-principles" target="_blank">
|
|
psychoinformatics-de.github.io/rdm-course/02-structuring-data</a>
|
|
</small>
|
|
</section>
|
|
|
|
<section data-markdown style="font-size:30px">
|
|
## P1: One thing, one dataset
|
|

|
|
|
|
- Create **modular** datasets: Whenever a particular collection of files could be useful in more
|
|
than one context (e.g. data), put them in their own dataset, and install it as
|
|
a subdataset.
|
|
- Keep everything **structured**: Bundle all components of one analysis into one superdataset, and
|
|
within this dataset, separate code, data, output, execution environments.
|
|
- Keep a dataset **self-contained**, with relative paths in scripts to subdatasets or files.
|
|
Do not use absolute paths.
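
The "relative paths" advice can be sketched in Python as follows. This assumes a YODA-style layout in which scripts live in `code/` directly below the dataset root; the helper name `dataset_path` and the example location are hypothetical.

```python
from pathlib import Path


def dataset_path(script_file, *parts):
    """Resolve a path relative to the dataset root (assumed: <root>/code/<script>)."""
    root = Path(script_file).resolve().parent.parent
    return root.joinpath(*parts)


# Hypothetical location of a script inside a clone; in a real script,
# pass __file__ instead of a hard-coded string.
iris = dataset_path("/data/myclone/code/analysis.py", "input", "iris.csv")
print(iris)
```

Because every path is derived from the script's own location, the analysis keeps working when the dataset is cloned to a different directory or machine.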
|
|
|
|
</section>
|
|
|
|
<section style="font-size:30px" data-transition="None">
|
|
<h2>Why Modularity?</h2>
|
|
<ul>
|
|
<li>1. Reuse and access management</li>
|
|
<li>2. Scalability</li>
|
|
<li>3. Transparency</li><br>
|
|
|
|
Original:
|
|
<pre><code class="sh" style="max-height:none" data-trim>
|
|
/dataset
|
|
├── sample1
|
|
│ └── a001.dat
|
|
├── sample2
|
|
│ └── a001.dat
|
|
...
|
|
</code></pre>
|
|
<div class="fragment">
|
|
Without modularity, after applying a transform (preprocessing, analysis, ...):
|
|
<pre><code class="sh" style="max-height:none" data-trim>
|
|
/dataset
|
|
├── sample1
|
|
│ ├── ps34t.dat
|
|
│ └── a001.dat
|
|
├── sample2
|
|
│ ├── ps34t.dat
|
|
│ └── a001.dat
|
|
...
|
|
</code></pre>
|
|
Without expert/domain knowledge, no distinction between original and derived data
|
|
is possible.
|
|
</div>
|
|
</ul>
|
|
</section>
|
|
|
|
|
|
<section style="font-size:30px" data-transition="None">
|
|
<h2>Why Modularity?</h2>
|
|
<ul>
|
|
<li>3. Transparency</li><br>
|
|
|
|
Original:
|
|
<pre><code class="sh" style="max-height:none" data-trim>
|
|
/raw_dataset
|
|
├── sample1
|
|
│ └── a001.dat
|
|
├── sample2
|
|
│ └── a001.dat
|
|
...
|
|
</code></pre>
|
|
<strong>With modularity</strong>, after applying a transform (preprocessing, analysis, ...):
|
|
<pre><code class="sh" style="max-height:none" data-trim>
|
|
/derived_dataset
|
|
├── sample1
|
|
│ └── ps34t.dat
|
|
├── sample2
|
|
│ └── ps34t.dat
|
|
├── ...
|
|
└── inputs
|
|
└── raw
|
|
├── sample1
|
|
│ └── a001.dat
|
|
├── sample2
|
|
│ └── a001.dat
|
|
...
|
|
</code></pre>
|
|
Clearer separation of semantics, through use of a pristine version of the original dataset within a
|
|
<em>new, additional</em> dataset holding the outputs.</ul>
|
|
</section>
|
|
|
|
|
|
<section style="font-size:30px" data-transition="None" data-markdown><script type="text/template">
|
|
## When to modularize?
|
|
|
|
- Target audience is different
|
|
- public vs. private
|
|
- domain specific vs. domain general
|
|
|
|
- Pace of evolution is different
|
|
- "factual" raw data vs. choices of (pre-)processing
|
|
- completed acquisition vs. ongoing study
|
|
|
|
- Size impacts I/O and logistics
|
|
- Git can struggle with 1M+ files
|
|
- filesystems (licensing) can struggle with large numbers of inodes
|
|
- More info: [Go Big or Go Home chapter](http://handbook.datalad.org/en/latest/beyond_basics/basics-scaling.html)
|
|
|
|
- Legal/Access constraints
|
|
- personal vs. anonymized data
|
|
|
|
<aside class="notes">
|
|
Note to self
|
|
</aside>
|
|
</script>
|
|
</section>
|
|
|
|
<section style="font-size:30px" data-markdown data-transition="None">
|
|
## P2: Record where you got it from, and where it is now
|
|

|
|
|
|
- **Link** individual datasets to declare data-dependencies (e.g. as subdatasets).
|
|
- **Record data's origin** with appropriate commands, for example
|
|
to record access URLs for individual files obtained from (unstructured) sources "in the cloud".
|
|
- Share and **publish** datasets for collaboration or back-up.
|
|
|
|
</section>
|
|
|
|
|
|
<section data-transition="None" style="font-size:30px">
|
|
<h2>Dataset linkage</h2>
|
|
<img data-src="../pics/dataset_linkage.png">
|
|
<pre><code class="bash" style="font-size:115%;max-height:none">$ datalad clone --dataset . http://example.com/importantds inputs/rawdata
|
|
</code></pre>
|
|
|
|
<pre><code class="diff" style="max-height:none">$ git diff HEAD~1
|
|
diff --git a/.gitmodules b/.gitmodules
|
|
new file mode 100644
|
|
index 0000000..c3370ba
|
|
--- /dev/null
|
|
+++ b/.gitmodules
|
|
@@ -0,0 +1,3 @@
|
|
+[submodule "inputs/rawdata"]
|
|
+ path = inputs/rawdata
|
|
+ url = http://example.com/importantds
|
|
diff --git a/inputs/rawdata b/inputs/rawdata
|
|
new file mode 160000
|
|
index 0000000..fabf852
|
|
--- /dev/null
|
|
+++ b/inputs/rawdata
|
|
@@ -0,0 +1 @@
|
|
+Subproject commit fabf8521130a13986bd6493cb33a70e580ce8572
|
|
</code></pre>
|
|
Each (sub)dataset is a separate, but jointly version-controlled, entity.
|
|
If none of its data is retrieved, subdatasets are an extremely <strong>lightweight</strong> data dependency
|
|
and yet <strong>actionable</strong> (<strong>datalad get</strong> retrieves contents on demand)
|
|
<aside class="notes">weighs just a few bytes</aside>
|
|
</section>
|
|
|
|
<section data-markdown style="font-size:30px">
|
|
## P3: Record what you did to it, and with what
|
|

|
|
|
|
- Collect and store **provenance** of all contents of a dataset that you create
|
|
- "Which script produced which output?", "From which data?", "In which **software environment**?"
|
|
... Record it in an ideally machine-readable way with **datalad (containers-)run**
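
What "machine-readable" can mean in practice: the sketch below parses a JSON run record out of a commit message. The message is a shortened, hand-written example modeled on the records that `datalad run` writes between two marker lines; the exact set of fields is an assumption and may differ between DataLad versions.

```python
import json

START = "=== Do not change lines below ==="
END = "^^^ Do not change lines above ^^^"


def parse_run_record(commit_message):
    """Extract the JSON run record embedded between the two marker lines."""
    body = commit_message.split(START, 1)[1].split(END, 1)[0]
    return json.loads(body)


# Hand-written example message (assumed fields, not real output).
EXAMPLE_MESSAGE = """[DATALAD RUNCMD] run classification analysis

=== Do not change lines below ===
{
 "cmd": "python3 code/classification_analysis.py",
 "exit": 0,
 "inputs": ["input/iris.csv"],
 "outputs": ["prediction_report.csv"],
 "pwd": "."
}
^^^ Do not change lines above ^^^
"""

record = parse_run_record(EXAMPLE_MESSAGE)
print(record["cmd"], record["inputs"], record["outputs"])
```

Since the record is structured data rather than free text, a tool can read it back and re-execute the command, which is what `datalad rerun` builds on.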
|
|
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h3>Take home messages</h3>
|
|
<dl>
|
|
<dt class="fragment fade-in-then-semi-out" data-fragment-index="1">Data deserves version control</dt>
|
|
<dd class="fragment fade-in-then-semi-out" data-fragment-index="1">
|
|
It changes and evolves just like code, and exhaustive tracking lays a foundation for reproducibility</dd>
|
|
<dt class="fragment fade-in-then-semi-out" data-fragment-index="2">
|
|
Reproducible science relies on good data management
|
|
</dt>
|
|
<dd class="fragment fade-in-then-semi-out" data-fragment-index="2">
|
|
But effort pays off: Increased transparency, better reproducibility, easier accessibility,
|
|
efficiency through automation and collaboration, streamlined procedures for synchronizing and updating your work, ...</dd>
|
|
<dt class="fragment fade-in-then-semi-out" data-fragment-index="3">DataLad can help with some things</dt>
|
|
<dd class="fragment fade-in-then-semi-out" data-fragment-index="3">
|
|
Have access to more data than you have disk space</dd>
|
|
<dd class="fragment fade-in-then-semi-out" data-fragment-index="3">
|
|
Who needs short-term memory when you can have automatic provenance capture?
|
|
</dd>
|
|
<dd class="fragment fade-in-then-semi-out" data-fragment-index="3">
|
|
Link versioned data to your analysis at no disk-space cost</dd>
|
|
<dd class="fragment fade-in-then-semi-out" data-fragment-index="3">...</dd>
|
|
</dl>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
|
|
<section>
|
|
<h3>Scalability</h3>
|
|
</section>
|
|
|
|
<section data-markdown data-transition="None"><script type="text/template">
|
|
## FAIRly big: Scaling up
|
|
|
|
Objective: Process the UK Biobank (imaging data)
|
|
<!-- .element: height="400" -->
|
|
|
|
- 76 TB in 43 million files in total
|
|
- 42,715 participants contributed personal health data
|
|
- Strict DUA
|
|
- Custom binary-only downloader
|
|
- Most data records offered as (unversioned) ZIP files
|
|
</script></section>
|
|
|
|
<section data-markdown data-transition="None"><script type="text/template">
|
|
## Challenges
|
|
|
|
- Process data such that
|
|
- Results are computationally reproducible (without the original compute infrastructure)
|
|
- There is complete linkage from results to an individual data record download
|
|
- It scales with the amount of available compute resources
|
|
|
|
- Data processing pipeline
|
|
- Compiled MATLAB blob
|
|
- 1h processing time per image, with 41k images to process
|
|
- 1.2 M output files (30 output files per input file)
|
|
- 1.2 TB total size of outputs
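
A back-of-the-envelope check of these numbers, as a small Python sketch. The input figures come from the list above; the count of 1000 concurrent jobs is a hypothetical value chosen only to illustrate the effect of parallelization.

```python
# Figures taken from the slide; everything derived below is plain arithmetic.
images = 41_000          # "41k images to process"
hours_per_image = 1      # "1h processing time per image"
outputs_per_image = 30   # "30 output files per input file"

serial_hours = images * hours_per_image
serial_years = serial_hours / (24 * 365)
output_files = images * outputs_per_image


def wall_clock_days(concurrent_jobs):
    """Idealized wall-clock time in days, ignoring scheduling and I/O overhead."""
    return serial_hours / concurrent_jobs / 24


print(f"serial compute time: {serial_hours} h (~{serial_years:.1f} years)")
print(f"output files: {output_files}")
print(f"with 1000 concurrent jobs (hypothetical): ~{wall_clock_days(1000):.1f} days")
```

Processed one image at a time, the pipeline would run for about 4.7 years, which is why the workflow has to scale with the available compute resources.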
|
|
</script></section>
|
|
|
|
<section data-transition="None">
|
|
<h2> FAIRly big setup</h2>
|
|
<img src="../pics/fairlybig_ukbsetup.png" width="1200" style="margin-top:-35px;margin-bottom:-30px">
|
|
|
|
<ul style="font-size:30px">
|
|
<strong>Exhaustive tracking</strong>
|
|
<li><a href="https://github.com/datalad/datalad-ukbiobank" target="_blank">datalad-ukbiobank</a>
|
|
extension downloads, transforms & tracks the evolution of the complete data release
|
|
in DataLad datasets
|
|
</li>
|
|
<li>Native and BIDSified data layout (at no additional disk space usage)</li>
|
|
<li>Structured in 42k individual datasets, combined to one superdataset</li>
|
|
<li>Containerized pipeline in a software container</li>
|
|
<li>Link input data & computational pipeline as dependencies</li>
|
|
</ul>
|
|
<br><br>
|
|
<small><a href="https://www.nature.com/articles/s41597-022-01163-2" target="_blank">
|
|
Wagner, Waite, Wierzba et al. (2022). FAIRly big: A framework for computationally reproducible processing of large-scale data.</a>
|
|
</small>
|
|
</section>
|
|
|
|
<section data-transition="None">
|
|
<h2>FAIRly big workflow</h2>
|
|
<div class="r-stack">
|
|
<img class="fragment fade-out" src="../pics/fairlybig_workflow.png" width="1200" style="margin-top:-35px;margin-bottom:-30px">
|
|
<img src="../pics/htcondor.svg" class="fragment fade-in">
|
|
</div>
|
|
<br>
|
|
<ul style="font-size:30px">
|
|
<strong>portability</strong>
|
|
<li>Parallel processing: 1 job = 1 subject
|
|
(number of concurrent jobs capped at the capacity of the compute cluster)
|
|
</li>
|
|
<li>Each job is computed in an ephemeral (short-lived) dataset clone, results are pushed back:
|
|
Ensure exhaustive tracking &
|
|
portability during computation</li>
|
|
<li>Content-agnostic persistent (encrypted) storage (minimizing storage and inodes)</li>
|
|
<li>Common data representation in secure environments</li>
|
|
</ul>
|
|
<br><br>
|
|
<small><a href="https://www.nature.com/articles/s41597-022-01163-2" target="_blank">
|
|
Wagner, Waite, Wierzba et al. (2022). FAIRly big: A framework for computationally reproducible processing of large-scale data.</a>
|
|
</small></section>
|
|
|
|
|
|
|
|
<section data-transition="None">
|
|
<h2>FAIRly big provenance capture</h2>
|
|
<img src="../pics/fairlybig_prov.png" width="1200" style="margin-top:-35px;margin-bottom:-30px">
|
|
<br><br>
|
|
<ul style="font-size:30px">
|
|
<strong>Provenance</strong>
|
|
<li>Every single pipeline execution is tracked</li>
|
|
<li>Execution in ephemeral workspaces ensures results
|
|
are individually reproducible without HPC access</li>
|
|
</ul>
|
|
<br><br>
|
|
<small><a href="https://www.nature.com/articles/s41597-022-01163-2" target="_blank">
|
|
Wagner, Waite, Wierzba et al. (2022). FAIRly big: A framework for computationally reproducible processing of large-scale data.</a>
|
|
</small></section>
|
|
|
|
<section data-markdown><script type="text/template">
|
|
## FAIRly big movie
|
|
|
|
<iframe width="1120" height="630" src="https://www.youtube-nocookie.com/embed/UsW6xN2f2jc?start=17" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
|
|
|
- Two computations on clusters of different scale (small cluster, supercomputer). Full video: https://youtube.com/datalad
|
|
- Two full (re-)computations, programmatically comparable, verifiable, reproducible -- on any system with data access
|
|
</script></section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h2>Thank you for your attention!</h2>
|
|
|
|
<img src="../pics/qr_lovedata.png" height="400">
|
|
<br><br><small>
|
|
|
|
Slides: <a href="https://doi.org/10.5281/zenodo.7627723" target="_blank">
|
|
DOI 10.5281/zenodo.7627723</a> (Scan the QR code)
|
|
<br><br>
|
|
</small>
|
|
<table>
|
|
<tr>
|
|
</tr>
|
|
<tr style="vertical-align:middle">
|
|
<td style="vertical-align:middle">
|
|
<img src="../pics/winrepo.png">
|
|
</td>
|
|
<td style="font-size: 18px">
|
|
<br><br>
|
|
Women neuroscientists are <a href="https://onlinelibrary.wiley.com/doi/full/10.1111/ejn.14397" target="_blank">
|
|
underrepresented in neuroscience</a>. You can use the <br>
|
|
<a href="https://www.winrepo.org/" target="_blank"> Repository for Women in Neuroscience</a> to find
|
|
and recommend neuroscientists for <br>
|
|
conferences, symposia or collaborations, and help make neuroscience more open & diverse.
|
|
</td>
|
|
</tr>
|
|
|
|
</table>
|
|
</section>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section>
|
|
<section>
|
|
<h3>Command summaries</h3>
|
|
</section>
|
|
|
|
<section>
|
|
<h3>Summary - Local version control</h3>
|
|
|
|
<dl>
|
|
<dt class="fragment fade-in"><code>datalad create</code> creates an empty dataset.</dt>
|
|
<dd class="fragment fade-in">Configurations (<b>-c yoda</b>, <b>-c text2git</b>)
|
|
add useful structure and/or configurations.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">A dataset has a <i>history</i> to track files and their modifications. </dt><dd class="fragment fade-in">Explore it with Git (<b>git log</b>) or external tools (e.g., <b>tig</b>).</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad save</code> records the dataset or file state to the history. </dt><dd class="fragment fade-in">Concise <b>commit messages</b> should summarize the change for future you and others.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad download-url</code> obtains web content and records its origin. </dt><dd class="fragment fade-in">It even takes care of saving the change.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad status</code> reports the current state of the dataset.</dt>
|
|
<dd class="fragment fade-in">A clean dataset status (no modifications, no untracked files) is good practice.</dd>
|
|
</dl>
|
|
</section>
|
|
|
|
<section>
|
|
<h3>Summary - Dataset consumption & nesting</h3>
|
|
|
|
<ul>
|
|
<dt class="fragment fade-in"><code>datalad clone</code> installs a dataset.</dt><dd class="fragment fade-in"> It can be installed “on its own”:
|
|
Specify the source (url, path, ...) of the dataset, and an optional <b>path</b> for it to be installed to.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Datasets can be installed as subdatasets within an existing dataset. </dt> <dd class="fragment fade-in"> The <b>--dataset/-d</b> option needs a path to the root of the superdataset.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Only small files and metadata about file availability are present locally after an install. </dt>
|
|
<dd class="fragment fade-in">To retrieve actual file content of annexed files,
|
|
<code>datalad get </code> downloads file content on demand.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Datasets preserve their history.</dt> <dd class="fragment fade-in">The superdataset records only the <i>version state</i> of the subdataset.</dd>
|
|
|
|
</ul>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h3>Summary - Reproducible execution</h3>
|
|
|
|
<ul>
|
|
<dt class="fragment fade-in"><code>datalad run</code> records a command and
|
|
its impact on the dataset.</dt>
|
|
<dd class="fragment fade-in">All dataset modifications are saved - use it
|
|
in a clean dataset.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Data/directories specified as <code>--input</code>
|
|
are retrieved prior to command execution.</dt>
|
|
<dd class="fragment fade-in"> Use one flag per input.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Data/directories specified as <code>--output</code>
|
|
will be unlocked for modifications prior to a rerun of the command. </dt>
|
|
<dd class="fragment fade-in">It's optional to specify, but helpful for recomputations.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad containers-run</code> can be used
|
|
to capture the software environment as provenance.</dt>
|
|
<dd class="fragment fade-in">It ensures computations are run in the desired software setup.
|
|
It supports Docker and Singularity containers.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad rerun</code> can automatically re-execute run-records later.</dt>
|
|
<dd class="fragment fade-in">They can be identified with any commit-ish (hash, tag, range, ...)</dd>
|
|
|
|
</ul>
|
|
</section>
|
|
|
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
</div>
|
|
</div>
|
|
|
|
<script src="../reveal.js/dist/reveal.js"></script>
|
|
<script src="../reveal.js/plugin/notes/notes.js"></script>
|
|
<script src="../reveal.js/plugin/markdown/markdown.js"></script>
|
|
<script src="../reveal.js/plugin/highlight/highlight.js"></script>
|
|
<script src="../custom_functions.js"></script>
|
|
<script>
|
|
// More info about initialization & config:
|
|
// - https://revealjs.com/initialization/
|
|
// - https://revealjs.com/config/
|
|
Reveal.initialize({
|
|
hash: true,
|
|
// The "normal" size of the presentation, aspect ratio will be preserved
|
|
// when the presentation is scaled to fit different resolutions. Can be
|
|
// specified using percentage units.
|
|
width: 1280,
|
|
height: 960,
|
|
// Factor of the display size that should remain empty around the content
|
|
margin: 0.3,
|
|
// Bounds for smallest/largest possible scale to apply to content
|
|
minScale: 0.2,
|
|
maxScale: 1.0,
|
|
|
|
controls: true,
|
|
progress: true,
|
|
history: true,
|
|
center: true,
|
|
slideNumber: 'c',
|
|
pdfSeparateFragments: false,
|
|
pdfMaxPagesPerSlide: 1,
|
|
pdfPageHeightOffset: -1,
|
|
transition: 'slide', // none/fade/slide/convex/concave/zoom
|
|
// Learn about plugins: https://revealjs.com/plugins/
|
|
plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
|
|
});
|
|
</script>
|
|
</body>
|
|
</html>
|