<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">

<!-- Edit me start! -->
<title>Welcome</title>
<meta name="description" content="Workshop introduction">
<meta name="author" content="Adina Wagner, Michael Hanke">
<!-- Edit me end! -->

<link rel="stylesheet" href="../reveal.js/dist/reset.css">
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
<link rel="stylesheet" href="../css/main.css">
<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
</head>
<body>
<div class="reveal">
<div class="slides">

<section>
<section>
<script src="https://cdn.logwork.com/widget/countdown.js"></script>
<a href="https://logwork.com/countdown-2zu8" class="countdown-timer"
data-style="columns" data-timezone="Europe/Berlin" data-date="2022-07-20 13:30">
Introduction starts in</a>
Have a ☕!
</section>
<section>
<h2>Research data management<br />👩‍💻👨‍💻<br />with DataLad</h2>
<div style="margin-top:1em;text-align:center">
<table style="border: none;">
<tr>
<td>
Adina Wagner<br><small><a href="https://twitter.com/AdinaKrik" target="_blank">
<img data-src="../pics/twitter.png" style="height:30px;margin:0px" />@AdinaKrik</a></small>
</td>
<td>
Michael Hanke<br><small><a href="https://twitter.com/eknahm" target="_blank">
<img data-src="../pics/twitter.png" style="height:30px;margin:0px" />@eknahm</a></small>
</td>
</tr>
<tr>
<td>
<img style="height:70px;margin-right:10px" data-src="../pics/fzj_logo.svg" /><br>
</td>
<td>
<small><a href="http://psychoinformatics.de" target="_blank">Psychoinformatics lab</a>,
<br> Institute of Neuroscience and Medicine (INM-7)<br>
Research Center Jülich</small><br>
</td>
</tr>
</table>
</div>

<br><br><small>
Slides: <a href="https://github.com/datalad-handbook/course/blob/master/talks/PDFs" target="_blank">
https://github.com/datalad-handbook/course/</a></small>
</section>
</section>

<!--...WORKSHOP INTRODUCTION...-->

<section>

<section>
<h2>Welcome!</h2>
Approximate workshop schedule<br><br>
<dl style="font-size:30px">
<dt>
Session 1 (now, 13.30-15.00)
</dt>
<dd>
Logistics & Intro 🧑‍🏫, <br>
Hands-on Terminal Basics 💻, <br>
Demo of core functionality 🧑‍🏫💻
</dd>
<br>
<dt>
Session 2 (today, 16.00-18.00)
</dt>
<dd>
Hands-on DataLad Basics & Exercises 💻
</dd>
<br>
<dt>
Session 3 (tomorrow, 11.00-12.30)
</dt>
<dd>
Sharing and Collaboration 🧑‍🏫, <br>
Hands-on Data Publication 💻
</dd>
<br>
<dt>
Session 4 (tomorrow, 13.30-15.00)
</dt>
<dd>
Computational reproducibility 🧑‍🏫💻, <br>
Outro 🧑‍🏫, <br>
Final Q&A ❔
</dd>
</dl>
</section>

<section>
<h2>Logistics and links</h2>
<ul style="font-size:30px">
<li>
You can download these slides at <a href="https://doi.org/10.5281/zenodo.6827086" target="_blank">
https://doi.org/10.5281/zenodo.6827086</a> (scan the QR code), and you can find their sources at
<a href="https://github.com/datalad-handbook/datalad-course/" target="_blank">
github.com/datalad-handbook/datalad-course </a> <br>
</li>
<li class="fragment fade-in">
Some of today's code-along workshop contents are at
<a href="https://psychoinformatics-de.github.io/rdm-course/" target="_blank">
psychoinformatics-de.github.io/rdm-course</a>
</li>
<li class="fragment fade-in">
The workshop will be interactive. If you do not have the software installed
on your own system, you can access a JupyterHub from your browser at
<a href="https://datalad-hub.inm7.de/" target="_blank">datalad-hub.inm7.de</a>
<strong>(the Wi-Fi is unreliable, so the JupyterHub is the better choice)</strong>
</li>
<li class="fragment fade-in">
You can log in to the JupyterHub with a pre-set username (take one out of the
jar) and a password you choose yourself. Remember the password for tomorrow!
</li>
<li class="fragment fade-in">
A <a href="https://pip.pypa.io/en/stable/user_guide/#requirements-files" target="_blank">
requirements.txt</a> file on Zenodo details the software
environment we set up on the JupyterHub
</li>
</ul>
<img src="../pics/QRcode_mpsc.png" height="250px" align="middle">
</section>

<section>
<h2>Interactivity</h2><br><br>
<ul style="font-size:30px">
<li class="fragment fade-in">
The workshop centers around
<a href="http://handbook.datalad.org/r.html?about" target="_blank">
<strong>DataLad</strong></a>
(version 0.16 and up) for real-world <strong>research data management</strong> use cases
</li>
<li class="fragment fade-in">
There are no stupid questions; ask anything, any time <br>
</li>
<li class="fragment fade-in">
Something doesn't look right on your system?
Stick a post-it on your screen, and we'll take a look together
</li>
<li class="fragment fade-in">
We're available outside of sessions, too. Chat with us about your
use cases or questions over a coffee or meal
</li>
</ul>
<table>
<tr>
<td style="vertical-align:top; font-size:35px">
<br><br>
<li class="fragment fade-in">
4 sessions = time for more than a <br>
standard introduction. <br></li>
<li class="fragment fade-in">
Materials are available <br>
online & persistent, so we can<br>
be flexible & spontaneous <br>
if specific topics interest you
</li>
</td>
<td>
<img class="fragment fade-in" src="../pics/splits.jpg" width="600px">
</td>
</tr>
</table>
</section>

<section>
<h2>After the workshop</h2>
<ul>
If you have a question after the workshop, you can reach out for help:
<br>
<ul style="font-size:30px">
<dt>Reach out to the <b>DataLad</b> team via</dt>
<li>
<a href="https://matrix.to/#/!NaMjKIhMXhSicFdxAj:matrix.org?via=matrix.waite.eu&via=matrix.org&via=inm7.de" target="_blank">
Matrix</a> (free, decentralized communication platform; works in the browser, no app needed).
We also run a weekly Zoom office hour (Thursday, 4pm Berlin time) from this room.
</li>
<li>
<a href="https://github.com/datalad/datalad" target="_blank">
the development repository on GitHub</a>
</li>
<br>
<dt>Reach out to the user community with</dt>
<li>A question on <a href="https://neurostars.org/" target="_blank">neurostars.org</a>
with a <code>datalad</code> tag</li>
<br>
<dt>Find more user tutorials or workshop recordings</dt>
<li>On <a href="https://www.youtube.com/channel/datalad" target="_blank">
DataLad's YouTube channel</a>
</li>
<li>
In the <a href="http://handbook.datalad.org/en/latest/" target="_blank">
DataLad Handbook </a>
</li>
<li>In the <a href="https://psychoinformatics-de.github.io/rdm-course/" target="_blank">DataLad RDM course</a> </li>
<li>In the <a href="http://docs.datalad.org" target="_blank">official API documentation</a> </li>
</ul>
</ul>
</section>

<section>
<h2>Audience response system</h2>
Use your phone to scan the QR code, or open the link in a new browser window <br>
<iframe src="https://directpoll.com/r?XDbzPBdEt8j1rJlVwV5I4m6c9z8nJU2YLnRe3j3k"
style="border: 0" width="900" height="800"></iframe>
</section>

<section>
<h2>On a scale of rubber ducks...</h2>
<img src="../pics/rubberduckscale.png" height="600"><iframe src="https://directpoll.com/r?XDbzPBdEt8j1rJlVwV5I4m6c9z8nJU2YLnRe3j3k"
style="border: 0" width="400" height="600"></iframe>
</section>
</section>

<!--..Research data management in general..-->

<section>

<section>
<h2>Research data management</h2>
</section>

<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
You write a paper & stay up late to generate good-looking figures,
but you have to tweak many parameters and display options.
The next morning, you have no idea which parameters produced which
figures, and which of the figures match what you report in the paper.<br>
<img height="400" src="../pics/turingway/findfiles.png">
<img height="400" src="../pics/turingway/projectstack.png"></div>
<imgcredit>Illustration adapted from Scriberia and The Turing Way</imgcredit>
</section>

<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
Your research project produces phenomenal results, but your
laptop, the only place that stores the source code for the
results, is stolen or breaks.<br>
<img height="700" src="../pics/stolenlaptop.jpg"></div>
<imgcredit>https://co.pinterest.com/pin/551128073121451139/</imgcredit>
</section>

<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
A graduate student complains that a research idea does not work.
Their supervisor can't figure out what the student did and how,
and the student can't sufficiently explain their approach
(data, algorithms, software).
Weeks of discussion and miscommunication ensue because the
supervisor can't explore or use the student's project first-hand.
<br>
<img height="500" src="../pics/badsupervision.gif"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>

<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
You wrote a script during your PhD that applied a specific
method to a dataset. Now, with new data and a new project, you
try to reuse the script, but you have forgotten how it works.
<br>
<img height="500" src="../pics/frustration.jpg"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>

<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
You try to recreate results from another lab's published paper.
You base your re-implementation on everything reported in their paper,
but the results you obtain look nowhere like the original.
<br>
<img height="500" src="../pics/turingway/ReadableCode.png"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>

<section>
<h2><strike>common</strike> old problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
All these problems were paraphrased from
<a href="https://sci-hub.se/https://link.springer.com/chapter/10.1007%2F978-1-4612-2544-7_5" target="_blank">
Buckheit & Donoho, <b>1995</b></a>
<br></div>
<div class="fragment fade-in">Let's do better!</div>
</section>

</section>

<!--...WHAT IS DATALAD...-->

<section>

<section data-transition="fade">
<div><table>
<tr><dl>
<img src="../pics/datalad_logo_wide.svg" height="150"><br>
<b><a href="https://www.datalad.org/" target="_blank"> DataLad</a>
can help <br> with small or large-scale <br> data management </b>
<dt></dt>
</dl></tr>
<tr><dl class="fragment fade-in">Free, <br> open source, <br> command line tool & Python API </dl></tr>
</table>
</div>
<ul style="vertical-align:middle">
<br>
<dt></dt>
</ul>
</section>

<section>
<h2> <img src="../pics/datalad_logo_wide.svg"></h2>
<ul>
<li>A command-line tool, available for all major operating systems
(Linux, macOS/OSX, Windows), MIT-licensed</li>
<li>Built on top of <a href="https://git-scm.com/" target="_blank">Git</a>
and <a href="https://git-annex.branchable.com/" target="_blank">git-annex</a></li>
<dt><li>Allows...</li></dt>
<dt>... version-controlling arbitrarily large content </dt>
<dd>version control data and software alongside code!</dd>
<dt>... transport mechanisms for sharing and obtaining data </dt>
<dd>consume and collaborate on data (analyses) like software</dd>
<dt>... (computationally) reproducible data analysis</dt>
<dd>Track and share provenance of all digital objects</dd>
<dt>... and <i>much</i> more </dt>
<li>Completely domain-agnostic</li>
<br>
</ul>
</section>

<section>
<h2>Acknowledgements</h2>
<table>
<tr style="vertical-align:top">
<td style="vertical-align:top">
<dl>
<dt>Software</dt>
<dd style="margin-left:5px!important">
<ul style="margin-left:5px!important">
<li>Joey Hess (git-annex)</li>
<li>The DataLad team &
contributors</li>
</ul>
</dd>
<dt style="margin-top:20px">Illustrations </dt>
<dd style="margin-left:5px!important">
<ul style="margin-left:5px!important">
<li>The Turing Way <br>
project & Scriberia</li>
<img src="../pics/bannerthanks.svg">
</ul>
</dd>
</dl>
</td>
<td style="vertical-align:top">
<div style="margin-bottom:-20px;text-align:center"><strong>Funders</strong></div>
<img style="height:150px;margin-right:50px" data-src="../pics/nsf_2020.png" />
<img style="height:150px;margin-right:50px;margin-left:50px" data-src="../pics/binc.png" />
<img style="height:150px;margin-left:50px" data-src="../pics/bmbf_2020.png" />
<img style="height:80px;margin-top:-40px;margin-left:auto;margin-right:auto;width:100%" data-src="../pics/fzj_logo.svg" />
<div style="margin-top:-20px">
<img style="height:60px;margin-right:20px" data-src="../pics/erdf.png" />
<img style="height:60px;margin-right:20px" data-src="../pics/cbbs_logo.png" />
<img style="height:60px" data-src="../pics/LSA-Logo.png" />
</div>
<div style="margin-top:40px;margin-bottom:20px;text-align:center"><strong>Collaborators</strong></div>
<div style="margin-top:-20px">
<img style="height:100px;margin:20px" data-src="../pics/hbp_logo.png" />
<img style="height:100px;margin:20px" data-src="../pics/conp_logo.png" />
<img style="height:100px;margin:20px" data-src="../pics/vbc_logo.png" />
</div>
<div style="margin-top:-40px">
<img style="height:120px;margin:20px" data-src="../pics/openneuro_logo.png" />
<img style="height:120px;margin:20px" data-src="../pics/cbrain_logo.png" />
<img style="height:140px;margin:20px" data-src="../pics/brainlife_logo.png" />
</div>
</td>
</tr>
</table>
</section>

<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li class="fragment fade-in-then-semi-out">
Behind-the-scenes <b>infrastructure component for data transport and versioning</b>
(e.g., used by <a href="https://openneuro.org/" target="_blank"> OpenNeuro</a>,
<a href="https://brainlife.io/" target="_blank"> brainlife.io </a>,
the <a href="https://conp.ca/" target="_blank">Canadian Open Neuroscience Platform (CONP)</a>,
<a href="https://mcin.ca/technology/cbrain/" target="_blank"> CBRAIN</a>)</li>
<img height="800" class="fragment fade-in" src="../pics/openneuro2.gif" alt="a screen recording of browsing OpenNeuro">
</ul>
</section>

<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li class="fragment fade-in-then-semi-out"> <b>Creating and sharing reproducible, open science</b>: Sharing data, software, code, and provenance </li>
<img height="800" class="fragment fade-in" src="../pics/shareresearch2.gif" alt="a screen recording of cloning the REMODNAV paper dataset from GitHub">
</ul>
</section>

<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li> <b>Creating and sharing reproducible, open science</b>: Sharing data, software, code, and provenance </li>
<img height="800" class="fragment fade-in" src="../pics/openscience.gif" alt="a screen recording of cloning the REMODNAV paper dataset from GitHub">
</ul>
</section>

<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li class="fragment fade-in-then-semi-out"><b>Central data management</b> and archival system</li>
<img height="850" class="fragment fade-in" src="../pics/centralmanagement.gif">
</ul>
</section>

<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li class="fragment fade-in-then-semi-out"><b>Scalable computing framework</b> for reproducible science</li>
<img height="350" class="fragment fade-in" src="../pics/fairly-big.png">
<img height="500" class="fragment fade-in" src="../pics/ukb_datasets.svg">
</ul>
</section>
</section>

<section>
<section data-transition="None">
<h2>Prerequisites: Terminal</h2>
<ul>
<div>
<li>DataLad can be used from the command line</li>
<pre><code>datalad create mydataset</code></pre></div>
<div class="fragment fade-in">
<li>... or with its Python API</li>
<pre><code class="python">import datalad.api as dl
dl.create(path="mydataset")</code></pre></div>
<div class="fragment fade-in">
<li>... and other programming languages can use it via system calls</li>
<pre><code class="r"># in R
> system("datalad create mydataset")
</code></pre></div>
<br><br>
</ul>
</section>

<section data-transition="None">
<h2>Prerequisites: Terminal</h2>
<iframe src="https://directpoll.com/r?XDbzPBdEt8j1rJlVwV5I4m6c9z8nJU2YLnRe3j3k"
style="border: 0" width="900" height="800"></iframe>
<p><a href="https://datalad-hub.inm7.de" target="_blank">
datalad-hub.inm7.de</a></p>
<p><a href="https://www.mathcs.emory.edu/~valerie/courses/fall10/155/resources/unix_cheatsheet.html" target="_blank">
Unix terminal cheatsheet (incl. Windows equivalents)</a></p>
</section>

<section data-transition="None">
<h2>Prerequisites: Installation and Configuration</h2>
<ul style="font-size:30px">
<li data-fragment-index="1" class="fragment fade-in">Your installed version of DataLad should be 0.17.2</li>
<pre class="fragment fade-in" data-fragment-index="1"><code data-fragment-index="1" class="fragment fade-in">$ datalad --version
datalad 0.17.2</code></pre>
<table>
<li data-fragment-index="2" class="fragment fade-in">DataLad relies on Git to create a revision history with detailed information on
what was changed, when, and how. Git therefore needs to know who you are:
configure a Git identity (name and email). Check whether an identity is already set
by running either of:</li>
<tr>
<td>
<pre data-fragment-index="2" class="fragment fade-in"><code data-fragment-index="2" class="fragment fade-in bash">$ git config --get user.name
Adina Wagner
$ git config --get user.email
adina.wagner@t-online.de
</code></pre>
</td>
<td>
<pre data-fragment-index="2" class="fragment fade-in"><code data-fragment-index="2" class="fragment fade-in bash">$ datalad configuration get user.name user.email
Adina Wagner
adina.wagner@t-online.de
</code></pre>
</td>
</tr>
</table>

<li data-fragment-index="3" class="fragment fade-in">Set a Git identity using either of
<table>
<tr>
<td>
<pre data-fragment-index="3" class="fragment fade-in"><code data-fragment-index="3" class="fragment fade-in">$ git config --global \
user.name "Adina Wagner"
$ git config --global \
user.email "adina.wagner@t-online.de"</code></pre>
</td>
<td>
<pre data-fragment-index="3" class="fragment fade-in"><code data-fragment-index="3" class="fragment fade-in">$ datalad configuration --scope global \
set user.name="Adina Wagner"
$ datalad configuration --scope global \
set user.email="adina.wagner@t-online.de"</code></pre>
</td>
</tr>
</table>
<li data-fragment-index="4" class="fragment fade-in">Enable brand-new DataLad functionality from the datalad-next extension (which needs to be installed):
<pre><code>datalad configuration --scope global set datalad.extensions.load=next</code></pre> </li>

<small>Find installation and configuration
instructions at <a href="http://handbook.datalad.org/en/latest/intro/installation.html" target="_blank">
handbook.datalad.org</a></small>
</ul>
</section>

<section data-transition="None">
<h2>Prerequisites: Using DataLad</h2>
<ul style="font-size:30px">
<li class="fragment fade-in">Every DataLad command consists of a main
command followed by a sub-command. Both the main command and the sub-command can have options.
<img height="280px" src="../pics/command-structure.png">
</li>
<li class="fragment fade-in"> Example (main command, subcommand, several subcommand options):
<pre><code>$ datalad save -m "Saving changes" --recursive </code></pre>
</li>
<li class="fragment fade-in">Use <em>--help</em> to find out more about any (sub)command
and its options, including detailed descriptions and examples (press <em>q</em> to close). Use <em>-h</em> to get a short
overview of all options
<pre><code>$ datalad save -h
Usage: datalad save [-h] [-m MESSAGE] [-d DATASET] [-t ID] [-r] [-R LEVELS]
                    [-u] [-F MESSAGE_FILE] [--to-git] [-J NJOBS] [--amend]
                    [--version]
                    [PATH ...]

Use '--help' to get more comprehensive information.
</code></pre></li>
</ul>
</section>
</section>

<section>
<section data-markdown><script type="text/template">
If everything is important...

...track everything!
</script></section>

<section data-markdown><script type="text/template">
<!-- .element: height="600" -->
http://datalad.org<!-- .element: style="margin-left:800px" -->

<aside class="notes">
But let's not talk about it, and only talk about features and example implementations in DataLad
</aside>
</script>
</section>

<section data-markdown data-transition="none"><script type="text/template">
## Exhaustive tracking of research components
<!-- .element: width="100%" -->
Well-structured datasets (using community standards), and portable computational environments — and their evolution — are the precondition for reproducibility

<table width=100% style="padding:0px">
<tr><td style="padding:0px">
<code><pre>
# turn any directory into a dataset
# with version control

% datalad create &lt;directory&gt;
</pre></code>
</td><td style="padding:0px">
<code><pre>
# save a new state of a dataset with
# file content of any size

% datalad save
</pre></code>
</td></tr></table>
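
DataLad builds on git-annex, which keeps large file content out of Git and commits only a small symlink or pointer file naming a key derived from a checksum of that content. The Python snippet below is a toy model of that idea, not DataLad's or git-annex's actual implementation: the helpers `annex_key` and `save_state`, the in-memory object store, and the simplified key layout are assumptions made for illustration.

```python
import hashlib

# Toy model only: real datasets use git-annex keys and an on-disk object store.


def annex_key(content: bytes) -> str:
    """Derive a content-addressed key (simplified, hypothetical layout)."""
    digest = hashlib.sha256(content).hexdigest()
    return f"SHA256-s{len(content)}--{digest}"


def save_state(files: dict[str, bytes], store: dict[str, bytes]) -> dict[str, str]:
    """Record one dataset state.

    File content goes into `store` under its key; the returned "tree"
    (what version control would track) maps paths to keys only.
    """
    tree = {}
    for path, content in files.items():
        key = annex_key(content)
        store.setdefault(key, content)  # identical content is stored only once
        tree[path] = key
    return tree


store: dict[str, bytes] = {}
v1 = save_state({"data/raw.bin": b"\x00" * 1000}, store)
v2 = save_state({"data/raw.bin": b"\x00" * 1000, "readme.txt": b"hello"}, store)
print(v1["data/raw.bin"] == v2["data/raw.bin"])  # True: unchanged file keeps its key
print(len(store))  # 2: only two distinct contents are stored
```

In this model, saving an unchanged file again adds nothing to the store, because its key depends only on its content; the tracked history stays small no matter how large the files are.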
Note:
- link to prev. statements on description standards
- your community could be really small (your lab); when data are precious, resources
will be spent to understand them, but the information must be captured to make this possible
</script></section>

<section data-markdown data-transition="none"><script type="text/template">
## Capture computational provenance
<!-- .element: width="100%" -->
Which data was needed at which version, as input into which code, running with what parameterization in which
computational environment, to generate an outcome?

<table width=100% style="padding:0px">
<tr><td style="padding:0px">
<code><pre>
# execute any command and capture its output
# while recording all input versions too

% datalad run --input ... --output ... &lt;command&gt;
</pre></code>
</td></tr></table>
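
`datalad run` executes a command and saves the outcome together with a machine-readable record of the command and its declared inputs and outputs. The Python sketch below models that bookkeeping in miniature by checksumming the inputs before and the outputs after executing a command. The function `run_with_provenance` and its record layout are hypothetical and much simpler than DataLad's actual run records.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Identify a file version by the SHA-256 checksum of its content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_with_provenance(cmd: list[str], inputs: list[str],
                        outputs: list[str], cwd: Path) -> dict:
    """Execute `cmd` in `cwd` and return a (hypothetical) provenance record."""
    record = {
        "cmd": cmd,
        # which version of each input went into the computation
        "inputs": {p: file_digest(cwd / p) for p in inputs},
    }
    proc = subprocess.run(cmd, cwd=cwd, check=True)
    record["exit"] = proc.returncode
    # which version of each output came out of it
    record["outputs"] = {p: file_digest(cwd / p) for p in outputs}
    return record


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "in.txt").write_text("raw data\n")
    # a stand-in "analysis": upper-case the input file
    script = ("from pathlib import Path; "
              "Path('out.txt').write_text(Path('in.txt').read_text().upper())")
    rec = run_with_provenance([sys.executable, "-c", script],
                              ["in.txt"], ["out.txt"], work)
    print(json.dumps(rec, indent=2))
```

A record like this already answers the slide's question for one step: which input versions, which command, which outputs. It also contains everything needed to execute the same computation again later.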

Note:
The missing link: even when everything is shared, we still don't know how to start.
A README is the minimum, but executable provenance records are much better.
</script></section>

<section data-markdown data-transition="none"><script type="text/template">
## Exhaustive capture enables portability
<!-- .element: width="100%" -->
Precise identification of data and computational environments, combined with provenance records, forms a comprehensive and portable data structure, capturing all aspects of an investigation.

<table width=100% style="padding:0px">
<tr><td style="padding:0px">
<code><pre>
# transfer data and metadata to other sites and services
# with fine-grained access control for dataset components

% datalad push --to &lt;site-or-service&gt;
</pre></code>
</td></tr></table>

Note:
Does it fly? Can you give it to someone? Or can you take it with you to your new lab?
</script></section>

<section data-markdown data-transition="none"><script type="text/template">
## Reproducibility strengthens trust
<!-- .element: width="100%" -->
Outcomes of computational transformations can be validated by authorized third parties. This enables audits, promotes accountability, and streamlines automated "upgrades" of outputs.

<table width=100% style="padding:0px">
<tr><td style="padding:0px">
<code><pre>
# obtain dataset (initially only identity,
# availability, and provenance metadata)

% datalad clone &lt;url&gt;
</pre></code>
</td><td style="padding:0px">
<code><pre>
# immediately actionable provenance records
# full abstraction of input data retrieval

% datalad rerun &lt;commit|tag|range&gt;
</pre></code>
</td></tr></table>
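
Because a provenance record names both the command and the exact input versions, re-executing it can be automated. The Python sketch below is a simplified, hypothetical illustration of that idea, not how `datalad rerun` works internally: it refuses to run if an input no longer matches its recorded checksum, re-executes the command, and reports for each output whether the new result is identical to the recorded one. The `rerun` helper and the record layout are assumptions.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 checksum of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def rerun(record: dict, cwd: Path) -> dict[str, bool]:
    """Re-execute a recorded command; map each output to 'result unchanged?'."""
    for path, expected in record["inputs"].items():
        if digest(cwd / path) != expected:
            raise RuntimeError(f"input {path} differs from the recorded version")
    subprocess.run(record["cmd"], cwd=cwd, check=True)
    return {path: digest(cwd / path) == expected
            for path, expected in record["outputs"].items()}


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "in.txt").write_text("raw data\n")
    cmd = [sys.executable, "-c",
           "from pathlib import Path; "
           "Path('out.txt').write_text(Path('in.txt').read_text().upper())"]
    # first execution, recorded by hand for this example
    subprocess.run(cmd, cwd=work, check=True)
    record = {"cmd": cmd,
              "inputs": {"in.txt": digest(work / "in.txt")},
              "outputs": {"out.txt": digest(work / "out.txt")}}
    (work / "out.txt").unlink()  # drop the result, then regenerate it
    result = rerun(record, work)
    print(result)  # {'out.txt': True}
```

A deterministic command reproduces identical outputs; a `False` entry would flag a result that could not be reproduced from the recorded inputs.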
Note:
Goal is automated reproducibility, which enables assessment of robustness and benchmarking of algorithmic developments
</script></section>

<section data-markdown data-transition="none"><script type="text/template">
## Ultimate goal: (re-)usability
<!-- .element: width="100%" -->
Verifiable, portable, self-contained data structures that track all aspects of an investigation exhaustively can be (re-)used as modular components in larger contexts — propagating their traits

<table width=100% style="padding:0px">
<tr><td style="padding:0px">
<code><pre>
# declare a dependency on another dataset and
# re-use it at a particular state in a new context

% datalad clone -d &lt;superdataset&gt; &lt;url&gt; &lt;path-in-dataset&gt;
</pre></code>
</td></tr></table>
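
DataLad registers a subdataset in its superdataset as a Git submodule, which records where the subdataset comes from and exactly which commit of it is used. The toy Python model below illustrates that pinning. The `SubdatasetLink` type, the `register` helper, the URL, and the commit identifiers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubdatasetLink:
    url: str     # where the subdataset can be obtained from
    commit: str  # exact version the superdataset depends on


def register(state: dict[str, SubdatasetLink], path: str,
             url: str, commit: str) -> dict[str, SubdatasetLink]:
    """Return a new superdataset state; earlier states stay unchanged."""
    new_state = dict(state)
    new_state[path] = SubdatasetLink(url, commit)
    return new_state


# hypothetical URL and commit identifiers
v1 = register({}, "inputs/raw", "https://example.org/raw-data.git", "a1b2c3d")
# later, the superdataset moves to a newer version of the same subdataset
v2 = register(v1, "inputs/raw", v1["inputs/raw"].url, "e4f5a6b")

print(v1["inputs/raw"].commit)  # a1b2c3d: the old state still pins the old version
print(v2["inputs/raw"].commit)  # e4f5a6b
```

In this model, new commits in the subdataset never silently change what an existing superdataset state refers to; moving to a newer version is an explicit new state of the superdataset.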

Note:
With these in place, re-usability is a small(er) step
</script></section>

<section data-markdown><script type="text/template">
## DataLad: Manage (co-)evolution of digital objects
<!-- .element: width="900" style="margin-bottom:-70px;margin-top:-20px" -->

Consume, create, curate, analyze, publish, and query data with full provenance capture and "universal" metadata support.
<p style="font-size:70%;margin-top:-20px">
DataLad is free and open source (MIT-licensed). http://datalad.org
</p>

<note>
Halchenko, Meyer, Poldrack, ... & Hanke, M. (2021).
DataLad: distributed system for joint management of code, data, and their relationship.
Journal of Open Source Software, 6(63), 3262.
</note>
Note:
- following illustrations contain concrete implementations with DataLad
- Software developed to address the needs of long-term maintenance of and collaboration on the studyforrest dataset
</script></section>

<section data-markdown><script type="text/template">
## Let's try...
</script></section>
</section>


<section>
<section>
<h1>Backup</h1>
</section>

<section>
<h2>Core concepts & features</h2>
</section>

<section>
<h2>Everything happens in DataLad datasets</h2>
<img src="../pics/artwork/src/dataset.svg" width="600"> <br>
</section>

<section>
<h2>Dataset = Git/git-annex repository</h2>
<ul>
<li>Content-agnostic</li>
<li>No custom data structures</li>
<li>Complete decentralization</li>
<li>Looks and feels like a directory on your computer:</li>
</ul>
<br>
<br>
<img src="../pics/remodnav-ds-nautilus.png" width="500"> <img src="../pics/remodnav-ds-terminal.png" width="500">
<small>File viewer and terminal view of a DataLad dataset</small>
</section>

<section>
<h2>Version control arbitrarily large files</h2>
<img src="../pics/artwork/src/local_wf.svg" width="600"> <br>

<p class="fragment fade-in">Stay flexible:</p>
<ul>
<li class="fragment fade-in">Simple DataLad core API (easy for data management novices)</li>
<li class="fragment fade-in">Pure Git or git-annex commands (for regular Git or git-annex users, or to use specific functionality)</li>
</ul>
</section>

<section>
<h2>Use a dataset's history</h2>
<img src="../pics/researchlog.png">
<ul>
<li class="fragment fade-in"> reset your dataset (or a subset of it) to a previous state, </li>
<li class="fragment fade-in"> revert changes or bring them back, </li>
<li class="fragment fade-in"> find out what was done when, how, why, and by whom, </li>
<li class="fragment fade-in"> identify precise versions: use data in the most recent version, or the one from 2018, or... </li>
</ul>
</section>

<section>
<h2>Consume and collaborate</h2>
<img src="../pics/artwork/src/collaboration.svg" width="900"> <br>
</section>

<section>
<h2>Machine-readable, re-executable provenance</h2>
<img src="../pics/artwork/src/reproducible_execution.svg" width="900"> <br>
</section>
|
||
|
||
<section>
|
||
<h2>Seamless nesting and dataset linkage</h2>
|
||
|
||
<img src="../pics/artwork/src/linkage_subds.svg" width="900"> <br>
|
||
<!-- <ul>
|
||
<li class="fragment fade-in" data-fragment-index="2">Overcomes scaling issues with large amounts of files</li>
|
||
<pre class="fragment fade-in" data-fragment-index="2"><code>adina@bulk1 in /ds/hcp/super on git:master❱ datalad status --annex -r
|
||
15530572 annex'd files (77.9 TB recorded total size)
|
||
nothing to save, working tree clean</code></pre>
|
||
<small><a class="fragment fade-in" data-fragment-index="2" href="https://github.com/datalad-datasets/human-connectome-project-openaccess" target="_blank">(github.com/datalad-datasets/human-connectome-project-openaccess)</a></small>
|
||
<li class="fragment fade-in">Modularizes research components for transparency, reuse, and access management</li>
|
||
</ul>
|
||
-->
|
||
</section>
|
||
|
||
<section>
<h2>Core concepts & features</h2>
</section>

<section>
<h2>Everything happens in DataLad datasets</h2>
<img src="../pics/artwork/src/dataset.svg" width="600"> <br>
</section>

<section>
<h2>Dataset = Git/git-annex repository</h2>
<ul>
<li>content-agnostic</li>
<li>no custom data structures</li>
<li>complete decentralization</li>
<li>looks and feels like a directory on your computer:</li>
</ul>
<br>
<br>
<img src="../pics/remodnav-ds-nautilus.png" width="500"> <img src="../pics/remodnav-ds-terminal.png" width="500">
<small>File viewer and terminal view of a DataLad dataset</small>
</section>

<section>
<h2>Version control arbitrarily large files</h2>
<img src="../pics/artwork/src/local_wf.svg" width="600"> <br>
<p class="fragment fade-in">Stay flexible:</p>
<ul>
<li class="fragment fade-in">Simple DataLad core API (easy for data management novices)</li>
<li class="fragment fade-in">Pure Git or git-annex commands (for regular Git or git-annex users, or to use specific functionality)</li>
</ul>
</section>

<section>
<h2>Use a dataset's history</h2>
<img src="../pics/researchlog.png">
<ul>
<li class="fragment fade-in">reset your dataset (or a subset of it) to a previous state</li>
<li class="fragment fade-in">revert changes or bring them back</li>
<li class="fragment fade-in">find out what was done when, how, why, and by whom</li>
<li class="fragment fade-in">identify precise versions: use data in the most recent version, or the one from 2018, or...</li>
</ul>
</section>
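
<section>
<h2>Dataset history with plain Git</h2>
<small>Because a DataLad dataset is a Git repository, the history operations from the previous slide also work with plain Git.
Below is a minimal sketch in a throwaway repository that uses only Git (no DataLad or git-annex);
the file <code>data.txt</code>, its contents, and the author identity are hypothetical placeholders.</small>
<pre><code class="bash" data-trim>
set -e
cd "$(mktemp -d)"
# hypothetical author identity, for illustration only
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org
git init -q history-demo
cd history-demo
# record two versions of a hypothetical data file
echo "version 1" > data.txt
git add data.txt
git commit -q -m "Add data, version 1"
echo "version 2" > data.txt
git commit -q -a -m "Update data to version 2"
# find out what was done when, and by whom
git log --format='%h %ad %an: %s' --date=short
# revert the last change; the revert is recorded as a new commit
git revert --no-edit HEAD
cat data.txt
# earlier versions stay retrievable, e.g. the state before the revert
git show HEAD~1:data.txt
</code></pre>
<small>In a DataLad dataset, <code>datalad save</code> would typically take the place of <code>git add</code> and <code>git commit</code>.</small>
</section>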

<section>
<h2>Consume and collaborate</h2>
<img src="../pics/artwork/src/collaboration.svg" width="900"> <br>
</section>

<section>
<h2>Machine-readable, re-executable provenance</h2>
<img src="../pics/artwork/src/reproducible_execution.svg" width="900"> <br>
</section>

<section>
<h2>Seamless nesting and dataset linkage</h2>
<img src="../pics/artwork/src/linkage_subds.svg" width="900"> <br>
<!-- <ul>
<li class="fragment fade-in" data-fragment-index="2">Overcomes scaling issues with large numbers of files</li>
<pre class="fragment fade-in" data-fragment-index="2"><code>adina@bulk1 in /ds/hcp/super on git:master❱ datalad status --annex -r
15530572 annex'd files (77.9 TB recorded total size)
nothing to save, working tree clean</code></pre>
<small><a class="fragment fade-in" data-fragment-index="2" href="https://github.com/datalad-datasets/human-connectome-project-openaccess" target="_blank">(github.com/datalad-datasets/human-connectome-project-openaccess)</a></small>
<li class="fragment fade-in">Modularizes research components for transparency, reuse, and access management</li>
</ul>
-->
</section>
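
<section>
<h2>Nesting with plain Git submodules</h2>
<small>DataLad registers subdatasets in a superdataset as Git submodules, so the superdataset records the exact
commit of each subdataset it links. A minimal sketch of that mechanism with plain Git only; the names
<code>super</code>, <code>inputs</code>, and <code>raw.txt</code> and the author identity are hypothetical. With DataLad, one would
typically run <code>datalad create -d . inputs</code> or <code>datalad clone -d . &lt;url&gt; inputs</code> instead.</small>
<pre><code class="bash" data-trim>
set -e
cd "$(mktemp -d)"
# hypothetical author identity, for illustration only
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org
git init -q super
cd super
git commit -q --allow-empty -m "Create superdataset"
# a nested repository standing in for a subdataset (needs at least one commit)
git init -q inputs
echo "raw" > inputs/raw.txt
git -C inputs add raw.txt
git -C inputs commit -q -m "Add raw data"
# register the existing nested repository as a submodule (no clone happens)
git submodule add ./inputs inputs
git commit -q -m "Link inputs subdataset"
# mode 160000 marks a submodule; the hash is the pinned subdataset commit
git ls-tree HEAD inputs
git -C inputs rev-parse HEAD
</code></pre>
</section>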

<section>
<h2>Third-party integrations</h2>
<img src="../pics/artwork/src/thirdparty.svg" width="900"> <br>
<small>Apart from <b>local computing infrastructure</b> (from private laptops to computational clusters),
datasets can be hosted in major <b>third-party repository hosting and cloud storage</b> services.
More info: chapter on <a href="http://handbook.datalad.org/en/latest/basics/basics-thirdparty.html" target="_blank">
Third party infrastructure</a>.</small>
</section>

</section>

</div>
</div>

<script src="../reveal.js/dist/reveal.js"></script>
<script src="../reveal.js/plugin/notes/notes.js"></script>
<script src="../reveal.js/plugin/markdown/markdown.js"></script>
<script src="../reveal.js/plugin/highlight/highlight.js"></script>
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
hash: true,
// The "normal" size of the presentation, aspect ratio will be preserved
// when the presentation is scaled to fit different resolutions. Can be
// specified using percentage units.
width: 1280,
height: 960,
// Factor of the display size that should remain empty around the content
margin: 0.2,
// Bounds for smallest/largest possible scale to apply to content
minScale: 0.2,
maxScale: 1.0,

controls: true,
progress: true,
history: true,
center: true,
slideNumber: 'c',
pdfSeparateFragments: false,
pdfMaxPagesPerSlide: 1,
pdfPageHeightOffset: -1,
transition: 'slide', // none/fade/slide/convex/concave/zoom
// Learn about plugins: https://revealjs.com/plugins/
plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
});
</script>
</body>
</html>