<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
<!-- Edit me start! -->
<title>Research data management with DataLad</title>
<meta name="description" content="Introduction to a workshop on research data management with DataLad">
<meta name="author" content="Adina Wagner, Michał Szczepanik">
<!-- Edit me end! -->
<link rel="stylesheet" href="../reveal.js/dist/reset.css">
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
</head>
<body>
<div class="reveal">
<div class="slides">
<section>
<section>
<script src="https://cdn.logwork.com/widget/countdown.js"></script>
<a href="https://logwork.com/countdown-2zu8" class="countdown-timer"
data-style="columns" data-timezone="Europe/Berlin" data-date="2022-04-21 09:00">
Welcome Session starts in</a>
Have a ☕!
</section>
<section>
<h2>Research data management<br />👩‍💻👨‍💻<br />with DataLad</h2>
<div style="margin-top:1em;text-align:center">
<table style="border: none;">
<tr>
<td>
Adina Wagner<br><small><a href="https://twitter.com/AdinaKrik" target="_blank">
<img data-src="../pics/twitter.png" style="height:30px;margin:0px" />@AdinaKrik</a></small>
</td>
<td>
Michał Szczepanik<br>
</td>
</tr>
<tr>
<td>
<img style="height:70px;margin-right:10px" data-src="../pics/fzj_logo.svg" /><br>
</td>
<td>
<small><a href="http://psychoinformatics.de" target="_blank">Psychoinformatics lab</a>,
<br> Institute of Neuroscience and Medicine (INM-7)<br>
Research Center Jülich</small><br>
</td>
</tr>
</table>
</div>
<br><br><small>
Slides: <a href="https://github.com/datalad-handbook/course/blob/master/talks/PDFs" target="_blank">
https://github.com/datalad-handbook/course/</a></small>
</section>
</section>
<!--...WORKSHOP INTRODUCTION...-->
<section>
<section>
<h2>Welcome!</h2>
A few logistical things first:
<ul style="font-size:30px">
<li class="fragment fade-in-then-semi-out">
An approximate schedule for today is
<a href="https://adswa.github.io/dl-workshop/content/welcome/" target="_blank">
on our companion workshop website</a>.
</li>
<li class="fragment fade-in-then-semi-out">
Feel free to take collaborative, public notes at <a href="https://etherpad.wikimedia.org/p/RDM_with_DataLad" target="_blank">
etherpad.wikimedia.org/p/RDM_with_DataLad</a>. You can also use this pad for anonymous questions.
</li>
<li class="fragment fade-in-then-semi-out">
We are using a JupyterHub. You should have received credentials in advance via email -
if not, please raise your hand!
</li>
<li class="fragment fade-in-then-semi-out">
Find the workshop contents (and more) at <a href="https://psychoinformatics-de.github.io/rdm-course/" target="_blank">
psychoinformatics-de.github.io/rdm-course/ </a>
</li>
<li class="fragment fade-in-then-semi-out">
Let us introduce the workshop organizers...
</li>
<li class="fragment fade-in-then-semi-out">
Some guidelines for the virtual workshop venue...
</li>
<ul>
<li class="fragment fade-in">Please mute yourself when you are not speaking</li>
<li class="fragment fade-in">Make use of the "Raise hand" feature</li>
<li class="fragment fade-in">Drop out and re-join as you please</li>
<li class="fragment fade-in">Adhere to the <a href="https://adswa.github.io/dl-workshop/coc/" target="_blank">code of conduct</a> </li>
</ul>
</ul>
</section>
<section>
<h2>Questions/interaction throughout the workshop</h2>
<ul style="font-size:30px">
<li>
If you have a question during a lecture, please type it in the chat first.
There are no stupid questions :)
</li>
<li>
It would be great to have lively discussions - unless it's interrupting others,
please feel encouraged to unmute/turn on your video to interact with us.
</li>
<li>
We're happy to discuss specific use cases at the end. Please make a note about them in
the "Shared notes".
</li>
</ul>
</section>
<section>
<h2>Questions/interaction after the workshop</h2>
<ul>
If you have a question after the workshop, you can reach out for help:
<br>
<ul style="font-size:30px">
<dt>Reach out to the <b>DataLad</b> team via</dt>
<li>
<a href="https://matrix.to/#/!NaMjKIhMXhSicFdxAj:matrix.org?via=matrix.waite.eu&via=matrix.org&via=inm7.de" target="_blank">
Matrix</a> (a free, decentralized chat platform; no app installation needed).
We run a weekly Zoom office hour (Thursday, 4pm Berlin time) from this room as well.
</li>
<li>
<a href="https://github.com/datalad/datalad" target="_blank">
the development repository on GitHub</a>
</li>
<br>
<dt>Reach out to the user community with</dt>
<li>A question on <a href="https://neurostars.org/" target="_blank">neurostars.org</a>
with a <code>datalad</code> tag</li>
<br>
<dt>Find more user tutorials or workshop recordings</dt>
<li>On <a href="https://www.youtube.com/channel/datalad" target="_blank">
DataLad's YouTube channel</a>
</li>
<li>
In the <a href="http://handbook.datalad.org/en/latest/" target="_blank">
DataLad Handbook </a>
</li>
<li>In the <a href="https://psychoinformatics-de.github.io/rdm-course/" target="_blank">DataLad RDM course</a> </li>
<li>In the <a href="http://docs.datalad.org" target="_blank">Official API documentation</a> </li>
</ul>
</ul>
</section>
<section>
<h2>Resources and Further Reading</h2>
<table style="font-size:30px">
<tr>
<td>
Comprehensive user documentation in the<br>
DataLad Handbook
<a href="http://handbook.datalad.org" target="_blank">(handbook.datalad.org)</a>
</td>
<td>
<img src="../pics/logo.svg" height="150">
</td>
</tr>
</table>
<table style="font-size:30px">
<tr>
<td><img src="../pics/artwork/src/enter.svg" height="100"></td>
<td>
<ul>
<li>High-level function/command overviews, <br>
Installation, Configuration, Cheatsheet</li>
</ul>
</td>
</tr>
<tr>
<td><img src="../pics/artwork/src/basics.svg" height="100"></td>
<td>
<ul>
<li>Narrative-based code-along course</li>
<li>Independent of background/skill level, <br>
suitable for data management novices</li>
</ul>
</td>
</tr>
<tr>
<td><img src="../pics/artwork/src/usecases.svg" height="100"></td>
<td>
<ul>
<li>Step-by-step solutions to common <br>
data management problems, like<br />how to
make a reproducible paper</li>
</ul>
</td>
</tr>
</table>
<p style="font-size:30px">Overview of most tutorials, talks, videos, ... at
<a href="https://github.com/datalad/tutorials" target="_blank">github.com/datalad/tutorials</a> </p>
</section>
<section>
<h2>Live polling system</h2>
Please use your phone to scan the QR code, or open the link in a new browser window <br>
<iframe src="https://www.directpoll.com/r?XDbzPBd3ixYqg8huKIwKuJ7aj5lQw7fByQ4HgMgN"
style="border: 0" width="900" height="800"></iframe>
</section>
<section>
<h2>What's your mood today?</h2>
<img src="../pics/sheepscale.png" height="600"><iframe src="https://www.directpoll.com/r?XDbzPBd3ixYqg8huKIwKuJ7aj5lQw7fByQ4HgMgN"
style="border: 0" width="400" height="600"></iframe>
</section>
<section>
<h2>What's your level of excitement?</h2>
<img src="../pics/rubberduckscale.png" height="600"><iframe src="https://www.directpoll.com/r?XDbzPBd3ixYqg8huKIwKuJ7aj5lQw7fByQ4HgMgN"
style="border: 0" width="400" height="600"></iframe>
</section>
<section>
<h2>Video recordings</h2>
<small>The recording can be edited or stopped to exclude some or all discussions.<br>
This poll must be unanimous: the workshop will only be recorded if everyone votes "yes".</small>
<iframe src="https://www.directpoll.com/r?XDbzPBd3ixYqg8huKIwKuJ7aj5lQw7fByQ4HgMgN"
style="border: 0" width="900" height="800"></iframe>
</section>
<section data-background-image="https://www.datalad.org/theme/img/logo/datalad_nav_wide.png"
data-background-opacity="0.1" data-background-size="400px">
<h2>What will we do today?</h2>
<ul style="font-size:30px">
<li class="fragment fade-in-then-semi-out">
The workshop centers around
<a href="http://handbook.datalad.org/r.html?about" target="_blank">DataLad</a> (version 0.16)
</li>
<li class="fragment fade-in-then-semi-out">
We aim to do more than a standard introduction by providing in-depth explanations,
hands-on exercises, and discussions throughout the workshop
</li>
<li class="fragment fade-in-then-semi-out"><small>
(Help us by asking any question that comes up!)</small><br>
</li>
<img class="fragment fade-in-then-semi-out" src="../pics/splits.jpg">
</ul>
</section>
</section>
<!--..Research data management in general..-->
<section>
<section>
<h2>Motivation</h2>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
You write a paper about an algorithm and stay up
late generating good-looking figures, tweaking parameters and
display options until everything works AND looks good. The next morning, you have no
idea which parameters produced which figure, or which of the figures
match what you report in the paper.<br>
<img height="400" src="../pics/turingway/findfiles.png">
<img height="400" src="../pics/turingway/projectstack.png"></div>
<imgcredit>Illustration adapted from Scriberia and The Turing Way</imgcredit>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
Your research project produces phenomenal results, but your laptop,
the only place that stores the source code for the results, is
stolen or breaks.<br>
<img height="700" src="../pics/stolenlaptop.jpg"></div>
<imgcredit>https://co.pinterest.com/pin/551128073121451139/</imgcredit>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
A graduate student approaches their supervisor, complaining that the
supervisor's research idea does not work. After weeks of discussion,
it becomes apparent that oral communication doesn't suffice: the
student can't sufficiently explain the environment (data, algorithms,
...) they constructed, and if the supervisor can't enter and use the
student's project, there's no way to find a fix.
<br>
<img height="500" src="../pics/badsupervision.gif"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
A postdoc wrote a script during their PhD that applied a specific
method to a dataset. Now, with new data and a new project, they
try to reuse the script, but have forgotten how it works.
<br>
<img height="500" src="../pics/frustration.jpg"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>
<section data-transition="None">
<h2>Common problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
You try to recreate results from another lab's published paper.
You base your re-implementation on everything reported in their paper,
but the results you obtain look nothing like the original.
<br>
<img height="500" src="../pics/turingway/ReadableCode.png"></div>
<imgcredit>http://phdcomics.com/comics.php?f=1693</imgcredit>
</section>
<section>
<h2><strike>common</strike> old problems in science</h2>
<div class="fragment fade-in" data-fragment-index="1">
All these problems were paraphrased from
<a href="https://sci-hub.se/https://link.springer.com/chapter/10.1007%2F978-1-4612-2544-7_5" target="_blank">
Buckheit & Donoho, <b>1995</b></a>
<br></div>
<div class="fragment fade-in">Let's do better!</div>
</section>
</section>
<!--...WHAT IS DATALAD...-->
<section>
<section data-transition="fade">
<div><table>
<tr><dl>
<img src="../pics/datalad_logo_wide.svg" height="150"><br>
<b><a href="https://www.datalad.org/" target="_blank"> DataLad</a>
can help <br> with small or large-scale <br> data management </b>
<dt></dt>
</dl></tr>
<tr><dl class="fragment fade-in">Free, <br> open source, <br> command line tool & Python API </dl></tr>
</table>
</div>
<ul style="vertical-align:middle">
<br>
<dt></dt>
</ul>
</section>
<section>
<h2> <img src="../pics/datalad_logo_wide.svg"></h2>
<ul>
<li>A command-line tool, available for all major operating systems
(Linux, macOS/OSX, Windows), MIT-licensed</li>
<li>Built on top of <a href="https://git-scm.com/" target="_blank">Git</a>
and <a href="https://git-annex.branchable.com/" target="_blank">Git-annex</a></li>
<dt><li>Allows...</li></dt>
<dt>... version-controlling arbitrarily large content </dt>
<dd>version control data and software alongside your code!</dd>
<dt>... transport mechanisms for sharing and obtaining data </dt>
<dd>consume and collaborate on data (analyses) like software</dd>
<dt>... (computationally) reproducible data analysis</dt>
<dd>Track and share provenance of all digital objects</dd>
<dt>... and <i>much</i> more </dt>
<li>Completely domain-agnostic</li>
<br>
</ul>
</section>
<section>
<h2>Acknowledgements</h2>
<table>
<tr style="vertical-align:top">
<td style="vertical-align:top">
<dl>
<dt>Software</dt>
<dd style="margin-left:5px!important">
<ul style="margin-left:5px!important">
<li>Joey Hess (git-annex)</li>
<li>The DataLad team &
contributors</li>
</ul>
</dd>
<dt style="margin-top:20px">Illustrations </dt>
<dd style="margin-left:5px!important">
<ul style="margin-left:5px!important">
<li>The Turing Way <br>
project & Scriberia</li>
<img src="../pics/bannerthanks.svg">
</ul>
</dd>
</dl>
</td>
<td style="vertical-align:top">
<div style="margin-bottom:-20px;text-align:center"><strong>Funders</strong></div>
<img style="height:150px;margin-right:50px" data-src="../pics/nsf_2020.png" />
<img style="height:150px;margin-right:50px;margin-left:50px" data-src="../pics/binc.png" />
<img style="height:150px;margin-left:50px" data-src="../pics/bmbf_2020.png" />
<img style="height:80px;margin-top:-40px;margin-left:auto;margin-right:auto;width:100%" data-src="../pics/fzj_logo.svg" />
<div style="margin-top:-20px">
<img style="height:60px;margin-right:20px" data-src="../pics/erdf.png" />
<img style="height:60px;margin-right:20px" data-src="../pics/cbbs_logo.png" />
<img style="height:60px" data-src="../pics/LSA-Logo.png" />
</div>
<div style="margin-top:40px;margin-bottom:20px;text-align:center"><strong>Collaborators</strong></div>
<div style="margin-top:-20px">
<img style="height:100px;margin:20px" data-src="../pics/hbp_logo.png" />
<img style="height:100px;margin:20px" data-src="../pics/conp_logo.png" />
<img style="height:100px;margin:20px" data-src="../pics/vbc_logo.png" />
</div>
<div style="margin-top:-40px">
<img style="height:120px;margin:20px" data-src="../pics/openneuro_logo.png" />
<img style="height:120px;margin:20px" data-src="../pics/cbrain_logo.png" />
<img style="height:140px;margin:20px" data-src="../pics/brainlife_logo.png" />
</div>
</td>
</tr>
</table>
</section>
<section>
<h2>Core concepts & features</h2>
</section>
<section>
<h2>Everything happens in DataLad datasets</h2>
<img src="../pics/artwork/src/dataset.svg" width="600"> <br>
</section>
<section>
<h2>Dataset = Git/git-annex repository</h2>
<ul>
<li>content agnostic</li>
<li>no custom data structures</li>
<li>complete decentralization</li>
<li>Looks and feels like a directory on your computer:</li>
</ul>
<br>
<br>
<img src="../pics/remodnav-ds-nautilus.png" width="500"> <img src="../pics/remodnav-ds-terminal.png" width="500">
<small>File viewer and terminal view of a DataLad dataset</small>
</section>
<section>
<h2>Version control arbitrarily large files</h2>
<img src="../pics/artwork/src/local_wf.svg" width="600"> <br>
<p class="fragment fade-in">Stay flexible:</p>
<ul>
<li class="fragment fade-in">A simple DataLad core API (easy for data management novices)</li>
<li class="fragment fade-in">Pure Git or git-annex commands (for regular Git or git-annex users, or to use specific functionality)</li>
</ul>
</section>
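<section data-markdown>
<textarea data-template>
A minimal sketch of the local workflow, assuming DataLad and git-annex are installed. The helper names, the dataset name and the commit message are hypothetical; the sketch only assembles the `datalad create` and `datalad save` calls and runs them on request.

```python
import shutil
import subprocess


# Hypothetical helper names; only the datalad subcommands and flags are real.
def local_workflow(dataset: str, message: str) -> list[list[str]]:
    """Assemble the two DataLad calls behind a basic local workflow."""
    return [
        # Create a new, empty dataset (a Git/git-annex repository).
        ["datalad", "create", dataset],
        # Record the current state of the dataset, large files included.
        ["datalad", "save", "-d", dataset, "-m", message],
    ]


def run_all(commands: list[list[str]]) -> bool:
    """Run the calls in order; return False if DataLad is not installed."""
    if shutil.which("datalad") is None:
        return False
    for argv in commands:
        subprocess.run(argv, check=True)
    return True
```

Calling `run_all(local_workflow("my-analysis", "Add raw data"))` would create the dataset and save its state; files are usually added between the two calls.
</textarea>
</section>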
<section>
<h2>Use a dataset's history</h2>
<img src="../pics/researchlog.png">
<ul>
<li class="fragment fade-in"> Reset your dataset (or a subset of it) to a previous state </li>
<li class="fragment fade-in"> Revert changes or bring them back </li>
<li class="fragment fade-in"> Find out what was done when, how, why, and by whom </li>
<li class="fragment fade-in"> Identify precise versions: use data in the most recent version, or the one from 2018, or... </li>
</ul>
</section>
<section>
<h2>Consume and collaborate</h2>
<img src="../pics/artwork/src/collaboration.svg" width="900"> <br>
</section>
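<section data-markdown>
<textarea data-template>
A rough sketch of consuming a dataset, assuming DataLad is installed wherever the commands are run. The helper name, the URL and the file path are hypothetical placeholders; the sketch assembles a `datalad clone` and a `datalad get` call.

```python
import shlex


# Hypothetical helper name; only the datalad subcommands are real.
def consume_dataset(source: str, path: str, content: str) -> list[list[str]]:
    """Assemble the DataLad calls to obtain a dataset and one of its files."""
    return [
        # Clone the dataset; annexed file content is not downloaded yet.
        ["datalad", "clone", source, path],
        # Retrieve the content of one file (or directory) on demand.
        ["datalad", "get", f"{path}/{content}"],
    ]


def as_shell(commands: list[list[str]]) -> list[str]:
    """Render each call as the line one would type in a terminal."""
    return [shlex.join(argv) for argv in commands]
```

For example, `as_shell(consume_dataset("https://example.com/some-dataset.git", "some-dataset", "data/raw.csv"))` returns the two terminal commands as strings.
</textarea>
</section>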
<section>
<h2>Machine-readable, re-executable provenance</h2>
<img src="../pics/artwork/src/reproducible_execution.svg" width="900"> <br>
</section>
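<section data-markdown>
<textarea data-template>
A hedged sketch of how such provenance is typically captured with `datalad run` and replayed with `datalad rerun`. The helper names, file paths and commit message are hypothetical; the sketch only assembles the command lines, assuming DataLad is installed where they are run.

```python
import shlex


# Hypothetical helper names; only the datalad subcommands and flags are real.
def run_command(message: str, command: str,
                inputs: list[str], outputs: list[str]) -> list[str]:
    """Assemble a `datalad run` call that records a command with its inputs and outputs."""
    argv = ["datalad", "run", "-m", message]
    for path in inputs:
        argv += ["--input", path]    # files the command reads
    for path in outputs:
        argv += ["--output", path]   # files the command writes
    argv.append(command)             # the command line to execute and record
    return argv


def rerun_command(revision: str = "HEAD") -> list[str]:
    """Assemble a `datalad rerun` call that re-executes a recorded command."""
    return ["datalad", "rerun", revision]


if __name__ == "__main__":
    argv = run_command("Compute means", "python code/means.py",
                       ["data/raw.csv"], ["results/means.csv"])
    print(shlex.join(argv))              # the equivalent terminal command
    print(shlex.join(rerun_command()))
```

Printing the commands instead of executing them keeps the sketch safe to try outside a dataset.
</textarea>
</section>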
<section>
<h2>Seamless nesting and dataset linkage</h2>
<img src="../pics/artwork/src/linkage_subds.svg" width="900"> <br>
<!-- <ul>
<li class="fragment fade-in" data-fragment-index="2">Overcomes scaling issues with large amounts of files</li>
<pre class="fragment fade-in" data-fragment-index="2"><code>adina@bulk1 in /ds/hcp/super on git:master❱ datalad status --annex -r
15530572 annex'd files (77.9 TB recorded total size)
nothing to save, working tree clean</code></pre>
<small><a class="fragment fade-in" data-fragment-index="2" href="https://github.com/datalad-datasets/human-connectome-project-openaccess" target="_blank">(github.com/datalad-datasets/human-connectome-project-openaccess)</a></small>
<li class="fragment fade-in">Modularizes research components for transparency, reuse, and access management</li>
</ul>
-->
</section>
<section>
<h2>Third party integrations</h2>
<img src="../pics/artwork/src/thirdparty.svg" width="900"> <br>
<small>Apart from <b>local computing infrastructure</b> (from private laptops to computational clusters),
datasets can be hosted on major <b>third-party repository hosting and cloud storage</b> services.
More info: Chapter on <a href="http://handbook.datalad.org/en/latest/basics/basics-thirdparty.html" target="_blank">
Third party infrastructure</a>.</small>
</section>
<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li class="fragment fade-in-then-semi-out">
Behind-the-scenes <b>infrastructure component for data transport and versioning</b>
(e.g., used by <a href="https://openneuro.org/" target="_blank"> OpenNeuro</a>,
<a href="https://brainlife.io/" target="_blank"> brainlife.io </a>,
the <a href="https://conp.ca/" target="_blank">Canadian Open Neuroscience Platform (CONP)</a>,
<a href="https://mcin.ca/technology/cbrain/" target="_blank"> CBRAIN</a>)</li>
<img height="800" class="fragment fade-in" src="../pics/openneuro2.gif" alt="a screenrecording of browsing open neuro">
</ul>
</section>
<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li class="fragment fade-in-then-semi-out"> <b>Creating and sharing reproducible, open science</b>: Sharing data, software, code, and provenance </li>
<img height="800" class="fragment fade-in" src="../pics/shareresearch2.gif" alt="a screenrecording of cloning REMODNAV paper dataset from github">
</ul>
</section>
<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li> <b>Creating and sharing reproducible, open science</b>: Sharing data, software, code, and provenance </li>
<img height="800" class="fragment fade-in" src="../pics/openscience.gif" alt="a screenrecording of cloning REMODNAV paper dataset from github">
</ul>
</section>
<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li class="fragment fade-in-then-semi-out"><b>Central data management</b> and archival system</li>
<img height="850" class="fragment fade-in" src="../pics/centralmanagement.gif">
</ul>
</section>
<section data-transition="None">
<h3>
Examples of what DataLad can be used for:
</h3>
<ul>
<li class="fragment fade-in-then-semi-out"><b>Scalable computing framework</b> for reproducible science</li>
<img height="350" class="fragment fade-in" src="../pics/fairly-big.png">
<img height="500" class="fragment fade-in" src="../pics/ukb_datasets.svg">
</ul>
</section>
<section data-transition="None">
<h3>... and many more!</h3>
<br><br><br>
Let's fire up a terminal and get started with the Basics
</section>
</section>
</div>
</div>
<script src="../reveal.js/dist/reveal.js"></script>
<script src="../reveal.js/plugin/notes/notes.js"></script>
<script src="../reveal.js/plugin/markdown/markdown.js"></script>
<script src="../reveal.js/plugin/highlight/highlight.js"></script>
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
hash: true,
// The "normal" size of the presentation, aspect ratio will be preserved
// when the presentation is scaled to fit different resolutions. Can be
// specified using percentage units.
width: 1280,
height: 960,
// Factor of the display size that should remain empty around the content
margin: 0.3,
// Bounds for smallest/largest possible scale to apply to content
minScale: 0.2,
maxScale: 1.0,
controls: true,
progress: true,
history: true,
center: true,
slideNumber: 'c',
pdfSeparateFragments: false,
pdfMaxPagesPerSlide: 1,
pdfPageHeightOffset: -1,
transition: 'slide', // none/fade/slide/convex/concave/zoom
// Learn about plugins: https://revealjs.com/plugins/
plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
});
</script>
</body>
</html>