<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
<!-- Edit me start! -->
<title>DataLad: Use cases in data storage and retrieval</title>
<meta name="description" content="Use cases for storing and retrieving data with DataLad">
<meta name="author" content="Michael Hanke &amp; Adina Wagner">
<!-- Edit me end! -->
<link rel="stylesheet" href="../reveal.js/dist/reset.css">
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
</head>
<body>
<div class="reveal">
<div class="slides">
<section>
<h1>Data ... <strong>Lad</strong></h1>
<p>Michael Hanke &amp; Adina Wagner</p>
<p style="margin-top:50px"><img style="height:100px;margin-right:100px" data-src="../pics/fzj_logo.svg" />
<img style="height:100px" data-src="../pics/hhu_logo.svg" /></p>
</section>
<section data-transition="None">
<h2>Use cases in data storage and retrieval</h2>
<ul>
<li>DataLad: Joint management of digital objects through their entire life cycle</li>
</ul>
<p>DataLad's features,</p>
<ul>
<li>Version control</li>
<li>Transport logistics</li>
<li>Interoperability</li>
<li>Provenance capture</li>
</ul>
<p>enable a range of common use cases for storing and retrieving data</p>
</section>
<section>
<section data-transition="None">
<h2>Data sharing & consumption</h2>
<table>
<tr>
<td>
<img src="../pics/services_only.png" height="600">
</td>
<td>
<ul>
<li class="fragment fade-in-then-semi-out">Many options for sharing <br>
and storing data</li>
<li class="fragment fade-in-then-semi-out">Often university-/lab-wide <br>
standards, but lack of generic <br>
and interoperable workflows</li>
</ul>
</td>
</tr>
</table>
</section>
<section data-transition="None">
<h2>Data sharing & consumption</h2>
<table>
<tr>
<td>
<img src="../pics/services_connected.png" height="600">
</td>
<td>
<ul>
<li>Generic interface to a<br>variety of third party<br>services & storage providers</li>
<li class="fragment fade-in">Enables interoperable<br>workflows independent of<br>the chosen service </li>
</ul>
</td>
</tr>
</table>
</section>
<section data-transition="None">
<h2>Data sharing & consumption</h2>
<img src="../pics/artwork/src/collaboration.svg" height="600">
</section>
<section data-transition="None">
<h2>Data sharing & consumption</h2>
<ul>
<li class="fragment fade-in-then-semi-out"> <b>Publish (or consume) datasets</b> via GitHub, GitLab, Gin, OSF, or similar services</li>
</ul>
<img height="850" class="fragment fade-in" src="../pics/clonedata.gif" alt="A screen recording of cloning the studyforrest data from GitHub">
</section>
<section data-transition="None">
<h2>Data sharing & consumption</h2>
<ul>
<li> <b>Publish (or consume) datasets</b> via GitHub, GitLab, Gin, OSF, or similar services</li>
</ul>
<img height="850" src="../pics/randomginrepo.png">
</section>
<section data-transition="None">
<h2>Data sharing & consumption</h2>
<ul>
<li> <b>Publish (or consume) datasets</b> via GitHub, GitLab, Gin, OSF, or similar services</li>
</ul>
<img height="550" src="../pics/datalad-osf.png">
</section>
<section data-transition="None">
<h2>Data sharing & consumption</h2>
<ul>
<li class="fragment fade-in-then-semi-out">Special case: <b>Central data management</b> and archival system</li>
</ul>
<img height="850" class="fragment fade-in" src="../pics/centralmanagement.gif">
</section>
</section>
<section>
<section data-transition="None">
<h2>Beyond data: Provenance</h2>
<table>
<tr>
<td>
<img src="../pics/Provenance_alpha.png">
<imgcredit>CC-BY Scriberia and The Turing Way</imgcredit>
</td>
<td>
<ul>
<li class="fragment fade-in">Create, share & use
provenance of research objects</li>
<li class="fragment fade-in">Link code, data, software
containers, and re-executable
analysis records
</li>
</ul>
</td>
</tr>
</table>
</section>
<section data-transition="None">
<h2>Beyond data: Provenance</h2>
<ul>
<li class="fragment fade-in-then-semi-out"> <b>Creating and sharing reproducible, open science</b>: Sharing data, software, code, and provenance </li>
</ul>
<img height="850" class="fragment fade-in" src="../pics/shareresearch2.gif" alt="A screen recording of cloning the REMODNAV paper dataset from GitHub">
</section>
<section data-transition="None">
<h2>Containerized workflows</h2>
<img src="../pics/containers-run.svg">
</section>
<section data-transition="None">
<h2>Provenance capture</h2>
<ul>
<li><b>Computational provenance</b>: Datasets can track <b>software containers</b>,
and perform and record computations inside them:
<pre><code class="bash" style="max-height:none">$ datalad containers-run -n neuroimaging-container \
--input 'mri/*_bold.nii' --output 'sub-*/LC_timeseries_run-*.csv' \
"bash -c 'for sub in sub-*; do for run in run-1 ... run-8;
do python3 code/extract_lc_timeseries.py \$sub \$run; done; done'"
-- Git commit -- Michael Hanke &lt; ... @gmail.com&gt;; Fri Jul 6 11:02:28 2019
[DATALAD RUNCMD] singularity exec --bind {pwd} .datalad/e...
=== Do not change lines below ===
{
"cmd": "singularity exec --bind {pwd} .datalad/environments/nilearn.simg bash..",
"dsid": "92ea1faa-632a-11e8-af29-a0369f7c647e",
"inputs": [
"mri/*.bold.nii.gz",
".datalad/environments/nilearn.simg"
],
"outputs": ["sub-*/LC_timeseries_run-*.csv"],
...
}
^^^ Do not change lines above ^^^
---
sub-01/LC_timeseries_run-1.csv | 1 +
...</code></pre>
</li>
</ul>
</section>
<section data-transition="None">
<h2>Provenance capture</h2>
<ul>
<li>All recorded transformations can be re-computed automatically
<pre><code class="bash" style="max-height:none">$ datalad rerun eee1356bb7e8f921174e404c6df6aadcc1f158f0
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====
add(ok): sub-01/LC_timeseries_run-1.csv (file)
...
save(ok): . (dataset)
action summary:
add (ok: 45)
save (notneeded: 45, ok: 1)
unlock (notneeded: 45)
...</code></pre>
<ul>
<li>Reproduce a result and verify it (via content hash)</li>
<li>Use complete capture and automatic re-computation as an alternative to storage and transport</li>
</ul>
</li>
</ul>
</section>
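<section data-transition="None" data-markdown>
<textarea data-template>
## Provenance capture

Verifying a re-computed result via its content hash can be sketched outside DataLad as follows. This is only an illustration of the idea, not how DataLad implements it: the `compute` function and the file names are hypothetical stand-ins, and `sha256sum` is assumed to be available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a recorded, deterministic computation.
compute() { printf 'sub-01,run-1,0.42\n'; }

# Work in a temporary directory that is removed on exit.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# First run: produce the result and record its content hash.
compute > "$workdir/result.csv"
recorded=$(sha256sum "$workdir/result.csv" | cut -d' ' -f1)

# Re-computation: regenerate the result and hash it again.
compute > "$workdir/result_rerun.csv"
recomputed=$(sha256sum "$workdir/result_rerun.csv" | cut -d' ' -f1)

# Identical hashes mean the re-computed content is byte-identical.
if [ "$recorded" = "$recomputed" ]; then
  status="reproduced"
else
  status="changed"
fi
echo "$status"
```

If the hashes match, the stored file could in principle be dropped and regenerated on demand.
</textarea>
</section>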
<section data-transition="None">
<h2>Containerized workflows</h2>
<img src="../pics/containersrun.gif">
</section>
<section data-transition="None">
<h2>Large-scale, containerized workflows</h2>
<img src="../pics/enkichapter.gif">
</section>
</section>
<section>
<section>
<h2>Concepts: Data sharing without privacy breach</h2>
<ul>
<li>Expose (sensitive) file content versus anonymized metadata on a per-file basis</li>
</ul>
<img src="../pics/gdpr_anon_dataset.svg">
</section>
<section>
<h2>Concepts: Computation to data</h2>
<ul>
<li>Expose sufficient anonymized metadata to allow code development,
but don't share raw data, only results</li>
</ul>
<img src="../pics/datalad_hospital.svg">
</section>
</section>
<section>
<section>
<h2>Find out more</h2>
<table>
<tr>
<td>
More use cases & comprehensive user documentation in the<br>
DataLad Handbook
<a href="http://handbook.datalad.org">(handbook.datalad.org)</a>
</td>
<td>
<img src="../pics/logo.svg" height="150">
</td>
</tr>
</table>
<table>
<tr>
<td><img src="../pics/artwork/src/enter.svg" height="100"></td>
<td>
<ul>
<li>High-level function/command overviews, <br>
Installation, Configuration, Cheatsheet</li>
</ul>
</td>
</tr>
<tr>
<td><img src="../pics/artwork/src/basics.svg" height="100"></td>
<td>
<ul>
<li>Narrative-based code-along course</li>
<li>Independent of background/skill level, <br>
suitable for data management novices</li>
</ul>
</td>
</tr>
<tr>
<td><img src="../pics/artwork/src/usecases.svg" height="100"></td>
<td>
<ul>
<li>Step-by-step solutions to common <br>
data management problems</li>
</ul>
</td>
</tr>
</table>
<aside class="notes">
- what is in it?
- how is it structured?
- who and what is it aiming for?
- show "big picture" figure
- claim data management demands of science map well onto datalad functionality
- summarize remaining principles (obsoletion insurance, etc.)
</aside>
</section>
</section>
</div>
</div>
<script src="../reveal.js/dist/reveal.js"></script>
<script src="../reveal.js/plugin/notes/notes.js"></script>
<script src="../reveal.js/plugin/markdown/markdown.js"></script>
<script src="../reveal.js/plugin/highlight/highlight.js"></script>
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
hash: true,
// The "normal" size of the presentation, aspect ratio will be preserved
// when the presentation is scaled to fit different resolutions. Can be
// specified using percentage units.
width: 1280,
height: 960,
// Factor of the display size that should remain empty around the content
margin: 0.3,
// Bounds for smallest/largest possible scale to apply to content
minScale: 0.2,
maxScale: 1.0,
controls: true,
progress: true,
history: true,
center: true,
slideNumber: 'c',
pdfSeparateFragments: false,
pdfMaxPagesPerSlide: 1,
pdfPageHeightOffset: -1,
transition: 'slide', // none/fade/slide/convex/concave/zoom
// Learn about plugins: https://revealjs.com/plugins/
plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
});
</script>
</body>
</html>