<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">

<!-- Edit me start! -->
<title>DataLad: Use cases in data storage and retrieval</title>
<meta name="description" content="Use cases for data storage and retrieval with DataLad">
<meta name="author" content="Michael Hanke &amp; Adina Wagner">
<!-- Edit me end! -->

<link rel="stylesheet" href="../reveal.js/dist/reset.css">
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">

<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
</head>
<body>
<div class="reveal">
<div class="slides">

<section>
<h1>Data ... <strong>Lad</strong></h1>
<p>Michael Hanke & Adina Wagner</p>
<p style="margin-top:50px"><img style="height:100px;margin-right:100px" data-src="../pics/fzj_logo.svg" />
<img style="height:100px" data-src="../pics/hhu_logo.svg" /><br/></p>
</section>

<section data-transition="none">
<h2>Use cases in data storage and retrieval</h2>

<ul>
<li>DataLad: Joint management of digital objects through their entire life cycle</li>
</ul>
<br>
DataLad's features
<ul>
<li>Version control</li>
<li>Transport logistics</li>
<li>Interoperability</li>
<li>Provenance capture</li>
</ul>
<br>
enable a range of common use cases for storing and retrieving data.
</section>

<section>
<section data-transition="none">
<h2>Data sharing & consumption</h2>
<table>
<tr>
<td>
<img src="../pics/services_only.png" height="600">
</td>
<td>
<ul>
<li class="fragment fade-in-then-semi-out">Many options for sharing <br>
and storing data</li>
<li class="fragment fade-in-then-semi-out">Often university-/lab-wide <br>
standards, but a lack of generic <br>
and interoperable workflows</li>
</ul>
</td>
</tr>
</table>
</section>

<section data-transition="none">
<h2>Data sharing & consumption</h2>
<table>
<tr>
<td>
<img src="../pics/services_connected.png" height="600">
</td>
<td>
<ul>
<li>Generic interface to a<br>variety of third-party<br>services & storage providers</li>
<li class="fragment fade-in">Enables interoperable<br>workflows independent of<br>the chosen service</li>
</ul>
</td>
</tr>
</table>
</section>

<section data-transition="none">
<h2>Data sharing & consumption</h2>
<img src="../pics/artwork/src/collaboration.svg" height="600">
</section>

<section data-transition="none">
<h2>Data sharing & consumption</h2>
<ul>
<li class="fragment fade-in-then-semi-out"> <b>Publish (or consume) datasets</b> via GitHub, GitLab, GIN, OSF, or similar services</li>
</ul>
<img height="850" class="fragment fade-in" src="../pics/clonedata.gif" alt="a screen recording of cloning studyforrest data from GitHub">
</section>

<section data-transition="none">
<h2>Data sharing & consumption</h2>
<ul>
<li> <b>Publish (or consume) datasets</b> via GitHub, GitLab, GIN, OSF, or similar services</li>
</ul>
<img height="850" src="../pics/randomginrepo.png">
</section>

<section data-transition="none">
<h2>Data sharing & consumption</h2>
<ul>
<li> <b>Publish (or consume) datasets</b> via GitHub, GitLab, GIN, OSF, or similar services</li>
</ul>
<img height="550" src="../pics/datalad-osf.png">
</section>

<section data-transition="none">
<h2>Data sharing & consumption</h2>
<ul>
<li class="fragment fade-in-then-semi-out">Special case: <b>Central data management</b> and archival system</li>
</ul>
<img height="850" class="fragment fade-in" src="../pics/centralmanagement.gif">
</section>
</section>

<section>
<section data-transition="none">
<h2>Beyond data: Provenance</h2>
<table>
<tr>
<td>
<img src="../pics/Provenance_alpha.png">
<imgcredit>CC-BY Scriberia and The Turing Way</imgcredit>
</td>
<td>
<ul>
<li class="fragment fade-in">Create, share & use
provenance of research objects</li>
<li class="fragment fade-in">Link code, data, software
containers and re-executable
analysis records
</li>
</ul>
</td>
</tr>
</table>
</section>

<section data-transition="none">
<h2>Beyond data: Provenance</h2>
<ul>
<li class="fragment fade-in-then-semi-out"> <b>Creating and sharing reproducible, open science</b>: Sharing data, software, code, and provenance </li>
</ul>
<img height="850" class="fragment fade-in" src="../pics/shareresearch2.gif" alt="a screen recording of cloning the REMODNAV paper dataset from GitHub">
</section>

<section data-transition="none">
<h2>Containerized workflows</h2>
<img src="../pics/containers-run.svg">
</section>

<section data-transition="none">
<h2>Provenance capture</h2>
<ul>
<li><b>Computational provenance</b>: Datasets can track <b>software containers</b>
and perform and record computations inside them:
<pre><code class="bash" style="max-height:none">$ datalad containers-run -n neuroimaging-container \
    --input 'mri/*_bold.nii.gz' --output 'sub-*/LC_timeseries_run-*.csv' \
    "bash -c 'for sub in sub-*; do for run in run-1 ... run-8;
    do python3 code/extract_lc_timeseries.py \$sub \$run; done; done'"

-- Git commit -- Michael Hanke &lt; ... @gmail.com&gt;; Fri Jul 6 11:02:28 2019
    [DATALAD RUNCMD] singularity exec --bind {pwd} .datalad/e...
    === Do not change lines below ===
    {
     "cmd": "singularity exec --bind {pwd} .datalad/environments/nilearn.simg bash..",
     "dsid": "92ea1faa-632a-11e8-af29-a0369f7c647e",
     "inputs": [
      "mri/*_bold.nii.gz",
      ".datalad/environments/nilearn.simg"
     ],
     "outputs": ["sub-*/LC_timeseries_run-*.csv"],
     ...
    }
    ^^^ Do not change lines above ^^^
---
 sub-01/LC_timeseries_run-1.csv | 1 +
...</code></pre>
</li>
</ul>
</section>
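<section data-markdown>
<textarea data-template>
## Provenance capture: reading a run record

The commit message on the previous slide stores the run record as JSON between the two marker lines. A minimal sketch in Python, assuming the full message text is available as a string (e.g. copied from `git log`); `extract_run_record` is a hypothetical helper, not part of DataLad's API, and the example message below is made up and shortened.

```python
import json

# Marker lines that delimit the JSON run record in a
# "[DATALAD RUNCMD]" commit message.
BEGIN = "=== Do not change lines below ==="
END = "^^^ Do not change lines above ^^^"


def extract_run_record(message):
    """Return the run record embedded in a commit message as a dict.

    Hypothetical helper for illustration; raises ValueError
    if either marker line is missing.
    """
    # Compare stripped lines so indentation (as added by git log)
    # does not prevent a match; stripped JSON is still valid JSON.
    lines = [line.strip() for line in message.splitlines()]
    try:
        start = lines.index(BEGIN)
        stop = lines.index(END, start + 1)
    except ValueError as exc:
        raise ValueError("no DataLad run record found") from exc
    return json.loads("\n".join(lines[start + 1:stop]))


# Made-up, shortened example message (not real DataLad output)
message = """[DATALAD RUNCMD] python3 code/extract_lc_timeseries.py

=== Do not change lines below ===
{
 "cmd": "python3 code/extract_lc_timeseries.py",
 "inputs": ["mri/*_bold.nii.gz"],
 "outputs": ["sub-*/LC_timeseries_run-*.csv"]
}
^^^ Do not change lines above ^^^
"""

record = extract_run_record(message)
print(record["inputs"])   # ['mri/*_bold.nii.gz']
print(record["outputs"])  # ['sub-*/LC_timeseries_run-*.csv']
```

Reading the record this way only inspects what was captured; re-executing it is the job of `datalad rerun`.
</textarea>
</section>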

<section data-transition="none">
<h2>Provenance capture</h2>
<ul>
<li>All recorded transformations can be re-computed automatically
<pre><code class="bash" style="max-height:none">$ datalad rerun eee1356bb7e8f921174e404c6df6aadcc1f158f0
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====
add(ok): sub-01/LC_timeseries_run-1.csv (file)
...
save(ok): . (dataset)
action summary:
  add (ok: 45)
  save (notneeded: 45, ok: 1)
  unlock (notneeded: 45)
...</code></pre>
<ul>
<li>Help reproduce a result and verify it (via content hash)</li>
<li>Use complete capture and automatic re-computation as an alternative to storage and transport</li>
</ul>
</li>
</ul>
</section>
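<section data-markdown>
<textarea data-template>
## Provenance capture: verifying a re-computation

DataLad datasets track file content by checksum (via Git and git-annex), so a re-computed output with identical content is not recorded as a modification. The underlying idea, comparing content hashes of the outputs before and after re-computation, can be sketched in plain Python. This assumes the outputs are regular files in the working tree; `sha256_of`, `snapshot`, and `changed` are hypothetical helpers, not how `datalad rerun` implements its check.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root, pattern):
    """Map every file matching `pattern` below `root` to its digest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.glob(pattern)) if p.is_file()}


def changed(before, after):
    """Paths whose digest differs or that exist on one side only."""
    return sorted(k for k in before.keys() | after.keys()
                  if before.get(k) != after.get(k))


# Demo with made-up file content in a temporary directory
pattern = "sub-*/LC_timeseries_run-*.csv"
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    out = root / "sub-01" / "LC_timeseries_run-1.csv"
    out.parent.mkdir()
    out.write_text("0.1,0.2\n")
    before = snapshot(root, pattern)

    out.write_text("0.1,0.2\n")   # re-computation yields identical content
    same = changed(before, snapshot(root, pattern))

    out.write_text("0.1,0.3\n")   # re-computation diverges
    diverged = changed(before, snapshot(root, pattern))

print(same)      # []
print(diverged)  # ['sub-01/LC_timeseries_run-1.csv']
```

An empty list means every output was reproduced bit for bit.
</textarea>
</section>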

<section data-transition="none">
<h2>Containerized workflows</h2>
<img src="../pics/containersrun.gif">
</section>
<section data-transition="none">
<h2>Large-scale, containerized workflows</h2>
<img src="../pics/enkichapter.gif">
</section>
</section>

<section>
<section>
<h2>Concepts: Data sharing without privacy breach</h2>
<ul>
<li>Decide on a per-file basis whether to expose (sensitive) file content or only anonymized metadata</li>
</ul>
<img src="../pics/gdpr_anon_dataset.svg">
</section>

<section>
<h2>Concepts: Computation to data</h2>
<ul>
<li>Expose sufficient anonymized metadata to allow code development,
but don't share raw data, only results</li>
</ul>
<img src="../pics/datalad_hospital.svg">
</section>
</section>

<section>
<section>
<h2>Find out more</h2>
<table>
<tr>
<td>
More use cases & comprehensive user documentation in the<br>
DataLad Handbook
<a href="http://handbook.datalad.org">(handbook.datalad.org)</a>
</td>
<td>
<img src="../pics/logo.svg" height="150">
</td>
</tr>
</table>

<table>
<tr>
<td><img src="../pics/artwork/src/enter.svg" height="100"></td>
<td>
<ul>
<li>High-level function/command overviews, <br>
Installation, Configuration, Cheatsheet</li>
</ul>
</td>
</tr>
<tr>
<td><img src="../pics/artwork/src/basics.svg" height="100"></td>
<td>
<ul>
<li>Narrative-based code-along course</li>
<li>Independent of background/skill level, <br>
suitable for data management novices</li>
</ul>
</td>
</tr>
<tr>
<td><img src="../pics/artwork/src/usecases.svg" height="100"></td>
<td>
<ul>
<li>Step-by-step solutions to common <br>
data management problems</li>
</ul>
</td>
</tr>
</table>

<aside class="notes">
- what is in it?
- how is it structured?
- who and what is it aiming for?
- show "big picture" figure
- claim data management demands of science map well onto datalad functionality
- summarize remaining principles (obsoletion insurance, etc.)
</aside>
</section>
</section>

</div>
</div>

<script src="../reveal.js/dist/reveal.js"></script>
<script src="../reveal.js/plugin/notes/notes.js"></script>
<script src="../reveal.js/plugin/markdown/markdown.js"></script>
<script src="../reveal.js/plugin/highlight/highlight.js"></script>
<script>
  // More info about initialization & config:
  // - https://revealjs.com/initialization/
  // - https://revealjs.com/config/
  Reveal.initialize({
    hash: true,
    // The "normal" size of the presentation, aspect ratio will be preserved
    // when the presentation is scaled to fit different resolutions. Can be
    // specified using percentage units.
    width: 1280,
    height: 960,
    // Factor of the display size that should remain empty around the content
    margin: 0.3,
    // Bounds for smallest/largest possible scale to apply to content
    minScale: 0.2,
    maxScale: 1.0,

    controls: true,
    progress: true,
    history: true,
    center: true,
    slideNumber: 'c',
    pdfSeparateFragments: false,
    pdfMaxPagesPerSlide: 1,
    pdfPageHeightOffset: -1,
    transition: 'slide', // none/fade/slide/convex/concave/zoom
    // Learn about plugins: https://revealjs.com/plugins/
    plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
  });
</script>
</body>
</html>