<!doctype html>
|
|
<html>
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
|
|
|
|
<!-- Edit me start! -->
|
|
<title>Data management for Neuroscience: The BIDS standard and DataLad</title>
<meta name="description" content="An introduction to research data management with the BIDS standard and DataLad">
<meta name="author" content="Adina Wagner">
|
|
<!-- Edit me end! -->
|
|
|
|
<link rel="stylesheet" href="../reveal.js/dist/reset.css">
|
|
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
|
|
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
|
|
|
|
<!-- Theme used for syntax highlighted code -->
|
|
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
|
|
</head>
|
|
<body>
|
|
<div class="reveal">
|
|
<div class="slides">
|
|
|
|
|
|
|
|
<section>
|
|
<section>
|
|
<h2><small>Data management for Neuroscience:</small><br /> The BIDS standard
|
|
and DataLad<br /><small>an Introduction</small></h2>
|
|
|
|
<div style="margin-top:1em;text-align:center">
|
|
<table style="border: none;">
|
|
<tr>
|
|
<td>Adina Wagner
|
|
<br><small>
|
|
<a href="https://twitter.com/AdinaKrik" target="_blank">
|
|
<img data-src="../pics/twitter.png" style="height:30px;margin:0px" />
|
|
@AdinaKrik</a></small></td>
|
|
<td><img style="height:100px;margin-right:10px" data-src="../pics/fzj_logo.svg" />
|
|
<br></td>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
<small><a href="http://psychoinformatics.de" target="_blank">Psychoinformatics lab</a>,
|
|
<br> Institute of Neuroscience and
|
|
Medicine, Brain & Behavior (INM-7)<br>
|
|
Research Center Jülich</small><br>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</div>
|
|
<br><br><small>
|
|
Slides: <a href="https://github.com/datalad-handbook/course/blob/master/talks/PDFs/IRTGworkshop_Nov19_wagner.pdf">
|
|
https://github.com/datalad-handbook/course/</a></small>
|
|
|
|
<aside class="notes">
|
|
<li>It's dark already and the workshop will be intense (only 2 hours), so there will be lots of links
for you to re-read things in detail later</li>
|
|
<li>I take about one hour, then we do data management together</li>
|
|
</aside>
|
|
|
|
|
|
|
|
</section>
|
|
</section>
|
|
|
|
<!--...INTRODUCTION...-->
|
|
|
|
<section>
|
|
|
|
|
|
<section>
|
|
<h2>What is (research) data management?</h2>
|
|
<ul>
|
|
<li class="fragment fade-in">(Research) Data = every digital object involved in your project:
|
|
code, software/tools, raw data, processed data, results, manuscripts ...</li>
|
|
<li class="fragment fade-in">... needs to be properly managed - from its creation to its use, publication,
|
|
sharing, archiving, re-use, or destruction... (keyword:
|
|
<a href="https://www.go-fair.org/fair-principles/" target="_blank">FAIR</a> data) </li>
|
|
</ul>
|
|
<img src="../pics/datalifecycle_jisc_ccbysand.png" class="fragment fade-in">
|
|
<ul>
|
|
<li class="fragment fade-in">Research data management is a key component for reproducibility, efficiency, and impact/reach
|
|
of data analysis projects</li>
|
|
</ul>
|
|
<imgcredit>JISC; CC-BY-SA-ND</imgcredit>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Why data management?</h2>
|
|
<img src="../pics/frontend_vs_backend_paper.png" style="box-shadow: 10px 10px 8px #888888;height:600px">
|
|
<imgcredit>adapted from https://dribbble.com/shots/3090048-Front-end-vs-Back-end</imgcredit>
|
|
<ul>
|
|
<li class="fragment fade-in">Funders & publishers require it</li>
|
|
<li class="fragment fade-in">Scientific peers increasingly expect it</li>
|
|
<li class="fragment fade-in">The quality and efficiency of your work improves</li>
|
|
</ul>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2>How do you spend your time?</h2>
|
|
<ul>
|
|
<table>
<tr>
<td>CrowdFlower <br> <a href="https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport.pdf">DataScience Report 2017</a></td>
</tr>
<tr>
<td><img class="fragment fade-in" src="../pics/sciencetime1.jpg" height="350"></td>
<td><img class="fragment fade-in" height="400" src="../pics/sciencetime.jpg"></td>
</tr>
</table>
<blockquote class="fragment fade-in" cite="Thomas Wachtler">
Collaborative work and re-use of data are hampered by the effort it takes to access and
understand the data. <br>
<small>Thomas Wachtler</small></blockquote>
<li class="fragment fade-in">Good data management can make your and others' work &amp; life much easier!</li>
|
|
</ul>
|
|
<aside class="notes">
|
|
Data management is for yourself! You can concentrate on science, instead
|
|
of organizing data constantly
|
|
<li>not only others/your data, also code!</li>
|
|
</aside>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>How is (research) data management possible?</h2>
|
|
|
|
<p>There are tools and concepts that can help:
|
|
</p>
|
|
|
|
<ul>
|
|
<li><b>Version control</b> your data</li>
|
|
<li><b>Standardize</b> file names and organization</li>
|
|
<li><b>Document</b> <i>everything</i>, ideally automatically</li>
|
|
</ul>
|
|
<aside class="notes">
|
|
<li>Knowing about these tools is the first step</li>
|
|
<li>Today you'll get a glimpse of all of that</li>
|
|
</aside>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2>Why version control?</h2>
|
|
<img src="../pics/final.png" style="box-shadow: 10px 10px 8px #888888" height="600"><br>
|
|
<ul>
|
|
<li class="fragment fade-in">keep things organized</li>
|
|
<li class="fragment fade-in">keep track of changes</li>
|
|
</ul>
|
|
<aside class="notes">
|
|
<li>Not only manuscripts, but also data!</li>
|
|
</aside>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Why standards?</h2>
|
|
<img src="../pics/datasharing_xkcd.png" style="box-shadow: 10px 10px 8px #888888" height="700">
|
|
<ul>
|
|
<li class="fragment fade-in">reduce misunderstanding and rewriting/rearranging efforts</li>
|
|
</ul>
|
|
|
|
<aside class="notes">
|
|
<li>A lack of a standard or consensus in how to name or structure data leads to misunderstanding
|
|
and time wasted on rearranging data/rewriting scripts</li>
|
|
<li>Standards make your data compatible with others' code, and vice versa -> mutual benefit</li>
|
|
<li>Clever: use the standard that is beneficial for you (don't reinvent the wheel)</li>
|
|
</aside>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section data-markdown data-transition="fade"><script type="text/template">
<!-- .element: style="height:150px;margin-left:50px" class="fragment fade-in"-->
<!-- .element:style="height:150px;margin-left:50px" class="fragment fade-in" -->

<aside class="notes">
Agenda for today: BIDS standard and data management
<ul>
<li>Who has heard of BIDS?</li>
<li>Who has heard of DataLad?</li>
</ul>
</aside>
</script>
|
|
</section>
|
|
|
|
<section data-transition="fade">
|
|
<h3>Tools that can help with data management</h3>
|
|
<div><table>
|
|
<tr><dl>
|
|
<img src="../pics/BIDS_Logo.png" height="150">
|
|
<dt></dt>
|
|
</dl></tr>
|
|
<tr><dl>
|
|
<img src="../pics/datalad_logo_wide.svg" height="150">
|
|
<dt></dt>
|
|
</dl></tr>
|
|
</table>
|
|
</div>
|
|
<ul style="vertical-align:middle">
|
|
<dd class="fragment fade-in"><i>What</i> is it?</dd>
|
|
<dd class="fragment fade-in"><i>Why</i> should I use it?</dd>
|
|
<dd class="fragment fade-in"><i>How</i> can I use it?</dd>
|
|
</ul>
|
|
<aside class="notes">
|
|
<ul>
|
|
<li>There will be a lot of content in the next two hours given the darkness</li>
|
|
<li>I'll provide a list of pointers to useful resources</li>
|
|
<li>If you want to, and have your computer configured, follow along when I tell you</li>
|
|
|
|
<li>PLEASE, do ask dumb questions!</li>
|
|
</ul>
|
|
</aside>
|
|
</section>
|
|
|
|
<!--...THE BIDS STANDARD...-->
|
|
|
|
<section data-markdown><script type="text/template">
<!-- .element: height="100" -->
<!-- .element: height="600" -->
<imgcredit>BIDS; CC-BY</imgcredit>

BIDS is a standard for many kinds of neuroimaging data (MRI, EEG, ...). It defines a data organization, naming schemes for files, and metadata descriptors. <!-- .element: class="fragment" -->

<aside class="notes">
<li>WHAT is BIDS?</li>
</aside>
</script>
|
|
</section>
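<section data-markdown><script type="text/template">
### Sketch: reading a BIDS file name

A BIDS file name is a chain of `key-value` entities followed by a suffix and an extension. The sketch below illustrates that naming scheme only. `parse_bids_name` is a hypothetical helper written for this slide, not part of any BIDS library, and it assumes a well-formed name.

```python
# Hypothetical helper for illustration; not part of PyBIDS or the BIDS validator.
def parse_bids_name(filename):
    """Split a BIDS-style file name into (entities, suffix, extension)."""
    # Everything after the first dot is the extension ("nii.gz", "tsv", ...).
    stem, _, extension = filename.partition(".")
    parts = stem.split("_")
    # The last underscore-separated chunk is the suffix (e.g. "bold").
    suffix = parts[-1]
    # Every chunk before it is a "key-value" entity (e.g. "sub-01").
    entities = dict(part.split("-", 1) for part in parts[:-1])
    return entities, suffix, extension


entities, suffix, extension = parse_bids_name(
    "sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz"
)
print(entities)           # {'sub': '01', 'task': 'balloonanalogrisktask', 'run': '01'}
print(suffix, extension)  # bold nii.gz
```

For real projects, a BIDS-aware library such as PyBIDS is the better choice, because it also handles metadata files and validation.
</script>
</section>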
|
|
|
|
<section>
|
|
<h2><img style="margin-top:50px" src="../pics/BIDS_Logo.png" height="100"></h2>
|
|
<dl>
|
|
<dt class="fragment fade-in">BIDS is a structure</dt>
|
|
<dd class="fragment fade-in">BIDS is not a new file format</dd>
|
|
<dt class="fragment fade-in">BIDS is a standard</dt>
|
|
<dd class="fragment fade-in">BIDS is not a software tool. However, a large and growing number of
tools are compatible with it and ease data archiving, data discovery/search,
and analysis</dd>
<dt class="fragment fade-in">There is no "one" BIDS</dt>
<dd class="fragment fade-in">BIDS exists for a growing number of different modalities.
All of them are under constant (open!) development</dd>
|
|
</dl>
|
|
<aside class="notes">
|
|
<li>WHAT is BIDS</li>
|
|
</aside>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2><img src="../pics/BIDS_Logo.png" height="100"></h2>
|
|
<dl>
|
|
<dt class="fragment fade-in">BIDS helps you and others to intuitively understand your data</dt>
|
|
<dd class="fragment fade-in"> A consensus on data organization saves you and others the time otherwise spent digging into or rearranging data or scripts</dd>
|
|
|
|
<dt class="fragment fade-in">BIDS opens up a large range of tools for you:</dt>
|
|
<dd class="fragment fade-in">⮊ <a href="https://bids-apps.neuroimaging.io/about/" target="_blank">BIDS Apps</a>, e.g.,
|
|
<a href="https://fmriprep.readthedocs.io/en/stable/" target="_blank">fmriprep</a> or
|
|
<a href="https://mriqc.readthedocs.io/en/stable/" target="_blank">MRIQC</a></dd>
|
|
<dd class="fragment fade-in">⮊ BIDS-aware tooling, e.g.,
|
|
<a href="https://bids-standard.github.io/pybids/" target="_blank">PyBIDS</a> </dd>
|
|
<dd class="fragment fade-in">⮊ <a href="https://brainlife.io/" target="_blank">brainlife.io</a></dd>
|
|
<dd class="fragment fade-in">⮊ Upload your BIDS-compliant datasets to (or download others' from)
<a href="https://openneuro.org/" target="_blank">OpenNeuro</a></dd>
|
|
<dd class="fragment fade-in"></dd>
|
|
</dl>
|
|
<aside class="notes">
|
|
<li>WHY should I use it?</li>
|
|
</aside>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2><img src="../pics/BIDS_Logo.png" height="100"></h2>
|
|
<div align="left">
|
|
<dl><dt>Useful links and pointers</dt>
|
|
|
|
<dd>⮊ Read the <a href="https://www.nature.com/articles/sdata201644" target="_blank">paper</a></dd>
|
|
<dd>⮊ Get started with the <a href="https://github.com/bids-standard/bids-starter-kit" target="_blank">BIDS starter-kit</a></dd>
|
|
<dd>⮊ Work through the Stanford Center for Reproducible Neuroscience
|
|
<a href="http://reproducibility.stanford.edu/bids-tutorial-series-part-1a/" target="_blank">BIDS Tutorial Series</a></dd>
|
|
<dd>⮊ Use the <a href="http://bids-standard.github.io/bids-validator/" target="_blank">BIDS validator</a> to check your datasets</dd>
|
|
<dd>⮊<a href="https://github.com/bids-standard/bids-specification" target="_blank"> Get involved</a> on
GitHub to help shape BIDS to your needs. You can also check out the Google
<a href="https://groups.google.com/forum/#!forum/bids-discussion" target="_blank">discussion group</a></dd>
<dd>⮊ Follow <a href="https://twitter.com/BIDSstandard" target="_blank">@BIDSstandard</a></dd>
|
|
</dl>
|
|
</div>
|
|
<aside class="notes">
|
|
<li>HOW can I use it?</li>
|
|
</aside>
|
|
</section>
|
|
</section>
|
|
|
|
<!--...DATA MANAGEMENT WITH DATALAD...-->
|
|
|
|
<section>
|
|
<section data-transition="fade">
|
|
<h2><img src="../pics/datalad-animated.gif"></h2>
|
|
</section>
|
|
|
|
<section data-transition="fade">
|
|
<h2><img src="../pics/datalad_logo_wide.svg"></h2>
|
|
DataLad is a data management multitool.
|
|
<aside class="notes">
|
|
<li>demonstrate a reproducible paper now</li>
|
|
</aside>
|
|
</section>
|
|
|
|
<section data-transition="fade">
|
|
<h2><img src="../pics/datalad_logo_wide.svg"></h2>
|
|
<a href="https://github.com/psychoinformatics-de/paper-remodnav/" target="_blank">Let's see it in action</a>
|
|
<aside class="notes">
|
|
<li>log into brainbfast?</li>
|
|
<li>TODO: What do I talk about while this compiles?</li>
|
|
<ul>
|
|
<li>who plans to follow along on the computer?</li>
|
|
<li>which operating system do we have?</li>
|
|
<li>Does everyone have DataLad installed? Were there installation problems?</li>
|
|
</ul>
|
|
</aside>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Reproducible paper - a magic trick?</h2>

If you are curious, you can read up on all the details and find step-by-step instructions
<a href="http://handbook.datalad.org/en/latest/usecases/reproducible-paper.html" target="_blank">here.</a>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h2> <img src="../pics/datalad_logo_wide.svg"> in brief</h2>
|
|
<ul>
|
|
<li>A command-line tool, available for all major operating systems (Linux, macOS/OSX, Windows)</li>
|
|
<li>Built on top of <a href="https://git-scm.com/" target="_blank">Git</a>
|
|
and <a href="https://git-annex.branchable.com/" target="_blank">Git-annex</a></li>
|
|
<dt><li>Allows...</li></dt>
|
|
<dd>... version-controlling arbitrarily large content,</dd>
|
|
<dd>... easily sharing and obtaining data (note: no data hosting!),</dd>
|
|
<dd>... (computationally) reproducible data analysis,</dd>
|
|
<dd>... and <i>much</i> more </dd>
|
|
<li>Completely domain-agnostic</li>
|
|
<br>
|
|
|
|
<dt class="fragment fade-in">Today: Basic concepts and commands. </dt>
|
|
<table class="fragment fade-in">
|
|
<tr>
|
|
<td>⮊ <b>For more:</b> Read <a
|
|
href="http://handbook.datalad.org/en/latest/index.html" target="_blank">
|
|
the DataLad Handbook</a>
|
|
</td>
|
|
<td><img src="../pics/logo.svg" height="100"></td>
|
|
|
|
</tr>
|
|
</table>
|
|
</ul>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>DataLad Datasets</h2>
|
|
|
|
<ul>
|
|
<li>DataLad's core data structure</li>
|
|
<ul>
|
|
<li class="fragment fade-in">Dataset = A directory managed by DataLad</li>
|
|
<li class="fragment fade-in">Any directory on your computer can be managed by DataLad.</li>
|
|
<li class="fragment fade-in">Datasets can be <i>created</i> (from scratch) or <i>installed</i></li>
|
|
<li class="fragment fade-in">Datasets can be nested: <i>linked subdirectories</i></li>
|
|
</ul>
|
|
</ul>
|
|
|
|
<aside class="notes">
|
|
<li>anything can be managed: CV, website, music library, phd</li>
|
|
<li>show this on the manuscript repo: history, looks/feels</li>
|
|
</aside>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Experience a DataLad dataset</h2>
|
|
|
|
Code to follow along:
|
|
<a href="http://handbook.datalad.org/en/latest/code_from_chapters/01_dataset_basics_code.html" target="_blank">
|
|
http://handbook.datalad.org/en/latest/code_from_chapters/01_dataset_basics_code.html
|
|
</a>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Local version control</h2>
|
|
|
|
<p>Procedurally, version control is easy with DataLad!</p>
|
|
<img class="fragment fade-in" src="../pics/local_wf.svg" height="500"> <!-- .element: class="fragment" -->
|
|
<br>
|
|
|
|
<b class="fragment fade-in">Advice:</b>
|
|
<ul>
|
|
<li class="fragment fade-in">Save <i>meaningful</i> units of change</li>
|
|
<li class="fragment fade-in">Attach helpful commit messages</li>
|
|
</ul>
|
|
</section>
|
|
|
|
<section>
|
|
<h3>Summary - Local version control</h3>
|
|
|
|
<dl>
|
|
<dt class="fragment fade-in"><code>datalad create</code> creates an empty dataset.</dt> <dd class="fragment fade-in">Configurations (<b>-c yoda</b>, <b>-c text2git</b>) are useful.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">A dataset has a <i>history</i> to track files and their modifications. </dt><dd class="fragment fade-in">Explore it with Git (<b>git log</b>) or external tools (e.g., <b>tig</b>).</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad save</code> records the dataset or file state to the history.</dt> <dd class="fragment fade-in">Concise <b>commit messages</b> should summarize the change for future you and others.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad status</code> reports the current state of the dataset.</dt> <dd class="fragment fade-in">A clean dataset status is good practice.</dd>
|
|
</dl>
|
|
</section>
|
|
|
|
|
|
<section data-markdown><script type="text/template">
## From here <span class="fragment" data-fragment-index="1" style="margin-left:350px">to this:</span>
<!-- .element: height="780" style="box-shadow: 10px 10px 8px #888888" -->
<!-- .element: class="fragment" data-fragment-index="1" height="780" style="box-shadow: 10px 10px 8px #888888" -->
<imgcredit>www.phdcomics.com; www.linode.com</imgcredit>

<p class="fragment" data-fragment-index="2">BUT: Version control is only one aspect of data management</p>

<aside class="notes">
Note to self
</aside>
</script>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Consuming datasets</h2>
|
|
<img class="fragment fade-in" src="../pics/virtual_dstree_dl101.svg" height="600">
|
|
<ul>
|
|
<li class="fragment fade-in">Datasets are lightweight: Upon installation, only small
files and metadata about file availability are retrieved.</li>
|
|
<li class="fragment fade-in">Content can be obtained on demand via <code>datalad get</code>.</li>
|
|
</ul>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Dataset nesting</h2>
|
|
<img src="../pics/linkage.svg" height="500">
|
|
</section>
|
|
|
|
<section>
|
|
<h3>Summary - Dataset consumption & nesting</h3>
|
|
|
|
<ul>
|
|
<dt class="fragment fade-in"><code>datalad install</code> installs a dataset.</dt><dd class="fragment fade-in"> It can be installed “on its own”:
|
|
Specify the <b>--source/-s</b> of the dataset, and an optional <b>path</b> for it to be installed to.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Datasets can be installed as subdatasets within an existing dataset. </dt> <dd class="fragment fade-in"> The <b>--dataset/-d</b> option needs a path to the root of the superdataset.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Only small files and metadata about file availability are present locally after an install. </dt><dd class="fragment fade-in"><code>datalad get</code> downloads the content of larger files on demand.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad status</code> can report on total and retrieved repository size</dt> <dd class="fragment fade-in">using <code>--annex</code> and <code>--annex all</code> options.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Datasets preserve their history.</dt> <dd class="fragment fade-in">The superdataset records only the <i>version state</i> of the subdataset.</dd>
|
|
|
|
</ul>
|
|
</section>
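<section data-markdown><script type="text/template">
### Sketch: is the file content here yet?

On Linux and macOS, git-annex typically represents a large file as a symlink. Until the content is retrieved (for example with `datalad get`), the link points to a target that does not exist locally. The sketch below checks for that situation. `content_is_present` is a hypothetical helper for illustration, not a DataLad or git-annex API, and the file names are made up.

```python
import os
import tempfile


# Hypothetical helper for illustration; not a DataLad or git-annex API.
def content_is_present(path):
    """Return False if path is a symlink whose target is missing locally."""
    is_dangling_link = os.path.islink(path) and not os.path.exists(path)
    return not is_dangling_link


with tempfile.TemporaryDirectory() as tmp:
    # Simulate one retrieved and one not-yet-retrieved file with plain symlinks.
    target = os.path.join(tmp, "annexed_object")
    with open(target, "w") as f:
        f.write("file content")
    retrieved = os.path.join(tmp, "retrieved.nii.gz")
    missing = os.path.join(tmp, "missing.nii.gz")
    os.symlink(target, retrieved)
    os.symlink(os.path.join(tmp, "not_downloaded"), missing)

    print(content_is_present(retrieved))  # True
    print(content_is_present(missing))    # False
```

On Windows, or in datasets that do not use symlinks, availability is tracked differently, so treat this purely as an illustration of the idea.
</script>
</section>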
|
|
|
|
|
|
<section data-transition="fade">
|
|
<h2>Reproducible data analysis</h2>
|
|
<img src="../pics/ownlegacycode_phd.png" height="500">
|
|
<imgcredit>Full comic at <a href="http://phdcomics.com/comics.php?f=1689">http://phdcomics.com/comics.php?f=1979</a></imgcredit>
|
|
|
|
<p class="fragment fade-in">Code to follow along:
|
|
<a href="http://handbook.datalad.org/en/latest/code_from_chapters/10_yoda_code.html" target="_blank">
|
|
http://handbook.datalad.org/en/latest/code_from_chapters/10_yoda_code.html
|
|
</a>
|
|
</p>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Basic organizational principles for datasets</h2>
|
|
<dl>
|
|
<dt>Keep everything clean and modular</dt>
|
|
<li>An analysis is a superdataset, its components are subdatasets, and its structure is modular</li>
|
|
<table>
|
|
<tr>
|
|
<td><img src="../pics/dataset_modules.png" height="400"></td>
|
|
<td><pre><code class="bash" style="max-height:none">├── code/
│   ├── tests/
│   └── myscript.py
├── docs
│   ├── build/
│   └── source/
├── envs
│   └── Singularity
├── inputs/
│   └── data/
│       ├── dataset1/
│       │   └── datafile_a
│       └── dataset2/
│           └── datafile_a
├── outputs/
│   └── important_results/
│       └── figures/
└── README.md</code></pre></td>
|
|
</tr>
|
|
</table>
|
|
|
|
</dl>
|
|
<ul>
|
|
<li>Do not touch/modify raw data: save any results/computations <i>outside</i> of input datasets</li>
|
|
<li>Keep a superdataset self-contained: Scripts reference subdatasets or files with <i>relative paths</i></li>
|
|
</ul>
|
|
</section>
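<section data-markdown><script type="text/template">
### Sketch: relative paths in a script

A self-contained superdataset keeps working when it is moved or cloned, as long as scripts locate their inputs relative to the dataset and not via absolute paths. The sketch below assumes the layout from the previous slide (a script in `code/`, data under `inputs/data/`). `dataset_root` and `input_file` are hypothetical helper names, and the example path is made up.

```python
from pathlib import Path


# Hypothetical helpers; they assume the script lives in <dataset>/code/.
def dataset_root(script_path):
    """Return the superdataset root for a script stored in its code/ directory."""
    return Path(script_path).resolve().parent.parent


def input_file(script_path, *parts):
    """Build the path to a file below <dataset>/inputs/data/."""
    return dataset_root(script_path) / "inputs" / "data" / Path(*parts)


# In a real script, pass __file__ in place of this made-up location.
script = "/home/me/myanalysis/code/myscript.py"
datafile = input_file(script, "dataset1", "datafile_a")

# Only the part below the dataset root matters; it is the same on every machine.
print(datafile.relative_to(dataset_root(script)).as_posix())
```

Because nothing in the script depends on where the dataset sits on disk, collaborators can clone it anywhere and run the same code.
</script>
</section>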
|
|
|
|
<section>
|
|
<h2>Basic organizational principles for datasets</h2>
|
|
<dl>
|
|
<dt>Record where you got it from, where it is now, and what you do to it</dt>
|
|
<li>Link datasets (as subdatasets), record data origin</li>
|
|
<li>Collect and store provenance of all contents of a dataset that you create</li>
|
|
<table style="vertical-align:middle">
<tr><td><img src="../pics/dataset_linkage_provenance.png"></td></tr>
|
|
</table>
|
|
<dl>
|
|
<dt>Document everything:</dt>
|
|
<li>Which script produced which output? From which data? In which software environment? ... </li>
|
|
</dl>
|
|
</dl>
|
|
<note>Find out more about organizational principles in
|
|
<a href="" target="_blank">the YODA principles</a>!</note>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>A classification analysis on the iris flower dataset</h2>
|
|
<img src="../pics/iris-machinelearning.png" height="300">
|
|
<img src="../pics/iris_cluster.png" height="450">
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Reproducible execution & provenance capture</h2>
|
|
|
|
<p>datalad run</p>
|
|
<img class="fragment fade-in" src="../pics/run_prov.svg" height="600"> <!-- .element: class="fragment" -->
|
|
</section>
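<section data-markdown><script type="text/template">
### Sketch: what a provenance record captures

Provenance capture answers the question "which command turned which inputs into which outputs?". The sketch below is a hand-rolled illustration of that idea using only the Python standard library. It is NOT how `datalad run` is implemented. `run_with_provenance`, the record layout, and the file names are assumptions made for this slide.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


# Hypothetical stand-in for illustration; NOT the implementation of `datalad run`.
def run_with_provenance(cmd, inputs, outputs, cwd):
    """Run cmd in cwd and record checksums of its inputs and outputs."""
    record = {"cmd": cmd, "inputs": {p: sha256(Path(cwd) / p) for p in inputs}}
    subprocess.run(cmd, cwd=cwd, check=True)
    record["outputs"] = {p: sha256(Path(cwd) / p) for p in outputs}
    return record


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "input.txt").write_text("1 2 3")
    # A tiny "analysis": sum the numbers in input.txt and write result.txt.
    script = (
        "nums = open('input.txt').read().split();"
        "open('result.txt', 'w').write(str(sum(map(int, nums))))"
    )
    record = run_with_provenance(
        [sys.executable, "-c", script], ["input.txt"], ["result.txt"], cwd=tmp
    )
    print(json.dumps(record, indent=2))
```

`datalad run` goes further: it stores such a record in the dataset history together with the saved results, so the computation can be repeated later.
</script>
</section>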
|
|
|
|
|
|
</section>
|
|
|
|
<!--...SUMMARY...-->
|
|
<section>
|
|
<section>
|
|
<h2>How to get started with BIDS and DataLad</h2>
|
|
<dl>
|
|
<dt class="fragment fade-in">Check out BIDS-compliant datasets - with DataLad!</dt>
|
|
<dd class="fragment fade-in">
|
|
<pre><code style="max-height:none">$ datalad install ///openneuro/ds000001
[INFO ] Cloning http://datasets.datalad.org/openneuro/ds000001 [1 other candidates] into '/tmp/ds000001'
[INFO ] access to 1 dataset sibling s3-PRIVATE not auto-enabled, enable with:
| datalad siblings -d "/tmp/ds000001" enable -s s3-PRIVATE
install(ok): /tmp/ds000001 (dataset)

$ cd ds000001
$ ls sub-01/*
sub-01/anat:
sub-01_inplaneT2.nii.gz sub-01_T1w.nii.gz

sub-01/func:
sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz
sub-01_task-balloonanalogrisktask_run-01_events.tsv
sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz
sub-01_task-balloonanalogrisktask_run-02_events.tsv
sub-01_task-balloonanalogrisktask_run-03_bold.nii.gz
sub-01_task-balloonanalogrisktask_run-03_events.tsv
</code></pre>
|
|
</dd>
|
|
<dt class="fragment fade-in">Read <a href="https://handbook.datalad.org"> the DataLad handbook</a></dt>
|
|
<dd class="fragment fade-in">An interactive, hands-on crash-course (free and open source)</dd>
|
|
|
|
</dl>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2>Acknowledgements</h2>
|
|
<table>
|
|
<tr style="vertical-align:middle">
|
|
<td style="vertical-align:middle">
|
|
<ul>
|
|
<dt>The BIDS standard</dt>
|
|
<li>Chris Gorgolewski</li>
|
|
<li>Russ Poldrack</li>
|
|
<li>100(0?)+ additional contributors</li>
|
|
<img src="../pics/BIDS_Logo.png" height="120">
|
|
</ul>
|
|
</td>
|
|
<td style="vertical-align:middle">
|
|
<ul>
|
|
<img src="../pics/datalad_logo_wide.svg" height="120">
|
|
<li>Michael Hanke</li>
|
|
<li>Yaroslav Halchenko</li>
|
|
<li>Joey Hess (git-annex)</li>
|
|
<li>Benjamin Poldrack</li>
|
|
<li>Kyle Meyer</li>
|
|
<li>22+ additional contributors</li>
|
|
</ul>
|
|
</td>
|
|
<td style="vertical-align:middle">
|
|
<ul>
|
|
<dt>The DataLad Handbook</dt>
|
|
<li>Laura Waite</li>
|
|
<li>Michael Hanke</li>
|
|
<li>11+ additional contributors</li>
|
|
<img src="../pics/logo.svg" height="140">
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<h3>Thank you!</h3>
|
|
<h1>Questions?</h1>
|
|
</section>
|
|
</section>
|
|
|
|
|
|
</div>
|
|
</div>
|
|
|
|
<script src="../reveal.js/dist/reveal.js"></script>
|
|
<script src="../reveal.js/plugin/notes/notes.js"></script>
|
|
<script src="../reveal.js/plugin/markdown/markdown.js"></script>
|
|
<script src="../reveal.js/plugin/highlight/highlight.js"></script>
|
|
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
    hash: true,
    // The "normal" size of the presentation, aspect ratio will be preserved
    // when the presentation is scaled to fit different resolutions. Can be
    // specified using percentage units.
    width: 1280,
    height: 960,
    // Factor of the display size that should remain empty around the content
    margin: 0.3,
    // Bounds for smallest/largest possible scale to apply to content
    minScale: 0.2,
    maxScale: 1.0,

    controls: true,
    progress: true,
    history: true,
    center: true,
    slideNumber: 'c',
    pdfSeparateFragments: false,
    pdfMaxPagesPerSlide: 1,
    pdfPageHeightOffset: -1,
    transition: 'slide', // none/fade/slide/convex/concave/zoom
    // Learn about plugins: https://revealjs.com/plugins/
    plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
});
</script>
|
|
</body>
|
|
</html>
|