382 lines
14 KiB
HTML
382 lines
14 KiB
HTML
<!doctype html>
|
|
<html>
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
|
|
|
|
<!-- Edit me start! -->
|
|
<title>This is where your title goes</title>
|
|
<meta name="description" content=" This is where you put a short description ">
|
|
<meta name="author" content=" Your Name ">
|
|
<!-- Edit me end! -->
|
|
|
|
<link rel="stylesheet" href="../reveal.js/dist/reset.css">
|
|
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
|
|
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
|
|
|
|
<!-- Theme used for syntax highlighted code -->
|
|
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
|
|
</head>
|
|
<body>
|
|
<div class="reveal">
|
|
<div class="slides">
|
|
|
|
|
|
<section>
|
|
<h1 style="text-transform:none">Data management</h1> <br>
|
|
<h3 style="text-transform:none">Local version control workflows</h3>
|
|
|
|
<img src="../pics/final.png" height="500" align="center">
|
|
|
|
<imgcredit>Full comic at <a href="http://phdcomics.com/comics/archive.php?comicid=1531">http://phdcomics.com/comics.php?f=1531</a></imgcredit>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<section data-markdown><script type="text/template">
|
|
## Last session...
|
|
|
|
- **Prerequisites & technicalities**: Git identity, Gitlab/Github, Reathedocs, the Handbook
|
|
- **Theory**: DataLad datasets, YODA principles
|
|
- **Practice**: Create, structure, and install datasets
|
|
|
|
<!-- .element: height="60" -->
|
|
|
|
### Change in session set-up: <!-- .element: class="fragment" -->
|
|
|
|
- Sessions oriented at handbook content (totalling 6 sessions)<!-- .element: class="fragment" -->
|
|
- Domain-agnostic narrative: "Educational course" on DataLad <!-- .element: class="fragment" -->
|
|
- Live code-casts to follow along in your own terminal <!-- .element: class="fragment" -->
|
|
|
|
<aside class="notes">
|
|
- open question session
|
|
</aside>
|
|
</script>
|
|
</section>
|
|
|
|
|
|
<section data-markdown><script type="text/template">
|
|
## Objectives
|
|
|
|
**Local version control workflows**
|
|
- Saving modifications to datasets: adding and changing files
|
|
- Installing (sub)datasets
|
|
- Dataset nesting
|
|
|
|
|
|
|
|
|
|
|
|
|
|
**Let's get into a terminal!**<!-- .element: class="fragment" -->
|
|
<br><a href="http://handbook.datalad.org/en/latest/basics/101-101-create.html"> -> handbook chapter</a><!-- .element: class="fragment" -->
|
|
</script>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section data-transition="fade">
|
|
<h2>DataLad Datasets</h2>
|
|
<table class="fragment fade-in">
|
|
<tr>
|
|
<td style="vertical-align:middle"><ul>
|
|
<li>DataLad is build on top of other tools:</li>
|
|
<td width="40%"><img data-src="../pics/slides/pics/datalad_sandwhich_tuned/sandwhich01.svg"></td>
|
|
</tr>
|
|
</table>
|
|
|
|
</section>
|
|
<section data-transition="fade">
|
|
<h2>DataLad Datasets</h2>
|
|
<table>
|
|
<tr>
|
|
<td style="vertical-align:middle"><dl>
|
|
<dt>The foundation is <a href="https://git-scm.com/" target="_blank">Git</a>:</dt>
|
|
<dd>Datasets are Git repositories. If you want, use any Git command/feature!</dd>
|
|
<dt><a href="https://git-annex.branchable.com/" target="_blank">Git-annex</a> handles large file content:</dt>
|
|
<dd>(Large) file content is "annexed".</dd>
|
|
</dl></td>
|
|
<td width="40%"><img data-src="../pics/slides/pics/datalad_sandwhich_tuned/sandwhich03.svg"></td>
|
|
</tr>
|
|
</table>
|
|
<aside class="notes"></aside>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2>Local version control</h2>
|
|
|
|
<p>Procedurally, version control is easy with DataLad!</p>
|
|
<img class="fragment fade-in" src="../pics/local_wf.svg" height="500"> <!-- .element: class="fragment" -->
|
|
<br>
|
|
|
|
<b class="fragment fade-in">Advice:</b>
|
|
<ul>
|
|
<li class="fragment fade-in">Save <i>meaningful</i> units of change</li>
|
|
<li class="fragment fade-in">Attach helpful commit messages</li>
|
|
</ul>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h3>Summary - Local version control</h3>
|
|
|
|
<dl>
|
|
<dt class="fragment fade-in"><code>datalad create</code> creates an empty dataset.</dt> <dd class="fragment fade-in">Configurations (<b>-c yoda</b>, <b>-c text2git</b>) are useful.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">A dataset has a <i>history</i> to track files and their modifications. </dt><dd class="fragment fade-in">Explore it with Git (<b>git log</b>) or external tools (e.g., <b>tig</b>).</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad save</code> records the dataset or file state to the history. </dt><dd class="fragment fade-in">Concise <b>commit messages</b> should summarize the change for future you and others.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad download-url</code> obtains web content and records its origin. </dt><dd class="fragment fade-in">It even takes care of saving the change.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad status</code> reports the current state of the dataset.</dt> <dd class="fragment fade-in">A clean dataset status is good practice.</dd>
|
|
</dl>
|
|
</section>
|
|
|
|
<section data-transition="fade">
|
|
<h3>Summary - Local version control</h3>
|
|
<img height="600" src="../pics/final_cut.png">
|
|
</section>
|
|
|
|
<section data-transition="fade">
|
|
<h3>Summary - Local version control</h3>
|
|
<img height="600" src="../pics/versioned_file.png">
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section>
|
|
<h2>Installing Datasets</h2>
|
|
|
|
<pre class="fragment fade-in"><code> $ datalad install --dataset . \
|
|
--source https://github.com/datalad-datasets/longnow-podcasts.git \
|
|
recordings/longnow
|
|
</code></pre>
|
|
<ul>
|
|
<li class="fragment fade-in">Installs a dataset (here: from Github) (--source/-s) ...</li>
|
|
<li class="fragment fade-in">into the <i>super</i>dataset Datalad-101 (--dataset/-d) under the path recordings/longnow ...</li>
|
|
<li class="fragment fade-in">and registers it as a <i>subdataset</i>.</li>
|
|
<br>
|
|
</ul>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Installing Datasets</h2>
|
|
<img class="fragment fade-in" src="../pics/virtual_dstree_dl101.svg" height="600">
|
|
<ul>
|
|
<li class="fragment fade-in">Datasets are light-weight: Upon installation, only small
|
|
files and meta data about file availability are retrieved.</li>
|
|
<li class="fragment fade-in">Content can be obtained on demand via <code>datalad get</code>.</li>
|
|
</ul>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Dataset Nesting</h2>
|
|
<img src="../pics/dataset_linkage_dl101.png">
|
|
<pre><code class="diff" style="max-height:none">commit 8fdf62acd0bf1e99ebcb6c466edc994a5f4013ba
|
|
Author: DataLad Demo demo@datalad.org
|
|
Date: Sat Oct 26 15:54:44 2019 +0200
|
|
|
|
[DATALAD] Recorded changes
|
|
|
|
diff --git a/.gitmodules b/.gitmodules
|
|
new file mode 100644
|
|
index 0000000..1b59b8c
|
|
--- /dev/null
|
|
+++ b/.gitmodules
|
|
@@ -0,0 +1,4 @@
|
|
+[submodule "recordings/longnow"]
|
|
+ path = recordings/longnow
|
|
+ url = https://github.com/datalad-datasets/longnow-podcasts.git
|
|
+ datalad-id = b3ca2718-8901-11e8-99aa-a0369f7c647e
|
|
diff --git a/recordings/longnow b/recordings/longnow
|
|
new file mode 160000
|
|
index 0000000..dcc34fb
|
|
--- /dev/null
|
|
+++ b/recordings/longnow
|
|
@@ -0,0 +1 @@
|
|
+Subproject commit dcc34fbe669b06ced84ced381ba0db21cf5e665f </code></pre>
|
|
</section>
|
|
|
|
<section>
|
|
<h3>Summary - Dataset consumption & nesting</h3>
|
|
|
|
<ul>
|
|
<dt class="fragment fade-in"><code>datalad install</code> installs a dataset.</dt><dd class="fragment fade-in"> It can be installed “on its own”:
|
|
Specify the <b>--source/-s</b> of the dataset, and an optional <b>path</b> for it to be installed to.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Datasets can be installed as subdatasets within an existing dataset. </dt> <dd class="fragment fade-in"> The <b>--dataset/-d</b> option needs a path to the root of the superdataset.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Only small files and metadata about file availability are present locally after an install. </dt><dd class="fragment fade-in">To retrieve actual file content of larger files, <code>datalad get </code> downloads large file content on demand.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in"><code>datalad status</code> can report on total and retrieved repository size</dt> <dd class="fragment fade-in">using <code>--annex</code> and <code>--annex all</code> options.</dd>
|
|
<br>
|
|
<dt class="fragment fade-in">Datasets preserve their history.</dt> <dd class="fragment fade-in">The superdataset records only the <i>version state</i> of the subdataset.</dd>
|
|
|
|
</ul>
|
|
</section>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<section>
|
|
<h2>Now what I can do with that?</h2>
|
|
<dl>
|
|
<dt>Local version control</dt>
|
|
<li>Version control changing small files (code, manuscripts (text!), ...)</li>
|
|
<li>Add large files to a dataset history</li>
|
|
<li>Meaninful and well-described commits will make future interactions with the dataset history easier</li>
|
|
<br>
|
|
<dt>Dataset installation and nesting</dt>
|
|
<li>Consume existing datasets</li>
|
|
<li>Link datasets together</li>
|
|
</dl>
|
|
</section>
|
|
|
|
<section>
|
|
<h2>Practice @home</h2>
|
|
|
|
<ul>
|
|
<li>Start a coding project or take an existing project and version
|
|
control your work with DataLad. Remember to create datasets with
|
|
the <code>text2git</code> or <code>yoda</code> configuration!</li>
|
|
</ul>
|
|
</section>
|
|
|
|
<section style="text-align: left;">
|
|
<h2>Further reading</h2>
|
|
|
|
<dl>
|
|
<dt>The basics on datasets:</dt>
|
|
<dd>- Chapter <a href=http://handbook.datalad.org/en/latest/index.html target="_blank">DataLad Datasets</a> in the handbook.</dd>
|
|
<dt>How to get help on commands and their options:</dt>
|
|
<dd>- Section
|
|
<a href=http://handbook.datalad.org/en/latest/basics/101-135-help.html
|
|
target="_blank">How to get help</a> in the handbook</dd>
|
|
</dl>
|
|
</script>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<section data-markdown><script type="text/template">
|
|
## Outline: What comes next?
|
|
|
|
- Reproducible analyses with DataLad (chapter <a href="http://handbook.datalad.org/en/latest/basics/101-108-run.html" target="_blank">DataLad, Run!</a> in the handbook).
|
|
- **Which date is suitable?** > <a href="https://doodle.com/poll/bdyn3c7nfuz9eyx4" target="_blank">Doodle poll</a> <
|
|
|
|
|
|
</script>
|
|
</section>
|
|
|
|
<section data-markdown>
|
|
# Open Question Session
|
|
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
|
|
<section>
|
|
<h3>Backup slides for anticipated questions </h3>
|
|
</section>
|
|
<section data-markdown><script type="text/template">
|
|
## What is `HEAD`?
|
|
|
|
- A git repository is build up as a *tree* of **commits** (history entries).
|
|
- A **branch** is a named pointer (reference) to a commit, and allows to isolate developments. The default branch is called `master`.
|
|
- `HEAD` is a pointer to the branch you are on, and thus to the last commit in the given branch.
|
|
|
|
<!-- .element: height="600" -->
|
|
|
|
If you'd be on branch "testing", which commit would HEAD point to? <!-- .element: class="fragment" -->
|
|
|
|
</script>
|
|
</section>
|
|
|
|
<section style="text-align: left;">
|
|
<h2>How does a here-document work?</h2>
|
|
|
|
<br>
|
|
<pre><code style="bash">
|
|
$ cat << EOT > notes.txt
|
|
One can create a new dataset with 'datalad create [--description] PATH'.
|
|
The dataset is created empty
|
|
|
|
EOT
|
|
|
|
</code></pre>
|
|
<ul>
|
|
<li>Two <i>delimiting identifiers</i> (EOT) wrap any amount of text into a stream</li>
|
|
<li>The <code><<</code> characters <i>redirect</i> the stream into <i>standard input</i> for the <code>cat</code> command</li>
|
|
<li>The <code>></code> character <i>redirects</i> the <i>standard output</i> of <code>cat</code> and writes it into a new file <code>notes.txt</code></li>
|
|
</ul>
|
|
<br>
|
|
<br>
|
|
<p align="left" class="fragment fade-in"> Why is it used?</p>
|
|
<ul align="left" class="fragment fade-in">
|
|
<li>Allows pretty formating (e.g., line breaks)</li>
|
|
<li>Allows writing documents from the terminal </li>
|
|
</ul>
|
|
</p>
|
|
</section>
|
|
|
|
|
|
<section>
|
|
<h2>What's with all those configurations?</h2>
|
|
<ul>
|
|
<li class="fragment fade-in">Contents of a dataset can be stored in Git or
|
|
Git-annex.</li>
|
|
<li class="fragment fade-in">Files stored in Git <i>can</i> be more easily
|
|
modifiable.</li>
|
|
<li class="fragment fade-in">By default, all contents of a dataset are
|
|
stored in Git-annex.</li>
|
|
<li class="fragment fade-in">Configurations like <code>text2git</code>
|
|
and <code>yoda</code> configure the dataset to store files of certain
|
|
types or in certain locations to be stored in Git</li>
|
|
<li class="fragment fade-in">More about this in
|
|
<a href="http://handbook.datalad.org/en/latest/basics/101-114-txt2git.html" target="_blank"> Under the hood: Git-annex</a>. </ul>
|
|
</section>
|
|
</section>
|
|
|
|
|
|
</div>
|
|
</div>
|
|
|
|
<script src="../reveal.js/dist/reveal.js"></script>
|
|
<script src="../reveal.js/plugin/notes/notes.js"></script>
|
|
<script src="../reveal.js/plugin/markdown/markdown.js"></script>
|
|
<script src="../reveal.js/plugin/highlight/highlight.js"></script>
|
|
<script>
|
|
// More info about initialization & config:
|
|
// - https://revealjs.com/initialization/
|
|
// - https://revealjs.com/config/
|
|
Reveal.initialize({
|
|
hash: true,
|
|
// The "normal" size of the presentation, aspect ratio will be preserved
|
|
// when the presentation is scaled to fit different resolutions. Can be
|
|
// specified using percentage units.
|
|
width: 1280,
|
|
height: 960,
|
|
// Factor of the display size that should remain empty around the content
|
|
margin: 0.3,
|
|
// Bounds for smallest/largest possible scale to apply to content
|
|
minScale: 0.2,
|
|
maxScale: 1.0,
|
|
|
|
controls: true,
|
|
progress: true,
|
|
history: true,
|
|
center: true,
|
|
slideNumber: 'c',
|
|
pdfSeparateFragments: false,
|
|
pdfMaxPagesPerSlide: 1,
|
|
pdfPageHeightOffset: -1,
|
|
transition: 'slide', // none/fade/slide/convex/concave/zoom
|
|
// Learn about plugins: https://revealjs.com/plugins/
|
|
plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
|
|
});
|
|
</script>
|
|
</body>
|
|
</html>
|