<!doctype html>
<html lang="en">
<head>
<title>VAMP it up!</title>
<meta name="description" content=" some description ">
<meta name="author" content="Michael Hanke">
<meta charset="utf-8">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../css/main.css" id="theme">
<link rel="stylesheet" href="../reveal.js/dist/reset.css">
<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
<link rel="stylesheet" href="../css/main.css">
<!-- Theme used for syntax highlighted code -->
<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section>
<h2>VAMP it up!<br><small>A pragmatic approach to reusable research outputs</small></h2>
<p>Michael Hanke</p>
<p>
<small>Institute of Neuroscience and Medicine, Brain &amp; Behavior (INM-7),
Research Center Jülich</small><br>
<small>Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf</small><br>
<a href="http://psychoinformatics.de">http://psychoinformatics.de</a></p>
<p style="margin-top:50px"><img style="height:100px;margin-right:100px" data-src="../pics/fzj_logo.svg" />
<img style="height:100px" data-src="../pics/hhu_logo.svg" /></p>
</section>
<section>
<section data-markdown data-transition="none"><script type="text/template">
![](../pics/pymvpa.png)
<note><a href="http://pymvpa.org">http://pymvpa.org</a></note>
Note:
open-source neuroimaging ML toolbox, from my PhD
</script></section>
<section data-markdown data-transition="none"><script type="text/template">
![](../pics/neurodebian.png)
<note><a href="http://neuro.debian.org">http://neuro.debian.org</a></note>
Note:
largest repository of readily usable open-source neuro-software
</script></section>
<section>
<h2 style="font-size:150%"><em>Open</em>, "naturalistic" data: studyforrest.org</h2>
<img style="width:45%" data-src="../pics/forrest_gump_xlg.jpg" />
<img style="margin-bottom:8%" data-src="../pics/forrest_data_records.png" />
<note>Hanke, Baumgartner, Ibe, Kaule, Pollmann, Speck, Zinke, &amp; Stadler (2014)
<em>A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie</em>. Scientific Data, 1:140003.
<a href="http://www.nature.com/articles/sdata20143">http://www.nature.com/articles/sdata20143</a>
</note>
<aside class="notes">
ongoing open-data experiment
</aside>
</section>
</section>
<section>
<section>
<ul style="list-style-type:none">
<li><h1 style="display:inline">F</h1>indable</li>
<li><h1 style="display:inline">A</h1>ccessible</li>
<li><h1 style="display:inline">I</h1>nteroperable</li>
<li><h1 style="display:inline">R</h1>eusable</li>
</ul>
<note><a href="https://www.go-fair.org/fair-principles">https://www.go-fair.org/fair-principles</a></note>
<aside class="notes">
since 2016 (five years), no way around it, declared "done"
</aside>
</section>
<section>
<ul style="list-style-type:none">
<li><h1 style="display:inline">F?</h1> I already have it, it's right here!</li>
<li><h1 style="display:inline">A?</h1> I am working with it already, I made it!</li>
<li><h1 style="display:inline">I?</h1> With what?</li>
<li><h1 style="display:inline">R?</h1> First let me finish this PhD and then we talk, OK?</li>
</ul>
<aside class="notes">
individual/limited perspective less clear, tendency to delay and make expensive
</aside>
</section>
<section data-markdown data-transition="none"><script type="text/template">
![](../pics/Vamp-The_Rich_Dont_Rock.jpg)<!-- .element: height="1000px" -->
<imgcredit>Divebomb Records</imgcredit>
Note:
My proposal: Let's think about VAMP instead
</script></section>
<section>
<ul style="list-style-type:none">
<li><h1 style="display:inline">V</h1>ersion-controlled</li>
<li><h1 style="display:inline">A</h1>ctionable metadata</li>
<li><h1 style="display:inline">M</h1>odular</li>
<li><h1 style="display:inline">P</h1>ortable</li>
</ul>
<aside class="notes">
not the band, but...
</aside>
</section>
</section>
<section>
<section data-markdown data-transition="none"><script type="text/template">
## Exhaustive tracking of research components
![](../pics/vamp_0_start.png)<!-- .element: width="100%" -->
Well-structured datasets (using community standards), and portable computational environments &mdash; and their evolution &mdash; are the precondition for reproducibility
Note:
your community could be really small (your lab). When data are precious, resources
will be spent to understand them, but information must be captured to make this possible
</script></section>
<section data-markdown data-transition="none"><script type="text/template">
## Capture computational provenance
![](../pics/vamp_1_provcapture.png)<!-- .element: width="100%" -->
Which data was needed at which version, as input into which code, running with what parameterization in which
computational environment, to generate an outcome?
Note:
The missing link: even when everything is shared, we still don't know how to start.
README is minimum, but executable prov-records are much better.
</script></section>
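<section data-markdown data-transition="none"><script type="text/template">
## Sketch: a minimal provenance record
To make the question above concrete, here is a minimal, stdlib-only sketch of such a record. It is an illustration, not DataLad's implementation: `run_with_provenance`, the record layout, and the toy sorting step are all hypothetical.

```python
import hashlib
import json
import platform
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256(path):
    """Content checksum: identifies the exact version of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_with_provenance(cmd, inputs, outputs, cwd):
    """Run cmd in cwd; record inputs, command, environment, and outputs."""
    cwd = Path(cwd)
    record = {
        "cmd": cmd,
        "inputs": {p: sha256(cwd / p) for p in inputs},
        "environment": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
    }
    subprocess.run(cmd, cwd=cwd, check=True)
    record["outputs"] = {p: sha256(cwd / p) for p in outputs}
    return record


# Hypothetical analysis step: sort the lines of raw.txt into sorted.txt
code = (
    "from pathlib import Path; "
    "Path('sorted.txt').write_bytes(b''.join(sorted(open('raw.txt', 'rb'))))"
)

with tempfile.TemporaryDirectory() as workdir:
    Path(workdir, "raw.txt").write_bytes(b"3\n1\n2\n")
    record = run_with_provenance(
        [sys.executable, "-c", code], ["raw.txt"], ["sorted.txt"], workdir
    )
    print(json.dumps(record, indent=2))
```

The record is plain JSON, so it can be stored and versioned next to the data it describes.
</script></section>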
<section data-markdown data-transition="none"><script type="text/template">
## Exhaustive capture enables portability
![](../pics/vamp_2_pushtocloud.png)<!-- .element: width="100%" -->
Precise identification of data and computational environments, combined with provenance records, forms a comprehensive and portable data structure that captures all aspects of an investigation.
Note:
Does it fly? Can you give it to someone? Or can you take it with you to your new lab?
</script></section>
<section data-markdown data-transition="none"><script type="text/template">
## Reproducibility strengthens trust
![](../pics/vamp_3_reproduce.png)<!-- .element: width="100%" -->
Outcomes of computational transformations can be validated by authorized third parties. This enables audits, promotes accountability, and streamlines automated "upgrades" of outputs.
Note:
Goal is automated reproducibility, enables assessment of robustness and benchmarking algorithmic developments
</script></section>
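<section data-markdown data-transition="none"><script type="text/template">
## Sketch: validating an outcome by re-execution
One way a third party could validate an outcome is to re-run the recorded command and compare output checksums. The sketch below assumes a hypothetical record with `cmd` and `outputs` fields and a toy sorting step; it is not DataLad's implementation.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256(path):
    """Content checksum of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def reproduce(record, cwd):
    """Re-run record['cmd'] in cwd; report per output whether checksums match."""
    subprocess.run(record["cmd"], cwd=cwd, check=True)
    return {
        name: sha256(Path(cwd, name)) == expected
        for name, expected in record["outputs"].items()
    }


# Hypothetical record, as a third party might receive it alongside raw.txt
code = (
    "from pathlib import Path; "
    "Path('sorted.txt').write_bytes(b''.join(sorted(open('raw.txt', 'rb'))))"
)
record = {
    "cmd": [sys.executable, "-c", code],
    "outputs": {"sorted.txt": hashlib.sha256(b"1\n2\n3\n").hexdigest()},
}

with tempfile.TemporaryDirectory() as workdir:
    Path(workdir, "raw.txt").write_bytes(b"3\n1\n2\n")
    result = reproduce(record, workdir)
    print(result)
```

A mismatch does not say which side is wrong, only that the outcome could not be reproduced bit-for-bit under the current environment.
</script></section>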
<section data-markdown data-transition="none"><script type="text/template">
## Ultimate goal: (re-)usability
![](../pics/vamp_4_reuse.png)<!-- .element: width="100%" -->
Verifiable, portable, self-contained data structures that track all aspects of an investigation exhaustively can be (re-)used as modular components in larger contexts &mdash; propagating their traits
Note:
With these in place, re-usability is a small(er) step
</script></section>
</section>
<section>
<section data-markdown><script type="text/template">
## DataLad: manage evolution of digital objects
![](../pics/yoda_decentralized_publishing.png)<!-- .element: width="1000" style="margin-bottom:-50px" -->
Consume, create, curate, analyze, publish, and query data with full provenance capture and "universal" metadata support.
<p style="font-size:80%">
DataLad is free and open source (MIT-licensed).
</p>
http://datalad.org
Note:
DataLad implements all these features. Go get it at the URL
</script></section>
<section data-markdown data-transition="none"><script type="text/template">
## Advantages of the VAMP attitude
- *Overlay data structure* hides away peculiarities of (current) environment choices for storage and computation<br>
&mdash; focus on content, not infrastructure
- Self-contained units that are *valid and complete without any external services*<br>
&mdash; federation-ready for improved resilience
- Metadata plurality puts *focus on metadata validity* (for your own work) without becoming a problem for global standardization efforts<br>
&mdash; ability to verify detailed metadata is more useful than today's
choice of terminology and minimal description standard
- Promotes *long-term curation and stewardship* for flexibly reusable units<br>
&mdash; yields proven and trusted resources for incremental science
</script></section>
<section data-markdown data-transition="none"><script type="text/template">
### "Automatic" interoperability with 3rd-party solutions
![](../pics/nfdi_common_infra_2.png)<!-- .element: width="600px" style="margin-bottom:-30px;margin-top:-10px"-->
- Technology directly used by OpenNeuro, CBRAIN platform, BrainLife.io,
and compatible with AWS/S3, GIN, Dropbox, etc. (optional strong encryption)
- 100+ TB of research data, homogeneously accessible regardless of hosting choices
(datasets.datalad.org)
Note:
Interoperability is now an orthogonal aspect that can be improved collaboratively
</script></section>
<section data-markdown><script type="text/template">
## Reproducible paper - a Magic trick?
![](../pics/dar2020.png)<!-- .element: width="1000" -->
- See for yourself: https://youtu.be/_I3JFhJJtW0?t=861
- Get step-by-step instructions: http://handbook.datalad.org/usecases/reproducible-paper.html
Note:
No time to explain, see for yourself
</script></section>
<section data-markdown><script type="text/template">
## Open science education
![](../pics/adina.jpg)<!-- .element: class="ackimg" -->
![](../pics/handbook_frontpage.png)<!-- .element: width="950" style="margin-top:-20px;margin-bottom:-10px" -->
http://handbook.datalad.org
- **educational materials** on technologies &mdash; **targeting scientists**, not developers (executable paper,
student-supervisor workflow, ...)
- handbook with 400+ pages on concepts, workflows, and use cases (work in progress, led by Adina Wagner)
Note:
RDM Education is key. Handbook helps people be more productive, yielding more FAIR resources as an outcome, but not as the main goal.
</script></section>
</section>
<section>
<h2>Acknowledgements</h2>
<table>
<tr style="vertical-align:middle">
<td style="vertical-align:middle">
<ul style="font-size:80%">
<li>Yaroslav Halchenko</li>
<li>Benjamin Poldrack</li>
<li>Kyle Meyer</li>
<li>Adina Wagner</li>
<li>30+ DataLad contributors</li>
</ul>
</td>
<td style="vertical-align:middle">
<img style="height:150px;margin-right:50px" data-src="../pics/nsf.png" />
<img style="height:150px;margin-right:50px;margin-left:50px" data-src="../pics/binc.png" />
<img style="height:150px;margin-left:50px" data-src="../pics/bmbf.png" />
<br />
<img style="height:80px;margin-top:-40px;margin-left:auto;margin-right:auto;width:100%" data-src="../pics/fzj_logo.svg" />
<div style="margin-top:-20px">
<img style="height:60px;margin-right:200px" data-src="../pics/erdf.png" />
<img style="height:60px" data-src="../pics/LSA-Logo.png" />
</div>
<div style="margin-top:-20px">
<img style="height:100px;margin:20px" data-src="../pics/hbp_logo.png" />
<img style="height:100px;margin:20px" data-src="../pics/conp_logo.png" />
<img style="height:100px;margin:20px" data-src="../pics/vbc_logo.png" />
</div>
<div style="margin-top:-40px">
<img style="height:120px;margin:20px" data-src="../pics/openneuro_logo.png" />
<img style="height:120px;margin:20px" data-src="../pics/cbrain_logo.png" />
<img style="height:140px;margin:20px" data-src="../pics/brainlife_logo.png" />
</div>
</td>
</tr>
</table>
<p>Website: <a href="http://datalad.org">http://datalad.org</a><br />
Documentation: <a href="http://handbook.datalad.org">http://handbook.datalad.org</a><br />
Open data: <a href="http://datasets.datalad.org">http://datasets.datalad.org</a></p>
</section>
<script src="../reveal.js/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
// The "normal" size of the presentation, aspect ratio will be preserved
// when the presentation is scaled to fit different resolutions. Can be
// specified using percentage units.
width: 1280,
height: 1024,
// Factor of the display size that should remain empty around the content
margin: 0.1,
// Bounds for smallest/largest possible scale to apply to content
minScale: 0.2,
maxScale: 1.0,
controls: false,
progress: true,
history: true,
center: true,
transition: 'slide', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: '../reveal.js/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../reveal.js/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js/plugin/zoom-js/zoom.js', async: true },
{ src: '../reveal.js/plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>