
<!doctype html>
<html lang="en">

<head>
  <title>VAMP it up!</title>
  <meta name="description" content="VAMP it up! A pragmatic approach to reusable research outputs">
  <meta name="author" content="Michael Hanke">

  <meta charset="utf-8">
  <meta name="apple-mobile-web-app-capable" content="yes" />
  <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
  <link rel="stylesheet" href="../css/main.css" id="theme">
		<link rel="stylesheet" href="../reveal.js/dist/reset.css">
		<link rel="stylesheet" href="../reveal.js/dist/reveal.css">
		<link rel="stylesheet" href="../reveal.js/dist/theme/beige.css">
        <link rel="stylesheet" href="../css/main.css">
		<!-- Theme used for syntax highlighted code -->
		<link rel="stylesheet" href="../reveal.js/plugin/highlight/monokai.css">
</head>
<body>

<div class="reveal">

<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section>
  <h2>VAMP it up!<br><small>A pragmatic approach to reusable research outputs</small></h2>

  <p>Michael Hanke</p>
  <p>
      <small>Institute of Neuroscience and Medicine, Brain &amp; Behavior (INM-7),
      Research Center Jülich</small><br>
  <small>Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf</small><br>
  <a href="http://psychoinformatics.de">http://psychoinformatics.de</a></p>
  <p style="margin-top:50px"><img style="height:100px;margin-right:100px" data-src="../pics/fzj_logo.svg" />
  <img style="height:100px" data-src="../pics/hhu_logo.svg" /></p>
</section>

<section>
<section data-markdown data-transition="none"><script type="text/template">
![](../pics/pymvpa.png)

<note><a href="http://pymvpa.org">http://pymvpa.org</a></note>
Note:
open-source neuroimaging ML toolbox, developed during my PhD
</script></section>

<section data-markdown data-transition="none"><script type="text/template">
![](../pics/neurodebian.png)

<note><a href="http://neuro.debian.org">http://neuro.debian.org</a></note>
Note:
largest repository of readily usable open-source neuro-software
</script></section>

<section>
    <h2 style="font-size:150%"><em>Open</em>, "naturalistic" data: studyforrest.org</h2>
    <img style="width:45%" data-src="../pics/forrest_gump_xlg.jpg" />
    <img style="margin-bottom:8%" data-src="../pics/forrest_data_records.png" />
    <note>Hanke, Baumgartner, Ibe, Kaule, Pollmann, Speck, Zinke, &amp; Stadler (2014)
    <em>A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie</em>. Scientific Data, 1:140003.
            <a href="http://www.nature.com/articles/sdata20143">http://www.nature.com/articles/sdata20143</a>
	</note>
  <aside class="notes">
      ongoing open-data experiment
  </aside>
</section>
</section>

<section>
<section>
  <ul style="list-style-type:none">
    <li><h1 style="display:inline">F</h1>indable</li>
    <li><h1 style="display:inline">A</h1>ccessible</li>
    <li><h1 style="display:inline">I</h1>nteroperable</li>
    <li><h1 style="display:inline">R</h1>eusable</li>
  </ul>
  <note><a href="https://www.go-fair.org/fair-principles">https://www.go-fair.org/fair-principles</a></note>
  <aside class="notes">
      since 2016 (five years), no way around it, declared "done"
  </aside>
</section>

<section>
  <ul style="list-style-type:none">
    <li><h1 style="display:inline">F?</h1> I already have it, it's right here!</li>
    <li><h1 style="display:inline">A?</h1> I am working with it already, I made it!</li>
    <li><h1 style="display:inline">I?</h1> With what?</li>
    <li><h1 style="display:inline">R?</h1> First let me finish this PhD and then we talk, OK?</li>
  </ul>
  <aside class="notes">
      individual/limited perspective less clear, tendency to delay and make expensive
  </aside>
</section>

<section data-markdown data-transition="none"><script type="text/template">
![](../pics/Vamp-The_Rich_Dont_Rock.jpg)<!-- .element: height="1000px" -->
<imgcredit>Divebomb Records</imgcredit>

Note:
My proposal: Let's think about VAMP instead
</script></section>

<section>
  <ul style="list-style-type:none">
    <li><h1 style="display:inline">V</h1>ersion-controlled</li>
    <li><h1 style="display:inline">A</h1>ctionable metadata</li>
    <li><h1 style="display:inline">M</h1>odular</li>
    <li><h1 style="display:inline">P</h1>ortable</li>
  </ul>
  <aside class="notes">
      not the band, but...
  </aside>
</section>
</section>


<section>
<section data-markdown data-transition="none"><script type="text/template">
## Exhaustive tracking of research components
![](../pics/vamp_0_start.png)<!-- .element: width="100%" -->
Well-structured datasets (using community standards), and portable computational environments &mdash; and their evolution &mdash; are the precondition for reproducibility

Note:
Your community could be really small (your lab). When data are precious, resources
will be spent to understand them, but the information must be captured to make this possible.
</script></section>
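<section data-markdown data-transition="none"><script type="text/template">
## Sketch: identifying a state by its content

A minimal sketch of how the evolution of data and environment descriptions could be tracked, assuming a version is identified by hashing all file names and contents. `dataset_version` is a hypothetical helper for illustration, not part of DataLad or any other tool.

```python
import hashlib


def dataset_version(files: dict) -> str:
    """Derive one identifier from all file names and contents (name -> bytes)."""
    digest = hashlib.sha256()
    for name in sorted(files):  # sort so the result does not depend on dict order
        digest.update(name.encode())
        digest.update(b"\0")  # separator between name and content hash
        digest.update(hashlib.sha256(files[name]).digest())
    return digest.hexdigest()


# Hypothetical dataset states: same data, different environment description.
v1 = dataset_version({"data.csv": b"1,2,3\n", "env.txt": b"python=3.11\n"})
v2 = dataset_version({"data.csv": b"1,2,3\n", "env.txt": b"python=3.12\n"})

# Any change to data or environment yields a different identifier.
print(v1 != v2)
```

Version control systems apply the same idea at scale: every recorded state gets an identifier that depends on its full content.
</script></section>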

<section data-markdown data-transition="none"><script type="text/template">
## Capture computational provenance
![](../pics/vamp_1_provcapture.png)<!-- .element: width="100%" -->
Which data was needed at which version, as input into which code, running with what parameterization in which
computational environment, to generate an outcome?

Note:
The missing link: even when everything is shared, we still don't know how to start.
README is minimum, but executable prov-records are much better.
</script></section>
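<section data-markdown data-transition="none"><script type="text/template">
## Sketch: a machine-readable provenance record

A minimal sketch of what such a record could hold, one field per part of the question. The `ProvenanceRecord` class, its field names, `make_record`, and the example file names are illustrative assumptions, not DataLad's actual record format.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record: which inputs, code, parameters, and environment."""
    input_checksums: dict   # input name -> SHA-256 of its content (its "version")
    command: str            # the code that was run
    parameters: dict        # parameterization of the run
    environment: str        # identifier of the computational environment
    output_checksums: dict  # output name -> SHA-256 of its content


def checksum(content: bytes) -> str:
    """Identify content by its SHA-256 hex digest."""
    return hashlib.sha256(content).hexdigest()


def make_record(inputs, command, parameters, outputs):
    """Build a record from in-memory inputs and outputs (name -> bytes)."""
    return ProvenanceRecord(
        input_checksums={k: checksum(v) for k, v in inputs.items()},
        command=command,
        parameters=parameters,
        environment=f"python-{platform.python_version()}",
        output_checksums={k: checksum(v) for k, v in outputs.items()},
    )


record = make_record(
    inputs={"raw.csv": b"1,2,3\n"},
    command="python summarize.py",
    parameters={"threshold": 0.5},
    outputs={"summary.txt": b"mean=2.0\n"},
)
# Serialized to JSON, the record is machine-readable and can travel with the data.
print(json.dumps(asdict(record), indent=2, sort_keys=True))
```
</script></section>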

<section data-markdown data-transition="none"><script type="text/template">
## Exhaustive capture enables portability
![](../pics/vamp_2_pushtocloud.png)<!-- .element: width="100%" -->
Precise identification of data and computational environments, combined with provenance records, forms a comprehensive and portable data structure that captures all aspects of an investigation.

Note:
Does it fly? Can you give it to someone? Or can you take it with you to your new lab?
</script></section>

<section data-markdown data-transition="none"><script type="text/template">
## Reproducibility strengthens trust
![](../pics/vamp_3_reproduce.png)<!-- .element: width="100%" -->
Outcomes of computational transformations can be validated by authorized third parties. This enables audits, promotes accountability, and streamlines automated "upgrades" of outputs.

Note:
Goal is automated reproducibility, enables assessment of robustness and benchmarking algorithmic developments
</script></section>
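<section data-markdown data-transition="none"><script type="text/template">
## Sketch: validating an outcome by re-execution

A minimal sketch of how a third party could validate a recorded outcome, assuming the computation is deterministic and the record stores the inputs and a checksum of the output. `normalize`, `run_and_record`, and `verify` are hypothetical names for illustration, not an existing API.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Identify content by its SHA-256 hex digest."""
    return hashlib.sha256(content).hexdigest()


def normalize(values):
    """Stand-in for a recorded, deterministic computation."""
    total = sum(values)
    return [v / total for v in values]


def run_and_record(func, values):
    """Run a computation and record the checksum of its serialized output."""
    output = repr(func(values)).encode()
    return {"inputs": list(values), "output_sha256": checksum(output)}


def verify(func, record):
    """Re-run the computation from the record and compare output checksums."""
    output = repr(func(record["inputs"])).encode()
    return checksum(output) == record["output_sha256"]


record = run_and_record(normalize, [1, 1, 2])
print(verify(normalize, record))            # same code: the outcome is reproduced
print(verify(lambda v: sorted(v), record))  # different code: the mismatch is detected
```

The same comparison supports automated "upgrades": re-run with a new algorithm version and inspect which outputs changed.
</script></section>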

<section data-markdown data-transition="none"><script type="text/template">
## Ultimate goal: (re-)usability
![](../pics/vamp_4_reuse.png)<!-- .element: width="100%" -->
Verifiable, portable, self-contained data structures that track all aspects of an investigation exhaustively can be (re-)used as modular components in larger contexts &mdash; propagating their traits

Note:
With these in place, re-usability is a small(er) step
</script></section>
</section>


<section>
<section data-markdown><script type="text/template">
## DataLad: manage evolution of digital objects
![](../pics/yoda_decentralized_publishing.png)<!-- .element: width="1000" style="margin-bottom:-50px" -->

Consume, create, curate, analyze, publish, and query data with full provenance capture and "universal" metadata support.
<p style="font-size:80%">
DataLad is free and open source (MIT-licensed).
</p>
http://datalad.org

Note:
DataLad implements all these features. Go get it at the URL.
</script></section>


<section data-markdown data-transition="none"><script type="text/template">
## Advantages of the VAMP attitude

- *Overlay data structure* hides away peculiarities of (current) environment choices for storage and computation<br>
  &mdash; focus on content, not infrastructure

- Self-contained units that are *valid and complete without any external services*<br>
  &mdash; federation-ready for improved resilience

- Metadata plurality puts *focus on metadata validity* (for your own work) without becoming a problem for global standardization efforts<br>
  &mdash; ability to verify detailed metadata is more useful than today's
  choice of terminology and minimal description standard

- Promotes *long-term curation and stewardship* for flexibly reusable units<br>
  &mdash; yields proven and trusted resources for incremental science

</script></section>

<section data-markdown data-transition="none"><script type="text/template">
### "Automatic" interoperability with 3rd-party solutions
![](../pics/nfdi_common_infra_2.png)<!-- .element: width="600px" style="margin-bottom:-30px;margin-top:-10px"-->

- Technology directly used by OpenNeuro, CBRAIN platform, BrainLife.io,
  and compatible with AWS/S3, GIN, Dropbox, etc. (optional strong encryption)

- 100+ TB of research data, homogeneously accessible regardless of hosting choices
  (datasets.datalad.org)

Note:
Interoperability is now an orthogonal aspect that can be improved collaboratively
</script></section>


<section data-markdown><script type="text/template">
## Reproducible paper - a magic trick?

![](../pics/dar2020.png)<!-- .element: width="1000" -->
- See for yourself: https://youtu.be/_I3JFhJJtW0?t=861

- Get step-by-step instructions: http://handbook.datalad.org/usecases/reproducible-paper.html

Note:
No time to explain, see for yourself
</script></section>


<section data-markdown><script type="text/template">
## Open science education
![](../pics/adina.jpg)<!-- .element: class="ackimg" -->
![](../pics/handbook_frontpage.png)<!-- .element: width="950" style="margin-top:-20px;margin-bottom:-10px" -->

http://handbook.datalad.org

- **educational materials** on technologies &mdash; **targeting scientists**, not developers (executable paper,
  student-supervisor workflow, ...)
- handbook with 400+ pages on concepts, workflows, and use cases (work in progress, led by Adina Wagner)

Note:
RDM education is key. The handbook helps people be more productive, yielding more FAIR resources as an outcome, but not as the main goal.
</script></section>


</section>

<section>
  <h2>Acknowledgements</h2>
  <table>
  <tr style="vertical-align:middle">
      <td style="vertical-align:middle">
  <ul style="font-size:80%">
      <li>Yaroslav Halchenko</li>
      <li>Benjamin Poldrack</li>
      <li>Kyle Meyer</li>
      <li>Adina Wagner</li>
      <li>30+ DataLad contributors</li>
  </ul>
      </td>
      <td style="vertical-align:middle">
  <img style="height:150px;margin-right:50px" data-src="../pics/nsf.png" />
  <img style="height:150px;margin-right:50px;margin-left:50px" data-src="../pics/binc.png" />
  <img style="height:150px;margin-left:50px" data-src="../pics/bmbf.png" />
  <br />
  <img style="height:80px;margin-top:-40px;margin-left:auto;margin-right:auto;width:100%" data-src="../pics/fzj_logo.svg" />
  <div style="margin-top:-20px">
  <img style="height:60px;margin-right:200px" data-src="../pics/erdf.png" />
  <img style="height:60px" data-src="../pics/LSA-Logo.png" />
  </div>
  <div style="margin-top:-20px">
  <img style="height:100px;margin:20px" data-src="../pics/hbp_logo.png" />
  <img style="height:100px;margin:20px" data-src="../pics/conp_logo.png" />
  <img style="height:100px;margin:20px" data-src="../pics/vbc_logo.png" />
  </div>
  <div style="margin-top:-40px">
  <img style="height:120px;margin:20px" data-src="../pics/openneuro_logo.png" />
  <img style="height:120px;margin:20px" data-src="../pics/cbrain_logo.png" />
  <img style="height:140px;margin:20px" data-src="../pics/brainlife_logo.png" />
  </div>
  </td>
  </tr>
  </table>
  <p>Website: <a href="http://datalad.org">http://datalad.org</a><br />
  Documentation: <a href="http://handbook.datalad.org">http://handbook.datalad.org</a><br />
  Open data: <a href="http://datasets.datalad.org">http://datasets.datalad.org</a></p>
</section>

</div>
</div>


<script src="../reveal.js/js/reveal.js"></script>

<script>
  // Full list of configuration options available at:
  // https://github.com/hakimel/reveal.js#configuration
  Reveal.initialize({
    // The "normal" size of the presentation, aspect ratio will be preserved
    // when the presentation is scaled to fit different resolutions. Can be
    // specified using percentage units.
    width: 1280,
    height: 1024,

    // Factor of the display size that should remain empty around the content
    margin: 0.1,

    // Bounds for smallest/largest possible scale to apply to content
    minScale: 0.2,
    maxScale: 1.0,

    controls: false,
    progress: true,
    history: true,
    center: true,

    transition: 'slide', // none/fade/slide/convex/concave/zoom

    // Optional reveal.js plugins
    dependencies: [
      { src: '../reveal.js/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
      { src: '../reveal.js/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
      { src: '../reveal.js/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
      { src: '../reveal.js/plugin/zoom-js/zoom.js', async: true },
      { src: '../reveal.js/plugin/notes/notes.js', async: true }
    ]
  });
</script>
</body>
</html>