datalad-handbook/docs/basics/101-134-summary.rst
Michał Szczepanik 523fa30263 Fix "it's" vs "its" usage
This fixes the usage of contraction it's (it is / it has) and
possessive its, as far as I could grep.
2024-05-17 21:55:24 +02:00


.. _summary_containers:

Summary
-------

The last two sections have, first of all, extended your knowledge of dataset nesting:

- When subdatasets are created or installed, they are registered to the superdataset
  in their current version state (as identified by their most recent commit's hash).
  For a freshly created subdataset, the most recent commit is at the same time its
  first commit.
- Once the subdataset evolves, the superdataset recognizes this as a ``modification``
  of the subdataset's version state. If you want to record this, you need to
  :dlcmd:`save` it in the superdataset:

  .. code-block:: console

     $ datalad save -m "a short summary of changes in subds" <path to subds>

But beyond nesting concepts, they have also extended your knowledge of
reproducible analyses with :dlcmd:`run`, and you have experienced
for yourself why and how software containers can go hand-in-hand with DataLad:

- A software container encapsulates a complete software environment, independent
  of the environment of the computer it runs on. This allows you to create or
  use an isolated software environment and to share it together with your analysis
  to ensure computational reproducibility. The DataLad extension
  `datalad-container <https://docs.datalad.org/projects/container>`_
  makes this possible.
- The command :dlcmd:`containers-add` registers a :term:`container image` from a path or
  URL to your dataset.
- If you use :dlcmd:`containers-run` instead of :dlcmd:`run`,
  you can reproducibly execute a command of your choice *within* the software
  environment.
- A :dlcmd:`rerun` of a commit produced with :dlcmd:`containers-run`
  will re-execute the command in the same software environment.

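
The container workflow summarized above can be sketched as a small Python helper.
This is a minimal sketch rather than a definitive implementation: it only assembles
the three command lines and prints them, so that you can inspect them before running
them in a dataset with the ``datalad-container`` extension installed. The container
name ``software``, the image URL, the script ``code/script.py``, and the input and
output paths are hypothetical placeholders and not values taken from this section.

.. code-block:: python

   import shlex


   def containers_add_cmd(name, url):
       """Build the call that registers a container image under ``name``."""
       return ["datalad", "containers-add", name, "--url", url]


   def containers_run_cmd(name, message, command, inputs=(), outputs=()):
       """Build a ``containers-run`` call that executes ``command`` in ``name``."""
       argv = ["datalad", "containers-run", "-m", message,
               "--container-name", name]
       for path in inputs:
           argv += ["--input", path]
       for path in outputs:
           argv += ["--output", path]
       # The command to execute inside the container goes last, as one string.
       argv.append(command)
       return argv


   def rerun_cmd(commit="HEAD"):
       """Build the call that re-executes the run record stored in ``commit``."""
       return ["datalad", "rerun", commit]


   if __name__ == "__main__":
       steps = [
           # Hypothetical image location; any path or URL to an image works.
           containers_add_cmd("software", "https://example.com/image.sif"),
           containers_run_cmd(
               "software",
               "run analysis in container",
               "python code/script.py",
               inputs=["input/data.csv"],   # hypothetical input file
               outputs=["result.txt"],      # hypothetical output file
           ),
           rerun_cmd(),
       ]
       for argv in steps:
           # shlex.join quotes arguments so each line can be pasted into a shell.
           print(shlex.join(argv))

Printing the commands instead of executing them keeps the sketch free of side effects.
To actually run a step, you could pass one of the lists to ``subprocess.run`` from
within the dataset.
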
.. index::
   pair: hub; Docker

Now what can I do with it?
^^^^^^^^^^^^^^^^^^^^^^^^^^
For one, you will not be surprised if you ever see a subdataset being shown as
``modified`` by :dlcmd:`status`: You now know that if a subdataset
evolves, its most recent state needs to be explicitly saved to the superdataset's
history.

On a different matter, you are now able to capture and share analysis provenance that
includes the relevant software environment. This makes your analysis projects not only
automatically reproducible, but automatically *computationally* reproducible:
you can make sure that your analyses run on any computer with Singularity,
regardless of the software environment on this computer. Even if you are unsure how to wrap up an
environment into a software :term:`container image` at this point, you could make use of
hundreds of publicly available images on `Singularity-Hub <https://singularity-hub.org>`_ and
`Docker-Hub <https://hub.docker.com>`_.

With this, you have also gotten a first glimpse into an extension of DataLad: a
Python package, installable with Python package managers such as ``pip``, that
extends DataLad's functionality.
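
Because an extension is an ordinary Python package, you can check whether it is
available in the current environment before relying on its commands. The following
is a minimal sketch. It assumes that the extension's importable module is named
``datalad_container`` and that its distribution on PyPI is called ``datalad-container``.
Adjust both names if your setup differs.

.. code-block:: python

   import importlib.util
   import shlex
   import sys


   def extension_installed(module_name):
       """Return True if a top-level module can be located without importing it."""
       return importlib.util.find_spec(module_name) is not None


   def pip_install_cmd(distribution):
       """Build a pip call bound to the interpreter that runs this script."""
       return [sys.executable, "-m", "pip", "install", distribution]


   if __name__ == "__main__":
       # Assumed module name of the extension (see the note above).
       if extension_installed("datalad_container"):
           print("datalad-container is available")
       else:
           print("Install it with:",
                 shlex.join(pip_install_cmd("datalad-container")))

Using ``sys.executable -m pip`` rather than a bare ``pip`` makes sure the package would
be installed into the same Python environment that was just checked.
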