datalad-handbook/docs/beyond_basics/101-163-summary.rst
Michael Hanke ae7654aaff Transition from :command:... to diversified roles
This give automatic links to manpages, and retains a uniform look
across. We can add visual markers to distinguish the command
in the future.
2023-08-05 20:26:12 +02:00


.. _gobigsummary:

Summary
-------

If you want to go big, DataLad is a suitable tool and, if used correctly, can
overcome shortcomings of Git and git-annex. Scaling up takes some thought,
though, and in some instances compromise.

- The general mechanism that allows scaling up is nesting datasets. This process
  can be done by hand or programmatically. Recursive operations ease working
  across a hierarchy of datasets and create a monorepo-like experience.

- Beware of accidentally placing too many (even small) files into Git's version
  control in a single dataset!
  ``.gitignore`` files can keep irrelevant files out of version control, the
  ``--explicit`` option of :dlcmd:`run` may be helpful, and
  custom largefile rules in ``.gitattributes`` may be necessary to override
  dataset configurations such as ``text2git``.

- Don't consider only the limits of version control software, but also the
  limits of your file system. Too many files in a single directory can become
  problematic even without version control.

- If things go wrong, not all is lost. There are ways to clean up your dataset
  if it ever gets clogged, although they are the software equivalent of a
  blowtorch and should be handled with care.

Now what can I do with it?
^^^^^^^^^^^^^^^^^^^^^^^^^^

Go big, if you want to. :ref:`Distribute 80TB of files <usecase_HCP_dataset>`
or `more <https://github.com/datalad/datalad-ukbiobank>`_, or version control
large analyses with minimal performance loss in your version control tools.