DOC: Add short DataLad intro as proposed in the handbook #6
1 changed files with 50 additions and 21 deletions
71
README.rst
|
|
@@ -25,41 +25,70 @@ For more information about the project visit: http://studyforrest.org
|
||||||
How to obtain the data files
|
How to obtain the data files
|
||||||
----------------------------
|
----------------------------
|
||||||
|
|
||||||
This repository contains metadata and information on the identity of all
|
|
||||||
included files. However, the actual content of the (sometime large) data
|
|
||||||
files is stored elsewhere. To obtain any dataset component, git-annex_ is
|
|
||||||
required in addition to Git_.
|
|
||||||
|
|
||||||
1. Clone this repository to the desired location.
|
This repository is a `DataLad <https://www.datalad.org/>`__ dataset. It provides
|
||||||
2. Enter the directory with the local clone and run::
|
fine-grained data access down to the level of individual files, and allows for
|
||||||
|
tracking future updates down to the level of single files. In order to use
|
||||||
|
this repository for data retrieval, `DataLad <https://www.datalad.org>`_ is
|
||||||
|
required. It is a free and open source command line tool, available for all
|
||||||
|
major operating systems, and builds on Git and `git-annex
|
||||||
|
<https://git-annex.branchable.com>`__ to allow sharing, synchronizing, and
|
||||||
|
version controlling collections of large files. You can find information on
|
||||||
|
how to install DataLad at `handbook.datalad.org/en/latest/intro/installation.html
|
||||||
|
<http://handbook.datalad.org/en/latest/intro/installation.html>`_.
|
||||||
|
|
||||||
git annex init
|
Get the dataset
|
||||||
|
^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Older versions of git-annex may require you to run the following
|
A DataLad dataset can be ``cloned`` by running::
|
||||||
command immediately afterwards::
|
|
||||||
|
|
||||||
git annex enableremote mddatasrc
|
datalad clone <url>
|
||||||
|
|
||||||
Now any desired dataset component can be obtained by using the ``git annex get``
|
Once a dataset is cloned, it is a light-weight directory on your local machine.
|
||||||
command. To obtain the entire dataset content run::
|
At this point, it contains only small metadata and information on the
|
||||||
|
identity of the files in the dataset, but not the actual *content* of the
|
||||||
|
(sometimes large) data files.
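
A minimal sketch of the cloning step as a shell function, assuming DataLad is already installed. The function name ``clone_dataset``, its arguments, and the default directory name ``dataset`` are illustrative and not part of DataLad:

```shell
# clone_dataset URL [DIR]
# Hypothetical helper: clone a DataLad dataset, then print the size of the
# fresh clone to show that file content has not been downloaded yet.
clone_dataset() {
    url=$1
    dir=${2:-dataset}   # assumed default target directory
    datalad clone "$url" "$dir" || return 1
    # Report the on-disk size of the new clone.
    du -sh "$dir"
}
```

Calling ``clone_dataset <url> my-clone`` would clone into ``my-clone`` and print its size, which should stay small until file content is requested.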
|
||||||
|
|
||||||
git annex get .
|
Retrieve dataset content
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Keep data up-to-date
|
After cloning a dataset, you can retrieve file contents by running::
|
||||||
--------------------
|
|
||||||
|
|
||||||
If updates to this dataset are made in the future, update any local clone by
|
datalad get <path/to/directory/or/file>
|
||||||
running::
|
|
||||||
|
|
||||||
git pull
|
This command will trigger a download of the files, directories, or
|
||||||
|
subdatasets you have specified.
|
||||||
|
|
||||||
followed by::
|
DataLad datasets can contain other datasets, so-called *subdatasets*. If you
|
||||||
|
clone the top-level dataset, subdatasets do not yet contain metadata and
|
||||||
|
information on the identity of files, but appear to be empty directories. In
|
||||||
|
order to retrieve file availability metadata in subdatasets, run::
|
||||||
|
|
||||||
git annex get .
|
datalad get -n <path/to/subdataset>
|
||||||
|
|
||||||
to fetch all new files.
|
Afterwards, you can browse the retrieved metadata to find out about
|
||||||
|
subdataset contents, and retrieve individual files with ``datalad get``. If you
|
||||||
|
use ``datalad get <path/to/subdataset>``, all contents of the subdataset will
|
||||||
|
be downloaded at once.
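
The two-step workflow for subdatasets could be sketched as follows. The function name ``fetch_from_subdataset`` and its arguments are hypothetical; only ``datalad get`` and its ``-n`` option come from the text above:

```shell
# fetch_from_subdataset SUBDATASET FILE
# Hypothetical helper: SUBDATASET is a path inside the cloned dataset,
# FILE is a path relative to that subdataset.
fetch_from_subdataset() {
    subds=$1
    file=$2
    # -n installs the subdataset without downloading any file content.
    datalad get -n "$subds" || return 1
    # The file listing is now available for browsing.
    ls "$subds"
    # Download the content of a single file only.
    datalad get "$subds/$file"
}
```

Run from the root of the cloned dataset, this retrieves one file rather than the whole subdataset, which keeps the download small.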
|
||||||
|
|
||||||
|
Stay up-to-date
|
||||||
|
^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
DataLad datasets can be updated. The command ``datalad update`` will *fetch*
|
||||||
|
updates and store them on a different branch (by default
|
||||||
|
``remotes/origin/master``). Running::
|
||||||
|
|
||||||
|
datalad update --merge
|
||||||
|
|
||||||
|
will *pull* available updates and integrate them in one go.
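
If you prefer to look at incoming changes before integrating them, the two variants can be combined. This is a sketch under assumptions: the function name ``review_and_merge`` is hypothetical, and the branch defaults to ``remotes/origin/master`` as mentioned above, which may differ in your clone:

```shell
# review_and_merge [BRANCH]
# Hypothetical helper: fetch updates, list the new commits, then merge.
review_and_merge() {
    branch=${1:-remotes/origin/master}   # assumed default branch
    # Fetch updates without changing the checked-out files.
    datalad update || return 1
    # Show commits that exist on the fetched branch but not locally.
    git log --oneline "HEAD..$branch"
    # Integrate the fetched updates.
    datalad update --merge
}
```

Run inside the dataset directory. Pass a different branch name as the first argument if your clone does not track ``master``.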
|
||||||
|
|
||||||
|
More information
|
||||||
|
^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
More information on DataLad and how to use it can be found in the DataLad Handbook at
|
||||||
|
`handbook.datalad.org <http://handbook.datalad.org/en/latest/index.html>`_. The
|
||||||
|
chapter "DataLad datasets" can help you to familiarize yourself with the
|
||||||
|
concept of a dataset.
|
||||||
|
|
||||||
|
|
||||||
.. _Git: http://www.git-scm.com
|
.. _Git: http://www.git-scm.com
|
||||||
|
|
|
||||||