$ datalad get --help
Usage: datalad get [-h] [-s LABEL] [-d PATH] [-r] [-R LEVELS] [-n]
                   [-D DESCRIPTION] [--reckless [auto|ephemeral|shared-...]]
                   [-J NJOBS] [--version]
                   [PATH [PATH ...]]

Get any dataset content (files/directories/subdatasets).

This command only operates on dataset content. To obtain a new independent
dataset from some source use the CLONE command.

By default this command operates recursively within a dataset, but not
across potential subdatasets, i.e. if a directory is provided, all files in
the directory are obtained. Recursion into subdatasets is supported too. If
enabled, relevant subdatasets are detected and installed in order to
fulfill a request.

Known data locations for each requested file are evaluated and data are
obtained from some available location (according to git-annex configuration
and possibly assigned remote priorities), unless a specific source is
specified.

*Getting subdatasets*

Just as DataLad supports getting file content from more than one location,
the same is supported for subdatasets, including a ranking of individual
sources for prioritization.

The following location candidates are considered. For each candidate a
cost is given in parentheses; higher values indicate higher cost, and thus
lower priority:

- A datalad URL recorded in `.gitmodules` (cost 590). This allows for
  datalad URLs that require additional handling/resolution by datalad, like
  ria-schemes (ria+http, ria+ssh, etc.)

- A URL or absolute path recorded for git in `.gitmodules` (cost 600).

- URL of any configured superdataset remote that is known to have the
  desired submodule commit, with the submodule path appended to it.
  There can be more than one candidate (cost 650).

- In case `.gitmodules` contains a relative path instead of a URL,
  the URL of any configured superdataset remote that is known to have the
  desired submodule commit, with this relative path appended to it.
  There can be more than one candidate (cost 650).

- In case `.gitmodules` contains a relative path as a URL, the absolute
  path of the superdataset, appended with this relative path (cost 900).

Additional candidate URLs can be generated based on templates specified as
configuration variables with the pattern

  `datalad.get.subdataset-source-candidate-<name>`

where NAME is an arbitrary identifier. If `name` starts with three digits
(e.g. '400myserver') these will be interpreted as a cost, and the
respective candidate will be sorted into the generated candidate list
according to this cost. If no cost is given, a default of 700 is used.

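The cost rule for `<name>` can be illustrated with a minimal Python sketch.
The helper name `candidate_cost` is hypothetical and only mirrors the rule as
described here (three leading digits give the cost, otherwise 700); DataLad's
own parsing may differ in detail.

```python
import re

DEFAULT_COST = 700  # default stated in the help text


def candidate_cost(name):
    """Return the cost encoded in a candidate name.

    Hypothetical helper for illustration, not part of DataLad's API.
    """
    # Three leading digits are read as the cost, e.g. '400myserver' -> 400.
    match = re.match(r"\d{3}", name)
    if match:
        return int(match.group(0))
    # No three-digit prefix: fall back to the documented default.
    return DEFAULT_COST


print(candidate_cost("400myserver"))  # 400
print(candidate_cost("myserver"))     # 700
```

Under this reading, a candidate named '400myserver' would sort ahead of all
built-in candidates (590 and up), while an unprefixed name would land between
the superdataset-remote candidates (650) and the local-path fallback (900).
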
A template string assigned to such a variable can utilize the Python format
mini language and may reference a number of properties that are inferred
from the parent dataset's knowledge about the target subdataset. Properties
include any submodule property specified in the respective `.gitmodules`
record. For convenience, an existing `datalad-id` record is made available
under the shortened name ID.

Additionally, the URL of any configured remote that contains the respective
submodule commit is available as `remoteurl-<name>` property, where NAME
is the configured remote name.

Hence, such a template could be `http://example.org/datasets/{id}` or
`http://example.org/datasets/{path}`, where `{id}` and `{path}` would be
replaced by the `datalad-id` or PATH entry in the `.gitmodules` record.

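Since templates use the Python format mini language, the substitution can be
sketched with plain `str.format`. The function name `render_candidate` and the
property values below are made up for illustration; only the two template
strings come from the text above.

```python
def render_candidate(template, props):
    """Fill a candidate URL template with subdataset properties.

    Hypothetical helper; it only demonstrates Python's str.format behavior.
    """
    return template.format(**props)


# Illustrative properties for one subdataset (values are invented).
props = {
    "id": "00000000-1111-2222-3333-444444444444",
    "path": "inputs/raw",
}

for template in (
    "http://example.org/datasets/{id}",
    "http://example.org/datasets/{path}",
):
    print(render_candidate(template, props))
```

A template that references a property missing from `props` would raise
`KeyError` in this sketch; how DataLad itself treats such a template is not
covered here.
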
If this config is committed in `.datalad/config`, a clone of a dataset can
look up any subdataset's URL according to such scheme(s) irrespective of
what URL is recorded in `.gitmodules`.

Lastly, all candidates are sorted according to their cost (lower values
first), and duplicate URLs are stripped, while preserving the first item in the
candidate list.

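The final ordering step can be sketched as a stable sort followed by
first-wins deduplication. This is a minimal illustration of the behavior as
described, not DataLad's implementation; `rank_candidates` and the example
URLs are hypothetical.

```python
def rank_candidates(candidates):
    """Sort (cost, url) pairs by cost and drop repeated URLs.

    Hypothetical helper: the first (lowest-cost) occurrence of a URL is kept.
    """
    # sorted() is stable, so equal-cost candidates keep their input order.
    ordered = sorted(candidates, key=lambda item: item[0])
    seen = set()
    unique = []
    for cost, url in ordered:
        if url not in seen:
            seen.add(url)
            unique.append((cost, url))
    return unique


candidates = [
    (600, "https://example.org/sub.git"),
    (590, "ria+http://example.org/store#some-id"),
    (650, "https://example.org/sub.git"),  # duplicate URL at higher cost
    (700, "http://example.org/datasets/some-id"),
]
for cost, url in rank_candidates(candidates):
    print(cost, url)
```
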
NOTE
  Power-user info: This command uses git annex get to fulfill
  file handles.

*Examples*

Get a single file::

   % datalad get <path/to/file>

Get contents of a directory::

   % datalad get <path/to/dir/>

Get all contents of the current dataset and its subdatasets::

   % datalad get . -r

Get (clone) a registered subdataset, but don't retrieve data::

   % datalad get -n <path/to/subds>

positional arguments:
  PATH                  path/name of the requested dataset component. The
                        component must already be known to a dataset. To add
                        new components to a dataset use the ADD command.
                        Constraints: value must be a string or value must be
                        NONE

optional arguments:
  -h, --help, --help-np
                        show this help message. --help-np forcefully disables
                        the use of a pager for displaying the help message
  -s LABEL, --source LABEL
                        label of the data source to be used to fulfill
                        requests. This can be the name of a dataset sibling or
                        another known source. Constraints: value must be a
                        string or value must be NONE
  -d PATH, --dataset PATH
                        specify the dataset to perform the get operation on,
                        in which case PATH arguments are interpreted as being
                        relative to this dataset. If no dataset is given, an
                        attempt is made to identify a dataset for each input
                        `path`. Constraints: Value must be a Dataset or a
                        valid identifier of a Dataset (e.g. a path) or value
                        must be NONE
  -r, --recursive       if set, recurse into potential subdatasets.
  -R LEVELS, --recursion-limit LEVELS
                        limit recursion into subdataset to the given number of
                        levels. Alternatively, 'existing' will limit recursion
                        to subdatasets that already existed on the filesystem
                        at the start of processing, and prevent new
                        subdatasets from being obtained recursively.
                        Constraints: value must be convertible to type 'int'
                        or value must be one of ('existing',) or value must be
                        NONE
  -n, --no-data         whether to obtain data for all file handles. If
                        disabled, GET operations are limited to dataset
                        handles. This option prevents data for file handles
                        from being obtained.
  -D DESCRIPTION, --description DESCRIPTION
                        short description to use for a dataset location. Its
                        primary purpose is to help humans to identify a
                        dataset copy (e.g., "mike's dataset on lab server").
                        Note that when a dataset is published, this
                        information becomes available on the remote side.
                        Constraints: value must be a string or value must be
                        NONE
  --reckless [auto|ephemeral|shared-...]
                        Obtain a dataset or subdataset and set it up in a
                        potentially unsafe way for performance, or access
                        reasons. Use with care, any dataset is marked as
                        'untrusted'. The reckless mode is stored in a
                        dataset's local configuration under
                        'datalad.clone.reckless', and will be inherited to any
                        of its subdatasets. Supported modes are: ['auto']:
                        hard-link files between local clones. In-place
                        modification in any clone will alter original annex
                        content. ['ephemeral']: symlink annex to origin's
                        annex and discard local availability info via git-
                        annex-dead 'here' and declare this annex private.
                        Shares an annex between origin and clone w/o git-annex
                        being aware of it. In case of a change in origin you
                        need to update the clone before you're able to save
                        new content on your end. Alternative to 'auto' when
                        hardlinks are not an option, or number of consumed
                        inodes needs to be minimized. Note that this mode can
                        only be used with clones from non-bare repositories or
                        a RIA store! Otherwise two different annex object tree
                        structures (dirhashmixed vs dirhashlower) will be used
                        simultaneously, and annex keys using the respective
                        other structure will be inaccessible.
                        ['shared-<mode>']: set up repository and annex
                        permission to enable multi-user access. This disables
                        the standard write protection of annex'ed files.
                        <mode> can be any value supported by 'git init
                        --shared=', such as 'group', or 'all'. Constraints:
                        value must be one of (True, False, 'auto',
                        'ephemeral') or value must start with 'shared-'
  -J NJOBS, --jobs NJOBS
                        how many parallel jobs (where possible) to use. "auto"
                        corresponds to the number defined by
                        'datalad.runtime.max-annex-jobs' configuration item.
                        Constraints: value must be convertible to type 'int'
                        or value must be NONE or value must be one of
                        ('auto',) [Default: 'auto']
  --version             show the module and its version which provides the
                        command