187 lines
10 KiB
ReStructuredText
187 lines
10 KiB
ReStructuredText
.. index::
|
|
pair: push; DataLad command
|
|
.. _push:
|
|
|
|
The datalad push command
|
|
------------------------
|
|
|
|
Previous sections on publishing DataLad datasets have each
|
|
shown you crucial aspects of the functions of dataset publishing with
|
|
:dlcmd:`push`. This section wraps them all together.
|
|
|
|
The general overview
|
|
^^^^^^^^^^^^^^^^^^^^
|
|
|
|
:dlcmd:`push` is the command to turn to when you want to publish datasets.
|
|
It is capable of publishing all dataset content, i.e., files stored in :term:`Git`,
|
|
and files stored with :term:`git-annex`, to a known dataset :term:`sibling`.
|
|
|
|
.. index::
|
|
pair: push; DataLad concept
|
|
.. gitusernote:: Push internals
|
|
|
|
The :dlcmd:`push` uses ``git push``, and ``git annex copy`` under
|
|
the hood. Publication targets need to either be configured remote Git repositories,
|
|
or git-annex special remotes (if they support data upload).
|
|
|
|
In order to publish a dataset, the dataset needs to have a sibling to push to.
|
|
This, for instance, can be a :term:`GitHub`, :term:`GitLab`, or :term:`GIN`
|
|
repository, but it can also be a Remote Indexed Archive (RIA) store for backup
|
|
or storage of datasets [#f1]_, or a regular clone.
|
|
|
|
.. index::
|
|
pair: create-sibling-github; DataLad command
|
|
pair: create-sibling-gitlab; DataLad command
|
|
pair: create-sibling-ria; DataLad command
|
|
pair: GitHub; dataset hosting
|
|
pair: GitLab; dataset hosting
|
|
pair: RIA; dataset hosting
|
|
pair: create sibling; with DataLad
|
|
.. find-out-more:: all of the ways to configure siblings
|
|
|
|
- Add an existing repository as a sibling with the :dlcmd:`siblings`
|
|
command. Here are common examples:
|
|
|
|
.. code-block:: console
|
|
|
|
$ # to a remote repository
|
|
$ datalad siblings add --name github-repo --url <url.to.github>
|
|
$ # to a local path
|
|
$ datalad siblings add --name local-sibling --url /path/to/sibling/ds
|
|
$ # to a clone on an SSH-accessible machine
|
|
$ datalad siblings add --name server-sibling --url [user@]hostname:/path/to/sibling/ds
|
|
|
|
- Create a sibling on an external hosting service from scratch, right from
|
|
within your repository:
|
|
This can be done with the commands :dlcmd:`create-sibling-github` (for GitHub)
|
|
or :dlcmd:`create-siblings-gitlab` (for GitLab), or
|
|
:dlcmd:`create-sibling-ria` (for a remote indexed archive dataset store).
|
|
Note that :dlcmd:`create-sibling-ria` can add an existing store as a sibling
|
|
or create a new one from scratch.
|
|
|
|
- Create a sibling on a local or SSH accessible Unix machine with
|
|
:dlcmd:`create-sibling`.
|
|
|
|
|
|
In order to publish dataset content, DataLad needs to know to which sibling
|
|
content shall be pushed. This can be specified with the ``--to`` option directly
|
|
from the command line:
|
|
|
|
.. code-block:: console
|
|
|
|
$ datalad push --to <sibling>
|
|
|
|
If you have more than one :term:`branch` in your dataset, note that a
|
|
:dlcmd:`push` command will by default update only the current branch.
|
|
If updating multiple branches is relevant for your workflow, please check out
|
|
the :ref:`find-out-more about this <fom-push-branch>`.
|
|
|
|
By default, :dlcmd:`push` will make the last saved state of the dataset
|
|
available. Consequently, if the sibling is in the same state as the dataset,
|
|
no push is attempted.
|
|
Additionally, :dlcmd:`push` will attempt to automatically decide what type
|
|
of dataset contents are going to be published. With a sibling that has a
|
|
:term:`special remote` configured as a :term:`publication dependency`,
|
|
or a sibling that contains an annex (such as a GIN repository or a
|
|
:term:`Remote Indexed Archive (RIA) store`), both the contents
|
|
stored in Git (i.e., a dataset's history) as well as file contents stored in
|
|
git-annex will be published unless dataset configurations overrule this.
|
|
Alternatively, one can enforce particular operations or push a subset of dataset
|
|
contents. For one, when specifying a path in the :dlcmd:`push` command,
|
|
only data or changes for those paths are considered for a push.
|
|
Additionally, one can select a particular mode of operation with the ``-data`` option.
|
|
Several different modes are possible:
|
|
|
|
- ``nothing``: With this option, annexed contents are not published. This
|
|
means that the sibling will have information on the annexed files' names, but
|
|
file contents will not be available, and thus ``datalad get`` calls in the
|
|
sibling would fail.
|
|
- ``anything``: Transfer all annexed contents.
|
|
- ``auto``: With this option, the decision which data is transferred is based on configurations that can determine rules on a per-file and per-sibling level.
|
|
On a technical level, the ``git annex copy`` call to publish file contents is called with its ``--auto`` option.
|
|
With this option, only data that satisfies specific git-annex configurations gets transferred.
|
|
Those configurations could be ``numcopies`` settings (the number of copies available at different remotes), or ``wanted`` settings (preferred contents for a specific remote), and need to be created by a user [#f2]_ with git-annex commands. If you have files you want to keep private, or do not need published, these configurations are very useful.
|
|
- ``auto-if-wanted`` (Default): Unless a ``wanted`` or ``numcopies`` configuration exists in the dataset, all content are published. Should a ``wanted`` or ``numcopies`` configuration exist, the command enables ``--auto`` in the underlying ``git annex copy`` call.
|
|
|
|
Beyond different modes of transferring data, the ``-f/--force`` option allows to force specific publishing operations with three different modes.
|
|
Be careful when using it, as its modes possibly overrule safety protections or optimizations:
|
|
|
|
- ``checkdatapresent``: With this option, the underlying ``git annex copy`` call to
|
|
publish file contents is invoked without a ``--fast`` option. Usually, the
|
|
``--fast`` option increases the speed of the operation, as it disables a check
|
|
whether the sibling already has content. This however, might skip copying content
|
|
in some cases. Therefore, ``--force datatransfer`` is a slower, but more fail-safe
|
|
option to publish annexed file contents.
|
|
- ``gitpush``: This option triggers a ``git push --force``. Be very careful using
|
|
this option! If the changes on the dataset conflict with the changes that exist
|
|
in the sibling, the changes in the sibling will be overwritten.
|
|
- ``all``: The final mode, ``all``, combines all force modes -- thus attempting to really get your dataset contents published by any means.
|
|
|
|
|
|
:dlcmd:`push` can publish available subdatasets recursively if the
|
|
``-r/--recursive`` flag is specified. Note that this requires that all subdatasets
|
|
that should be published have sibling names identical to the sibling specified in
|
|
the top-level :dlcmd:`push` command, or that appropriate default publication
|
|
targets are configured throughout the dataset hierarchy.
|
|
|
|
.. index::
|
|
pair: configure which branches to push; with Git
|
|
.. find-out-more:: Pushing more than the current branch
|
|
:name: fom-push-branch
|
|
:float:
|
|
|
|
If you have more than one :term:`branch` in your
|
|
dataset, a :dlcmd:`push --to <sibling>` will by default only push
|
|
the current :term:`branch`, *unless* you provide configurations that alter
|
|
this default. Here are two ways in which this can be achieved:
|
|
|
|
**Option 1:** Setting the ``push.default`` configuration variable from
|
|
``simple`` (the default) to ``matching`` will configure the dataset such that
|
|
:dlcmd:`push` pushes *all* branches to the sibling.
|
|
A concrete example: On a dataset level, this can be done using
|
|
|
|
.. code-block:: console
|
|
|
|
$ git config --local push.default matching
|
|
|
|
**Option 2:**
|
|
`Tweaking the default push refspec <https://git-scm.com/book/en/v2/Git-Internals-The-Refspec>`_ for the dataset allows to
|
|
select a range of branches that should be pushed. The link above gives a
|
|
thorough introduction into the refspec. For a hands-on example, consider how it is done for
|
|
`the published DataLad-101 dataset <https://github.com/datalad-handbook/DataLad-101>`_:
|
|
|
|
The published version of the handbook is known to the local handbook dataset
|
|
as a :term:`remote` called ``public``, and each section of the book is identified
|
|
with a custom branch name that corresponds to the section name. Whenever an
|
|
update to the public dataset is pushed, apart from pushing only the ``main``
|
|
branch, all branches starting with the section identifier ``sct`` are pushed
|
|
automatically as well. This configuration was achieved by specifying these branches
|
|
(using :term:`globbing` with ``*``) in the ``push`` specification of this :term:`remote`:
|
|
|
|
.. code-block:: console
|
|
|
|
$ git config --local remote.public.push 'refs/heads/sct*'
|
|
|
|
Pushing errors
|
|
^^^^^^^^^^^^^^
|
|
|
|
If you are unfamiliar with Git, please be aware that cloning a dataset to a different place and subsequently pushing to it can lead to Git error messages if changes are pushed to a currently checked out :term:`branch` of the sibling (in technical Git terms: When pushing to a checked-out branch of a non-bare repository remote).
|
|
As an example, consider what happens if we attempt a :dlcmd:`push` to the sibling ``roommate`` that we created in the chapter :ref:`chapter_collaboration`:
|
|
|
|
.. runrecord:: _examples/DL-101-141-101
|
|
:language: console
|
|
:exitcode: 1
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad push --to roommate
|
|
|
|
Publishing fails with the error message ``[remote rejected] (branch is currently checked out)``.
|
|
This can be prevented with `configuration settings <https://github.blog/2015-02-06-git-2-3-has-been-released>`_ in Git versions 2.3 or higher, or by pushing to a branch of the sibling that is currently not checked-out.
|
|
For more information on this, and other error messages during push, please checkout the section :ref:`help`.
|
|
|
|
|
|
.. rubric:: Footnotes
|
|
|
|
.. [#f1] RIA siblings are file system based, scalable storage solutions for
|
|
DataLad datasets. You can find out more about them in the online-handbook.
|
|
.. [#f2] For information on the ``numcopies`` and ``wanted`` settings of git-annex see its documentation at `git-annex.branchable.com/git-annex-wanted/ <https://git-annex.branchable.com/git-annex-wanted>`_ and `git-annex.branchable.com/git-annex-numcopies/ <https://git-annex.branchable.com/git-annex-numcopies>`_.
|