datalad-handbook/docs/basics/101-139-gitlfs.rst
Michael Hanke 46d995ea2b Normalize code blocks
- console lexer for anything that is a console session
- some other specialized lexers when it makes sense
- always with prompt, when in a console session, or for commands that
  are meant to be executed

Closes #1013
2023-11-09 15:17:13 +01:00


.. _gitlfs:

Walk-through: Git LFS as a special remote on GitHub
---------------------------------------------------

Some repository hosting services provide for-pay support for large files, and can thus be used as special remotes as well.
GitHub and GitLab, for example, support `Git Large File Storage <https://github.com/git-lfs/git-lfs>`_ (Git LFS) for managing data files using Git.
A free GitHub subscription allows up to `1GB of free storage and up to 1GB of bandwidth monthly <https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage>`_.
As such, it might be sufficient for some use cases, and could be configured
quite easily.
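
Whether the free tier is enough depends on how much data you plan to publish. The following is a minimal sketch of a pre-flight size check, not part of DataLad or git-annex: the helper names ``dir_size_kb`` and ``fits_quota`` are hypothetical, and the quota is assumed to be 1 GiB.

```shell
#!/bin/sh
# Rough pre-flight check: does a directory fit into a storage quota?
# Assumption: the quota is 1 GiB, expressed here in KiB. Adjust QUOTA_KB
# if your plan differs or if you read "1GB" as 10^9 bytes.
QUOTA_KB=${QUOTA_KB:-1048576}

# Print the disk usage of directory $1 in KiB (empty output on error).
dir_size_kb() {
    du -sk "$1" 2>/dev/null | cut -f1
}

# Succeed if directory $1 exists and uses at most QUOTA_KB KiB.
fits_quota() {
    size=$(dir_size_kb "$1")
    [ -n "$size" ] && [ "$size" -le "$QUOTA_KB" ]
}

# Example call (hypothetical path, counts only locally present annexed content):
#   fits_quota /tmp/test-github-lfs/.git/annex/objects && echo "fits"
```

The check only measures local disk usage, so treat its result as an estimate rather than as what GitHub will bill.
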
In order to store annexed dataset contents on GitHub, we first need to create a repository on GitHub:

.. code-block:: console

   $ datalad create-sibling-github test-github-lfs --access-protocol ssh
   .: github(-) [git@github.com:yarikoptic/test-github-lfs.git (git)]
   'git@github.com:yarikoptic/test-github-lfs.git' configured as sibling 'github' for <Dataset path=/tmp/test-github-lfs>

and then initialize a :term:`special remote` of type ``git-lfs``, pointing to the same GitHub repository:

.. code-block:: console

   $ git annex initremote github-lfs type=git-lfs url=git@github.com:yarikoptic/test-github-lfs autoenable=true encryption=none embedcreds=no

If you would like to compress data in Git LFS, you need to take a detour via
encryption during :gitannexcmd:`initremote` -- this has compression as a
convenient side effect. Here is an example initialization:

.. code-block:: console

   $ git annex initremote --force github-lfs type=git-lfs url=git@github.com:yarikoptic/test-github-lfs autoenable=true encryption=shared

With this single step it becomes possible to transfer contents to GitHub:

.. code-block:: console

   $ git annex copy --to=github-lfs file.dat
   copy file.dat (to github-lfs...)
   ok
   (recording state in git...)

and the entire dataset to the same GitHub repository:

.. code-block:: console

   $ datalad push --to=github
   [INFO ] Publishing <Dataset path=/tmp/test-github-lfs> to github
   publish(ok): . (dataset) [pushed to github: ['[new branch]', '[new branch]']]

Alternatively, to make publication even easier for you, the dataset provider, you can establish a :term:`publication dependency` such that a :dlcmd:`push` performs the data transfer to ``git-lfs`` automatically:

.. code-block:: console

   $ datalad siblings configure -s github --publish-depends github-lfs
   $ # afterwards, only datalad push is needed to publish dataset contents and history
   $ datalad push --to github

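
The provider-side steps of this walk-through can be collected into one small script. This is a sketch under assumptions rather than an official recipe: ``GH_USER``, ``REPO``, ``DRY_RUN``, ``run``, and ``publish_with_lfs`` are hypothetical names introduced here, and the script only prints the commands unless ``DRY_RUN=0`` is set.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own account and repository.
GH_USER=${GH_USER:-your-user}
REPO=${REPO:-test-github-lfs}
# Default to a dry run that only prints the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run inside the dataset to publish; mirrors the commands shown above.
publish_with_lfs() {
    # 1. Create the GitHub repository and register it as sibling 'github'.
    run datalad create-sibling-github "$REPO" --access-protocol ssh
    # 2. Set up the git-lfs special remote pointing to the same repository.
    run git annex initremote github-lfs type=git-lfs \
        "url=git@github.com:$GH_USER/$REPO" \
        autoenable=true encryption=none embedcreds=no
    # 3. Make pushes to 'github' transfer annexed content to 'github-lfs' first.
    run datalad siblings configure -s github --publish-depends github-lfs
    # 4. Publish dataset history and contents in one go.
    run datalad push --to github
}

publish_with_lfs
```

Reviewing the printed commands before setting ``DRY_RUN=0`` is a cheap way to catch a wrong account or repository name before anything is created on GitHub.
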
Because the special remote is configured with ``autoenable=true``, consumers of your dataset should be able to retrieve files right after cloning it, without first running a ``siblings enable`` command like the one shown in section :ref:`dropbox`.

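
From the consumer's perspective, retrieval could then look like the following sketch. It assumes that the consumer has SSH access to GitHub, and that ``GH_USER`` and ``REPO`` (hypothetical placeholders) match the published repository; ``run`` and ``fetch_from_lfs`` are illustrative helpers, and commands are only printed unless ``DRY_RUN=0`` is set.

```shell
#!/bin/sh
# Hypothetical placeholders: the account and repository of the published dataset.
GH_USER=${GH_USER:-your-user}
REPO=${REPO:-test-github-lfs}
# Default to a dry run that only prints the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone the dataset and retrieve one annexed file from the special remote.
fetch_from_lfs() {
    run datalad clone "git@github.com:$GH_USER/$REPO.git" "$REPO"
    run cd "$REPO"
    # No 'siblings enable' step here: this relies on autoenable=true.
    run datalad get file.dat
}

fetch_from_lfs
```
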

.. index::
   pair: drop (LFS); with DataLad
.. importantnote:: No drop from LFS

   Unfortunately, it is impossible to :dlcmd:`drop` contents from Git LFS:
   `help.github.com/en/github/managing-large-files <https://docs.github.com/en/repositories/working-with-files/managing-large-files/removing-files-from-git-large-file-storage#git-lfs-objects-in-your-repository>`_