375 lines
18 KiB
ReStructuredText
375 lines
18 KiB
ReStructuredText
.. _config:
|
|
|
|
DIY configurations
|
|
------------------
|
|
|
|
Back in section :ref:`text2git`, you already learned that there
|
|
are dataset configurations, and that these configurations can
|
|
be modified, for example, with the ``-c text2git`` option.
|
|
This option applies a configuration template to store text
|
|
files in :term:`Git` instead of :term:`git-annex`, and thereby
|
|
modifies the DataLad dataset's default configuration to store
|
|
every file in git-annex.
|
|
|
|
The lecture today focuses entirely on the topic of configurations,
|
|
and aims to equip everyone with the basics to configure
|
|
their general and dataset specific setup to their needs.
|
|
This is not only a handy way to tune a dataset to one's
|
|
wishes, but also helpful to understand potential differences in
|
|
command execution and file handling between two users,
|
|
computers, or datasets.
|
|
|
|
"First of all, when we talk about configurations, we have
|
|
to differentiate between different scopes of configuration,
|
|
and different tools the configuration belongs or applies to",
|
|
our lecturer starts. "In DataLad datasets, different tools can
|
|
have a configuration: :term:`Git`, :term:`git-annex`, and
|
|
DataLad itself. Because these tools are all
|
|
combined by DataLad to help you manage your data,
|
|
it is important to understand how the configuration of one
|
|
software is used by or influences a second tool, or the overall
|
|
dataset performance."
|
|
|
|
"Oh crap, one of these theoretical lectures again" mourns a
|
|
student from the row behind you. Personally, you'd also
|
|
be much more excited
|
|
about any hands-on lecture filled with commands. But the
|
|
recent lecture about :term:`git-annex` and the :term:`object-tree`
|
|
was surprisingly captivating, so you are actually looking forward to today.
|
|
"Shht! I want to hear this!", you shush him with a wink.
|
|
|
|
"We will start by looking into the very first configuration
|
|
you did, already before the course started: The *global*
|
|
Git configuration." the lecturer says.
|
|
|
|
.. index::
|
|
pair: config; Git command
|
|
|
|
At one point in time, you likely followed instructions such as
|
|
in :ref:`install` and configured your
|
|
*Git identity* with the commands:
|
|
|
|
.. code-block:: console
|
|
|
|
$ git config --global --add user.name "Elena Piscopia"
|
|
$ git config --global --add user.email elena@example.net
|
|
|
|
"What the above commands do is very simple: They search for
|
|
a specific configuration file, and set the variables specified
|
|
in the command -- in this case user name and user email address
|
|
-- to the values provided with the command." she explains.
|
|
|
|
"This general procedure, specifying a value for a configuration
|
|
variable in a configuration file, is how you can configure the
|
|
different tools to your needs. The configuration, therefore,
|
|
is really easy. Even if you are only used to ticking boxes
|
|
in the ``settings`` tab of a software tool so far, it's intuitive
|
|
to understand how a configuration file in principle works and also
|
|
how to use it. The only piece of information you will need
|
|
are the necessary files, or the command that writes to them, and
|
|
the available options for configuration, that's it. And what's
|
|
really cool is that all tools we'll be looking at -- Git, git-annex,
|
|
and DataLad -- can be configured using the :gitcmd:`config`
|
|
command [#f1]_. Therefore, once you understand the syntax of this
|
|
command, you already know half of what's relevant. The other half
|
|
is understanding what you are doing. Now then, let's learn *how*
|
|
to configure settings, but also *understand* what we are doing
|
|
with these configurations."
|
|
|
|
"This seems easy enough", you think. Let's see what types of
|
|
configurations there are.
|
|
|
|
Git config files
|
|
^^^^^^^^^^^^^^^^
|
|
|
|
The user name and email configuration
|
|
is a *user-specific* configuration (called *global*
|
|
configuration by Git), and therefore applies to your user account.
|
|
Wherever on your computer *you* run a Git, git-annex, or DataLad
|
|
command, this global configuration will
|
|
associate the name and email address you supplied in
|
|
the :gitcmd:`config` commands above with this action.
|
|
For example, whenever you
|
|
``datalad save``, the information in this file is used for the
|
|
history entry about commit author and email.
|
|
|
|
Apart from *global* Git configurations, there are also *system-wide* [#f2]_
|
|
and *repository* configurations. Each of these configurations
|
|
resides in its own file. The global configuration is stored in a file called
|
|
``.gitconfig`` in your home directory. Among
|
|
your name and email address, this file can store general
|
|
per-user configurations, such as a default editor [#f3]_, or highlighting
|
|
options.
|
|
|
|
The *repository-specific* configurations apply to each individual
|
|
repository. Their scope is more limited than the *global*
|
|
configuration (namely to a single repository), but it can overrule global
|
|
configurations: The more specific the scope of a configuration file is, the more
|
|
important it is, and the variables in the more specific configuration
|
|
will take precedence over variables in less specific configuration files.
|
|
One could, for example, have :term:`vim` configured to be the default editor
|
|
on a global scope, but could overrule this by setting the editor to ``nano``
|
|
in a given repository. For this reason, the repository-specific configuration
|
|
does not reside in a file in your home directory, but in ``.git/config``
|
|
within every Git repository (and thus DataLad dataset).
|
|
|
|
Thus, there are three different scopes of Git configuration, and each is defined
|
|
in a ``config`` file in a different location. The configurations will determine
|
|
how Git behaves. In principle, all of these files can configure
|
|
the same variables differently, but more specific scopes take precedence over broader
|
|
scopes. Conveniently, not only can DataLad and git-annex be configured with
|
|
the same command as Git, but in many cases they will also use exactly the same
|
|
files as Git for their own configurations.
|
|
|
|
.. index:: ! configuration file; .git/config
|
|
|
|
Let's find out how the repository-specific configuration file in the ``DataLad-101``
|
|
superdataset looks like:
|
|
|
|
.. runrecord:: _examples/DL-101-122-101
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ cat .git/config
|
|
|
|
This file consists of so called "sections" with the section names
|
|
in square brackets (e.g., ``core``). Occasionally, a section can have
|
|
subsections: This is indicated by subsection names in
|
|
quotation marks after the section name. For example, ``roommate`` is a subsection
|
|
of the section ``remote``.
|
|
Within each section, ``variable = value`` pairs specify configurations
|
|
for the given (sub)section.
|
|
|
|
.. index::
|
|
pair: configure editor; with Git
|
|
|
|
The first section is called ``core`` -- as the name suggests,
|
|
this configures core Git functionality. There are
|
|
`many more <https://git-scm.com/docs/git-config#Documentation/git-config.txt-corefileMode>`_
|
|
configurations than the ones in this config file, but
|
|
they are related to Git, and less related or important to the configuration of
|
|
a DataLad dataset. We will use this section to showcase the anatomy of the
|
|
:gitcmd:`config` command. If, for example, you would want to specifically
|
|
configure :term:`nano` to be the default editor in this dataset, you
|
|
can do it like this:
|
|
|
|
.. runrecord:: _examples/DL-101-122-102
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git config --local --add core.editor "nano"
|
|
|
|
The command consists of the base command :gitcmd:`config`,
|
|
a specification of the scope of the configuration with the ``--local``
|
|
flag, a ``name`` specification consisting of section and key with the
|
|
notation ``section.variable`` (here: ``core.editor``), and finally the value
|
|
specification ``"nano"``.
|
|
|
|
Let's see what has changed:
|
|
|
|
.. runrecord:: _examples/DL-101-122-103
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:emphasize-lines: 7
|
|
|
|
$ cat .git/config
|
|
|
|
With this additional line in your repository's Git configuration, ``nano`` will
|
|
be used as a default editor regardless of the configuration in your global
|
|
or system-wide configuration. Note that the flag ``--local`` applies the
|
|
configuration to your repository's ``.git/config`` file, whereas ``--global``
|
|
would apply it as a user specific configuration, and ``--system`` as a
|
|
system-wide configuration.
|
|
If you would want to change this existing line in your ``.git/config``
|
|
file, you would replace ``--add`` with ``--replace-all`` such as in:
|
|
|
|
.. code-block:: console
|
|
|
|
$ git config --local --replace-all core.editor "vim"
|
|
|
|
to configure :term:`vim` to be your default editor.
|
|
Note that while being a good toy example, it is not a common thing to
|
|
configure repository-specific editors.
|
|
|
|
This example demonstrated the structure of a :gitcmd:`config`
|
|
command. By specifying the ``name`` option with ``section.variable``
|
|
(or ``section.subsection.variable`` if there is a subsection), and
|
|
a value, one can configure Git, git-annex, and DataLad.
|
|
*Most* of these configurations will be written to a ``config`` file
|
|
of Git, depending on the scope (local, global, system-wide)
|
|
specified in the command.
|
|
|
|
.. index::
|
|
pair: unset configuration; with Git
|
|
.. find-out-more:: If things go wrong during Git config
|
|
|
|
If something goes wrong during the :gitcmd:`config` command,
|
|
for example, you end up having two keys of the same name because you
|
|
added a key instead of replacing an existing one, you can use the
|
|
``--unset`` option to remove the line. Alternatively, you can also open
|
|
the config file in an editor and remove or change sections by hand.
|
|
|
|
|
|
The only information you need, therefore, is the name of a section and
|
|
variable to configure, and the value you want to specify. But in many cases
|
|
it is also useful to find out which configurations are already set in
|
|
which way and where. For this, the :gitcmd:`config --list --show-origin`
|
|
is useful. It will display all configurations and their location:
|
|
|
|
.. code-block:: console
|
|
|
|
$ git config --list --show-origin
|
|
file:/home/bob/.gitconfig user.name=Bob McBobface
|
|
file:/home/bob/.gitconfig user.email=bob@mcbobface.com
|
|
file:.git/config annex.uuid=1f83595e-bcba-4226-aa2c-6f0153eb3c54
|
|
file:.git/config annex.backends=MD5E
|
|
file:.git/config submodule.recordings/longnow.url=https://github.com/✂
|
|
file:.git/config submodule.recordings/longnow.active=true
|
|
file:.git/config remote.roommate.url=../mock_user/onemoredir/DataLad-101
|
|
file:.git/config remote.roommate.annex-uuid=a5ae24de-1533-4b09-98b9-cd9ba6bf303c
|
|
file:.git/config submodule.longnow.url=https://github.com/✂
|
|
file:.git/config submodule.longnow.active=true
|
|
...
|
|
|
|
This example shows some configurations in the global ``.gitconfig``
|
|
file, and the configurations within ``DataLad-101/.git/config``.
|
|
The command is very handy to display all configurations at once to identify
|
|
configuration problems, find the right configuration file to make a change to,
|
|
or simply remind oneself of the existing configurations, and it is a useful
|
|
helper to keep in the back of your head.
|
|
|
|
At this point you may feel like many of these configurations or the configuration file
|
|
inside of ``DataLad-101`` do not appear to be
|
|
intuitively understandable enough to confidently apply changes to them,
|
|
or identify necessary changes. And indeed, most of the sections and variables
|
|
or values in there are irrelevant for understanding the book, your dataset,
|
|
or DataLad, and can just be left as they are. The previous section merely served
|
|
to de-mystify the :gitcmd:`config` command and the configuration files.
|
|
Nevertheless, it might be helpful to get an overview about the meaning of the
|
|
remaining sections in that file, and the :ref:`that dissects this config file further <fom_gitconfig>` can give you a glimpse of this.
|
|
|
|
.. index:: dataset configuration
|
|
.. find-out-more:: Dissecting a Git config file further
|
|
:name: fom_gitconfig
|
|
:float:
|
|
|
|
Let's walk through the Git config file of ``DataLad-101``:
|
|
As mentioned above, git-annex will use the
|
|
:term:`Git config file` for some of its configurations, such as the second section.
|
|
It lists the repository version and :term:`annex UUID` [#f4]_ (:gitannexcmd:`whereis` displays information about where the
|
|
annexed content is with these UUIDs).
|
|
|
|
You may recognize the fourth part of the configuration, the subsection
|
|
``"recordings/longnow"`` in the section ``submodule``.
|
|
Clearly, this is a reference to the ``longnow`` podcasts
|
|
we cloned as a subdataset. The name *submodule* is Git
|
|
terminology, and describes a Git repository inside of
|
|
another Git repository -- just like
|
|
the super- and subdataset principles you discovered in the
|
|
section :ref:`nesting`. When you clone a DataLad dataset
|
|
as a subdataset, it gets *registered* in this file.
|
|
For each subdataset, an individual submodule entry
|
|
will store the information about the subdataset's
|
|
``--source`` or *origin* (the "url").
|
|
Thus, every subdataset in your dataset
|
|
will be listed in this file.
|
|
If you want, go back to section :ref:`installds` to see that the
|
|
"url" is the same URL we cloned the longnow dataset from, and
|
|
go back to section :ref:`sharelocal1` to remind yourself of
|
|
how cloning a dataset with subdatasets looked and felt like.
|
|
|
|
Another interesting part is the last section, "remote".
|
|
Here we can find the :term:`sibling` "roommate" we defined
|
|
in :ref:`sibling`. The term :term:`remote` is Git-terminology and is
|
|
used to describe other repositories or DataLad datasets that the
|
|
repository knows about.
|
|
This file, therefore, is where DataLad *registered* the sibling
|
|
with :dlcmd:`siblings add`, and thanks to it you can
|
|
collaborate with your room mate.
|
|
The value to the ``url`` variable is a *path*. If at any point
|
|
either your superdataset or the remote moves on your file system,
|
|
the association between the two datasets breaks -- this can be fixed by adjusting this
|
|
path, and a demonstration of this is in section :ref:`file system`.
|
|
`fetch` contains a specification which parts of the repository are
|
|
updated -- in this case everything (all of the branches).
|
|
Lastly, the ``annex-ignore = false`` configuration allows git-annex
|
|
to query the remote when it tries to retrieve data from annexed content.
|
|
|
|
.. index::
|
|
pair: configuration; DataLad command
|
|
pair: set configuration; with DataLad
|
|
|
|
The ``datalad configuration`` command
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Although this section put a focus on the ``git config`` command, it is important to mention that there also is a :dlcmd:`configuration` command.
|
|
It is not identical to ``git config``, but while it lacks some feature of ``git config``, such as the ability to set system-wide configuration, it has additional features.
|
|
Beyond the ``local`` and ``global`` scopes, it also supports :term:`branch` specific configurations in the ``.datalad/config`` file (further discussed in the next section), setting configurations recursively through dataset hierarchies, and multi-configuration queries (such as ``datalad configuration get user.name user.email``).
|
|
By default, ``datalad configuration`` will ``dump`` (list) the effective configuration including relevant ``DATALAD_*`` :term:`environment variable`\s, and also annotate the purpose of many common configuration items.
|
|
The subcommands ``datalad configuration get`` or ``datalad configuration set`` perform queries or set configurations.
|
|
You can find out more information on this command in the command documentation.
|
|
|
|
|
|
``.git/config`` versus other (configuration) files
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
One crucial aspect distinguishes the ``.git/config`` file from many other files
|
|
in your dataset: Even though it is part of your dataset, it won't be shared together
|
|
with the dataset. The reason for this is that this file is not version
|
|
controlled, as it lies within the ``.git`` directory.
|
|
Repository-specific configurations within your ``.git/config``
|
|
file are thus not written to history. Any local configuration in ``.git/config``
|
|
applies to the dataset, but it does not *stick* to the dataset.
|
|
One can have the misconception that because the configurations were made *in*
|
|
the dataset, these configurations will also be shared together with the dataset.
|
|
``.git/config``, however, behaves just as your global or system-wide configurations.
|
|
These configurations are in effect on a system, or for a user, or for a dataset,
|
|
but are not shared.
|
|
A :dlcmd:`clone` command of someone's dataset will not get you their
|
|
editor configuration, should they have included one in their config file.
|
|
Instead, upon a :dlcmd:`clone`, a new config file will be created.
|
|
|
|
|
|
This means, however, that configurations that should "stick" to a dataset [#f5]_
|
|
need to be defined in different files -- files that are version controlled.
|
|
The next section will talk about them.
|
|
|
|
|
|
|
|
|
|
.. rubric:: Footnotes
|
|
|
|
.. [#f1] As an alternative to a ``git config`` command, you could also run configuration
|
|
templates or procedures that apply predefined configurations or in some cases even
|
|
add the information to the configuration file by hand and save it using an editor of your choice. See :ref:`procedures` for more info.
|
|
|
|
.. [#f2] The third scope of a Git configuration are the system wide configurations.
|
|
These are stored (if they exist) in ``/etc/gitconfig`` and contain settings that would
|
|
apply to every user on the computer you are using. These configurations
|
|
are not relevant for DataLad-101, and we will thus skip them. You can
|
|
read more about Git's configurations and different files
|
|
`here <https://git-scm.com/docs/git-config>`_.
|
|
|
|
.. [#f3] If your default editor is :term:`vim` and you do not like this, now can be the time
|
|
to change it! Chose either of two options:
|
|
|
|
1) Open up the file with an editor for your choice (e.g., `nano <https://www.howtogeek.com/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor>`_), and either paste the following configuration or edit it if it already exists:
|
|
|
|
.. code-block:: ini
|
|
|
|
[core]
|
|
editor = nano
|
|
|
|
2) Run the following command, but exchange ``nano`` with an editor of your choice:
|
|
|
|
.. code-block:: ini
|
|
|
|
$ git config --global --add core.editor "nano"
|
|
|
|
.. [#f4] A UUID is a universally unique identifier -- a 128-bit number
|
|
that unambiguously identifies information.
|
|
|
|
.. [#f5] Please note that not all configurations can be written to files other than ``.git/config``.
|
|
Some of the files introduced in the next section will not be queried by Git, and in principle, it is a good thing that one cannot share arbitrary configurations together with a dataset, as this could be a potential security threat.
|
|
In those cases where you need dataset clones to inherit certain non-sticky configurations, it is advised to write a custom procedure and distribute it together with the dataset.
|
|
The next two sections contain concrete usecases and tutorials.
|