datalad-handbook/docs/beyond_basics/101-145-hooks.rst

.. index:: ! 2-003
   pair: result hooks; DataLad concept
.. _2-003:
.. _hooks:

DataLad's result hooks
^^^^^^^^^^^^^^^^^^^^^^

If you are particularly keen on automating tasks in your datasets, you may be
interested in running DataLad commands automatically as soon
as previous commands are executed and resulted in particular outcomes or states.
For example, you may want to automatically :dlcmd:`unlock` all dataset contents
right after an installation in one go. However, you'd also want to make sure that
the :dlcmd:`install` command was *successful* before attempting an
:dlcmd:`unlock`. Therefore, you would like to automatically
run the :dlcmd:`unlock .` command right after the :dlcmd:`install`
command, *but only* if the previous :dlcmd:`install` command was successful.

Such automation allows for flexible and yet automatic responses to the results
of DataLad commands, and can be done with DataLad's *result hooks*.
Generally speaking, `hooks <https://en.wikipedia.org/wiki/Hooking>`__ intercept
function calls or events and allow to extend the functionality of a program.
DataLad's result hooks are calls to other DataLad commands after the command
resulted in a specified result -- such as a successful install.

To understand how hooks can be used and defined, we have to briefly mention
DataLad's *command result evaluations*. Whenever a DataLad
command is executed, an internal evaluation generates a *report* on the status
and result of the command. To get a glimpse into such an evaluation, you can call
any DataLad command with the ``datalad`` option
``-f/--output-format <default, json, json_pp, tailored, '<template>'>`` to
return the command result evaluations with a specific formatting. Here is how this
can look like for a :dlcmd:`create`::

   $ datalad -f json_pp create somedataset
    [INFO   ] Creating a new annex repo at /tmp/somedataset
    {
      "action": "create",
      "path": "/tmp/somedataset",
      "refds": null,
      "status": "ok",
      "type": "dataset"
    }

Internally, this is useful for final result
rendering, error detection, and logging. However, by using hooks, you can
utilize these evaluations for your own purposes and "hook" in more commands
whenever an evaluation fulfills your criteria.

To be able to specify matching criteria, you need to be aware of the potential
criteria you can match against. The evaluation report is a dictionary with
``key:value`` pairs. :numref:`table-result-keyvalues` provides an overview on
some of the available keys and their possible values.

.. tabularcolumns:: \Y{.33}\Y{.66}
.. list-table:: Common result keys and their values. This is only a selection of
    available key-value pairs. The actual set of possible key-value pairs is
    potentially unlimited, as any third-party extension could introduce new keys,
    for example. If in doubt, use the ``-f/--output-format`` option with the
    command of your choice to explore how your matching criteria may look like.
   :name: table-result-keyvalues
   :widths: 50 100
   :header-rows: 1

   * - Key name
     - Values
   * - ``action``
     - ``get``, ``install``, ``drop``, ``status``, ... (any command's name)
   * - ``type``
     - ``file``, ``dataset``, ``symlink``, ``directory``
   * - ``status``
     - ``ok``, ``notneeded``, ``impossible``, ``error``
   * - ``path``
     - The path the previous command operated on

These key-value pairs provide the basis to define matching rules that -- once met --
can trigger the execution of custom hooks.
To define a hook based on certain command results, two configuration variables
need to be set:

.. index::
   single: configuration item; datalad.result-hook.<name>.match-json
   single: configuration item; datalad.result-hook.<name>.call-json
.. code-block:: bash

   datalad.result-hook.<name>.match-json

and

.. code-block:: bash

   datalad.result-hook.<name>.call-json

Here is what you need to know about these variables:

- The ``<name>`` part of the configurations is the same for both variables, and can be
  an arbitrarily [#f2]_ chosen name that serves as an identifier for the hook you are
  defining.

- The first configuration variable, ``datalad.result-hook.<name>.match-json``, defines
  the requirements that a result evaluation needs to match in order to trigger the hook.

- The second configuration variable, ``datalad.result-hook.<name>.call-json``, defines
  what the hook execution comprises. It can be any DataLad command of your choice.

And here is how to set the values for these variables:

- When set via the :gitcmd:`config` command, the value for
  ``datalad.result-hook.<name>.match-json`` needs to be specified as
  a JSON-encoded dictionary with any number of keys, such as

  .. code-block:: bash

     {"type": "file", "action": "get", "status": "notneeded"}

  This translates to: "Match a "not-needed" after :dlcmd:`get` of a file."
  If all specified values in the keys in this dictionary match the values of the
  same keys in the result evaluation, the hook is executed. Apart from ``==``
  evaluations, ``in``, ``not in``, and ``!=`` are supported. To make use of such
  operations, the test value needs to be wrapped into a list, with the first item
  being the operation, and the second value the test value, such as

  .. code-block:: bash

     {"type": ["in", ["file", "directory"]], "action": "get", "status": "notneeded"}

  This translates to:  "Match a "not-needed" after :dlcmd:`get` of a file or directory."
  Another example is

  .. code-block:: bash

     {"type":"dataset","action":"install","status":["eq", "ok"]}

  which translates to: "Match a successful installation of a dataset".

- The value for ``datalad.result-hook.<name>.call-json`` is specified in its
  Python notation, and its options -- when set via the :gitcmd:`config`
  command -- are specified as a JSON-encoded dictionary
  with keyword arguments. Conveniently, a number of string substitutions are
  supported: a ``dsarg`` argument expands to the ``dataset`` given to the initial
  command the hook operates on, and any key from the result evaluation can be
  expanded to the respective value in the result dictionary. Curly braces need to
  be escaped by doubling them.
  This is not the easiest specification there is, but it's also not as hard as it
  may sound. Here is how this could look like for a :dlcmd:`unlock`::

     $ unlock {{"dataset": "{dsarg}", "path": "{path}"}}

  This translates to "unlock the path the previous command operated on, in the
  dataset the previous command operated on". Another example is this run command::

     $ run  {{"cmd": "cp ~/Templates/standard-readme.txt {path}/README", "dataset": "{dsarg}", "explicit": true}}

  This translate to "execute a run command in the dataset the previous command operated
  on. In this run command, copy a README template file from ``~/Templates/standard-readme.txt``
  and place it into the newly created dataset." A final example is this::

     $ run_procedure {{"dataset":"{path}","spec":"cfg_metadatatypes bids"}}

  This hook will run the procedure ``cfg_metadatatypes`` with the argument ``bids``
  and thus set the standard metadata extractor to be bids.


As these variables are configuration variables, they can be set via
:gitcmd:`config` -- either for the dataset (``--local``), or the
user (``--global``) [#f3]_::

    $ git config --global --add datalad.result-hook.readme.call-json 'run {{"cmd":"cp ~/Templates/standard-readme.txt {path}/README", "outputs":["{path}/README"], "dataset":"{path}","explicit":true}}'
    $ git config --global --add datalad.result-hook.readme.match-json '{"type": "dataset","action":"create","status":"ok"}'

Here is what this writes to the ``~/.gitconfig`` file::

    [datalad "result-hook.readme"]
        call-json = run {{\"cmd\":\"cp ~/Templates/standard-readme.txt {path}/README\", \"outputs\":[\"{path}/READ>
        match-json = {\"type\": \"dataset\",\"action\":\"create\",\"status\":\"ok\"}

Note how characters such as quotation marks are automatically escaped via
backslashes. If you want to set the variables "by hand" with an editor instead
of using :gitcmd:`config`, pay close attention to escape them as well.

Given this configuration in the global ``~/.gitconfig`` file, the
"``readme``" hook would be executed whenever you successfully create a new dataset
with :dlcmd:`create`. The "``readme``" hook would then automatically copy a
file, ``~/Templates/standard-readme.txt`` (this could be a standard README template
you defined), into the new dataset.


.. rubric:: Footnotes

.. [#f2] It only needs to be compatible with :gitcmd:`config`. This means that
         it, for example, should not contain any dots (``.``).

.. [#f3] To re-read about the :gitcmd:`config` command and other configurations
         of DataLad and its underlying tools, go back to the chapter on Configurations,
         starting with :ref:`config`.
         **Note that hooks are only read from Git's config files, not .datalad/config!**
         Else, this would pose a severe security risk, as it would allow installed datasets to
         alter DataLad commands to perform arbitrary executions on a system.