.. _sibling:

Networking
----------

To get the hang of the basics of sharing a dataset,
you shared your ``DataLad-101`` dataset with your
room mate on a common, local file system. Your lucky
room mate now has your notes and can thus try to catch
up to still pass the course.
Moreover, he can also integrate any further notes
or changes you make to your dataset, and stay up to date.
This is because a DataLad dataset makes updating shared
data a matter of a single :dlcmd:`update --how merge` command.

But why does this need to be a one-way street? "I want to
provide helpful information for you as well!", says your
room mate. "How could you get any insightful notes that
I make in my dataset, or maybe the results of our upcoming
mid-term project? It's a bit unfair that I can get your work,
but you cannot get mine."

.. index::
   pair: register file with URL in dataset; with DataLad

Consider, for example, that your room mate might have googled about DataLad
a bit. In the depths of the web, he might have found useful additional information, such
as a script on `dataset nesting <https://raw.githubusercontent.com/datalad/datalad.org/7e8e39b1/content/asciicast/seamless_nested_repos.sh>`_.
Because he found this very helpful in understanding dataset
nesting concepts, he decided to download it from GitHub and save it in the ``code/`` directory.

He does it using the DataLad command :dlcmd:`download-url`
that you experienced in section :ref:`createDS` already: This command will
download a file just as ``wget`` does, but it can also take a commit message
and will save the download right to the history of the dataset that you specify,
while recording its origin as provenance information.

Navigate into your dataset copy in ``mock_user/DataLad-101``,
and run the following command:

.. runrecord:: _examples/DL-101-121-101
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: Let's make changes in the copy of the original ds
   :cast: 04_collaboration

   $ # navigate into the installed copy
   $ cd ../mock_user/DataLad-101

   $ # download the shell script and save it in your code/ directory
   $ datalad download-url \
     -d . \
     -m "Include nesting demo from datalad website" \
     -O code/nested_repos.sh \
     https://raw.githubusercontent.com/datalad/datalad.org/7e8e39b1/content/asciicast/seamless_nested_repos.sh

Run a quick ``datalad status``:

.. runrecord:: _examples/DL-101-121-102
   :language: console
   :workdir: dl-101/mock_user/DataLad-101
   :notes: the download url command takes care of saving contents for you
   :cast: 04_collaboration

   $ datalad status

Nice, the :dlcmd:`download-url` command saved this download
right into the history, and :dlcmd:`status` does not report
unsaved modifications! We'll show an excerpt of the last commit
here [#f1]_:

.. runrecord:: _examples/DL-101-121-103
   :language: console
   :workdir: dl-101/mock_user/DataLad-101
   :lines: 1-13
   :notes: the ds copy has a change the original ds does not have:
   :cast: 04_collaboration

   $ git log -n 1 -p

Suddenly, your room mate has a file change that you do not have.
His dataset evolved.

So how do we link back from the copy of the dataset to its
origin, such that your room mate's changes can be included in
your dataset? How do we let the original dataset "know" about
this copy your room mate has?
Do we need to install the installed dataset of our room mate
as a copy again?

No, luckily, it's simpler and less convoluted. What we have to
do is to *register* a DataLad :term:`sibling`: A reference to our room mate's
dataset in our own, original dataset.

.. index::
   pair: sibling; DataLad concept
.. gitusernote:: Remote siblings

   Git repositories can configure clones of a dataset as *remotes* in
   order to fetch, pull, or push from and to them. A DataLad :term:`sibling`
   is the equivalent of a Git clone that is configured as a remote.

Let's see how this is done.

.. index::
   pair: siblings; DataLad command
   pair: register sibling in dataset; with DataLad

First of all, navigate back into the original dataset.
In the original dataset, "add" a "sibling" by using
the :dlcmd:`siblings` command.
The command takes the base command,
:dlcmd:`siblings`, an action, in this case ``add``, a path to the
root of the dataset ``-d .``, a name for the sibling, ``-s/--name roommate``,
and a URL or path to the sibling, ``--url ../mock_user/DataLad-101``.
This registers your room mate's ``DataLad-101`` as a "sibling" (we will call it
"roommate") to your own ``DataLad-101`` dataset.

.. runrecord:: _examples/DL-101-121-104
   :language: console
   :workdir: dl-101/mock_user/DataLad-101
   :notes: To allow updates from copy to original we have to configure the copy as a sibling of the original
   :cast: 04_collaboration

   $ cd ../../DataLad-101
   $ # add a sibling
   $ datalad siblings add -d . \
     --name roommate --url ../mock_user/DataLad-101

There are a few confusing parts about this command: For one, do not be surprised
about the ``--url`` argument -- it's called "URL" but it can be a path as well.
Also, do not forget to give a name to your dataset's sibling. Without the ``-s``/
``--name`` argument the command will fail. The reason behind this is that the default
name of a sibling if no name is given will be the host name of the specified URL,
but as you provide a path and not a URL, there is no host name to take as a default.

As you can see in the command output, the addition of a :term:`sibling` succeeded:
``roommate(+)[../mock_user/DataLad-101]`` means that your room mate's dataset
is now known to your own dataset as "roommate".

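If you already know Git, it may help to see the same idea with plain Git remotes.
The following is only a sketch under assumptions, not what DataLad runs internally:
the directory names ``original`` and ``copy`` and the demo identity are made up,
and the repositories live in a throwaway temporary directory.

```shell
#!/bin/sh
# Plain-Git sketch; "original" and "copy" are hypothetical stand-ins
# for DataLad-101 and mock_user/DataLad-101.
set -e
work=$(mktemp -d)
cd "$work"

# Identity for the demo commit (made-up values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Your repository with one commit on a branch called "main"
git init -q original
git -C original symbolic-ref HEAD refs/heads/main
echo "first note" > original/notes.txt
git -C original add notes.txt
git -C original commit -q -m "Add notes"

# The room mate's copy
git clone -q original copy

# Register the copy under the name "roommate"; a relative path serves as the URL
git -C original remote add roommate ../copy

# List all configured remotes together with their locations
remotes=$(git -C original remote -v)
echo "$remotes"
```

Just as with ``--url`` above, the location given to ``git remote add`` can be a path,
and the name you pick is what you will refer to the other repository by from now on.
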
.. index::
   pair: list dataset siblings; with DataLad
.. runrecord:: _examples/DL-101-121-105
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: we can check which siblings the dataset has
   :cast: 04_collaboration

   $ datalad siblings

This command will list all known siblings of the dataset. You can see your
room mate's dataset in the resulting list under the name "roommate" that you gave it.

.. index::
   pair: remove dataset sibling; with DataLad
.. find-out-more:: What if I mistyped the name or want to remove the sibling?

   You can remove a sibling using :dlcmd:`siblings remove -s roommate`.

The fact that the ``DataLad-101`` dataset now has a sibling means that we
can also :dlcmd:`update` this repository. Awesome!

Your room mate previously ran a :dlcmd:`update --how merge` in the section
:ref:`update`. This got him
changes *he knew you made* into a dataset that *he so far did not change*.
This meant that nothing unexpected would happen with the
:dlcmd:`update --how merge`.

But consider the current case: Your room mate made changes to his
dataset, but you do not necessarily know which. You also made
changes to your dataset in the meantime, and added a note on
:dlcmd:`update`.
How would you know that his changes and
your changes are not in conflict with each other?

This scenario is where a plain :dlcmd:`update` becomes useful.
If you run a plain :dlcmd:`update` (which uses the default option ``--how fetch``), DataLad will query the sibling
for changes, and store those changes in a safe place in your own
dataset, *but it will not yet integrate them into your dataset*.
This gives you a chance to see whether you actually want to have the
changes your room mate made.

.. index::
   pair: update dataset from particular sibling; with DataLad

Let's see how it's done. First, run a plain :dlcmd:`update` without
the ``--how merge`` option.

.. runrecord:: _examples/DL-101-121-106
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: now we can update. Problem: how do we know whether we want the changes? --> plain datalad update
   :cast: 04_collaboration

   $ datalad update -s roommate

Note that we supplied the sibling's name with the ``-s``/``--sibling`` option.
This is good practice, and allows you to be precise in where you want to get
updates from. It would have worked without the specification (just as a bare
:dlcmd:`update --how merge` worked for your room mate), because there is only
one other known location, though.

This plain :dlcmd:`update` "fetched" updates from
the dataset. The changes, however, are not yet visible -- the script that
he added is not yet in your ``code/`` directory:

.. runrecord:: _examples/DL-101-121-107
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: no file changes there yet, but where are they?
   :cast: 04_collaboration

   $ ls code/

So where is the file? It is in a different *branch* of your dataset.

If you do not use :term:`Git`, the concept of a :term:`branch` can be a big
source of confusion. There will be sections later in this book that will
elaborate a bit more on what branches are, and how to work with them, but
for now envision branches as a bunch of drawers on your desk.
The paperwork that you have in front of you right on your desk is your
dataset as you currently see it.
The drawers, in contrast, hold documents that you are in principle working on,
just not now -- maybe different versions of paperwork you currently have in
front of you, or maybe other files than the ones currently in front of you
on your desk.

Imagine that a :dlcmd:`update` created a small drawer, placed all of
the changed or added files from the sibling inside, and put it on your
desk. You can now take a look into that drawer to see whether you want
to have the changes right in front of you.

The drawer is a branch, and its name derives from the sibling it was fetched from:
here it is ``remotes/roommate/main`` (for a sibling named ``origin`` it would be ``remotes/origin/main``).
To look inside of it you can :gitcmd:`checkout BRANCHNAME`, or you can
do a ``diff`` between the branch (your drawer) and the dataset as it
is currently in front of you (your desk). We will do the latter, and leave
the former for a different lecture:

.. index::
   pair: corresponding branch; in adjusted mode
   pair: show dataset modification for particular path; on Windows with DataLad
   pair: diff; DataLad command
.. windows-wit:: Please use 'datalad diff --from main --to remotes/roommate/main'

   .. include:: topic/adjustedmode-diff-remote.rst

.. runrecord:: _examples/DL-101-121-108
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: on a different branch: remotes/roommate/main. Do a git remote -v here
   :cast: 04_collaboration

   $ datalad diff --to remotes/roommate/main

This shows us that there is an additional file, and it also shows us
that there is a difference in ``notes.txt``! Let's ask
:gitcmd:`diff` to show us the differences in detail (note that it is a shortened excerpt, cut in the middle to reduce its length):

.. index::
   pair: corresponding branch; in adjusted mode
   pair: show dataset modification; on Windows with Git
   pair: diff; DataLad command
.. windows-wit:: Please use 'git diff main..remotes/roommate/main'

   .. include:: topic/adjustedmode-gitdiff-remote.rst

.. runrecord:: _examples/DL-101-121-109
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: also git diff
   :lines: 1-18, 67-78
   :cast: 04_collaboration

   $ git diff remotes/roommate/main

Let's digress into what is shown here.
We are comparing the current state of your dataset against
the current state of your room mate's dataset. Everything marked with
a ``-`` is a change that your room mate has, but not you: This is the
script that he downloaded!

Everything that is marked with a ``+`` is a change that you have,
but not your room mate: It is the additional note on :dlcmd:`update`
you made in your own dataset in the previous section.

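The fetch-then-inspect pattern can be tried out risk-free with plain Git in a scratch directory.
This is a minimal sketch under assumptions: the repositories ``mine`` and ``theirs``,
the file contents, and the demo identity are all made up, and plain Git stands in for the
DataLad commands used in this section.

```shell
#!/bin/sh
# Plain-Git sketch; "mine" and "theirs" are hypothetical stand-ins
# for your dataset and your room mate's copy.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Shared starting point, then a copy for the room mate
git init -q mine
git -C mine symbolic-ref HEAD refs/heads/main
echo "shared note" > mine/notes.txt
git -C mine add notes.txt
git -C mine commit -q -m "Add notes"
git clone -q mine theirs

# Diverge: the room mate adds a script, you extend your notes
mkdir theirs/code
echo "echo demo" > theirs/code/nested_repos.sh
git -C theirs add code
git -C theirs commit -q -m "Add script"
echo "note on update" >> mine/notes.txt
git -C mine commit -q -a -m "Add note on update"

# Fetch only: the changes land in remotes/roommate/main, not in your files
git -C mine remote add roommate ../theirs
git -C mine fetch -q roommate
ls mine

# Compare the room mate's branch (old side) with your files (new side)
summary=$(git -C mine diff --name-status remotes/roommate/main)
echo "$summary"
```

In the summary, ``D`` marks a file that exists only on the room mate's side (the lines a full diff
prefixes with ``-``), and ``M`` marks a file whose content differs between the two sides.
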
Cool! So now that you know what the changes are that your room mate
made, you can safely :dlcmd:`update --how merge` them to integrate
them into your dataset. In technical terms you will
"*merge the branch remotes/roommate/main into main*".
But the details of this will be stated in a standalone section later.

Note that the fact that your room mate does not have the note
on :dlcmd:`update` does not influence your note. It will not
get deleted by the merge. You do not set your dataset to the state
of your room mate's dataset, but you incorporate all changes he made
-- which is only the addition of the script.

.. runrecord:: _examples/DL-101-121-110
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: now we can safely merge
   :cast: 04_collaboration

   $ datalad update --how merge -s roommate

The exciting question is whether your room mate's change is now
also part of your own dataset. Let's list the contents of the ``code/``
directory and also peek into the history:

.. runrecord:: _examples/DL-101-121-111
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: check for the updated files... they are there!
   :cast: 04_collaboration

   $ ls code/

.. runrecord:: _examples/DL-101-121-112
   :language: console
   :lines: 1-6
   :emphasize-lines: 2, 4
   :workdir: dl-101/DataLad-101
   :notes: and here is the summary in the log
   :cast: 04_collaboration

   $ git log --oneline

Woohoo! Here it is: The script now also exists in your own dataset.
You can see the commit that your room mate made when he saved the script,
and you can also see a commit that records how you ``merged`` your
room mate's dataset changes into your own dataset. The commit message of this
latter commit for now might contain many words yet unknown to you if you
do not use Git, but a later section will get into the details of what
":term:`merge`", ":term:`branch`", "refs",
or ":term:`main`" mean.

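If you want to convince yourself that a merge keeps both sides' changes, you can replay the
scenario with plain Git in a scratch directory. Again, this is only a sketch under assumptions:
``mine``, ``theirs``, the file contents, and the demo identity are made up, and
``git fetch`` followed by ``git merge`` stands in for :dlcmd:`update --how merge`.

```shell
#!/bin/sh
# Plain-Git sketch of the merge step; "mine" and "theirs" are hypothetical
# stand-ins for your dataset and your room mate's copy.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Shared starting point, then a copy for the room mate
git init -q mine
git -C mine symbolic-ref HEAD refs/heads/main
echo "shared note" > mine/notes.txt
git -C mine add notes.txt
git -C mine commit -q -m "Add notes"
git clone -q mine theirs

# Diverge: the room mate adds a script, you extend your notes
mkdir theirs/code
echo "echo demo" > theirs/code/nested_repos.sh
git -C theirs add code
git -C theirs commit -q -m "Add script"
echo "note on update" >> mine/notes.txt
git -C mine commit -q -a -m "Add note on update"

# Fetch, then merge the room mate's branch into main
git -C mine remote add roommate ../theirs
git -C mine fetch -q roommate
git -C mine merge -q --no-edit roommate/main

# Both changes are present, and the history gained a merge commit
ls mine/code
tail -n 1 mine/notes.txt
git -C mine log --oneline
```

The log of ``mine`` should list four commits afterwards: the shared one, one from each side,
and the merge commit that ties the two histories together.
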
For now, you are happy to have the changes your room mate made available.
This is how it should be! You helped him, and he helps you. Awesome!
There actually is a wonderful word for it: *Collaboration*.
Thus, without noticing, you have successfully collaborated for the first
time using DataLad datasets.

Create a note about this, and save it.

.. runrecord:: _examples/DL-101-121-113
   :language: console
   :workdir: dl-101/DataLad-101
   :notes: write a note
   :cast: 04_collaboration

   $ cat << EOT >> notes.txt
   To update from a dataset with a shared history, you need to add this
   dataset as a sibling to your dataset. "Adding a sibling" means
   providing DataLad with info about the location of a dataset, and a
   name for it.
   Afterwards, a "datalad update --how merge -s name" will integrate the
   changes made to the sibling into the dataset. A safe step in between
   is to do a "datalad update -s name" and checkout the changes with
   "git/datalad diff" to remotes/origin/main.

   EOT
   $ datalad save -m "Add note on adding siblings"

.. rubric:: Footnotes

.. [#f1] As this example, simplistically, created a "pretend" room mate by only changing directories, not user accounts, the recorded Git identity of your "room mate" will, of course, be the same as yours.

.. only:: adminmode

   Add a tag at the section end.

   .. runrecord:: _examples/DL-101-121-114
      :language: console
      :workdir: dl-101/DataLad-101

      $ git branch sct_networking