844 lines
31 KiB
ReStructuredText
844 lines
31 KiB
ReStructuredText
.. _history:
|
|
|
|
Back and forth in time
|
|
----------------------
|
|
|
|
|
|
Almost everyone inadvertently deleted or overwrote files at some point with
|
|
a hasty operation that caused data fatalities or at least troubles to
|
|
reobtain or restore data.
|
|
With DataLad, no mistakes are forever: One powerful feature of datasets
|
|
is the ability to revert data to a previous state and thus view earlier content or
|
|
correct mistakes. As long as the content was version controlled (i.e., tracked),
|
|
it is possible to look at previous states of the data, or revert changes --
|
|
even years after they happened -- thanks to the underlying version control
|
|
system :term:`Git`.
|
|
|
|
.. figure:: ../artwork/src/versioncontrol.svg
|
|
:width: 70%
|
|
|
|
To get a glimpse into how to work with the history of a dataset, today's lecture
|
|
has an external Git-expert as a guest lecturer.
|
|
"I do not have enough time to go through all the details in only
|
|
one lecture. But I'll give you the basics, and an idea of what is possible.
|
|
Always remember: Just google what you need. You will find thousands of helpful tutorials
|
|
or questions on `Stack Overflow <https://stackoverflow.com>`_ right away.
|
|
Even experts will *constantly* seek help to find out which Git command to
|
|
use, and how to use it.", he reassures with a wink.
|
|
|
|
The basis of working with the history is to *look at it* with tools such
|
|
as :term:`tig`, :term:`gitk`, or simply the :gitcmd:`log` command.
|
|
The most important information in an entry (commit) in the history is
|
|
the :term:`shasum` (or hash) associated with it.
|
|
This hash is how dataset modifications in the history are identified,
|
|
and with this hash you can communicate with DataLad or :term:`Git` about these
|
|
modifications or version states [#f1]_.
|
|
Here is an excerpt from the ``DataLad-101`` history to show a
|
|
few abbreviated hashes of the 15 most recent commits [#f2]_:
|
|
|
|
.. runrecord:: _examples/DL-101-137-101
|
|
:workdir: dl-101/DataLad-101
|
|
:language: console
|
|
|
|
$ git log -15 --oneline
|
|
|
|
|
|
"I'll let you people direct this lecture", the guest lecturer proposes.
|
|
"You tell me what you would be interested in doing, and I'll show you how it's
|
|
done. For the rest of the lecture, call me Google!"
|
|
|
|
Fixing (empty) commit messages
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
From the back of the lecture hall comes a question you are really glad
|
|
someone asked: "It has happened to me that I accidentally did a
|
|
:dlcmd:`save` and forgot to specify the commit message,
|
|
how can I fix this?".
|
|
The room nods in agreement -- apparently, others have run into this
|
|
premature slip of the ``Enter`` key as well.
|
|
|
|
Let's demonstrate a simple example. First, let's create some random files.
|
|
Do this right in your dataset.
|
|
|
|
.. runrecord:: _examples/DL-101-137-102
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ cat << EOT > Gitjoke1.txt
|
|
Git knows what you did last summer!
|
|
EOT
|
|
|
|
$ cat << EOT > Gitjoke2.txt
|
|
Knock knock. Who's there? Git.
|
|
Git-who?
|
|
Sorry, 'who' is not a git command - did you mean 'show'?
|
|
EOT
|
|
|
|
$ cat << EOT > Gitjoke3.txt
|
|
In Soviet Russia, git commits YOU!
|
|
EOT
|
|
|
|
This will generate three new files in your dataset. Run a
|
|
:dlcmd:`status` to verify this:
|
|
|
|
.. runrecord:: _examples/DL-101-137-103
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad status
|
|
|
|
And now:
|
|
|
|
.. runrecord:: _examples/DL-101-137-104
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad save
|
|
|
|
Whooops! A :dlcmd:`save` without a
|
|
commit message that saved all of the files.
|
|
|
|
.. runrecord:: _examples/DL-101-137-105
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:emphasize-lines: 6
|
|
|
|
$ git log -p -1
|
|
|
|
As expected, all of the modifications present prior to the
|
|
command are saved into the most recent commit, and the commit
|
|
message DataLad provides by default, ``[DATALAD] Recorded changes``,
|
|
is not very helpful.
|
|
|
|
Changing the commit message of the most recent commit can be done with
|
|
the command :gitcmd:`commit --amend`. Running this command will open
|
|
an editor (the default, as configured in Git), and allow you
|
|
to change the commit message. Make sure to read the :ref:`find-out-more on changing other than the most recent commit <fom-rebase1>` in case you want to improve the commit message of more commits than only the latest.
|
|
|
|
Try running the :gitcmd:`commit --amend` command right now and give
|
|
the commit a new commit message (you can just delete the one created by
|
|
DataLad in the editor)!
|
|
|
|
.. index::
|
|
pair: save --amend; DataLad command
|
|
pair: add changes to previous commit; with DataLad
|
|
pair: change the last commit message; with DataLad
|
|
.. gitusernote:: 'git commit --amend' versus 'datalad save --amend'
|
|
|
|
Similar to ``git commit``, ``datalad save`` also has an ``--amend`` option.
|
|
Like its Git equivalent, it can be used to record changes not in a new, separate commit, but integrate them with the previously saved state.
|
|
Though this has not been the use case for ``git commit --amend`` here, experienced Git users will be accustomed to using ``git commit --amend`` to achieve something similar in their Git workflows.
|
|
In contrast to ``git commit --amend``, ``datalad save --amend`` will not open up an interactive editor to potentially change a commit message (unless the configuration ``datalad.save.no-message`` is set to ``interactive``), but a new commit message can be supplied with the ``-m``/``--message`` option.
|
|
|
|
|
|
.. index::
|
|
pair: change historical commit messages; with Git
|
|
pair: rebase; Git command
|
|
pair: rewrite history; with Git
|
|
.. find-out-more:: Changing the commit messages of not-the-most-recent commits
|
|
:name: fom-rebase1
|
|
:float:
|
|
|
|
The :gitcmd:`commit --amend` command will let you
|
|
rewrite the commit message of the most recent commit. If you
|
|
however need to rewrite commit messages of older commits, you
|
|
can do so during a so-called "interactive rebase". The command
|
|
for this is
|
|
|
|
.. code-block:: console
|
|
|
|
$ git rebase -i HEAD~N
|
|
|
|
where ``N`` specifies how far back you want to rewrite commits.
|
|
``git rebase -i HEAD~3``, for example, lets you apply changes to the
|
|
any number of commit messages within the last three commits.
|
|
|
|
Be aware that an interactive rebase lets you *rewrite* history.
|
|
This can lead to confusion or worse if the history you are rewriting
|
|
is shared with others, e.g., in a collaborative project. Be also aware
|
|
that rewriting history that is *pushed*/*published* (e.g., to GitHub)
|
|
will require a force-push!
|
|
|
|
Running this command gives you a list of the N most recent commits
|
|
in your text editor (which may be :term:`vim`!), sorted with
|
|
the most recent commit on the bottom.
|
|
This is how it may look like:
|
|
|
|
.. code-block:: bash
|
|
|
|
pick 8503f26 Add note on adding siblings
|
|
pick 23f0a52 add note on configurations and git config
|
|
pick c42cba4 add note on DataLad's procedures
|
|
|
|
# Rebase b259ce8..c42cba4 onto b259ce8 (3 commands)
|
|
#
|
|
# Commands:
|
|
# p, pick <commit> = use commit
|
|
# r, reword <commit> = use commit, but edit the commit message
|
|
# e, edit <commit> = use commit, but stop for amending
|
|
# s, squash <commit> = use commit, but meld into previous commit
|
|
# f, fixup <commit> = like "squash", but discard this commit's log message
|
|
# x, exec <command> = run command (the rest of the line) using shell
|
|
# b, break = stop here (continue rebase later with 'git rebase --continue')
|
|
# d, drop <commit> = remove commit
|
|
# l, label <label> = label current HEAD with a name
|
|
|
|
An interactive rebase allows to apply various modifying actions to any
|
|
number of commits in the list. Below the list are descriptions of these
|
|
different actions. Among them is "reword", which lets you "edit the commit
|
|
message". To apply this action and reword the top-most commit message in this list
|
|
(``8503f26 Add note on adding siblings``, three commits back in the history),
|
|
exchange the word ``pick`` in the beginning of the line with the word
|
|
``reword`` or simply ``r`` like this:
|
|
|
|
.. code-block:: bash
|
|
|
|
r 8503f26 Add note on adding siblings
|
|
|
|
If you want to reword more than one commit message, exchange several
|
|
``pick``\s. Any commit with the word ``pick`` at the beginning of the line will
|
|
be kept as is. Once you are done, save and close the editor. This will
|
|
sequentially open up a new editor for each commit you want to reword. In
|
|
it, you will be able to change the commit message. Save to proceed to
|
|
the next commit message until the rebase is complete.
|
|
But be careful not to delete any lines in the above editor view --
|
|
**An interactive rebase can be dangerous, and if you remove a line, this commit will be lost!**
|
|
|
|
.. index::
|
|
pair: stop content tracking; with Git
|
|
|
|
Untracking accidentally saved contents (tracked in Git)
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The next question comes from the front:
|
|
"It happened that I forgot to give a path to the :dlcmd:`save`
|
|
command when I wanted to only start tracking a very specific file.
|
|
Other times I just didn't remember that
|
|
additional, untracked files existed in the dataset and saved unaware of
|
|
those. I know that it is good practice to only save
|
|
those changes together that belong together, so is there a way to
|
|
disentangle an accidental :dlcmd:`save` again?"
|
|
|
|
Let's say instead of saving *all three* previously untracked Git jokes
|
|
you intended to save *only one* of those files. What we
|
|
want to achieve is to keep all of the files and their contents
|
|
in the dataset, but get them out of the history into an
|
|
*untracked* state again, and save them *individually* afterwards.
|
|
|
|
.. importantnote:: Untracking is different for Git versus git-annex!
|
|
|
|
Note that this is a case with *text files* (stored in Git)! For
|
|
accidental annexing of files, please make sure to check out
|
|
the next paragraph!
|
|
|
|
This is a task for the :gitcmd:`reset` command. It essentially allows to
|
|
undo commits by resetting the history of a dataset to an earlier version.
|
|
:gitcmd:`reset` comes with several *modes* that determine the
|
|
exact behavior it, but the relevant one for this aim is ``--mixed`` [#f3]_.
|
|
Specifying the command:
|
|
|
|
.. code-block:: console
|
|
|
|
$ git reset --mixed COMMIT
|
|
|
|
will preserve all changes made to files since the specified
|
|
commit in the dataset but remove them from the dataset's history.
|
|
This means all commits *since* ``COMMIT`` (but *not including* ``COMMIT``)
|
|
will not be in your history anymore and become "untracked files" or
|
|
"unsaved changes" instead. In other words, the modifications
|
|
you made in these commits that are "undone" will still be present
|
|
in your dataset -- just not written to the history anymore. Let's
|
|
try this to get a feel for it.
|
|
|
|
The COMMIT in the command can either be a hash or a reference
|
|
with the HEAD pointer.
|
|
|
|
.. index::
|
|
pair: branch; Git concept
|
|
pair: HEAD; Git concept
|
|
.. find-out-more:: Git terminology: branches and HEADs?
|
|
|
|
A Git repository (and thus any DataLad dataset) is built up as a tree of
|
|
commits. A *branch* is a named pointer (reference) to a commit, and allows you
|
|
to isolate developments. The default branch is called ``main``. ``HEAD`` is
|
|
a pointer to the branch you are currently on, and thus to the last commit
|
|
in the given branch.
|
|
|
|
.. image:: ../artwork/src/git_branch_HEAD.png
|
|
:width: 50%
|
|
|
|
Using ``HEAD``, you can identify the most recent commit, or count backwards
|
|
starting from the most recent commit. ``HEAD~1`` is the ancestor of the most
|
|
recent commit, i.e., one commit back (``f30ab`` in the figure above). Apart from
|
|
the notation ``HEAD~N``, there is also ``HEAD^N`` used to count backwards, but
|
|
`less frequently used and of importance primarily in the case of merge
|
|
commits <https://stackoverflow.com/q/2221658/10068927>`__.
|
|
|
|
Let's stay with the hash, and reset to the commit prior to saving the Git jokes.
|
|
|
|
First, find out the shasum, and afterwards, reset it.
|
|
|
|
.. runrecord:: _examples/DL-101-137-106
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git log -n 3 --oneline
|
|
|
|
.. runrecord:: _examples/DL-101-137-107
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:realcommand: echo "$ git reset --mixed $(git rev-parse HEAD~1)" && git reset --mixed $(git rev-parse HEAD~1)
|
|
|
|
Let's see what has happened. First, let's check the history:
|
|
|
|
.. runrecord:: _examples/DL-101-137-108
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git log -n 2 --oneline
|
|
|
|
As you can see, the commit in which the jokes were tracked
|
|
is not in the history anymore! Go on to see what :dlcmd:`status`
|
|
reports:
|
|
|
|
.. runrecord:: _examples/DL-101-137-109
|
|
:workdir: dl-101/DataLad-101
|
|
:language: console
|
|
|
|
$ datalad status
|
|
|
|
Nice, the files are present, and untracked again. Do they contain
|
|
the content still? We will read all of them with :shcmd:`cat`:
|
|
|
|
.. runrecord:: _examples/DL-101-137-110
|
|
:workdir: dl-101/DataLad-101
|
|
:language: console
|
|
|
|
$ cat Gitjoke*
|
|
|
|
Great. Now we can go ahead and save only the file we intended
|
|
to track:
|
|
|
|
.. runrecord:: _examples/DL-101-137-111
|
|
:workdir: dl-101/DataLad-101
|
|
:language: console
|
|
|
|
$ datalad save -m "save my favorite Git joke" Gitjoke2.txt
|
|
|
|
Finally, let's check how the history looks afterwards:
|
|
|
|
.. runrecord:: _examples/DL-101-137-112
|
|
:workdir: dl-101/DataLad-101
|
|
:language: console
|
|
|
|
$ git log -2
|
|
|
|
Wow! You have rewritten history [#f4]_!
|
|
|
|
.. index::
|
|
pair: stop content tracking; with git-annex
|
|
|
|
Untracking accidentally saved contents (stored in git-annex)
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The previous :gitcmd:`reset` undid the tracking of *text* files.
|
|
However, those files are stored in Git, and thus their content
|
|
is also stored in Git. Files that are annexed, however, have
|
|
their content stored in git-annex, and not the file itself is stored
|
|
in the history, but a symlink pointing to the location of the file
|
|
content in the dataset's annex. This has consequences for
|
|
a :gitcmd:`reset` command: Reverting a save of a file that is
|
|
annexed would revert the save of the symlink into Git, but it will
|
|
not revert the *annexing* of the file.
|
|
Thus, what will be left in the dataset is an untracked symlink.
|
|
|
|
To undo an accidental save of that annexed a file, the annexed file
|
|
has to be "unlocked" first with a :dlcmd:`unlock` command.
|
|
|
|
We will simulate such a situation by creating a PDF file that
|
|
gets annexed with an accidental :dlcmd:`save`:
|
|
|
|
.. runrecord:: _examples/DL-101-137-113
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ # create an empty pdf file
|
|
$ convert xc:none -page Letter apdffile.pdf
|
|
$ # accidentally save it
|
|
$ datalad save
|
|
|
|
This accidental :dlcmd:`save` has thus added both text files
|
|
stored in Git, but also a PDF file to the history of the dataset.
|
|
As an :shcmd:`ls -l` reveals, the PDF file has been annexed and is
|
|
thus a :term:`symlink`:
|
|
|
|
.. runrecord:: _examples/DL-101-137-114
|
|
:language: console
|
|
:realcommand: ls -l --time-style=long-iso apdffile.pdf
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ ls -l apdffile.pdf
|
|
|
|
Prior to resetting, the PDF file has to be unannexed.
|
|
To unannex files, i.e., get the contents out of the object tree,
|
|
the :dlcmd:`unlock` command is relevant:
|
|
|
|
.. runrecord:: _examples/DL-101-137-115
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad unlock apdffile.pdf
|
|
|
|
The file is now no longer symlinked:
|
|
|
|
.. runrecord:: _examples/DL-101-137-116
|
|
:language: console
|
|
:realcommand: ls -l --time-style=long-iso apdffile.pdf
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ ls -l apdffile.pdf
|
|
|
|
Finally, :gitcmd:`reset --mixed` can be used to revert the
|
|
accidental :dlcmd:`save`. Again, find out the shasum first, and
|
|
afterwards, reset it.
|
|
|
|
.. runrecord:: _examples/DL-101-137-117
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git log -n 3 --oneline
|
|
|
|
.. runrecord:: _examples/DL-101-137-118
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:realcommand: echo "$ git reset --mixed $(git rev-parse HEAD~1)" && git reset --mixed $(git rev-parse HEAD~1)
|
|
|
|
To see what has happened, let's check the history:
|
|
|
|
.. runrecord:: _examples/DL-101-137-119
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git log -n 2 --oneline
|
|
|
|
... and also the status of the dataset:
|
|
|
|
.. runrecord:: _examples/DL-101-137-120
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad status
|
|
|
|
The accidental save has been undone, and the file is present
|
|
as untracked content again.
|
|
As before, this action has not been recorded in your history.
|
|
|
|
Viewing previous versions of files and datasets
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The next question is truly magical: How does one *see*
|
|
data as it was at a previous state in history?
|
|
|
|
This magic trick can be performed with the :gitcmd:`checkout`.
|
|
It is a very heavily used command for various tasks, but among
|
|
many it can send you back in time to view the state of a dataset
|
|
at the time of a specific commit.
|
|
|
|
Let's say you want to find out which notes you took in the first
|
|
few chapters of the handbook. Find a commit :term:`shasum` in your history
|
|
to specify the point in time you want to go back to:
|
|
|
|
.. runrecord:: _examples/DL-101-137-121
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git log -n 16 --oneline
|
|
|
|
Let's go 15 commits back in time:
|
|
|
|
.. runrecord:: _examples/DL-101-137-122
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:realcommand: echo "$ git checkout $(git rev-parse HEAD~15)" && git checkout $(git rev-parse HEAD~15)
|
|
|
|
How did your ``notes.txt`` file look at this point?
|
|
|
|
.. runrecord:: _examples/DL-101-137-123
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ tail notes.txt
|
|
|
|
Neat, isn't it? By checking out a commit shasum you can explore a previous
|
|
state of a datasets history. And this does not only apply to simple text
|
|
files, but every type of file in your dataset, regardless of size.
|
|
The checkout command however led to something that Git calls a "detached HEAD state".
|
|
While this sounds scary, a :gitcmd:`checkout main` will bring you
|
|
back into the most recent version of your dataset and get you out of the
|
|
"detached HEAD state":
|
|
|
|
.. runrecord:: _examples/DL-101-137-124
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git checkout main
|
|
|
|
|
|
Note one very important thing: The previously untracked files are still
|
|
there.
|
|
|
|
.. runrecord:: _examples/DL-101-137-125
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad status
|
|
|
|
The contents of ``notes.txt`` will now be the most recent version again:
|
|
|
|
.. runrecord:: _examples/DL-101-137-126
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ tail notes.txt
|
|
|
|
... Wow! You traveled back and forth in time!
|
|
But an even more magical way to see the contents of files in previous
|
|
versions is Git's :shcmd:`cat-file` command: Among many other things, it lets
|
|
you read a file's contents as of any point in time in the history, without a
|
|
prior :gitcmd:`checkout` (note that the output is shortened for brevity and shows only the last few lines of the file):
|
|
|
|
.. runrecord:: _examples/DL-101-137-127
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:lines: 2, 48-
|
|
:realcommand: echo "$ git cat-file --textconv $(git rev-parse HEAD~15):notes.txt" && git cat-file --textconv $(git rev-parse HEAD~15):notes.txt
|
|
|
|
.. index::
|
|
pair: cat-file; Git command
|
|
|
|
The cat-file command is very versatile, and
|
|
`it's documentation <https://git-scm.com/docs/git-cat-file>`_ will list all
|
|
of its functionality. To use it to see the contents of a file at a previous
|
|
state as done above, this is how the general structure looks like:
|
|
|
|
.. code-block:: console
|
|
|
|
$ git cat-file --textconv SHASUM:<path/to/file>
|
|
|
|
Undoing latest modifications of files
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Previously, we saw how to remove files from a datasets history that
|
|
were accidentally saved and thus tracked for the first time.
|
|
How does one undo a *modification* to a tracked file?
|
|
|
|
Let's modify the saved ``Gitjoke1.txt``:
|
|
|
|
.. runrecord:: _examples/DL-101-137-128
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ echo "this is by far my favorite joke!" >> Gitjoke2.txt
|
|
|
|
.. runrecord:: _examples/DL-101-137-129
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ cat Gitjoke2.txt
|
|
|
|
.. runrecord:: _examples/DL-101-137-130
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad status
|
|
|
|
.. runrecord:: _examples/DL-101-137-131
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad save -m "add joke evaluation to joke" Gitjoke2.txt
|
|
|
|
How could this modification to ``Gitjoke2.txt`` be undone?
|
|
With the :gitcmd:`reset` command again. If you want to
|
|
"unsave" the modification but keep it in the file, use
|
|
:gitcmd:`reset --mixed` as before. However, if you want to
|
|
get rid of the modifications entirely, use the option ``--hard``
|
|
instead of ``--mixed``:
|
|
|
|
.. runrecord:: _examples/DL-101-137-132
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git log -n 2 --oneline
|
|
|
|
.. runrecord:: _examples/DL-101-137-133
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:realcommand: echo "$ git reset --hard $(git rev-parse HEAD~1)" && git reset --hard $(git rev-parse HEAD~1)
|
|
|
|
.. runrecord:: _examples/DL-101-137-134
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ cat Gitjoke2.txt
|
|
|
|
The change has been undone completely. This method will work with
|
|
files stored in Git and annexed files.
|
|
|
|
Note that this operation only restores this one file, because the commit that
|
|
was undone only contained modifications to this one file. This is a
|
|
demonstration of one of the reasons why one should strive for commits to
|
|
represent meaningful logical units of change -- if necessary, they can be
|
|
undone easily.
|
|
|
|
Undoing past modifications of files
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
What :gitcmd:`reset` did was to undo commits from
|
|
the most recent version of your dataset. How
|
|
would one undo a change that happened a while ago, though,
|
|
with important changes being added afterwards that you want
|
|
to keep?
|
|
|
|
Let's save a bad modification to ``Gitjoke2.txt``,
|
|
but also a modification to ``notes.txt``:
|
|
|
|
.. runrecord:: _examples/DL-101-137-140
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ echo "bad modification" >> Gitjoke2.txt
|
|
|
|
.. runrecord:: _examples/DL-101-137-141
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad save -m "did a bad modification" Gitjoke2.txt
|
|
|
|
.. runrecord:: _examples/DL-101-137-142
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ cat << EOT >> notes.txt
|
|
|
|
Git has many handy tools to go back in forth in time and work with the
|
|
history of datasets. Among many other things you can rewrite commit
|
|
messages, undo changes, or look at previous versions of datasets.
|
|
A superb resource to find out more about this and practice such Git
|
|
operations is this chapter in the Pro-git book:
|
|
https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History
|
|
EOT
|
|
|
|
.. runrecord:: _examples/DL-101-137-143
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad save -m "add note on helpful git resource" notes.txt
|
|
|
|
The objective is to remove the first, "bad" modification, but
|
|
keep the more recent modification of ``notes.txt``. A :gitcmd:`reset`
|
|
command is not convenient, because resetting would need to reset
|
|
the most recent, "good" modification as well.
|
|
|
|
One way to accomplish it is with an *interactive rebase*, using the
|
|
:gitcmd:`rebase -i` command [#f5]_. Experienced Git-users will know
|
|
under which situations and how to perform such an interactive rebase.
|
|
|
|
However, outlining an interactive rebase here in the handbook could lead to
|
|
problems for readers without (much) Git experience: An interactive rebase,
|
|
even if performed successfully, can lead to many problems if it is applied with
|
|
too little experience, for example, in any collaborative real-world project.
|
|
|
|
.. index::
|
|
pair: revert; Git command
|
|
|
|
Instead, we demonstrate a different, less intrusive way to revert one or more
|
|
changes at any point in the history of a dataset: the :gitcmd:`revert`
|
|
command.
|
|
Instead of *rewriting* the history, it will add an additional commit in which
|
|
the changes of an unwanted commit are reverted.
|
|
|
|
The command looks like this:
|
|
|
|
.. code-block:: console
|
|
|
|
$ git revert SHASUM
|
|
|
|
where ``SHASUM`` specifies the commit hash of the modification that should
|
|
be reverted.
|
|
|
|
.. index::
|
|
pair: revert multiple commit; with Git
|
|
.. find-out-more:: Reverting more than a single commit
|
|
|
|
You can also specify a range of commits like this:
|
|
|
|
.. code-block:: console
|
|
|
|
$ git revert OLDER_SHASUM..NEWERSHASUM
|
|
|
|
This command will revert all commits starting with the one after
|
|
``OLDER_SHASUM`` (i.e. **not including** this commit) until and **including**
|
|
the one specified with ``NEWERSHASUM``.
|
|
For each reverted commit, one new commit will be added to the history that
|
|
reverts it. Thus, if you revert a range of three commits, there will be three
|
|
reversal commits. If you however want the reversal of a range of commits
|
|
saved in a single commit, supply the ``--no-commit`` option as in
|
|
|
|
.. code-block:: console
|
|
|
|
$ git revert --no-commit OLDER_SHASUM..NEWERSHASUM
|
|
|
|
After running this command, run a single ``git commit`` to conclude the
|
|
reversal and save it in a single commit.
|
|
|
|
Let's see how it looks like:
|
|
|
|
.. runrecord:: _examples/DL-101-137-144
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:realcommand: echo "$ git revert $(git rev-parse HEAD~1)" && git revert $(git rev-parse HEAD~1)
|
|
|
|
This is the state of the file in which we reverted a modification:
|
|
|
|
.. runrecord:: _examples/DL-101-137-145
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ cat Gitjoke2.txt
|
|
|
|
It does not contain the bad modification anymore. And this is what happened in
|
|
the history of the dataset:
|
|
|
|
.. runrecord:: _examples/DL-101-137-146
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
:emphasize-lines: 6-8, 20
|
|
|
|
$ git log -n 3
|
|
|
|
The commit that introduced the bad modification is still present, but it
|
|
transparently gets undone with the most recent commit. At the same time, the
|
|
good modification of ``notes.txt`` was not influenced in any way. The
|
|
:gitcmd:`revert` command is thus a transparent and safe way of undoing past
|
|
changes. Note though that this command can only be used efficiently if the
|
|
commits in your datasets history are meaningful, independent units -- having
|
|
several unrelated modifications in a single commit may make an easy solution
|
|
with :gitcmd:`revert` impossible and instead require a complex
|
|
:shcmd:`checkout`, :shcmd:`revert`, or :shcmd:`rebase` operation.
|
|
|
|
Finally, let's take a look at the state of the dataset after this operation:
|
|
|
|
.. runrecord:: _examples/DL-101-137-147
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad status
|
|
|
|
As you can see, unsurprisingly, the :gitcmd:`revert` command had no
|
|
effects on anything else but the specified commit, and previously untracked
|
|
files are still present.
|
|
|
|
.. index::
|
|
pair: resolve merge conflict; with Git
|
|
|
|
Oh no! I'm in a merge conflict!
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
When working with the history of a dataset, especially when rewriting
|
|
the history with an interactive rebase or when reverting commits, it is
|
|
possible to run into so-called *merge conflicts*.
|
|
Merge conflicts happen when Git needs assistance in deciding
|
|
which changes to keep and which to apply. It will require
|
|
you to edit the file the merge conflict is happening in with
|
|
a text editor, but such merge conflict are by far not as scary as
|
|
they may seem during the first few times of solving merge conflicts.
|
|
|
|
This section is not a guide on how to solve merge-conflicts, but a broad
|
|
overview on the necessary steps, and a pointer to a more comprehensive guide.
|
|
|
|
- The first thing to do if you end up in a merge conflict is
|
|
to read the instructions Git is giving you -- they are a useful guide.
|
|
- Also, it is reassuring to remember that you can always get out of
|
|
a merge conflict by aborting the operation that led to it (e.g.,
|
|
``git rebase --abort``).
|
|
- To actually solve a merge conflict, you will have to edit files: In the
|
|
documents the merge conflict applies to, Git marks the sections it needs
|
|
help with with markers that consists of ``>``, ``<``, and ``=``
|
|
signs and commit shasums or branch names.
|
|
There will be two marked parts, and you have to delete the one you do not
|
|
want to keep, as well as all markers.
|
|
- Afterwards, run ``git add <path/to/file>`` and finally a ``git commit``.
|
|
|
|
GitHub has an `excellent resource on how to deal with merge conflicts <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line>`_.
|
|
|
|
Summary
|
|
^^^^^^^
|
|
|
|
This guest lecture has given you a glimpse into how to work with the
|
|
history of your DataLad datasets.
|
|
To conclude this section, let's remove all untracked contents from
|
|
the dataset. This can be done with :gitcmd:`clean`: The command
|
|
:gitcmd:`clean -f` swipes your dataset clean and removes any untracked
|
|
file.
|
|
**Careful! This is not revertible, and content lost with this commands cannot be recovered!**
|
|
If you want to be extra sure, run :gitcmd:`clean -fn` beforehand -- this will
|
|
give you a list of the files that would be deleted.
|
|
|
|
.. runrecord:: _examples/DL-101-137-148
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git clean -f
|
|
|
|
Afterwards, the :dlcmd:`status` returns nothing, indicating a
|
|
clean dataset state with no untracked files or modifications.
|
|
|
|
.. runrecord:: _examples/DL-101-137-149
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ datalad status
|
|
|
|
Finally, if you want, apply your new knowledge about reverting commits
|
|
to remove the ``Gitjoke2.txt`` file.
|
|
|
|
|
|
.. only:: adminmode
|
|
|
|
Add a tag at the section end.
|
|
|
|
.. runrecord:: _examples/DL-101-137-160
|
|
:language: console
|
|
:workdir: dl-101/DataLad-101
|
|
|
|
$ git branch sct_back_and_forth_in_time
|
|
|
|
|
|
.. rubric:: Footnotes
|
|
|
|
.. [#f1] For example, the :dlcmd:`rerun` command introduced in section
|
|
:ref:`run2` takes such a hash as an argument, and re-executes
|
|
the ``datalad run`` or ``datalad rerun`` :term:`run record` associated with
|
|
this hash. Likewise, the :gitcmd:`diff` command can work with commit hashes.
|
|
|
|
.. [#f2] There are other alternatives to reference commits in the history of a dataset,
|
|
for example, "counting" ancestors of the most recent commit using the notation
|
|
``HEAD~2``, ``HEAD^2`` or ``HEAD@{2}``. However, using hashes to reference
|
|
commits is a very fail-save method and saves you from accidentally miscounting.
|
|
|
|
.. [#f3] The option ``--mixed`` is the default mode for a :gitcmd:`reset`
|
|
command, omitting it (i.e., running just ``git reset``) leads to the
|
|
same behavior. It is explicitly stated in this book to make the mode
|
|
clear, though.
|
|
|
|
.. [#f4] Note though that rewriting history can be dangerous, and you should
|
|
be aware of what you are doing. For example, rewriting parts of the
|
|
dataset's history that have been published (e.g., to a GitHub repository)
|
|
already or that other people have copies of, is not advised.
|
|
|
|
.. [#f5] When in need to interactively rebase, please consult further documentation
|
|
and tutorials. It is out of the scope of this handbook to be a complete
|
|
guide on rebasing, and not all interactive rebasing operations are
|
|
complication-free. However, you can always undo mistakes that occur
|
|
during rebasing with the help of the `reflog <https://git-scm.com/docs/git-reflog>`_.
|