Directory Versioning
"The three cardinal virtues of a master technologist
are: laziness, impatience, and hubris." (Larry Wall)
This section describes some of the theoretical pitfalls around the
(possibly arrogant) notion that one can simply version
directories just as one versions files.
Directory Revisions
To begin, recall that the Subversion repository is an array
of trees. Each tree represents the application of a new atomic
commit, and is called a revision. This
is very different from a CVS repository, which stores file
histories in a collection of RCS files (and doesn't track
tree structure).
So when we refer to revision 4 of foo.c (written foo.c:4)
in CVS, this means the fourth distinct version of foo.c, but in
Subversion it means the version of foo.c in the fourth revision
(tree). It's quite possible that foo.c has never changed at all
since revision 1! In other words, in Subversion, different
revision numbers of the same versioned item do not imply
different contents.
Nevertheless, the content of foo.c:4
is still well-defined. The file foo.c in
revision 4 has specific text and properties.
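To make the distinction concrete, here is a minimal Python sketch of the "array of trees" view. It assumes a toy in-memory repository in which every revision is a complete snapshot (a dict from path to contents); repository, MAIN_C and contents are hypothetical names, and real Subversion storage is not organized like this.

```python
# Toy, in-memory model of the "array of trees" idea (hypothetical; this is
# not how Subversion actually stores data). repository[N] is the whole tree
# as of revision N, mapping each path to that file's contents.
MAIN_C = "int main(void) { return 0; }\n"

repository = [
    {},                                                     # r0: empty
    {"foo.c": MAIN_C},                                      # r1: add foo.c
    {"foo.c": MAIN_C, "bar.c": "/* v1 */\n"},               # r2: add bar.c
    {"foo.c": MAIN_C, "bar.c": "/* v2 */\n"},               # r3: edit bar.c
    {"foo.c": MAIN_C, "bar.c": "/* v2 */\n", "baz.c": ""},  # r4: add baz.c
]


def contents(path: str, rev: int) -> str:
    """Return PATH:REV, the contents of PATH in the tree of revision REV."""
    return repository[rev][path]


# foo.c:4 is well-defined: it is whatever foo.c holds in tree 4.
print(repr(contents("foo.c", 4)))
# Different revision numbers, same contents: no commit touched foo.c.
print(contents("foo.c", 1) == contents("foo.c", 4))  # True
```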
Suppose, now, that we extend this concept to directories.
If we have a directory DIR, define DIR:N to be the directory
DIR in the Nth revision. Its contents are defined to be a
particular set of directory entries (dirents) and properties.
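The same toy approach extends to directories. In this sketch (the class DirRevision, its describes() method and the DIR history are all hypothetical), DIR:N is just a set of entry names plus a set of properties, and a working-copy directory can truthfully claim to be DIR:N only if both match exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirRevision:
    """Hypothetical model of DIR:N: a set of dirents plus properties."""
    entries: frozenset
    props: frozenset = frozenset()  # set of (name, value) pairs

    def describes(self, wc_entries, wc_props=()):
        """True if a working-copy directory holding exactly these entries
        and properties can truthfully claim to be this revision."""
        return (self.entries == frozenset(wc_entries)
                and self.props == frozenset(wc_props))


# Toy history of one directory DIR, keyed by revision number N.
DIR = {
    1: DirRevision(frozenset({"foo", "README"})),
    2: DirRevision(frozenset({"README"})),  # foo removed in r2
    3: DirRevision(frozenset({"README"}), frozenset({("owner", "sally")})),
}

print(DIR[1].describes({"foo", "README"}))  # True
print(DIR[1].describes({"README"}))         # False: DIR:1 contains foo
```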
So far, so good. The concept of versioning directories
seems fine in the repository, which is very theoretically
pure anyway. However, because working copies allow mixed
revisions, it's easy to create problematic use-cases.
The Lagging Directory
The Problem
This is the first part of the "Greg Hudson" problem, so named
because he was the first one to bring it up and define it
well. :-)
Suppose our working copy has directory
DIR:1 containing file
foo:1, along with some other files. We
remove foo and commit.
Already, we have a problem: our working copy still claims
to have DIR:1. But in the repository, revision 1 of DIR is
defined to contain foo, and our working copy DIR clearly
does not have it anymore. How can we truthfully say that we
still have DIR:1?
One answer is to force DIR to be
updated when we commit foo's deletion.
Assuming that our commit created revision 2, we would
immediately update our working copy to
DIR:2. Then the client and server would
both agree that DIR:2 does not contain
foo, and that DIR:2 is indeed exactly
what is in the working copy.
This solution has nasty, un-user-friendly side effects,
though. Other people may well have committed before us,
possibly adding new properties to DIR, or adding a new file
bar. Now pretend our committed deletion creates revision 5
in the repository. If we instantly update our local DIR to 5,
that means unexpectedly receiving a copy of bar and some new
propchanges. This clearly violates a UI principle: "the client
will never change your working copy until you ask it to."
Committing changes to the repository is a server-write
operation only; it should not modify your working data!
Another solution is to do the naive thing: after
committing the deletion of foo, simply
stop tracking the file in the .svn
administrative directory. The client then loses all knowledge
of the file.
But this doesn't work either: if we now update our working
copy, the communication between client and server is
incorrect. The client still believes that it has DIR:1,
which is false, since a true DIR:1 contains foo. The client
gives this incorrect report to the repository, and the
repository decides that in order to update to revision 2,
foo must be deleted. Thus the repository sends a bogus (or
at least unnecessary) deletion command.
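The naive exchange can be sketched as follows, assuming a hypothetical server that trusts the client's reported revision and computes edits as set differences over entry names. The function update_commands and the toy DIR history are illustrative only, not Subversion's real reporter/editor protocol.

```python
# Hypothetical toy model of an update exchange. The server knows the entry
# names of every revision of DIR; the client reports the revision it
# believes it has, and the server sends edits from that state to the target.
DIR = {
    1: {"foo", "README"},
    2: {"README"},  # our commit deleted foo, creating revision 2
}


def update_commands(reported_rev, target_rev):
    """Edits the server would send to turn DIR:reported_rev into
    DIR:target_rev, trusting the client's report completely."""
    have, want = DIR[reported_rev], DIR[target_rev]
    return (sorted(("delete", name) for name in have - want)
            + sorted(("add", name) for name in want - have))


# Naive client: after committing, it forgot foo entirely, yet it still
# reports DIR:1 because it was never updated.
client_entries = {"README"}
commands = update_commands(reported_rev=1, target_rev=2)
print(commands)  # [('delete', 'foo')]

# That deletion names an entry the client no longer has: it is bogus.
bogus = [c for c in commands if c[0] == "delete" and c[1] not in client_entries]
print(bogus)     # [('delete', 'foo')]
```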
The Solution
After deleting foo and committing, the file is not totally
forgotten by the .svn directory. While the file is no longer
considered to be under version control, it is still
secretly remembered as having been "deleted".
When the user updates the working copy, the client
correctly informs the server that the file is already missing
from its local DIR:1; therefore the
repository doesn't try to re-delete it when patching the
client up to revision 2.
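A minimal sketch of the fix, using a toy model in which the server computes edits as set differences over entry names. The function update_commands, the entries dict and its "deleted" flag layout are hypothetical: the client's report lists the entries it already lacks, and the server leaves them out of the edits it sends.

```python
DIR = {1: {"foo", "README"}, 2: {"README"}}  # toy repository history


def update_commands(reported_rev, target_rev, missing=()):
    """Edits the server sends to move the client from DIR:reported_rev to
    DIR:target_rev, leaving out entries the client reports as missing."""
    have = DIR[reported_rev] - set(missing)
    want = DIR[target_rev]
    return (sorted(("delete", name) for name in have - want)
            + sorted(("add", name) for name in want - have))


# The client's administrative area still remembers foo, flagged "deleted",
# even though foo is no longer under version control.
entries = {"foo": {"deleted": True}, "README": {}}
missing = [name for name, flags in entries.items() if flags.get("deleted")]

print(update_commands(1, 2))           # [('delete', 'foo')]  naive report
print(update_commands(1, 2, missing))  # []  report says foo is already gone
```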
Note to developers
How the "deleted" flag works under the hood.
The svn status command won't display a deleted item unless
you make the deleted item the specific target of status.
When a deleted item's parent is updated, one of two
things will happen:
The repository will re-add the item, thereby overwriting
the entire entry (no more "deleted" flag).
The repository will say nothing about the item, which means
that it's fully aware that your item is gone, and this is
the correct state to be in. In this case, the entire entry
is removed (no more "deleted" flag).
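The two outcomes above can be sketched as a function over a hypothetical entries table (each name mapped to a dict of flags). The function update_parent and the table layout are illustrative only, and a real update would also bump the surviving entries' revisions, which this sketch omits.

```python
def update_parent(entries, readded):
    """Apply an update of the parent directory to ENTRIES (hypothetical
    model: name -> dict of flags). READDED maps the names of existing
    entries that the repository re-adds to their fresh entry data."""
    result = {}
    for name, entry in entries.items():
        if name in readded:
            # Case 1: the repository re-adds the item, overwriting the
            # whole entry, so the "deleted" flag disappears with it.
            result[name] = dict(readded[name])
        elif entry.get("deleted"):
            # Case 2: the repository says nothing, i.e. it agrees the item
            # is gone, so the entry is removed entirely.
            continue
        else:
            result[name] = entry  # ordinary entries are left alone here
    return result


entries = {"foo": {"kind": "file", "deleted": True}, "README": {"kind": "file"}}
print(update_parent(entries, {}))
# {'README': {'kind': 'file'}}
print(update_parent(entries, {"foo": {"kind": "file"}}))
# {'foo': {'kind': 'file'}, 'README': {'kind': 'file'}}
```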
If a user schedules an item for addition that has the same
name as a "deleted" entry, then the entry will have both flags
simultaneously. This is perfectly fine:
The commit-crawler will notice both flags and do a delete()
and then an add(). This ensures that the transaction is built
correctly (without the delete(), the add() would be on top of
an already-existing item).
When the commit completes, the client rewrites the entry as
normal (no more "deleted" flag).
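A sketch of the commit-crawler behavior, again over a hypothetical entries table. The names crawl_entry and finish_commit are illustrative, and the ("delete", name) and ("add", name) tuples merely stand in for the real delete() and add() editor calls.

```python
def crawl_entry(name, entry):
    """Editor calls a hypothetical commit crawler makes for one entry."""
    calls = []
    if entry.get("deleted"):
        # Remove the old item first, so the add below does not land on
        # top of an already-existing item in the transaction.
        calls.append(("delete", name))
    if entry.get("schedule") == "add":
        calls.append(("add", name))
    return calls


def finish_commit(entry):
    """Once the commit completes, rewrite the entry as a normal one."""
    return {k: v for k, v in entry.items() if k not in ("deleted", "schedule")}


entry = {"kind": "file", "deleted": True, "schedule": "add"}
print(crawl_entry("foo", entry))  # [('delete', 'foo'), ('add', 'foo')]
print(finish_commit(entry))       # {'kind': 'file'}
```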
The Overeager Directory
This is the second part of the "Greg Hudson" problem.
The Problem
Again, suppose our working copy has directory
DIR:1 containing file
foo:1, along with some other files.
Now, unbeknownst to us, somebody else adds a new file
bar to this directory, creating revision
2 (and DIR:2).
Now we add a property to DIR and
commit, which creates revision 3. Our working-copy
DIR is now marked as being at revision
3.
Of course, this is false; our working copy does not have
DIR:3, because the true DIR:3 in the repository contains the
new file bar. Our working copy has no knowledge of bar at all.
Again, we can't follow our commit of
DIR with an automatic update (and
addition of bar). As mentioned
previously, commits are a one-way write operation; they must
not change working copy data.
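The problem is easy to reproduce in a toy model. In this hypothetical sketch (repo, wc, naive_commit_propchange and claim_is_true are illustrative names), committing a propchange naively bumps the working-copy DIR to the new revision, after which its claim no longer matches what the repository holds at that revision.

```python
# Hypothetical toy model. The repository maps each revision to DIR's
# (entries, properties); the working copy records what it actually has.
repo = {
    1: (frozenset({"foo"}), {}),
    2: (frozenset({"foo", "bar"}), {}),  # someone else adds bar
}
wc = {"rev": 1, "entries": frozenset({"foo"}), "props": {}}


def naive_commit_propchange(wc, name, value):
    """Commit a property change on DIR, then (overeagerly) mark the
    working-copy DIR as being at the new revision."""
    head = max(repo)
    entries, props = repo[head]
    repo[head + 1] = (entries, {**props, name: value})  # new tree has bar
    wc["props"] = {**wc["props"], name: value}
    wc["rev"] = head + 1
    return head + 1


def claim_is_true(wc):
    """Does the working copy really hold DIR at the revision it claims?"""
    entries, props = repo[wc["rev"]]
    return wc["entries"] == entries and wc["props"] == props


naive_commit_propchange(wc, "owner", "sally")
print(wc["rev"], claim_is_true(wc))  # 3 False: DIR:3 contains bar
```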
The Solution
Let's enumerate exactly those times when a directory's
local revision number changes:
When a directory is updated:
If the directory is either the direct target of an
update command, or is a child of an updated directory,
it will be bumped (along with many other siblings and
children) to a uniform revision number.
When a directory is committed:
A directory can only be considered a "committed object" if
it has a new property change. (Otherwise, to "commit a
directory" really implies that its modified children are
being committed, and only such children will have local
revisions bumped.)
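The two cases can be sketched over a hypothetical map from working-copy path to local revision. The functions after_update and after_commit are illustrative, not part of any Subversion client API; the point is that a commit bumps a directory only when it carries a property change.

```python
def after_update(wc_revs, target, rev):
    """Update: TARGET and everything beneath it move to one uniform REV."""
    prefix = target.rstrip("/") + "/"
    return {path: rev if path == target or path.startswith(prefix) else r
            for path, r in wc_revs.items()}


def after_commit(wc_revs, committed, new_rev):
    """Commit: only committed objects are bumped to NEW_REV.

    COMMITTED maps a path to (kind, has_propchange). A file listed here is
    always a committed object; a directory is one only if it carries a
    property change. Otherwise just its modified children get bumped.
    """
    result = dict(wc_revs)
    for path, (kind, has_propchange) in committed.items():
        if kind == "file" or has_propchange:
            result[path] = new_rev
    return result


wc = {"DIR": 1, "DIR/foo": 1, "DIR/sub": 1, "DIR/sub/x": 1}
print(after_commit(wc, {"DIR": ("dir", False), "DIR/foo": ("file", False)}, 5))
# {'DIR': 1, 'DIR/foo': 5, 'DIR/sub': 1, 'DIR/sub/x': 1}
print(after_commit(wc, {"DIR": ("dir", True)}, 6))
# {'DIR': 6, 'DIR/foo': 1, 'DIR/sub': 1, 'DIR/sub/x': 1}
print(after_update(wc, "DIR/sub", 7))
# {'DIR': 1, 'DIR/foo': 1, 'DIR/sub': 7, 'DIR/sub/x': 7}
```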
In this light, it's clear that our "overeager directory"
problem only happens in the second situation: those times
when we're committing directory propchanges.
Thus the answer is simply not to allow property-commits on
directories that are out-of-date. It sounds a bit
restrictive, but there's no other way to keep directory
revisions accurate.
Note to developers
This restriction is enforced by the filesystem merge()
routine.
Once merge() has established that
{ancestor, source, target} are all different node-rev-ids,
it examines the property-keys of ancestor and target. If
they're different, it returns a
conflict error.
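A minimal sketch of that check. The DirNode class, check_dir_props and the string ids are hypothetical, and the roles of the three nodes are an assumption: here target is the directory as modified by the committing transaction, source is the directory in the youngest revision, and ancestor is the revision the transaction was based on. Under those assumptions, a propchange on a directory that someone else has changed in the meantime is rejected.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a commit cannot be merged automatically."""


@dataclass(frozen=True)
class DirNode:
    """Hypothetical stand-in for a directory node-revision."""
    node_rev_id: str
    prop_key: str  # identifies the stored property list


def check_dir_props(ancestor, source, target):
    """Early property check inside a merge() (a sketch, not libsvn_fs code).

    Assumed roles: ANCESTOR is the directory the transaction was based on,
    SOURCE is the directory in the youngest revision, TARGET is the
    directory as modified by the committing transaction.
    """
    ids = {ancestor.node_rev_id, source.node_rev_id, target.node_rev_id}
    if len(ids) == 3 and ancestor.prop_key != target.prop_key:
        raise ConflictError("property change on an out-of-date directory")


base = DirNode("dir.1", "props.1")      # DIR:1
youngest = DirNode("dir.2", "props.1")  # DIR:2: bar added, same props
txn = DirNode("dir.txn", "props.9")     # our propchange on top of DIR:1

try:
    check_dir_props(base, youngest, txn)
except ConflictError as err:
    print("conflict:", err)

# If nobody else touched DIR, the source is still the ancestor: no conflict.
check_dir_props(base, base, txn)
```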
User Impact
Really, the Subversion client seems to have two difficult,
almost contradictory, goals.
First, it needs to make the user experience friendly, which
generally means being a bit "sloppy" about deciding what a
user can or cannot do. This is why it allows mixed-revision
working copies, and why it tries to let users execute local
tree-changing operations (delete, add, move, copy) in
situations that aren't always perfectly, theoretically "safe"
or pure.
Second, the client tries to keep the working copy correctly
in sync with the repository using as little communication as
possible. Of course, this is made much harder by the first goal!
So in the end, there's a tension here, and the resolutions to
problems can vary. In one case (the "lagging directory"), the
problem can be solved through a bit of clever entry tracking in
the client. In the other case (the "overeager directory"), the
only solution is to restrict some of the theoretical laxness
allowed by the client.