Directory Versioning

"The three cardinal virtues of a master technologist are: laziness, impatience, and hubris." -- Larry Wall

This section describes some of the theoretical pitfalls around the (possibly arrogant) notion that one can simply version directories just as one versions files.

Directory Revisions

To begin, recall that the Subversion repository is an array of trees. Each tree represents the application of a new atomic commit, and is called a revision. This is very different from a CVS repository, which stores file histories in a collection of RCS files (and doesn't track tree structure).

So when we refer to revision 4 of foo.c (written foo.c:4) in CVS, this means the fourth distinct version of foo.c, but in Subversion it means the version of foo.c in the fourth revision (tree). It's quite possible that foo.c has never changed at all since revision 1! In other words, in Subversion, different revision numbers of the same versioned item do not imply different contents.

Nevertheless, the content of foo.c:4 is still well-defined. The file foo.c in revision 4 has specific text and properties.

Suppose, now, that we extend this concept to directories. If we have a directory DIR, define DIR:N to be the directory DIR in the Nth revision. The contents are defined to be a particular set of directory entries (dirents) and properties.

So far, so good. The concept of versioning directories seems fine in the repository; the repository is very theoretically pure anyway. However, because working copies allow mixed revisions, it's easy to create problematic use cases.

The Lagging Directory

The Problem

This is the first part of the Greg Hudson problem, so named because he was the first one to bring it up and define it well. :-)

Suppose our working copy has directory DIR:1 containing file foo:1, along with some other files. We remove foo and commit.

Already, we have a problem: our working copy still claims to have DIR:1. But on the repository, revision 1 of DIR is defined to contain foo, and our working copy DIR clearly does not have it anymore. How can we truthfully say that we still have DIR:1?
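To make the DIR:N notation and the lagging-directory mismatch concrete, here is a minimal Python sketch. It assumes a toy model in which the repository is a list of trees; the names `repo`, `lookup`, `wc_dir_rev`, and `wc_dir_entries`, and the extra file baz, are hypothetical illustrations and not Subversion's actual data structures or API.

```python
# Hypothetical model, not Subversion's real storage format: the repository is
# a list of trees indexed by revision number. Each tree maps a path either to
# file text (str) or to the set of names a directory contains (frozenset).

def lookup(repo, path, rev):
    """Return what `path` looks like in revision `rev`, i.e. the meaning of path:rev."""
    return repo[rev][path]

repo = [{}]  # revision 0 is the empty tree

# Revision 1: DIR contains foo and baz.
repo.append({
    "DIR": frozenset({"foo", "baz"}),
    "DIR/foo": "foo text\n",
    "DIR/baz": "baz text\n",
})

# Revision 2: we commit the deletion of foo. baz is untouched.
repo.append({
    "DIR": frozenset({"baz"}),
    "DIR/baz": repo[1]["DIR/baz"],
})

# baz:2 is well-defined even though it does not differ from baz:1.
same_contents = lookup(repo, "DIR/baz", 2) == lookup(repo, "DIR/baz", 1)

# After the commit, the working copy still labels DIR as revision 1,
# but foo is gone locally, so the label no longer matches the repository.
wc_dir_rev = 1
wc_dir_entries = frozenset({"baz"})
claim_is_true = wc_dir_entries == lookup(repo, "DIR", wc_dir_rev)

print(same_contents)  # True
print(claim_is_true)  # False
```

The second result is the lagging directory in miniature: the working copy says "DIR:1", yet its entries match what the repository calls DIR:2.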
One answer is to force DIR to be updated when we commit foo's deletion. Assuming that our commit created revision 2, we would immediately update our working copy to DIR:2. Then the client and server would both agree that DIR:2 does not contain foo, and that DIR:2 is indeed exactly what is in the working copy.

This solution has nasty, un-user-friendly side effects, though. It's likely that other people may have committed before us, possibly adding new properties to DIR, or adding a new file bar. Now pretend our committed deletion creates revision 5 in the repository. If we instantly update our local DIR to 5, that means unexpectedly receiving a copy of bar and some new propchanges. This clearly violates a UI principle: "the client will never change your working copy until you ask it to." Committing changes to the repository is a server-write operation only; it should not modify your working data!

Another solution is to do the naive thing: after committing the deletion of foo, simply stop tracking the file in the .svn administrative directory. The client then loses all knowledge of the file.

But this doesn't work either: if we now update our working copy, the communication between client and server is incorrect. The client still believes that it has DIR:1, which is false, since a true DIR:1 contains foo. The client gives this incorrect report to the repository, and the repository decides that in order to update to revision 2, foo must be deleted. Thus the repository sends a bogus (or at least unnecessary) deletion command.

The Solution

After deleting foo and committing, the file is not totally forgotten by the .svn directory. While the file is no longer considered to be under version control, it is still secretly remembered as having been deleted.
When the user updates the working copy, the client correctly informs the server that the file is already missing from its local DIR:1; therefore the repository doesn't try to re-delete it when patching the client up to revision 2.

Note to developers

How the deleted flag works under the hood:

The svn status command won't display a deleted item, unless you make the deleted item the specific target of status.

When a deleted item's parent is updated, one of two things will happen:

  1. The repository will re-add the item, thereby overwriting the entire entry (no more deleted flag).

  2. The repository will say nothing about the item, which means that it's fully aware that your item is gone, and this is the correct state to be in. In this case, the entire entry is removed (no more deleted flag).

If a user schedules an item for addition that has the same name as a deleted entry, then the entry will have both flags simultaneously. This is perfectly fine: the commit-crawler will notice both flags and do a delete() and then an add(). This ensures that the transaction is built correctly. (Without the delete(), the add() would be on top of an already-existing item.) When the commit completes, the client rewrites the entry as normal (no more deleted flag).

The Overeager Directory

This is the second part of the Greg Hudson problem.

The Problem

Again, suppose our working copy has directory DIR:1 containing file foo:1, along with some other files.

Now, unbeknownst to us, somebody else adds a new file bar to this directory, creating revision 2 (and DIR:2).

Now we add a property to DIR and commit, which creates revision 3. Our working-copy DIR is now marked as being at revision 3.

Of course, this is false; our working copy does not have DIR:3, because the true DIR:3 on the repository contains the new file bar. Our working copy has no knowledge of bar at all.

Again, we can't follow our commit of DIR with an automatic update (and addition of bar).
As mentioned previously, commits are a one-way write operation; they must not change working copy data.

The Solution

Let's enumerate exactly those times when a directory's local revision number changes:

  When a directory is updated: If the directory is either the direct target of an update command, or is a child of an updated directory, it will be bumped (along with many other siblings and children) to a uniform revision number.

  When a directory is committed: A directory can only be considered a committed object if it has a new property change. (Otherwise, to commit a directory really implies that its modified children are being committed, and only such children will have local revisions bumped.)

In this light, it's clear that our overeager directory problem only happens in the second situation: those times when we're committing directory propchanges.

Thus the answer is simply not to allow property-commits on directories that are out-of-date. It sounds a bit restrictive, but there's no other way to keep directory revisions accurate.

Note to developers

This restriction is enforced by the filesystem merge() routine. Once merge() has established that {ancestor, source, target} are all different node-rev-ids, it examines the property-keys of ancestor and target. If they're different, it returns a conflict error.

User Impact

Really, the Subversion client seems to have two difficult (almost contradictory) goals.

First, it needs to make the user experience friendly, which generally means being a bit sloppy about deciding what a user can or cannot do. This is why it allows mixed-revision working copies, and why it tries to let users execute local tree-changing operations (delete, add, move, copy) in situations that aren't always perfectly, theoretically safe or pure.

Second, the client tries to keep the working copy correctly in sync with the repository using as little communication as possible. Of course, this is made much harder by the first goal!
So in the end, there's a tension here, and the resolutions to problems can vary. In one case (the lagging directory), the problem can be solved through a bit of clever entry tracking in the client. In the other case (the overeager directory), the only solution is to restrict some of the theoretical laxness allowed by the client.
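The two resolutions can be sketched side by side. The following Python is a simplified illustration under stated assumptions, not Subversion's implementation: `Entry`, `WorkingDir`, `deletions_to_send`, and `check_dir_propchange` are hypothetical names, and the plain revision comparison stands in for the property-key check that the text attributes to the filesystem merge() routine.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    deleted: bool = False  # hypothetical stand-in for the "deleted" flag


@dataclass
class WorkingDir:
    revision: int
    entries: dict = field(default_factory=dict)  # name -> Entry

    def commit_delete(self, name):
        """Record a committed deletion: keep the entry, flagged as deleted.
        The directory's own revision number is left alone."""
        self.entries[name] = Entry(deleted=True)

    def report(self):
        """What the client tells the server at update time (assumed shape)."""
        missing = sorted(n for n, e in self.entries.items() if e.deleted)
        return {"revision": self.revision, "missing": missing}


def deletions_to_send(old_dirents, new_dirents, report):
    """Server side: names to delete on the client, skipping any the client
    already reports as missing."""
    gone = set(old_dirents) - set(new_dirents)
    return sorted(gone - set(report["missing"]))


def check_dir_propchange(wc_dir_rev, repo_dir_rev):
    """Refuse a directory property commit when the working copy's directory
    is older than the directory's latest revision in the repository."""
    if wc_dir_rev < repo_dir_rev:
        raise ValueError("directory is out of date; update before committing")


# Lagging directory: DIR:1 holds foo and baz; we delete foo and commit (r2).
wc = WorkingDir(revision=1, entries={"foo": Entry(), "baz": Entry()})
wc.commit_delete("foo")
dir_r1, dir_r2 = {"foo", "baz"}, {"baz"}
with_flag = deletions_to_send(dir_r1, dir_r2, wc.report())
naive = deletions_to_send(dir_r1, dir_r2, {"revision": 1, "missing": []})
print(with_flag)  # []
print(naive)      # ['foo']

# Overeager directory: DIR changed in revision 2, but we still hold DIR:1.
try:
    check_dir_propchange(wc_dir_rev=1, repo_dir_rev=2)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

In the first case the extra bookkeeping lives entirely in the client and the user never sees it; in the second the user is told to update first, which is the restriction on laxness described above.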