Auto-versioning Research Notes
==============================

[Note from sussman: if you don't understand rfc 2518 (webdav) and rfc
 3253 (deltav) intimately, you'll probably not understand these notes.
 Read the rfcs, and also read the 'webdav-general-summary' notes in
 this directory as a quick review.]


Phase 1: a lone PUT results in an immediate commit. This can be done
         purely via libsvn_fs, using an auto-generated log message.
         This covers the "drag-n-drop" use-case -- when a user simply
         drops a file into a mounted repository.

Phase 2: come up with a system for dealing with the more common
         class-2 DAV sequence: LOCK, GET, PUT, PUT, PUT, UNLOCK.
         This covers most DAV clients, such as MSOffice and OpenOffice.

On first glance, it seems that Phase 1 should be doable by simply
noticing a PUT on a public URI, and triggering a commit. But
apparently this completely circumvents the fact that mod_dav *already*
has a notion of auto-versioning, and we want to mesh with that. This
feature was added by the Rational guys, but isn't well-reviewed by
gstein. Apparently mod_dav defines a concept of whether resources are
auto-versionable, and then deals with the checkout/modify/checkin of
those resources. So *first* we need to understand the existing system
before we can do anything else, and figure out how mod_dav_svn can act
as a "provider" to that framework. (Greg also warns: this
autoversioning feature added by Rational was done based on an OLD
version of the deltaV RFC, so watch out for mismatches with the final
RFC 3253.)

   [gstein sez: Note: the reason for the auto-versioning framework is
    to take the load off of the provider for modeling WebDAV's
    auto-vsn concepts to clients. mod_dav itself can deal with the
    property management, sequence of operations, error responses,
    whatnot. That said, it is also open to change and refinement --
    there is no way that it is set in stone. That only happens once
    an Open Source implementation has used it.]

Phase 2 is more complicated:

 * Greg proposed a system whereby the LOCK creates a txn, the PUTs
   only write to the txn (the txn name is the lock "token"), and the
   UNLOCK commits the txn. The problem with this is that DAV clients
   expect real locking here, and this is just a "fake out":

     - If client #1 LOCKS a file, then when client #2 does a GET,
       they should see the latest version that client #1 has PUT, not
       some older version.

         [gstein sez he doesn't believe that the GET sans locktoken
          has to reflect the latest PUT-with-locktoken. I disagree.
          See below for a response from the DeltaV IETF Working Group]

     - Also, if client #2 tries to work on the file, its LOCK request
       should be denied if it's already locked. Users will be mighty
       pissed if they get a LOCK on the file, but when they finally
       close MSWord, they get an out-of-date error!

         [gstein sez this is only if we take an exclusive lock. shared
          locks are more interesting. I say, yah, but so what. We
          only care about write-locks anyway, which according to 2518,
          are always exclusive, I think. shared-locks are just
          read-locks, and can be done with unversioned props.]

 * It seems that the Right Way to do this is to actually design and
   implement some kind of locking system. We've had a huuuuge
   discussion on the dev list about this, and folks like jimb and
   kfogel want the system to be more of a "communication" system,
   rather than a system for unconditionally handcuffing naughty users.
   This goal doesn't necessarily contradict the needs of DAV clients,
   however. Smart svn clients should be able to easily override a
   LOCK failure, perhaps by using some special 'Force: true' request
   header. Dumb DAV clients won't know about this technique, so they
   effectively end up with the 'handcuff' locking system they expect.

     [brane sez: Exclusive and shared locks can both be used for
      communication, and which one you use depends on context -- see
      below.]

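To make the "fake out" concrete, here is a minimal sketch of the
LOCK-creates-a-txn proposal as a toy in-memory model. Everything in it
(ToyRepo, OutOfDate, the "txn-N" token names, the per-path out-of-date
check) is hypothetical and invented for illustration; it is not the
libsvn_fs or mod_dav_svn API. It only shows why a second LOCK is not
refused and why a GET without the token sees HEAD.

```python
class OutOfDate(Exception):
    """A txn touched a path that changed after the txn was created."""


class ToyRepo:
    """Hypothetical in-memory model; not the libsvn_fs API."""

    def __init__(self):
        self.revision = 0
        self.head = {}      # path -> contents at HEAD
        self.changed = {}   # path -> revision that last changed it
        self.txns = {}      # lock token (txn name) -> (base_rev, changes)
        self._counter = 0

    def lock(self, path):
        # LOCK just creates a txn; a second LOCK on the same path is
        # not refused, which is the "fake out".
        self._counter += 1
        token = "txn-%d" % self._counter
        self.txns[token] = (self.revision, {})
        return token

    def put(self, path, data, token):
        # PUT writes only into the txn named by the lock token.
        self.txns[token][1][path] = data

    def get(self, path, token=None):
        # Without the token, a GET can only see HEAD.
        if token is not None and path in self.txns[token][1]:
            return self.txns[token][1][path]
        return self.head.get(path)

    def unlock(self, token):
        # UNLOCK commits the txn, unless HEAD has moved underneath it.
        base_rev, changes = self.txns.pop(token)
        for path in changes:
            if self.changed.get(path, 0) > base_rev:
                raise OutOfDate(path)
        self.revision += 1
        self.head.update(changes)
        for path in changes:
            self.changed[path] = self.revision
        return self.revision
```

In this model, two clients can both lock("/doc"); the first unlock()
commits, and the second raises OutOfDate, which is the MSWord
complaint described above.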
----------------------------------------------------------------

I sent a mail off to the deltaV working group, asking about the
locking issue. Geoff Clemm came back and said, "yah, if a lock-holder
does a PUT to a locked resource, then the changes should be
immediately visible to *all* users who do a GET, whether they hold the
lock token or not."

This is my (sussman's) intuition too, but it throws a big wrench into
gstein's proposal about how to do Phase 2.

   [brane sez: Not really. All you have to do is maintain a list of
    the public URLs of objects that were actually modified through a
    "locked" PUT -- *not* the bubble-up dirs -- and you have to
    maintain that anyway, if you want to implement exclusive locks.
    A GET will just check that list first, and if it finds the URL,
    look into the associated txn instead of HEAD.]

   [ gstein: note that list is cross-txn; we probably want a new dbm
     in the REPOS/dav/ subdir. map the repos path (derived from the
     URL) to the txn-name containing the most recent copy. my hope
     was to avoid additional state like this, and encode that state
     in something like the locktoken. ]

----------------------------------------------------------------

Here are some thoughts Bill Tutt and I shared on IRC some time ago.
They're more about locking than auto-versioning, but the two concepts
are related, so this brain dump might as well go in here.

<<>>

-----
From: "Bill Tutt"
To: "Branko Cibej"
Subject: Locks Discussion
Date: Wed, 4 Sep 2002 15:49:54 -0700

Edited from IRC:

"svn edit" has other uses, too
e.g., you could check out a wc that has only checksums, not text
bases, and makes wc files read-only.
"svn edit" would make them writable, and temporarily store the text
base.
it doesn't have to create a lock.
"svn edit" can be completely client-side.
It could, but ideally it would just work as if it were connected.
i.e. executing "svn note" if connected, and not if not.
i.e. laptop on bus mode.
basically, your non-exclusive lock would add an unversioned annotation
to an object.
ok. so we have "svn lock", which is an exclusive lock
and "svn edit", which may or may not create locks
At a minimum annotates the file in the WC, for the "svn commit"
default log message case below. At the far out end, it would create
an exclusive lock if the file (via the pluggable diff protocol) was
determined to be non-mergeable.
and "svn note", which just adds a note to the object
and "svn lock" can also add a note to the object
and "svn unlock" takes the note away
and "svn rmnote" takes the note away, too
and "svn commit" clears locks and removes notes
and "svn commit" uses the note (if any, keyed off the username) as the
default log message
"svn note" and "svn rmnote" always contact the server
"svn revert" now becomes "svn revert" + "svn rmnote" all rolled into
one.
"svn rmnote" undoes (as appropriate) any annotation on a WC entry. If
created via "svn note" functionality, then the server is contacted.
If via "svn edit" disconnected client functionality, then the server
is NOT contacted.

I've edited out my original comments, and inserted my own post log
comments.

Bill
----
Do you want a dangerous fugitive staying in your flat?
No.
Well, don't upset him and he'll be a nice fugitive staying in your
flat.
-----

-----------------------------------------------

PHASE 1 STRATEGY:

* ? options response includes autoversioning feature... required?

* all resources gain new live property: 'DAV:auto-version'. This
  property will always be set to 'DAV:checkout-checkin'. (There are
  four possible values, and this is the one that has nothing
  whatsoever to do with locking.)

* use-case 1: PUT or PROPPATCH against existing VCR, or a PUT of a
  new VCR.

* use-case 2: DELETE of VCR

* use-case 3: MKCOL (totally new, by definition)

-----------------------------------------------------------

Analysis of dav_svn_put()
=========================

At the moment, ra_dav is only attempting to PUT WR's.
mod_dav, however, already has an autoversioning infrastructure, and it
currently attempts to bookend the stream-writing with an auto-checkout
and auto-checkin. But mod_dav_svn doesn't support those operations
yet, so they're just no-ops.

By supporting auto_checkout and auto_checkin, we're adding the magic
ability for a PUT on a VCR to happen: the VCR is magically transformed
'in place' into a WR, and then back again.

auto_checkout:

  * tries to checkout parent resource if deemed necessary, i.e. the
    resource doesn't exist, or if explicit parent checkout was
    requested by caller:

     - vsn_hooks->auto_versionable()

          We should *always* return DAV_AUTO_VERSION_ALWAYS for now.
          The other values require that locks exist or not, and we're
          not supporting any kind of locks yet.

     - vsn_hooks->checkout(parent, 1 /*auto-checkout*/...)

          So we need to allow an auto-checkout of a parent VCR. See
          checkout() discussion below.

  * if the resource doesn't exist, then create the resource:

     - vsn_hooks->vsn_control(resource, NULL).

          We need to implement this from scratch. For now, we only
          allow a NULL target, which means, 'create an empty file'.
          The resource itself must be tweaked in-place into a true
          VCR.

  * if the resource exists but isn't a WR, check it out:

     - vsn_hooks->checkout(resource, 1 /*auto-checkout*/...)

          This routine currently takes a VR and an activity, and
          returns a totally new WR. Here's what we need to make
          happen if we get 'auto-checkout' flag passed in:

            - verify we have a VCR, and get the VCR's VR.
            - create a new activity (txn)
            - checkout the VR into the activity, creating a WR.
            - don't return the WR via pointer, but instead tweak the
              VCR to look like the WR (think about how to do this.)

              [ gstein: the docco for checkout() states you're
                allowed to tweak the passed-in resource; that is why
                it is non-const ]

dav_svn_put() then attempts to push data into the WR's stream, no prob.

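The auto_checkout sequence above can be sketched as a toy model. This
is only an illustration of the order of hook calls as these notes
describe them; Resource, ToyProvider and the Python auto_checkout()
are hypothetical stand-ins, not mod_dav's C structures, and the "txn-N"
names are made up. The one point it tries to show is the in-place
tweak: checkout() mutates the resource it is handed from a VCR into a
WR rather than returning a new object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a dav_resource."""
    path: str
    exists: bool = True
    kind: str = "VCR"            # "VCR" or "WR"
    txn: Optional[str] = None    # activity (txn) backing a WR


class ToyProvider:
    """Hypothetical provider that records which hooks were called."""

    def __init__(self):
        self.calls = []
        self._counter = 0

    def auto_versionable(self, resource):
        # Per the notes: always answer ALWAYS while locks are unsupported.
        return "DAV_AUTO_VERSION_ALWAYS"

    def vsn_control(self, resource, target=None):
        # Only a NULL target is allowed: 'create an empty file'.
        if target is not None:
            raise NotImplementedError("only a NULL target is supported")
        resource.exists = True
        resource.kind = "VCR"
        self.calls.append(("vsn_control", resource.path))

    def checkout(self, resource, auto_checkout):
        # Verify we have a VCR, create a new activity (txn), and tweak
        # the passed-in resource in place so it looks like the WR.
        if resource.kind != "VCR":
            raise ValueError("can only check out a VCR")
        self._counter += 1
        resource.txn = "txn-%d" % self._counter
        resource.kind = "WR"
        self.calls.append(("checkout", resource.path))


def auto_checkout(provider, resource, parent, parent_only=False):
    # Check out the parent if the resource is new or the caller asked.
    if not resource.exists or parent_only:
        always = provider.auto_versionable(parent) == "DAV_AUTO_VERSION_ALWAYS"
        if always and parent.kind != "WR":
            provider.checkout(parent, auto_checkout=True)
    if parent_only:
        return
    # Create the resource if it doesn't exist yet.
    if not resource.exists:
        provider.vsn_control(resource, None)
    # Check out the resource itself unless it is already a WR.
    if resource.kind != "WR":
        provider.checkout(resource, auto_checkout=True)
```

For a PUT of a new file, the recorded calls come out as checkout of the
parent, vsn_control of the resource, then checkout of the resource; for
a PUT on an existing VCR only the resource itself is checked out.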
auto_checkin:

  * if something went wrong when PUTting data into the resource's
    stream, then this function attempts to either

     - vsn_hooks->uncheckout()  [if a resource or parent was checked out]

          I guess we would abort the svn txn and magically change the
          WR back into the VCR? (think about how to do this.)

          [ gstein: the dav_resource is non-const; just change it. we
            aren't talking a stateful change, just altering a runtime
            structure. ]

     - vsn_hooks->remove_resource()  [if a new resource was created]

          No prob. This just calls svn_fs_delete_tree() on the newly
          created object.

  * otherwise, in normal case, if resource was checked out:

     - vsn_hooks->checkin(resource)

          Need to write this routine! It would commit the txn hidden
          within the WR, using an auto-generated log message.
          Furthermore, it needs to possibly return the new VR that
          was created, and convert the WR resource back into a VCR
          that points to the new VR.

          (Do our VCR's point to VR's right now?
           [ gstein: VCRs never "point"; semantically, they just get
             updated with properties and content to match a VR. ]
           just implicitly through the checked-in property, right?)

  * then, if parent was checked out too,

     - vsn_hooks->checkin(parent)

          Oops, this is a problem. it's very likely that we just
          committed the txn in the previous call to checkin(). the
          best strategy here, I suppose, is to not throw an error...
          i.e. if the txn no longer exists, just do nothing.
          (cmpilato isn't sure what happens if you try to open_txn()
          on a txn that is already committed.)

          [ gstein: mod_dav should auto-checkin a set of resources
            rather than one at a time. the provider can then do it
            atomically, or one at a time, as they see fit ]

          [ gstein: note that we're more than likely going to need to
            update the mod_dav provider APIs.
            I think the answer is to add a binary API version to the
            new ap_provider() interface, to publish a mod_dav
            provider (binary) API version, and to state that the old
            provider registration function now throws an error (by
            definition, modules using it would be obsolete). as we
            rev the API, we just bump the published mod_dav API
            version. one problem here is that the current httpd
            release strategy might get in our way; I need to review
            some of the recent decisions to see how that affects us
            from an ongoing "httpd needs some fixes for svn"
            standpoint. ]

-----------------------------------------------------------

Late 2004 Notes:

We're working on a real locking system now. Eventually, we'll be able
to use this feature to complete autoversioning ("phase 2" above.)

  - remember that we'll need to be able to look up a lock in the
    lock-table by UUID. Generic DAV clients use UUID URIs to talk
    about locks.

  - MSWord locks a document with a timeout of 180 seconds, then
    continuously re-LOCKs every so often, passing the existing
    lock-token back in an If: header. mod_dav_fs returns the same
    lock-token UUID (presumably with a newer expiration time). Our
    current implementation doesn't allow for mutable lock tokens. We
    need to make sure that this doesn't mess up MSWord... that it's
    using the *last* token to renew locks, not the first one.
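
The renewal concern in the last item can be sketched with a toy lock
table keyed by token URI. This is one possible reading of "doesn't
allow for mutable lock tokens", assumed here for illustration: a
re-LOCK cannot update the existing lock, so it replaces it and hands
back a *new* token. LockTable and its methods are hypothetical names,
not the Subversion lock API; times are passed in explicitly (in
seconds) so the example does not depend on the clock.

```python
import uuid


class LockTable:
    """Hypothetical lock table; lookups are by token URI."""

    def __init__(self):
        self._by_token = {}   # token URI -> (path, expires_at)

    def lock(self, path, now, timeout=180):
        # Generic DAV clients talk about locks via UUID-based URIs.
        token = "opaquelocktoken:" + str(uuid.uuid4())
        self._by_token[token] = (path, now + timeout)
        return token

    def lookup(self, token, now):
        # Return (path, expires_at), or None if unknown or expired.
        entry = self._by_token.get(token)
        if entry is None or entry[1] <= now:
            return None
        return entry

    def refresh(self, token, now, timeout=180):
        # Assumed behaviour: locks are immutable, so a renewal drops
        # the old lock and issues a fresh token for the same path.
        entry = self.lookup(token, now)
        if entry is None:
            raise KeyError(token)
        del self._by_token[token]
        return self.lock(entry[0], now, timeout)
```

Under that assumption, a client that keeps renewing with its *first*
token gets a KeyError on the second renewal, while one that always
passes back the most recent token keeps its lock alive. mod_dav_fs, as
noted above, sidesteps this by returning the same token.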