                               -*- text -*-

                          TREE CONFLICT DETECTION

This file describes how we plan to detect the tree conflicts described
in use-cases.txt, for both files and directories.

Issue reference:

  http://subversion.tigris.org/issues/show_bug.cgi?id=2282

==========
USE CASE 1
==========

If 'svn update' opens an item (file or directory) that is scheduled
for deletion in the working copy, then the item is a tree conflict victim.
The update of the item (including items within it, if it is a directory)
will be skipped.

==========
USE CASE 2
==========

If 'svn update' is about to delete a locally-modified item, then the
item is a tree conflict victim.  The deletion of the item will be
skipped.

Note
----

A directory is considered to be locally modified if the directory's
own properties have been modified, or if any item in the directory has
been modified, added or deleted within the directory.  The check for
modifications continues to the "ambient" depth.
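
As an illustration only, the rule above could be sketched as follows.
The Node class, its field names, and the integer depth (number of
directory levels still to descend, None meaning infinite) are
assumptions made for this sketch, not part of the working copy library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a working-copy item."""
    is_dir: bool = False
    text_modified: bool = False    # file content changed locally
    props_modified: bool = False   # the item's own properties changed
    schedule: str = "normal"       # "normal", "add" or "delete"
    children: dict = field(default_factory=dict)  # name -> Node


def is_locally_modified(node, depth=None):
    """Return True if node counts as locally modified.

    depth is a simplified stand-in for the ambient depth: the number
    of directory levels still to descend, or None for no limit.
    """
    if node.text_modified or node.props_modified:
        return True
    if not node.is_dir or depth == 0:
        return False
    next_depth = None if depth is None else depth - 1
    for child in node.children.values():
        # An added or deleted child makes the directory modified.
        if child.schedule in ("add", "delete"):
            return True
        if is_locally_modified(child, next_depth):
            return True
    return False
```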

==========
USE CASE 3
==========

If 'svn update' is about to delete an item that is scheduled for
deletion in the working copy, then the item is a tree conflict victim.
The deletion of the item will be skipped.
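
Use cases 1 to 3 can be summarised as a small decision function.
This is a sketch under assumed names: the "open"/"delete" action
strings and the schedule values are illustrative, not identifiers
from the update editor.

```python
def update_tree_conflict(action, schedule, locally_modified):
    """Return the matching use case number (1, 2 or 3), or None.

    action:           incoming update operation, "open" or "delete"
    schedule:         local schedule of the item, e.g. "normal", "delete"
    locally_modified: True if the item has local modifications
    """
    if action == "open" and schedule == "delete":
        return 1  # update touches an item scheduled for deletion
    if action == "delete":
        if schedule == "delete":
            return 3  # update deletes an item scheduled for deletion
        if locally_modified:
            return 2  # update deletes a locally-modified item
    return None
```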

==========
USE CASE 4
==========

If 'svn merge' tries to modify an item that does not exist in the
target working copy, then the nonexistent item is a tree conflict
victim.

Note
----

Often, the target item has been renamed in the history of the working
copy's branch. It would be handy if the user could run 'svn merge'
again, specifying where to apply an incoming text diff. This is
the "ELSEWHERE" scenario discussed in
notes/tree-conflicts/resolution.txt.

A similar situation occurs if the source diff doesn't cover as many
revisions of a file as it should. Either the range of the source diff
should be extended to include the revision that created the file, or
the range should be reduced to avoid including any revisions that
modify the file.

However, the current plan is to disallow merges into tree-conflicted
directories. This means that users will first have to mark the
tree-conflict around the missing victim as resolved before attempting
to merge the file again. This workflow may be awkward, but has the
benefit of ensuring that no missing files are overlooked while
merging.

==========
USE CASE 5
==========

If 'svn merge' is about to delete an existing item, and the existing
item does not match the corresponding item at the merge's
start-revision, then the item is a tree conflict victim.  The merge
will skip the item (including its content, if it is a directory).

Notes
-----

We don't want to flag every file deletion as a tree conflict.  We
want to warn the user if the file to be deleted locally is different
from the file deleted in the merge source.  The user then has a chance
to merge these unique changes.

When comparing items, the locally modified content is used in
preference to the pristine content.

For a directory, the comparison will descend to the depth specified in
the merge command.  The merge depth is usually infinite, but in a
sparse working copy, the default merge depth is the "ambient" depth of
the given directory.

==========
USE CASE 6
==========

If 'svn merge' tries to delete an item that does not exist in the
target working copy, then the nonexistent item is a tree conflict
victim.

Notes
-----

This is similar to use case 4.

Semantically, a tree conflict occurs if 'svn merge' either tries to
apply the "delete" half of a "move" onto a file that was simply
deleted in the target branch's history, or tries to apply a simple
"delete" onto a file that has been moved in the target branch, or
tries to move a file that has already been moved to a different name
in the target branch.

Some users may want to skip the tree conflict and have the result
automatically resolved if two rename operations have the same
destination, or if a file is simply deleted on both branches. But we
have to mark these as tree conflicts due to the current lack of "true
rename" support. It does not appear to be feasible to detect more than
the double-delete aspect of the move operation.
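
The merge-time rules of use cases 4 to 6 can be sketched in the same
spirit.  The function and parameter names are hypothetical, and the
comparison against the merge's start-revision is reduced to a boolean
supplied by the caller.

```python
def merge_tree_conflict(action, exists, matches_merge_start=True):
    """Return the matching use case number (4, 5 or 6), or None.

    action:              incoming merge operation, "modify" or "delete"
    exists:              True if the item exists in the target working copy
    matches_merge_start: True if the existing item equals the item at
                         the merge's start-revision
    """
    if not exists:
        if action == "modify":
            return 4  # merge modifies a nonexistent item
        if action == "delete":
            return 6  # merge deletes a nonexistent item
        return None
    if action == "delete" and not matches_merge_start:
        return 5  # merge deletes an item that differs from the source
    return None
```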

===========
PERSISTENCE
===========

Persistent conflict data will be stored in the metadata of the
directory containing the tree conflict victim.

===================
PER-VICTIM HANDLING
===================

Our initial design, in which tree conflicts were displayed and
resolved at the parent-directory level, will be discarded.  The status
of each tree conflict victim will be displayed separately, and each
tree conflict victim will be resolved separately.

The status command will gain a column in the seventh position, after the
lock-status column.  This new tree-conflict column will contain 'C'
for a tree conflict victim, and is otherwise blank.  Corresponding
columns will be added to the output of the update, merge, switch and
checkout commands.  The info command will include tree conflict
descriptions for victims only.  The resolved and revert commands will
be called per victim (by default), not on the parent directory.

As a minor benefit, this will allow commits of non-tree-conflicted
items in a directory containing tree conflict victims.

==================
SKIPPING DETECTION
==================

During an update or switch, we skip tree conflict detection if the
user has provided the '--force' option.  This allows an interrupted
update to continue (see the use case 1 example below).  This is in
addition to the already-existing behavior: with '--force', update or
switch will tolerate an obstruction of the same type as the item added
at that path by the operation.

During a merge, we skip tree conflict detection if the record_only
field of the merge-command baton is TRUE. A record-only merge
operation updates mergeinfo without touching files.
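
A minimal sketch of these two rules follows.  The function and its
keyword arguments are illustrative assumptions; only the record_only
name is taken from the text above.

```python
def should_detect_tree_conflicts(operation, force=False, record_only=False):
    """Decide whether tree conflict detection runs for an operation."""
    if operation in ("update", "switch"):
        # '--force' disables detection so an interrupted update can go on.
        return not force
    if operation == "merge":
        # A record-only merge touches mergeinfo only, never files.
        return not record_only
    raise ValueError("unsupported operation: %r" % (operation,))
```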

=========================
OBSTRUCTIONS DURING MERGE
=========================

If 'svn merge' fails to apply an operation to an item because the
item is obstructed (i.e. an unversioned item of the same name is
in the item's place), the obstructed item is a tree conflict victim.

We want to make sure that a merge either completes successfully
or any problems found during a merge are flagged as conflicts.
Skipping obstructed items during merge is no longer acceptable
behaviour, since users might not be aware of obstructions that were
skipped when they commit the result of a merge.

====================
NOTES ON DIRECTORIES
====================

=======================
Equality of directories
=======================

How do we define equality between directories?

Two directories with no subdirectories are equal if they contain the
same files with the same content, and the same properties with the
same content.

Two directories with subdirectories are equal if they contain the same
files with the same content, and the same properties with the same
content, and all their subdirectories are equal.
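
The recursive definition can be written down directly.  The File and
Dir classes below are hypothetical stand-ins, and the sketch assumes
that "properties" covers both the directory's own properties and those
of its files.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    content: bytes = b""
    props: dict = field(default_factory=dict)


@dataclass
class Dir:
    props: dict = field(default_factory=dict)
    files: dict = field(default_factory=dict)     # name -> File
    subdirs: dict = field(default_factory=dict)   # name -> Dir


def dirs_equal(a, b):
    """Return True if directories a and b are equal as defined above."""
    if a.props != b.props:
        return False
    # Same file names, and each pair of files has equal content and props.
    if a.files != b.files:
        return False
    if a.subdirs.keys() != b.subdirs.keys():
        return False
    # With no subdirectories this is vacuously true (the base case).
    return all(dirs_equal(a.subdirs[name], b.subdirs[name])
               for name in a.subdirs)
```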

How can this be implemented?

For each directory, the merge could retrieve the corresponding dir
entry from the repository as it existed at the merge-start revision of
the merge source, and compare the two for equality, i.e. check whether
all fields in the svn_dirent_t returned by the repo match the
corresponding attributes of the directory as found in the working copy.

The merge-start revision shall be an additional parameter to
merge_dir_deleted(). The ra session needed to contact the repository
via the get_dir() method is already contained in the merge baton which
is passed to merge_dir_deleted().

The last two paragraphs were taken from:
http://subversion.tigris.org/servlets/ReadMsg?listName=dev&msgNo=136794

=======================================
Deep tree conflict example (use case 1)
=======================================

In a working copy, a directory named B is scheduled for deletion.
Running 'svn status' lists the entire tree rooted at B.

  D       A/B/E/alpha
  D       A/B/E/beta
  D       A/B/E
  D       A/B/F
  D       A/B

Running 'svn status -uq' warns that the repository contains changes to
the locally-deleted directory.

  D       *        1   A/B
  Status against revision:      2

In the HEAD revision on the repository, another user has modified a
file, deleted a file and a directory, and added a file and a
directory.

  M       A/B/E/alpha
  D       A/B/E/beta
  A       A/B/E/gamma
  D       A/B/F
  A       A/B/G

Here is the output of 'svn update'.  The 'C' in the fourth column marks
the tree-conflicted item.  The update of A/B has been skipped.

     C A/B
  Update incomplete due to conflicts.
  Tree conflicts:   1

The tree conflict revealed by the update is recorded in the metadata
of directory A.  It is described by 'svn info A/B'.

  The update wants to modify files or directories inside 'A/B'.
  You have deleted or renamed 'A/B' locally.

Note: The exact wording of the update and info warnings is not yet
settled.

To view the incoming changes that were delayed by the tree conflict,
the user can run 'svn status -u'.

  D      *       2   A/B/E/alpha
  D      *       2   A/B/E/beta
         *           A/B/E/gamma
  D      *       2   A/B/E/
  D      *       2   A/B/F
         *           A/B/G
  D      *       2   A/B
  Status against revision:      2

To see more detail, the user can run 'svn log -v' and 'svn diff -r2'.

Any commit of A (including any commit in a parent directory of A) and
any commit within A (including any commit in a subdirectory of A) will
be blocked by the tree conflict.  The user can revert the deletion of
A/B and update it, or keep the deletion and force the update via 'svn
update --force A/B'.

=========================================
TREE CONFLICT DETECTION WITH TRUE RENAMES
=========================================

To properly detect the situations described in the "diagram of current
behaviour" for use cases 2 and 3, we need to have access to a list of
all files the update will add with history.

For use case 1, we need a list of all files added locally with
history.

We need access to this list during the whole update editor drive.
Then we could do something like this in the editor callbacks:

      edit_file(file):

        if file is locally deleted:
          for each added_file in files_locally_added_with_history:
            if file has common ancestor with added_file:
              /* user ran "svn move file added_file" */
              use case 1 has happened!

      delete_file(file):

        if file is locally modified:
          for each added_file in files_added_with_history_by_update:
            if file has common ancestor with added_file:
              use case 2 has happened!

        else if file is locally deleted:
          for each added_file in files_added_with_history_by_update:
            if file has common ancestor with added_file:
              use case 3 has happened!
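
For readers who prefer something executable, the pseudocode can be
sketched in Python.  Everything here is an assumption made for
illustration: ancestry is modelled as a plain copyfrom mapping
(path -> path it was copied from), and local state as a dictionary
mapping paths to "modified" or "deleted".

```python
def origin(path, copyfrom):
    """Follow the copyfrom chain of path back to its oldest ancestor."""
    seen = set()
    while path in copyfrom and path not in seen:
        seen.add(path)
        path = copyfrom[path]
    return path


def has_common_ancestor(a, b, copyfrom):
    return origin(a, copyfrom) == origin(b, copyfrom)


def detect_on_edit(path, local_state, locally_added, copyfrom):
    """edit_file(): return 1 if use case 1 has happened, else None."""
    if local_state.get(path) == "deleted":
        if any(has_common_ancestor(path, added, copyfrom)
               for added in locally_added):
            return 1  # user ran "svn move path added"
    return None


def detect_on_delete(path, local_state, added_by_update, copyfrom):
    """delete_file(): return 2 or 3 for that use case, else None."""
    state = local_state.get(path)
    if state not in ("modified", "deleted"):
        return None
    if any(has_common_ancestor(path, added, copyfrom)
           for added in added_by_update):
        return 2 if state == "modified" else 3
    return None
```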

Since the update editor drive crawls through the working copy and the
callbacks consider only a single file, we need to generate the list
before checking for tree conflicts.  Two ideas for this are:

        1) Wrap the update editor with another editor that passes
           all calls through but takes note of which files the
           update adds with history. Once the wrapped editor is
           done run a second pass over the working copy to populate
           it with tree conflict info.

        2) Wrap the update editor with another editor that does
           not actually execute any edits but remembers them all.
           It only applies the edits once the wrapped editor has
           been fully driven. Tree conflicts could now be detected
           precisely because the list of files we need would be
           present before the actual edit is carried out.
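
The pass-through wrapper of idea 1 might look roughly like this.  The
editor interface is heavily simplified and hypothetical (the real
delta editor callbacks take batons and more arguments), and the list
is kept in memory only, so it would be lost if the update is aborted.

```python
class RecordingEditor:
    """Wrap an editor, pass all calls through, note adds with history."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.added_with_history = []

    def add_file(self, path, copyfrom_path=None):
        # An add that carries copyfrom information is an add with history.
        if copyfrom_path is not None:
            self.added_with_history.append(path)
        return self._wrapped.add_file(path, copyfrom_path)

    def __getattr__(self, name):
        # Only reached for attributes not defined on this class:
        # forward every other editor call unchanged.
        return getattr(self._wrapped, name)
```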

Approach 1 has the problem that there is no reliable way of storing
the file list in the face of an abort.

Approach 2 is obviously insane. ;-)

Keeping the list in RAM is dangerous, because the list would be lost
if the user aborts, leaving behind an inconsistent working copy that
potentially lacks tree conflict info for some conflicts.

The usual place to store persistent information inside the working
copy is the entries file in the administrative area. Loggy writes to
this file ensure consistency even if the update is aborted.  But
keeping the list in entries files also has problems: Which entries
file do we keep it in? Scattering the list across lots of entries
files isn't an option because the list needs to be global.  Crawling
the whole working copy at the start of an update to gather lost file
lists would be too much of a performance penalty.

Storing it in the entries file of the anchor of the update operation
(i.e. the current working directory of the "svn update" process) is a
bad idea as well because when the interrupted update is continued the
anchor might have changed. The user may change the working directory
before running "svn update" again.

Either way, interrupted updates would leave scattered partial lists of
files in entries files throughout the working copy. And interrupted
updates may not correctly mark all tree conflicts.

So how can, for example, use case 3 be detected properly?

The answer could be "true renames". All the above is due to the fact
that we have to try to catch use case 3 from a "delete this file"
callback. We are in fact trying to reconstruct whether a deletion
of a file was due to the file being moved with "svn move" or not.

But if we had a callback in the update editor like:

        move_file(source, dest);

detecting use case 3 would be extremely simple. Simply check whether
the source of the move is locally deleted. If it is, use case 3 has
happened, and the source of the move is a tree conflict victim.

Use case 2 could be caught by checking whether the source of the move
has local modifications.

Use case 1 could be detected by checking whether the target for a file
modification by update matches the source of a rename operation in the
working copy. This would require storing rename information inside the
administrative areas of both the source and target directories of file
move operations to avoid having to maintain a global list of rename
operations in the working copy for reference by the update editor.
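
With such a callback, the three checks reduce to simple lookups.  The
sketch below is illustrative only: on_move_file, on_edit_file, the
local_state mapping (path -> "modified" or "deleted") and the
local_renames mapping (rename source -> rename target) are all
hypothetical names.

```python
def on_move_file(source, dest, local_state):
    """Incoming move: return use case 2 or 3 for the source, or None."""
    state = local_state.get(source)
    if state == "deleted":
        return 3  # source of the move is locally deleted
    if state == "modified":
        return 2  # source of the move has local modifications
    return None


def on_edit_file(path, local_renames):
    """Incoming modification: return 1 if path was renamed locally."""
    return 1 if path in local_renames else None
```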