func-spec.html   [plain text]


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css"> /* <![CDATA[ */
  @import "tigris-branding/css/tigris.css";
  @import "tigris-branding/css/inst.css";
  /* ]]> */</style>
<link rel="stylesheet" type="text/css" media="print"
  href="tigris-branding/css/print.css"/>
<script type="text/javascript" src="tigris-branding/scripts/tigris.js"></script>
<title>Merge Tracking Functional Specification</title>
</head>

<body>
<div class="h1">
<h1>Merge Tracking Functional Specification</h1>

<p><a href="index.html">Merge tracking</a> functional
specification.</p>

<p style="color: red">TODO: Incorporate <a
href="summit.html">CollabNet Summit findings</a>.  Describe how each
<a href="requirements.html">requirement</a> will actually function for
Subversion.  Remove redundancies.</p>

<div class="h2" id="repeated-merge">
<h2>Repeated Merge</h2>

<p>There are two general schemes for solving the <a
href="requirements.html#repeated-merge">repeated merge</a>
problem.</p>

<div class="h3" id="mrca-merge">
<h3>The Most Recent Common Ancestor approach (MRCA)</h3>

<p>In this scheme, you record an optional set of merge sources in each
node-revision.  When asked to do a merge with only one source (that
is, just "svn merge URL", with no second argument), you compute the
most recent ancestor and do a three-way merge between the common
ancestor, the given URL, and the WC.</p>

<p>To compute the most recent ancestor, you chain off the immediate
predecessors of each node-revision.  The immediate predecessors are
the direct predecessor (the most recent node-revision within the node)
and the merge sources.  An interleaved breadth-first search should
find the most recent common ancestor.</p>

</div>  <!-- mrca-merge -->

<div class="h3" id="as-merge">
<h3>The Ancestry Set approach (AS)</h3>

<p>In this scheme, you record the full ancestry set for each
node-revision -- that is, the set of all changes which are accounted
for in that node-revision.  (How you store this ancestry set is
unimportant; the point is, you need a reasonably efficient way of
determining it when asked.)  If you are asked to "svn merge URL", you
apply the changes present in URL's ancestry but absent in WC's
ancestry.  Note that this is not a single three-way merge; you may
have to apply a large number of disjoint changes to the WC.</p>

<p>For a longer description of this approach, see the "Merging and
Ancestry" section of the original <a
href="http://svn.collab.net/repos/svn/trunk/doc/design/model.xml">design
doc</a>.</p>

<div class="h4" id="aslb-merge">
<h4>Ancestry-Sensitive Line-Based Merge</h4>

<p>Make 'hunks' of contextually-merged text sensitive to ancestry.</p>

<p>A high-resolution version of <a
href="requirements.html#repeated-merge">repeated merge</a>.  Rather
than tracking whole changesets, we track the lineage of specific lines
of code within a file.  The basic idea is that when re-merging a
particular hunk of code, the contextual-merging process is aware that
certain lines of code already represent the merging of particular
lines of development.  Jack Repenning has a great example of this from
ClearCase (see ASCII diagram below).</p>

<p>See the <a href="../variance-adjusted-patching.html">variance
adjusted patching</a> document for an extended discussion of how to
implement this by composing diffs; see <a
href="http://svn.collab.net/svn-doxygen/svn__diff_8h.html#a11"
><code>svn_diff_diff4()</code></a> for an implementation of same.  We
may be closer to ancestry-sensitive merging than we think.</p>

<p>Here's an example demonstrating how individual lines of code can be
tracked.  In this diagram, we're drawing the lineage of a single file,
with time flowing downwards.  The file begins life with three lines of
text, "1\n2\n\3\n".  The file then splits into two lines of
development.</p>

<pre>
                    1     
                    2     
                    3     
                  /   \   
                 /     \  
                /       \ 
            one           1   
            two           2.5 
            three         3   
             |     \      |
             |      \     |   
             |       \    |            
             |        \   |            
             |         \ one                ## This node is a human's
             |           two-point-five     ## merge of two sides.
             |           three        
             |            |
             |            |
             |            |
            one          one
            Two          two-point-five
            three        newline       
               \         three  
                \         |   
                 \        |
                  \       |
                   \      |
                    \     |
                     \    |
                      \   |
                       \  |
                         one                ## This node is a human's
                         Two-point-five     ## merge of the changes
                         newline            ## since the last merge.
                         three
</pre>

<p>It's the second merge that's important here.</p>

<p>In a system like Subversion, the second merge of the left branch to
the right will fail miserably: the whole file's contents will be
placed within conflict markers.  That's because it's trying to dumbly
apply a patch that changes "1\n2\n3" to "one\nTwo\nthree", and the
target file has no matching lines at all.</p>

<p>A smarter system (like Clearcase) would remember that the previous
merge had happened, and specifically notice that the lines "one" and
"three" are the results of that previous merge.  Therefore, it would
ask the human only to deal with the "Two" versus "two-point-five"
conflict; the earlier changes ("1\n2\n3" to "one\ntwo\nthree") would
already be accounted for.</p>

</div>  <!-- aslb-merge -->

</div>  <!-- as-merge -->

<div class="h3">
<h3>Comparisons, Arguments, and Questions</h3>

<p>AS allows you to merge changes from a branch out of order, without
doing any bookkeeping.  MRCA requires you to merge changes from a
branch in order.</p>

<p>At first look, MRCA may be simpler to implement, since it results
in a single three-way merge.  However, it likely does not handle all
edge cases.  For instance, it may break down faster if the merging
topology is not hierarchical.</p>

<p>MRCA may be easier for users to understand, even though AS is
probably simpler to a mathematician.</p>

<p>Consistency with other modern version controls systems is
desirable.</p>

<p>If a user asks to merge a directory, should we apply MRCA or AS to
each subdirectory and file to determine what ancestor(s) to use?  Or
should we apply MRCA or AS just once, to the directory itself?  The
latter approach seems simpler and more efficient, but will break down
quickly if the user wants to merge subdirectories of a branch in
advance of merging in the whole thing.</p>

</div>  <!-- h3 -->

</div>  <!-- repeated-merge -->


<div class="h2" id="related-documents">
<h2>Related Documents and Discussion</h2>

<ul>
  <li><a href="http://www.codeville.org/">Codeville</a> is reputed to
  excel both in its usefulness of storage of line-history in the
  <em>weave</em> format, and a corresponding merge algorithm:

  <ul>
    <li><a href="http://revctrl.org/PreciseCodevilleMerge">"Precise
    Codeville merge"</a> algorithm and Python implementation.  The
    algorithm takes into account line history, history points where
    they came from, the ability to retrieve ancestors' text as needed,
    and a snapshot of the current file.  It purports accuracy where
    <a href="http://revctrl.org/CodevilleMerge">other algorithms fall
    down</a>.</li>

    <li>Bram Cohen describes <a
    href="http://thread.gmane.org/gmane.comp.version-control.revctrl/2"
    >the merge algorithm</a> (May 2005)</li>
  </ul>
  </li>

  <li><a
  href="http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_base/notes/structure">Structure
  of the Subversion FS BerkeleyDB backend</a></li>

</ul>

</div>  <!-- related-documents -->

</div>  <!-- h1 -->
</body>
</html>