documentation: add some internals documentation about bid merge
authorPierre-Yves David <pierre-yves.david@octobus.net>
Mon, 22 Jun 2020 13:32:20 +0200
changeset 45060 79f6f9fa18c1
parent 45059 f43bc4ce0d69
child 45061 baffdfa5bd1a
documentation: add some internals documentation about bid merge This is an important algorithm that was only documented on the wiki so far. Some update to the algorithm (and associated doc) is to expected in the future since the bid merge algorithm is bug-ridden when it comes to file deletion comes to play. Differential Revision: https://phab.mercurial-scm.org/D8711
mercurial/help.py
mercurial/helptext/internals/bid-merge.txt
tests/test-help.t
--- a/mercurial/help.py	Sun Jun 28 17:32:32 2020 +0900
+++ b/mercurial/help.py	Mon Jun 22 13:32:20 2020 +0200
@@ -345,6 +345,11 @@
 
 internalstable = sorted(
     [
+        (
+            [b'bid-merge'],
+            _(b'Bid Merge Algorithm'),
+            loaddoc(b'bid-merge', subdir=b'internals'),
+        ),
         ([b'bundle2'], _(b'Bundle2'), loaddoc(b'bundle2', subdir=b'internals')),
         ([b'bundles'], _(b'Bundles'), loaddoc(b'bundles', subdir=b'internals')),
         ([b'cbor'], _(b'CBOR'), loaddoc(b'cbor', subdir=b'internals')),
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mercurial/helptext/internals/bid-merge.txt	Mon Jun 22 13:32:20 2020 +0200
@@ -0,0 +1,115 @@
+Bid merge is a feature introduced in Mercurial 3.0, a merge algorithm for
+dealing with complicated merges.
+
+Bid merge is controled  by the `merge.preferancestor` configuration option. The
+default is set to `merge.preferancetors=*` and enable bid merge. Mercurial will
+perform a bid merge in the cases where a merge otherwise would emit a note:
+using X as ancestor of X and X message.
+
+Problem it is solving
+=====================
+
+Mercurial's core merge algorithm is the traditional "three-way merge". This
+algorithm combines all the changes in two changesets relative to a common
+ancestor. But with complex DAGs, it is often possible to have more than one
+"best" common ancestor, with no easy way to distinguish between them.
+
+For example, C and D has 2 common ancestors in the following graph::
+
+  C   D
+  |\ /|
+  | x |
+  |/ \|
+  A   B
+   \ /
+    R
+
+Mercurial used to arbitrarily chooses the first of these, which can result in
+various issues:
+
+* unexpected hard 3-way merges that would have been completely trivial if
+  another ancestor had been used
+
+* conflicts that have already been resolved may reappear
+
+* changes that have been reversed can silently oscillate
+
+One common problem is a merge which with the "right" ancestor would be trivial
+to resolve because only one side changed. Using another ancestor where the same
+lines are different, it will give an annoying 3-way merge.
+
+Other systems like Git have attacked some of these problems with a so-called
+"recursive" merge strategy, that internally merges all the possible ancestors
+to produce a single "virtual" ancestor to merge against. This is awkward as the
+internal merge itself may involve conflicts (and possibly even multiple levels
+of recursion), which either requires choosing a conflict disposition (e.g.
+always choose the local version) or exposing the user to extremely confusing
+merge prompts for old revisions. Generating the virtual merge also potentially
+involves invoking filters and extensions.
+
+Concept
+=======
+
+(Bid merge is pretty much the same as Consensus merge.)
+
+Bid merge is a strategy that attempts to sensibly combine the results of the
+multiple possible three-way merges directly without producing a virtual
+ancestor. The basic idea is that for each ancestor, we perform a top-level
+manifest merge and generate a list of proposed actions, which we consider
+"bids". We then make an "auction" among all the bids for each file and pick the
+most favourable. Some files might be trivial to merge with one ancestor, other
+files with another ancestor.
+
+The most obvious advantage of considering multiple ancestors is the case where
+some of the bids for a file is a "real" (interactive) merge but where one or
+more bids just take on of the parent revisions. A bid for just taking an
+existing revision is very simple and low risk and is an obvious winner.
+
+The auction algorithm for merging the bids is so far very simple:
+
+* If there is consensus from all the ancestors, there is no doubt what to do. A
+  clever result will be indistinguishable from just picking a random bid. The
+  consensus case is thus not only trivial, it is also already handled
+  perfectly.
+
+* If "keep local" or "get from other" actions is an option (and there is only
+  one such option), just do it.
+
+* If the auction doesn't have a single clear winner, pick one of the bids
+  "randomly" - just as it would have done if only one ancestor was considered.
+
+This meta merge algorithm has room for future improvements, especially for
+doing better than picking a random bid.
+
+Some observations
+=================
+
+Experience with bid merge shows that many merges that actually have a very
+simple solution (because only one side changed) only can be solved efficiently
+when we start looking at file content in filemerge ... and it thus also
+requires all ancestors passed to filemerge. That is because Mercurial includes
+the history in filelog hashes. A file with changes that ends up not changing
+the content (could be change + backout or graft + merge or criss cross merges)
+still shows up as a changed file to manifestmerge. (The git data model has an
+advantage here when it uses hashes of content without history.) One way to
+handle that would be to refactor manifestmerge, mergestate/resolve and
+filemerge so they become more of the same thing.
+
+There is also cases where different conflicting chunks could benefit from using
+multiple ancestors in filemerge - but that will require merge tools with fancy
+support for using multiple ancestors in 3+-way merge. That is left as an
+exercise for another day. That seems to be a case where "recursive merge" has
+an advantage.
+
+The current manifest merge actions are very low level imperative and not
+symmetrical. They do not only describe how two manifests should be merged, they
+also describe a strategy for changing a context from a state where it is one of
+the parents to the state where it is the result of the merge with the other
+parent. I can imagine that manifestmerge could be simplified (and made more
+suitable for in memory merges) by separating the abstract merge actions from
+the actual file system operation actions. A more clever wcontext could perhaps
+also take care of some of the branchmerge special cases.
+
+We assume that the definition of Mercurial manifest merge will make sure that
+exactly the same files will be produced, no matter which ancestor is used. That
+assumption might be wrong in very rare cases that really not is a problem.
--- a/tests/test-help.t	Sun Jun 28 17:32:32 2020 +0900
+++ b/tests/test-help.t	Mon Jun 22 13:32:20 2020 +0200
@@ -1094,6 +1094,7 @@
   
       To access a subtopic, use "hg help internals.{subtopic-name}"
   
+       bid-merge     Bid Merge Algorithm
        bundle2       Bundle2
        bundles       Bundles
        cbor          CBOR
@@ -3439,6 +3440,13 @@
   <tr><td colspan="2"><h2><a name="topics" href="#topics">Topics</a></h2></td></tr>
   
   <tr><td>
+  <a href="/help/internals.bid-merge">
+  bid-merge
+  </a>
+  </td><td>
+  Bid Merge Algorithm
+  </td></tr>
+  <tr><td>
   <a href="/help/internals.bundle2">
   bundle2
   </a>