changeset 4833:72cebe6642d7

stablesort: add some documentation for stablesort This should help people to understand what is going on here.
author Pierre-Yves David <pierre-yves.david@octobus.net>
date Sun, 08 Sep 2019 11:56:11 +0200
parents 4c6dd20e8cc2
children 9a52930f6781
files hgext3rd/evolve/stablesort.py
diffstat 1 files changed, 72 insertions(+), 0 deletions(-) [+]
line wrap: on
line diff
--- a/hgext3rd/evolve/stablesort.py	Tue Sep 03 12:48:47 2019 +0200
+++ b/hgext3rd/evolve/stablesort.py	Sun Sep 08 11:56:11 2019 +0200
@@ -7,6 +7,78 @@
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
+"""Stable sorting for the mercurial graph
+
+The goal is to provided an efficient, revnum independant way, to sort revisions
+in a topologicaly. Having it independant from revnum is important to make it
+stable from one repository to another, unlocking various capabilities. For
+example it can be used for discovery purposes.
+
+This docstring describe the currently preferred solution:
+
+Basic principle:
+----------------
+
+We are always talking about set of revision defined by a single heads
+(eg: `stablesort(::r)`)
+
+For non merge revisions, the definition is simple::
+
+    stablesort(::r) == stablesort(p1(r)) + r
+
+For merge revision, we reuse as much as possible of the parents order:
+
+    pl = stablemin(parents(m))
+    ph = stablemax(parents(m))
+    stablesort(::m) == stablesort(pl)
+                       + [i for in in stablesort(ph) if in ph % pl]
+                       + m
+
+The `ph % pl` set of revision is called the "exclusive part". In this area we
+try to reuse as much as the stable-sorted order for `ph`. In simple case, the
+`[i for i in stablesort(ph) if i in ph % pl]` is just the contiguous final range of
+`stablesort(ph)`. However in more advance case, this will not be contiguous and
+we'll need to skip over multiple parts of `stablesort(ph)` to cover `ph % pl`.
+
+Another important details is that, in practice, the sorted revision are always
+walked backward, from the head of the set of revisions.
+
+preexisting cached data
+-----------------------
+
+The stable sort assume we already have 2 important property cached for each
+changesets:
+
+1) changeset depth == len(::r)
+2) first merge == max(merge() and ::r)
+
+Caching strategy
+----------------
+
+Since we always walk from the head, the iteration mostly have to follow the
+unique parent of non merge revision. For merge revision, we need to iterate
+through one of the parent before coming back to the other parent eventually.
+
+To effiently cache the path we need to walk, we records "jumps". A jump is a
+point where the next revision will not be a parent of the current revision, but
+another point in the graph. This correspond to point were we need to "come back
+to the other parent".
+
+Jumps are recorded using the following formats:
+
+    (jump-point, jump-destination, section-size)
+
+* jump-point is the last revision number we should iterate over before jumping,
+* jump-destination is the next revision we should iterate over after the jump point,
+* section-size is the number of revision to be iterated before reaching jump-point.
+
+the section-size is not directly used when doing a stable-sorted walk. However
+it is useful for higher level piece of code to take decision without having to
+actually walk the graph, (see stable range documentation).
+
+For each merge, we store the set of jumps that cover the exclusive side.
+"""
+
 import array
 import collections
 import struct