Mercurial > evolve
changeset 4833:72cebe6642d7
stablesort: add some documentation for stablesort
This should help people to understand what is going on here.
author | Pierre-Yves David <pierre-yves.david@octobus.net> |
---|---|
date | Sun, 08 Sep 2019 11:56:11 +0200 |
parents | 4c6dd20e8cc2 |
children | 9a52930f6781 |
files | hgext3rd/evolve/stablesort.py |
diffstat | 1 files changed, 72 insertions(+), 0 deletions(-) [+] |
line wrap: on
line diff
--- a/hgext3rd/evolve/stablesort.py Tue Sep 03 12:48:47 2019 +0200 +++ b/hgext3rd/evolve/stablesort.py Sun Sep 08 11:56:11 2019 +0200 @@ -7,6 +7,78 @@ # This software may be used and distributed according to the terms of the # GNU General Public License version 2 or any later version. +"""Stable sorting for the mercurial graph + +The goal is to provided an efficient, revnum independant way, to sort revisions +in a topologicaly. Having it independant from revnum is important to make it +stable from one repository to another, unlocking various capabilities. For +example it can be used for discovery purposes. + +This docstring describe the currently preferred solution: + +Basic principle: +---------------- + +We are always talking about set of revision defined by a single heads +(eg: `stablesort(::r)`) + +For non merge revisions, the definition is simple:: + + stablesort(::r) == stablesort(p1(r)) + r + +For merge revision, we reuse as much as possible of the parents order: + + pl = stablemin(parents(m)) + ph = stablemax(parents(m)) + stablesort(::m) == stablesort(pl) + + [i for in in stablesort(ph) if in ph % pl] + + m + +The `ph % pl` set of revision is called the "exclusive part". In this area we +try to reuse as much as the stable-sorted order for `ph`. In simple case, the +`[i for i in stablesort(ph) if i in ph % pl]` is just the contiguous final range of +`stablesort(ph)`. However in more advance case, this will not be contiguous and +we'll need to skip over multiple parts of `stablesort(ph)` to cover `ph % pl`. + +Another important details is that, in practice, the sorted revision are always +walked backward, from the head of the set of revisions. + +preexisting cached data +----------------------- + +The stable sort assume we already have 2 important property cached for each +changesets: + +1) changeset depth == len(::r) +2) first merge == max(merge() and ::r) + +Caching strategy +---------------- + +Since we always walk from the head, the iteration mostly have to follow the +unique parent of non merge revision. For merge revision, we need to iterate +through one of the parent before coming back to the other parent eventually. + +To effiently cache the path we need to walk, we records "jumps". A jump is a +point where the next revision will not be a parent of the current revision, but +another point in the graph. This correspond to point were we need to "come back +to the other parent". + +Jumps are recorded using the following formats: + + (jump-point, jump-destination, section-size) + +* jump-point is the last revision number we should iterate over before jumping, +* jump-destination is the next revision we should iterate over after the jump point, +* section-size is the number of revision to be iterated before reaching jump-point. + +the section-size is not directly used when doing a stable-sorted walk. However +it is useful for higher level piece of code to take decision without having to +actually walk the graph, (see stable range documentation). + +For each merge, we store the set of jumps that cover the exclusive side. +""" + import array import collections import struct