comparison mercurial/changegroup.py @ 39861:db5501d93bcf

changegroup: remove reordering control (BC) This logic - including the experimental bundle.reorder option - was originally added in a8e3931e3fb5 in 2011 and then later ported to changegroup.py. The intent of this option and associated logic is to control the ordering of revisions in deltagroups in changegroups. At the time it was implemented, only changegroup version 1 existed and generaldelta revlogs were just coming into the world. Changegroup version 1 requires that deltas be made against the last revision sent over the wire. Used with generaldelta, this created an impedance mismatch of sorts and resulted in changegroup producers spending a lot of time recomputing deltas. Revision reordering was introduced so outgoing revisions would be sent in "generaldelta order" and producers would be able to reuse internal deltas from storage. Later on, we introduced changegroup version 2. It supported denoting which revision a delta was against. So we no longer needed to sort outgoing revisions to ensure optimal delta generation from the producer. So, subsequent changegroup versions disabled reordering. We also later made the changelog not store deltas by default. And we also made the changelog send out deltas in storage order. Why we do this for changelog, I'm not sure. Maybe we want to preserve revision order across clones? It doesn't really matter for this commit. Fast forward to 2018. We want to abstract storage backends. And having changegroup code require knowledge about how deltas are stored internally interferes with that goal. This commit removes reordering control from changegroup generation. After this commit, the reordering behavior is: * The changelog is always sent out in storage order (no behavior change). * Non-changelog generaldelta revlogs are reordered to always be in DAG topological order (previously, generaldelta revlogs would be emitted in storage order for version 2 and 3 changegroups). * Non-changelog non-generaldelta revlogs are sent in storage order (no behavior change). * There exists no config option to override behavior. The big difference here is that generaldelta revlogs now *always* have their revisions sorted in DAG order before going out over the wire. This behavior was previously only done for changegroup version 1. Version 2 and version 3 changegroups disabled reordering because the interchange format supported encoding arbitrary delta parents, so reordering wasn't strictly necessary. I can think of a few significant implications for this change. Because changegroup receivers will now see non-changelog revisions in DAG order instead of storage order, the internal storage order of manifests and files may differ substantially between producer and consumer. I don't think this matters that much, since the storage order of manifests and files is largely hidden from users. Only the storage order of changelog matters (because `hg log` shows the changelog in storage order). I don't think there should be any controversy here. The reordering of revisions has implications for changegroup producers. Previously, generaldelta revlogs would be emitted in storage order. And in the common case, the internally-stored delta could effectively be copied from disk into the deltagroup delta. This meant that emitting delta groups for generaldelta revlogs would be mostly linear read I/O. This is desirable for performance. With us now reordering generaldelta revlog revisions in DAG order, the read operations may use more random I/O instead of sequential I/O. This could result in performance loss. But with the prevalence of SSDs and fast random I/O, I'm not too worried. (Note: the optimal emission order for revlogs is actually delta encoding order. But the changegroup code wasn't doing that before or after this change. We could potentially implement that in a later commit.) Changegroups in DAG order will have implications for receivers. Previously, receiving storage order might mean seeing a number of interleaved branches. This would mean long delta chains, sparse I/O, and possibly more fulltext revisions instead of deltas, blowing up storage storage. (This is the same set of problems that sparse revlogs aims to address.) With the producer now sending revisions in DAG order, the receiver also stores revisions in DAG order. That means revisions for the same DAG branch are all grouped together. And this should yield better storage outcomes. In other words, sending the reordered changegroup allows the receiver to have better storage order and for the producer to not propagate its (possibly sub-optimal) internal storage order. On the mozilla-unified repository, this change influences bundle generation: $ hg bundle -t none-v2 -a before: time: real 355.680 secs (user 256.790+0.000 sys 16.820+0.000) after: time: real 382.950 secs (user 281.700+0.000 sys 17.690+0.000) before: 7,150,228,967 bytes (uncompressed) after: 7,041,556,273 bytes (uncompressed) before: 1,669,063,234 bytes (zstd l=3) after: 1,628,598,830 bytes (zstd l=3) $ hg unbundle before: time: real 511.910 secs (user 466.750+0.000 sys 32.680+0.000) after: time: real 487.790 secs (user 443.940+0.000 sys 30.840+0.000) 00manifest.d size: source: 274,924,292 bytes before: 304,741,626 bytes after: 245,252,087 bytes .hg/store total file size: source: 2,649,133,490 before: 2,680,888,130 after: 2,627,875,673 We see the bundle size drop. That's probably because if a revlog internally isn't storing a delta, it will choose to delta against the last emitted revision. And on repos with interleaved branches (like mozilla-unified), the previous revision could be an unrelated branch and therefore be a large delta. But with this patch, the previous revision is likely p1 or p2 and a delta should be small. We also see the manifest size drop by ~50 MB. It's worth noting that the manifest actually *increased* in size by ~25 MB in the old strategy and decreased ~25 MB from its source in the new strategy. Again, my explanation for this is that the DAG ordering in the changegroup is resulting in better grouping of revisions in the receiver, which results in more compact delta chains and higher storage efficiency. Unbundle time also dropped. I suspect this is due to the revlog having to work less to compute deltas since the incoming deltas are more optimal. i.e. the receiver spends less time resolving fulltext revisions as incoming deltas bounce around between DAG branches and delta chains. We also see bundle generation time increase. This is not desirable. However, the regression is only significant on the original repository: if we generate a bundle from the repository created from the new, always reordered bundles, we're close to baseline (if not at it with expected noise): $ hg bundle -t none-v2 -a before (original): time: real 355.680 secs (user 256.790+0.000 sys 16.820+0.000) after (original): time: real 382.950 secs (user 281.700+0.000 sys 17.690+0.000) after (new repo): time: real 362.280 secs (user 260.300+0.000 sys 17.700+0.000) This regression is a bit worrying because it will impact serving canonical repositories (that don't have optimal internal storage unless they are reordered - possibly as part of running `hg debugupgraderepo`). However, this regression will only be noticed by very large changegroups. And I'm guessing/hoping that any repository that large is using clonebundles to mitigate server load. Again, sending DAG order isn't the optimal send order for servers: sending in storage-delta order is. But in order to enable storage-optimal send order, we'll need a storage API that handles sorting. Future commits will introduce such an API. Differential Revision: https://phab.mercurial-scm.org/D4721
author Gregory Szorc <gregory.szorc@gmail.com>
date Mon, 24 Sep 2018 09:41:42 -0700
parents 5adc5fe41a7d
children 31b7e8e7132e
comparison
equal deleted inserted replaced
39860:d9b3cc3d5d07 39861:db5501d93bcf
34 util, 34 util,
35 ) 35 )
36 36
37 from .utils import ( 37 from .utils import (
38 interfaceutil, 38 interfaceutil,
39 stringutil,
40 ) 39 )
41 40
42 _CHANGEGROUPV1_DELTA_HEADER = struct.Struct("20s20s20s20s") 41 _CHANGEGROUPV1_DELTA_HEADER = struct.Struct("20s20s20s20s")
43 _CHANGEGROUPV2_DELTA_HEADER = struct.Struct("20s20s20s20s20s") 42 _CHANGEGROUPV2_DELTA_HEADER = struct.Struct("20s20s20s20s20s")
44 _CHANGEGROUPV3_DELTA_HEADER = struct.Struct(">20s20s20s20s20sH") 43 _CHANGEGROUPV3_DELTA_HEADER = struct.Struct(">20s20s20s20s20sH")
535 yield meta 534 yield meta
536 if prefix: 535 if prefix:
537 yield prefix 536 yield prefix
538 yield data 537 yield data
539 538
540 def _sortnodesnormal(store, nodes, reorder): 539 def _sortnodesnormal(store, nodes):
541 """Sort nodes for changegroup generation and turn into revnums.""" 540 """Sort nodes for changegroup generation and turn into revnums."""
542 # for generaldelta revlogs, we linearize the revs; this will both be 541 # for generaldelta revlogs, we linearize the revs; this will both be
543 # much quicker and generate a much smaller bundle 542 # much quicker and generate a much smaller bundle
544 if (store._generaldelta and reorder is None) or reorder: 543 if store._generaldelta:
545 revs = set(store.rev(n) for n in nodes) 544 revs = set(store.rev(n) for n in nodes)
546 return dagop.linearize(revs, store.parentrevs) 545 return dagop.linearize(revs, store.parentrevs)
547 else: 546 else:
548 return sorted([store.rev(n) for n in nodes]) 547 return sorted([store.rev(n) for n in nodes])
549 548
654 basenode=nullid, 653 basenode=nullid,
655 ellipsis=True, 654 ellipsis=True,
656 ) 655 )
657 656
658 def deltagroup(repo, store, nodes, ischangelog, lookup, forcedeltaparentprev, 657 def deltagroup(repo, store, nodes, ischangelog, lookup, forcedeltaparentprev,
659 allowreorder,
660 topic=None, 658 topic=None,
661 ellipses=False, clrevtolocalrev=None, fullclnodes=None, 659 ellipses=False, clrevtolocalrev=None, fullclnodes=None,
662 precomputedellipsis=None): 660 precomputedellipsis=None):
663 """Calculate deltas for a set of revisions. 661 """Calculate deltas for a set of revisions.
664 662
689 # store producing the deltas. 687 # store producing the deltas.
690 revs = sorted(cl.rev(n) for n in nodes) 688 revs = sorted(cl.rev(n) for n in nodes)
691 elif ellipses: 689 elif ellipses:
692 revs = _sortnodesellipsis(store, nodes, cl, lookup) 690 revs = _sortnodesellipsis(store, nodes, cl, lookup)
693 else: 691 else:
694 revs = _sortnodesnormal(store, nodes, allowreorder) 692 revs = _sortnodesnormal(store, nodes)
695 693
696 # In the first pass, collect info about the deltas we'll be 694 # In the first pass, collect info about the deltas we'll be
697 # generating. 695 # generating.
698 requests = [] 696 requests = []
699 697
754 752
755 if progress: 753 if progress:
756 progress.complete() 754 progress.complete()
757 755
758 class cgpacker(object): 756 class cgpacker(object):
759 def __init__(self, repo, filematcher, version, allowreorder, 757 def __init__(self, repo, filematcher, version,
760 builddeltaheader, manifestsend, 758 builddeltaheader, manifestsend,
761 forcedeltaparentprev=False, 759 forcedeltaparentprev=False,
762 bundlecaps=None, ellipses=False, 760 bundlecaps=None, ellipses=False,
763 shallow=False, ellipsisroots=None, fullnodes=None): 761 shallow=False, ellipsisroots=None, fullnodes=None):
764 """Given a source repo, construct a bundler. 762 """Given a source repo, construct a bundler.
765 763
766 filematcher is a matcher that matches on files to include in the 764 filematcher is a matcher that matches on files to include in the
767 changegroup. Used to facilitate sparse changegroups. 765 changegroup. Used to facilitate sparse changegroups.
768
769 allowreorder controls whether reordering of revisions is allowed.
770 This value is used when ``bundle.reorder`` is ``auto`` or isn't
771 set.
772 766
773 forcedeltaparentprev indicates whether delta parents must be against 767 forcedeltaparentprev indicates whether delta parents must be against
774 the previous revision in a delta group. This should only be used for 768 the previous revision in a delta group. This should only be used for
775 compatibility with changegroup version 1. 769 compatibility with changegroup version 1.
776 770
811 self._fullclnodes = fullnodes 805 self._fullclnodes = fullnodes
812 806
813 # Maps ellipsis revs to their roots at the changelog level. 807 # Maps ellipsis revs to their roots at the changelog level.
814 self._precomputedellipsis = ellipsisroots 808 self._precomputedellipsis = ellipsisroots
815 809
816 # experimental config: bundle.reorder
817 reorder = repo.ui.config('bundle', 'reorder')
818 if reorder == 'auto':
819 self._reorder = allowreorder
820 else:
821 self._reorder = stringutil.parsebool(reorder)
822
823 self._repo = repo 810 self._repo = repo
824 811
825 if self._repo.ui.verbose and not self._repo.ui.debugflag: 812 if self._repo.ui.verbose and not self._repo.ui.debugflag:
826 self._verbosenote = self._repo.ui.note 813 self._verbosenote = self._repo.ui.note
827 else: 814 else:
860 # We need to make sure that the linkrev in the changegroup refers to 847 # We need to make sure that the linkrev in the changegroup refers to
861 # the first changeset that introduced the manifest or file revision. 848 # the first changeset that introduced the manifest or file revision.
862 # The fastpath is usually safer than the slowpath, because the filelogs 849 # The fastpath is usually safer than the slowpath, because the filelogs
863 # are walked in revlog order. 850 # are walked in revlog order.
864 # 851 #
865 # When taking the slowpath with reorder=None and the manifest revlog 852 # When taking the slowpath when the manifest revlog uses generaldelta,
866 # uses generaldelta, the manifest may be walked in the "wrong" order. 853 # the manifest may be walked in the "wrong" order. Without 'clrevorder',
867 # Without 'clrevorder', we would get an incorrect linkrev (see fix in 854 # we would get an incorrect linkrev (see fix in cc0ff93d0c0c).
868 # cc0ff93d0c0c).
869 # 855 #
870 # When taking the fastpath, we are only vulnerable to reordering 856 # When taking the fastpath, we are only vulnerable to reordering
871 # of the changelog itself. The changelog never uses generaldelta, so 857 # of the changelog itself. The changelog never uses generaldelta and is
872 # it is only reordered when reorder=True. To handle this case, we 858 # never reordered. To handle this case, we simply take the slowpath,
873 # simply take the slowpath, which already has the 'clrevorder' logic. 859 # which already has the 'clrevorder' logic. This was also fixed in
874 # This was also fixed in cc0ff93d0c0c. 860 # cc0ff93d0c0c.
875 fastpathlinkrev = fastpathlinkrev and not self._reorder 861
876 # Treemanifests don't work correctly with fastpathlinkrev 862 # Treemanifests don't work correctly with fastpathlinkrev
877 # either, because we don't discover which directory nodes to 863 # either, because we don't discover which directory nodes to
878 # send along with files. This could probably be fixed. 864 # send along with files. This could probably be fixed.
879 fastpathlinkrev = fastpathlinkrev and ( 865 fastpathlinkrev = fastpathlinkrev and (
880 'treemanifest' not in repo.requirements) 866 'treemanifest' not in repo.requirements)
1001 } 987 }
1002 988
1003 gen = deltagroup( 989 gen = deltagroup(
1004 self._repo, cl, nodes, True, lookupcl, 990 self._repo, cl, nodes, True, lookupcl,
1005 self._forcedeltaparentprev, 991 self._forcedeltaparentprev,
1006 # Reorder settings are currently ignored for changelog.
1007 True,
1008 ellipses=self._ellipses, 992 ellipses=self._ellipses,
1009 topic=_('changesets'), 993 topic=_('changesets'),
1010 clrevtolocalrev={}, 994 clrevtolocalrev={},
1011 fullclnodes=self._fullclnodes, 995 fullclnodes=self._fullclnodes,
1012 precomputedellipsis=self._precomputedellipsis) 996 precomputedellipsis=self._precomputedellipsis)
1085 1069
1086 lookupfn = makelookupmflinknode(tree, nodes) 1070 lookupfn = makelookupmflinknode(tree, nodes)
1087 1071
1088 deltas = deltagroup( 1072 deltas = deltagroup(
1089 self._repo, store, prunednodes, False, lookupfn, 1073 self._repo, store, prunednodes, False, lookupfn,
1090 self._forcedeltaparentprev, self._reorder, 1074 self._forcedeltaparentprev,
1091 ellipses=self._ellipses, 1075 ellipses=self._ellipses,
1092 topic=_('manifests'), 1076 topic=_('manifests'),
1093 clrevtolocalrev=clrevtolocalrev, 1077 clrevtolocalrev=clrevtolocalrev,
1094 fullclnodes=self._fullclnodes, 1078 fullclnodes=self._fullclnodes,
1095 precomputedellipsis=self._precomputedellipsis) 1079 precomputedellipsis=self._precomputedellipsis)
1182 1166
1183 progress.update(i + 1, item=fname) 1167 progress.update(i + 1, item=fname)
1184 1168
1185 deltas = deltagroup( 1169 deltas = deltagroup(
1186 self._repo, filerevlog, filenodes, False, lookupfilelog, 1170 self._repo, filerevlog, filenodes, False, lookupfilelog,
1187 self._forcedeltaparentprev, self._reorder, 1171 self._forcedeltaparentprev,
1188 ellipses=self._ellipses, 1172 ellipses=self._ellipses,
1189 clrevtolocalrev=clrevtolocalrev, 1173 clrevtolocalrev=clrevtolocalrev,
1190 fullclnodes=self._fullclnodes, 1174 fullclnodes=self._fullclnodes,
1191 precomputedellipsis=self._precomputedellipsis) 1175 precomputedellipsis=self._precomputedellipsis)
1192 1176
1198 shallow=False, ellipsisroots=None, fullnodes=None): 1182 shallow=False, ellipsisroots=None, fullnodes=None):
1199 builddeltaheader = lambda d: _CHANGEGROUPV1_DELTA_HEADER.pack( 1183 builddeltaheader = lambda d: _CHANGEGROUPV1_DELTA_HEADER.pack(
1200 d.node, d.p1node, d.p2node, d.linknode) 1184 d.node, d.p1node, d.p2node, d.linknode)
1201 1185
1202 return cgpacker(repo, filematcher, b'01', 1186 return cgpacker(repo, filematcher, b'01',
1203 allowreorder=None,
1204 builddeltaheader=builddeltaheader, 1187 builddeltaheader=builddeltaheader,
1205 manifestsend=b'', 1188 manifestsend=b'',
1206 forcedeltaparentprev=True, 1189 forcedeltaparentprev=True,
1207 bundlecaps=bundlecaps, 1190 bundlecaps=bundlecaps,
1208 ellipses=ellipses, 1191 ellipses=ellipses,
1213 def _makecg2packer(repo, filematcher, bundlecaps, ellipses=False, 1196 def _makecg2packer(repo, filematcher, bundlecaps, ellipses=False,
1214 shallow=False, ellipsisroots=None, fullnodes=None): 1197 shallow=False, ellipsisroots=None, fullnodes=None):
1215 builddeltaheader = lambda d: _CHANGEGROUPV2_DELTA_HEADER.pack( 1198 builddeltaheader = lambda d: _CHANGEGROUPV2_DELTA_HEADER.pack(
1216 d.node, d.p1node, d.p2node, d.basenode, d.linknode) 1199 d.node, d.p1node, d.p2node, d.basenode, d.linknode)
1217 1200
1218 # Since generaldelta is directly supported by cg2, reordering
1219 # generally doesn't help, so we disable it by default (treating
1220 # bundle.reorder=auto just like bundle.reorder=False).
1221 return cgpacker(repo, filematcher, b'02', 1201 return cgpacker(repo, filematcher, b'02',
1222 allowreorder=False,
1223 builddeltaheader=builddeltaheader, 1202 builddeltaheader=builddeltaheader,
1224 manifestsend=b'', 1203 manifestsend=b'',
1225 bundlecaps=bundlecaps, 1204 bundlecaps=bundlecaps,
1226 ellipses=ellipses, 1205 ellipses=ellipses,
1227 shallow=shallow, 1206 shallow=shallow,
1232 shallow=False, ellipsisroots=None, fullnodes=None): 1211 shallow=False, ellipsisroots=None, fullnodes=None):
1233 builddeltaheader = lambda d: _CHANGEGROUPV3_DELTA_HEADER.pack( 1212 builddeltaheader = lambda d: _CHANGEGROUPV3_DELTA_HEADER.pack(
1234 d.node, d.p1node, d.p2node, d.basenode, d.linknode, d.flags) 1213 d.node, d.p1node, d.p2node, d.basenode, d.linknode, d.flags)
1235 1214
1236 return cgpacker(repo, filematcher, b'03', 1215 return cgpacker(repo, filematcher, b'03',
1237 allowreorder=False,
1238 builddeltaheader=builddeltaheader, 1216 builddeltaheader=builddeltaheader,
1239 manifestsend=closechunk(), 1217 manifestsend=closechunk(),
1240 bundlecaps=bundlecaps, 1218 bundlecaps=bundlecaps,
1241 ellipses=ellipses, 1219 ellipses=ellipses,
1242 shallow=shallow, 1220 shallow=shallow,