Gregory Szorc <gregory.szorc@gmail.com> [Tue, 28 Aug 2018 18:19:23 -0700] rev 39632
wireprotov2: add phases to "changesetdata" command
This commit teaches the "changesetdata" wire protocol command
to emit the phase state for each changeset.
This is a different approach from existing phase transfer in a
few ways. Previously, if there are no new revisions (or we're
not using bundle2), we perform a "listkeys" request to retrieve
phase heads. And when revision data is being transferred
with bundle2, phases data is encoded in a standalone bundle2 part.
In both cases, phases data is logically decoupled from the changeset
data and is encountered/applied after changeset revision data
is received.
The new wire protocol purposefully tries to more tightly associate
changeset metadata (phases, bookmarks, obsolescence markers, etc)
with the changeset revision and index data itself, rather than
have it live as a separate entity that must be fetched and
processed separately. I reckon that one reason we didn't do this
before was it was difficult to add new data types/fields without
breaking existing consumers. By using CBOR maps to transfer
changeset data and putting clients in control of what fields are
requested / present in those maps, we can easily add additional
changeset data while maintaining backwards compatibility. I believe
this to be a superior approach to the problem.
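A minimal sketch of why map-based records stay backwards compatible, assuming plain Python dicts stand in for decoded CBOR maps. The key names and the consume() helper are illustrative assumptions, not the actual client code:

```python
# Keys an older client was written to understand (assumed names).
KNOWN_FIELDS = {b"node", b"parents"}


def consume(record):
    """Keep only the keys this client understands; silently skip the rest."""
    return {key: value for key, value in record.items() if key in KNOWN_FIELDS}


# A record as an older server might emit it.
old_style = {b"node": b"\x11" * 20, b"parents": [b"\x00" * 20, b"\x00" * 20]}

# A newer server adds a key; the older client still parses the record.
new_style = dict(old_style)
new_style[b"phase"] = b"public"
```

Because the client picks keys out of a map rather than parsing fixed positions, an extra key does not disturb it.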
That being said, for performance reasons, we may need to resort
to alternative mechanisms for transferring data like phases. But
for now, I think giving the wire protocol the ability to transfer
changeset metadata next to the changeset itself is a powerful feature
because it is a raw, changeset-centric data API. And if you build
simple APIs for accessing the fundamental units of repository data,
you enable client-side experimentation (partial clone, etc). If it
turns out that we need specialized APIs or mechanisms for transferring
data like phases, we can build in those APIs later. For now, I'd
like to see how far we can get on simple APIs.
It's worth noting that when phase data is being requested, the
server will also emit changeset records for nodes in the bases
specified by the "noderange" argument. This is to ensure that
phase-only updates for nodes the client has are available to the
client, even if no new changesets will be transferred.
Differential Revision: https://phab.mercurial-scm.org/D4483
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 10:01:36 -0700] rev 39631
exchangev2: fetch changeset revisions
All Mercurial repository data is derived from changesets:
you can't do anything unless you have changesets. Therefore,
it makes sense for changesets to be the first piece of data
that we transfer as part of pull.
To do this, we call our new "changesetdata" command, requesting
parents and revision data. This gives us all the data that a
changegroup delta group would give us. We simply normalize
this data into what addgroup() expects and call that API on
the changelog to bulk insert revisions into the changelog.
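The normalization step might look roughly like the sketch below. It assumes addgroup() consumes 7-tuples of (node, p1, p2, linknode, delta base, delta, flags) and that a fulltext can be expressed as a delta against the null revision by prefixing a trivial diff header. The helper names and the (metadata, fulltext) input shape are hypothetical:

```python
import struct

NULLID = b"\x00" * 20


def trivial_diff_header(length):
    # Header for a delta that inserts `length` bytes at offset 0 of an
    # empty base (start=0, end=0, new length).
    return struct.pack(">lll", 0, 0, length) if length else b""


def to_delta_tuples(records):
    """Yield one 7-tuple per (metadata, fulltext) pair."""
    for meta, fulltext in records:
        p1, p2 = meta[b"parents"]
        yield (
            meta[b"node"],
            p1,
            p2,
            meta[b"node"],  # linknode: a changeset links to itself
            NULLID,  # delta base: full revisions, so no real base
            trivial_diff_header(len(fulltext)) + fulltext,
            0,  # revision flags, unused in this sketch
        )
```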
Code in this commit is heavily borrowed from
changegroup.cg1unpacker.apply().
Differential Revision: https://phab.mercurial-scm.org/D4482
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 10:01:16 -0700] rev 39630
wireprotov2: define and implement "changesetdata" command
This commit introduces the "changesetdata" wire protocol command.
The role of the command is to expose data associated with changelog
revisions, including the raw revision data itself.
This command is the first piece of a new clone/pull strategy that
is built on top of domain-specific commands for data retrieval.
Instead of a monolithic "getbundle" command that transfers all of the
things, we'll be introducing commands for fetching specific pieces
of data.
Since the changeset is the fundamental unit from which we derive
pointers to other data (manifests, file nodes, etc), it makes sense
to start reimplementing pull with this data.
The command accepts as arguments a set of root and head revisions
defining the changesets that should be fetched as well as an explicit
list of nodes. By default, the command returns only the node values:
the client must explicitly request additional fields be added to the
response. Current supported fields are the list of parent nodes and
the revision fulltext.
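As a simplified sketch of the opt-in field behaviour described above: the field names b"parents" and b"revision" and the emit_changeset() helper are assumptions made for illustration, and the real command may frame revision data differently on the wire.

```python
def emit_changeset(node, parents, fulltext, fields=frozenset()):
    """Build one response record, adding only the fields the client asked for."""
    record = {b"node": node}  # the node is always present
    if b"parents" in fields:
        record[b"parents"] = parents
    if b"revision" in fields:
        record[b"revision"] = fulltext
    return record
```

A client that passes no fields gets node values only; asking for more data is always explicit.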
My plan is to eventually add support for transferring other data
associated with changesets, including phases, bookmarks, obsolescence
markers, etc. Since the response format is CBOR, we'll be able to add
this data into the response object relatively easily (it should be
as simple as adding a key in a map).
The documentation captures a number of TODO items. Some of these may
require BC breaking changes. That's fine: wire protocol v2 is still
highly experimental.
Differential Revision: https://phab.mercurial-scm.org/D4481
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 09:58:23 -0700] rev 39629
exchangev2: start to implement pull with wire protocol v2
Wire protocol version 2 will take a substantially different
approach to exchange than version 1 (at least as far as pulling
is concerned).
This commit establishes a new exchangev2 module for holding
code related to exchange using wire protocol v2. I could have
added things to the existing exchange module. But it is already
quite big. And doing things inline is out of the question because
the existing code is already littered with conditional code
for various states of support for the existing wire protocol
as it evolved over 10+ years. A new module gives us a chance
to make a clean break.
This approach does mean we'll end up writing some duplicate
code. And there's a significant chance we'll miss functionality
as code is ported. The plan is to eventually add #testcase's
to existing tests so the new wire protocol is tested side-by-side
with the existing one. This will hopefully tease out any
features that weren't ported properly. But before we get there,
we need to build up support for the new exchange methods.
Our journey towards implementing a new exchange begins with pulling.
And pulling begins with discovery.
The discovery code added to exchangev2 is heavily drawn from
the following functions:
* exchange._pulldiscoverychangegroup
* discovery.findcommonincoming
For now, we build on top of existing discovery mechanisms. The
new wire protocol should be capable of doing things more efficiently.
But I'd rather defer on this problem.
To foster the transition, we invent a fake capability on the HTTPv2
peer and have the main pull code in exchange.py call into exchangev2
when the new wire protocol is being used.
Differential Revision: https://phab.mercurial-scm.org/D4480
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 21 Aug 2018 15:33:11 -0700] rev 39628
httppeer: expose capabilities for each command
This will help code using peers to sniff out exactly what servers
support.
Differential Revision: https://phab.mercurial-scm.org/D4436
spectral <spectral@google.com> [Thu, 13 Sep 2018 22:48:27 -0700] rev 39627
narrow: intersect provided matcher with narrowmatcher in `hg diff`
This provides significant speedups when running diff, and no change in behavior
that I'm aware of (or that the tests found). I tested with a repo that I started
using narrow in after it was created and attempted to run `hg diff -c .` and
similar commands in it on a commit that had files not in the narrowspec.
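To illustrate the idea (not the actual matcher classes), here is a small sketch where matchers are plain predicates on paths. The prefix_matcher() and intersect() helpers are hypothetical stand-ins for the narrowspec matcher and the matcher intersection:

```python
def prefix_matcher(prefixes):
    """Return a predicate that accepts paths under any of the given prefixes."""
    prefixes = tuple(prefixes)
    return lambda path: path.startswith(prefixes)


def intersect(m1, m2):
    """Return a predicate that accepts a path only if both matchers accept it."""
    return lambda path: m1(path) and m2(path)


# The narrowspec only covers src/; the user asked for Python files.
narrow = prefix_matcher(["src/"])
user = lambda path: path.endswith(".py")
both = intersect(user, narrow)
```

With the intersection in place, files outside the narrowspec are rejected up front instead of being walked and discarded later.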
Timing numbers below, using a similar setup as my previous commits.
before=9db85644, m-u is mozilla-unified at eb39298e432d (flatmanifest) and
0553b7f29eaf (treemanifest). l-d-r is a repo simulating a situation I've
encountered where there's one directory with 30k+ subdirectories. N means
narrow, T means treemanifest. The narrowspec is pretty small when in use, and
importantly the narrowspec is applied *after* doing the initial checkout
(without narrowing), so all of these files exist in the filesystem, which is not
normally the case if someone has been using narrow for the entire life of the
clone.
Anything less than a 5% difference in performance is most likely noise.
diff --git:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 1.292 s +- 0.009 s | 1.295 s +- 0.010 s | 100.2%
m-u | | x | 1.296 s +- 0.042 s | 1.299 s +- 0.026 s | 100.2%
m-u | x | | 1.292 s +- 0.010 s | 1.297 s +- 0.021 s | 100.4%
m-u | x | x | 84.2 ms +- 1.2 ms | 83.6 ms +- 0.2 ms | 99.3%
l-d-r | | | 188.7 ms +- 2.7 ms | 188.8 ms +- 2.0 ms | 100.1%
l-d-r | | x | 189.9 ms +- 1.5 ms | 189.4 ms +- 1.2 ms | 99.7%
l-d-r | x | | 97.1 ms +- 1.0 ms | 87.1 ms +- 1.0 ms | 89.7% <--
l-d-r | x | x | 96.9 ms +- 0.8 ms | 87.2 ms +- 0.7 ms | 90.0% <--
diff -c . --git:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 231.6 ms +- 3.1 ms | 228.9 ms +- 1.6 ms | 98.8%
m-u | | x | 150.5 ms +- 1.7 ms | 150.7 ms +- 1.4 ms | 100.1%
m-u | x | | 233.7 ms +- 2.4 ms | 232.2 ms +- 1.9 ms | 99.4%
m-u | x | x | 126.1 ms +- 1.2 ms | 126.8 ms +- 1.2 ms | 100.6%
l-d-r | | | 82.1 ms +- 2.0 ms | 81.8 ms +- 1.4 ms | 99.6%
l-d-r | | x | 3.732 s +- 0.020 s | 3.746 s +- 0.027 s | 100.4%
l-d-r | x | | 83.1 ms +- 0.8 ms | 107.6 ms +- 2.4 ms | 129.5% <--
l-d-r | x | x | 758.2 ms +- 38.8 ms | 188.5 ms +- 1.8 ms | 24.9% <--
rebase -r . --keep -d .^^:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 5.532 s +- 0.087 s | 5.496 s +- 0.016 s | 99.3%
m-u | | x | 5.554 s +- 0.061 s | 5.532 s +- 0.013 s | 99.6%
m-u | x | | 5.602 s +- 0.134 s | 5.508 s +- 0.035 s | 98.3%
m-u | x | x | 582.2 ms +- 15.2 ms | 572.9 ms +- 12.0 ms | 98.4%
l-d-r | | | 629.5 ms +- 12.3 ms | 622.5 ms +- 7.3 ms | 98.9%
l-d-r | | x | 6.173 s +- 0.062 s | 6.185 s +- 0.076 s | 100.2%
l-d-r | x | | 274.5 ms +- 10.0 ms | 272.1 ms +- 6.2 ms | 99.1%
l-d-r | x | x | 4.835 s +- 0.056 s | 4.826 s +- 0.034 s | 99.8%
status --change . --copies:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 214.4 ms +- 1.4 ms | 212.2 ms +- 1.7 ms | 99.0%
m-u | | x | 130.9 ms +- 1.2 ms | 131.7 ms +- 1.1 ms | 100.6%
m-u | x | | 215.0 ms +- 2.1 ms | 214.9 ms +- 2.7 ms | 100.0%
m-u | x | x | 109.5 ms +- 2.3 ms | 107.8 ms +- 0.9 ms | 98.4%
l-d-r | | | 79.6 ms +- 0.9 ms | 79.8 ms +- 1.6 ms | 100.3%
l-d-r | | x | 3.799 s +- 0.037 s | 3.928 s +- 0.021 s | 103.4% <--?
l-d-r | x | | 82.7 ms +- 0.7 ms | 83.2 ms +- 1.0 ms | 100.6%
l-d-r | x | x | 746.8 ms +- 6.1 ms | 739.0 ms +- 4.2 ms | 99.0%
status --copies:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 1.884 s +- 0.012 s | 1.885 s +- 0.013 s | 100.1%
m-u | | x | 1.897 s +- 0.027 s | 1.909 s +- 0.077 s | 100.6%
m-u | x | | 1.886 s +- 0.021 s | 1.891 s +- 0.030 s | 100.3%
m-u | x | x | 92.0 ms +- 0.7 ms | 92.4 ms +- 0.4 ms | 100.4%
l-d-r | | | 570.3 ms +- 18.7 ms | 552.2 ms +- 4.5 ms | 96.8%
l-d-r | | x | 568.9 ms +- 16.1 ms | 567.2 ms +- 11.9 ms | 99.7%
l-d-r | x | | 171.1 ms +- 2.5 ms | 170.4 ms +- 1.2 ms | 99.6%
l-d-r | x | x | 171.6 ms +- 3.4 ms | 171.5 ms +- 1.7 ms | 99.9%
update $rev^; ~/src/hg/hg{hg}/hg update $rev:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 3.107 s +- 0.017 s | 3.116 s +- 0.012 s | 100.3%
m-u | | x | 2.943 s +- 0.010 s | 2.945 s +- 0.019 s | 100.1%
m-u | x | | 3.116 s +- 0.033 s | 3.118 s +- 0.027 s | 100.1%
m-u | x | x | 318.5 ms +- 2.7 ms | 320.8 ms +- 4.8 ms | 100.7%
l-d-r | | | 428.9 ms +- 4.4 ms | 429.5 ms +- 4.0 ms | 100.1%
l-d-r | | x | 9.593 s +- 0.081 s | 9.869 s +- 0.043 s | 102.9%
l-d-r | x | | 253.2 ms +- 3.6 ms | 254.0 ms +- 2.8 ms | 100.3%
l-d-r | x | x | 1.613 s +- 0.009 s | 1.630 s +- 0.017 s | 101.1%
Differential Revision: https://phab.mercurial-scm.org/D4587
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Sep 2018 12:15:02 +0900] rev 39626
identify: change {parents} to a list of nodes (BC)
This is a part of the name unification. {parents} is a list of nodes in
"hg log -Tjson" output. Since {rev} can be computed from (repo, node) pair,
we no longer need to put it to provide {rev} to user templates.
https://www.mercurial-scm.org/wiki/GenericTemplatingPlan#Dictionary
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Sep 2018 12:09:22 +0900] rev 39625
identify: use fm.hexfunc thoroughly
This fixes the length of {id} in JSON and template outputs.
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Sep 2018 15:52:18 +0900] rev 39624
formatter: replace contexthint() with demand loading of ctx object
And pass in repo instead to resolve ctx from (repo, node) pair.
Yuya Nishihara <yuya@tcha.org> [Thu, 07 Jun 2018 21:48:11 +0900] rev 39623
formatter: populate ctx from repo and node value
This will basically replace the fm.contexthint() API. I originally thought
this would be too complicated, and I wrote 8399438bc7ef "formatter: provide
hint of context keys required by template" because of that. However, I had
to add a similar mechanism for fctx templates, and the overall machinery
became way simpler than my original patch.
The test output changed slightly because {author} is no longer available in
the {manifest} context, which isn't what this test was targeting.
Augie Fackler <augie@google.com> [Fri, 14 Sep 2018 18:18:46 -0400] rev 39622
merge with stable
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:37:20 +0300] rev 39621
py3: call hgweb.hgweb() with bytes values
# skip-blame because just b'' prefixes
I believe this should fix some tests.
Differential Revision: https://phab.mercurial-scm.org/D4594
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:24:05 +0300] rev 39620
py3: use '%d' for integers instead of '%s'
Differential Revision: https://phab.mercurial-scm.org/D4593
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:17:56 +0300] rev 39619
py3: use "%f" for floats instead of "%s"
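A short illustration of why this change and the '%d' change next to it matter, assuming Python 3.5 or later, where bytes support %-formatting. The function names are made up for the example:

```python
def format_count(n):
    # %d accepts an int when formatting bytes.
    return b"%d revisions" % n


def format_elapsed(seconds):
    # %f accepts a float when formatting bytes.
    return b"%f" % seconds


def percent_s_rejects_int():
    # On Python 3, %s in a bytes format requires a bytes-like object,
    # so passing an int raises TypeError.
    try:
        b"%s" % 1
    except TypeError:
        return True
    return False
```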
Differential Revision: https://phab.mercurial-scm.org/D4592
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:01:52 +0300] rev 39618
py3: suppress the return value from .write() call
Differential Revision: https://phab.mercurial-scm.org/D4591
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:01:20 +0300] rev 39617
py3: add b'' prefixes in tests/test-diff-color.t
# skip-blame because just b'' prefixes
Differential Revision: https://phab.mercurial-scm.org/D4590
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 14 Sep 2018 23:59:41 +0300] rev 39616
py3: slice through bytes to prevent getting ascii value
I still don't know why python-dev thought it was a nice idea to do this.
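For reference, a tiny sketch of the behaviour being worked around. The first_byte() helper is illustrative only:

```python
def first_byte(data):
    # Slicing keeps the bytes type on both Python 2 and Python 3.
    return data[0:1]


# On Python 3, indexing a bytes object yields an int instead.
indexed = b"abc"[0]
```

On Python 3, indexed is the integer 97, while first_byte(b"abc") is b"a".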
Differential Revision: https://phab.mercurial-scm.org/D4589
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Thu, 13 Sep 2018 16:22:53 -0400] rev 39615
censor: use a reasonable amount of memory
Before this change, trying to censor some random revision uses an ever
increasing amount of memory (I stopped at 20GB, but it was by no means
finished), presumably because these contexts have a lot of
information that is kept alive.
After this change, the memory usage plateaus quickly.
Differential Revision: https://phab.mercurial-scm.org/D4582
Yuya Nishihara <yuya@tcha.org> [Fri, 14 Sep 2018 22:25:44 +0900] rev 39614
help: add internals.wireprotocolrpc to the table
Yuya Nishihara <yuya@tcha.org> [Fri, 14 Sep 2018 22:23:02 +0900] rev 39613
setup: exclude vendored futures package on Python 3
The vendored futures package can't live on Python 3.
Augie Fackler <augie@google.com> [Thu, 13 Sep 2018 11:08:08 -0400] rev 39612
py3: whitelist another passing test
Differential Revision: https://phab.mercurial-scm.org/D4562
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 00:42:25 -0400] rev 39611
py3: prevent the win32 ctype _fields_ from being transformed to bytes
Otherwise, any hg invocation dies with
TypeError: '_fields_' must be a sequence of (name, C type) pairs
# skip-blame just a r prefix
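A small sketch of the failure mode, assuming Python 3. The Point class and the helper are invented for the example and are not the win32.py definitions:

```python
import ctypes


class Point(ctypes.Structure):
    # Field names must be native str objects on Python 3.
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


def bytes_field_names_fail():
    # Building a Structure whose field names are bytes raises TypeError.
    try:
        type("Bad", (ctypes.Structure,), {"_fields_": [(b"x", ctypes.c_int)]})
    except TypeError:
        return True
    return False
```

This is why the names need to stay native strings rather than being rewritten to bytes literals by the source translator.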
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 17:32:20 -0400] rev 39610
cext: fix warnings when building for py3 on Windows
MSVC++ 14 now has standard int types that don't need to be redefined (I didn't
go back to see when they came along since the build system wants either 2008 or
2015), but doesn't have ssize_t. The FILE pointer in posixfile is only used on
python2.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 12:43:50 -0400] rev 39609
cext: stop preprocessing a partial function call
MSVC++ 14 yelled:
mercurial/cext/revlog.c(1913): fatal error C1057: unexpected end of file in
macro expansion
At this point, the C extensions build (with warnings), and it dies in win32.py
because the `_fields_` strings in the ctypes classes are being converted to
bytes by the source translator.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 12:37:32 -0400] rev 39608
py3: add b'' to some setup.py strings for Windows
These were things found trying to do `make PYTHON="py -3" local`. The following
is dumped out, before dying while compiling the C extensions:
C:\Program Files\Python37\lib\site-packages\setuptools\dist.py:406: UserWarning:
The version specified (b'4.7.1') is an invalid version, this may not work as
expected with newer versions of setuptools, pip, and PyPI. Please see PEP 440
for more details.
"details." % self.metadata.version
running build_py
byte-compiling .\mercurial\thirdparty\concurrent\futures\_base.py to _base.cpython-37.pyc
File "mercurial\thirdparty\concurrent\futures\_base.py", line 416
raise exception_type, self._exception, self._traceback
^
SyntaxError: invalid syntax
# skip-blame since these are just converting to bytes literals
Augie Fackler <augie@google.com> [Thu, 13 Sep 2018 18:09:22 -0400] rev 39607
dagop: fix typo spotted while doing unrelated investigation
Differential Revision: https://phab.mercurial-scm.org/D4584
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 19:00:46 -0700] rev 39606
hg: don't reuse repo instance after unshare()
Unsharing a repository is a pretty invasive procedure and fundamentally
changes the behavior of the repository.
Currently, hg.unshare() calls into localrepository.__init__ to
re-initialize the repository instance. This is a bit hacky. And
future commits that refactor how localrepository instances are
constructed will make this difficult to support.
This commit changes unshare() so it constructs a new repo instance
once the unshare I/O has completed. It then poisons the old repo
instance so any further use will result in error.
Surprisingly, nothing in core appears to access a repo instance
after it has been unshared!
.. api::
``hg.unshare()`` now poisons the repo instance so it can't be used.
It also returns a new repo instance suitable for interacting with
the unshared repository.
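One way poisoning can work is to swap the instance's class for one that refuses attribute access. The following is a generic sketch of that technique with invented names (PoisonedRepo, PoisonedError, FakeRepo), not the code used by hg.unshare():

```python
class PoisonedError(Exception):
    pass


class PoisonedRepo(object):
    def __getattribute__(self, name):
        # Still allow close() so cleanup code does not blow up.
        if name == "close":
            return object.__getattribute__(self, name)
        raise PoisonedError("repo instance should not be used after unshare")

    def close(self):
        pass


def poison(repo):
    # Replace the instance's class in place; existing references to the
    # object now hit PoisonedRepo.__getattribute__.
    object.__setattr__(repo, "__class__", PoisonedRepo)


class FakeRepo(object):
    def __init__(self):
        self.root = "/tmp/repo"

    def close(self):
        pass
```

Any code still holding the old object fails loudly on first use instead of silently operating on stale state.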
Differential Revision: https://phab.mercurial-scm.org/D4557
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 20:06:39 -0700] rev 39605
unionrepo: dynamically create repository type from base repository
This is basically the same thing we just did for bundlerepo except
for union repositories.
.. api::
``unionrepo.unionrepository()`` is no longer usable on its own.
To instantiate an instance, call ``unionrepo.instance()`` or
``unionrepo.makeunionrepository()``.
Differential Revision: https://phab.mercurial-scm.org/D4556
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 19:50:07 -0700] rev 39604
bundlerepo: dynamically create repository type from base repository
Previously, bundlerepository inherited from localrepo.localrepository.
You simply instantiated a bundlerepository and its __init__ called
localrepo.localrepository.__init__. Things were simple.
Unfortunately, this strategy is limiting because it assumes that
the base repository is a localrepository instance. And it assumes
various properties of localrepository, such as the arguments its
__init__ takes. And it prevents us from changing behavior of
localrepository.__init__ without also having to change derived classes.
Previous and ongoing work to abstract storage revealed these
limitations.
This commit changes the initialization strategy of bundle repositories
to dynamically create a type to represent the repository. Instead of
a static type, we instantiate a new local repo instance via
localrepo.instance(). We then combine its __class__ with
bundlerepository to produce a new type. This ensures that no matter
how localrepo.instance() decides to create a repository object, we
can derive a bundle repo object from it. i.e. localrepo.instance()
could return a type that isn't a localrepository and it would "just
work."
Well, it would "just work" if bundlerepository's custom implementations
only accessed attributes in the documented repository interface. I'm
pretty sure it violates the interface contract in a handful of
places. But we can worry about that another day. This change gets us
closer to doing more clever things around instantiating repository
instances without having to worry about teaching bundlerepository about
them.
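The dynamic-type approach can be sketched as follows. BundleMixin, BaseRepo and make_bundle_repo() are hypothetical stand-ins for bundlerepository, whatever localrepo.instance() returns, and the factory function:

```python
class BundleMixin(object):
    """Behaviour layered on top of whatever the base repo class is."""

    def bundle_only(self):
        return "from bundle"


class BaseRepo(object):
    def __init__(self, path):
        self.path = path

    def kind(self):
        return "base"


def make_bundle_repo(base):
    # Derive a new type from the mixin plus the base instance's own class,
    # then retarget the existing instance at that type.
    derived = type("derivedbundlerepo", (BundleMixin, base.__class__), {})
    base.__class__ = derived
    return base
```

Because the derived type is computed from base.__class__ at run time, the mixin does not need to know which concrete repository class it is combined with.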
.. api::
``bundlerepo.bundlerepository`` is no longer usable on its own.
The class is combined with the class of the base repository it is
associated with at run-time.
New bundlerepository instances can be obtained by calling
``bundlerepo.instance()`` or ``bundlerepo.makebundlerepository()``.
Differential Revision: https://phab.mercurial-scm.org/D4555
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 19:16:32 -0700] rev 39603
bundlerepo: factor out code for instantiating a bundle repository
This code will soon become a bit more complicated. So extract to its
own function.
And change both instantiators of bundlerepository to use it.
Differential Revision: https://phab.mercurial-scm.org/D4554