Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 19:58:01 +0530] rev 39673
py3: use codecs.encode() to encode in rot-13 encoding
The other occurrence will need some more love, as description is bytes by default
and we need to decode it and then encode it.
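On Python 3 the `rot13` codec is a str-to-str transform, so `codecs.encode()` accepts text but not bytes. The sketch below shows the decode-then-encode round trip the message describes. It uses Latin-1 as the intermediate decoding because every byte value maps to exactly one code point, which makes the round trip lossless. The helper name `rot13bytes` is hypothetical and is not Mercurial's code.

```python
import codecs


def rot13bytes(data: bytes) -> bytes:
    """Apply rot-13 to a bytes value (hypothetical helper)."""
    # The rot13 codec only operates on str in Python 3, so decode first.
    # Latin-1 maps each byte to one code point, so non-ASCII bytes pass
    # through unchanged (rot13 only rotates ASCII letters).
    text = data.decode("latin-1")
    return codecs.encode(text, "rot13").encode("latin-1")


print(codecs.encode("hello", "rot13"))  # uryyb
print(rot13bytes(b"Hello, hg"))  # b'Uryyb, ut'
```

Applying the helper twice returns the original bytes, since rot-13 is its own inverse.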
Differential Revision: https://phab.mercurial-scm.org/D4608
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 19:18:15 +0530] rev 39672
py3: add two passing tests to whitelist found by buildbot
The buildbot found these two new passing tests on Python 3.
Differential Revision: https://phab.mercurial-scm.org/D4607
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:36:43 -0400] rev 39671
phabricator: mark extension as experimental for now
I don't want us to commit to this having a stable interface just yet.
Differential Revision: https://phab.mercurial-scm.org/D4605
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:16:31 -0400] rev 39670
phabricator: fix templating bug by using hybriddict
Differential Revision: https://phab.mercurial-scm.org/D4604
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:13:37 -0400] rev 39669
phabricator: add tests of templatekeyword
Having tests is paying off: I found a bug and now it'll be easy to
fix!
Differential Revision: https://phab.mercurial-scm.org/D4603
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:46:17 -0400] rev 39668
phabricator: move extension from contrib to hgext
It's well-enough tested now, and widely enough used, that I think we
should ship it.
Differential Revision: https://phab.mercurial-scm.org/D4602
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:50:21 -0400] rev 39667
tests: add some basic tests of phabricator interactions
This uses the vcr library to avoid hitting phabricator on every test
execution. In order to generate new recordings (vcr calls them
cassettes), just remove the appropriate json file, and the test will
regenerate it. It's not my favorite way to test things, but it'll let
us have test coverage on the phabricator extension that'll make it
resilient to refactors in core and let us move it to hgext.
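The record-or-replay workflow can be sketched without the vcr library itself. This illustrates the pattern only and is not vcr's API. The `Cassette` class, its JSON layout, and keying interactions by a request string are all assumptions.

```python
import json
import os
import tempfile


class Cassette:
    """Record interactions to a JSON file, or replay them if it exists."""

    def __init__(self, path):
        self.path = path
        # No file yet means record mode; deleting the file forces a re-record.
        self.recording = not os.path.exists(path)
        self.interactions = {}
        if not self.recording:
            with open(path) as fh:
                self.interactions = json.load(fh)

    def play(self, request, realcall):
        if self.recording:
            response = realcall(request)
            self.interactions[request] = response
            return response
        # Replay mode: answer from the recording, never from the network.
        return self.interactions[request]

    def save(self):
        if self.recording:
            with open(self.path, "w") as fh:
                json.dump(self.interactions, fh, sort_keys=True)


path = os.path.join(tempfile.mkdtemp(), "cassette.json")
first = Cassette(path)  # no file yet: record mode
print(first.play("ping", lambda req: req + "-pong"))  # ping-pong
first.save()
second = Cassette(path)  # file exists: replay mode
print(second.play("ping", lambda req: 1 / 0))  # ping-pong (realcall never runs)
```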
In the future, it'd probably be better to have a docker container we
can spin up for creating the vcr recordings, but for now this is
enough better than nothing that I'm going to declare victory.
Coverage reports that about 73% of the extension is now covered.
Differential Revision: https://phab.mercurial-scm.org/D4601
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:20:03 -0400] rev 39666
phabricator: add support for using the vcr library to mock interactions
I'll use this in an upcoming test. The decorator dancing in this is
more complicated than I'd like, but it beats repeating all this code
everywhere.
Differential Revision: https://phab.mercurial-scm.org/D4600
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:19:09 -0400] rev 39665
keepalive: work around slight deficiency in vcr
VCR's response type doesn't define the will_close attribute. Let's
just have keepalive default to closing the socket if the will_close
attribute is missing.
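The fallback amounts to reading the attribute with a default. Below is a minimal sketch built around a helper named `shouldclose` and two stand-in response classes. All three names are hypothetical, and the real keepalive change may be structured differently.

```python
class RealResponse:
    """Stand-in for an http.client response that sets will_close."""

    def __init__(self, will_close):
        self.will_close = will_close


class ReplayedResponse:
    """Stand-in for a recorded (vcr-style) response lacking will_close."""


def shouldclose(response) -> bool:
    # Default to closing the socket when the attribute is missing:
    # closing is always safe, reusing a connection of unknown state is not.
    return getattr(response, "will_close", True)


print(shouldclose(RealResponse(False)))  # False
print(shouldclose(ReplayedResponse()))  # True
```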
Differential Revision: https://phab.mercurial-scm.org/D4599
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:18:16 -0400] rev 39664
hghave: add a checker for the vcr HTTP record/replay library
I'm going to use this to write some tests of the phabricator
extension.
Differential Revision: https://phab.mercurial-scm.org/D4598
Matt Harbison <matt_harbison@yahoo.com> [Sat, 15 Sep 2018 00:04:06 -0400] rev 39663
py3: allow run-tests.py to run on Windows
This is now functional:
HGMODULEPOLICY=py py -3 run-tests.py --local test-help.t --pure --view bcompare
However, on this machine without a C compiler, it tries to load cext anyway, and
blows up. I haven't looked into why, other than to see that it does set the
environment variable. When the test exits though, I see it can't find
killdaemons.py, get-with-headers.py, etc.
I have no idea why these changes are needed, given that it runs on Linux. But
os.system() is insisting that it take a str, and subprocess.Popen() blows up
without str:
Errored test-help.t: Traceback (most recent call last):
File "run-tests.py", line 810, in run
self.runTest()
File "run-tests.py", line 858, in runTest
ret, out = self._run(env)
File "run-tests.py", line 1268, in _run
exitcode, output = self._runcommand(cmd, env)
File "run-tests.py", line 1141, in _runcommand
env=env)
File "C:\Program Files\Python37\lib\subprocess.py", line 756, in __init__
restore_signals, start_new_session)
File "C:\Program Files\Python37\lib\subprocess.py", line 1100, in _execute_child
args = list2cmdline(args)
File "C:\Program Files\Python37\lib\subprocess.py", line 511, in list2cmdline
needquote = (" " in arg) or ("\t" in arg) or not arg
TypeError: argument of type 'int' is not iterable
This is exactly how it crashes when trying to spin up a pager too. I left one
instance of os.system() unchanged in _installhg(), because it doesn't get there.
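The traceback is consistent with a bytes command line reaching `list2cmdline()`, which on Python 3.7 iterates over its argument. Iterating over a `bytes` value yields `int`s, so the `" " in arg` test fails on the first element. The sketch below reproduces that per-argument check with a stand-in function named `needsquoting` (hypothetical) instead of calling `subprocess` directly. Newer Python versions changed `list2cmdline()`, so calling the real function would not reproduce the same error.

```python
def needsquoting(seq):
    """Per-argument quoting test, mirroring the line in the traceback."""
    return [(" " in arg) or ("\t" in arg) or not arg for arg in seq]


print(list(b"hg"))  # [104, 103]: iterating bytes yields ints
print(needsquoting(["hg", "version"]))  # [False, False]

try:
    needsquoting(b"hg version")  # a bytes command line, not a list of str
except TypeError as exc:
    print("TypeError:", exc)
```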
Matt Harbison <matt_harbison@yahoo.com> [Fri, 14 Sep 2018 23:04:18 -0400] rev 39662
py3: ensure run-tests environment is uniformly str
subprocess.popen() was crashing, and when I printed out `env`, all of the keys
and most of the values were str, except these.
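Here is a minimal sketch of normalizing a mixed environment before handing it to `subprocess`. It relies on `os.fsdecode()`, which passes `str` through unchanged and decodes `bytes` with the filesystem encoding. The helper name `strenv` is hypothetical.

```python
import os


def strenv(env):
    """Return a copy of env whose keys and values are all str."""
    # os.fsdecode() leaves str alone and decodes bytes, so a mixed
    # dictionary comes out uniform.
    return {os.fsdecode(k): os.fsdecode(v) for k, v in env.items()}


mixed = {"PATH": "/usr/bin", b"HGMODULEPOLICY": b"py"}
print(strenv(mixed))  # {'PATH': '/usr/bin', 'HGMODULEPOLICY': 'py'}
```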
Matt Harbison <matt_harbison@yahoo.com> [Fri, 14 Sep 2018 22:57:35 -0400] rev 39661
py3: ensure run-tests.osenvironb is actually bytes
Windows doesn't have os.environb, so it was falling back to the Unicode form,
and all of the accesses are trying to use bytes.
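On platforms without `os.environb`, one way to get a bytes interface is a thin mapping that encodes and decodes around `os.environ`, so writes still reach the real environment. The `BytesEnviron` class below is a hypothetical sketch, and run-tests.py's actual fix may differ.

```python
import os
from collections.abc import MutableMapping


class BytesEnviron(MutableMapping):
    """bytes-keyed, bytes-valued view over os.environ."""

    def __getitem__(self, key):
        return os.fsencode(os.environ[os.fsdecode(key)])

    def __setitem__(self, key, value):
        os.environ[os.fsdecode(key)] = os.fsdecode(value)

    def __delitem__(self, key):
        del os.environ[os.fsdecode(key)]

    def __iter__(self):
        return (os.fsencode(k) for k in list(os.environ))

    def __len__(self):
        return len(os.environ)


# Prefer the native bytes environment (POSIX); fall back where it is missing.
osenvironb = getattr(os, "environb", None)
if osenvironb is None:
    osenvironb = BytesEnviron()
```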
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 22:07:00 -0400] rev 39660
py3: fix str vs bytes in enough places to run `hg version` on Windows
I don't have Visual Studio 2015 at home, but this now works with a handful of
extensions (blackbox, extdiff, patchbomb, phabricator and rebase, but not
evolve):
$ HGMODULEPOLICY=py py -3 ../hg version
Enabling the evolve extension causes the usual "failed to import ..." line, but
then prints this before the usual version output:
('commit', '[b'debugancestor', b'debugapplystreamclonebundle', ...,
b'verify', b'version']')
... where the elided part seems to be every command and alias known.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 20:54:53 -0400] rev 39659
windows: open registry keys using unicode names
Python3 complained it must be str. While here, use a context manager to close
the key; it wouldn't wrap at 80 characters the old way, and would have had to
move anyway.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 00:39:02 -0400] rev 39658
py3: byteify strings in pycompat
These surfaced when disabling the source transformer to debug the problems in
win32.py. ./contrib/byteify-strings.py found a couple false positives, so I
marked them with r'' explicitly (in case I'm wrong).
# skip-blame since this is just b'' and r'' prefixing
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 30 Aug 2018 14:55:34 -0700] rev 39657
wireprotov2: let clients drive delta behavior
Previously, the "manifestdata" and "filedata" commands assumed the
receiver had all parent revisions for requested nodes. Unless the
revision had no parents, they emitted a delta instead of a fulltext.
This strategy isn't appropriate for shallow clones and for clients
that only want to access fulltext revision data for a single node
without fetching their parent revisions.
This commit adds a "haveparents" argument to the "manifestdata"
and "filedata" commands that controls delta generation behavior.
Unless "haveparents" is set, the server assumes that the client
doesn't have parent revisions unless they were previously sent
as part of the current group of revisions.
This change allows the fulltext revision data of any individual
revision to be obtained. This will facilitate shallow clones
and other data retrieval strategies that don't require all previous
revisions of an entity to be fetched.
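The delta-base decision can be sketched as follows, under several assumptions:

- Only the first parent is considered as a delta base.
- Nodes arrive in topological order.
- `plandeltas`, `NULLID`, and the `parents` mapping are hypothetical names.

The real server may choose bases differently.

```python
NULLID = b"\0" * 20


def plandeltas(nodes, parents, haveparents):
    """Return (node, base) pairs; base None means send a fulltext.

    nodes must be in topological order; parents maps node -> (p1, p2).
    """
    sent = set()
    plan = []
    for node in nodes:
        p1 = parents[node][0]
        if p1 != NULLID and (haveparents or p1 in sent):
            # The receiver has (or was just sent) p1: delta against it.
            plan.append((node, p1))
        else:
            # Root revision, or the parent may be missing: fulltext.
            plan.append((node, None))
        sent.add(node)
    return plan


parents = {b"B": (b"A", NULLID), b"C": (b"B", NULLID)}
print(plandeltas([b"B", b"C"], parents, haveparents=False))
# [(b'B', None), (b'C', b'B')]
print(plandeltas([b"B", b"C"], parents, haveparents=True))
# [(b'B', b'A'), (b'C', b'B')]
```

Short byte strings stand in for real 20-byte nodes to keep the output readable.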
Differential Revision: https://phab.mercurial-scm.org/D4492
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 04 Sep 2018 10:42:24 -0700] rev 39656
exchangev2: fetch file revisions
Now that the server has an API for fetching file data, we can call into
it to fetch file revisions.
The implementation is relatively straightforward: we examine the
manifests that we fetched and find all new file revisions referenced
by them. We build up a mapping from file path to file nodes to
manifest node. (Mapping to the first manifest node allows us to
map back to first changelog node/revision, which is used for the
linkrev.)
Once that map is built up, we iterate over it in a deterministic
manner and fetch and store file data. The code is very similar
to manifest fetching. So similar that we could probably extract the
common bits into a generic function.
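The mapping step can be sketched like this, under these assumptions:

- Each fetched manifest is given as a `(manifestnode, {path: filenode})` pair in changelog order.
- The client's existing file revisions are a set of `(path, filenode)` pairs.
- All names are hypothetical.

```python
def collectfilenodes(manifests, known):
    """Map path -> {filenode: first manifest node referencing it}."""
    fnodes = {}
    for mnode, entries in manifests:
        for path, fnode in entries.items():
            if (path, fnode) in known:
                continue  # the client already has this file revision
            # setdefault keeps the *first* manifest that introduced fnode,
            # which is what lets the linkrev be derived later.
            fnodes.setdefault(path, {}).setdefault(fnode, mnode)
    return fnodes


def iterfetches(fnodes):
    """Yield (path, filenode, manifestnode) in a deterministic order."""
    for path in sorted(fnodes):
        for fnode, mnode in fnodes[path].items():
            yield path, fnode, mnode


manifests = [
    (b"m1", {b"a.txt": b"f1", b"b.txt": b"g1"}),
    (b"m2", {b"a.txt": b"f2", b"b.txt": b"g1"}),
]
known = {(b"b.txt", b"g1")}
print(list(iterfetches(collectfilenodes(manifests, known))))
# [(b'a.txt', b'f1', b'm1'), (b'a.txt', b'f2', b'm2')]
```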
With file data retrieval implemented, `hg clone` and `hg pull` are
effectively feature complete, at least as far as the completeness
of data transfer for essential repository data (changesets, manifests,
files, phases, and bookmarks). We're still missing support for
obsolescence markers, the hgtags fnodes cache, and the branchmap
cache. But these are non-essential for the moment (and will be
implemented later).
This is a good point to assess the state of exchangev2 in terms of
performance. I ran a local `hg clone` for the mozilla-unified
repository using both version 1 and version 2 of the wire protocols
and exchange methods. This is effectively comparing the performance
of the wire protocol overhead and "getbundle" versus domain-specific
commands. Wire protocol version 2 doesn't have compression implemented
yet. So I tested version 1 with `server.compressionengines=none` to
remove compression overhead from the equation.
server
before: user 220.420+0.000 sys 14.420+0.000
after: user 321.980+0.000 sys 18.990+0.000
client
before: real 561.650 secs (user 497.670+0.000 sys 28.160+0.000)
after: real 1226.260 secs (user 944.240+0.000 sys 354.150+0.000)
We have substantial regressions on both client and server. This
is obviously not desirable. I'm aware of some reasons:
* Lack of hgtagsfnodes transfer (contributes significant CPU to
client).
* Lack of branch cache transfer (contributes significant CPU to
client).
* Little to no profiling / optimization performed on wire protocol
version 2 code.
* There appears to be a memory leak on the client and that is likely
causing swapping on my machine.
* Using multiple threads on the client may be counter-productive because
Python.
* We're not compressing on the server.
* We're tracking file nodes on the client via manifest diffing
rather than using linkrev shortcuts on the server.
I'm pretty confident that most of these issues are addressable.
But even if we can't get wire protocol version 2 on performance parity
with "getbundle," I still think it is important to have the set of low
level data-specific retrieval commands that we have implemented so
far. This is because the existence of such commands allows flexibility
in how clients access server data.
Differential Revision: https://phab.mercurial-scm.org/D4491
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:10:17 -0700] rev 39655
wireprotov2: define and implement "filedata" command
Continuing our trend of implementing *data commands for retrieving
information about specific repository data primitives, this commit
implements a command for retrieving data about an individual tracked
file.
The command is very similar to "manifestdata." The only significant
difference is that we have a standalone function for obtaining
storage for a tracked file. This is to provide a monkeypatch point
for extensions to implement path-based access control.
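Keeping the lookup in a module-level function lets an extension wrap it without touching the command itself. The sketch below shows that pattern. `getfilestore`, the `server` namespace standing in for a module, and the deny-list policy are all hypothetical.

```python
import types

# Stand-in for the server module; extensions patch attributes on it.
server = types.SimpleNamespace()


def _getfilestore(repo, path):
    """Default lookup: return storage for a tracked file."""
    return repo[path]


server.getfilestore = _getfilestore


def filedata(repo, path):
    # Resolve through the module attribute at call time, so that a
    # wrapper installed later by an extension takes effect.
    return server.getfilestore(repo, path)


def denyprefixes(orig, prefixes):
    """Extension-side wrapper enforcing path-based access control."""

    def wrapped(repo, path):
        if path.startswith(prefixes):
            raise PermissionError("access denied: %r" % path)
        return orig(repo, path)

    return wrapped


repo = {b"README": "readme storage", b"secret/key": "key storage"}
server.getfilestore = denyprefixes(server.getfilestore, (b"secret/",))
print(filedata(repo, b"README"))  # readme storage
```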
With this API available, wire protocol version 2 now exposes all
data primitives necessary to implement a full clone. Of course,
since "filedata" can only resolve data for a single path at a time,
clients would need to issue N commands to perform a full clone. On
the Firefox repository, this would be ~461k commands. We'll likely
need to implement a file data retrieval command that supports
multiple paths. But that can be implemented later.
Differential Revision: https://phab.mercurial-scm.org/D4490
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:09:57 -0700] rev 39654
exchangev2: fetch manifest revisions
Now that the server has support for retrieving manifest data, we can
implement the client bits to call it.
We teach the changeset fetching code to capture the manifest revisions
that are encountered on incoming changesets. We then feed this into a
new function which filters out known manifests and then batches up
manifest data requests to the server.
This is different from the previous wire protocol in a few notable
ways.
First, the client fetches manifest data separately and explicitly.
Before, we'd ask the server for data pertaining to some changesets
(via a "getbundle" command) and manifests (and files) would be sent
automatically. Providing an API for looking up just manifest data
separately gives clients much more flexibility for manifest management.
For example, a client may choose to only fetch manifest data on demand
instead of prefetching it (i.e. partial clone).
Second, we send N commands to the server for manifest retrieval instead
of 1. This property has a few nice side-effects. One is that the
deterministic nature of the requests lends itself to server-side
caching. For example, say the remote has 50,000 manifests. If they are
all requested with a single command and the server is configured to
cache responses, each time a new commit arrives you will have a cache
miss and need to regenerate all outgoing data. But if you make N
requests for 10,000 manifests each,
a new commit will still yield cache hits on the initial, unchanged
manifest batches/requests.
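The caching argument can be checked with a small sketch. It splits the wanted manifest nodes into fixed-size batches, then checks which batches change when one more manifest arrives. The function name, the batch size, and the use of integers in place of nodes are illustrative assumptions. The sketch also assumes that new manifests are appended at the end, in changelog order.

```python
def batchrequests(nodes, known, batchsize):
    """Drop nodes the client already has, then split into fixed batches."""
    wanted = [n for n in nodes if n not in known]
    return [
        tuple(wanted[i:i + batchsize]) for i in range(0, len(wanted), batchsize)
    ]


before = batchrequests(list(range(50000)), set(), 10000)
after = batchrequests(list(range(50001)), set(), 10000)
print(len(before), len(after))  # 5 6
# Every request issued before the new commit is issued again unchanged,
# so a response cache keyed on the request still hits for all of them.
print(after[:5] == before)  # True
```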
A derived benefit from these properties is that resumable clone is
conceptually simpler to implement. When making a monolithic request
for all of the repository data, recovering from an interrupted clone
is hard because the server was in the driver's seat and was maintaining
state about all the data that needed to be transferred. With the client
driving fetching, the client can persist the set of unfetched entities
and retry/resume a fetch if something goes wrong. Or we can fetch all
data N changesets at a time and slowly build up a repository. This
approach is drastically easier to implement when we have server APIs
exposing low-level repository primitives (such as manifests and files).
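Below is a minimal sketch of client-driven resumption. After each batch it persists the not-yet-fetched entities, so an interrupted run can pick up where it stopped. The JSON state file, `resumablefetch`, and the `fetch` callback are hypothetical. Items must be JSON-serializable (for example, hex node IDs), and a production version would write the state file atomically.

```python
import json
import os
import tempfile


def resumablefetch(statepath, allitems, fetch, batchsize):
    """Fetch items in batches, resuming from statepath if it exists."""
    if os.path.exists(statepath):
        with open(statepath) as fh:
            pending = json.load(fh)  # resume an interrupted run
    else:
        pending = list(allitems)
    while pending:
        fetch(pending[:batchsize])  # if this raises, the last saved state stays
        pending = pending[batchsize:]
        with open(statepath, "w") as fh:
            json.dump(pending, fh)
    if os.path.exists(statepath):
        os.remove(statepath)  # finished: nothing left to resume


state = os.path.join(tempfile.mkdtemp(), "pending.json")
got = []
calls = []


def flaky(batch):
    calls.append(batch)
    if len(calls) == 2:
        raise ConnectionError("interrupted")
    got.extend(batch)


try:
    resumablefetch(state, ["n1", "n2", "n3", "n4"], flaky, 2)
except ConnectionError:
    pass
resumablefetch(state, ["n1", "n2", "n3", "n4"], flaky, 2)
print(got)  # ['n1', 'n2', 'n3', 'n4']
```

The second run fetches only `n3` and `n4`; the first batch is not requested again.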
We don't yet support tree manifests. But it should be possible to
implement that with the existing wire protocol command.
Differential Revision: https://phab.mercurial-scm.org/D4489
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:09:52 -0700] rev 39653
wireprotov2: define and implement "manifestdata" command
The added command can be used for obtaining manifest data.
Given a manifest path and set of manifest nodes, data about
manifests can be retrieved.
Unlike changeset data, we wish to emit deltas to describe
manifest revisions. So the command uses the relatively new
API for building delta requests and emitting them.
The code calls into deltaparent(), which I'm not very keen on.
There's still work to be done in delta generation land so
implementation details of storage (e.g. exactly one delta
is stored/available) don't creep into higher levels. But we
can worry about this later (there is already a TODO on
imanifeststorage tracking this).
On the subject of parent deltas, the server assumes parent revisions
exist on the receiving end. This is obviously wrong for shallow
clone. I've added TODOs to add a mechanism to the command to
allow clients to specify desired behavior. This shouldn't be
too difficult to implement.
Another big change is that the client must explicitly request
manifest nodes to retrieve. This is a major departure from
"getbundle," where the server derives relevant manifests as it
iterates changesets and sends them automatically. As implemented,
the client must transmit each requested node to the server. At
20 bytes per node, we're looking at 2 MB per 100,000 nodes. Plus
wire encoding overhead. This isn't ideal for clients with limited
upload bandwidth. I plan to address this in the future by allowing
alternate mechanisms for defining the revisions to retrieve. One
idea is to define a range of changeset revisions whose manifest
revisions to retrieve (similar to how "changesetdata" works).
We almost certainly want an API to look up an individual manifest
by node. And that's where I've chosen to start with the implementation.
Again, a theme of this early exchangev2 work is I want to start by
building primitives for accessing raw repository data first and see
how far we can get with those before we need more complexity.
Differential Revision: https://phab.mercurial-scm.org/D4488
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 22 Aug 2018 14:51:11 -0700] rev 39652
wireprotov2: add TODOs around extending changesetdata fields
Extensions will inevitably want to extend the set of changeset
data/fields that can be requested. We'll need to implement support
for extending this in the future. Add some TODOs to track that.
Differential Revision: https://phab.mercurial-scm.org/D4487
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 29 Aug 2018 17:03:19 -0700] rev 39651
exchangev2: fetch and apply bookmarks
This is pretty similar to phases data. We collect bookmarks data
as we process records. Then at the end we make a call to the
bookmarks subsystem to reflect the remote's bookmarks.
Like phases, the code for handling bookmarks is vastly simpler
than the previous wire protocol code because the server always
transfers the full set of bookmarks when bookmarks are requested.
We don't have to keep track of whether we requested bookmarks or
not.
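The collect step can be sketched as below. The record shape, a dict with `node` and an optional `bookmarks` list, is a simplifying assumption loosely modeled on the maps "changesetdata" emits. `collectbookmarks` is a hypothetical name, and the real client hands its result to Mercurial's bookmarks subsystem rather than returning a dict.

```python
def collectbookmarks(records):
    """Build {bookmark name: node} from changeset records."""
    remotemarks = {}
    for record in records:
        # Records without bookmarks contribute nothing. Because the server
        # sends the full set, no request-state bookkeeping is needed.
        for name in record.get("bookmarks", []):
            remotemarks[name] = record["node"]
    return remotemarks


records = [
    {"node": b"n1", "bookmarks": ["stable"]},
    {"node": b"n2"},
    {"node": b"n3", "bookmarks": ["@", "feature"]},
]
print(collectbookmarks(records))
# {'stable': b'n1', '@': b'n3', 'feature': b'n3'}
```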
Differential Revision: https://phab.mercurial-scm.org/D4486
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 23 Aug 2018 18:14:19 -0700] rev 39650
wireprotov2: add bookmarks to "changesetdata" command
Like we did for phases, we want to emit bookmarks data attached
to each changeset.
The approach here is very similar to phases: we emit bookmarks
data inline with requested revision data. But we emit
records for nodes that weren't requested as well so consumers have
access to the full set of defined bookmarks.
Differential Revision: https://phab.mercurial-scm.org/D4485
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 10:01:58 -0700] rev 39649
exchangev2: fetch and apply phases data
Now that the server supports emitting phases data, we can request it
and apply it on the client.
Because we may receive phases-only updates from the server, we no
longer conditionally perform the "changesetdata" command depending
on whether there are revisions to fetch. In the previous wire
protocol, this case would result in us falling back to performing
"listkeys" commands to look up phases, bookmarks, etc. data. But
since "changesetdata" is smart enough to handle metadata only
fetches, we can keep things consistent.
It's worth noting that because of the unified approach to changeset
data retrieval, phase handling code in wire proto v2 exchange is
drastically simpler. Contrast with all the code in exchange.py
dealing with all the variations for obtaining phases data.
Differential Revision: https://phab.mercurial-scm.org/D4484
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 28 Aug 2018 18:19:23 -0700] rev 39648
wireprotov2: add phases to "changesetdata" command
This commit teaches the "changesetdata" wire protocol command
to emit the phase state for each changeset.
This is a different approach from existing phase transfer in a
few ways. Previously, if there are no new revisions (or we're
not using bundle2), we perform a "listkeys" request to retrieve
phase heads. And when revision data is being transferred
with bundle2, phases data is encoded in a standalone bundle2 part.
In both cases, phases data is logically decoupled from the changeset
data and is encountered/applied after changeset revision data
is received.
The new wire protocol purposefully tries to more tightly associate
changeset metadata (phases, bookmarks, obsolescence markers, etc)
with the changeset revision and index data itself, rather than
have it live as a separate entity that must be fetched and
processed separately. I reckon that one reason we didn't do this
before was that it was difficult to add new data types/fields without
breaking existing consumers. By using CBOR maps to transfer
changeset data and putting clients in control of what fields are
requested / present in those maps, we can easily add additional
changeset data while maintaining backwards compatibility. I believe
this to be a superior approach to the problem.
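The backward-compatibility argument rests on two habits. The server only emits the fields a client asked for, and a client reads only the keys it knows. The sketch below uses plain dicts in place of CBOR maps. The field names, `emitrecord`, `oldclient`, and the `revisions` table are hypothetical. This sketch silently skips unknown requested fields, whereas a real server might reject them.

```python
revisions = {
    b"n1": {"parents": [b"n0"], "revision": b"text1", "phase": "public"},
}


def emitrecord(node, fields):
    """Server side: include only the requested fields in the map."""
    record = {"node": node}
    data = revisions[node]
    for field in sorted(fields):
        if field in data:
            record[field] = data[field]
    return record


def oldclient(record):
    """A client written before 'phase' existed: reads only known keys."""
    return record["node"], record.get("parents")


# An old client asking only for parents gets exactly what it expects.
print(emitrecord(b"n1", {"parents"}))
# {'node': b'n1', 'parents': [b'n0']}
# An extra, newer field in the map does not disturb the old client either.
print(oldclient(emitrecord(b"n1", {"parents", "phase"})))
# (b'n1', [b'n0'])
```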
That being said, for performance reasons, we may need to resort
to alternative mechanisms for transferring data like phases. But
for now, I think giving the wire protocol the ability to transfer
changeset metadata next to the changeset itself is a powerful feature
because it is a raw, changeset-centric data API. And if you build
simple APIs for accessing the fundamental units of repository data,
you enable client-side experimentation (partial clone, etc). If it
turns out that we need specialized APIs or mechanisms for transferring
data like phases, we can build in those APIs later. For now, I'd
like to see how far we can get on simple APIs.
It's worth noting that when phase data is being requested, the
server will also emit changeset records for nodes in the bases
specified by the "noderange" argument. This is to ensure that
phase-only updates for nodes the client has are available to the
client, even if no new changesets will be transferred.
Differential Revision: https://phab.mercurial-scm.org/D4483
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 10:01:36 -0700] rev 39647
exchangev2: fetch changeset revisions
All Mercurial repository data is derived from changesets:
you can't do anything unless you have changesets. Therefore,
it makes sense for changesets to be the first piece of data
that we transfer as part of pull.
To do this, we call our new "changesetdata" command, requesting
parents and revision data. This gives us all the data that a
changegroup delta group would give us. We simply normalize
this data into what addgroup() expects and call that API on
the changelog to bulk insert revisions into the changelog.
Code in this commit is heavily borrowed from
changegroup.cg1unpacker.apply().
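"changesetdata" sends fulltexts rather than deltas. One way to feed them to a delta-oriented API like addgroup() is to express each fulltext as a delta against the null revision: a single hunk that replaces the empty range [0, 0) with the whole text. The sketch rests on two assumptions:

- A binary delta hunk header is three big-endian 32-bit integers (start, end, length of new data).
- Delta entries use a simplified 7-tuple layout.

The tuple layout and all helper names are assumptions, not the exact code in this commit. For the changelog, each revision is its own link node.

```python
import struct

NULLID = b"\0" * 20


def trivialdelta(text: bytes) -> bytes:
    """Delta that turns an empty base into text: one hunk at [0, 0)."""
    # Hunk header: big-endian (start, end, length of replacement data).
    return struct.pack(">lll", 0, 0, len(text)) + text


def applydelta(base: bytes, delta: bytes) -> bytes:
    """Apply a single-hunk delta (enough to check the sketch)."""
    start, end, length = struct.unpack(">lll", delta[:12])
    return base[:start] + delta[12:12 + length] + base[end:]


def todeltas(records, linknode):
    """Normalize (node, p1, p2, fulltext) records into delta tuples."""
    for node, p1, p2, text in records:
        # (node, p1, p2, linknode, deltabase, delta, flags)
        yield (node, p1, p2, linknode(node), NULLID, trivialdelta(text), 0)


records = [(b"n1", NULLID, NULLID, b"changeset text")]
entry = next(todeltas(records, lambda node: node))
print(applydelta(b"", entry[5]))  # b'changeset text'
```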
Differential Revision: https://phab.mercurial-scm.org/D4482
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 10:01:16 -0700] rev 39646
wireprotov2: define and implement "changesetdata" command
This commit introduces the "changesetdata" wire protocol command.
The role of the command is to expose data associated with changelog
revisions, including the raw revision data itself.
This command is the first piece of a new clone/pull strategy that
is built on top of domain-specific commands for data retrieval.
Instead of a monolithic "getbundle" command that transfers all of the
things, we'll be introducing commands for fetching specific pieces
of data.
Since the changeset is the fundamental unit from which we derive
pointers to other data (manifests, file nodes, etc), it makes sense
to start reimplementing pull with this data.
The command accepts as arguments a set of root and head revisions
defining the changesets that should be fetched as well as an explicit
list of nodes. By default, the command returns only the node values:
the client must explicitly request additional fields be added to the
response. Currently supported fields are the list of parent nodes and
the revision fulltext.
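This sketch assumes the root/head range means "ancestors of the heads that are not ancestors of the roots", the revset `::heads - ::roots`. That is how pull would use common nodes as bases. It computes the range over an in-memory parents table. The function names and the DAG are hypothetical, and real code would work on revision numbers rather than walk a dict.

```python
def ancestors(parents, nodes):
    """Return nodes plus all of their ancestors (iterative DFS)."""
    seen = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen


def noderange(parents, roots, heads):
    """Changesets in ::heads - ::roots."""
    return ancestors(parents, heads) - ancestors(parents, roots)


# A linear history a <- b <- c <- d, with e branching off b.
parents = {"a": (), "b": ("a",), "c": ("b",), "d": ("c",), "e": ("b",)}
print(sorted(noderange(parents, roots=["b"], heads=["d", "e"])))
# ['c', 'd', 'e']
```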
My plan is to eventually add support for transferring other data
associated with changesets, including phases, bookmarks, obsolescence
markers, etc. Since the response format is CBOR, we'll be able to add
this data into the response object relatively easily (it should be
as simple as adding a key in a map).
The documentation captures a number of TODO items. Some of these may
require BC breaking changes. That's fine: wire protocol v2 is still
highly experimental.
Differential Revision: https://phab.mercurial-scm.org/D4481
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 09:58:23 -0700] rev 39645
exchangev2: start to implement pull with wire protocol v2
Wire protocol version 2 will take a substantially different
approach to exchange than version 1 (at least as far as pulling
is concerned).
This commit establishes a new exchangev2 module for holding
code related to exchange using wire protocol v2. I could have
added things to the existing exchange module. But it is already
quite big. And doing things inline is out of the question because
the existing code is already littered with conditional code
for various states of support for the existing wire protocol
as it evolved over 10+ years. A new module gives us a chance
to make a clean break.
This approach does mean we'll end up writing some duplicate
code. And there's a significant chance we'll miss functionality
as code is ported. The plan is to eventually add #testcase's
to existing tests so the new wire protocol is tested side-by-side
with the existing one. This will hopefully tease out any
features that weren't ported properly. But before we get there,
we need to build up support for the new exchange methods.
Our journey towards implementing a new exchange begins with pulling.
And pulling begins with discovery.
The discovery code added to exchangev2 is heavily drawn from
the following functions:
* exchange._pulldiscoverychangegroup
* discovery.findcommonincoming
For now, we build on top of existing discovery mechanisms. The
new wire protocol should be capable of doing things more efficiently.
But I'd rather defer that problem.
To foster the transition, we invent a fake capability on the HTTPv2
peer and have the main pull code in exchange.py call into exchangev2
when the new wire protocol is being used.
Differential Revision: https://phab.mercurial-scm.org/D4480
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 21 Aug 2018 15:33:11 -0700] rev 39644
httppeer: expose capabilities for each command
This will help code using peers to sniff out exactly what servers
support.
Differential Revision: https://phab.mercurial-scm.org/D4436