Mon, 17 Sep 2018 15:55:18 +0300 narrow: use diffmatcher to send only new filelogs in non-ellipses widening
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 17 Sep 2018 15:55:18 +0300] rev 39665
narrow: use diffmatcher to send only new filelogs in non-ellipses widening Before this patch, when we widened a non-ellipses narrow clone, we downloaded all the filelogs matching the resulting new matcher. This is the same as the ellipses case, but it can be improved: because we don't pull new csets in non-ellipses cases, we can download only the newly added files instead of all the files that match the new matcher. So, we only download files which match the new matcher but do not match the old matcher. There exists a match.differencematcher() which is used here. This will lead to a significant speedup when extending a non-ellipses narrow copy on large repos, because we will download and process only the newly required filelogs. The test changes demonstrate that we now download fewer files. Thanks to Augie for pointing out that the differencematcher functionality exists in core. Differential Revision: https://phab.mercurial-scm.org/D4614
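The idea of "matches the new matcher but not the old one" can be illustrated with a minimal standalone sketch. This is not Mercurial's match.differencematcher(); the helper names make_matcher and difference, the glob patterns, and the file names are all hypothetical and only model the set-difference behavior described above.

```python
import fnmatch


def make_matcher(patterns):
    """Return a predicate that is true when a path matches any glob pattern."""
    def matcher(path):
        return any(fnmatch.fnmatch(path, p) for p in patterns)
    return matcher


def difference(new, old):
    """Return a predicate matching paths accepted by `new` but not by `old`."""
    return lambda path: new(path) and not old(path)


# Hypothetical narrowspecs before and after widening.
old = make_matcher(["src/*"])
new = make_matcher(["src/*", "docs/*"])
to_fetch = difference(new, old)

files = ["src/a.py", "docs/index.rst", "tests/t.py"]
# Only files newly covered by the widened spec need their filelogs sent.
needed = [f for f in files if to_fetch(f)]
print(needed)
```

Files already covered by the old spec are skipped, which is where the saving on large repos would come from.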
Mon, 17 Sep 2018 15:27:39 +0300 py3: add missing b'' prefixes in couple of test files
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 17 Sep 2018 15:27:39 +0300] rev 39664
py3: add missing b'' prefixes in couple of test files These were missed in the earlier patch and caught by Yuya. # skip-blame because just b'' prefix Differential Revision: https://phab.mercurial-scm.org/D4613
Sun, 16 Sep 2018 23:13:05 -0400 run-tests: convert the remaining os.system() call to Unicode
Matt Harbison <matt_harbison@yahoo.com> [Sun, 16 Sep 2018 23:13:05 -0400] rev 39663
run-tests: convert the remaining os.system() call to Unicode I wasn't able to hit this path in 543a788eea2d, but I have now when I accidentally left off `--local`.
Sat, 15 Sep 2018 13:31:41 -0400 py3: partially fix pager spawning on Windows
Matt Harbison <matt_harbison@yahoo.com> [Sat, 15 Sep 2018 13:31:41 -0400] rev 39662
py3: partially fix pager spawning on Windows

Previously, spinning up the pager crashed because the command and environment were in bytes. (See also 543a788eea2d.) Now it aborts with an invalid handle:

  $ HGMODULEPOLICY=py py -3 ../hg --traceback --config extensions.evolve=!
  Traceback (most recent call last):
    File "c:\Users\Matt\projects\hg\mercurial\ui.py", line 967, in _write
      self.fout.write(''.join(msgs))
    File "c:\Users\Matt\projects\hg\mercurial\windows.py", line 173, in write
      self.fp.write(s[start:end])
  OSError: [WinError 6] The handle is invalid

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "c:\Users\Matt\projects\hg\mercurial\scmutil.py", line 164, in callcatch
      return func()
    File "c:\Users\Matt\projects\hg\mercurial\dispatch.py", line 350, in _runcatchfunc
      return _dispatch(req)
    File "c:\Users\Matt\projects\hg\mercurial\dispatch.py", line 930, in _dispatch
      return commands.help_(ui, 'shortlist')
    File "c:\Users\Matt\projects\hg\mercurial\commands.py", line 2930, in help_
      ui.write(formatted)
    File "c:\Users\Matt\projects\hg\mercurial\ui.py", line 948, in write
      self._writenobuf(*args, **opts)
    File "c:\Users\Matt\projects\hg\mercurial\ui.py", line 960, in _writenobuf
      self._write(*msgs, **opts)
    File "c:\Users\Matt\projects\hg\mercurial\ui.py", line 969, in _write
      raise error.StdioError(err)
  mercurial.error.StdioError: [Errno 9] The handle is invalid
  abort: The handle is invalid

The interesting bit here is that the abort message is marked with ANSI color, but the OSError is not.
Sat, 15 Sep 2018 10:35:00 +0900 censor: rename loop variable to silence pyflakes warning
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 10:35:00 +0900] rev 39661
censor: rename loop variable to silence pyflakes warning hgext/censor.py:92: list comprehension redefines 'c' from line 88
Sun, 16 Sep 2018 20:58:51 +0530 py3: add b'' prefixes in tests/test-hgweb-no-request-uri.t
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 20:58:51 +0530] rev 39660
py3: add b'' prefixes in tests/test-hgweb-no-request-uri.t # skip-blame because just b'' prefixes. Differential Revision: https://phab.mercurial-scm.org/D4611
Sun, 16 Sep 2018 20:49:37 +0530 py3: add b'' prefixes in tests/test-hgweb-no-path-info.t
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 20:49:37 +0530] rev 39659
py3: add b'' prefixes in tests/test-hgweb-no-path-info.t # skip-blame because just b'' prefixes Differential Revision: https://phab.mercurial-scm.org/D4610
Sun, 16 Sep 2018 20:20:59 +0530 py3: add b'' prefixes in tests/test-hgweb-non-interactive.t
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 20:20:59 +0530] rev 39658
py3: add b'' prefixes in tests/test-hgweb-non-interactive.t # skip-blame because just b'' prefix Differential Revision: https://phab.mercurial-scm.org/D4609
Sun, 16 Sep 2018 19:58:01 +0530 py3: use codecs.encode() to encode in rot-13 encoding
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 19:58:01 +0530] rev 39657
py3: use codecs.encode() to encode in rot-13 encoding The other occurrence will need some more love as description is bytes by default and we need to decode it and then encode it. Differential Revision: https://phab.mercurial-scm.org/D4608
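As a rough illustration of the decode-then-encode dance mentioned for bytes input: on Python 3 the rot_13 codec is text-to-text, so bytes have to be decoded first. The helper name rot13_bytes and the UTF-8 default are assumptions made for this sketch, not code from the patch.

```python
import codecs


def rot13_bytes(description, encoding="utf-8"):
    """Apply rot-13 to a bytes value by round-tripping through str."""
    text = description.decode(encoding)
    # codecs.encode() with "rot_13" takes and returns str on Python 3.
    rotated = codecs.encode(text, "rot_13")
    return rotated.encode(encoding)


print(rot13_bytes(b"Hello"))  # b'Uryyb'
```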
Sun, 16 Sep 2018 19:18:15 +0530 py3: add two passing tests to whitelist found by buildbot
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 19:18:15 +0530] rev 39656
py3: add two passing tests to whitelist found by buildbot The buildbot found these two new passing tests on Python 3. Differential Revision: https://phab.mercurial-scm.org/D4607
Sat, 15 Sep 2018 01:36:43 -0400 phabricator: mark extension as experimental for now
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:36:43 -0400] rev 39655
phabricator: mark extension as experimental for now I don't want us to commit to this having a stable interface just yet. Differential Revision: https://phab.mercurial-scm.org/D4605
Sat, 15 Sep 2018 01:16:31 -0400 phabricator: fix templating bug by using hybriddict
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:16:31 -0400] rev 39654
phabricator: fix templating bug by using hybriddict Differential Revision: https://phab.mercurial-scm.org/D4604
Sat, 15 Sep 2018 01:13:37 -0400 phabricator: add tests of templatekeyword
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:13:37 -0400] rev 39653
phabricator: add tests of templatekeyword Having tests is paying off: I found a bug and now it'll be easy to fix! Differential Revision: https://phab.mercurial-scm.org/D4603
Sat, 15 Sep 2018 00:46:17 -0400 phabricator: move extension from contrib to hgext
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:46:17 -0400] rev 39652
phabricator: move extension from contrib to hgext It's well-enough tested now and widely enough used I think we should ship it. Differential Revision: https://phab.mercurial-scm.org/D4602
Sat, 15 Sep 2018 00:50:21 -0400 tests: add some basic tests of phabricator interactions
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:50:21 -0400] rev 39651
tests: add some basic tests of phabricator interactions This uses the vcr library to avoid hitting phabricator on every test execution. In order to generate new recordings (vcr calls them cassettes) just remove the appropriate json file, and the test will regenerate it. It's not my favorite way to test things, but it'll let us have test coverage on the phabricator extension that'll make it resilient to refactors in core and let us move it to hgext. In the future, it'd probably be better to have a docker container we can spin up for creating the vcr recordings, but for now this is enough better than nothing that I'm going to declare victory. Coverage reports that about 73% of the extension is now covered. Differential Revision: https://phab.mercurial-scm.org/D4601
Sat, 15 Sep 2018 00:20:03 -0400 phabricator: add support for using the vcr library to mock interactions
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:20:03 -0400] rev 39650
phabricator: add support for using the vcr library to mock interactions I'll use this in an upcoming test. The decorator dancing in this is more complicated than I'd like, but it beats repeating all this code everywhere. Differential Revision: https://phab.mercurial-scm.org/D4600
Sat, 15 Sep 2018 00:19:09 -0400 keepalive: work around slight deficiency in vcr
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:19:09 -0400] rev 39649
keepalive: work around slight deficiency in vcr VCR's response type doesn't define the will_close attribute. Let's just have keepalive default to closing the socket if the will_close attribute is missing. Differential Revision: https://phab.mercurial-scm.org/D4599
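The fallback described here amounts to treating a missing attribute as "close the socket". A minimal sketch, assuming hypothetical stand-in classes (RecordedResponse, LiveResponse) and a hypothetical helper should_close rather than the real keepalive code:

```python
class RecordedResponse:
    """Stand-in for a replayed response that lacks will_close (hypothetical)."""


class LiveResponse:
    """Stand-in for a normal response that wants the connection kept open."""
    will_close = False


def should_close(response):
    # Default to closing the socket when will_close is not defined at all.
    return getattr(response, "will_close", True)


print(should_close(RecordedResponse()), should_close(LiveResponse()))
```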
Sat, 15 Sep 2018 00:18:16 -0400 hghave: add a checker for the vcr HTTP record/replay library
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:18:16 -0400] rev 39648
hghave: add a checker for the vcr HTTP record/replay library I'm going to use this to write some tests of the phabricator extension. Differential Revision: https://phab.mercurial-scm.org/D4598
Sat, 15 Sep 2018 00:04:06 -0400 py3: allow run-tests.py to run on Windows
Matt Harbison <matt_harbison@yahoo.com> [Sat, 15 Sep 2018 00:04:06 -0400] rev 39647
py3: allow run-tests.py to run on Windows

This is now functional:

  HGMODULEPOLICY=py py -3 run-tests.py --local test-help.t --pure --view bcompare

However, on this machine without a C compiler, it tries to load cext anyway, and blows up. I haven't looked into why, other than to see that it does set the environment variable. When the test exits though, I see it can't find killdaemons.py, get-with-headers.py, etc.

I have no idea why these changes are needed, given that it runs on Linux. But os.system() is insisting that it take a str, and subprocess.Popen() blows up without str:

  Errored test-help.t: Traceback (most recent call last):
    File "run-tests.py", line 810, in run
      self.runTest()
    File "run-tests.py", line 858, in runTest
      ret, out = self._run(env)
    File "run-tests.py", line 1268, in _run
      exitcode, output = self._runcommand(cmd, env)
    File "run-tests.py", line 1141, in _runcommand
      env=env)
    File "C:\Program Files\Python37\lib\subprocess.py", line 756, in __init__
      restore_signals, start_new_session)
    File "C:\Program Files\Python37\lib\subprocess.py", line 1100, in _execute_child
      args = list2cmdline(args)
    File "C:\Program Files\Python37\lib\subprocess.py", line 511, in list2cmdline
      needquote = (" " in arg) or ("\t" in arg) or not arg
  TypeError: argument of type 'int' is not iterable

This is exactly how it crashes when trying to spin up a pager too. I left one instance of os.system() unchanged in _installhg(), because it doesn't get there.
Fri, 14 Sep 2018 23:04:18 -0400 py3: ensure run-tests environment is uniformly str
Matt Harbison <matt_harbison@yahoo.com> [Fri, 14 Sep 2018 23:04:18 -0400] rev 39646
py3: ensure run-tests environment is uniformly str subprocess.Popen() was crashing, and when I printed out `env`, all of the keys and most of the values were str. Except these.
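One way to picture the fix is a pass that coerces any stray bytes in the environment to str before it is handed to a subprocess. The helpers to_str and normalize_env, the UTF-8 default, and the sample variables are assumptions for illustration, not the run-tests.py change itself.

```python
def to_str(value, encoding="utf-8"):
    """Decode bytes to str; leave str values untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize_env(env):
    """Return a copy of env whose keys and values are all str."""
    return {to_str(k): to_str(v) for k, v in env.items()}


# Hypothetical mixed environment: one value slipped in as bytes.
mixed = {"PATH": "/usr/bin", "HGPORT": b"20059"}
clean = normalize_env(mixed)
print(clean)
```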
Fri, 14 Sep 2018 22:57:35 -0400 py3: ensure run-tests.osenvironb is actually bytes
Matt Harbison <matt_harbison@yahoo.com> [Fri, 14 Sep 2018 22:57:35 -0400] rev 39645
py3: ensure run-tests.osenvironb is actually bytes Windows doesn't have os.environb, so it was falling back to the Unicode form, and all of the accesses are trying to use bytes.
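A sketch of the fallback this message describes, under the assumption that encoding the Unicode environment is acceptable when os.environb is unavailable. The function name bytes_environ and the UTF-8 default are hypothetical.

```python
import os


def bytes_environ(encoding="utf-8"):
    """Return the process environment as a dict of bytes keys and values."""
    environb = getattr(os, "environb", None)
    if environb is not None:
        # POSIX platforms expose a native bytes view of the environment.
        return dict(environb)
    # No os.environb (e.g. on Windows): encode the str environment instead.
    return {
        k.encode(encoding): v.encode(encoding) for k, v in os.environ.items()
    }


env = bytes_environ()
print(len(env))
```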
Thu, 13 Sep 2018 22:07:00 -0400 py3: fix str vs bytes in enough places to run `hg version` on Windows
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 22:07:00 -0400] rev 39644
py3: fix str vs bytes in enough places to run `hg version` on Windows I don't have Visual Studio 2015 at home, but this now works with a handful of extensions (blackbox, extdiff, patchbomb, phabricator and rebase, but not evolve): $ HGMODULEPOLICY=py py -3 ../hg version Enabling the evolve extension causes the usual "failed to import ..." line, but then prints this before the usual version output: ('commit', '[b'debugancestor', b'debugapplystreamclonebundle', ..., b'verify', b'version']') ... where the elided part seems to be every command and alias known.
Thu, 13 Sep 2018 20:54:53 -0400 windows: open registry keys using unicode names
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 20:54:53 -0400] rev 39643
windows: open registry keys using unicode names Python3 complained it must be str. While here, use a context manager to close the key; it wouldn't wrap at 80 characters the old way, and would have had to move anyway.
Thu, 13 Sep 2018 00:39:02 -0400 py3: byteify strings in pycompat
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 00:39:02 -0400] rev 39642
py3: byteify strings in pycompat These surfaced when disabling the source transformer to debug the problems in win32.py. ./contrib/byteify-strings.py found a couple false positives, so I marked them with r'' explicitly (in case I'm wrong). # skip-blame since this is just b'' and r'' prefixing
Thu, 30 Aug 2018 14:55:34 -0700 wireprotov2: let clients drive delta behavior
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 30 Aug 2018 14:55:34 -0700] rev 39641
wireprotov2: let clients drive delta behavior Previously, the "manifestdata" and "filedata" commands assumed the receiver had all parent revisions for requested nodes. Unless the revision had no parents, they emitted a delta instead of a fulltext. This strategy isn't appropriate for shallow clones and for clients that only want to access fulltext revision data for a single node without fetching their parent revisions. This commit adds an "haveparents" argument to the "manifestdata" and "filedata" commands that controls delta generation behavior. Unless "haveparents" is set, the server assumes that the client doesn't have parent revisions unless they were previously sent as part of the current group of revisions. This change allows the fulltext revision data of any individual revision to be obtained. This will facilitate shallow clones and other data retrieval strategies that don't require all previous revisions of an entity to be fetched. Differential Revision: https://phab.mercurial-scm.org/D4492
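The rule described above (delta against a parent only if the client is known to have it) can be modeled in a few lines. This is a toy model, not the wire protocol implementation: choose_delta_base, emit, and the node names are hypothetical, and real revisions carry binary nodes rather than short strings.

```python
def choose_delta_base(parents, haveparents, sent):
    """Return the parent to delta against, or None to send a fulltext."""
    for parent in parents:
        if parent is None:
            continue
        # A parent is usable if the client declared it has all parents,
        # or if it was already sent earlier in this group of revisions.
        if haveparents or parent in sent:
            return parent
    return None


def emit(revisions, haveparents=False):
    """Decide, per revision, whether a delta or a fulltext would be sent."""
    sent = set()
    plan = []
    for node, parents in revisions:
        base = choose_delta_base(parents, haveparents, sent)
        kind = "delta" if base is not None else "fulltext"
        plan.append((node, kind, base))
        sent.add(node)
    return plan


# "d" has parent "c", which is not part of the requested group.
revs = [("a", (None, None)), ("b", ("a", None)), ("d", ("c", None))]
print(emit(revs))
print(emit(revs, haveparents=True))
```

Without haveparents, "d" falls back to a fulltext because its parent was never sent, which is the behavior a shallow clone would rely on.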
Tue, 04 Sep 2018 10:42:24 -0700 exchangev2: fetch file revisions
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 04 Sep 2018 10:42:24 -0700] rev 39640
exchangev2: fetch file revisions

Now that the server has an API for fetching file data, we can call into it to fetch file revisions.

The implementation is relatively straightforward: we examine the manifests that we fetched and find all new file revisions referenced by them. We build up a mapping from file path to file nodes to manifest node. (The mapping to first manifest node allows us to map back to first changelog node/revision, which is used for the linkrev.)

Once that map is built up, we iterate over it in a deterministic manner and fetch and store file data. The code is very similar to manifest fetching. So similar that we could probably extract the common bits into a generic function.

With file data retrieval implemented, `hg clone` and `hg pull` are effectively feature complete, at least as far as the completeness of data transfer for essential repository data (changesets, manifests, files, phases, and bookmarks). We're still missing support for obsolescence markers, the hgtags fnodes cache, and the branchmap cache. But these are non-essential for the moment (and will be implemented later).

This is a good point to assess the state of exchangev2 in terms of performance. I ran a local `hg clone` for the mozilla-unified repository using both version 1 and version 2 of the wire protocols and exchange methods. This is effectively comparing the performance of the wire protocol overhead and "getbundle" versus domain-specific commands. Wire protocol version 2 doesn't have compression implemented yet. So I tested version 1 with `server.compressionengines=none` to remove compression overhead from the equation.

server
  before: user 220.420+0.000 sys 14.420+0.000
  after:  user 321.980+0.000 sys 18.990+0.000

client
  before: real 561.650 secs (user 497.670+0.000 sys 28.160+0.000)
  after:  real 1226.260 secs (user 944.240+0.000 sys 354.150+0.000)

We have substantial regressions on both client and server. This is obviously not desirable. I'm aware of some reasons:

* Lack of hgtagsfnodes transfer (contributes significant CPU to client).
* Lack of branch cache transfer (contributes significant CPU to client).
* Little to no profiling / optimization performed on wire protocol version 2 code.
* There appears to be a memory leak on the client and that is likely causing swapping on my machine.
* Using multiple threads on the client may be counter-productive because Python.
* We're not compressing on the server.
* We're tracking file nodes on the client via manifest diffing rather than using linkrev shortcuts on the server.

I'm pretty confident that most of these issues are addressable. But even if we can't get wire protocol version 2 on performance parity with "getbundle," I still think it is important to have the set of low level data-specific retrieval commands that we have implemented so far. This is because the existence of such commands allows flexibility in how clients access server data.

Differential Revision: https://phab.mercurial-scm.org/D4491
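The path → file node → first manifest node mapping, and the deterministic iteration over it, might look roughly like the following. This is a simplified model with hypothetical names (collect_file_nodes, iter_fetch_order) and short string nodes; it does not reproduce the exchangev2 code.

```python
def collect_file_nodes(manifests, known):
    """Map path -> {file node: first manifest node that referenced it}.

    manifests: iterable of (manifest_node, {path: file_node}) in fetch order.
    known: set of (path, file_node) pairs already stored locally.
    """
    fnodes = {}
    for mnode, files in manifests:
        for path, fnode in files.items():
            if (path, fnode) in known:
                continue
            # setdefault keeps the first manifest seen for each file node,
            # which is what would later be mapped back to a linkrev.
            fnodes.setdefault(path, {}).setdefault(fnode, mnode)
    return fnodes


def iter_fetch_order(fnodes):
    """Yield (path, file node, manifest node) in a deterministic order."""
    for path in sorted(fnodes):
        for fnode in sorted(fnodes[path]):
            yield path, fnode, fnodes[path][fnode]


manifests = [
    ("m1", {"a.txt": "f1", "b.txt": "f2"}),
    ("m2", {"a.txt": "f1", "b.txt": "f3"}),
]
known = {("a.txt", "f1")}
fnodes = collect_file_nodes(manifests, known)
order = list(iter_fetch_order(fnodes))
print(order)
```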
Wed, 05 Sep 2018 09:10:17 -0700 wireprotov2: define and implement "filedata" command
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:10:17 -0700] rev 39639
wireprotov2: define and implement "filedata" command Continuing our trend of implementing *data commands for retrieving information about specific repository data primitives, this commit implements a command for retrieving data about an individual tracked file. The command is very similar to "manifestdata." The only significant difference is that we have a standalone function for obtaining storage for a tracked file. This is to provide a monkeypatch point for extensions to implement path-based access control. With this API available, wire protocol version 2 now exposes all data primitives necessary to implement a full clone. Of course, since "filedata" can only resolve data for a single path at a time, clients would need to issue N commands to perform a full clone. On the Firefox repository, this would be ~461k commands. We'll likely need to implement a file data retrieval command that supports multiple paths. But that can be implemented later. Differential Revision: https://phab.mercurial-scm.org/D4490
Wed, 05 Sep 2018 09:09:57 -0700 exchangev2: fetch manifest revisions
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:09:57 -0700] rev 39638
exchangev2: fetch manifest revisions Now that the server has support for retrieving manifest data, we can implement the client bits to call it. We teach the changeset fetching code to capture the manifest revisions that are encountered on incoming changesets. We then feed this into a new function which filters out known manifests and then batches up manifest data requests to the server. This is different from the previous wire protocol in a few notable ways. First, the client fetches manifest data separately and explicitly. Before, we'd ask the server for data pertaining to some changesets (via a "getbundle" command) and manifests (and files) would be sent automatically. Providing an API for looking up just manifest data separately gives clients much more flexibility for manifest management. For example, a client may choose to only fetch manifest data on demand instead of prefetching it (i.e. partial clone). Second, we send N commands to the server for manifest retrieval instead of 1. This property has a few nice side-effects. One is that the deterministic nature of the requests lends itself to server-side caching. For example, say the remote has 50,000 manifests. If the server is configured to cache responses and clients request everything in a single command, each time a new commit arrives, you will have a cache miss and need to regenerate all outgoing data. But if you make N requests requesting 10,000 manifests each, a new commit will still yield cache hits on the initial, unchanged manifest batches/requests. A derived benefit from these properties is that resumable clone is conceptually simpler to implement. When making a monolithic request for all of the repository data, recovering from an interrupted clone is hard because the server was in the driver's seat and was maintaining state about all the data that needed to be transferred. With the client driving fetching, the client can persist the set of unfetched entities and retry/resume a fetch if something goes wrong. 
Or we can fetch all data N changesets at a time and slowly build up a repository. This approach is drastically easier to implement when we have server APIs exposing low-level repository primitives (such as manifests and files). We don't yet support tree manifests. But it should be possible to implement that with the existing wire protocol command. Differential Revision: https://phab.mercurial-scm.org/D4489
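The caching argument can be checked with a small simulation using the 50,000 manifests and 10,000-per-request figures from the message. The batches helper and the idea of keying a cache on the exact tuple of requested nodes are assumptions for the sake of the sketch.

```python
def batches(nodes, size):
    """Split nodes into fixed-size, order-preserving request batches."""
    for start in range(0, len(nodes), size):
        yield tuple(nodes[start:start + size])


# 50,000 manifests before a push, one more afterwards.
before = ["m%d" % i for i in range(50000)]
after = before + ["m50000"]

# Pretend the server cached a response for each request it has seen.
cached_requests = set(batches(before, 10000))
new_requests = list(batches(after, 10000))

hits = [req for req in new_requests if req in cached_requests]
print(len(hits), "of", len(new_requests), "requests hit the cache")
```

Only the final, short batch is new; the five full batches are byte-for-byte the same requests as before the new commit arrived.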
Wed, 05 Sep 2018 09:09:52 -0700 wireprotov2: define and implement "manifestdata" command
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:09:52 -0700] rev 39637
wireprotov2: define and implement "manifestdata" command The added command can be used for obtaining manifest data. Given a manifest path and set of manifest nodes, data about manifests can be retrieved. Unlike changeset data, we wish to emit deltas to describe manifest revisions. So the command uses the relatively new API for building delta requests and emitting them. The code calls into deltaparent(), which I'm not very keen on. There's still work to be done in delta generation land so implementation details of storage (e.g. exactly one delta is stored/available) don't creep into higher levels. But we can worry about this later (there is already a TODO on imanifeststorage tracking this). On the subject of parent deltas, the server assumes parent revisions exist on the receiving end. This is obviously wrong for shallow clone. I've added TODOs to add a mechanism to the command to allow clients to specify desired behavior. This shouldn't be too difficult to implement. Another big change is that the client must explicitly request manifest nodes to retrieve. This is a major departure from "getbundle," where the server derives relevant manifests as it iterates changesets and sends them automatically. As implemented, the client must transmit each requested node to the server. At 20 bytes per node, we're looking at 2 MB per 100,000 nodes. Plus wire encoding overhead. This isn't ideal for clients with limited upload bandwidth. I plan to address this in the future by allowing alternate mechanisms for defining the revisions to retrieve. One idea is to define a range of changeset revisions whose manifest revisions to retrieve (similar to how "changesetdata" works). We almost certainly want an API to look up an individual manifest by node. And that's where I've chosen to start with the implementation. 
Again, a theme of this early exchangev2 work is I want to start by building primitives for accessing raw repository data first and see how far we can get with those before we need more complexity. Differential Revision: https://phab.mercurial-scm.org/D4488
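The upload-cost estimate quoted above (20 bytes per node, 2 MB per 100,000 nodes) is simple arithmetic and can be reproduced as follows. The function name is hypothetical, and the figure ignores wire encoding overhead, as the message itself notes.

```python
NODE_SIZE = 20  # bytes per node, as stated in the message


def request_payload_bytes(node_count, node_size=NODE_SIZE):
    """Raw bytes needed to list node_count nodes, excluding wire overhead."""
    return node_count * node_size


payload = request_payload_bytes(100000)
print(payload, "bytes =", payload / 1000000, "MB")
```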
Wed, 22 Aug 2018 14:51:11 -0700 wireprotov2: add TODOs around extending changesetdata fields
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 22 Aug 2018 14:51:11 -0700] rev 39636
wireprotov2: add TODOs around extending changesetdata fields Extensions will inevitably want to extend the set of changeset data/fields that can be requested. We'll need to implement support for extending this in the future. Add some TODOs to track that. Differential Revision: https://phab.mercurial-scm.org/D4487