Thu, 17 Nov 2016 12:59:36 +0100 posix: move checkexec test file to .hg/cache
Mads Kiilerich <madski@unity3d.com> [Thu, 17 Nov 2016 12:59:36 +0100] rev 30453
posix: move checkexec test file to .hg/cache This avoids unnecessary churn in the working directory. It is not necessarily a fully valid assumption that .hg/cache is on the same filesystem as the working directory, but I think it is an acceptable approximation. It could also be the case that different parts of the working directory is on different mount points so checking in the root folder could also be wrong.
Thu, 17 Nov 2016 15:31:19 -0800 manifest: move manifestctx creation into manifestlog.get()
Durham Goode <durham@fb.com> [Thu, 17 Nov 2016 15:31:19 -0800] rev 30452
manifest: move manifestctx creation into manifestlog.get() Most manifestctx creation already happened in manifestlog.get(), but there was one spot in the manifestctx class itself that created an instance manually. This patch makes that one instance go through the manifestlog. This means extensions can just wrap manifestlog.get() and it will cover all manifestctx creations. It also means this code path now hits the manifestlog cache.
Fri, 11 Nov 2016 01:10:07 -0800 util: implement zstd compression engine
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 11 Nov 2016 01:10:07 -0800] rev 30451
util: implement zstd compression engine Now that zstd is vendored and being built (in some configurations), we can implement a compression engine for zstd! The zstd engine is a little different from existing engines. Because it may not always be present, we have to defer load the module in case importing it fails. We facilitate this via a cached property that holds a reference to the module or None. The "available" method is implemented to reflect reality. The zstd engine declares its ability to handle bundles using the "zstd" human name and the "ZS" internal name. The latter was chosen because internal names are 2 characters (by only convention I think) and "ZS" seems reasonable. The engine, like others, supports specifying the compression level. However, there are no consumers of this API that yet pass in that argument. I have plans to change that, so stay tuned. Since all we need to do to support bundle generation with a new compression engine is implement and register the compression engine, bundle generation with zstd "just works!" Tests demonstrating this have been added. How does performance of zstd for bundle generation compare? On the mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the following on my i7-6700K on Linux: engine CPU time bundle size vs orig size throughput none 97.0s 4,054,405,584 100.0% 41.8 MB/s bzip2 (l=9) 393.6s 975,343,098 24.0% 10.3 MB/s gzip (l=6) 184.0s 1,140,533,074 28.1% 22.0 MB/s zstd (l=1) 108.2s 1,119,434,718 27.6% 37.5 MB/s zstd (l=2) 111.3s 1,078,328,002 26.6% 36.4 MB/s zstd (l=3) 113.7s 1,011,823,727 25.0% 35.7 MB/s zstd (l=4) 116.0s 1,008,965,888 24.9% 35.0 MB/s zstd (l=5) 121.0s 977,203,148 24.1% 33.5 MB/s zstd (l=6) 131.7s 927,360,198 22.9% 30.8 MB/s zstd (l=7) 139.0s 912,808,505 22.5% 29.2 MB/s zstd (l=12) 198.1s 854,527,714 21.1% 20.5 MB/s zstd (l=18) 681.6s 789,750,690 19.5% 5.9 MB/s On compression, zstd for bundle generation delivers: * better compression than gzip with significantly less CPU utilization * better than bzip2 compression ratios while still being significantly faster than gzip * ability to aggressively tune compression level to achieve significantly smaller bundles That last point is important. With clone bundles, a server can pre-generate a bundle file, upload it to a static file server, and redirect clients to transparently download it during clone. The server could choose to produce a zstd bundle with the highest compression settings possible. This would take a very long time - a magnitude longer than a typical zstd bundle generation - but the result would be hundreds of megabytes smaller! For the clone volume we do at Mozilla, this could translate to petabytes of bandwidth savings per year and faster clones (due to smaller transfer size). I don't have detailed numbers to report on decompression. However, zstd decompression is fast: >1 GB/s output throughput on this machine, even through the Python bindings. And it can do that regardless of the compression level of the input. By the time you have enough data to worry about overhead of decompression, you have plenty of other things to worry about performance wise. zstd is wins all around. I can't wait to implement support for it on the wire protocol and in revlogs.
Thu, 10 Nov 2016 23:38:41 -0800 hghave: add check for zstd support
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:38:41 -0800] rev 30450
hghave: add check for zstd support Not all configurations will support zstd. Add a check so we can conditionalize tests.
Thu, 10 Nov 2016 23:34:15 -0800 exchange: obtain compression engines from the registrar
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:34:15 -0800] rev 30449
exchange: obtain compression engines from the registrar util.compengines has knowledge of all registered compression engines and the metadata that associates them with various bundle types. This patch removes the now redundant declaration of this metadata from exchange.py and obtains it from the new source. The effect of this patch is that once a new compression engine is registered with util.compengines, `hg bundle -t <engine>` will just work.
Thu, 10 Nov 2016 23:29:01 -0800 bundle2: equate 'UN' with no compression
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:29:01 -0800] rev 30448
bundle2: equate 'UN' with no compression An upcoming patch will change the "alg" argument passed to this function from None to "UN" when no compression is wanted. The existing implementation of bundle2 does not set a "Compression" parameter if no compression is used. In theory, setting "Compression=UN" should work. But I haven't audited the code to see if all client versions supporting bundle2 will accept this. Rather than take the risk, avoid the BC breakage and treat "UN" the same as None.
Thu, 10 Nov 2016 23:15:02 -0800 util: check for compression engine availability before returning
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:15:02 -0800] rev 30447
util: check for compression engine availability before returning If a requested compression engine is registered but not available, requesting it will now abort. To be honest, I'm not sure if this is the appropriate mechanism for handling optional compression engines. I won't know until all uses of compression (bundles, wire protocol, revlogs, etc) are using the new API and zstd (our planned optional engine) is implemented. So this API could change.
Thu, 10 Nov 2016 23:03:48 -0800 util: expose an "available" API on compression engines
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:03:48 -0800] rev 30446
util: expose an "available" API on compression engines When the zstd compression engine is introduced, it won't work in all installations, namely pure Python installs. So, we need a mechanism to declare whether a compression engine is available. We don't want to conditionally register the compression engine because it is sometimes useful to know when a compression engine name or encountered data is valid but just not available versus unknown.
Thu, 10 Nov 2016 22:26:35 -0800 setup: compile zstd C extension
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 22:26:35 -0800] rev 30445
setup: compile zstd C extension Now that zstd and python-zstandard are vendored, we can start compiling them as part of the install. python-zstandard provides a self-contained Python function that returns a distutils.extension.Extension, so it is really easy to add zstd to our setup.py without having to worry about defining source files, include paths, etc. The function even allows specifying the module name the extension should be compiled as. This conveniently allows us to compile the module into the "mercurial" package so "our" version won't collide with a version installed under the canonical "zstd" module name.
Thu, 10 Nov 2016 22:15:58 -0800 zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 22:15:58 -0800] rev 30444
zstd: vendor python-zstandard 0.5.0 As the commit message for the previous changeset says, we wish for zstd to be a 1st class citizen in Mercurial. To make that happen, we need to enable Python to talk to the zstd C API. And that requires bindings. This commit vendors a copy of existing Python bindings. Why do we need to vendor? As the commit message of the previous commit says, relying on systems in the wild to have the bindings or zstd present is a losing proposition. By distributing the zstd and bindings with Mercurial, we significantly increase our chances that zstd will work. Since zstd will deliver a better end-user experience by achieving better performance, this benefits our users. Another reason is that the Python bindings still aren't stable and the API is somewhat fluid. While Mercurial could be coded to target multiple versions of the Python bindings, it is safer to bundle an explicit, known working version. The added Python bindings are mostly a fully-featured interface to the zstd C API. They allow one-shot operations, streaming, reading and writing from objects implements the file object protocol, dictionary compression, control over low-level compression parameters, and more. The Python bindings work on Python 2.6, 2.7, and 3.3+ and have been tested on Linux and Windows. There are CFFI bindings, but they are lacking compared to the C extension. Upstream work will be needed before we can support zstd with PyPy. But it will be possible. The files added in this commit come from Git commit e637c1b214d5f869cf8116c550dcae23ec13b677 from https://github.com/indygreg/python-zstandard and are added without modifications. Some files from the upstream repository have been omitted, namely files related to continuous integration. In the spirit of full disclosure, I'm the maintainer of the "python-zstandard" project and have authored 100% of the code added in this commit. Unfortunately, the Python bindings have not been formally code reviewed by anyone. While I've tested much of the code thoroughly (I even have tests that fuzz APIs), there's a good chance there are bugs, memory leaks, not well thought out APIs, etc. If someone wants to review the code and send feedback to the GitHub project, it would be greatly appreciated. Despite my involvement with both projects, my opinions of code style differ from Mercurial's. The code in this commit introduces numerous code style violations in Mercurial's linters. So, the code is excluded from most lints. However, some violations I agree with. These have been added to the known violations ignore list for now.
Thu, 10 Nov 2016 21:45:29 -0800 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 21:45:29 -0800] rev 30443
zstd: vendor zstd 1.1.1 zstd is a new compression format and it is awesome, yielding higher compression ratios and significantly faster compression and decompression operations compared to zlib (our current compression engine of choice) across the board. We want zstd to be a 1st class citizen in Mercurial and to eventually be the preferred compression format for various operations. This patch starts the formal process of supporting zstd by vendoring a copy of zstd. Why do we need to vendor zstd? Good question. First, zstd is relatively new and not widely available yet. If we didn't vendor zstd or distribute it with Mercurial, most users likely wouldn't have zstd installed or even available to install. What good is a feature if you can't use it? Vendoring and distributing the zstd sources gives us the highest liklihood that zstd will be available to Mercurial installs. Second, the Python bindings to zstd (which will be vendored in a separate changeset) make use of zstd APIs that are only available via static linking. One reason they are only available via static linking is that they are unstable and could change at any time. While it might be possible for the Python bindings to attempt to talk to different versions of the zstd C library, the safest thing to do is link against a specific, known-working version of zstd. This is why the Python zstd bindings themselves vendor zstd and why we must as well. This also explains why the added files are in a "python-zstandard" directory. The added files are from the 1.1.1 release of zstd (Git commit 4c0b44f8ced84c4c8edfa07b564d31e4fa3e8885 from https://github.com/facebook/zstd) and are added without modifications. Not all files from the zstd "distribution" have been added. Notably missing are files to support interacting with "legacy," pre-1.0 versions of zstd. The decision of which files to include is made by the upstream python-zstandard project (which I'm the author of). The files in this commit are a snapshot of the files from the 0.5.0 release of that project, Git commit e637c1b214d5f869cf8116c550dcae23ec13b677 from https://github.com/indygreg/python-zstandard.
Tue, 15 Nov 2016 21:56:49 +0100 bdiff: give slight preference to removing trailing lines
Mads Kiilerich <madski@unity3d.com> [Tue, 15 Nov 2016 21:56:49 +0100] rev 30442
bdiff: give slight preference to removing trailing lines [This change could be folded into the previous changeset to minimize the repo churn ...] Similar to the previous change, introduce an exception to the general preference for matches in the middle of bdiff ranges: If the best match on the B side starts at the beginning of the bdiff range, don't aim for the middle-most A side match but for the earliest. New (later) matches on the A side will only be considered better if the corresponding match on the B side *not* is at the beginning of the range. Thus, if the best (middle-most) match on the B side turns out to be at the beginning of the range, the earliest match on the A side will be used. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22807275 to 22808120 bytes - a 0.004% increase.
Tue, 15 Nov 2016 21:56:49 +0100 bdiff: give slight preference to appending lines
Mads Kiilerich <madski@unity3d.com> [Tue, 15 Nov 2016 21:56:49 +0100] rev 30441
bdiff: give slight preference to appending lines [This change could be folded into the previous changeset to minimize the repo churn ...] The general preference to matches in the middle of bdiff ranges helps getting balanced recursion and efficient computation. But, as previous changes have shown, it might also give diffs that seems "obviously wrong". To mitigate that: If the best match on the A side starts at the beginning of the bdiff range, don't aim for the middle-most B side match but for the earliest. This will make the matches balanced (by both sides being "early") even though the bisection will be less balanced. Still, this case only apply if the *best* and middle-most match was fully unbalanced on the A side. Each recursion will thus even in this worst case reduce the problem significantly and we are not re-introducing the problem that was fixed in f1ca249696ed. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22806817 to 22807275 bytes - a 0.002% increase. This make the recent test-bdiff.py changes give a more pretty output ... but they no longer show that the recursion is around middle matches (because it in these cases isn't).
Tue, 08 Nov 2016 18:37:33 +0100 bdiff: give slight preference to longest matches in the middle of the B side
Mads Kiilerich <madski@unity3d.com> [Tue, 08 Nov 2016 18:37:33 +0100] rev 30440
bdiff: give slight preference to longest matches in the middle of the B side We already have a slight preference for matches close to the middle on the A side. Now, do the same on the B side. j is iterating the b range backwards and we thus accept a new j if the previous match was in the upper half. This makes the test-bhalf diff "correct". It obviously also gives more preference to balanced recursion than to appending to sequences. That is kind of correct, but will also unfortunately make some bundles bigger. No doubt, we can also create examples where it will make them smaller ... The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22803824 to 22806817 bytes - an 0.01% increase.
Tue, 08 Nov 2016 18:37:33 +0100 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com> [Tue, 08 Nov 2016 18:37:33 +0100] rev 30439
bdiff: rearrange the "better longest match" code This is primarily to make the code more managable and prepare for later changes. More specific assignments might also be slightly faster, even thought it also might generate a bit more code.
Tue, 08 Nov 2016 18:37:33 +0100 bdiff: adjust criteria for getting optimal longest match in the A side middle
Mads Kiilerich <madski@unity3d.com> [Tue, 08 Nov 2016 18:37:33 +0100] rev 30438
bdiff: adjust criteria for getting optimal longest match in the A side middle We prefer matches closer to the middle to balance recursion, as introduced in f1ca249696ed. For ranges with uneven length, matches starting exactly in the middle should have preference. That will be optimal for matches of length 1. We will thus accept equality in the half check. For ranges with even length, half was ceil'ed when calculated but we got the preference for low matches from the 'less than half' check. To get the same result as before when we also accept equality, floor it. Without that, test-annotate.t would show some different (still correct but less optimal) results. This will change the heuristics. Tests shows a slightly different output - and sometimes slightly smaller bundles. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22804885 to 22803824 bytes - an 0.005% reduction.
Tue, 08 Nov 2016 18:37:33 +0100 tests: explore some bdiff cases
Mads Kiilerich <madski@unity3d.com> [Tue, 08 Nov 2016 18:37:33 +0100] rev 30437
tests: explore some bdiff cases
Tue, 15 Nov 2016 21:56:49 +0100 tests: make test-bdiff.py easier to maintain
Mads Kiilerich <madski@unity3d.com> [Tue, 15 Nov 2016 21:56:49 +0100] rev 30436
tests: make test-bdiff.py easier to maintain Add more stdout logging to help navigate the .out file.
Thu, 17 Nov 2016 08:52:52 -0800 perf: unbust perfbdiff --alldata
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 17 Nov 2016 08:52:52 -0800] rev 30435
perf: unbust perfbdiff --alldata This broke in f84fc6a92817 due to a refactored manifest API. The fix is a bit hacky - perfbdiff doesn't yet support tree manifests for example. But it gets the job done. A test has been added for --alldata so this doesn't happen again.
Thu, 17 Nov 2016 20:57:09 +0900 worker: discard waited pid by anyone who noticed it first
Yuya Nishihara <yuya@tcha.org> [Thu, 17 Nov 2016 20:57:09 +0900] rev 30434
worker: discard waited pid by anyone who noticed it first This makes sure all waited pids are removed before calling killworkers() even if waitpid()-pids.discard() sequence is interrupted by another SIGCHLD.
Thu, 17 Nov 2016 21:08:58 +0900 worker: kill workers after all zombie processes are reaped
Yuya Nishihara <yuya@tcha.org> [Thu, 17 Nov 2016 21:08:58 +0900] rev 30433
worker: kill workers after all zombie processes are reaped Since we now wait child processes in non-blocking way (changed by 7bc25549e084 and e8fb03cfbbde), we don't have to kill them in the middle of the waitpid() loop. This change will help solving a possible race of waitpid()-pids.discard() sequence and another SIGCHLD. waitforworkers() is called by cleanup(), in which case we do killworkers() beforehand so we can remove killworkers() from waitforworkers().
Thu, 17 Nov 2016 20:44:05 +0900 worker: make sure killworkers() never be interrupted by another SIGCHLD
Yuya Nishihara <yuya@tcha.org> [Thu, 17 Nov 2016 20:44:05 +0900] rev 30432
worker: make sure killworkers() never be interrupted by another SIGCHLD killworkers() iterates over pids, which can be updated by SIGCHLD handler. So we should either copy pids or prevent killworkers() from being interrupted by SIGCHLD. I chose the latter as it is simpler and can make pids handling more consistent. This fixes a possible "set changed size during iteration" error at killworkers() before cleanup().
Thu, 17 Nov 2016 21:43:01 +0900 worker: fix missed break on successful waitpid()
Yuya Nishihara <yuya@tcha.org> [Thu, 17 Nov 2016 21:43:01 +0900] rev 30431
worker: fix missed break on successful waitpid() Follow-up for 5069a8a40b1b.
Thu, 10 Nov 2016 16:49:42 -0500 filterpyflakes: dramatically simplify the entire thing by blacklisting
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:49:42 -0500] rev 30430
filterpyflakes: dramatically simplify the entire thing by blacklisting We've only got one kind of pyflakes failure left in our codebase, so it's time to switch over to a blacklist-based checking scheme. I've left in the filtering of two undefined names for now out of paranoia, but those can probably go too.
Thu, 10 Nov 2016 16:07:24 -0500 run-tests: forward Python USER_BASE from site (issue5425)
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:07:24 -0500] rev 30429
run-tests: forward Python USER_BASE from site (issue5425) We do this so that any linters installed via pip install --user don't break. See https://docs.python.org/2/library/site.html#site.USER_BASE for a description of what this nonsense is all about. An alternative would be to not set HOME, but that'll cause other problems (see issue2707), or to forward every single path entry from sys.path in PYTHONPATH (which seems sketchy in its own way).
Tue, 15 Nov 2016 20:25:51 +0000 util: improve iterfile so it chooses code path wisely
Jun Wu <quark@fb.com> [Tue, 15 Nov 2016 20:25:51 +0000] rev 30428
util: improve iterfile so it chooses code path wisely We have performance concerns on "iterfile" as it is 4X slower on normal files. While modern systems have the nice property that reading a "fast" (on-disk) file cannot be interrupted and should be made use of. This patch dumps the related knowledge in comments. And "iterfile" chooses code paths wisely: 1. If it's CPython 3, or PyPY, use the fast path. 2. If fp is a normal file, use the fast path. 3. If fp is not a normal file and CPython version >= 2.7.4, use the same workaround (4x slower) as before. 4. If fp is not a normal file and CPython version < 2.7.4, use another workaround (2x slower but may block longer then necessary) which basically re-invents the buffer + readline logic in Python. This will give us good confidence on both correctness and performance dealing with EINTR in iterfile(fp) for all known supported Python versions.
Wed, 16 Nov 2016 23:29:28 -0500 merge with stable
Augie Fackler <augie@google.com> [Wed, 16 Nov 2016 23:29:28 -0500] rev 30427
merge with stable
Sat, 12 Nov 2016 03:06:07 +0000 worker: stop using a separate thread waiting for children
Jun Wu <quark@fb.com> [Sat, 12 Nov 2016 03:06:07 +0000] rev 30426
worker: stop using a separate thread waiting for children Now that we have a SIGCHLD hander, and it could get executed when waiting for I/O. It's no longer necessary to have a separated waitpid thread. So just remove it.
Sat, 12 Nov 2016 03:07:22 +0000 worker: add a SIGCHLD handler to collect worker immediately
Jun Wu <quark@fb.com> [Sat, 12 Nov 2016 03:07:22 +0000] rev 30425
worker: add a SIGCHLD handler to collect worker immediately As planned by previous patches, add a SIGCHLD handler to get notifications about worker exits, and deals with worker failure immediately. Note that the SIGCHLD handler gets unregistered before killworkers(), so SIGCHLD won't interrupt "killworkers" - making it harder to send kill signals to waited processes.
Tue, 15 Nov 2016 02:12:16 +0000 worker: make waitforworkers reentrant
Jun Wu <quark@fb.com> [Tue, 15 Nov 2016 02:12:16 +0000] rev 30424
worker: make waitforworkers reentrant We are going to use it in the SIGCHLD handler. The handler will be executed in the main thread with the non-blocking version of waitpid, while the waitforworkers thread runs the blocking version. It's possible that one of them collects a worker and makes the other error out (no child to wait). This patch handles these errors: ECHILD is ignored. EINTR needs a retry. The "pids" set is designed to be only modifiable by "waitforworkers". And we only remove items after a successful waitpid. Since a child process can only be "waitpid"-ed once. It's guaranteed that "pids.remove(p)" won't be called with duplicated "p"s. And once a "p" is removed from "pids", that "p" does not need to be killed or waited any more.
(0) -30000 -10000 -3000 -1000 -300 -100 -50 -30 +30 +50 +100 +300 +1000 +3000 +10000 tip