Thu, 17 Nov 2016 20:09:10 -0800 setup: add flag to build_ext to control building zstd
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 17 Nov 2016 20:09:10 -0800] rev 30450
setup: add flag to build_ext to control building zstd Downstream packagers will inevitably want to disable building the vendored python-zstandard Python package. Rather than force them to patch setup.py, let's give them a knob to use. distutils Command classes support defining custom options. It requires setting certain class attributes (yes, class attributes: instance attributes don't work because the class type is consulted before it is instantiated). We already have a custom child class of build_ext, so we set these class attributes, implement some scaffolding, and override build_extensions to filter the Extension instance for the zstd extension if the `--no-zstd` argument is specified. Example usage: $ python setup.py build_ext --no-zstd
Wed, 09 Nov 2016 16:01:34 +0000 drawdag: update test repos by drawing the changelog DAG in ASCII
Jun Wu <quark@fb.com> [Wed, 09 Nov 2016 16:01:34 +0000] rev 30449
drawdag: update test repos by drawing the changelog DAG in ASCII Currently, we have "debugbuilddag" which is a powerful tool to build test cases but not intuitive. We may end up running "hg log" in the test to make the test more readable. This patch adds a "drawdag" extension with a "debugdrawdag" command for similar testing purpose. Unlike the cryptic "debugbuilddag" command, it reads an ASCII graph that is intuitive to human, so the test case can be more readable. Unlike "debugbuilddag", "drawdag" does not require an empty repo. So it can be used to add new changesets to an existing repo. Since the "drawdag" logic is not that trivial and only makes sense for testing purpose, the extension is added to the "tests" directory, to make the core logic clean. If we find it useful (for example, to demonstrate cases and help user understand some cases) and want to ship it by default in the future, we can move it to a ship-by-default "debugdrawdag" at that time.
Wed, 14 Jan 2015 01:15:26 +0100 posix: give checklink a fast path that cache the check file and is read only
Mads Kiilerich <madski@unity3d.com> [Wed, 14 Jan 2015 01:15:26 +0100] rev 30448
posix: give checklink a fast path that cache the check file and is read only util.checklink would create a symlink and remove it again. That would sometimes happen multiple times. Write operations are relatively expensive and give disk tear and noise for applications monitoring file system activity. Instead of creating a symlink and deleting it again, just create it once and leave it in .hg/cache/check-link . If the file exists, just verify that os.islink reports true. We will assume that this check is as good as symlink creation not failing. Note: The symlink left in .hg/cache has to resolve to a file - otherwise 'make dist' will fail ... test-symlink-os-yes-fs-no.py does some monkey patching to simulate a platform without symlink support. The slightly different testing method requires additional monkeying.
Thu, 17 Nov 2016 12:59:36 +0100 posix: move checklink test file to .hg/cache
Mads Kiilerich <madski@unity3d.com> [Thu, 17 Nov 2016 12:59:36 +0100] rev 30447
posix: move checklink test file to .hg/cache This avoids unnecessary churn in the working directory. It is not necessarily a fully valid assumption that .hg/cache is on the same filesystem as the working directory, but I think it is an acceptable approximation. It could also be the case that different parts of the working directory is on different mount points so checking in the root folder could also be wrong.
Wed, 14 Jan 2015 01:15:26 +0100 posix: give checkexec a fast path; keep the check files and test read only
Mads Kiilerich <madski@unity3d.com> [Wed, 14 Jan 2015 01:15:26 +0100] rev 30446
posix: give checkexec a fast path; keep the check files and test read only Before, Mercurial would create a new temporary file every time, stat it, change its exec mode, stat it again, and delete it. Most of this dance was done to handle the rare and not-so-essential case of VFAT mounts on unix. The cost of that was paid by the much more common and important case of using normal file systems. Instead, try to create and preserve .hg/cache/checkisexec and .hg/cache/checknoexec with and without exec flag set. If the files exist and have correct exec flags set, we can conclude that that file system supports the exec flag. Best case, the whole exec check can thus be done with two stat calls. Worst case, we delete the wrong files and check as usual. That will be because temporary loss of exec bit or on file systems without support for the exec bit. In that case we check as we did before, with the additional overhead of one extra stat call. It is possible that this different test algorithm in some cases on odd file systems will give different behaviour. Again, I think it will be rare and special cases and I think it is worth the risk. test-clone.t happens to show the situation where checkisexec is left behind from the old style check, while checknoexec only will be created next time a exec check will be performed.
Wed, 14 Jan 2015 01:15:26 +0100 posix: simplify checkexec check
Mads Kiilerich <madski@unity3d.com> [Wed, 14 Jan 2015 01:15:26 +0100] rev 30445
posix: simplify checkexec check Use a slightly simpler logic that in some cases can avoid an unnecessary chmod and stat. Instead of flipping the X bits, make it more clear that we rely on no X bits being set on initial file creation, and that at least some of them stick after they all have been set.
Thu, 17 Nov 2016 12:59:36 +0100 posix: move checkexec test file to .hg/cache
Mads Kiilerich <madski@unity3d.com> [Thu, 17 Nov 2016 12:59:36 +0100] rev 30444
posix: move checkexec test file to .hg/cache This avoids unnecessary churn in the working directory. It is not necessarily a fully valid assumption that .hg/cache is on the same filesystem as the working directory, but I think it is an acceptable approximation. It could also be the case that different parts of the working directory is on different mount points so checking in the root folder could also be wrong.
Thu, 17 Nov 2016 15:31:19 -0800 manifest: move manifestctx creation into manifestlog.get()
Durham Goode <durham@fb.com> [Thu, 17 Nov 2016 15:31:19 -0800] rev 30443
manifest: move manifestctx creation into manifestlog.get() Most manifestctx creation already happened in manifestlog.get(), but there was one spot in the manifestctx class itself that created an instance manually. This patch makes that one instance go through the manifestlog. This means extensions can just wrap manifestlog.get() and it will cover all manifestctx creations. It also means this code path now hits the manifestlog cache.
Fri, 11 Nov 2016 01:10:07 -0800 util: implement zstd compression engine
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 11 Nov 2016 01:10:07 -0800] rev 30442
util: implement zstd compression engine Now that zstd is vendored and being built (in some configurations), we can implement a compression engine for zstd! The zstd engine is a little different from existing engines. Because it may not always be present, we have to defer load the module in case importing it fails. We facilitate this via a cached property that holds a reference to the module or None. The "available" method is implemented to reflect reality. The zstd engine declares its ability to handle bundles using the "zstd" human name and the "ZS" internal name. The latter was chosen because internal names are 2 characters (by only convention I think) and "ZS" seems reasonable. The engine, like others, supports specifying the compression level. However, there are no consumers of this API that yet pass in that argument. I have plans to change that, so stay tuned. Since all we need to do to support bundle generation with a new compression engine is implement and register the compression engine, bundle generation with zstd "just works!" Tests demonstrating this have been added. How does performance of zstd for bundle generation compare? On the mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the following on my i7-6700K on Linux: engine CPU time bundle size vs orig size throughput none 97.0s 4,054,405,584 100.0% 41.8 MB/s bzip2 (l=9) 393.6s 975,343,098 24.0% 10.3 MB/s gzip (l=6) 184.0s 1,140,533,074 28.1% 22.0 MB/s zstd (l=1) 108.2s 1,119,434,718 27.6% 37.5 MB/s zstd (l=2) 111.3s 1,078,328,002 26.6% 36.4 MB/s zstd (l=3) 113.7s 1,011,823,727 25.0% 35.7 MB/s zstd (l=4) 116.0s 1,008,965,888 24.9% 35.0 MB/s zstd (l=5) 121.0s 977,203,148 24.1% 33.5 MB/s zstd (l=6) 131.7s 927,360,198 22.9% 30.8 MB/s zstd (l=7) 139.0s 912,808,505 22.5% 29.2 MB/s zstd (l=12) 198.1s 854,527,714 21.1% 20.5 MB/s zstd (l=18) 681.6s 789,750,690 19.5% 5.9 MB/s On compression, zstd for bundle generation delivers: * better compression than gzip with significantly less CPU utilization * better than bzip2 compression ratios while still being significantly faster than gzip * ability to aggressively tune compression level to achieve significantly smaller bundles That last point is important. With clone bundles, a server can pre-generate a bundle file, upload it to a static file server, and redirect clients to transparently download it during clone. The server could choose to produce a zstd bundle with the highest compression settings possible. This would take a very long time - a magnitude longer than a typical zstd bundle generation - but the result would be hundreds of megabytes smaller! For the clone volume we do at Mozilla, this could translate to petabytes of bandwidth savings per year and faster clones (due to smaller transfer size). I don't have detailed numbers to report on decompression. However, zstd decompression is fast: >1 GB/s output throughput on this machine, even through the Python bindings. And it can do that regardless of the compression level of the input. By the time you have enough data to worry about overhead of decompression, you have plenty of other things to worry about performance wise. zstd is wins all around. I can't wait to implement support for it on the wire protocol and in revlogs.
Thu, 10 Nov 2016 23:38:41 -0800 hghave: add check for zstd support
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:38:41 -0800] rev 30441
hghave: add check for zstd support Not all configurations will support zstd. Add a check so we can conditionalize tests.
Thu, 10 Nov 2016 23:34:15 -0800 exchange: obtain compression engines from the registrar
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:34:15 -0800] rev 30440
exchange: obtain compression engines from the registrar util.compengines has knowledge of all registered compression engines and the metadata that associates them with various bundle types. This patch removes the now redundant declaration of this metadata from exchange.py and obtains it from the new source. The effect of this patch is that once a new compression engine is registered with util.compengines, `hg bundle -t <engine>` will just work.
Thu, 10 Nov 2016 23:29:01 -0800 bundle2: equate 'UN' with no compression
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:29:01 -0800] rev 30439
bundle2: equate 'UN' with no compression An upcoming patch will change the "alg" argument passed to this function from None to "UN" when no compression is wanted. The existing implementation of bundle2 does not set a "Compression" parameter if no compression is used. In theory, setting "Compression=UN" should work. But I haven't audited the code to see if all client versions supporting bundle2 will accept this. Rather than take the risk, avoid the BC breakage and treat "UN" the same as None.
Thu, 10 Nov 2016 23:15:02 -0800 util: check for compression engine availability before returning
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:15:02 -0800] rev 30438
util: check for compression engine availability before returning If a requested compression engine is registered but not available, requesting it will now abort. To be honest, I'm not sure if this is the appropriate mechanism for handling optional compression engines. I won't know until all uses of compression (bundles, wire protocol, revlogs, etc) are using the new API and zstd (our planned optional engine) is implemented. So this API could change.
Thu, 10 Nov 2016 23:03:48 -0800 util: expose an "available" API on compression engines
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:03:48 -0800] rev 30437
util: expose an "available" API on compression engines When the zstd compression engine is introduced, it won't work in all installations, namely pure Python installs. So, we need a mechanism to declare whether a compression engine is available. We don't want to conditionally register the compression engine because it is sometimes useful to know when a compression engine name or encountered data is valid but just not available versus unknown.
Thu, 10 Nov 2016 22:26:35 -0800 setup: compile zstd C extension
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 22:26:35 -0800] rev 30436
setup: compile zstd C extension Now that zstd and python-zstandard are vendored, we can start compiling them as part of the install. python-zstandard provides a self-contained Python function that returns a distutils.extension.Extension, so it is really easy to add zstd to our setup.py without having to worry about defining source files, include paths, etc. The function even allows specifying the module name the extension should be compiled as. This conveniently allows us to compile the module into the "mercurial" package so "our" version won't collide with a version installed under the canonical "zstd" module name.
Thu, 10 Nov 2016 22:15:58 -0800 zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 22:15:58 -0800] rev 30435
zstd: vendor python-zstandard 0.5.0 As the commit message for the previous changeset says, we wish for zstd to be a 1st class citizen in Mercurial. To make that happen, we need to enable Python to talk to the zstd C API. And that requires bindings. This commit vendors a copy of existing Python bindings. Why do we need to vendor? As the commit message of the previous commit says, relying on systems in the wild to have the bindings or zstd present is a losing proposition. By distributing the zstd and bindings with Mercurial, we significantly increase our chances that zstd will work. Since zstd will deliver a better end-user experience by achieving better performance, this benefits our users. Another reason is that the Python bindings still aren't stable and the API is somewhat fluid. While Mercurial could be coded to target multiple versions of the Python bindings, it is safer to bundle an explicit, known working version. The added Python bindings are mostly a fully-featured interface to the zstd C API. They allow one-shot operations, streaming, reading and writing from objects implements the file object protocol, dictionary compression, control over low-level compression parameters, and more. The Python bindings work on Python 2.6, 2.7, and 3.3+ and have been tested on Linux and Windows. There are CFFI bindings, but they are lacking compared to the C extension. Upstream work will be needed before we can support zstd with PyPy. But it will be possible. The files added in this commit come from Git commit e637c1b214d5f869cf8116c550dcae23ec13b677 from https://github.com/indygreg/python-zstandard and are added without modifications. Some files from the upstream repository have been omitted, namely files related to continuous integration. In the spirit of full disclosure, I'm the maintainer of the "python-zstandard" project and have authored 100% of the code added in this commit. Unfortunately, the Python bindings have not been formally code reviewed by anyone. While I've tested much of the code thoroughly (I even have tests that fuzz APIs), there's a good chance there are bugs, memory leaks, not well thought out APIs, etc. If someone wants to review the code and send feedback to the GitHub project, it would be greatly appreciated. Despite my involvement with both projects, my opinions of code style differ from Mercurial's. The code in this commit introduces numerous code style violations in Mercurial's linters. So, the code is excluded from most lints. However, some violations I agree with. These have been added to the known violations ignore list for now.
(0) -30000 -10000 -3000 -1000 -300 -100 -16 +16 +100 +300 +1000 +3000 +10000 tip