Mon, 02 Jan 2017 12:39:03 -0800 util: compression APIs to support revlog compression
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 12:39:03 -0800] rev 30794
util: compression APIs to support revlog compression As part of "zstd all of the things," we need to teach revlogs to use non-zlib compression formats. Because we're routing all compression via the "compression manager" and "compression engine" APIs, we need to introduction functionality there for performing revlog operations. Ideally, revlog compression and decompression operations would be implemented in terms of simple "compress" and "decompress" primitives. However, there are a few considerations that make us want to have a specialized primitive for handling revlogs: 1) Performance. Revlogs tend to do compression and especially decompression operations in batches. Any overhead for e.g. instantiating a "context" for performing an operation can be noticed. For this reason, our "revlog compressor" primitive is reusable. For zstd, we reuse the same compression "context" for multiple operations. I've measured this to have a performance impact versus constructing new contexts for each operation. 2) Specialization. By having a primitive dedicated to revlog use, we can make revlog-specific choices and leave the door open for more functionality in the future. For example, the zstd revlog compressor may one day make use of dictionary compression. A future patch will introduce a decompress() on the compressor object. The code for the zlib compressor is basically copied from revlog.compress(). Although it doesn't handle the empty input case, the null first byte case, and the 'u' prefix case. These cases will continue to be handled in revlog.py once that code is ported to use this API.
Mon, 02 Jan 2017 13:00:16 -0800 revlog: move decompress() from module to revlog class (API)
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 13:00:16 -0800] rev 30793
revlog: move decompress() from module to revlog class (API) Upcoming patches will convert revlogs to use the compression engine APIs to perform all things compression. The yet-to-be-introduced APIs support a persistent "compressor" object so the same object can be reused for multiple compression operations, leading to better performance. In addition, compression engines like zstd may wish to tweak compression engine state based on the revlog (e.g. per-revlog compression dictionaries). A global and shared decompress() function will shortly no longer make much sense. So, we move decompress() to be a method of the revlog class. It joins compress() there. On the mozilla-unified repo, we can measure the impact of this change on reading performance: $ hg perfrevlogchunks -c ! chunk ! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6) ! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6) ! chunk batch ! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6 ! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6) "chunk" appeared to become slower but "chunk batch" got faster. Upon further examination by running both sets multiple times, the numbers appear to converge across all runs. This tells me that there is no perceived performance impact to this refactor.
Mon, 02 Jan 2017 11:50:17 -0800 revlog: make compressed size comparisons consistent
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 11:50:17 -0800] rev 30792
revlog: make compressed size comparisons consistent revlog.compress() compares the compressed size to the input size and throws away the compressed data if it is larger than the input. This is the correct thing to do, as storing compressed data that is larger than the input takes up more storage space and makes reading slower. However, the comparison was implemented inconsistently. For the streaming compression mode, we threw away the result if it was greater than or equal to the input size. But for the one-shot compression, we threw away the compression only if it was greater than the input size! This patch changes the comparison for the simple case so it is consistent with the streaming case. As a few tests demonstrate, this adds 1 byte to some revlog entries. This is because of an added 'u' header on the chunk. It seems somewhat wrong to increase the revlog size here. However, IMO the cost of 1 byte in storage is insignificant compared to the performance gains of avoiding decompression. This patch should invite questions around the heuristic for throwing away compressed data. For example, I'd argue we should be more liberal about rejecting compressed data, additionally doing so where the number of bytes saved fails to reach a threshold. But we can have this discussion another time.
Sat, 07 Jan 2017 20:43:49 -0800 similar: rename local variable to not collide with previous
Sean Farley <sean@farley.io> [Sat, 07 Jan 2017 20:43:49 -0800] rev 30791
similar: rename local variable to not collide with previous Future patches will move the score function to the module level, so let's not shadow that.
Mon, 09 Jan 2017 10:59:45 -0800 patch: add label for coloring the index extended header
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 10:59:45 -0800] rev 30790
patch: add label for coloring the index extended header Just like the summary says, this will colorize the: index 3d3ba4b65e11..57274a0f46b2 100644 line in the diff output.
Sat, 31 Dec 2016 15:41:57 -0600 patch: add index line for diff output
Sean Farley <sean@farley.io> [Sat, 31 Dec 2016 15:41:57 -0600] rev 30789
patch: add index line for diff output This helps highlighting in third-party diff coloring (which assumes git output) and maintains pedantic correctness with diff --git. Tests will be added at the end of the series.
Mon, 09 Jan 2017 11:13:47 -0800 patch: add config knob for displaying the index header
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 11:13:47 -0800] rev 30788
patch: add config knob for displaying the index header This config knob can take an integer between 0 and 40 or a keyword ('none', 'short', 'full') to control the length of hash to output. It will display diffs with the git index header as such, diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py index 112edf7..d6b52c5 100644 We'll put this in the experimental section for now.
Thu, 12 Jan 2017 12:05:23 -0800 bisect: refer directly to bisect() revset predicate in help
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 12:05:23 -0800] rev 30787
bisect: refer directly to bisect() revset predicate in help We have specific syntax for displaying the help text for a particular revset predicate, so let's refer directly to the bisect() revset in the verbose bisect help. It seems likely that the user doesn't care about other revsets at that point, so they will probably not miss the text about the other revset predicates.
Thu, 12 Jan 2017 11:43:26 -0800 tests: use "hg help revisions.<predicate>" instead of grepping
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 11:43:26 -0800] rev 30786
tests: use "hg help revisions.<predicate>" instead of grepping We have specific syntax for displaying the help text for a particular revset predicate, so use that instead of grepping through the full output.
Thu, 12 Jan 2017 11:52:05 -0800 help: remove now-redundant pointer to revsets help
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 11:52:05 -0800] rev 30785
help: remove now-redundant pointer to revsets help "hg help revisions" and "hg help revsets" now point to the same text, so drop the revsets reference.
(0) -30000 -10000 -3000 -1000 -300 -100 -10 +10 +100 +300 +1000 +3000 +10000 tip