hg-stable: Changelog

Mercurial Mercurial > hg-stable / changelog

Mon, 02 Jan 2017 11:22:52 -0800 revlog: use compression engine API for compression

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 11:22:52 -0800] rev 30795

revlog: use compression engine API for compression This commit swaps in the just-added revlog compressor API into the revlog class. Instead of implementing zlib compression inline in compress(), we now store a cached-on-first-use revlog compressor on each revlog instance and invoke its "compress()" method. As part of this, revlog.compress() has been refactored a bit to use a cleaner code flow and modern formatting (e.g. avoiding parenthesis around returned tuples). On a mozilla-unified repo, here are the "compress" times for a few commands: $ hg perfrevlogchunks -c ! wall 5.772450 comb 5.780000 user 5.780000 sys 0.000000 (best of 3) ! wall 5.795158 comb 5.790000 user 5.790000 sys 0.000000 (best of 3) $ hg perfrevlogchunks -m ! wall 9.975789 comb 9.970000 user 9.970000 sys 0.000000 (best of 3) ! wall 10.019505 comb 10.010000 user 10.010000 sys 0.000000 (best of 3) Compression times did seem to slow down just a little. There are 360,210 changelog revisions and 359,342 manifest revisions. For the changelog, mean time to compress a revision increased from ~16.025us to ~16.088us. That's basically a function call or an attribute lookup. I suppose this is the price you pay for abstraction. It's so low that I'm not concerned.

Mon, 02 Jan 2017 12:39:03 -0800 util: compression APIs to support revlog compression

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 12:39:03 -0800] rev 30794

util: compression APIs to support revlog compression As part of "zstd all of the things," we need to teach revlogs to use non-zlib compression formats. Because we're routing all compression via the "compression manager" and "compression engine" APIs, we need to introduction functionality there for performing revlog operations. Ideally, revlog compression and decompression operations would be implemented in terms of simple "compress" and "decompress" primitives. However, there are a few considerations that make us want to have a specialized primitive for handling revlogs: 1) Performance. Revlogs tend to do compression and especially decompression operations in batches. Any overhead for e.g. instantiating a "context" for performing an operation can be noticed. For this reason, our "revlog compressor" primitive is reusable. For zstd, we reuse the same compression "context" for multiple operations. I've measured this to have a performance impact versus constructing new contexts for each operation. 2) Specialization. By having a primitive dedicated to revlog use, we can make revlog-specific choices and leave the door open for more functionality in the future. For example, the zstd revlog compressor may one day make use of dictionary compression. A future patch will introduce a decompress() on the compressor object. The code for the zlib compressor is basically copied from revlog.compress(). Although it doesn't handle the empty input case, the null first byte case, and the 'u' prefix case. These cases will continue to be handled in revlog.py once that code is ported to use this API.

Mon, 02 Jan 2017 13:00:16 -0800 revlog: move decompress() from module to revlog class (API)

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 13:00:16 -0800] rev 30793

revlog: move decompress() from module to revlog class (API) Upcoming patches will convert revlogs to use the compression engine APIs to perform all things compression. The yet-to-be-introduced APIs support a persistent "compressor" object so the same object can be reused for multiple compression operations, leading to better performance. In addition, compression engines like zstd may wish to tweak compression engine state based on the revlog (e.g. per-revlog compression dictionaries). A global and shared decompress() function will shortly no longer make much sense. So, we move decompress() to be a method of the revlog class. It joins compress() there. On the mozilla-unified repo, we can measure the impact of this change on reading performance: $ hg perfrevlogchunks -c ! chunk ! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6) ! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6) ! chunk batch ! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6 ! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6) "chunk" appeared to become slower but "chunk batch" got faster. Upon further examination by running both sets multiple times, the numbers appear to converge across all runs. This tells me that there is no perceived performance impact to this refactor.

Mon, 02 Jan 2017 11:50:17 -0800 revlog: make compressed size comparisons consistent

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 11:50:17 -0800] rev 30792

revlog: make compressed size comparisons consistent revlog.compress() compares the compressed size to the input size and throws away the compressed data if it is larger than the input. This is the correct thing to do, as storing compressed data that is larger than the input takes up more storage space and makes reading slower. However, the comparison was implemented inconsistently. For the streaming compression mode, we threw away the result if it was greater than or equal to the input size. But for the one-shot compression, we threw away the compression only if it was greater than the input size! This patch changes the comparison for the simple case so it is consistent with the streaming case. As a few tests demonstrate, this adds 1 byte to some revlog entries. This is because of an added 'u' header on the chunk. It seems somewhat wrong to increase the revlog size here. However, IMO the cost of 1 byte in storage is insignificant compared to the performance gains of avoiding decompression. This patch should invite questions around the heuristic for throwing away compressed data. For example, I'd argue we should be more liberal about rejecting compressed data, additionally doing so where the number of bytes saved fails to reach a threshold. But we can have this discussion another time.

Sat, 07 Jan 2017 20:43:49 -0800 similar: rename local variable to not collide with previous

changeset

Sean Farley <sean@farley.io> [Sat, 07 Jan 2017 20:43:49 -0800] rev 30791

similar: rename local variable to not collide with previous Future patches will move the score function to the module level, so let's not shadow that.

Mon, 09 Jan 2017 10:59:45 -0800 patch: add label for coloring the index extended header

changeset

Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 10:59:45 -0800] rev 30790

patch: add label for coloring the index extended header Just like the summary says, this will colorize the: index 3d3ba4b65e11..57274a0f46b2 100644 line in the diff output.

Sat, 31 Dec 2016 15:41:57 -0600 patch: add index line for diff output

changeset

Sean Farley <sean@farley.io> [Sat, 31 Dec 2016 15:41:57 -0600] rev 30789

patch: add index line for diff output This helps highlighting in third-party diff coloring (which assumes git output) and maintains pedantic correctness with diff --git. Tests will be added at the end of the series.

Mon, 09 Jan 2017 11:13:47 -0800 patch: add config knob for displaying the index header

changeset

Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 11:13:47 -0800] rev 30788

patch: add config knob for displaying the index header This config knob can take an integer between 0 and 40 or a keyword ('none', 'short', 'full') to control the length of hash to output. It will display diffs with the git index header as such, diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py index 112edf7..d6b52c5 100644 We'll put this in the experimental section for now.

Thu, 12 Jan 2017 12:05:23 -0800 bisect: refer directly to bisect() revset predicate in help

changeset

Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 12:05:23 -0800] rev 30787

bisect: refer directly to bisect() revset predicate in help We have specific syntax for displaying the help text for a particular revset predicate, so let's refer directly to the bisect() revset in the verbose bisect help. It seems likely that the user doesn't care about other revsets at that point, so they will probably not miss the text about the other revset predicates.

Thu, 12 Jan 2017 11:43:26 -0800 tests: use "hg help revisions.<predicate>" instead of grepping

changeset

Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 11:43:26 -0800] rev 30786

tests: use "hg help revisions.<predicate>" instead of grepping We have specific syntax for displaying the help text for a particular revset predicate, so use that instead of grepping through the full output.

Thu, 12 Jan 2017 11:52:05 -0800 help: remove now-redundant pointer to revsets help

changeset

Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 11:52:05 -0800] rev 30785

help: remove now-redundant pointer to revsets help "hg help revisions" and "hg help revsets" now point to the same text, so drop the revsets reference.

Sat, 07 Jan 2017 23:35:35 -0500 help: eliminate duplicate text for revset string patterns

changeset

Matt Harbison <matt_harbison@yahoo.com> [Sat, 07 Jan 2017 23:35:35 -0500] rev 30784

help: eliminate duplicate text for revset string patterns There's no reason to duplicate this so many times, and it's likely an instance will be missed if support for a new pattern is added and documented. The stringmatcher is mostly used by revsets, though it is also used for the 'tag' related templates, and namespace filtering in the journal extension. So maybe there's a better place to document it. `hg help patterns` seems inappropriate, because that is all file pattern matching. While here, indicate how to perform case insensitive regex searches.

Sat, 07 Jan 2017 21:26:32 -0500 revset: add regular expression support to 'desc'

changeset

Matt Harbison <matt_harbison@yahoo.com> [Sat, 07 Jan 2017 21:26:32 -0500] rev 30783

revset: add regular expression support to 'desc' This is a case insensitive predicate like 'author', so it conforms to the existing behavior of performing a case insensitive regex.

Wed, 11 Jan 2017 22:42:10 -0500 revset: stop lowercasing the regex pattern for 'author'

changeset

Matt Harbison <matt_harbison@yahoo.com> [Wed, 11 Jan 2017 22:42:10 -0500] rev 30782

revset: stop lowercasing the regex pattern for 'author' It was probably unintentional for regex, as the meaning of some sequences like \S and \s is actually inverted by changing the case. For backward compatibility however, the matching is forced to case insensitive.

Thu, 24 Nov 2016 18:45:29 -0800 repair: clean up stale lock file from store backup

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Thu, 24 Nov 2016 18:45:29 -0800] rev 30781

repair: clean up stale lock file from store backup Since we did a directory rename on the stores, the source repository's lock path now references the dest repository's lock path and the dest repository's lock path now references a non-existent filename. So releasing the lock on the source will unlock the dest and releasing the lock on the dest will no-op because it fails due to file not found. So we clean up the dest's lock manually.

Thu, 24 Nov 2016 18:34:50 -0800 repair: copy non-revlog store files during upgrade

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Thu, 24 Nov 2016 18:34:50 -0800] rev 30780

repair: copy non-revlog store files during upgrade The store contains more than just revlogs. This patch teaches the upgrade code to copy regular files as well. As the test changes demonstrate, the phaseroots file is now copied.

(0) -30000 -10000 -3000 -1000 -300 -100 -16 +16 +100 +300 +1000 +3000 +10000 tip

hg-stable

RSS Atom