Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 11:22:52 -0800] rev 30795
revlog: use compression engine API for compression
This commit swaps in the just-added revlog compressor API into
the revlog class.
Instead of implementing zlib compression inline in compress(), we
now store a cached-on-first-use revlog compressor on each revlog
instance and invoke its "compress()" method.
As part of this, revlog.compress() has been refactored a bit to use
a cleaner code flow and modern formatting (e.g. avoiding
parenthesis around returned tuples).
On a mozilla-unified repo, here are the "compress" times for a few
commands:
$ hg perfrevlogchunks -c
! wall 5.772450 comb 5.780000 user 5.780000 sys 0.000000 (best of 3)
! wall 5.795158 comb 5.790000 user 5.790000 sys 0.000000 (best of 3)
$ hg perfrevlogchunks -m
! wall 9.975789 comb 9.970000 user 9.970000 sys 0.000000 (best of 3)
! wall 10.019505 comb 10.010000 user 10.010000 sys 0.000000 (best of 3)
Compression times did seem to slow down just a little. There are
360,210 changelog revisions and 359,342 manifest revisions. For the
changelog, mean time to compress a revision increased from ~16.025us to
~16.088us. That's basically a function call or an attribute lookup. I
suppose this is the price you pay for abstraction. It's so low that
I'm not concerned.
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 12:39:03 -0800] rev 30794
util: compression APIs to support revlog compression
As part of "zstd all of the things," we need to teach revlogs to
use non-zlib compression formats. Because we're routing all compression
via the "compression manager" and "compression engine" APIs, we need to
introduction functionality there for performing revlog operations.
Ideally, revlog compression and decompression operations would be
implemented in terms of simple "compress" and "decompress" primitives.
However, there are a few considerations that make us want to have a
specialized primitive for handling revlogs:
1) Performance. Revlogs tend to do compression and especially
decompression operations in batches. Any overhead for e.g.
instantiating a "context" for performing an operation can be
noticed. For this reason, our "revlog compressor" primitive is
reusable. For zstd, we reuse the same compression "context" for
multiple operations. I've measured this to have a performance
impact versus constructing new contexts for each operation.
2) Specialization. By having a primitive dedicated to revlog use,
we can make revlog-specific choices and leave the door open for
more functionality in the future. For example, the zstd revlog
compressor may one day make use of dictionary compression.
A future patch will introduce a decompress() on the compressor
object.
The code for the zlib compressor is basically copied from
revlog.compress(). Although it doesn't handle the empty input
case, the null first byte case, and the 'u' prefix case. These
cases will continue to be handled in revlog.py once that code is
ported to use this API.
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 13:00:16 -0800] rev 30793
revlog: move decompress() from module to revlog class (API)
Upcoming patches will convert revlogs to use the compression engine
APIs to perform all things compression. The yet-to-be-introduced
APIs support a persistent "compressor" object so the same object
can be reused for multiple compression operations, leading to
better performance. In addition, compression engines like zstd
may wish to tweak compression engine state based on the revlog
(e.g. per-revlog compression dictionaries).
A global and shared decompress() function will shortly no longer
make much sense. So, we move decompress() to be a method of the
revlog class. It joins compress() there.
On the mozilla-unified repo, we can measure the impact of this change
on reading performance:
$ hg perfrevlogchunks -c
! chunk
! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6)
! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6)
! chunk batch
! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6
! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)
"chunk" appeared to become slower but "chunk batch" got faster. Upon
further examination by running both sets multiple times, the numbers
appear to converge across all runs. This tells me that there is no
perceived performance impact to this refactor.
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 11:50:17 -0800] rev 30792
revlog: make compressed size comparisons consistent
revlog.compress() compares the compressed size to the input size
and throws away the compressed data if it is larger than the input.
This is the correct thing to do, as storing compressed data that
is larger than the input takes up more storage space and makes reading
slower.
However, the comparison was implemented inconsistently. For the
streaming compression mode, we threw away the result if it was
greater than or equal to the input size. But for the one-shot
compression, we threw away the compression only if it was greater
than the input size!
This patch changes the comparison for the simple case so it is
consistent with the streaming case.
As a few tests demonstrate, this adds 1 byte to some revlog entries.
This is because of an added 'u' header on the chunk. It seems
somewhat wrong to increase the revlog size here. However, IMO the cost
of 1 byte in storage is insignificant compared to the performance gains
of avoiding decompression. This patch should invite questions around
the heuristic for throwing away compressed data. For example, I'd argue
we should be more liberal about rejecting compressed data, additionally
doing so where the number of bytes saved fails to reach a threshold.
But we can have this discussion another time.
Sean Farley <sean@farley.io> [Sat, 07 Jan 2017 20:43:49 -0800] rev 30791
similar: rename local variable to not collide with previous
Future patches will move the score function to the module level, so
let's not shadow that.
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 10:59:45 -0800] rev 30790
patch: add label for coloring the index extended header
Just like the summary says, this will colorize the:
index 3d3ba4b65e11..57274a0f46b2 100644
line in the diff output.
Sean Farley <sean@farley.io> [Sat, 31 Dec 2016 15:41:57 -0600] rev 30789
patch: add index line for diff output
This helps highlighting in third-party diff coloring (which assumes git
output) and maintains pedantic correctness with diff --git.
Tests will be added at the end of the series.
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 11:13:47 -0800] rev 30788
patch: add config knob for displaying the index header
This config knob can take an integer between 0 and 40 or a
keyword ('none', 'short', 'full') to control the length of hash to
output. It will display diffs with the git index header as such,
diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
index 112edf7..d6b52c5 100644
We'll put this in the experimental section for now.
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 12:05:23 -0800] rev 30787
bisect: refer directly to bisect() revset predicate in help
We have specific syntax for displaying the help text for a particular
revset predicate, so let's refer directly to the bisect() revset in
the verbose bisect help. It seems likely that the user doesn't care
about other revsets at that point, so they will probably not miss the
text about the other revset predicates.
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 11:43:26 -0800] rev 30786
tests: use "hg help revisions.<predicate>" instead of grepping
We have specific syntax for displaying the help text for a particular
revset predicate, so use that instead of grepping through the full
output.
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 11:52:05 -0800] rev 30785
help: remove now-redundant pointer to revsets help
"hg help revisions" and "hg help revsets" now point to the same text,
so drop the revsets reference.
Matt Harbison <matt_harbison@yahoo.com> [Sat, 07 Jan 2017 23:35:35 -0500] rev 30784
help: eliminate duplicate text for revset string patterns
There's no reason to duplicate this so many times, and it's likely an instance
will be missed if support for a new pattern is added and documented. The
stringmatcher is mostly used by revsets, though it is also used for the 'tag'
related templates, and namespace filtering in the journal extension. So maybe
there's a better place to document it. `hg help patterns` seems inappropriate,
because that is all file pattern matching.
While here, indicate how to perform case insensitive regex searches.
Matt Harbison <matt_harbison@yahoo.com> [Sat, 07 Jan 2017 21:26:32 -0500] rev 30783
revset: add regular expression support to 'desc'
This is a case insensitive predicate like 'author', so it conforms to the
existing behavior of performing a case insensitive regex.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 11 Jan 2017 22:42:10 -0500] rev 30782
revset: stop lowercasing the regex pattern for 'author'
It was probably unintentional for regex, as the meaning of some sequences like
\S and \s is actually inverted by changing the case. For backward compatibility
however, the matching is forced to case insensitive.
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 24 Nov 2016 18:45:29 -0800] rev 30781
repair: clean up stale lock file from store backup
Since we did a directory rename on the stores, the source
repository's lock path now references the dest repository's
lock path and the dest repository's lock path now references
a non-existent filename.
So releasing the lock on the source will unlock the dest and
releasing the lock on the dest will no-op because it fails due
to file not found. So we clean up the dest's lock manually.
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 24 Nov 2016 18:34:50 -0800] rev 30780
repair: copy non-revlog store files during upgrade
The store contains more than just revlogs. This patch teaches the
upgrade code to copy regular files as well.
As the test changes demonstrate, the phaseroots file is now copied.