Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 13:00:16 -0800] rev 30793
revlog: move decompress() from module to revlog class (API)
Upcoming patches will convert revlogs to use the compression engine
APIs to perform all things compression. The yet-to-be-introduced
APIs support a persistent "compressor" object so the same object
can be reused for multiple compression operations, leading to
better performance. In addition, compression engines like zstd
may wish to tweak compression engine state based on the revlog
(e.g. per-revlog compression dictionaries).
A global and shared decompress() function will shortly no longer
make much sense. So, we move decompress() to be a method of the
revlog class. It joins compress() there.
On the mozilla-unified repo, we can measure the impact of this change
on reading performance:
$ hg perfrevlogchunks -c
! chunk
! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6)
! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6)
! chunk batch
! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6
! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)
"chunk" appeared to become slower but "chunk batch" got faster. Upon
further examination by running both sets multiple times, the numbers
appear to converge across all runs. This tells me that there is no
perceived performance impact to this refactor.
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 11:50:17 -0800] rev 30792
revlog: make compressed size comparisons consistent
revlog.compress() compares the compressed size to the input size
and throws away the compressed data if it is larger than the input.
This is the correct thing to do, as storing compressed data that
is larger than the input takes up more storage space and makes reading
slower.
However, the comparison was implemented inconsistently. For the
streaming compression mode, we threw away the result if it was
greater than or equal to the input size. But for the one-shot
compression, we threw away the compression only if it was greater
than the input size!
This patch changes the comparison for the simple case so it is
consistent with the streaming case.
As a few tests demonstrate, this adds 1 byte to some revlog entries.
This is because of an added 'u' header on the chunk. It seems
somewhat wrong to increase the revlog size here. However, IMO the cost
of 1 byte in storage is insignificant compared to the performance gains
of avoiding decompression. This patch should invite questions around
the heuristic for throwing away compressed data. For example, I'd argue
we should be more liberal about rejecting compressed data, additionally
doing so where the number of bytes saved fails to reach a threshold.
But we can have this discussion another time.
Sean Farley <sean@farley.io> [Sat, 07 Jan 2017 20:43:49 -0800] rev 30791
similar: rename local variable to not collide with previous
Future patches will move the score function to the module level, so
let's not shadow that.
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 10:59:45 -0800] rev 30790
patch: add label for coloring the index extended header
Just like the summary says, this will colorize the:
index
3d3ba4b65e11..
57274a0f46b2 100644
line in the diff output.
Sean Farley <sean@farley.io> [Sat, 31 Dec 2016 15:41:57 -0600] rev 30789
patch: add index line for diff output
This helps highlighting in third-party diff coloring (which assumes git
output) and maintains pedantic correctness with diff --git.
Tests will be added at the end of the series.
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 11:13:47 -0800] rev 30788
patch: add config knob for displaying the index header
This config knob can take an integer between 0 and 40 or a
keyword ('none', 'short', 'full') to control the length of hash to
output. It will display diffs with the git index header as such,
diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
index 112edf7..d6b52c5 100644
We'll put this in the experimental section for now.
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 12:05:23 -0800] rev 30787
bisect: refer directly to bisect() revset predicate in help
We have specific syntax for displaying the help text for a particular
revset predicate, so let's refer directly to the bisect() revset in
the verbose bisect help. It seems likely that the user doesn't care
about other revsets at that point, so they will probably not miss the
text about the other revset predicates.
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 11:43:26 -0800] rev 30786
tests: use "hg help revisions.<predicate>" instead of grepping
We have specific syntax for displaying the help text for a particular
revset predicate, so use that instead of grepping through the full
output.
Martin von Zweigbergk <martinvonz@google.com> [Thu, 12 Jan 2017 11:52:05 -0800] rev 30785
help: remove now-redundant pointer to revsets help
"hg help revisions" and "hg help revsets" now point to the same text,
so drop the revsets reference.
Matt Harbison <matt_harbison@yahoo.com> [Sat, 07 Jan 2017 23:35:35 -0500] rev 30784
help: eliminate duplicate text for revset string patterns
There's no reason to duplicate this so many times, and it's likely an instance
will be missed if support for a new pattern is added and documented. The
stringmatcher is mostly used by revsets, though it is also used for the 'tag'
related templates, and namespace filtering in the journal extension. So maybe
there's a better place to document it. `hg help patterns` seems inappropriate,
because that is all file pattern matching.
While here, indicate how to perform case insensitive regex searches.