Denis Laxalde <denis.laxalde@logilab.fr> [Tue, 17 Jan 2017 09:17:29 +0100] rev 30825
hgweb: restore ascending iteration on revs in filelog web command
Follow-up on
96f811bceb85. Adjust back the "parity" generator's offset to keep
rendering the same.
Denis Laxalde <denis.laxalde@logilab.fr> [Mon, 16 Jan 2017 09:22:32 +0100] rev 30824
context: extract _changesinrange() out of blockancestors()
We'll need it to write a blockdescendants function in next changeset.
Pulkit Goyal <7895pulkit@gmail.com> [Sat, 14 Jan 2017 01:23:07 +0530] rev 30823
shelve: allow multiple shelves with --patch and --stat
Before this patch, there was a single way to see multiple shelves using
`--patch --list` which show all the shelves. Doing `--patch s1 s2` returns an
error. This patch allows to show multiple shelves using `--patch` and `--stat`.
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 14 Jan 2017 19:41:43 -0800] rev 30822
zstd: vendor python-zstandard 0.6.0
Commit
63c68d6f5fc8de4afd9bde81b13b537beb4e47e8 from
https://github.com/indygreg/python-zstandard is imported without
modifications (other than removing unwanted files).
This includes minor performance and feature improvements. It also
changes the vendored zstd library from 1.1.1 to 1.1.2.
# no-check-commit
Pulkit Goyal <7895pulkit@gmail.com> [Sat, 14 Jan 2017 20:05:15 +0530] rev 30821
util: add length argument to util.buffer()
util.buffer() either returns inbuilt buffer function or defines a new one which
slices. The inbuilt buffer() also has a length argument which is missing from
the ones we defined. This patch adds that length argument.
Pulkit Goyal <7895pulkit@gmail.com> [Sun, 15 Jan 2017 13:17:05 +0530] rev 30820
py3: replace pycompat.getenv with encoding.environ.get
pycompat.getenv returns os.getenvb on py3 which is not available on Windows.
This patch replaces them with encoding.environ.get and checks to ensure no
new instances of os.getenv or os.setenv are introduced.
Yuya Nishihara <yuya@tcha.org> [Sun, 15 Jan 2017 16:33:15 +0900] rev 30819
patch: check length of git index header only if integer is specified
Otherwise TypeError would be raised. Follows up
d1901c4c8ec0.
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 13 Jan 2017 20:16:56 -0800] rev 30818
localrepo: experimental support for non-zlib revlog compression
The final part of integrating the compression manager APIs into
revlog storage is the plumbing for repositories to advertise they
are using non-zlib storage and for revlogs to instantiate a non-zlib
compression engine.
The main intent of the compression manager work was to zstd all
of the things. Adding zstd to revlogs has proved to be more involved
than other places because revlogs are... special. Very small inputs
and the use of delta chains (which are themselves a form of
compression) are a completely different use case from streaming
compression, which bundles and the wire protocol employ. I've
conducted numerous experiments with zstd in revlogs and have yet
to formalize compression settings and a storage architecture that
I'm confident I won't regret later. In other words, I'm not yet
ready to commit to a new mechanism for using zstd - or any other
compression format - in revlogs.
That being said, having some support for zstd (and other compression
formats) in revlogs in core is beneficial. It can allow others to
conduct experiments.
This patch introduces *highly experimental* support for non-zlib
compression formats in revlogs. Introduced is a config option to
control which compression engine to use. Also introduced is a namespace
of "exp-compression-*" requirements to denote support for non-zlib
compression in revlogs. I've prefixed the namespace with "exp-"
(short for "experimental") because I'm not confident of the
requirements "schema" and in no way want to give the illusion of
supporting these requirements in the future. I fully intend to drop
support for these requirements once we figure out what we're doing
with zstd in revlogs.
A good portion of the patch is teaching the requirements system
about registered compression engines and passing the requested
compression engine as an opener option so revlogs can instantiate
the proper compression engine for new operations.
That's a verbose way of saying "we can now use zstd in revlogs!"
On an `hg pull` conversion of the mozilla-unified repo with no extra
redelta settings (like aggressivemergedeltas), we can see the impact
of zstd vs zlib in revlogs:
$ hg perfrevlogchunks -c
! chunk
! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5)
! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6)
! chunk batch
! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6)
! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6)
$ hg perfrevlogchunks -m
! chunk
! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4)
! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5)
! chunk batch
! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4)
! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6)
$ hg perfrevlog -c -d 1
! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3)
! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)
$ hg perfrevlog -m -d 1000
! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3)
! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3)
Only modest wins for the changelog. But manifest reading is
significantly faster. What's going on?
One reason might be data volume. zstd decompresses faster. So given
more bytes, it will put more distance between it and zlib.
Another reason is size. In the current design, zstd revlogs are
*larger*:
debugcreatestreamclonebundle (size in bytes)
zlib: 1,638,852,492
zstd: 1,680,601,332
I haven't investigated this fully, but I reckon a significant cause of
larger revlogs is that the zstd frame/header has more bytes than
zlib's. For very small inputs or data that doesn't compress well, we'll
tend to store more uncompressed chunks than with zlib (because the
compressed size isn't smaller than original). This will make revlog
reading faster because it is doing less decompression.
Moving on to bundle performance:
$ hg bundle -a -t none-v2 (total CPU time)
zlib: 102.79s
zstd: 97.75s
So, marginal CPU decrease for reading all chunks in all revlogs
(this is somewhat disappointing).
$ hg bundle -a -t <engine>-v2 (total CPU time)
zlib: 191.59s
zstd: 115.36s
This last test effectively measures the difference between zlib->zlib
and zstd->zstd for revlogs to bundle. This is a rough approximation of
what a server does during `hg clone`.
There are some promising results for zstd. But not enough for me to
feel comfortable advertising it to users. We'll get there...
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 13 Jan 2017 19:58:00 -0800] rev 30817
revlog: use compression engine APIs for decompression
Now that compression engines declare their header in revlog chunks
and can decompress revlog chunks, we refactor revlog.decompress()
to use them.
Making full use of the property that revlog compressor objects are
reusable, revlog instances now maintain a dict mapping an engine's
revlog header to a compressor object. This is not only a performance
optimization for engines where compressor object reuse can result in
better performance, but it also serves as a cache of header values
so we don't need to perform redundant lookups against the compression
engine manager. (Yes, I measured and the overhead of a function call
versus a dict lookup was observed.)
Replacing the previous inline lookup table with a dict lookup was
measured to make chunk reading ~2.5% slower on changelogs and ~4.5%
slower on manifests. So, the inline lookup table has been mostly
preserved so we don't lose performance. This is unfortunate. But
many decompression operations complete in microseconds, so Python
attribute lookup, dict lookup, and function calls do matter.
The impact of this change on mozilla-unified is as follows:
$ hg perfrevlogchunks -c
! chunk
! wall 1.953663 comb 1.950000 user 1.920000 sys 0.030000 (best of 6)
! wall 1.946000 comb 1.940000 user 1.910000 sys 0.030000 (best of 6)
! chunk batch
! wall 1.791075 comb 1.800000 user 1.760000 sys 0.040000 (best of 6)
! wall 1.785690 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)
$ hg perfrevlogchunks -m
! chunk
! wall 2.587262 comb 2.580000 user 2.550000 sys 0.030000 (best of 4)
! wall 2.616330 comb 2.610000 user 2.560000 sys 0.050000 (best of 4)
! chunk batch
! wall 2.427092 comb 2.420000 user 2.400000 sys 0.020000 (best of 5)
! wall 2.462061 comb 2.460000 user 2.400000 sys 0.060000 (best of 4)
Changelog chunk reading is slightly faster but manifest reading is
slower. What gives?
On this repo, 99.85% of changelog entries are zlib compressed (the 'x'
header). On the manifest, 67.5% are zlib and 32.4% are '\0'. This patch
swapped the test order of 'x' and '\0' so now 'x' is tested first. This
makes changelogs faster since they almost always hit the first branch.
This makes a significant percentage of manifest '\0' chunks slower
because that code path now performs an extra test. Yes, I too can't
believe we're able to measure the impact of an if..elif with simple
string compares. I reckon this code would benefit from being written
in C...
Denis Laxalde <denis.laxalde@logilab.fr> [Fri, 13 Jan 2017 10:22:25 +0100] rev 30816
hgweb: build the "entries" list directly in filelog command
There's no apparent reason to have this "entries" generator function that
builds a list and then yields its elements in reverse order and which is only
called to build the "entries" list. So just build the list directly, in
reverse order.
Adjust "parity" generator's offset to keep rendering the same.
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 14 Jan 2017 10:11:19 -0800] rev 30815
convert: remove "replacecommitter" action
As pointed out by Yuya, this action doesn't add much (any?) value.
Yuya Nishihara <yuya@tcha.org> [Sat, 14 Jan 2017 20:31:35 +0900] rev 30814
ui: check EOF of getpass() response read from command-server channel
readline() returns '' only when EOF is encountered, in which case, Python's
getpass() raises EOFError. We should do the same to abort the session as
"response expected."
This bug was reported to
https://bitbucket.org/tortoisehg/thg/issues/4659/
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 13 Jan 2017 23:21:10 -0800] rev 30813
convert: config option to control Git committer actions
When converting a Git repository to Mercurial at Mozilla, I encountered
a scenario where I didn't want `hg convert` to automatically add the
"committer: <committer>" line to commit messages. While I can hack around
this by rewriting the Git commit before it is fed into `hg convert`,
I figured it would be a useful knob to control.
This patch introduces a config option that allows lots of control
over the committer value. I initially implemented this as a single
boolean flag to control whether to save the committer message. But
then there was feedback that it would be useful to save the committer
in extra data. While this patch doesn't implement support for saving
in extra data, it does add a mechanism for extending which actions
to take on the committer field. We should be able to easily add
actions to save in extra data.
Some of the implemented features weren't asked for. But I figured they
could be useful. If nothing else they demonstrate the extensibility
of this mechanism.
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 13 Jan 2017 21:21:02 -0800] rev 30812
help: make "mergetool" an alias for "merge-tools"
I've probably typed `hg help mergetool` dozens of times. I'm tired
of it not working.
Matthieu Laneuville <mlaneuville@protonmail.com> [Thu, 12 Jan 2017 21:06:55 +0900] rev 30811
templatekw: force noprefix=False to insure diffstat consistency (
issue4755)
The result of diffstatdata should not depend on having noprefix set or not, as
was reported in issue 4755. Forcing noprefix to false on call makes sure the
parser receives the diff in the correct format and returns the proper result.
Another way to fix this would have been to change the regular expressions in
path.diffstatdata(), but that would have introduced many unecessary special
cases.
Martin von Zweigbergk <martinvonz@google.com> [Fri, 13 Jan 2017 10:11:37 -0800] rev 30810
check-code: reject module-level @cachefunc
Module-level @cachefunc usage is risky because it can easily create a
memory "leak". Let's reject it completely for now. If a valid usage
comes up in the future, we can always improve the check or reconsider.
Pierre-Yves David <pierre-yves.david@ens-lyon.org> [Fri, 13 Jan 2017 11:42:36 -0800] rev 30809
similar: remove caching from the module level
To prevent Bad Thingsā¢ from happening, let's rework the logic to not use
util.cachefunc.
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 11:01:45 -0800] rev 30808
patch: add label for coloring the similarity extended header
Just like the summary says, this will colorize the:
similarity index 88%
line in the diff output.
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 11:24:18 -0800] rev 30807
patch: use opt.showsimilarity to calculate and show the similarity
Tests have been added.
Sean Farley <sean@farley.io> [Mon, 09 Jan 2017 10:51:44 -0800] rev 30806
patch: add similarity config knob in experimental section
This config knob will control whether or not to show the similarity
calculation in the diff output:
diff --git a/README.md b/foo.md
similarity index 88%
rename from README.md
rename to foo.md
--- a/README.md
+++ b/foo.md
Sean Farley <sean@farley.io> [Sat, 07 Jan 2017 20:47:57 -0800] rev 30805
similar: move score function to module level
Future patches will use this to report the similarity of a rename / copy
in the patch output.
Yuya Nishihara <yuya@tcha.org> [Mon, 09 Jan 2017 17:58:19 +0900] rev 30804
revset: abuse x:y syntax to specify line range of followlines()
This slightly complicates the parsing (see the previous patch), but the
overall result seems not bad.
I keep x:, :y and : for future extension.
Yuya Nishihara <yuya@tcha.org> [Mon, 09 Jan 2017 16:55:56 +0900] rev 30803
revset: do not transform range* operators in parsed tree
This allows us to handle x:y range as a general range object. A primary user
of it is followlines().
Yuya Nishihara <yuya@tcha.org> [Mon, 09 Jan 2017 17:45:11 +0900] rev 30802
revset: add default value to getinteger() helper
This seems handy.
Yuya Nishihara <yuya@tcha.org> [Mon, 09 Jan 2017 17:39:44 +0900] rev 30801
revset: factor out getinteger() helper
We have 4 revset functions that take integer arguments, and they handle
their arguments in slightly different ways. This patch unifies them:
- getstring() in place of getsymbol(), which is more consistent with the
handling of integer revisions (both 1 and '1' are valid)
- say "expects" instead of "requires" for type errors
We don't need to catch TypeError since getstring() must return a string.
Yuya Nishihara <yuya@tcha.org> [Mon, 09 Jan 2017 16:16:26 +0900] rev 30800
revset: rename rev argument of followlines() to startrev
The rev argument has the same meaning as startrev of follow(), and I think
startrev is more informative.
followlines() is new function, we can make BC now.
Yuya Nishihara <yuya@tcha.org> [Fri, 13 Jan 2017 23:48:21 +0900] rev 30799
help: use :hg: role and canonical name to point to revset string patterns
Follows up
5dd67f0993ce. Now revisions.txt and revsets.txt has been merged,
so use revisions.* as a pointer.
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 13:27:20 -0800] rev 30798
util: compression APIs to support revlog decompression
Previously, compression engines had APIs for performing revlog
compression but no mechanism to perform revlog decompression. This
patch changes that.
Revlog decompression is slightly more complicated than compression
because in the compression case there is (currently) only a single
engine that can be used at a time. However for decompression, a
revlog could contain chunks from multiple compression engines. This
means decompression needs to map to multiple engines and
decompressors. This functionality is outside the scope of this patch.
But it drives the decision for engines to declare a byte header
sequence that identifies revlog data as belonging to an engine and
an API for obtaining an engine from a revlog header.
Anton Shestakov <av6@dwimlabs.net> [Sun, 08 Jan 2017 10:08:29 +0800] rev 30797
crecord: add an experimental option for space key to move cursor down
I really want to have an option of toggling a selection on a line and also
moving cursor down as a single keystroke. It also kinda makes sense for space
key to do this, because some other curses UIs in the wild do this (e.g. various
file managers, htop). So I got an idea to make a config option that defaults to
False for compatibility, but allows making crecord UI a lot more useful for
people with big hunks.
We add this an experimental option to experiment with this behavior.
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 02 Jan 2017 12:02:08 -0800] rev 30796
perf: support multiple compression engines in perfrevlogchunks
Now that the revlog has a reference to a compressor, it is
possible to swap in other compression engines. So, teach
`hg perfrevlogchunks` to do that.
The default behavior of `hg perfrevlogchunks` is now to measure the
compression performance of all compression engines implementing the
revlog compressor API. This effectively adds the no-op "none"
compressor and zstd (when available) into the default set.
While we can't yet plug alternate compressors into revlogs, this
command gives us a preview of the performance. On the mozilla-unified
repository:
$ hg perfrevlogchunks -c
! compress w/ none
! wall 0.115159 comb 0.110000 user 0.110000 sys 0.000000 (best of 86)
! compress w/ zlib
! wall 5.681406 comb 5.680000 user 5.680000 sys 0.000000 (best of 3)
! compress w/ zstd
! wall 2.624781 comb 2.620000 user 2.620000 sys 0.000000 (best of 4)
$ hg perfrevlogchunks -m
! compress w/ none
! wall 0.124486 comb 0.120000 user 0.120000 sys 0.000000 (best of 79)
! compress w/ zlib
! wall 10.144701 comb 10.150000 user 10.150000 sys 0.000000 (best of 3)
! compress w/ zstd
! wall 4.383118 comb 4.390000 user 4.390000 sys 0.000000 (best of 3)
Those numbers for zstd look promising. But they aren't the full story.
For that, we'll need to look at decompression times and storage sizes.
Stay tuned...