Pulkit Goyal <7895pulkit@gmail.com> [Sun, 13 Mar 2016 01:08:39 +0530] rev 28509
check-code: use absolute_import and print_function
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 11 Mar 2016 21:27:26 -0800] rev 28508
encoding: use range() instead of xrange()
Python 3 doesn't have xrange(). Instead, range() on Python 3
is a generator, like xrange() is on Python 2.
The benefits of xrange() over range() are when there are very
large ranges that are too expensive to pre-allocate. The code
here is only creating <128 values, so the benefits of xrange()
should be negligible.
With this patch, encoding.py imports safely on Python 3.
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 11 Mar 2016 21:23:34 -0800] rev 28507
encoding: make HFS+ ignore code Python 3 compatible
unichr() doesn't exist in Python 3. chr() is the equivalent there.
Unfortunately, we can't use chr() outright because Python 2 only
accepts values smaller than 256.
Also, Python 3 returns an int when accessing a character of a
bytes type (s[x]). So, we have to ord() the values in the assert
statement.
Pierre-Yves David <pierre-yves.david@fb.com> [Fri, 11 Mar 2016 10:28:58 +0000] rev 28506
extensions: factor import error reporting out
To clarify third party extensions lookup, we are about to add a third place
where extensions are searched for. So we factor the error reporting logic out to
be able to easily reuse it in the next patch.
Pierre-Yves David <pierre-yves.david@fb.com> [Fri, 11 Mar 2016 10:24:54 +0000] rev 28505
extensions: extract the 'importh' closure as normal function
There is no reason for this to be a closure so we extract it for clarity.
Martin von Zweigbergk <martinvonz@google.com> [Fri, 11 Mar 2016 15:40:58 -0800] rev 28504
zeroconf: remove leftover camelcase identifier
eb9d0e828c30 (zeroconf: remove camelcase in identifiers, 2016-03-01)
forgot one occurrence of "numAuthorities", which makes test-paths.t
fail for me. I don't even know what zeroconf is, but this patch seems
obviously correct and it fixes the failing test case.
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Sat, 12 Mar 2016 04:35:42 +0900] rev 28503
hg: acquire wlock while updating the working directory via updatetotally
updatetotally() might be invoked outside wlock scope (e.g. invocation
via postincoming() at "hg unbundle" or "hg pull").
In such case, acquisition of wlock is needed for consistent view,
because parallel "hg update" and/or "hg bookmarks" might change
working directory status while executing updatetotally().
Strictly speaking, truly consistent updating should acquire also store
lock, because active bookmark might be moved to another one outside
wlock scope (e.g. pulling from other repository causes updating
current active one).
Acquisition of wlock in this patch ensures consistency in as same
level as past "hg update".
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Sat, 12 Mar 2016 04:35:42 +0900] rev 28502
commands: add postincoming docstring for explanation of arguments
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Sat, 12 Mar 2016 04:35:42 +0900] rev 28501
commands: centralize code to update with extra care for non-file components
This patch centralizes similar code paths to update the working
directory with extra care for non-file components (e.g. bookmark) into
newly added function updatetotally().
'if True' at the beginning of updatetotally() is redundant at this
patch, but useful to reduce amount of changes in subsequent patch.
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Sat, 12 Mar 2016 04:35:42 +0900] rev 28500
update: omit redundant activating message for already active bookmark
This patch also adds "hg bookmarks" invocation into tests, where
redundant message is omitted but bookmark activity isn't clear from
context.
Martin von Zweigbergk <martinvonz@google.com> [Fri, 11 Mar 2016 11:44:03 -0800] rev 28499
tests: make test-verify-repo-operations.py not run by default
test-verify-repo-operations.py currently starts way too late and
extends the running time with -j50 on my machine from around 3:48 min
to 6:30 min. We could of course make it run earlier, but the test case
seems unlikely to find bugs not covered by other tests, so let's mark
it "slow" instead. I think this type of test is better suited to
running separately in a long-running job.
timeless <timeless@mozdev.org> [Fri, 29 Jan 2016 14:37:16 +0000] rev 28498
ui: log devel warnings
timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 17:22:04 +0000] rev 28497
util: refactor getstackframes
timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 16:50:14 +0000] rev 28496
util: reword debugstacktrace comment
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 15:40:20 -0800] rev 28495
changelog: avoid slicing raw data until needed
Before, we were slicing the original raw text and storing individual
variables with values corresponding to each field. This is avoidable
overhead.
With this patch, we store the offsets of the fields at construction
time and perform the slice when a property is accessed.
This appears to show a very marginal performance win on its own and
the gains are so small as to not be worth reporting. However, this
patch marks the end of our parsing refactor, so it is worth reporting
the gains from the entire series:
author(mpm)
0.896565
0.795987 89%
desc(bug)
0.887169
0.803438 90%
date(2015)
0.878797
0.773961 88%
extra(rebase_source)
0.865446
0.761603 88%
author(mpm) or author(greg)
1.801832
1.576025 87%
author(mpm) or desc(bug)
1.812438
1.593335 88%
date(2015) or branch(default)
0.968276
0.875270 90%
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.183104 87%
Pretty consistent speed-up across the board for any revset accessing
changelog revision data. Not bad!
It's also worth noting that PyPy appears to experience a similar to
marginally greater speed-up as well!
According to statprof, revsets accessing changelog revision data
are now clearly dominated by zlib decompression (16-17% of execution
time). Surprisingly, it appears the most expensive part of revision
parsing are the various text.index() calls to search for newlines!
These appear to cumulatively add up to 5+% of execution time. I
reckon implementing the parsing in C would make things marginally
faster.
If accessing larger strings (such as the commit message),
encoding.tolocal() is the most expensive procedure outside of
decompression.
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 13:13:54 -0800] rev 28494
changelog: parse description last
Before, we first searched for the double newline as the first step in
the parse then moved to the front of the string and worked our way
to the back again. This made sense when we were splitting the raw
text on the double newline. But in our new newline scanning based
approach, this feels awkward.
This patch updates the parsing logic to parse the text linearly and
deal with the description field last.
Because we're avoiding an extra string scan, revsets appear to
demonstrate a very slight performance win. But the percentage
change is marginal, so the numbers aren't worth reporting.
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:31:06 -0800] rev 28493
changelog: lazily parse files
More of the same.
Again, modest revset performance wins:
author(mpm)
0.896565
0.822961
0.805156
desc(bug)
0.887169
0.847054
0.798101
date(2015)
0.878797
0.811613
0.786689
extra(rebase_source)
0.865446
0.797756
0.777408
author(mpm) or author(greg)
1.801832
1.668172
1.626547
author(mpm) or desc(bug)
1.812438
1.677608
1.613941
date(2015) or branch(default)
0.968276
0.896032
0.869017
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:30:25 -0800] rev 28492
changelog: lazily parse date/extra field
This is probably the most complicated patch in the parsing
refactor.
Because the date and extras are encoded in the same field, we
stuff the entire field into a dedicated variable and add a
property for accessing the sub-components of each. There is
some duplicated code here. But the code is relatively simple,
so it shouldn't be a big deal.
We see revset performance wins across the board:
author(mpm)
0.896565
0.876713
0.822961
desc(bug)
0.887169
0.895514
0.847054
date(2015)
0.878797
0.820987
0.811613
extra(rebase_source)
0.865446
0.823811
0.797756
author(mpm) or author(greg)
1.801832
1.784160
1.668172
author(mpm) or desc(bug)
1.812438
1.822756
1.677608
date(2015) or branch(default)
0.968276
0.910981
0.896032
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.516788
3.265024
We see a speed-up on revsets accessing date and extras because the new
parsing code only parses what you access. Even though they are stored
the same text field, we avoid parsing dates when accessing extras and
vice-versa.
But strangely revsets accessing both date and extras appeared to speed
up as well! I'm not sure if this is due to refactoring the parsing
code or due to an optimization in revsets. You can't argue with the
results!
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:29:46 -0800] rev 28491
changelog: lazily parse user
Same strategy as before.
Revsets not accessing the user demonstrate a slight performance win:
desc(bug)
0.887169
0.910400
0.895514
date(2015)
0.878797
0.870697
0.820987
extra(rebase_source)
0.865446
0.841644
0.823811
date(2015) or branch(default)
0.968276
0.945792
0.910981
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:29:13 -0800] rev 28490
changelog: lazily parse manifest node
Like the description, we store the raw bytes and convert from
hex on access.
This patch also marks the beginning of our new parsing method,
which is based on newline offsets and doesn't rely on
str.split().
Many revsets showed a performance improvement:
author(mpm)
0.896565
0.869085
0.868598
desc(bug)
0.887169
0.928164
0.910400
extra(rebase_source)
0.865446
0.871500
0.841644
author(mpm) or author(greg)
1.801832
1.791589
1.731503
author(mpm) or desc(bug)
1.812438
1.851003
1.798764
date(2015) or branch(default)
0.968276
0.974027
0.945792
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:28:46 -0800] rev 28489
changelog: lazily parse description
Before, the description field was converted to a localstr at parse
time. With this patch, we store the raw description and convert to
a localstr when it is first accessed.
We see a revset speedup for revsets that don't access the description:
author(mpm)
0.896565
0.914234
0.869085
date(2015)
0.878797
0.891980
0.862525
extra(rebase_source)
0.865446
0.912514
0.871500
author(mpm) or author(greg)
1.801832
1.860402
1.791589
date(2015) or branch(default)
0.968276
0.994673
0.974027
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.721032
3.643593
As you can see, most of these revsets are already faster than from
before this refactoring: we have already offset the performance
loss from the introduction of the new class representing parsed
changelog entries!
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 13:26:37 -0800] rev 28488
context: use changelogrevision
Upcoming patches will make the changelogrevision object perform
lazy parsing. Let's switch to it.
Because we're switching from a tuple to an object, everthing that
accesses the internal cached attribute needs to be updated to access
via attributes. A nice side-effect is this makes the code easier to
read!
Surprisingly, this appears to make revsets accessing this data
slightly faster (values are before series, p1, this patch):
author(mpm)
0.896565
0.929984
0.914234
desc(bug)
0.887169
0.935642
0.921073
date(2015)
0.878797
0.908094
0.891980
extra(rebase_source)
0.865446
0.922624
0.912514
author(mpm) or author(greg)
1.801832
1.902112
1.860402
author(mpm) or desc(bug)
1.812438
1.860977
1.844850
date(2015) or branch(default)
0.968276
1.005824
0.994673
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.743381
3.721032
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:28:02 -0800] rev 28487
changelog: add class to represent parsed changelog revisions
Currently, changelog entries are parsed into their respective
components at read time. Many operations are only interested
in a subset of fields of a changelog entry. The parsing and
storing of all the fields adds avoidable overhead.
This patch introduces the "changelogrevision" class. It takes
changelog raw text and exposes the parsed results as attributes.
The code for parsing changelog entries has been moved into its
construction function. changelog.read() has been modified to use
the new class internally while maintaining its existing API.
Future patches will make revision parsing lazy.
We implement the construction function of the new class with
__new__ instead of __init__ so we can use a named tuple to
represent the empty revision. This saves overhead and complexity
of coercing later versions of this class to represent an empty
instance.
While we are here, we add a method on changelog to obtain an
instance of the new type.
The overhead of constructing the new class regresses performance
of revsets accessing this data:
author(mpm)
0.896565
0.929984
desc(bug)
0.887169
0.935642 105%
date(2015)
0.878797
0.908094
extra(rebase_source)
0.865446
0.922624 106%
author(mpm) or author(greg)
1.801832
1.902112 105%
author(mpm) or desc(bug)
1.812438
1.860977
date(2015) or branch(default)
0.968276
1.005824
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.743381
Once lazy parsing is implemented, these revsets will all be faster
than before. There is no performance change on revsets that do not
access this data. There /could/ be a performance regression on
operations that perform several changelog reads. However, I can't
think of anything outside of revsets and `hg log` (basically the
same as a revset) that would be impacted.
Augie Fackler <augie@google.com> [Fri, 11 Mar 2016 11:51:22 -0500] rev 28486
httppeer: compute header names only once
This also helps make the code a little more readable.
Augie Fackler <augie@google.com> [Fri, 11 Mar 2016 11:33:43 -0500] rev 28485
httppeer: indent existing argument handling with if True
I'm about to add another case, so it makes sense to split this out to make the
semantic changes in the next change more obvious.
Augie Fackler <augie@google.com> [Fri, 11 Mar 2016 11:26:12 -0500] rev 28484
httppeer: move size computation later in _callstream
A future change will alter some of the arg-sending logic in a way that matters
for request body size. Centralizing the logic now will make later patches
easier to review.
Augie Fackler <augie@google.com> [Fri, 11 Mar 2016 11:24:50 -0500] rev 28483
httppeer: do less splitting on httpheader value
We only care about the first value split off, so only split off the first
value.
Yuya Nishihara <yuya@tcha.org> [Sat, 27 Feb 2016 21:17:37 +0900] rev 28482
chgserver: remove outdated comment about setvbuf()
I've replaced setvbuf() by fdopen() before merging chg into the core, so
this comment is wrong.
https://bitbucket.org/yuja/chg/commits/5c24e72bc447
timeless <timeless@mozdev.org> [Tue, 01 Mar 2016 04:53:43 +0000] rev 28481
transplant: use absolute_import
timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 16:07:22 +0000] rev 28480
transplant: switch to using nodemod for hex+short
timeless <timeless@mozdev.org> [Wed, 02 Mar 2016 16:32:52 +0000] rev 28479
convert: bzr use absolute_import
Jun Wu <quark@fb.com> [Fri, 11 Mar 2016 13:00:20 +0000] rev 28478
chgserver: include [extdiff] in confighash
extdiff's uisetup will register new commands. If we do not include it in
confighash, changes to [extdiff] will not get new commands registered.
This patch adds extdiff to confighash and makes it possible for chg to pass
test-extdiff.t.
Jun Wu <quark@fb.com> [Fri, 11 Mar 2016 02:52:06 +0000] rev 28477
chg: silently inherit server exit code
If chgserver aborts during startup, for example, error.ParseError when parsing
a config file, chg client probably just wants to exit with a same exit code
without printing other unrelated text. This patch changes the text "cmdserver
exited with status" from abortmsg to debugmsg and exits with a same exit code.
Pulkit Goyal <7895pulkit@gmail.com> [Sun, 06 Mar 2016 03:19:08 +0530] rev 28476
debugshell: use absolute_import
Yuya Nishihara <yuya@tcha.org> [Fri, 11 Mar 2016 10:26:58 +0900] rev 28475
test: make check-py3-compat.py ignore empty code more reliably
It couldn't exclude an empty file containing comments. That's why
hgext/__init__.py had been listed in test-check-py3-compat.t before
"hgext: officially turn 'hgext' into a namespace package".
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:30:43 +0800] rev 28474
patchbomb: specify unit for ui.progress when sending emails
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:28:27 +0800] rev 28473
streamclone: specify unit for ui.progress when handling data
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:30:29 +0800] rev 28472
churn: specify unit for ui.progress when analyzing revisions
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:30:04 +0800] rev 28471
convert: specify unit for ui.progress when scanning paths
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:29:51 +0800] rev 28470
convert: specify unit for ui.progress when operating on files
timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 05:47:39 +0000] rev 28469
tests: stabilize svn output
With 1.9.3 extra bits were appearing...
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:29:20 +0800] rev 28468
similar: specify unit for ui.progress when operating on files
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 20:18:41 +0800] rev 28467
verify: specify unit for ui.progress when checking files
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 20:44:40 +0800] rev 28466
repair: specify unit for ui.progress in rebuildfncache()
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 20:39:29 +0800] rev 28465
repair: use 'rebuilding' progress topic in rebuildfncache()
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:26:06 +0800] rev 28464
largefiles: use revisions as a ui.progress unit
Using plural form is consistent with other progress units, and "1 out of 5
revisions" sounds more correct. Also, tests don't show this, but if you have
'speed' item in progress.format config, it shows e.g. '100 revisions/sec',
which also seems better.
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:21:43 +0800] rev 28463
largefiles: specify unit for ui.progress when operating on files
Also make it available for translation. It could already be translated, because
it's used as a unit in archival.py and subrepo.py, for example.
Yuya Nishihara <yuya@tcha.org> [Wed, 09 Mar 2016 23:59:26 +0900] rev 28462
templater: make label() just fail if ui object isn't available
Silent failure hides bugs and makes it harder to track down the issue. It's
worse than raising exception.
In future patches, I plan to sort out template functions that require 'ui',
'ctx', 'fctx', etc. so that incompatible functions are excluded and the doc can
say in which context these functions are usable.
@templatefunc('label', requires=('ui',))
def label(context, mapping, args):
...
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Fri, 11 Mar 2016 21:55:44 +0900] rev 28461
convert: fix "stdlib import follows local import" problem in transport
Before this patch, import-checker reports error below for importing
subversion python binding libraries.
stdlib import "svn.*" follows local import: mercurial
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Fri, 11 Mar 2016 21:55:44 +0900] rev 28460
convert: fix relative import of stdlib module in subversion
Before this patch, import-checker reports "relative import of stdlib
module" error for importing Pool and SubversionException from svn.core
in subversion.py.
To fix this relative import of stdlib module, this patch adds prefix
'svn.core.' to Pool and SubversionException in source.
These 'svn.core.' relative accessing shouldn't cause performance
impact, because there are much more code paths accessing to
'svn.core.' relative properties.
BTW, in transport.py, this error is avoided by assignment below.
SubversionException = svn.core.SubversionException
But this can't be used in subversion.py case, because:
- such assignment in indented code block causes "don't use camelcase
in identifiers" error of check-code.py
- but it should be placed in indented block, because svn is None at
failure of importing subversion python binding libraries (=
examination of 'svn' is needed)
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Fri, 11 Mar 2016 21:55:44 +0900] rev 28459
convert: make subversion import transport locally
Introducing "absolute_import" feature in previous patch caused failure
of importing local module "transport".
Christian Ebert <blacktrash@gmx.net> [Fri, 11 Mar 2016 08:27:11 +0000] rev 28458
keyword: compact writing of temporary kwdemo hgrc
Anton Shestakov <av6@dwimlabs.net> [Thu, 10 Mar 2016 23:46:19 +0800] rev 28457
dockerdeb: add Ubuntu Trusty
One problem reported by lintian is "bad-distribution-in-changes-file unstable"
in changelog, but the current changelog for the official package in Ubuntu also
uses that distribution name (unstable), because they import from Debian. This
certainly doesn't stop the build process.
Nathan Goldbaum <ngoldbau@illinois.edu> [Thu, 10 Mar 2016 17:31:38 -0600] rev 28456
pushoperation: fix language issues in docstring
Jun Wu <quark@fb.com> [Thu, 10 Mar 2016 00:19:55 +0000] rev 28455
chg: do not write pidfile
Current pidfile logic will only keep the pid of the newest server, which is
not very useful if we want to kill all servers, and will become outdated if
the server auto exits after being idle for too long.
Besides, the server-side pidfile writing logic runs before chgserver gets
confighash so it's not trivial to append confighash to pidfile basename like
we did for socket file.
This patch removes --pidfile from the command starting chgserver and switches
to an alternative way (unlink socket file) to stop the server.
Jun Wu <quark@fb.com> [Thu, 10 Mar 2016 00:12:33 +0000] rev 28454
chg: remove manual reload logic
chgserver now validates and reloads configs automatically. Manually reloading
is no longer necessary. Besides, we are deprecating pid files since the server
will periodically check its ownership of the socket file and exit if it does
not own the socket file any longer, which works more reliable than a pid file.
This patch removes the SIGHUP reload logic from both chg server and client.
Jun Wu <quark@fb.com> [Wed, 09 Mar 2016 01:20:57 +0000] rev 28453
chg: use --daemon-postexec chdir:/ instead of --cwd /
The chgserver is designed to load repo config from current directory. "--cwd /"
will prevent chgserver from loading repo config and generate a wrong
confighash, which will result in a redirect loop. This patch removes "--cwd /"
and uses "--daemon-postexec chdir:/" instead.
Jun Wu <quark@fb.com> [Wed, 09 Mar 2016 01:17:02 +0000] rev 28452
serve: add chdir command for --daemon-postexec
For chgserver, it probably needs a chdir to /. This patch adds chdir command
support for --daemon-postexec so chg client can make use of it.
Jun Wu <quark@fb.com> [Wed, 09 Mar 2016 02:07:40 +0000] rev 28451
serve: accept multiple values for --daemon-postexec
The next patch will add another postexec command: chdir, which can be used
together with unlink. This patch changes the option type of --daemon-postexec
from string to list to accept multiple commands. The error message of invalid
--daemon-postexec value is also changed to include the actual invalid value.
Pierre-Yves David <pierre-yves.david@fb.com> [Sat, 27 Feb 2016 12:56:26 +0100] rev 28450
hgext: officially turn 'hgext' into a namespace package
Actually since Python 2.3, there is some way to turn top level package into
"namespace package" so that multiple subpackage installed in different part of
the path can still be imported transparently. This feature was previously
thought (at least by myself) to be only provided by some setuptool black magic.
Turning hgext into such namespace package allows third extensions to install
themselves inside the "hgext" namespace package to avoid polluting the global
python module namespace. They will now be able to do so without making it a pain
to use a Mercurial "installed" in a different way/location than these
extensions.
The only constrains is that the extension ship a 'hgext/__init__.py' containing
the same call to 'pkgutil.extend_path' and nothing else. This seems realistic.
The main question that remains is: should we introduce a dedicated namespace for
third party extension (hgext3rd?) to make a clearer distinction between what is
officially supported and what is not? If so, this will be introduced in a follow
up patch.