Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 13:13:54 -0800] rev 28494
changelog: parse description last
Before, we first searched for the double newline as the first step in
the parse then moved to the front of the string and worked our way
to the back again. This made sense when we were splitting the raw
text on the double newline. But in our new newline scanning based
approach, this feels awkward.
This patch updates the parsing logic to parse the text linearly and
deal with the description field last.
Because we're avoiding an extra string scan, revsets appear to
demonstrate a very slight performance win. But the percentage
change is marginal, so the numbers aren't worth reporting.
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:31:06 -0800] rev 28493
changelog: lazily parse files
More of the same.
Again, modest revset performance wins:
author(mpm)
0.896565
0.822961
0.805156
desc(bug)
0.887169
0.847054
0.798101
date(2015)
0.878797
0.811613
0.786689
extra(rebase_source)
0.865446
0.797756
0.777408
author(mpm) or author(greg)
1.801832
1.668172
1.626547
author(mpm) or desc(bug)
1.812438
1.677608
1.613941
date(2015) or branch(default)
0.968276
0.896032
0.869017
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:30:25 -0800] rev 28492
changelog: lazily parse date/extra field
This is probably the most complicated patch in the parsing
refactor.
Because the date and extras are encoded in the same field, we
stuff the entire field into a dedicated variable and add a
property for accessing the sub-components of each. There is
some duplicated code here. But the code is relatively simple,
so it shouldn't be a big deal.
We see revset performance wins across the board:
author(mpm)
0.896565
0.876713
0.822961
desc(bug)
0.887169
0.895514
0.847054
date(2015)
0.878797
0.820987
0.811613
extra(rebase_source)
0.865446
0.823811
0.797756
author(mpm) or author(greg)
1.801832
1.784160
1.668172
author(mpm) or desc(bug)
1.812438
1.822756
1.677608
date(2015) or branch(default)
0.968276
0.910981
0.896032
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.516788
3.265024
We see a speed-up on revsets accessing date and extras because the new
parsing code only parses what you access. Even though they are stored
the same text field, we avoid parsing dates when accessing extras and
vice-versa.
But strangely revsets accessing both date and extras appeared to speed
up as well! I'm not sure if this is due to refactoring the parsing
code or due to an optimization in revsets. You can't argue with the
results!
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:29:46 -0800] rev 28491
changelog: lazily parse user
Same strategy as before.
Revsets not accessing the user demonstrate a slight performance win:
desc(bug)
0.887169
0.910400
0.895514
date(2015)
0.878797
0.870697
0.820987
extra(rebase_source)
0.865446
0.841644
0.823811
date(2015) or branch(default)
0.968276
0.945792
0.910981
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:29:13 -0800] rev 28490
changelog: lazily parse manifest node
Like the description, we store the raw bytes and convert from
hex on access.
This patch also marks the beginning of our new parsing method,
which is based on newline offsets and doesn't rely on
str.split().
Many revsets showed a performance improvement:
author(mpm)
0.896565
0.869085
0.868598
desc(bug)
0.887169
0.928164
0.910400
extra(rebase_source)
0.865446
0.871500
0.841644
author(mpm) or author(greg)
1.801832
1.791589
1.731503
author(mpm) or desc(bug)
1.812438
1.851003
1.798764
date(2015) or branch(default)
0.968276
0.974027
0.945792
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:28:46 -0800] rev 28489
changelog: lazily parse description
Before, the description field was converted to a localstr at parse
time. With this patch, we store the raw description and convert to
a localstr when it is first accessed.
We see a revset speedup for revsets that don't access the description:
author(mpm)
0.896565
0.914234
0.869085
date(2015)
0.878797
0.891980
0.862525
extra(rebase_source)
0.865446
0.912514
0.871500
author(mpm) or author(greg)
1.801832
1.860402
1.791589
date(2015) or branch(default)
0.968276
0.994673
0.974027
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.721032
3.643593
As you can see, most of these revsets are already faster than from
before this refactoring: we have already offset the performance
loss from the introduction of the new class representing parsed
changelog entries!
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 13:26:37 -0800] rev 28488
context: use changelogrevision
Upcoming patches will make the changelogrevision object perform
lazy parsing. Let's switch to it.
Because we're switching from a tuple to an object, everthing that
accesses the internal cached attribute needs to be updated to access
via attributes. A nice side-effect is this makes the code easier to
read!
Surprisingly, this appears to make revsets accessing this data
slightly faster (values are before series, p1, this patch):
author(mpm)
0.896565
0.929984
0.914234
desc(bug)
0.887169
0.935642
0.921073
date(2015)
0.878797
0.908094
0.891980
extra(rebase_source)
0.865446
0.922624
0.912514
author(mpm) or author(greg)
1.801832
1.902112
1.860402
author(mpm) or desc(bug)
1.812438
1.860977
1.844850
date(2015) or branch(default)
0.968276
1.005824
0.994673
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.743381
3.721032
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:28:02 -0800] rev 28487
changelog: add class to represent parsed changelog revisions
Currently, changelog entries are parsed into their respective
components at read time. Many operations are only interested
in a subset of fields of a changelog entry. The parsing and
storing of all the fields adds avoidable overhead.
This patch introduces the "changelogrevision" class. It takes
changelog raw text and exposes the parsed results as attributes.
The code for parsing changelog entries has been moved into its
construction function. changelog.read() has been modified to use
the new class internally while maintaining its existing API.
Future patches will make revision parsing lazy.
We implement the construction function of the new class with
__new__ instead of __init__ so we can use a named tuple to
represent the empty revision. This saves overhead and complexity
of coercing later versions of this class to represent an empty
instance.
While we are here, we add a method on changelog to obtain an
instance of the new type.
The overhead of constructing the new class regresses performance
of revsets accessing this data:
author(mpm)
0.896565
0.929984
desc(bug)
0.887169
0.935642 105%
date(2015)
0.878797
0.908094
extra(rebase_source)
0.865446
0.922624 106%
author(mpm) or author(greg)
1.801832
1.902112 105%
author(mpm) or desc(bug)
1.812438
1.860977
date(2015) or branch(default)
0.968276
1.005824
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.743381
Once lazy parsing is implemented, these revsets will all be faster
than before. There is no performance change on revsets that do not
access this data. There /could/ be a performance regression on
operations that perform several changelog reads. However, I can't
think of anything outside of revsets and `hg log` (basically the
same as a revset) that would be impacted.
Augie Fackler <augie@google.com> [Fri, 11 Mar 2016 11:51:22 -0500] rev 28486
httppeer: compute header names only once
This also helps make the code a little more readable.
Augie Fackler <augie@google.com> [Fri, 11 Mar 2016 11:33:43 -0500] rev 28485
httppeer: indent existing argument handling with if True
I'm about to add another case, so it makes sense to split this out to make the
semantic changes in the next change more obvious.
Augie Fackler <augie@google.com> [Fri, 11 Mar 2016 11:26:12 -0500] rev 28484
httppeer: move size computation later in _callstream
A future change will alter some of the arg-sending logic in a way that matters
for request body size. Centralizing the logic now will make later patches
easier to review.
Augie Fackler <augie@google.com> [Fri, 11 Mar 2016 11:24:50 -0500] rev 28483
httppeer: do less splitting on httpheader value
We only care about the first value split off, so only split off the first
value.
Yuya Nishihara <yuya@tcha.org> [Sat, 27 Feb 2016 21:17:37 +0900] rev 28482
chgserver: remove outdated comment about setvbuf()
I've replaced setvbuf() by fdopen() before merging chg into the core, so
this comment is wrong.
https://bitbucket.org/yuja/chg/commits/5c24e72bc447
timeless <timeless@mozdev.org> [Tue, 01 Mar 2016 04:53:43 +0000] rev 28481
transplant: use absolute_import
timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 16:07:22 +0000] rev 28480
transplant: switch to using nodemod for hex+short
timeless <timeless@mozdev.org> [Wed, 02 Mar 2016 16:32:52 +0000] rev 28479
convert: bzr use absolute_import
Jun Wu <quark@fb.com> [Fri, 11 Mar 2016 13:00:20 +0000] rev 28478
chgserver: include [extdiff] in confighash
extdiff's uisetup will register new commands. If we do not include it in
confighash, changes to [extdiff] will not get new commands registered.
This patch adds extdiff to confighash and makes it possible for chg to pass
test-extdiff.t.
Jun Wu <quark@fb.com> [Fri, 11 Mar 2016 02:52:06 +0000] rev 28477
chg: silently inherit server exit code
If chgserver aborts during startup, for example, error.ParseError when parsing
a config file, chg client probably just wants to exit with a same exit code
without printing other unrelated text. This patch changes the text "cmdserver
exited with status" from abortmsg to debugmsg and exits with a same exit code.
Pulkit Goyal <7895pulkit@gmail.com> [Sun, 06 Mar 2016 03:19:08 +0530] rev 28476
debugshell: use absolute_import
Yuya Nishihara <yuya@tcha.org> [Fri, 11 Mar 2016 10:26:58 +0900] rev 28475
test: make check-py3-compat.py ignore empty code more reliably
It couldn't exclude an empty file containing comments. That's why
hgext/__init__.py had been listed in test-check-py3-compat.t before
"hgext: officially turn 'hgext' into a namespace package".
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:30:43 +0800] rev 28474
patchbomb: specify unit for ui.progress when sending emails
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:28:27 +0800] rev 28473
streamclone: specify unit for ui.progress when handling data
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:30:29 +0800] rev 28472
churn: specify unit for ui.progress when analyzing revisions
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:30:04 +0800] rev 28471
convert: specify unit for ui.progress when scanning paths
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:29:51 +0800] rev 28470
convert: specify unit for ui.progress when operating on files
timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 05:47:39 +0000] rev 28469
tests: stabilize svn output
With 1.9.3 extra bits were appearing...
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 22:29:20 +0800] rev 28468
similar: specify unit for ui.progress when operating on files
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 20:18:41 +0800] rev 28467
verify: specify unit for ui.progress when checking files
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 20:44:40 +0800] rev 28466
repair: specify unit for ui.progress in rebuildfncache()
Anton Shestakov <av6@dwimlabs.net> [Fri, 11 Mar 2016 20:39:29 +0800] rev 28465
repair: use 'rebuilding' progress topic in rebuildfncache()