timeless <timeless@mozdev.org> [Fri, 29 Jan 2016 14:37:16 +0000] rev 28498
ui: log devel warnings
timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 17:22:04 +0000] rev 28497
util: refactor getstackframes
timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 16:50:14 +0000] rev 28496
util: reword debugstacktrace comment
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 15:40:20 -0800] rev 28495
changelog: avoid slicing raw data until needed
Before, we were slicing the original raw text and storing individual
variables with values corresponding to each field. This is avoidable
overhead.
With this patch, we store the offsets of the fields at construction
time and perform the slice when a property is accessed.
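A minimal Python 3 sketch of the offset approach follows. It makes several assumptions. The class name `RevisionSketch` and its property names are hypothetical, and `bytes` stands in for the raw revision text. It assumes the usual entry layout: manifest hex, user, the date/extra line, one file per line, a blank line, then the description. Hex decoding and encoding conversion are deliberately left out.

```python
class RevisionSketch:
    """Hypothetical parsed changelog entry that slices fields on access."""

    __slots__ = ("_text", "_offsets")

    def __init__(self, text: bytes) -> None:
        # Only record where the header lines end; nothing is sliced yet.
        nl1 = text.index(b"\n")
        nl2 = text.index(b"\n", nl1 + 1)
        nl3 = text.index(b"\n", nl2 + 1)
        # With no files, the date line is directly followed by the blank line.
        if text[nl3 + 1:nl3 + 2] == b"\n":
            doublenl = nl3
        else:
            doublenl = text.index(b"\n\n", nl3 + 1)
        self._text = text
        self._offsets = (nl1, nl2, nl3, doublenl)

    @property
    def manifest_hex(self) -> bytes:
        return self._text[:self._offsets[0]]

    @property
    def user(self) -> bytes:
        off = self._offsets
        return self._text[off[0] + 1:off[1]]

    @property
    def dateextra(self) -> bytes:
        off = self._offsets
        return self._text[off[1] + 1:off[2]]

    @property
    def files(self) -> list:
        off = self._offsets
        if off[2] == off[3]:
            return []
        return self._text[off[2] + 1:off[3]].split(b"\n")

    @property
    def description(self) -> bytes:
        # Skip the two newlines that separate the header from the message.
        return self._text[self._offsets[3] + 2:]
```

Construction costs three or four `index()` calls. A caller that never reads `files` or `description` never pays for slicing or splitting them.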
On its own, this change appears to yield a very marginal performance
win, too small to be worth reporting. However, it marks the end of
our parsing refactor, so it is worth reporting the gains from the
entire series:
before    after     ratio  revset
0.896565  0.795987  89%    author(mpm)
0.887169  0.803438  90%    desc(bug)
0.878797  0.773961  88%    date(2015)
0.865446  0.761603  88%    extra(rebase_source)
1.801832  1.576025  87%    author(mpm) or author(greg)
1.812438  1.593335  88%    author(mpm) or desc(bug)
0.968276  0.875270  90%    date(2015) or branch(default)
3.656193  3.183104  87%    author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
Pretty consistent speed-up across the board for any revset accessing
changelog revision data. Not bad!
It's also worth noting that PyPy appears to see a similar, or
marginally greater, speed-up!
According to statprof, revsets accessing changelog revision data
are now clearly dominated by zlib decompression (16-17% of execution
time). Surprisingly, the most expensive part of revision parsing
appears to be the various text.index() calls that search for newlines!
Together they seem to account for 5+% of execution time. I reckon
implementing the parsing in C would make things marginally faster.
When accessing larger strings (such as the commit message),
encoding.tolocal() is the most expensive procedure outside of
decompression.
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 13:13:54 -0800] rev 28494
changelog: parse description last
Before, the parse began by searching for the double newline, then
moved to the front of the string and worked its way to the back
again. This made sense when we were splitting the raw text on the
double newline, but it feels awkward in our new approach based on
scanning for newlines.
This patch updates the parsing logic to parse the text linearly and
deal with the description field last.
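The two scan orders can be compared with a pair of hypothetical helpers that return the same fields. `parse_double_newline_first` follows the older order: it finds the blank line first, then splits the header from the front. `parse_linear` walks the newlines front to back and reaches the description last. Both are simplified Python 3 illustrations over `bytes`, not Mercurial's actual code.

```python
def parse_double_newline_first(text: bytes) -> tuple:
    # Older order: find the blank separator line first...
    last = text.index(b"\n\n")
    # ...then go back to the front and split the header into lines.
    header = text[:last].split(b"\n")
    manifest, user, dateextra = header[:3]
    return manifest, user, dateextra, header[3:], text[last + 2:]


def parse_linear(text: bytes) -> tuple:
    # Newer order: a single front-to-back pass; the description comes last.
    nl1 = text.index(b"\n")
    nl2 = text.index(b"\n", nl1 + 1)
    nl3 = text.index(b"\n", nl2 + 1)
    if text[nl3 + 1:nl3 + 2] == b"\n":
        doublenl = nl3  # no files: the date line is followed by the blank line
    else:
        doublenl = text.index(b"\n\n", nl3 + 1)
    files = text[nl3 + 1:doublenl].split(b"\n") if doublenl != nl3 else []
    return (text[:nl1], text[nl1 + 1:nl2], text[nl2 + 1:nl3],
            files, text[doublenl + 2:])
```

Both helpers assume well-formed entries whose header lines are never empty. Under that assumption, the first blank line always separates the header from the description.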
Because we're avoiding an extra string scan, revsets appear to
demonstrate a very slight performance win. But the percentage
change is marginal, so the numbers aren't worth reporting.
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:31:06 -0800] rev 28493
changelog: lazily parse files
More of the same.
Again, modest revset performance wins:
baseline  before    after     revset
0.896565  0.822961  0.805156  author(mpm)
0.887169  0.847054  0.798101  desc(bug)
0.878797  0.811613  0.786689  date(2015)
0.865446  0.797756  0.777408  extra(rebase_source)
1.801832  1.668172  1.626547  author(mpm) or author(greg)
1.812438  1.677608  1.613941  author(mpm) or desc(bug)
0.968276  0.896032  0.869017  date(2015) or branch(default)
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:30:25 -0800] rev 28492
changelog: lazily parse date/extra field
This is probably the most complicated patch in the parsing
refactor.
Because the date and extras are encoded in the same field, we
stuff the entire field into a dedicated variable and add a property
for accessing each sub-component. There is some duplicated code
here, but the code is relatively simple, so it shouldn't be a big
deal.
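A sketch of that layout follows. It assumes the field reads `<unixtime> <tzoffset>`, optionally followed by a space and NUL-separated `key:value` extras. The class is hypothetical. The `branch: default` fallback is an assumption about how revisions without extras are treated. The extras decoding is simplified and does not undo the escaping Mercurial applies to extras. Reading `date` never decodes the extras, and reading `extra` never converts the timestamp.

```python
class DateExtraSketch:
    """Hypothetical entry that keeps the combined date/extra field raw."""

    def __init__(self, dateextra: bytes) -> None:
        # Store the whole field; neither part is parsed yet.
        self._dateextra = dateextra

    @property
    def _rawdate(self) -> list:
        # Timestamp and timezone offset are the first two tokens.
        return self._dateextra.split(b" ", 2)[:2]

    @property
    def _rawextra(self):
        # Anything after the second space is the encoded extras, if present.
        fields = self._dateextra.split(b" ", 2)
        return fields[2] if len(fields) == 3 else None

    @property
    def date(self) -> tuple:
        raw = self._rawdate
        when = float(raw[0].decode("ascii"))
        try:
            tz = int(raw[1].decode("ascii"))
        except (IndexError, ValueError):
            tz = 0  # tolerate a missing or malformed timezone
        return when, tz

    @property
    def extra(self) -> dict:
        # Assumed default: a revision without extras is on branch "default".
        result = {b"branch": b"default"}
        raw = self._rawextra
        if raw is not None:
            # Simplified: real extras are escaped; this does not unescape them.
            for item in raw.split(b"\0"):
                key, value = item.split(b":", 1)
                result[key] = value
        return result
```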
We see revset performance wins across the board:
baseline  before    after     revset
0.896565  0.876713  0.822961  author(mpm)
0.887169  0.895514  0.847054  desc(bug)
0.878797  0.820987  0.811613  date(2015)
0.865446  0.823811  0.797756  extra(rebase_source)
1.801832  1.784160  1.668172  author(mpm) or author(greg)
1.812438  1.822756  1.677608  author(mpm) or desc(bug)
0.968276  0.910981  0.896032  date(2015) or branch(default)
3.656193  3.516788  3.265024  author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
We see a speed-up on revsets accessing date and extras because the new
parsing code only parses what you access. Even though they are stored
in the same text field, we avoid parsing dates when accessing extras and
vice versa.
But, strangely, revsets accessing both date and extras appeared to speed
up as well! I'm not sure if this is due to refactoring the parsing
code or due to an optimization in revsets. You can't argue with the
results!
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:29:46 -0800] rev 28491
changelog: lazily parse user
Same strategy as before.
Revsets not accessing the user demonstrate a slight performance win:
baseline  before    after     revset
0.887169  0.910400  0.895514  desc(bug)
0.878797  0.870697  0.820987  date(2015)
0.865446  0.841644  0.823811  extra(rebase_source)
0.968276  0.945792  0.910981  date(2015) or branch(default)
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:29:13 -0800] rev 28490
changelog: lazily parse manifest node
Like the description, we store the raw bytes and convert from
hex on access.
This patch also marks the beginning of our new parsing method,
which is based on newline offsets and doesn't rely on
str.split().
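A sketch of the manifest change, in the same spirit. The entry keeps the 40-character hex line as stored. The standard library's `binascii.unhexlify` produces the 20-byte node only when `manifest` is read; here it stands in for Mercurial's own hex-to-binary helper. The class name is hypothetical. The first line is located with `index()` rather than by splitting the whole text.

```python
import binascii


class ManifestSketch:
    """Hypothetical entry that defers hex decoding of the manifest node."""

    def __init__(self, text: bytes) -> None:
        # Find the first line with index() instead of text.split(b"\n"),
        # which would build a list holding every line of the entry.
        self._rawmanifest = text[:text.index(b"\n")]

    @property
    def manifest(self) -> bytes:
        # Convert from hex only when the node is actually requested.
        return binascii.unhexlify(self._rawmanifest)
```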
Many revsets showed a performance improvement:
baseline  before    after     revset
0.896565  0.869085  0.868598  author(mpm)
0.887169  0.928164  0.910400  desc(bug)
0.865446  0.871500  0.841644  extra(rebase_source)
1.801832  1.791589  1.731503  author(mpm) or author(greg)
1.812438  1.851003  1.798764  author(mpm) or desc(bug)
0.968276  0.974027  0.945792  date(2015) or branch(default)
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:28:46 -0800] rev 28489
changelog: lazily parse description
Before, the description field was converted to a localstr at parse
time. With this patch, we store the raw description and convert to
a localstr when it is first accessed.
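"Convert when first accessed" can be sketched as keeping the raw bytes and caching the converted value on first use. The class is hypothetical. `_tolocal` is only a stand-in for `encoding.tolocal()`: it decodes UTF-8 and counts its calls so the caching is observable. The real conversion yields the localstr described above.

```python
class DescriptionSketch:
    """Hypothetical entry that converts its description on first access."""

    def __init__(self, rawdesc: bytes) -> None:
        self._rawdesc = rawdesc  # raw bytes, kept exactly as stored
        self._desc = None        # converted value, filled on first access
        self.conversions = 0     # only here to make the caching observable

    def _tolocal(self, raw: bytes) -> str:
        # Hypothetical stand-in for encoding.tolocal(): plain UTF-8 decode.
        self.conversions += 1
        return raw.decode("utf-8", "replace")

    @property
    def description(self) -> str:
        if self._desc is None:
            self._desc = self._tolocal(self._rawdesc)
        return self._desc
```

A revset such as author(mpm) never reads `description`, so with this pattern it never pays for the conversion.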
We see a speed-up for revsets that don't access the description:
baseline  before    after     revset
0.896565  0.914234  0.869085  author(mpm)
0.878797  0.891980  0.862525  date(2015)
0.865446  0.912514  0.871500  extra(rebase_source)
1.801832  1.860402  1.791589  author(mpm) or author(greg)
0.968276  0.994673  0.974027  date(2015) or branch(default)
3.656193  3.721032  3.643593  author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
As you can see, most of these revsets are already faster than they
were before this refactoring: we have already offset the performance
loss from introducing the new class representing parsed changelog
entries!