timeless <timeless@mozdev.org> [Fri, 11 Mar 2016 16:50:14 +0000] rev 28496
util: reword debugstacktrace comment
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 15:40:20 -0800] rev 28495
changelog: avoid slicing raw data until needed
Before, we were slicing the original raw text and storing individual
variables with values corresponding to each field. This is avoidable
overhead.
With this patch, we store the offsets of the fields at construction
time and perform the slice when a property is accessed.
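A minimal sketch of the offset-based idea described above, not the actual changelog code. The class name LazyEntry, its property names, and the simplified layout (three newline-terminated header lines) are assumptions made purely for illustration.

```python
class LazyEntry:
    """Hypothetical entry that records field offsets and slices on access."""

    __slots__ = ("_text", "_offsets")

    def __init__(self, text):
        # Scan once for the newlines ending the first three header lines.
        # Only integer positions are stored; no substrings are created here.
        nl1 = text.index(b"\n")
        nl2 = text.index(b"\n", nl1 + 1)
        nl3 = text.index(b"\n", nl2 + 1)
        self._text = text
        self._offsets = (nl1, nl2, nl3)

    @property
    def manifest(self):
        # The slice is performed only when the property is read.
        return self._text[:self._offsets[0]]

    @property
    def user(self):
        nl1, nl2, _ = self._offsets
        return self._text[nl1 + 1:nl2]

    @property
    def date(self):
        _, nl2, nl3 = self._offsets
        return self._text[nl2 + 1:nl3]


# Illustrative input only; the field values are made up.
entry = LazyEntry(b"abc123\nmpm\n1457300420 28800\nfile.py\n\nfix bug")
print(entry.user)
```

A caller that only reads one field pays for one slice instead of one per field, which is the overhead the patch aims to avoid.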
On its own, this patch appears to yield only a marginal performance
win, too small to be worth reporting. However, it marks the end of
our parsing refactor, so it is worth reporting the gains from the
entire series:
author(mpm)
0.896565
0.795987 89%
desc(bug)
0.887169
0.803438 90%
date(2015)
0.878797
0.773961 88%
extra(rebase_source)
0.865446
0.761603 88%
author(mpm) or author(greg)
1.801832
1.576025 87%
author(mpm) or desc(bug)
1.812438
1.593335 88%
date(2015) or branch(default)
0.968276
0.875270 90%
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.183104 87%
Pretty consistent speed-up across the board for any revset accessing
changelog revision data. Not bad!
It's also worth noting that PyPy appears to experience a similar, or
marginally greater, speed-up!
According to statprof, revsets accessing changelog revision data
are now clearly dominated by zlib decompression (16-17% of execution
time). Surprisingly, it appears the most expensive part of revision
parsing is the various text.index() calls to search for newlines!
These appear to cumulatively add up to 5+% of execution time. I
reckon implementing the parsing in C would make things marginally
faster.
If accessing larger strings (such as the commit message),
encoding.tolocal() is the most expensive procedure outside of
decompression.
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 13:13:54 -0800] rev 28494
changelog: parse description last
Before, we searched for the double newline as the first step of the
parse, then moved to the front of the string and worked our way to
the back again. This made sense when we were splitting the raw text
on the double newline. But in our new newline-scanning approach,
this feels awkward.
This patch updates the parsing logic to parse the text linearly and
deal with the description field last.
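The linear order described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: the function name parse_entry and the simplified layout (three header lines, an optional file list, a blank line, then the description) are hypothetical.

```python
def parse_entry(text):
    """Hypothetical front-to-back parse that handles the description last."""
    # Walk the header lines in order, each search starting where the
    # previous one stopped.
    nl1 = text.index("\n")
    nl2 = text.index("\n", nl1 + 1)
    nl3 = text.index("\n", nl2 + 1)
    manifest = text[:nl1]
    user = text[nl1 + 1:nl2]
    date = text[nl2 + 1:nl3]

    if text[nl3 + 1:nl3 + 2] == "\n":
        # A blank line directly after the date line: no files are listed.
        files = []
        desc_start = nl3 + 2
    else:
        # Look for the double newline only from the current position, so
        # the header that was already scanned is not scanned again.
        double_nl = text.index("\n\n", nl3 + 1)
        files = text[nl3 + 1:double_nl].split("\n")
        desc_start = double_nl + 2

    # The description is whatever remains after the blank line.
    return manifest, user, date, files, text[desc_start:]
```

Because the double-newline search starts after the header, the string is traversed once rather than scanned from the front for the separator and then again for the fields.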
Because we're avoiding an extra string scan, revsets appear to
demonstrate a very slight performance win. But the percentage
change is marginal, so the numbers aren't worth reporting.
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 06 Mar 2016 14:31:06 -0800] rev 28493
changelog: lazily parse files
More of the same.
Again, modest revset performance wins:
author(mpm)
0.896565
0.822961
0.805156
desc(bug)
0.887169
0.847054
0.798101
date(2015)
0.878797
0.811613
0.786689
extra(rebase_source)
0.865446
0.797756
0.777408
author(mpm) or author(greg)
1.801832
1.668172
1.626547
author(mpm) or desc(bug)
1.812438
1.677608
1.613941
date(2015) or branch(default)
0.968276
0.896032
0.869017