Mercurial hg-stable repository: tests/test-status.t @ 15141:16dc9a32ca04
mdiff: speed up showfunc for large diffs
This addresses the following issues with showfunc:
- Needless use of regular expressions.
- Doing str.rstrip() needlessly in an inner loop.
- Catastrophic backtracking when trying to find a function line (repeatedly walking backwards over the same lines of the old file for every hunk).
Finding the function text is now O(number of lines in the old file) in the
worst case, and close to O(number of hunks) in the best case.
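A simplified model can show why stopping the backward walk early changes the complexity. This is a hypothetical sketch, not Mercurial's mdiff code. `old_scan`, `new_scan` and `is_func_line` are illustrative names, and each hunk is reduced to the index of its first context line in the old file. The model assumes a function line is any line whose first character is alphanumeric; the second profile below does show str.isalnum calls inside yieldhunk. The old-style scan starts a fresh backward walk for every hunk and can reach the top of the file each time. The new-style scan stops where the previous hunk's walk began and reuses the last function line it found, so it examines each line of the old file at most once.

```python
from typing import List, Tuple


def is_func_line(line: str) -> bool:
    # Assumed heuristic: a "function line" starts with an alphanumeric character.
    return line[:1].isalnum()


def old_scan(lines: List[str], hunk_starts: List[int]) -> Tuple[List[str], int]:
    """Every hunk walks backwards, possibly all the way to line 0."""
    funcs, inspected = [], 0
    for start in hunk_starts:
        func = ""
        for i in range(start - 1, -1, -1):
            inspected += 1
            if is_func_line(lines[i]):
                func = lines[i].rstrip()[:40]
                break
        funcs.append(func)
    return funcs, inspected


def new_scan(lines: List[str], hunk_starts: List[int]) -> Tuple[List[str], int]:
    """Each hunk stops at the previous hunk's start and reuses its result."""
    funcs, inspected = [], 0
    lastpos, lastfunc = 0, ""
    for start in hunk_starts:  # assumed to be in ascending order
        for i in range(start - 1, lastpos - 1, -1):
            inspected += 1
            if is_func_line(lines[i]):
                lastfunc = lines[i].rstrip()[:40]
                break
        lastpos = start  # lines before this point are never read again
        funcs.append(lastfunc)
    return funcs, inspected


# A made-up 10,000-line file where only the first line looks like a function
# header, with a hunk starting every 10 lines.
lines = ["class Parser {"] + ["    x();"] * 9999
hunk_starts = list(range(10, len(lines), 10))

old_funcs, old_work = old_scan(lines, hunk_starts)
new_funcs, new_work = new_scan(lines, hunk_starts)
assert old_funcs == new_funcs
print(old_work, new_work)  # 4995000 9990
```

Both scans report the same function text for every hunk. Here the old-style walk inspects 4,995,000 lines in total, while the new one inspects 9,990.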
Given a diff like this[1]:
src/main/antlr3/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunker.g | 4 +-
src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerLexer.java | 2 +-
src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerParser.java | 29189 +++++----
3 files changed, 14741 insertions(+), 14454 deletions(-)
[1]: https://bitbucket.org/wwmm/chemicaltagger/changeset/d2bfbaecd4fc/raw
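Without this change, hg log --stat --config diff.showfunc=1 takes an
absurdly long time to complete: over 80 seconds of wall-clock time. In both
profiles below, the Total and Inline columns are in seconds despite their "ms" labels: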
   CallCount    Recursive    Total(ms)   Inline(ms)   module:lineno(function)
       32813            0      80.3546      40.6086   mercurial.mdiff:160(yieldhunk)
   +65062746            0      25.7227      25.7227   +<method 'match' of '_sre.SRE_Pattern' objects>
   +65062746            0      14.0221      14.0221   +<method 'rstrip' of 'str' objects>
       +1809            0       0.0009       0.0009   +mercurial.mdiff:148(contextend)
       +1809            0       0.0003       0.0003   +<len>
    65062746            0      25.7227      25.7227   <method 'match' of '_sre.SRE_Pattern' objects>
    65062763            0      14.0221      14.0221   <method 'rstrip' of 'str' objects>
         543            0       0.1631       0.1631   <zlib.decompress>
           3            0       0.0505       0.0505   <mercurial.bdiff.blocks>
       31007            0      80.4564       0.0477   mercurial.mdiff:147(_unidiff)
      +32813            0      80.3546      40.6086   +mercurial.mdiff:160(yieldhunk)
          +3            0       0.0505       0.0505   +<mercurial.bdiff.blocks>
       +3618            0       0.0022       0.0022   +mercurial.mdiff:154(contextstart)
       +5427            0       0.0013       0.0013   +<len>
          +3            0       0.0001       0.0000   +re:188(compile)
           1            0      80.8381       0.0322   mercurial.patch:1777(diffstatdata)
     +107499            0       0.0235       0.0235   +<method 'startswith' of 'str' objects>
      +31014            0      80.7820       0.0071   +mercurial.util:1284(iterlines)
          +3            0       0.0000       0.0000   +<method 'search' of '_sre.SRE_Pattern' objects>
          +4            0       0.0000       0.0000   +mercurial.patch:1783(addresult)
          +3            0       0.0000       0.0000   +<method 'group' of '_sre.SRE_Match' objects>
           6            0       0.0444       0.0283   mercurial.mdiff:12(splitnewlines)
          +6            0       0.0160       0.0160   +<method 'split' of 'str' objects>
          32            0       0.0246       0.0246   <method 'update' of '_hashlib.HASH' objects>
          11            0       0.0236       0.0236   <method 'read' of 'file' objects>
Time: real 80.880 secs (user 80.200+0.000 sys 0.380+0.000)
With this change, it's almost as fast as not using showfunc at all:
   CallCount    Recursive    Total(ms)   Inline(ms)   module:lineno(function)
         543            0       0.1699       0.1699   <zlib.decompress>
           3            0       0.0501       0.0501   <mercurial.bdiff.blocks>
       32813            0       0.0415       0.0348   mercurial.mdiff:161(yieldhunk)
      +70837            0       0.0058       0.0058   +<method 'isalnum' of 'str' objects>
       +1809            0       0.0006       0.0006   +mercurial.mdiff:148(contextend)
       +1809            0       0.0002       0.0002   +<len>
           1            0       0.4879       0.0310   mercurial.patch:1777(diffstatdata)
     +107499            0       0.0230       0.0230   +<method 'startswith' of 'str' objects>
      +31014            0       0.4335       0.0065   +mercurial.util:1284(iterlines)
          +3            0       0.0000       0.0000   +<method 'search' of '_sre.SRE_Pattern' objects>
          +4            0       0.0000       0.0000   +mercurial.patch:1783(addresult)
          +1            0       0.0004       0.0000   +re:188(compile)
          32            0       0.0293       0.0293   <method 'update' of '_hashlib.HASH' objects>
           6            0       0.0427       0.0279   mercurial.mdiff:12(splitnewlines)
          +6            0       0.0147       0.0147   +<method 'split' of 'str' objects>
       31007            0       0.1169       0.0235   mercurial.mdiff:147(_unidiff)
          +3            0       0.0501       0.0501   +<mercurial.bdiff.blocks>
      +32813            0       0.0415       0.0348   +mercurial.mdiff:161(yieldhunk)
       +3618            0       0.0012       0.0012   +mercurial.mdiff:154(contextstart)
       +5427            0       0.0006       0.0006   +<len>
      107597            0       0.0230       0.0230   <method 'startswith' of 'str' objects>
          16            0       0.0213       0.0213   <mercurial.mpatch.patches>
         194            0       0.0149       0.0149   <method 'split' of 'str' objects>
Time: real 0.530 secs (user 0.450+0.000 sys 0.070+0.000)
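A little arithmetic on the two profiles shows where the time went. The snippet below uses only numbers copied from them, and the variable names are illustrative. It treats each regex match call (before) or isalnum call (after) made from yieldhunk as one line examined; that reading of what those calls represent is an assumption.

```python
# Figures copied from the two profiles and "Time:" lines above.
hunks = 32813                  # yieldhunk calls, identical in both runs
before_secs = 80.880           # real time without the change
after_secs = 0.530             # real time with the change
checks_before = 65062746       # regex 'match' calls made from yieldhunk
checks_after = 70837           # 'isalnum' calls made from yieldhunk

speedup = before_secs / after_secs
per_hunk_before = checks_before / hunks
per_hunk_after = checks_after / hunks

print(f"speedup: {speedup:.0f}x")  # speedup: 153x
print(f"line checks per hunk: {per_hunk_before:.0f} -> {per_hunk_after:.1f}")  # 1983 -> 2.2
```

Before the change, each hunk examined nearly two thousand lines on average. After it, each hunk examines about two.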
author: Brodie Rao <brodie@bitheap.org>
date: Mon, 19 Sep 2011 15:58:03 -0700
parents: 921683f14ad7
children: 117f9190c1ba 012b285cf643
  $ hg init repo1
  $ cd repo1
  $ mkdir a b a/1 b/1 b/2
  $ touch in_root a/in_a b/in_b a/1/in_a_1 b/1/in_b_1 b/2/in_b_2

hg status in repo root:

  $ hg status
  ? a/1/in_a_1
  ? a/in_a
  ? b/1/in_b_1
  ? b/2/in_b_2
  ? b/in_b
  ? in_root

hg status . in repo root:

  $ hg status .
  ? a/1/in_a_1
  ? a/in_a
  ? b/1/in_b_1
  ? b/2/in_b_2
  ? b/in_b
  ? in_root

  $ hg status --cwd a
  ? a/1/in_a_1
  ? a/in_a
  ? b/1/in_b_1
  ? b/2/in_b_2
  ? b/in_b
  ? in_root
  $ hg status --cwd a .
  ? 1/in_a_1
  ? in_a
  $ hg status --cwd a ..
  ? 1/in_a_1
  ? in_a
  ? ../b/1/in_b_1
  ? ../b/2/in_b_2
  ? ../b/in_b
  ? ../in_root

  $ hg status --cwd b
  ? a/1/in_a_1
  ? a/in_a
  ? b/1/in_b_1
  ? b/2/in_b_2
  ? b/in_b
  ? in_root
  $ hg status --cwd b .
  ? 1/in_b_1
  ? 2/in_b_2
  ? in_b
  $ hg status --cwd b ..
  ? ../a/1/in_a_1
  ? ../a/in_a
  ? 1/in_b_1
  ? 2/in_b_2
  ? in_b
  ? ../in_root

  $ hg status --cwd a/1
  ? a/1/in_a_1
  ? a/in_a
  ? b/1/in_b_1
  ? b/2/in_b_2
  ? b/in_b
  ? in_root
  $ hg status --cwd a/1 .
  ? in_a_1
  $ hg status --cwd a/1 ..
  ? in_a_1
  ? ../in_a

  $ hg status --cwd b/1
  ? a/1/in_a_1
  ? a/in_a
  ? b/1/in_b_1
  ? b/2/in_b_2
  ? b/in_b
  ? in_root
  $ hg status --cwd b/1 .
  ? in_b_1
  $ hg status --cwd b/1 ..
  ? in_b_1
  ? ../2/in_b_2
  ? ../in_b

  $ hg status --cwd b/2
  ? a/1/in_a_1
  ? a/in_a
  ? b/1/in_b_1
  ? b/2/in_b_2
  ? b/in_b
  ? in_root
  $ hg status --cwd b/2 .
  ? in_b_2
  $ hg status --cwd b/2 ..
  ? ../1/in_b_1
  ? in_b_2
  ? ../in_b
  $ cd ..

  $ hg init repo2
  $ cd repo2
  $ touch modified removed deleted ignored
  $ echo "^ignored$" > .hgignore
  $ hg ci -A -m 'initial checkin'
  adding .hgignore
  adding deleted
  adding modified
  adding removed
  $ touch modified added unknown ignored
  $ hg add added
  $ hg remove removed
  $ rm deleted

hg status:

  $ hg status
  A added
  R removed
  ! deleted
  ? unknown

hg status modified added removed deleted unknown never-existed ignored:

  $ hg status modified added removed deleted unknown never-existed ignored
  never-existed: No such file or directory
  A added
  R removed
  ! deleted
  ? unknown

  $ hg copy modified copied

hg status -C:

  $ hg status -C
  A added
  A copied
    modified
  R removed
  ! deleted
  ? unknown

hg status -A:

  $ hg status -A
  A added
  A copied
    modified
  R removed
  ! deleted
  ? unknown
  I ignored
  C .hgignore
  C modified

  $ echo "^ignoreddir$" > .hgignore
  $ mkdir ignoreddir
  $ touch ignoreddir/file

hg status ignoreddir/file:

  $ hg status ignoreddir/file

hg status -i ignoreddir/file:

  $ hg status -i ignoreddir/file
  I ignoreddir/file
  $ cd ..

Check 'status -q' and some combinations

  $ hg init repo3
  $ cd repo3
  $ touch modified removed deleted ignored
  $ echo "^ignored$" > .hgignore
  $ hg commit -A -m 'initial checkin'
  adding .hgignore
  adding deleted
  adding modified
  adding removed
  $ touch added unknown ignored
  $ hg add added
  $ echo "test" >> modified
  $ hg remove removed
  $ rm deleted
  $ hg copy modified copied

Run status with 2 different flags.
Check if result is the same or different.
If result is not as expected, raise error

  $ assert() {
  >     hg status $1 > ../a
  >     hg status $2 > ../b
  >     if diff ../a ../b > /dev/null; then
  >         out=0
  >     else
  >         out=1
  >     fi
  >     if [ $3 -eq 0 ]; then
  >         df="same"
  >     else
  >         df="different"
  >     fi
  >     if [ $out -ne $3 ]; then
  >         echo "Error on $1 and $2, should be $df."
  >     fi
  > }

Assert flag1 flag2 [0-same | 1-different]

  $ assert "-q" "-mard" 0
  $ assert "-A" "-marduicC" 0
  $ assert "-qA" "-mardcC" 0
  $ assert "-qAui" "-A" 0
  $ assert "-qAu" "-marducC" 0
  $ assert "-qAi" "-mardicC" 0
  $ assert "-qu" "-u" 0
  $ assert "-q" "-u" 1
  $ assert "-m" "-a" 1
  $ assert "-r" "-d" 1
  $ cd ..
  $ hg init repo4
  $ cd repo4
  $ touch modified removed deleted
  $ hg ci -q -A -m 'initial checkin'
  $ touch added unknown
  $ hg add added
  $ hg remove removed
  $ rm deleted
  $ echo x > modified
  $ hg copy modified copied
  $ hg ci -m 'test checkin' -d "1000001 0"
  $ rm *
  $ touch unrelated
  $ hg ci -q -A -m 'unrelated checkin' -d "1000002 0"

hg status --change 1:

  $ hg status --change 1
  M modified
  A added
  A copied
  R removed

hg status --change 1 unrelated:

  $ hg status --change 1 unrelated

hg status -C --change 1 added modified copied removed deleted:

  $ hg status -C --change 1 added modified copied removed deleted
  M modified
  A added
  A copied
    modified
  R removed

hg status -A --change 1:

  $ hg status -A --change 1
  M modified
  A added
  A copied
    modified
  R removed
  C deleted
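Two behaviours that this test pins down can be restated as small models. The sketch below is hypothetical and simplified, not Mercurial's implementation. `status_paths`, `expand`, `REPO1_FILES` and `CASES` are illustrative names.

The first model is based on the repo1 block. It assumes that hg status prints root-relative paths when it gets no file arguments, even with --cwd, and cwd-relative paths when it gets arguments such as . or .. . It treats those patterns as plain directory prefixes.

The second model is based on the repo3 assert calls, and assumes three rules:
- Plain status shows modified, added, removed, deleted and unknown files.
- -q drops unknown and ignored files unless -u or -i is given explicitly.
- -A selects every state and implies -C.

It compares flag sets, whereas the shell test compares real command output.

```python
import posixpath
from typing import FrozenSet, List, Sequence

# Model 1 (repo1): which paths `hg status --cwd DIR [PATTERNS]` prints.
# Untracked files created in repo1, as root-relative paths.
REPO1_FILES = ["in_root", "a/in_a", "b/in_b", "a/1/in_a_1", "b/1/in_b_1", "b/2/in_b_2"]


def status_paths(files: Sequence[str], cwd: str, patterns: Sequence[str] = ()) -> List[str]:
    """Hypothetical: root-relative paths without patterns, cwd-relative with them."""
    ordered = sorted(files)  # assumed: output is ordered by root-relative path
    if not patterns:
        return ordered
    # Resolve each pattern against cwd and treat it as a directory prefix.
    dirs = [posixpath.normpath(posixpath.join(cwd, p)) for p in patterns]

    def selected(path: str) -> bool:
        return any(d == "." or path == d or path.startswith(d + "/") for d in dirs)

    return [posixpath.relpath(path, cwd) for path in ordered if selected(path)]


# Model 2 (repo3): which states a flag string selects ("C" = show copy sources).
STATES = "marduic"  # modified, added, removed, deleted, unknown, ignored, clean


def expand(flags: str) -> FrozenSet[str]:
    """Hypothetical model of status flag selection."""
    letters = set(flags.lstrip("-"))
    quiet = "q" in letters
    shown = {s for s in STATES if s in letters}  # explicitly requested states
    if "A" in letters:
        # Assumed: -A adds every state except those -q hides, and implies -C.
        shown |= set("mardc" if quiet else STATES)
        letters.add("C")
    if not shown:
        # Assumed defaults: -mardu normally, -mard with -q.
        shown = set("mard" if quiet else "mardu")
    if "C" in letters:
        shown.add("C")
    return frozenset(shown)


# (flag1, flag2, expected) copied from the assert calls; 0 = same, 1 = different.
CASES = [
    ("-q", "-mard", 0), ("-A", "-marduicC", 0), ("-qA", "-mardcC", 0),
    ("-qAui", "-A", 0), ("-qAu", "-marducC", 0), ("-qAi", "-mardicC", 0),
    ("-qu", "-u", 0), ("-q", "-u", 1), ("-m", "-a", 1), ("-r", "-d", 1),
]

print(status_paths(REPO1_FILES, "b", [".."]))
for f1, f2, expected in CASES:
    if (expand(f1) == expand(f2)) != (expected == 0):
        want = "same" if expected == 0 else "different"
        print(f"Error on {f1} and {f2}, should be {want}.")
```

Run as-is, the sketch prints the same six paths, in the same order, as the expected output of hg status --cwd b .. . It prints no Error lines for the ten flag pairs.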