mdiff: speed up showfunc for large diffs
This addresses the following issues with showfunc:
- Using a regular expression where a simple character test suffices.
- Calling str.rstrip() needlessly in an inner loop.
- Catastrophic backtracking: rescanning lines already examined for
  earlier hunks when searching backwards for a function line.
Finding function text is now at worst O(n lines in the old file), and
at best close to O(n hunks).
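A minimal sketch of the idea, not the actual mdiff code: remember where the
previous hunk's search started and never scan above it, so each line of the
old file is tested at most once. The names function_headers, hunk_starts and
width are hypothetical, and the 40-character cut-off is an assumption made
for illustration.

```python
def function_headers(lines, hunk_starts, width=40):
    """Return the nearest preceding "function line" for each hunk.

    lines       -- lines of the old file
    hunk_starts -- 0-based index of the first line of each hunk, ascending
    A "function line" is assumed here to be one whose first character is
    alphanumeric.
    """
    headers = []
    last_pos = 0    # lines above this index were scanned for earlier hunks
    last_func = ""  # most recent function line found so far
    for start in hunk_starts:
        # Walk backwards from just above the hunk, stopping at last_pos.
        for i in range(start - 1, last_pos - 1, -1):
            line = lines[i]
            # Plain character test instead of a regular expression.
            if line[:1].isalnum():
                # rstrip() runs only on the line that matched.
                last_func = line.rstrip()[:width]
                break
        # The next hunk never needs to look above this point again.
        last_pos = start
        headers.append(last_func)
    return headers


old_file = [
    "def a():\n",
    "    x = 1\n",
    "\n",
    "def b():\n",
    "    y = 2\n",
    "    z = 3\n",
]
print(function_headers(old_file, [2, 5]))
```

Because last_pos only moves forward, the scanned ranges of successive hunks
are disjoint. A hunk that finds nothing in its own range reuses the function
line remembered from an earlier hunk.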
Given a diff like this[1]:
 src/main/antlr3/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunker.g        |     4 +-
 src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerLexer.java  |     2 +-
 src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerParser.java | 29189 +++++----
 3 files changed, 14741 insertions(+), 14454 deletions(-)
[1]: https://bitbucket.org/wwmm/chemicaltagger/changeset/d2bfbaecd4fc/raw
Without this change, hg log --stat --config diff.showfunc=1 takes an
absurdly long time (about 80 seconds) to complete:
   CallCount    Recursive     Total(s)    Inline(s) module:lineno(function)
       32813            0     80.3546     40.6086   mercurial.mdiff:160(yieldhunk)
   +65062746            0     25.7227     25.7227   +<method 'match' of '_sre.SRE_Pattern' objects>
   +65062746            0     14.0221     14.0221   +<method 'rstrip' of 'str' objects>
       +1809            0      0.0009      0.0009   +mercurial.mdiff:148(contextend)
       +1809            0      0.0003      0.0003   +<len>
    65062746            0     25.7227     25.7227   <method 'match' of '_sre.SRE_Pattern' objects>
    65062763            0     14.0221     14.0221   <method 'rstrip' of 'str' objects>
         543            0      0.1631      0.1631   <zlib.decompress>
           3            0      0.0505      0.0505   <mercurial.bdiff.blocks>
       31007            0     80.4564      0.0477   mercurial.mdiff:147(_unidiff)
      +32813            0     80.3546     40.6086   +mercurial.mdiff:160(yieldhunk)
          +3            0      0.0505      0.0505   +<mercurial.bdiff.blocks>
       +3618            0      0.0022      0.0022   +mercurial.mdiff:154(contextstart)
       +5427            0      0.0013      0.0013   +<len>
          +3            0      0.0001      0.0000   +re:188(compile)
           1            0     80.8381      0.0322   mercurial.patch:1777(diffstatdata)
     +107499            0      0.0235      0.0235   +<method 'startswith' of 'str' objects>
      +31014            0     80.7820      0.0071   +mercurial.util:1284(iterlines)
          +3            0      0.0000      0.0000   +<method 'search' of '_sre.SRE_Pattern' objects>
          +4            0      0.0000      0.0000   +mercurial.patch:1783(addresult)
          +3            0      0.0000      0.0000   +<method 'group' of '_sre.SRE_Match' objects>
           6            0      0.0444      0.0283   mercurial.mdiff:12(splitnewlines)
          +6            0      0.0160      0.0160   +<method 'split' of 'str' objects>
          32            0      0.0246      0.0246   <method 'update' of '_hashlib.HASH' objects>
          11            0      0.0236      0.0236   <method 'read' of 'file' objects>
Time: real 80.880 secs (user 80.200+0.000 sys 0.380+0.000)
With this change, it's almost as fast as not using showfunc at all:
   CallCount    Recursive     Total(s)    Inline(s) module:lineno(function)
         543            0      0.1699      0.1699   <zlib.decompress>
           3            0      0.0501      0.0501   <mercurial.bdiff.blocks>
       32813            0      0.0415      0.0348   mercurial.mdiff:161(yieldhunk)
      +70837            0      0.0058      0.0058   +<method 'isalnum' of 'str' objects>
       +1809            0      0.0006      0.0006   +mercurial.mdiff:148(contextend)
       +1809            0      0.0002      0.0002   +<len>
           1            0      0.4879      0.0310   mercurial.patch:1777(diffstatdata)
     +107499            0      0.0230      0.0230   +<method 'startswith' of 'str' objects>
      +31014            0      0.4335      0.0065   +mercurial.util:1284(iterlines)
          +3            0      0.0000      0.0000   +<method 'search' of '_sre.SRE_Pattern' objects>
          +4            0      0.0000      0.0000   +mercurial.patch:1783(addresult)
          +1            0      0.0004      0.0000   +re:188(compile)
          32            0      0.0293      0.0293   <method 'update' of '_hashlib.HASH' objects>
           6            0      0.0427      0.0279   mercurial.mdiff:12(splitnewlines)
          +6            0      0.0147      0.0147   +<method 'split' of 'str' objects>
       31007            0      0.1169      0.0235   mercurial.mdiff:147(_unidiff)
          +3            0      0.0501      0.0501   +<mercurial.bdiff.blocks>
      +32813            0      0.0415      0.0348   +mercurial.mdiff:161(yieldhunk)
       +3618            0      0.0012      0.0012   +mercurial.mdiff:154(contextstart)
       +5427            0      0.0006      0.0006   +<len>
      107597            0      0.0230      0.0230   <method 'startswith' of 'str' objects>
          16            0      0.0213      0.0213   <mercurial.mpatch.patches>
         194            0      0.0149      0.0149   <method 'split' of 'str' objects>
Time: real 0.530 secs (user 0.450+0.000 sys 0.070+0.000)
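As a rough cross-check of the complexity claim, the call counts in the two
profiles can be turned into "lines tested per hunk". This sketch assumes that
contextend is called once per hunk, so that its call count (1809) equals the
number of hunks. That is an inference from the profiles, not something the
profiler states.

```python
# Call counts copied from the two profiles above.
CONTEXTEND_CALLS = 1809         # assumed to equal the number of hunks
MATCH_CALLS_BEFORE = 65062746   # regex match calls under yieldhunk, before
ISALNUM_CALLS_AFTER = 70837     # str.isalnum calls under yieldhunk, after


def per_hunk(calls, hunks=CONTEXTEND_CALLS):
    """Average number of line tests performed per hunk."""
    return calls / hunks


before = per_hunk(MATCH_CALLS_BEFORE)
after = per_hunk(ISALNUM_CALLS_AFTER)
ratio = MATCH_CALLS_BEFORE / ISALNUM_CALLS_AFTER

print("lines tested per hunk, before: %.0f" % before)
print("lines tested per hunk, after:  %.1f" % after)
print("reduction in line tests:       %.0fx" % ratio)
```

Under that assumption, the old code tested tens of thousands of lines per
hunk on average, while the new code tests a few dozen.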
$ echo "[extensions]" >> $HGRCPATH
$ echo "color=" >> $HGRCPATH
$ echo "[color]" >> $HGRCPATH
$ echo "mode=ansi" >> $HGRCPATH
Terminfo codes compatibility fix
$ echo "color.none=0" >> $HGRCPATH
$ hg init repo1
$ cd repo1
$ mkdir a b a/1 b/1 b/2
$ touch in_root a/in_a b/in_b a/1/in_a_1 b/1/in_b_1 b/2/in_b_2
hg status in repo root:
$ hg status --color=always
\x1b[0;35;1;4m? a/1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? a/in_a\x1b[0m (esc)
\x1b[0;35;1;4m? b/1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? b/2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? b/in_b\x1b[0m (esc)
\x1b[0;35;1;4m? in_root\x1b[0m (esc)
hg status . in repo root:
$ hg status --color=always .
\x1b[0;35;1;4m? a/1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? a/in_a\x1b[0m (esc)
\x1b[0;35;1;4m? b/1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? b/2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? b/in_b\x1b[0m (esc)
\x1b[0;35;1;4m? in_root\x1b[0m (esc)
$ hg status --color=always --cwd a
\x1b[0;35;1;4m? a/1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? a/in_a\x1b[0m (esc)
\x1b[0;35;1;4m? b/1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? b/2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? b/in_b\x1b[0m (esc)
\x1b[0;35;1;4m? in_root\x1b[0m (esc)
$ hg status --color=always --cwd a .
\x1b[0;35;1;4m? 1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? in_a\x1b[0m (esc)
$ hg status --color=always --cwd a ..
\x1b[0;35;1;4m? 1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? in_a\x1b[0m (esc)
\x1b[0;35;1;4m? ../b/1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? ../b/2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? ../b/in_b\x1b[0m (esc)
\x1b[0;35;1;4m? ../in_root\x1b[0m (esc)
$ hg status --color=always --cwd b
\x1b[0;35;1;4m? a/1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? a/in_a\x1b[0m (esc)
\x1b[0;35;1;4m? b/1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? b/2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? b/in_b\x1b[0m (esc)
\x1b[0;35;1;4m? in_root\x1b[0m (esc)
$ hg status --color=always --cwd b .
\x1b[0;35;1;4m? 1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? 2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? in_b\x1b[0m (esc)
$ hg status --color=always --cwd b ..
\x1b[0;35;1;4m? ../a/1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? ../a/in_a\x1b[0m (esc)
\x1b[0;35;1;4m? 1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? 2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? in_b\x1b[0m (esc)
\x1b[0;35;1;4m? ../in_root\x1b[0m (esc)
$ hg status --color=always --cwd a/1
\x1b[0;35;1;4m? a/1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? a/in_a\x1b[0m (esc)
\x1b[0;35;1;4m? b/1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? b/2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? b/in_b\x1b[0m (esc)
\x1b[0;35;1;4m? in_root\x1b[0m (esc)
$ hg status --color=always --cwd a/1 .
\x1b[0;35;1;4m? in_a_1\x1b[0m (esc)
$ hg status --color=always --cwd a/1 ..
\x1b[0;35;1;4m? in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? ../in_a\x1b[0m (esc)
$ hg status --color=always --cwd b/1
\x1b[0;35;1;4m? a/1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? a/in_a\x1b[0m (esc)
\x1b[0;35;1;4m? b/1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? b/2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? b/in_b\x1b[0m (esc)
\x1b[0;35;1;4m? in_root\x1b[0m (esc)
$ hg status --color=always --cwd b/1 .
\x1b[0;35;1;4m? in_b_1\x1b[0m (esc)
$ hg status --color=always --cwd b/1 ..
\x1b[0;35;1;4m? in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? ../2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? ../in_b\x1b[0m (esc)
$ hg status --color=always --cwd b/2
\x1b[0;35;1;4m? a/1/in_a_1\x1b[0m (esc)
\x1b[0;35;1;4m? a/in_a\x1b[0m (esc)
\x1b[0;35;1;4m? b/1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? b/2/in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? b/in_b\x1b[0m (esc)
\x1b[0;35;1;4m? in_root\x1b[0m (esc)
$ hg status --color=always --cwd b/2 .
\x1b[0;35;1;4m? in_b_2\x1b[0m (esc)
$ hg status --color=always --cwd b/2 ..
\x1b[0;35;1;4m? ../1/in_b_1\x1b[0m (esc)
\x1b[0;35;1;4m? in_b_2\x1b[0m (esc)
\x1b[0;35;1;4m? ../in_b\x1b[0m (esc)
$ cd ..
$ hg init repo2
$ cd repo2
$ touch modified removed deleted ignored
$ echo "^ignored$" > .hgignore
$ hg ci -A -m 'initial checkin'
adding .hgignore
adding deleted
adding modified
adding removed
$ touch modified added unknown ignored
$ hg add added
$ hg remove removed
$ rm deleted
hg status:
$ hg status --color=always
\x1b[0;32;1mA added\x1b[0m (esc)
\x1b[0;31;1mR removed\x1b[0m (esc)
\x1b[0;36;1;4m! deleted\x1b[0m (esc)
\x1b[0;35;1;4m? unknown\x1b[0m (esc)
hg status modified added removed deleted unknown never-existed ignored:
$ hg status --color=always modified added removed deleted unknown never-existed ignored
never-existed: No such file or directory
\x1b[0;32;1mA added\x1b[0m (esc)
\x1b[0;31;1mR removed\x1b[0m (esc)
\x1b[0;36;1;4m! deleted\x1b[0m (esc)
\x1b[0;35;1;4m? unknown\x1b[0m (esc)
$ hg copy modified copied
hg status -C:
$ hg status --color=always -C
\x1b[0;32;1mA added\x1b[0m (esc)
\x1b[0;32;1mA copied\x1b[0m (esc)
\x1b[0;0m modified\x1b[0m (esc)
\x1b[0;31;1mR removed\x1b[0m (esc)
\x1b[0;36;1;4m! deleted\x1b[0m (esc)
\x1b[0;35;1;4m? unknown\x1b[0m (esc)
hg status -A:
$ hg status --color=always -A
\x1b[0;32;1mA added\x1b[0m (esc)
\x1b[0;32;1mA copied\x1b[0m (esc)
\x1b[0;0m modified\x1b[0m (esc)
\x1b[0;31;1mR removed\x1b[0m (esc)
\x1b[0;36;1;4m! deleted\x1b[0m (esc)
\x1b[0;35;1;4m? unknown\x1b[0m (esc)
\x1b[0;30;1mI ignored\x1b[0m (esc)
\x1b[0;0mC .hgignore\x1b[0m (esc)
\x1b[0;0mC modified\x1b[0m (esc)
hg status -A (with terminfo color):
$ mkdir $TESTTMP/terminfo
$ TERMINFO=$TESTTMP/terminfo tic $TESTDIR/hgterm.ti
$ TERM=hgterm TERMINFO=$TESTTMP/terminfo hg status --config color.mode=terminfo --color=always -A
\x1b[30m\x1b[32m\x1b[1mA added\x1b[30m (esc)
\x1b[30m\x1b[32m\x1b[1mA copied\x1b[30m (esc)
\x1b[30m\x1b[30m modified\x1b[30m (esc)
\x1b[30m\x1b[31m\x1b[1mR removed\x1b[30m (esc)
\x1b[30m\x1b[36m\x1b[1m\x1b[4m! deleted\x1b[30m (esc)
\x1b[30m\x1b[35m\x1b[1m\x1b[4m? unknown\x1b[30m (esc)
\x1b[30m\x1b[30m\x1b[1mI ignored\x1b[30m (esc)
\x1b[30m\x1b[30mC .hgignore\x1b[30m (esc)
\x1b[30m\x1b[30mC modified\x1b[30m (esc)
$ echo "^ignoreddir$" > .hgignore
$ mkdir ignoreddir
$ touch ignoreddir/file
hg status ignoreddir/file:
$ hg status --color=always ignoreddir/file
hg status -i ignoreddir/file:
$ hg status --color=always -i ignoreddir/file
\x1b[0;30;1mI ignoreddir/file\x1b[0m (esc)
$ cd ..
check 'status -q' and some combinations
$ hg init repo3
$ cd repo3
$ touch modified removed deleted ignored
$ echo "^ignored$" > .hgignore
$ hg commit -A -m 'initial checkin'
adding .hgignore
adding deleted
adding modified
adding removed
$ touch added unknown ignored
$ hg add added
$ echo "test" >> modified
$ hg remove removed
$ rm deleted
$ hg copy modified copied
test unknown color
$ hg --config color.status.modified=periwinkle status --color=always
ignoring unknown color/effect 'periwinkle' (configured in color.status.modified)
M modified
\x1b[0;32;1mA added\x1b[0m (esc)
\x1b[0;32;1mA copied\x1b[0m (esc)
\x1b[0;31;1mR removed\x1b[0m (esc)
\x1b[0;36;1;4m! deleted\x1b[0m (esc)
\x1b[0;35;1;4m? unknown\x1b[0m (esc)
Run status with two different sets of flags and check whether the
output is the same or different.
If the result is not as expected, print an error message.
$ assert() {
> hg status --color=always $1 > ../a
> hg status --color=always $2 > ../b
> if diff ../a ../b > /dev/null; then
> out=0
> else
> out=1
> fi
> if [ $3 -eq 0 ]; then
> df="same"
> else
> df="different"
> fi
> if [ $out -ne $3 ]; then
> echo "Error on $1 and $2, should be $df."
> fi
> }
assert flag1 flag2 [0-same | 1-different]
$ assert "-q" "-mard" 0
$ assert "-A" "-marduicC" 0
$ assert "-qA" "-mardcC" 0
$ assert "-qAui" "-A" 0
$ assert "-qAu" "-marducC" 0
$ assert "-qAi" "-mardicC" 0
$ assert "-qu" "-u" 0
$ assert "-q" "-u" 1
$ assert "-m" "-a" 1
$ assert "-r" "-d" 1
$ cd ..
test 'resolve -l'
$ hg init repo4
$ cd repo4
$ echo "file a" > a
$ echo "file b" > b
$ hg add a b
$ hg commit -m "initial"
$ echo "file a change 1" > a
$ echo "file b change 1" > b
$ hg commit -m "head 1"
$ hg update 0
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ echo "file a change 2" > a
$ echo "file b change 2" > b
$ hg commit -m "head 2"
created new head
$ hg merge
merging a
warning: conflicts during merge.
merging a failed!
merging b
warning: conflicts during merge.
merging b failed!
0 files updated, 0 files merged, 0 files removed, 2 files unresolved
use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon
[1]
$ hg resolve -m b
hg resolve with one unresolved, one resolved:
$ hg resolve --color=always -l
\x1b[0;31;1mU a\x1b[0m (esc)
\x1b[0;32;1mR b\x1b[0m (esc)