mdiff: speed up showfunc for large diffs
This addresses the following issues with showfunc:
- Silly usage of regular expressions.
- Doing str.rstrip() needlessly in an inner loop.
- Doing catastrophic backtracking when trying to find a function line.
Finding function text is now at worst O(n lines in the old file), and
at best close to O(n hunks).
Given a diff like this[1]:
src/main/antlr3/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunker.g | 4 +-
src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerLexer.java | 2 +-
src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerParser.java | 29189 +++++----
3 files changed, 14741 insertions(+), 14454 deletions(-)
[1]: https://bitbucket.org/wwmm/chemicaltagger/changeset/d2bfbaecd4fc/raw
Without this change, hg log --stat --config diff.showfunc=1 takes an
absurdly long time to complete:
CallCount Recursive Total(ms) Inline(ms) module:lineno(function)
32813 0 80.3546 40.6086 mercurial.mdiff:160(yieldhunk)
+65062746 0 25.7227 25.7227 +<method 'match' of '_sre.SRE_Pattern' objects>
+65062746 0 14.0221 14.0221 +<method 'rstrip' of 'str' objects>
+1809 0 0.0009 0.0009 +mercurial.mdiff:148(contextend)
+1809 0 0.0003 0.0003 +<len>
65062746 0 25.7227 25.7227 <method 'match' of '_sre.SRE_Pattern' objects>
65062763 0 14.0221 14.0221 <method 'rstrip' of 'str' objects>
543 0 0.1631 0.1631 <zlib.decompress>
3 0 0.0505 0.0505 <mercurial.bdiff.blocks>
31007 0 80.4564 0.0477 mercurial.mdiff:147(_unidiff)
+32813 0 80.3546 40.6086 +mercurial.mdiff:160(yieldhunk)
+3 0 0.0505 0.0505 +<mercurial.bdiff.blocks>
+3618 0 0.0022 0.0022 +mercurial.mdiff:154(contextstart)
+5427 0 0.0013 0.0013 +<len>
+3 0 0.0001 0.0000 +re:188(compile)
1 0 80.8381 0.0322 mercurial.patch:1777(diffstatdata)
+107499 0 0.0235 0.0235 +<method 'startswith' of 'str' objects>
+31014 0 80.7820 0.0071 +mercurial.util:1284(iterlines)
+3 0 0.0000 0.0000 +<method 'search' of '_sre.SRE_Pattern' objects>
+4 0 0.0000 0.0000 +mercurial.patch:1783(addresult)
+3 0 0.0000 0.0000 +<method 'group' of '_sre.SRE_Match' objects>
6 0 0.0444 0.0283 mercurial.mdiff:12(splitnewlines)
+6 0 0.0160 0.0160 +<method 'split' of 'str' objects>
32 0 0.0246 0.0246 <method 'update' of '_hashlib.HASH' objects>
11 0 0.0236 0.0236 <method 'read' of 'file' objects>
Time: real 80.880 secs (user 80.200+0.000 sys 0.380+0.000)
With this change, it's almost as fast as not using showfunc at all:
CallCount Recursive Total(ms) Inline(ms) module:lineno(function)
543 0 0.1699 0.1699 <zlib.decompress>
3 0 0.0501 0.0501 <mercurial.bdiff.blocks>
32813 0 0.0415 0.0348 mercurial.mdiff:161(yieldhunk)
+70837 0 0.0058 0.0058 +<method 'isalnum' of 'str' objects>
+1809 0 0.0006 0.0006 +mercurial.mdiff:148(contextend)
+1809 0 0.0002 0.0002 +<len>
1 0 0.4879 0.0310 mercurial.patch:1777(diffstatdata)
+107499 0 0.0230 0.0230 +<method 'startswith' of 'str' objects>
+31014 0 0.4335 0.0065 +mercurial.util:1284(iterlines)
+3 0 0.0000 0.0000 +<method 'search' of '_sre.SRE_Pattern' objects>
+4 0 0.0000 0.0000 +mercurial.patch:1783(addresult)
+1 0 0.0004 0.0000 +re:188(compile)
32 0 0.0293 0.0293 <method 'update' of '_hashlib.HASH' objects>
6 0 0.0427 0.0279 mercurial.mdiff:12(splitnewlines)
+6 0 0.0147 0.0147 +<method 'split' of 'str' objects>
31007 0 0.1169 0.0235 mercurial.mdiff:147(_unidiff)
+3 0 0.0501 0.0501 +<mercurial.bdiff.blocks>
+32813 0 0.0415 0.0348 +mercurial.mdiff:161(yieldhunk)
+3618 0 0.0012 0.0012 +mercurial.mdiff:154(contextstart)
+5427 0 0.0006 0.0006 +<len>
107597 0 0.0230 0.0230 <method 'startswith' of 'str' objects>
16 0 0.0213 0.0213 <mercurial.mpatch.patches>
194 0 0.0149 0.0149 <method 'split' of 'str' objects>
Time: real 0.530 secs (user 0.450+0.000 sys 0.070+0.000)
$ echo "[extensions]" >> $HGRCPATH
$ echo "mq=" >> $HGRCPATH
$ echo "graphlog=" >> $HGRCPATH
make a test repository that looks like this:
o 2:28bc7b1afd6a
|
| @ 1:d7fe2034f71b
|/
o 0/62ecad8b70e5
$ hg init r0
$ cd r0
$ touch f0
$ hg ci -m0 -Aq
$ touch f1
$ hg ci -m1 -Aq
$ hg update 0 -q
$ touch f2
$ hg ci -m2 -Aq
$ hg update 1 -q
make some patches with a parent: 1:d7fe2034f71b -> p0 -> p1
$ echo cp0 >> fp0
$ hg add fp0
$ hg ci -m p0 -d "0 0"
$ hg export -r. > p0
$ hg strip -qn .
$ hg qimport p0
adding p0 to series file
$ hg qpush
applying p0
now at: p0
$ echo cp1 >> fp1
$ hg add fp1
$ hg qnew p1 -d "0 0"
$ hg qpop -aq
patch queue now empty
qpush --exact when at the parent
$ hg update 1 -q
$ hg qpush -e
applying p0
now at: p0
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
$ hg qpush -e p0
applying p0
now at: p0
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
$ hg qpush -e p1
applying p0
applying p1
now at: p1
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
qpush --exact when at another rev
$ hg update 0 -q
$ hg qpush -e
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
applying p0
now at: p0
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
$ hg update 0 -q
$ hg qpush -e p0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
applying p0
now at: p0
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
$ hg update 0 -q
$ hg qpush -e p1
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
applying p0
applying p1
now at: p1
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
$ hg update 0 -q
$ hg qpush -ea
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
applying p0
applying p1
now at: p1
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
qpush --exact while crossing branches
$ hg update 2 -q
$ hg qpush -e
1 files updated, 0 files merged, 1 files removed, 0 files unresolved
applying p0
now at: p0
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
$ hg update 2 -q
$ hg qpush -e p0
1 files updated, 0 files merged, 1 files removed, 0 files unresolved
applying p0
now at: p0
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
$ hg update 2 -q
$ hg qpush -e p1
1 files updated, 0 files merged, 1 files removed, 0 files unresolved
applying p0
applying p1
now at: p1
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
$ hg update 2 -q
$ hg qpush -ea
1 files updated, 0 files merged, 1 files removed, 0 files unresolved
applying p0
applying p1
now at: p1
$ hg parents -qr qbase
1:d7fe2034f71b
$ hg qpop -aq
patch queue now empty
qpush --exact --force with changes to an unpatched file
$ hg update 1 -q
$ echo c0 >> f0
$ hg qpush -e
abort: local changes found
[255]
$ hg qpush -ef
applying p0
now at: p0
$ cat f0
c0
$ rm f0
$ touch f0
$ hg qpop -aq
patch queue now empty
$ hg update 1 -q
$ echo c0 >> f0
$ hg qpush -e p1
abort: local changes found
[255]
$ hg qpush -e p1 -f
applying p0
applying p1
now at: p1
$ cat f0
c0
$ rm f0
$ touch f0
$ hg qpop -aq
patch queue now empty
qpush --exact --force with changes to a patched file
$ hg update 1 -q
$ echo cp0-bad >> fp0
$ hg add fp0
$ hg qpush -e
abort: local changes found
[255]
$ hg qpush -ef
applying p0
file fp0 already exists
1 out of 1 hunks FAILED -- saving rejects to file fp0.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh p0
[2]
$ cat fp0
cp0-bad
$ cat fp0.rej
--- fp0
+++ fp0
@@ -0,0 +1,1 @@
+cp0
$ hg qpop -aqf
patch queue now empty
$ rm fp0
$ rm fp0.rej
$ hg update 1 -q
$ echo cp1-bad >> fp1
$ hg add fp1
$ hg qpush -e p1
abort: local changes found
[255]
$ hg qpush -e p1 -f
applying p0
applying p1
file fp1 already exists
1 out of 1 hunks FAILED -- saving rejects to file fp1.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh p1
[2]
$ cat fp1
cp1-bad
$ cat fp1.rej
--- fp1
+++ fp1
@@ -0,0 +1,1 @@
+cp1
$ hg qpop -aqf
patch queue now empty
$ rm fp1
$ rm fp1.rej
qpush --exact when already at a patch
$ hg update 1
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ hg qpush -e p0
applying p0
now at: p0
$ hg qpush -e p1
abort: cannot push --exact with applied patches
[255]
$ hg qpop -aq
patch queue now empty
qpush --exact --move should fail
$ hg qpush -e --move p1
abort: cannot use --exact and --move together
[255]
qpush --exact a patch without a parent recorded
$ hg qpush -q
now at: p0
$ grep -v '# Parent' .hg/patches/p0 > p0.new
$ mv p0.new .hg/patches/p0
$ hg qpop -aq
patch queue now empty
$ hg qpush -e
abort: p0 does not have a parent recorded
[255]
$ hg qpush -e p0
abort: p0 does not have a parent recorded
[255]
$ hg qpush -e p1
abort: p0 does not have a parent recorded
[255]
$ hg qpush -ea
abort: p0 does not have a parent recorded
[255]