view tests/test-convert-cvs-synthetic.t @ 15141:16dc9a32ca04

mdiff: speed up showfunc for large diffs

This addresses the following issues with showfunc:

- Silly usage of regular expressions.
- Doing str.rstrip() needlessly in an inner loop.
- Doing catastrophic backtracking when trying to find a function line.

Finding function text is now at worst O(n lines in the old file), and at
best close to O(n hunks).

Given a diff like this[1]:

src/main/antlr3/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunker.g | 4 +-
src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerLexer.java | 2 +-
src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerParser.java | 29189 +++++----
3 files changed, 14741 insertions(+), 14454 deletions(-)

[1]: https://bitbucket.org/wwmm/chemicaltagger/changeset/d2bfbaecd4fc/raw

Without this change, hg log --stat --config diff.showfunc=1 takes an
absurdly long time to complete:

CallCount  Recursive  Total(ms)  Inline(ms)  module:lineno(function)
32813      0          80.3546    40.6086     mercurial.mdiff:160(yieldhunk)
+65062746  0          25.7227    25.7227     +<method 'match' of '_sre.SRE_Pattern' objects>
+65062746  0          14.0221    14.0221     +<method 'rstrip' of 'str' objects>
+1809      0          0.0009     0.0009      +mercurial.mdiff:148(contextend)
+1809      0          0.0003     0.0003      +<len>
65062746   0          25.7227    25.7227     <method 'match' of '_sre.SRE_Pattern' objects>
65062763   0          14.0221    14.0221     <method 'rstrip' of 'str' objects>
543        0          0.1631     0.1631      <zlib.decompress>
3          0          0.0505     0.0505      <mercurial.bdiff.blocks>
31007      0          80.4564    0.0477      mercurial.mdiff:147(_unidiff)
+32813     0          80.3546    40.6086     +mercurial.mdiff:160(yieldhunk)
+3         0          0.0505     0.0505      +<mercurial.bdiff.blocks>
+3618      0          0.0022     0.0022      +mercurial.mdiff:154(contextstart)
+5427      0          0.0013     0.0013      +<len>
+3         0          0.0001     0.0000      +re:188(compile)
1          0          80.8381    0.0322      mercurial.patch:1777(diffstatdata)
+107499    0          0.0235     0.0235      +<method 'startswith' of 'str' objects>
+31014     0          80.7820    0.0071      +mercurial.util:1284(iterlines)
+3         0          0.0000     0.0000      +<method 'search' of '_sre.SRE_Pattern' objects>
+4         0          0.0000     0.0000      +mercurial.patch:1783(addresult)
+3         0          0.0000     0.0000      +<method 'group' of '_sre.SRE_Match' objects>
6          0          0.0444     0.0283      mercurial.mdiff:12(splitnewlines)
+6         0          0.0160     0.0160      +<method 'split' of 'str' objects>
32         0          0.0246     0.0246      <method 'update' of '_hashlib.HASH' objects>
11         0          0.0236     0.0236      <method 'read' of 'file' objects>
Time: real 80.880 secs (user 80.200+0.000 sys 0.380+0.000)

With this change, it's almost as fast as not using showfunc at all:

CallCount  Recursive  Total(ms)  Inline(ms)  module:lineno(function)
543        0          0.1699     0.1699      <zlib.decompress>
3          0          0.0501     0.0501      <mercurial.bdiff.blocks>
32813      0          0.0415     0.0348      mercurial.mdiff:161(yieldhunk)
+70837     0          0.0058     0.0058      +<method 'isalnum' of 'str' objects>
+1809      0          0.0006     0.0006      +mercurial.mdiff:148(contextend)
+1809      0          0.0002     0.0002      +<len>
1          0          0.4879     0.0310      mercurial.patch:1777(diffstatdata)
+107499    0          0.0230     0.0230      +<method 'startswith' of 'str' objects>
+31014     0          0.4335     0.0065      +mercurial.util:1284(iterlines)
+3         0          0.0000     0.0000      +<method 'search' of '_sre.SRE_Pattern' objects>
+4         0          0.0000     0.0000      +mercurial.patch:1783(addresult)
+1         0          0.0004     0.0000      +re:188(compile)
32         0          0.0293     0.0293      <method 'update' of '_hashlib.HASH' objects>
6          0          0.0427     0.0279      mercurial.mdiff:12(splitnewlines)
+6         0          0.0147     0.0147      +<method 'split' of 'str' objects>
31007      0          0.1169     0.0235      mercurial.mdiff:147(_unidiff)
+3         0          0.0501     0.0501      +<mercurial.bdiff.blocks>
+32813     0          0.0415     0.0348      +mercurial.mdiff:161(yieldhunk)
+3618      0          0.0012     0.0012      +mercurial.mdiff:154(contextstart)
+5427      0          0.0006     0.0006      +<len>
107597     0          0.0230     0.0230      <method 'startswith' of 'str' objects>
16         0          0.0213     0.0213      <mercurial.mpatch.patches>
194        0          0.0149     0.0149      <method 'split' of 'str' objects>
Time: real 0.530 secs (user 0.450+0.000 sys 0.070+0.000)
author Brodie Rao <brodie@bitheap.org>
date Mon, 19 Sep 2011 15:58:03 -0700
parents c4f271293134
children ed923a2d5ae9

This feature requires use of builtin cvsps!

  $ "$TESTDIR/hghave" cvs || exit 80
  $ echo "[extensions]" >> $HGRCPATH
  $ echo "convert = " >> $HGRCPATH
  $ echo "graphlog = " >> $HGRCPATH

create cvs repository with one project

  $ mkdir cvsrepo
  $ cd cvsrepo
  $ CVSROOT=`pwd`
  $ export CVSROOT
  $ CVS_OPTIONS=-f
  $ export CVS_OPTIONS
  $ cd ..
  $ cvscall()
  > {
  >     cvs -f "$@"
  > }

output of 'cvs ci' varies unpredictably, so just discard it

  $ cvsci()
  > {
  >     sleep 1
  >     cvs -f ci "$@" >/dev/null
  > }
  $ cvscall -d "$CVSROOT" init
  $ mkdir cvsrepo/proj
  $ cvscall -q co proj

create file1 on the trunk

  $ cd proj
  $ touch file1
  $ cvscall -Q add file1
  $ cvsci -m"add file1 on trunk" file1

create two branches

  $ cvscall -q tag -b v1_0
  T file1
  $ cvscall -q tag -b v1_1
  T file1

create file2 on branch v1_0

  $ cvscall -Q up -rv1_0
  $ touch file2
  $ cvscall -Q add file2
  $ cvsci -m"add file2" file2

create file3, file4 on branch v1_1

  $ cvscall -Q up -rv1_1
  $ touch file3
  $ touch file4
  $ cvscall -Q add file3 file4
  $ cvsci -m"add file3, file4 on branch v1_1" file3 file4

merge file2 from v1_0 to v1_1

  $ cvscall -Q up -jv1_0
  $ cvsci -m"MERGE from v1_0: add file2"
  cvs commit: Examining .

Step things up a notch: now we make the history really hairy, with
changes bouncing back and forth between trunk and v1_2 and merges
going both ways.  (I.e., try to model the real world.)

create branch v1_2

  $ cvscall -Q up -A
  $ cvscall -q tag -b v1_2
  T file1

create file5 on branch v1_2

  $ cvscall -Q up -rv1_2
  $ touch file5
  $ cvscall -Q add file5
  $ cvsci -m"add file5 on v1_2"
  cvs commit: Examining .

create file6 on trunk post-v1_2

  $ cvscall -Q up -A
  $ touch file6
  $ cvscall -Q add file6
  $ cvsci -m"add file6 on trunk post-v1_2"
  cvs commit: Examining .

merge file5 from v1_2 to trunk

  $ cvscall -Q up -A
  $ cvscall -Q up -jv1_2 file5
  $ cvsci -m"MERGE from v1_2: add file5"
  cvs commit: Examining .

merge file6 from trunk to v1_2

  $ cvscall -Q up -rv1_2
  $ cvscall up -jHEAD file6
  U file6
  $ cvsci -m"MERGE from HEAD: add file6"
  cvs commit: Examining .

cvs rlog output

  $ cvscall -q rlog proj | grep -E '^(RCS file|revision)'
  RCS file: $TESTTMP/cvsrepo/proj/file1,v
  revision 1.1
  RCS file: $TESTTMP/cvsrepo/proj/Attic/file2,v
  revision 1.1
  revision 1.1.4.2
  revision 1.1.4.1
  revision 1.1.2.1
  RCS file: $TESTTMP/cvsrepo/proj/Attic/file3,v
  revision 1.1
  revision 1.1.2.1
  RCS file: $TESTTMP/cvsrepo/proj/Attic/file4,v
  revision 1.1
  revision 1.1.2.1
  RCS file: $TESTTMP/cvsrepo/proj/file5,v
  revision 1.2
  revision 1.1
  revision 1.1.2.1
  RCS file: $TESTTMP/cvsrepo/proj/file6,v
  revision 1.1
  revision 1.1.2.2
  revision 1.1.2.1
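
the revision lines above can be tallied per repository; a minimal sketch,
assuming the filtered rlog lines are fed in through a here-document
(count_revisions is a hypothetical helper, not part of the test). The total
appears to line up with the "15 log entries" that hg convert reports below.

```shell
# Hypothetical helper: count the "revision" lines in filtered rlog output
# read from standard input.
count_revisions() {
    grep -c '^revision '
}

# The here-document repeats the filtered rlog lines shown above.
total=$(count_revisions <<'EOF'
RCS file: $TESTTMP/cvsrepo/proj/file1,v
revision 1.1
RCS file: $TESTTMP/cvsrepo/proj/Attic/file2,v
revision 1.1
revision 1.1.4.2
revision 1.1.4.1
revision 1.1.2.1
RCS file: $TESTTMP/cvsrepo/proj/Attic/file3,v
revision 1.1
revision 1.1.2.1
RCS file: $TESTTMP/cvsrepo/proj/Attic/file4,v
revision 1.1
revision 1.1.2.1
RCS file: $TESTTMP/cvsrepo/proj/file5,v
revision 1.2
revision 1.1
revision 1.1.2.1
RCS file: $TESTTMP/cvsrepo/proj/file6,v
revision 1.1
revision 1.1.2.2
revision 1.1.2.1
EOF
)
echo "$total"    # prints 15
```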

convert to hg (#1)

  $ cd ..
  $ hg convert --datesort proj proj.hg
  initializing destination proj.hg repository
  connecting to $TESTTMP/cvsrepo
  scanning source...
  collecting CVS rlog
  15 log entries
  creating changesets
  8 changeset entries
  sorting...
  converting...
  7 add file1 on trunk
  6 add file2
  5 add file3, file4 on branch v1_1
  4 MERGE from v1_0: add file2
  3 add file5 on v1_2
  2 add file6 on trunk post-v1_2
  1 MERGE from v1_2: add file5
  0 MERGE from HEAD: add file6

hg glog output (#1)

  $ hg -R proj.hg glog --template "{rev} {desc}\n"
  o  7 MERGE from HEAD: add file6
  |
  | o  6 MERGE from v1_2: add file5
  | |
  | o  5 add file6 on trunk post-v1_2
  | |
  o |  4 add file5 on v1_2
  |/
  | o  3 MERGE from v1_0: add file2
  | |
  | o  2 add file3, file4 on branch v1_1
  |/
  | o  1 add file2
  |/
  o  0 add file1 on trunk
  

convert to hg (#2: with merge detection)

  $ hg convert \
  >   --config convert.cvsps.mergefrom='"^MERGE from (\S+):"' \
  >   --datesort \
  >   proj proj.hg2
  initializing destination proj.hg2 repository
  connecting to $TESTTMP/cvsrepo
  scanning source...
  collecting CVS rlog
  15 log entries
  creating changesets
  8 changeset entries
  sorting...
  converting...
  7 add file1 on trunk
  6 add file2
  5 add file3, file4 on branch v1_1
  4 MERGE from v1_0: add file2
  3 add file5 on v1_2
  2 add file6 on trunk post-v1_2
  1 MERGE from v1_2: add file5
  0 MERGE from HEAD: add file6
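
the regular expression given to convert.cvsps.mergefrom, ^MERGE from (\S+):,
captures the source branch name from a commit message. A minimal sketch of
what the capture group picks out, assuming sed -E is available; merge_source
is a hypothetical helper, and [^[:space:]] stands in for \S, which POSIX
extended regular expressions lack. It illustrates the pattern only, not how
hg convert applies it.

```shell
# Hypothetical helper: print the branch name captured by the pattern
# ^MERGE from (\S+): or nothing if the message does not match.
merge_source() {
    printf '%s\n' "$1" | sed -n -E 's/^MERGE from ([^[:space:]]+):.*$/\1/p'
}

merge_source "MERGE from v1_0: add file2"    # prints v1_0
merge_source "MERGE from HEAD: add file6"    # prints HEAD
merge_source "add file5 on v1_2"             # prints nothing
```

Of the eight commit messages in this test, only the four that begin with
"MERGE from" match the pattern.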

hg glog output (#2)

  $ hg -R proj.hg2 glog --template "{rev} {desc}\n"
  o  7 MERGE from HEAD: add file6
  |
  | o  6 MERGE from v1_2: add file5
  | |
  | o  5 add file6 on trunk post-v1_2
  | |
  o |  4 add file5 on v1_2
  |/
  | o  3 MERGE from v1_0: add file2
  | |
  | o  2 add file3, file4 on branch v1_1
  |/
  | o  1 add file2
  |/
  o  0 add file1 on trunk