annotate tests/test-issue4074.t @ 29014:f1ca249696ed stable

bdiff: balance recursion to avoid quadratic behavior (issue4704) For highly structured files like JSON or XML dumps with large numbers of duplicate lines (eg braces) and isolated matching lines, bdiff could find large numbers of equally good spans. Because it prefers earlier matches, this would result in pathologically unbalance recursion that resulted in quadratic performance. This patch makes it prefer matches closer to the middle that tend to balance recursion. This change improves the speed of a pathological test case from 1100s to 9s. Included is a smaller test that has a roughly 50x safety margin on the performance it accepts. It's likely to fail on pure builds because difflib also has a recursion-balancing problem.
author Matt Mackall <mpm@selenic.com>
date Thu, 21 Apr 2016 22:04:11 -0500
parents
children 75be14993fda
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
29014
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
1 #require no-pure
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
2
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
3 A script to generate nasty diff worst-case scenarios:
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
4
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
5 $ cat > s.py <<EOF
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
6 > import random
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
7 > for x in xrange(100000):
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
8 > print
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
9 > if random.randint(0, 100) >= 50:
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
10 > x += 1
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
11 > print hex(x)
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
12 > EOF
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
13
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
14 $ hg init a
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
15 $ cd a
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
16
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
17 Check in a big file:
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
18
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
19 $ python ../s.py > a
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
20 $ hg ci -qAm0
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
21
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
22 Modify it:
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
23
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
24 $ python ../s.py > a
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
25
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
26 Time a check-in, should never take more than 10 seconds user time:
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
27
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
28 $ hg ci --time -m1
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
29 time: real .* secs .user [0-9][.].* sys .* (re)