annotate tests/test-bdiff.py @ 29013:9a8363d23419 stable

bdiff: deal better with duplicate lines The longest_match code compares all the possible positions in two files to find the best match. Given a pair of sequences, it effectively searches a grid like this: a b b b c . d e . f 0 1 2 3 4 5 6 7 8 9 a 1 - - - - - - - - - b - 2 1 1 - - - - - - b - 1 3 2 - - - - - - b - 1 2 4 - - - - - - . - - - - - 1 - - 1 - Here, the 4 in the middle says "the first four lines of the file match", which it can compute be comparing the fourth lines and then adding one to the result found when comparing the third lines in the entry to the upper left. We generally avoid the quadratic worst case by only looking at lines that match, which is precomputed. We also avoid quadratic storage by only keeping a single column vector and then keeping track of the best match. Unfortunately, this can get us into trouble with the sequences above. Because we want to reuse the '3' value when calculating the '4', we need to be careful not to overwrite it with the '2' we calculate immediately before. If we scan left to right, top to bottom, we're going to have a problem: we'll overwrite our 3 before we use it and calculate a suboptimal best match. To address this, we can either keep two column vectors and swap between them (which significantly complicates bookkeeping), or change our scanning order. If we instead scan from left to right, bottom to top, we'll avoid ever overwriting values we'll need in the future. This unfortunately needs several changes to be made simultaneously: - change the order we build the initial hash chains for the b sequence - change the sentinel values from INT_MAX to -1 - change the visit order in the longest_match inner loop - add a tie-breaker preference for earlier matches This last is needed because we previously had an implicit tie-breaker from our visitation order that our test suite relies on. Later matches can also trigger a bug in the normalization code in diff().
author Matt Mackall <mpm@selenic.com>
date Thu, 21 Apr 2016 21:05:26 -0500
parents 4e51f9d1683e
children ede7bc45bf0a
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
28734
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
1 from __future__ import absolute_import, print_function
8656
284fda4cd093 removed unused imports
Martin Geisler <mg@lazybytes.net>
parents: 8449
diff changeset
2 import struct
28733
2e54aaa65afc py3: use absolute_import in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 15530
diff changeset
3 from mercurial import (
2e54aaa65afc py3: use absolute_import in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 15530
diff changeset
4 bdiff,
2e54aaa65afc py3: use absolute_import in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 15530
diff changeset
5 mpatch,
2e54aaa65afc py3: use absolute_import in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 15530
diff changeset
6 )
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
7
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
8 def test1(a, b):
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
9 d = bdiff.bdiff(a, b)
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
10 c = a
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
11 if d:
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
12 c = mpatch.patches(a, [d])
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
13 if c != b:
28734
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
14 print("***", repr(a), repr(b))
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
15 print("bad:")
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
16 print(repr(c)[:200])
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
17 print(repr(d))
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
18
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
19 def test(a, b):
28734
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
20 print("***", repr(a), repr(b))
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
21 test1(a, b)
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
22 test1(b, a)
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
23
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
24 test("a\nc\n\n\n\n", "a\nb\n\n\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
25 test("a\nb\nc\n", "a\nc\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
26 test("", "")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
27 test("a\nb\nc", "a\nb\nc")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
28 test("a\nb\nc\nd\n", "a\nd\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
29 test("a\nb\nc\nd\n", "a\nc\ne\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
30 test("a\nb\nc\n", "a\nc\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
31 test("a\n", "c\na\nb\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
32 test("a\n", "")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
33 test("a\n", "b\nc\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
34 test("a\n", "c\na\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
35 test("", "adjfkjdjksdhfksj")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
36 test("", "ab")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
37 test("", "abc")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
38 test("a", "a")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
39 test("ab", "ab")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
40 test("abc", "abc")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
41 test("a\n", "a\n")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
42 test("a\nb", "a\nb")
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
43
7104
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
44 #issue1295
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
45 def showdiff(a, b):
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
46 bin = bdiff.bdiff(a, b)
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
47 pos = 0
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
48 while pos < len(bin):
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
49 p1, p2, l = struct.unpack(">lll", bin[pos:pos + 12])
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
50 pos += 12
28734
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
51 print(p1, p2, repr(bin[pos:pos + l]))
7104
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
52 pos += l
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
53 showdiff("x\n\nx\n\nx\n\nx\n\nz\n", "x\n\nx\n\ny\n\nx\n\nx\n\nz\n")
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
54 showdiff("x\n\nx\n\nx\n\nx\n\nz\n", "x\n\nx\n\ny\n\nx\n\ny\n\nx\n\nz\n")
29013
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 28734
diff changeset
55 # we should pick up abbbc. rather than bc.de as the longest match
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 28734
diff changeset
56 showdiff("a\nb\nb\nb\nc\n.\nd\ne\n.\nf\n",
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 28734
diff changeset
57 "a\nb\nb\na\nb\nb\nb\nc\n.\nb\nc\n.\nd\ne\nf\n")
7104
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 814
diff changeset
58
28734
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
59 print("done")
15530
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
60
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
61 def testfixws(a, b, allws):
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
62 c = bdiff.fixws(a, allws)
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
63 if c != b:
28734
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
64 print("*** fixws", repr(a), repr(b), allws)
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
65 print("got:")
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
66 print(repr(c))
15530
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
67
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
68 testfixws(" \ta\r b\t\n", "ab\n", 1)
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
69 testfixws(" \ta\r b\t\n", " a b\n", 0)
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
70 testfixws("", "", 1)
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
71 testfixws("", "", 0)
eeac5e179243 mdiff: replace wscleanup() regexps with C loops
Patrick Mezard <pmezard@gmail.com>
parents: 12865
diff changeset
72
28734
4e51f9d1683e py3: use print_function in test-bdiff.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 28733
diff changeset
73 print("done")