annotate mercurial/bdiff.c @ 32697:19b9fc40cc51

revlog: skeleton support for version 2 revlogs There are a number of improvements we want to make to revlogs that will require a new version - version 2. It is unclear what the full set of improvements will be or when we'll be done with them. What I do know is that the process will likely take longer than a single release, will require input from various stakeholders to evaluate changes, and will have many contentious debates and bikeshedding. It is unrealistic to develop revlog version 2 up front: there are just too many uncertainties that we won't know until things are implemented and experiments are run. Some changes will also be invasive and prone to bit rot, so sitting on dozens of patches is not practical. This commit introduces skeleton support for version 2 revlogs in a way that is flexible and not bound by backwards compatibility concerns. An experimental repo requirement for denoting revlog v2 has been added. The requirement string has a sub-version component to it. This will allow us to declare multiple requirements in the course of developing revlog v2. Whenever we change the in-development revlog v2 format, we can tweak the string, creating a new requirement and locking out old clients. This will allow us to make as many backwards incompatible changes and experiments to revlog v2 as we want. In other words, we can land code and make meaningful progress towards revlog v2 while still maintaining extreme format flexibility up until the point we freeze the format and remove the experimental labels. To enable the new repo requirement, you must supply an experimental and undocumented config option. But not just any boolean flag will do: you need to explicitly use a value that no sane person should ever type. This is an additional guard against enabling revlog v2 on an installation it shouldn't be enabled on. The specific scenario I'm trying to prevent is say a user with a 4.4 client with a frozen format enabling the option but then downgrading to 4.3 and accidentally creating repos with an outdated and unsupported repo format. Requiring a "challenge" string should prevent this. Because the format is not yet finalized and I don't want to take any chances, revlog v2's version is currently 0xDEAD. I figure squatting on a value we're likely never to use as an actual revlog version to mean "internal testing only" is acceptable. And "dead" is easily recognized as something meaningful. There is a bunch of cleanup that is needed before work on revlog v2 begins in earnest. I plan on doing that work once this patch is accepted and we're comfortable with the idea of starting down this path.
author Gregory Szorc <gregory.szorc@gmail.com>
date Fri, 19 May 2017 20:29:11 -0700
parents d195fa651b51
children ff4c9c6263de
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
1 /*
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
2 bdiff.c - efficient binary diff extension for Mercurial
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
3
2859
345bac2bc4ec update copyrights.
Vadim Gelfer <vadim.gelfer@gmail.com>
parents: 2600
diff changeset
4 Copyright 2005, 2006 Matt Mackall <mpm@selenic.com>
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
5
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
6 This software may be used and distributed according to the terms of
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
7 the GNU General Public License, incorporated herein by reference.
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
8
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
9 Based roughly on Python difflib
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
10 */
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
11
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
12 #include <stdlib.h>
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
13 #include <string.h>
5341
458acf92b49e bdiff: use INT_MAX to avoid some inner loop comparisons
Matt Mackall <mpm@selenic.com>
parents: 5340
diff changeset
14 #include <limits.h>
867
0cd2ee61b10a Allow Mercurial to build on HP-UX 11
tksoh@users.sourceforge.net
parents: 839
diff changeset
15
29539
666832b9e154 bdiff: use ssize_t in favor of Py_ssize_t in cpython-unaware locations
Maciej Fijalkowski <fijall@gmail.com>
parents: 29444
diff changeset
16 #include "compat.h"
29444
284d742e5611 internals: move the bitmanipulation routines into its own file
Maciej Fijalkowski <fijall@gmail.com>
parents: 29323
diff changeset
17 #include "bitmanipulation.h"
29541
9631ff5ebbeb bdiff: split bdiff into cpy-aware and cpy-agnostic part
Maciej Fijalkowski <fijall@gmail.com>
parents: 29540
diff changeset
18 #include "bdiff.h"
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
19
30318
e1d6aa0e4c3a bdiff: replace hash algorithm
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30308
diff changeset
20 /* Hash implementation from diffutils */
e1d6aa0e4c3a bdiff: replace hash algorithm
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30308
diff changeset
21 #define ROL(v, n) ((v) << (n) | (v) >> (sizeof(v) * CHAR_BIT - (n)))
e1d6aa0e4c3a bdiff: replace hash algorithm
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30308
diff changeset
22 #define HASH(h, c) ((c) + ROL(h ,7))
e1d6aa0e4c3a bdiff: replace hash algorithm
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30308
diff changeset
23
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
24 struct pos {
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
25 int pos, len;
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
26 };
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
27
29541
9631ff5ebbeb bdiff: split bdiff into cpy-aware and cpy-agnostic part
Maciej Fijalkowski <fijall@gmail.com>
parents: 29540
diff changeset
28 int bdiff_splitlines(const char *a, ssize_t len, struct bdiff_line **lr)
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
29 {
13732
afe9269dccec bdiff.c: rename all variables which hold a hash value to "hash"
Markus F.X.J. Oberhumer <markus@oberhumer.com>
parents: 13731
diff changeset
30 unsigned hash;
13731
5d0cdf4ec338 bdiff.c: use unsigned arithmetic for hash computation
Markus F.X.J. Oberhumer <markus@oberhumer.com>
parents: 13730
diff changeset
31 int i;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
32 const char *p, *b = a;
5340
5737845fd974 bdiff: simple splitlines optimization
Christoph Spiel <cspiel@freenet.de>
parents: 5339
diff changeset
33 const char * const plast = a + len - 1;
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
34 struct bdiff_line *l;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
35
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
36 /* count the lines */
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
37 i = 1; /* extra line for sentinel */
30308
d500ddae7494 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 29541
diff changeset
38 for (p = a; p < plast; p++)
d500ddae7494 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 29541
diff changeset
39 if (*p == '\n')
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
40 i++;
30308
d500ddae7494 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 29541
diff changeset
41 if (p == plast)
d500ddae7494 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 29541
diff changeset
42 i++;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
43
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
44 *lr = l = (struct bdiff_line *)malloc(sizeof(struct bdiff_line) * i);
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
45 if (!l)
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
46 return -1;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
47
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
48 /* build the line array and calculate hashes */
13732
afe9269dccec bdiff.c: rename all variables which hold a hash value to "hash"
Markus F.X.J. Oberhumer <markus@oberhumer.com>
parents: 13731
diff changeset
49 hash = 0;
30461
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
50 for (p = a; p < plast; p++) {
30318
e1d6aa0e4c3a bdiff: replace hash algorithm
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30308
diff changeset
51 hash = HASH(hash, *p);
5342
d0c48891dd4a bdiff: switch to lyhash
Matt Mackall <mpm@selenic.com>
parents: 5341
diff changeset
52
30461
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
53 if (*p == '\n') {
13732
afe9269dccec bdiff.c: rename all variables which hold a hash value to "hash"
Markus F.X.J. Oberhumer <markus@oberhumer.com>
parents: 13731
diff changeset
54 l->hash = hash;
afe9269dccec bdiff.c: rename all variables which hold a hash value to "hash"
Markus F.X.J. Oberhumer <markus@oberhumer.com>
parents: 13731
diff changeset
55 hash = 0;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
56 l->len = p - b + 1;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
57 l->l = b;
5341
458acf92b49e bdiff: use INT_MAX to avoid some inner loop comparisons
Matt Mackall <mpm@selenic.com>
parents: 5340
diff changeset
58 l->n = INT_MAX;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
59 l++;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
60 b = p + 1;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
61 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
62 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
63
30461
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
64 if (p == plast) {
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
65 hash = HASH(hash, *p);
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
66 l->hash = hash;
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
67 l->len = p - b + 1;
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
68 l->l = b;
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
69 l->n = INT_MAX;
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
70 l++;
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
71 }
d195fa651b51 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com>
parents: 30433
diff changeset
72
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
73 /* set up a sentinel */
13732
afe9269dccec bdiff.c: rename all variables which hold a hash value to "hash"
Markus F.X.J. Oberhumer <markus@oberhumer.com>
parents: 13731
diff changeset
74 l->hash = 0;
13731
5d0cdf4ec338 bdiff.c: use unsigned arithmetic for hash computation
Markus F.X.J. Oberhumer <markus@oberhumer.com>
parents: 13730
diff changeset
75 l->len = 0;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
76 l->l = a + len;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
77 return i - 1;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
78 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
79
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
80 static inline int cmp(struct bdiff_line *a, struct bdiff_line *b)
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
81 {
13732
afe9269dccec bdiff.c: rename all variables which hold a hash value to "hash"
Markus F.X.J. Oberhumer <markus@oberhumer.com>
parents: 13731
diff changeset
82 return a->hash != b->hash || a->len != b->len || memcmp(a->l, b->l, a->len);
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
83 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
84
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
85 static int equatelines(struct bdiff_line *a, int an, struct bdiff_line *b,
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
86 int bn)
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
87 {
5452
82b4ff3abbcd bdiff: tweaks for large files
Matt Mackall <mpm@selenic.com>
parents: 5342
diff changeset
88 int i, j, buckets = 1, t, scale;
82b4ff3abbcd bdiff: tweaks for large files
Matt Mackall <mpm@selenic.com>
parents: 5342
diff changeset
89 struct pos *h = NULL;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
90
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
91 /* build a hash table of the next highest power of 2 */
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
92 while (buckets < bn + 1)
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
93 buckets *= 2;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
94
5339
058e93c3d07d I have spotted the biggest bottleneck in "bdiff.c". Actually it was
Christoph Spiel <cspiel@freenet.de>
parents: 4134
diff changeset
95 /* try to allocate a large hash table to avoid collisions */
5452
82b4ff3abbcd bdiff: tweaks for large files
Matt Mackall <mpm@selenic.com>
parents: 5342
diff changeset
96 for (scale = 4; scale; scale /= 2) {
5339
058e93c3d07d I have spotted the biggest bottleneck in "bdiff.c". Actually it was
Christoph Spiel <cspiel@freenet.de>
parents: 4134
diff changeset
97 h = (struct pos *)malloc(scale * buckets * sizeof(struct pos));
5452
82b4ff3abbcd bdiff: tweaks for large files
Matt Mackall <mpm@selenic.com>
parents: 5342
diff changeset
98 if (h)
82b4ff3abbcd bdiff: tweaks for large files
Matt Mackall <mpm@selenic.com>
parents: 5342
diff changeset
99 break;
82b4ff3abbcd bdiff: tweaks for large files
Matt Mackall <mpm@selenic.com>
parents: 5342
diff changeset
100 }
5339
058e93c3d07d I have spotted the biggest bottleneck in "bdiff.c". Actually it was
Christoph Spiel <cspiel@freenet.de>
parents: 4134
diff changeset
101
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
102 if (!h)
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
103 return 0;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
104
5339
058e93c3d07d I have spotted the biggest bottleneck in "bdiff.c". Actually it was
Christoph Spiel <cspiel@freenet.de>
parents: 4134
diff changeset
105 buckets = buckets * scale - 1;
058e93c3d07d I have spotted the biggest bottleneck in "bdiff.c". Actually it was
Christoph Spiel <cspiel@freenet.de>
parents: 4134
diff changeset
106
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
107 /* clear the hash table */
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
108 for (i = 0; i <= buckets; i++) {
29013
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 29012
diff changeset
109 h[i].pos = -1;
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
110 h[i].len = 0;
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
111 }
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
112
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
113 /* add lines to the hash table chains */
29013
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 29012
diff changeset
114 for (i = 0; i < bn; i++) {
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
115 /* find the equivalence class */
29013
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 29012
diff changeset
116 for (j = b[i].hash & buckets; h[j].pos != -1;
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
117 j = (j + 1) & buckets)
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
118 if (!cmp(b + i, b + h[j].pos))
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
119 break;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
120
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
121 /* add to the head of the equivalence class */
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
122 b[i].n = h[j].pos;
433
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
123 b[i].e = j;
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
124 h[j].pos = i;
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
125 h[j].len++; /* keep track of popularity */
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
126 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
127
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
128 /* compute popularity threshold */
9534
8e202431d620 bdiff: gradually enable the popularity hack
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 8858
diff changeset
129 t = (bn >= 31000) ? bn / 1000 : 1000000 / (bn + 1);
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
130
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
131 /* match items in a to their equivalence class in b */
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
132 for (i = 0; i < an; i++) {
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
133 /* find the equivalence class */
29013
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 29012
diff changeset
134 for (j = a[i].hash & buckets; h[j].pos != -1;
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
135 j = (j + 1) & buckets)
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
136 if (!cmp(a + i, b + h[j].pos))
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
137 break;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
138
433
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
139 a[i].e = j; /* use equivalence class for quick compare */
1542
8e80eefb3de6 made C src formatting more consistent
twaldmann@thinkmo.de
parents: 1397
diff changeset
140 if (h[j].len <= t)
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
141 a[i].n = h[j].pos; /* point to head of match list */
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
142 else
29013
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 29012
diff changeset
143 a[i].n = -1; /* too popular */
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
144 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
145
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
146 /* discard hash tables */
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
147 free(h);
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
148 return 1;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
149 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
150
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
151 static int longest_match(struct bdiff_line *a, struct bdiff_line *b,
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
152 struct pos *pos,
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
153 int a1, int a2, int b1, int b2, int *omi, int *omj)
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
154 {
30431
8c0c75aa3ff4 bdiff: give slight preference to longest matches in the middle of the B side
Mads Kiilerich <madski@unity3d.com>
parents: 30430
diff changeset
155 int mi = a1, mj = b1, mk = 0, i, j, k, half, bhalf;
29015
87d4a6c5567e bdiff: further restrain potential quadratic performance
Matt Mackall <mpm@selenic.com>
parents: 29014
diff changeset
156
87d4a6c5567e bdiff: further restrain potential quadratic performance
Matt Mackall <mpm@selenic.com>
parents: 29014
diff changeset
157 /* window our search on large regions to better bound
87d4a6c5567e bdiff: further restrain potential quadratic performance
Matt Mackall <mpm@selenic.com>
parents: 29014
diff changeset
158 worst-case performance. by choosing a window at the end, we
87d4a6c5567e bdiff: further restrain potential quadratic performance
Matt Mackall <mpm@selenic.com>
parents: 29014
diff changeset
159 reduce skipping overhead on the b chains. */
87d4a6c5567e bdiff: further restrain potential quadratic performance
Matt Mackall <mpm@selenic.com>
parents: 29014
diff changeset
160 if (a2 - a1 > 30000)
87d4a6c5567e bdiff: further restrain potential quadratic performance
Matt Mackall <mpm@selenic.com>
parents: 29014
diff changeset
161 a1 = a2 - 30000;
87d4a6c5567e bdiff: further restrain potential quadratic performance
Matt Mackall <mpm@selenic.com>
parents: 29014
diff changeset
162
30429
38ed54888617 bdiff: adjust criteria for getting optimal longest match in the A side middle
Mads Kiilerich <madski@unity3d.com>
parents: 30318
diff changeset
163 half = (a1 + a2 - 1) / 2;
30431
8c0c75aa3ff4 bdiff: give slight preference to longest matches in the middle of the B side
Mads Kiilerich <madski@unity3d.com>
parents: 30430
diff changeset
164 bhalf = (b1 + b2 - 1) / 2;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
165
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
166 for (i = a1; i < a2; i++) {
29013
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 29012
diff changeset
167 /* skip all lines in b after the current block */
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 29012
diff changeset
168 for (j = a[i].n; j >= b2; j = b[j].n)
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
169 ;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
170
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
171 /* loop through all lines match a[i] in b */
29013
9a8363d23419 bdiff: deal better with duplicate lines
Matt Mackall <mpm@selenic.com>
parents: 29012
diff changeset
172 for (; j >= b1; j = b[j].n) {
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
173 /* does this extend an earlier match? */
29322
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
174 for (k = 1; j - k >= b1 && i - k >= a1; k++) {
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
175 /* reached an earlier match? */
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
176 if (pos[j - k].pos == i - k) {
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
177 k += pos[j - k].len;
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
178 break;
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
179 }
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
180 /* previous line mismatch? */
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
181 if (a[i - k].e != b[j - k].e)
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
182 break;
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
183 }
66dbdd3cc2b9 bdiff: extend matches across popular lines
Matt Mackall <mpm@selenic.com>
parents: 29015
diff changeset
184
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
185 pos[j].pos = i;
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
186 pos[j].len = k;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
187
29014
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents: 29013
diff changeset
188 /* best match so far? we prefer matches closer
f1ca249696ed bdiff: balance recursion to avoid quadratic behavior (issue4704)
Matt Mackall <mpm@selenic.com>
parents: 29013
diff changeset
189 to the middle to balance recursion */
30430
5c4e2636c1a9 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com>
parents: 30429
diff changeset
190 if (k > mk) {
5c4e2636c1a9 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com>
parents: 30429
diff changeset
191 /* a longer match */
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
192 mi = i;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
193 mj = j;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
194 mk = k;
30430
5c4e2636c1a9 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com>
parents: 30429
diff changeset
195 } else if (k == mk) {
30433
96f2f50d923f bdiff: give slight preference to removing trailing lines
Mads Kiilerich <madski@unity3d.com>
parents: 30432
diff changeset
196 if (i > mi && i <= half && j > b1) {
30430
5c4e2636c1a9 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com>
parents: 30429
diff changeset
197 /* same match but closer to half */
5c4e2636c1a9 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com>
parents: 30429
diff changeset
198 mi = i;
5c4e2636c1a9 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com>
parents: 30429
diff changeset
199 mj = j;
30432
3633403888ae bdiff: give slight preference to appending lines
Mads Kiilerich <madski@unity3d.com>
parents: 30431
diff changeset
200 } else if (i == mi && (mj > bhalf || i == a1)) {
30431
8c0c75aa3ff4 bdiff: give slight preference to longest matches in the middle of the B side
Mads Kiilerich <madski@unity3d.com>
parents: 30430
diff changeset
201 /* same i but best earlier j */
30430
5c4e2636c1a9 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com>
parents: 30429
diff changeset
202 mj = j;
5c4e2636c1a9 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com>
parents: 30429
diff changeset
203 }
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
204 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
205 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
206 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
207
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
208 if (mk) {
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
209 mi = mi - mk + 1;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
210 mj = mj - mk + 1;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
211 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
212
29323
d29cb5e735e9 bdiff: remove effectively dead code
Matt Mackall <mpm@selenic.com>
parents: 29322
diff changeset
213 /* expand match to include subsequent popular lines */
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
214 while (mi + mk < a2 && mj + mk < b2 &&
433
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
215 a[mi + mk].e == b[mj + mk].e)
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
216 mk++;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
217
29323
d29cb5e735e9 bdiff: remove effectively dead code
Matt Mackall <mpm@selenic.com>
parents: 29322
diff changeset
218 *omi = mi;
d29cb5e735e9 bdiff: remove effectively dead code
Matt Mackall <mpm@selenic.com>
parents: 29322
diff changeset
219 *omj = mj;
5341
458acf92b49e bdiff: use INT_MAX to avoid some inner loop comparisons
Matt Mackall <mpm@selenic.com>
parents: 5340
diff changeset
220
29323
d29cb5e735e9 bdiff: remove effectively dead code
Matt Mackall <mpm@selenic.com>
parents: 29322
diff changeset
221 return mk;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
222 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
223
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
224 static struct bdiff_hunk *recurse(struct bdiff_line *a, struct bdiff_line *b,
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
225 struct pos *pos,
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
226 int a1, int a2, int b1, int b2, struct bdiff_hunk *l)
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
227 {
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
228 int i, j, k;
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
229
10500
e96597c8d0ea bdiff: do not use recursion / avoid stackoverflow (issue1940)
Alistair Bell <alistair@bellsonline.com>
parents: 10282
diff changeset
230 while (1) {
e96597c8d0ea bdiff: do not use recursion / avoid stackoverflow (issue1940)
Alistair Bell <alistair@bellsonline.com>
parents: 10282
diff changeset
231 /* find the longest match in this chunk */
e96597c8d0ea bdiff: do not use recursion / avoid stackoverflow (issue1940)
Alistair Bell <alistair@bellsonline.com>
parents: 10282
diff changeset
232 k = longest_match(a, b, pos, a1, a2, b1, b2, &i, &j);
e96597c8d0ea bdiff: do not use recursion / avoid stackoverflow (issue1940)
Alistair Bell <alistair@bellsonline.com>
parents: 10282
diff changeset
233 if (!k)
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
234 return l;
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
235
10500
e96597c8d0ea bdiff: do not use recursion / avoid stackoverflow (issue1940)
Alistair Bell <alistair@bellsonline.com>
parents: 10282
diff changeset
236 /* and recurse on the remaining chunks on either side */
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
237 l = recurse(a, b, pos, a1, i, b1, j, l);
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
238 if (!l)
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
239 return NULL;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
240
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
241 l->next = (struct bdiff_hunk *)malloc(sizeof(struct bdiff_hunk));
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
242 if (!l->next)
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
243 return NULL;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
244
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
245 l = l->next;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
246 l->a1 = i;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
247 l->a2 = i + k;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
248 l->b1 = j;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
249 l->b2 = j + k;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
250 l->next = NULL;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
251
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
252 /* tail-recursion didn't happen, so do equivalent iteration */
10500
e96597c8d0ea bdiff: do not use recursion / avoid stackoverflow (issue1940)
Alistair Bell <alistair@bellsonline.com>
parents: 10282
diff changeset
253 a1 = i + k;
e96597c8d0ea bdiff: do not use recursion / avoid stackoverflow (issue1940)
Alistair Bell <alistair@bellsonline.com>
parents: 10282
diff changeset
254 b1 = j + k;
e96597c8d0ea bdiff: do not use recursion / avoid stackoverflow (issue1940)
Alistair Bell <alistair@bellsonline.com>
parents: 10282
diff changeset
255 }
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
256 }
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
257
29541
9631ff5ebbeb bdiff: split bdiff into cpy-aware and cpy-agnostic part
Maciej Fijalkowski <fijall@gmail.com>
parents: 29540
diff changeset
258 int bdiff_diff(struct bdiff_line *a, int an, struct bdiff_line *b,
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
259 int bn, struct bdiff_hunk *base)
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
260 {
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
261 struct bdiff_hunk *curr;
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
262 struct pos *pos;
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
263 int t, count = 0;
433
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
264
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
265 /* allocate and fill arrays */
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
266 t = equatelines(a, an, b, bn);
5571
f84bb2e1cc3a fix calloc(0, ...) issue
Jim Hague <jim.hague@acm.org>
parents: 5452
diff changeset
267 pos = (struct pos *)calloc(bn ? bn : 1, sizeof(struct pos));
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
268
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
269 if (pos && t) {
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
270 /* generate the matching block list */
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
271
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
272 curr = recurse(a, b, pos, 0, an, 0, bn, base);
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
273 if (!curr)
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
274 return -1;
433
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
275
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
276 /* sentinel end hunk */
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
277 curr->next = (struct bdiff_hunk *)malloc(sizeof(struct bdiff_hunk));
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
278 if (!curr->next)
13090
c73745762f33 bdiff: Fix bogus NULL return
Matt Mackall <mpm@selenic.com>
parents: 13089
diff changeset
279 return -1;
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
280 curr = curr->next;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
281 curr->a1 = curr->a2 = an;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
282 curr->b1 = curr->b2 = bn;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
283 curr->next = NULL;
433
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
284 }
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
285
474
b2ae8283d1a6 Minor speed improvements for bdiff
mpm@selenic.com
parents: 472
diff changeset
286 free(pos);
7104
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 7036
diff changeset
287
7625
930a2be7e875 bdiff: add comment about normalization
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 7189
diff changeset
288 /* normalize the hunk list, try to push each hunk towards the end */
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
289 for (curr = base->next; curr; curr = curr->next) {
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
290 struct bdiff_hunk *next = curr->next;
7104
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 7036
diff changeset
291
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
292 if (!next)
7104
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 7036
diff changeset
293 break;
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 7036
diff changeset
294
29010
e868d8ee7c8f bdiff: unify duplicate normalize loops
Matt Mackall <mpm@selenic.com>
parents: 19962
diff changeset
295 if (curr->a2 == next->a1 || curr->b2 == next->b1)
29011
8bcda4c76820 bdiff: fold in shift calculation in normalize
Matt Mackall <mpm@selenic.com>
parents: 29010
diff changeset
296 while (curr->a2 < an && curr->b2 < bn
29012
4bd67ae7d75a bdiff: fix latent normalization bug
Matt Mackall <mpm@selenic.com>
parents: 29011
diff changeset
297 && next->a1 < next->a2
4bd67ae7d75a bdiff: fix latent normalization bug
Matt Mackall <mpm@selenic.com>
parents: 29011
diff changeset
298 && next->b1 < next->b2
29011
8bcda4c76820 bdiff: fold in shift calculation in normalize
Matt Mackall <mpm@selenic.com>
parents: 29010
diff changeset
299 && !cmp(a + curr->a2, b + curr->b2)) {
8bcda4c76820 bdiff: fold in shift calculation in normalize
Matt Mackall <mpm@selenic.com>
parents: 29010
diff changeset
300 curr->a2++;
8bcda4c76820 bdiff: fold in shift calculation in normalize
Matt Mackall <mpm@selenic.com>
parents: 29010
diff changeset
301 next->a1++;
8bcda4c76820 bdiff: fold in shift calculation in normalize
Matt Mackall <mpm@selenic.com>
parents: 29010
diff changeset
302 curr->b2++;
8bcda4c76820 bdiff: fold in shift calculation in normalize
Matt Mackall <mpm@selenic.com>
parents: 29010
diff changeset
303 next->b1++;
8bcda4c76820 bdiff: fold in shift calculation in normalize
Matt Mackall <mpm@selenic.com>
parents: 29010
diff changeset
304 }
7104
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 7036
diff changeset
305 }
9514cbb6e4f6 bdiff: normalize the diff (issue1295)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 7036
diff changeset
306
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
307 for (curr = base->next; curr; curr = curr->next)
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
308 count++;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
309 return count;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
310 }
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
311
29541
9631ff5ebbeb bdiff: split bdiff into cpy-aware and cpy-agnostic part
Maciej Fijalkowski <fijall@gmail.com>
parents: 29540
diff changeset
312 void bdiff_freehunks(struct bdiff_hunk *l)
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
313 {
29540
4ce1fc91e30a bdiff: rename functions and structs to be amenable for later exporting
Maciej Fijalkowski <fijall@gmail.com>
parents: 29539
diff changeset
314 struct bdiff_hunk *n;
13089
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
315 for (; l; l = n) {
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
316 n = l->next;
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
317 free(l);
faee0ffbc24b bdiff: dynamically allocate hunks
Matt Mackall <mpm@selenic.com>
parents: 11364
diff changeset
318 }
433
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
319 }
79c694462294 Add bdiff.blocks / minor performance tweaks
mpm@selenic.com
parents: 411
diff changeset
320
400
8b067bde6679 Add a fast binary diff extension (not yet used)
mpm@selenic.com
parents:
diff changeset
321