comparison mercurial/bdiff.c @ 29015:87d4a6c5567e stable

bdiff: further restrain potential quadratic performance This causes the longest_match search to limit itself to a window of 30000 lines during search (roughly 1MB), thus avoiding a full O(N*M) search that might occur in repetitive structured inputs. For a particular class of many MB pathological test cases, this generated the following timings: size before after 10x 1.25s 1.24s 100x 57s 33s 1000x >8400s 400s The times on the right quickly become much faster and appear more linear. While windowing means deltas are no longer "optimal", the resulting deltas were within a couple percent of expected size. While we've yet to have a report of a file with the level of repetition necessary to hit this case, some JSON/XML database dump scenario is fairly likely to hit it. This may also slightly improve the average-case performance for deltas of large binaries.
author Matt Mackall <mpm@selenic.com>
date Fri, 22 Apr 2016 13:38:02 -0500
parents f1ca249696ed
children 66dbdd3cc2b9
comparison
equal deleted inserted replaced
29014:f1ca249696ed 29015:87d4a6c5567e
146 } 146 }
147 147
148 static int longest_match(struct line *a, struct line *b, struct pos *pos, 148 static int longest_match(struct line *a, struct line *b, struct pos *pos,
149 int a1, int a2, int b1, int b2, int *omi, int *omj) 149 int a1, int a2, int b1, int b2, int *omi, int *omj)
150 { 150 {
151 int mi = a1, mj = b1, mk = 0, mb = 0, i, j, k, half = (a1 + a2) / 2; 151 int mi = a1, mj = b1, mk = 0, mb = 0, i, j, k, half;
152
153 /* window our search on large regions to better bound
154 worst-case performance. by choosing a window at the end, we
155 reduce skipping overhead on the b chains. */
156 if (a2 - a1 > 30000)
157 a1 = a2 - 30000;
158
159 half = (a1 + a2) / 2;
152 160
153 for (i = a1; i < a2; i++) { 161 for (i = a1; i < a2; i++) {
154 /* skip all lines in b after the current block */ 162 /* skip all lines in b after the current block */
155 for (j = a[i].n; j >= b2; j = b[j].n) 163 for (j = a[i].n; j >= b2; j = b[j].n)
156 ; 164 ;