bdiff: extend matches across popular lines
For very large diffs that have large numbers of identical lines (JSON
dumps) that also have large blocks of identical text, bdiff could become
confused about which block matches which because it can only match
very limited regions. The result is very large diffs for small sets of edits.
The earlier recursion rebalancing fix made this behavior more frequent because
it's now more prone to match block 1 to block 2. One frequent user of
large JSON files reported being unable to pass the resulting diffs
through their code review system.
Prior to this change, bdiff would calculate the length of a match at
(i, j) as 1 + length found at (i-1, j-1). With large number of popular
(ignored) lines, this often meant matches couldn't be extended
backwards at all and thus all matching regions were very small.
Disabling the popularity threshold is not an option because it brings
back quadratic behavior.
Instead, we extend a match backwards until we either found a previously
discovered match or we find a mismatching line. This thus successfully
bridges over any popular lines inside and before a matching region.
The larger regions then significant reduce the probability of confusion.
#!/bin/sh -eu
# This function exists to set up the DOCKER variable and verify that
# it's the binary we expect. It also verifies that the docker service
# is running on the system and we can talk to it.
function checkdocker() {
if which docker.io >> /dev/null 2>&1 ; then
DOCKER=docker.io
elif which docker >> /dev/null 2>&1 ; then
DOCKER=docker
else
echo "Error: docker must be installed"
exit 1
fi
$DOCKER -h 2> /dev/null | grep -q Jansens && { echo "Error: $DOCKER is the Docking System Tray - install docker.io instead"; exit 1; }
$DOCKER version | grep -Eq "^Client( version)?:" || { echo "Error: unexpected output from \"$DOCKER version\""; exit 1; }
$DOCKER version | grep -Eq "^Server( version)?:" || { echo "Error: could not get docker server version - check it is running and your permissions"; exit 1; }
}
# Construct a container and leave its name in $CONTAINER for future use.
function initcontainer() {
[ "$1" ] || { echo "Error: platform name must be specified"; exit 1; }
DFILE="$ROOTDIR/contrib/docker/$1"
[ -f "$DFILE" ] || { echo "Error: docker file $DFILE not found"; exit 1; }
CONTAINER="hg-dockerrpm-$1"
DBUILDUSER=build
(
cat $DFILE
if [ $(uname) = "Darwin" ] ; then
# The builder is using boot2docker on OS X, so we're going to
# *guess* the uid of the user inside the VM that is actually
# running docker. This is *very likely* to fail at some point.
echo RUN useradd $DBUILDUSER -u 1000
else
echo RUN groupadd $DBUILDUSER -g `id -g` -o
echo RUN useradd $DBUILDUSER -u `id -u` -g $DBUILDUSER -o
fi
) | $DOCKER build --tag $CONTAINER -
}