perf: add threading capability to perfbdiff
Since we release the GIL during diffing, it is interesting to see how a
thread pool performs on that workload. We add a new `--threads` argument to
the command. Synchronizing the thread pool is a bit complex because we want to
be able to reuse it from one run to the next.
On my computer (i7 with 4 cores + hyperthreading), I get the following data for
about 12000 revisions:
threads  wall       comb   wall gain  comb overhead
none     31.596715  31.59  0.00%      0.00%
1        31.621228  31.62  -0.08%     0.09%
2        16.406202  32.8   48.08%     3.83%
3        11.598334  34.76  63.29%     10.03%
4        9.205421   36.77  70.87%     16.40%
5        8.517604   42.51  73.04%     34.57%
6        7.94645    47.58  74.85%     50.62%
7        7.434972   51.92  76.47%     64.36%
8        7.070638   55.34  77.62%     75.18%
Compared to running with the feature disabled (the "none" row, threads=0), the
overhead of the threading code with a single thread (threads=1) is negligible,
and the wall-time gain is already 48% with two threads.
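
The two percentage columns can be recomputed from the raw timings. The sketch
below assumes that "wall gain" is 1 - wall / baseline wall and that "comb
overhead" is comb / baseline comb - 1, both against the "none" row; these
formulas are inferred from the numbers in the table and are not stated
explicitly. The helper name `gain_overhead` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive "wall gain" and "comb overhead" from timings.
# Assumed formulas (inferred from the table, not stated in the message):
#   wall gain     = 1 - wall / baseline_wall
#   comb overhead = comb / baseline_comb - 1
gain_overhead() {
    # USAGE: gain_overhead base_wall base_comb wall comb
    awk -v bw="$1" -v bc="$2" -v w="$3" -v c="$4" 'BEGIN {
        printf "%.2f%% %.2f%%\n", (1 - w / bw) * 100, (c / bc - 1) * 100
    }'
}

# Row for 2 threads, measured against the "none" baseline.
gain_overhead 31.596715 31.59 16.406202 32.8
```

With the values from the 2-thread row, this reproduces the 48.08% gain and
3.83% overhead shown in the table.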
#!/bin/sh
# Script to get stable diff output on any platform.
#
# Output of this script is almost equivalent to GNU diff with "-Nru".
#
# Use this script as "hg pdiff" via extdiff extension with preparation
# below in test scripts:
#
# $ cat >> $HGRCPATH <<EOF
# > [extdiff]
# > pdiff = sh "$RUNTESTDIR/pdiff"
# > EOF
filediff(){
    # USAGE: filediff file1 file2 [header]

    # compare with /dev/null if file doesn't exist (as "-N" option)
    file1="$1"
    if test ! -f "$file1"; then
        file1=/dev/null
    fi
    file2="$2"
    if test ! -f "$file2"; then
        file2=/dev/null
    fi

    if cmp -s "$file1" "$file2" 2> /dev/null; then
        # Return immediately, because comparison isn't needed. This
        # also avoids a redundant message from diff like "No differences
        # encountered" (on Solaris)
        return
    fi

    if test -n "$3"; then
        # show header only in recursive case
        echo "$3"
    fi

    # replace "/dev/null" by the corresponding filename (as "-N" option)
    diff -u "$file1" "$file2" |
    sed "s@^--- /dev/null\(.*\)\$@--- $1\1@" |
    sed "s@^\+\+\+ /dev/null\(.*\)\$@+++ $2\1@"

    # in this case, files differ from each other
    return 1
}
if test -d "$1" -o -d "$2"; then
    # ensure comparison in dictionary order
    (
    if test -d "$1"; then (cd "$1" && find . -type f); fi
    if test -d "$2"; then (cd "$2" && find . -type f); fi
    ) |
    sed 's@^\./@@g' | sort | uniq |
    while read file; do
        filediff "$1/$file" "$2/$file" "diff -Nru $1/$file $2/$file"
    done

    # TODO: there is no portable way for the current while-read based
    # implementation to return 1 when changes are detected.
    #
    # On bash and dash, assigning to a variable inside the while-block
    # doesn't affect the outside, because the while-block is executed
    # in a sub-shell. BTW, it does affect the outside on ksh (as sh
    # on Solaris).
else
    filediff "$1" "$2"
fi
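
One possible way around the limitation described in the TODO, offered only as
a sketch and not as what this script does, is to run the loop in an explicit
subshell and use that subshell's exit status instead of a variable. The names
`differs` and `compare_listed` are hypothetical; `differs` stands in for
`filediff`, and the sketch assumes a shell that supports POSIX `read -r`.

```shell
#!/bin/sh
# Sketch: report "something differed" from a while-read loop through the
# exit status of an explicit subshell, rather than through a variable.

# Hypothetical stand-in for filediff: returns 1 when the two files differ
# (or when either of them cannot be read).
differs() {
    if cmp -s "$1" "$2"; then
        return 0
    fi
    return 1
}

# USAGE: compare_listed dir1 dir2 < list-of-relative-paths
compare_listed() {
    (
        status=0
        while IFS= read -r file; do
            differs "$1/$file" "$2/$file" || status=1
        done
        # The subshell's exit status becomes the status of this function,
        # whether or not the shell runs the last pipeline element in a
        # sub-shell of its own.
        exit $status
    )
}
```

When `compare_listed` is the last command of a pipeline, as the while-loop is
in the script, the pipeline's exit status is the function's status, so a
caller can test it directly.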