encoding: fix trim() to be O(n) instead of O(n^2)
`encoding.trim()` iterated over every possible length smaller than the
input and created a slice for each one. It then calculated the column
width of each slice, which is itself O(n), so the overall algorithm
was O(n^2). This patch rewrites it to iterate over the Unicode
characters while keeping track of the column width so far. The old
algorithm also started from the end of the string, which made it much
worse when the input is large and the limit is small (such as the
typical 72 we pass to it).
You can time it by running something like this:
```
time python3 -c 'from mercurial.utils import stringutil; print(stringutil.ellipsis(b"0123456789" * 1000, 5))'
```
That drops from 4.05 s to 83 ms with this patch (and most of that is
of course startup time).
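For illustration only, here is a minimal sketch of the single-pass idea.
It is not the code in this patch: `char_width` and `trim_columns` are
hypothetical names, the sketch works on `str` rather than on bytes in the
local encoding, it only trims from the right, and it treats combining
characters as one column wide.

```python
import unicodedata


def char_width(ch):
    """Columns used by one character: 2 for East Asian wide/fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def trim_columns(text, width, ellipsis=""):
    """Trim `text` to at most `width` columns in one left-to-right pass.

    Assumes `ellipsis` is no wider than `width`.
    """
    # Columns left for the text itself once the ellipsis is accounted for.
    budget = width - sum(char_width(c) for c in ellipsis)
    used = 0     # column width of the characters scanned so far
    cut = 0      # length of the longest prefix that fits in `budget`
    fits = True  # False once a character has overflowed `budget`
    for i, ch in enumerate(text):
        w = char_width(ch)
        if fits and used + w <= budget:
            cut = i + 1
        else:
            fits = False
        used += w
        if used > width:
            # Too wide: stop scanning here, so a huge input with a small
            # limit is abandoned after a handful of characters.
            return text[:cut] + ellipsis
    return text  # the whole text fits, so no ellipsis is needed


print(trim_columns("0123456789" * 1000, 5, "..."))  # 01...
```

Because the loop returns as soon as the running width exceeds the limit,
the cost is bounded by the limit rather than by the input length when the
input is much larger than the limit.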
Differential Revision: https://phab.mercurial-scm.org/D12089
$ cat > writepatterns.py <<EOF
> import sys
>
> path = sys.argv[1]
> patterns = sys.argv[2:]
>
> fp = open(path, 'wb')
> for pattern in patterns:
>     count = int(pattern[0:-1])
>     char = pattern[-1].encode('utf8') + b'\n'
>     fp.write(char * count)
> fp.close()
> EOF
prepare repo
$ hg init a
$ cd a
These initial lines of Xs were not in the original file used to generate
the patch. So all the patch hunks need to be applied to a constant offset
within this file. If the offset isn't tracked then the hunks can be
applied to the wrong lines of this file.
$ "$PYTHON" ../writepatterns.py a 34X 10A 1B 10A 1C 10A 1B 10A 1D 10A 1B 10A 1E 10A 1B 10A
$ hg commit -Am adda
adding a
This is a cleaner patch generated via diff.
In this case it reproduces the problem, whereas
the output of hg export does not.
import patch
$ hg import -v -m 'b' -d '2 0' - <<EOF
> --- a/a 2009-12-08 19:26:17.000000000 -0800
> +++ b/a 2009-12-08 19:26:17.000000000 -0800
> @@ -9,7 +9,7 @@
>  A
>  A
>  B
> -A
> +a
>  A
>  A
>  A
> @@ -53,7 +53,7 @@
>  A
>  A
>  B
> -A
> +a
>  A
>  A
>  A
> @@ -75,7 +75,7 @@
>  A
>  A
>  B
> -A
> +a
>  A
>  A
>  A
> EOF
applying patch from stdin
patching file a
Hunk #1 succeeded at 43 (offset 34 lines).
Hunk #2 succeeded at 87 (offset 34 lines).
Hunk #3 succeeded at 109 (offset 34 lines).
committing files:
a
committing manifest
committing changelog
created 189885cecb41
compare imported changes against reference file
$ "$PYTHON" ../writepatterns.py aref 34X 10A 1B 1a 9A 1C 10A 1B 10A 1D 10A 1B 1a 9A 1E 10A 1B 1a 9A
$ diff aref a
$ cd ..
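The offset tracking exercised by this test can be sketched in a few lines
of Python. This is a simplified model, not Mercurial's patching code:
`expand`, `find_hunk` and `apply_hunks` are hypothetical helpers, the
nearest-match search is an assumption about how a patcher looks for a
hunk, and the model only handles hunks that keep the line count unchanged,
as the three hunks here do.

```python
def expand(spec):
    """Expand a spec such as "34X 10A 1B" into a list of one-character lines."""
    lines = []
    for pattern in spec.split():
        lines.extend(pattern[-1] * int(pattern[:-1]))
    return lines


def find_hunk(lines, old, expected):
    """Return the 0-based index nearest to `expected` where `old` matches."""
    m = len(old)
    last = len(lines) - m
    for dist in range(len(lines) + 1):
        for start in (expected - dist, expected + dist):
            if 0 <= start <= last and lines[start:start + m] == old:
                return start
    return None


def apply_hunks(lines, hunks, track_offset=True):
    """Apply (start, old, new) hunks in order; `start` is a 1-based line number."""
    lines = list(lines)
    offset = 0
    applied_at = []
    for start, old, new in hunks:
        # With tracking, the next search begins where the previous hunk's
        # displacement suggests, not at the line number in the hunk header.
        expected = start - 1 + (offset if track_offset else 0)
        pos = find_hunk(lines, old, expected)
        if pos is None:
            raise ValueError("hunk at line %d does not apply" % start)
        lines[pos:pos + len(old)] = new
        if track_offset:
            offset = pos - (start - 1)
        applied_at.append(pos + 1)
    return lines, applied_at


base = expand("34X 10A 1B 10A 1C 10A 1B 10A 1D 10A 1B 10A 1E 10A 1B 10A")
ref = expand("34X 10A 1B 1a 9A 1C 10A 1B 10A 1D 10A 1B 1a 9A 1E 10A 1B 1a 9A")
hunks = [(n, list("AABAAAA"), list("AABaAAA")) for n in (9, 53, 75)]

tracked, tracked_at = apply_hunks(base, hunks, track_offset=True)
naive, naive_at = apply_hunks(base, hunks, track_offset=False)
print(tracked_at, tracked == ref)  # [43, 87, 109] True
print(naive_at, naive == ref)      # [43, 65, 87] False
```

In this model the untracked run places the second hunk on the unpatched
"B" block at line 65, because that block is closer to line 53 than the
intended one at line 87.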