view tests/test-diff-newlines.t @ 50400:95acba2c29f6

encoding: avoid quadratic time complexity when json-encoding non-UTF8 strings Apparently the code uses "+=" with a bytes object, which is linear-time, so the whole encoding is quadratic-time. This patch makes us use a bytearray object, instead, which has a(n amortized-)constant-time append operation. The encoding is still not particularly fast, but at least a 10MB file takes tens of seconds, not many hours to encode.
author Arseniy Alekseyev <aalekseyev@janestreet.com>
date Mon, 06 Mar 2023 11:27:57 +0000
parents 55c6ebd11cb9
children
line wrap: on
line source

  $ hg init repo
  $ cd repo

  $ "$PYTHON" -c 'open("a", "wb").write(b"confuse str.splitlines\nembedded\rnewline\n")'
  $ hg ci -Ama -d '1 0'
  adding a

  $ echo clean diff >> a
  $ hg ci -mb -d '2 0'

  $ hg diff -r0 -r1
  diff -r 107ba6f817b5 -r 310ce7989cdc a
  --- a/a	Thu Jan 01 00:00:01 1970 +0000
  +++ b/a	Thu Jan 01 00:00:02 1970 +0000
  @@ -1,2 +1,3 @@
   confuse str.splitlines
   embedded\r (no-eol) (esc)
  newline
  +clean diff