cborutil: change buffering strategy
Profiling revealed that we were spending a lot of time on the
line that was concatenating the old buffer with the incoming data
when attempting to decode long byte strings, such as manifest
revisions.
Essentially, we were feeding N chunks of size len(X) << len(Y) into
decode() and continuously allocating a new, larger buffer to hold
the undecoded input. This created substantial memory churn and
slowed down execution.
Changing the code to aggregate pending chunks in a list until we
have enough data to fully decode the next atom makes things much
more efficient.
I don't have exact data, but I recall the old code spending >1s
on manifest fulltexts from the mozilla-unified repo. The new code
doesn't significantly appear in profile output.
Differential Revision: https://phab.mercurial-scm.org/D4854
# reproduce issue2264, issue2516
create test repo
$ cat <<EOF >> $HGRCPATH
> [extensions]
> transplant =
> EOF
$ hg init repo
$ cd repo
$ template="{rev} {desc|firstline} [{branch}]\n"
# we need to start out with two changesets on the default branch
# in order to avoid the cute little optimization where transplant
# pulls rather than transplants
add initial changesets
$ echo feature1 > file1
$ hg ci -Am"feature 1"
adding file1
$ echo feature2 >> file2
$ hg ci -Am"feature 2"
adding file2
# The changes to 'bugfix' are enough to show the bug: in fact, with only
# those changes, it's a very noisy crash ("RuntimeError: nothing
# committed after transplant"). But if we modify a second file in the
# transplanted changesets, the bug is much more subtle: transplant
# silently drops the second change to 'bugfix' on the floor, and we only
# see it when we run 'hg status' after transplanting. Subtle data loss
# bugs are worse than crashes, so reproduce the subtle case here.
commit bug fixes on bug fix branch
$ hg branch fixes
marked working directory as branch fixes
(branches are permanent and global, did you want a bookmark?)
$ echo fix1 > bugfix
$ echo fix1 >> file1
$ hg ci -Am"fix 1"
adding bugfix
$ echo fix2 > bugfix
$ echo fix2 >> file1
$ hg ci -Am"fix 2"
$ hg log -G --template="$template"
@ 3 fix 2 [fixes]
|
o 2 fix 1 [fixes]
|
o 1 feature 2 [default]
|
o 0 feature 1 [default]
transplant bug fixes onto release branch
$ hg update 0
1 files updated, 0 files merged, 2 files removed, 0 files unresolved
$ hg branch release
marked working directory as branch release
$ hg transplant 2 3
applying [0-9a-f]{12} (re)
[0-9a-f]{12} transplanted to [0-9a-f]{12} (re)
applying [0-9a-f]{12} (re)
[0-9a-f]{12} transplanted to [0-9a-f]{12} (re)
$ hg log -G --template="$template"
@ 5 fix 2 [release]
|
o 4 fix 1 [release]
|
| o 3 fix 2 [fixes]
| |
| o 2 fix 1 [fixes]
| |
| o 1 feature 2 [default]
|/
o 0 feature 1 [default]
$ hg status
$ hg status --rev 0:4
M file1
A bugfix
$ hg status --rev 4:5
M bugfix
M file1
now test that we fixed the bug for all scripts/extensions
$ cat > $TESTTMP/committwice.py <<__EOF__
> from mercurial import ui, hg, match, node
> from time import sleep
>
> def replacebyte(fn, b):
> f = open(fn, "rb+")
> f.seek(0, 0)
> f.write(b)
> f.close()
>
> def printfiles(repo, rev):
> repo.ui.status(b"revision %d files: [%s]\n"
> % (rev, b', '.join(b"'%s'" % f
> for f in repo[rev].files())))
>
> repo = hg.repository(ui.ui.load(), b'.')
> assert len(repo) == 6, \
> "initial: len(repo): %d, expected: 6" % len(repo)
>
> replacebyte(b"bugfix", b"u")
> sleep(2)
> try:
> repo.ui.status(b"PRE: len(repo): %d\n" % len(repo))
> wlock = repo.wlock()
> lock = repo.lock()
> replacebyte(b"file1", b"x")
> repo.commit(text=b"x", user=b"test", date=(0, 0))
> replacebyte(b"file1", b"y")
> repo.commit(text=b"y", user=b"test", date=(0, 0))
> repo.ui.status(b"POST: len(repo): %d\n" % len(repo))
> finally:
> lock.release()
> wlock.release()
> printfiles(repo, 6)
> printfiles(repo, 7)
> __EOF__
$ "$PYTHON" $TESTTMP/committwice.py
PRE: len(repo): 6
POST: len(repo): 8
revision 6 files: ['bugfix', 'file1']
revision 7 files: ['file1']
Do a size-preserving modification outside of that process
$ echo abcd > bugfix
$ hg status
M bugfix
$ hg log --template "{rev} {desc} {files}\n" -r5:
5 fix 2 bugfix file1
6 x bugfix file1
7 y file1
$ cd ..