view tests/svn/svndump-branches.sh @ 37711:65a23cc8e75b
cborutil: implement support for streaming encoding, bytestring decoding
The vendored cbor2 package is... a bit disappointing.
On the encoding side, it insists that you pass it something with
a write() to send data to. That means if you want to emit data to
a generator, you have to construct an e.g. io.BytesIO(), write()
to it, then get the data back out. There can be non-trivial overhead
involved.
The encoder also doesn't support indefinite types - bytestrings, arrays,
and maps that don't have a known length. Again, this is really
unfortunate because it requires you to buffer the entire source and
destination in memory to encode large things.
On the decoding side, it supports reading indefinite length types.
But it buffers them completely before returning. More sadness.
This commit implements "streaming" encoders for various CBOR types.
Encoding emits a generator of hunks. So you can efficiently stream
encoded data elsewhere.
It also implements support for emitting indefinite length bytestrings,
arrays, and maps.
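
As a rough illustration of the idea, and not the code from this commit, a generator-based encoder for bytestrings might look like the sketch below. The function names (`encode_head`, `stream_encode_bytestring`, `stream_encode_indefinite_bytestring`) are hypothetical. The byte layout follows the CBOR spec: major type 2 for bytestrings, additional info 31 for indefinite length, and 0xff as the "break" terminator.

```python
import struct

MAJOR_TYPE_BYTESTRING = 2
BREAK = b"\xff"


def encode_head(major_type, value):
    """Return the CBOR head bytes for a major type and length/value."""
    high = major_type << 5
    if value < 24:
        return struct.pack(">B", high | value)
    elif value < 2 ** 8:
        return struct.pack(">BB", high | 24, value)
    elif value < 2 ** 16:
        return struct.pack(">BH", high | 25, value)
    elif value < 2 ** 32:
        return struct.pack(">BL", high | 26, value)
    else:
        return struct.pack(">BQ", high | 27, value)


def stream_encode_bytestring(data):
    """Yield the encoding of a definite length bytestring as hunks."""
    yield encode_head(MAJOR_TYPE_BYTESTRING, len(data))
    yield data


def stream_encode_indefinite_bytestring(chunks):
    """Yield the encoding of an indefinite length bytestring.

    ``chunks`` is any iterable of bytes, so the full payload never has
    to be held in memory at once.
    """
    # Initial byte: major type 2 with additional info 31 (0x5f).
    yield struct.pack(">B", (MAJOR_TYPE_BYTESTRING << 5) | 31)
    for chunk in chunks:
        # Each chunk is itself a definite length bytestring.
        for hunk in stream_encode_bytestring(chunk):
            yield hunk
    yield BREAK
```

A caller can forward each yielded hunk directly to a socket or another generator, with no intermediate io.BytesIO() needed.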
On the decoding side, we only implement support for decoding an
indefinite length bytestring from a file object. It will emit a
generator of raw chunks from the source.
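
The decoding side can be sketched in the same spirit. This is a minimal sketch under the assumption that the file object is positioned at the start of an indefinite length bytestring; `read_indefinite_bytestring` is a hypothetical name and not necessarily what the commit calls its function.

```python
import io
import struct

# Additional info value -> (struct format, byte count) for the length field.
LENGTH_FORMATS = {
    24: (">B", 1),
    25: (">H", 2),
    26: (">L", 4),
    27: (">Q", 8),
}


def read_indefinite_bytestring(fh):
    """Yield raw chunks of an indefinite length bytestring read from fh."""
    if fh.read(1) != b"\x5f":
        raise ValueError("not an indefinite length bytestring")

    while True:
        head = fh.read(1)
        if not head:
            raise ValueError("unexpected end of input")
        if head == b"\xff":
            # Break byte: the bytestring is complete.
            return

        initial = head[0]
        if initial >> 5 != 2:
            raise ValueError("chunk is not a bytestring")

        info = initial & 0x1F
        if info < 24:
            length = info
        elif info in LENGTH_FORMATS:
            fmt, size = LENGTH_FORMATS[info]
            raw = fh.read(size)
            if len(raw) != size:
                raise ValueError("truncated length field")
            length = struct.unpack(fmt, raw)[0]
        else:
            raise ValueError("invalid length encoding in chunk")

        chunk = fh.read(length)
        if len(chunk) != length:
            raise ValueError("truncated chunk")
        yield chunk
```

For example, iterating over `read_indefinite_bytestring(io.BytesIO(b"\x5f\x42ab\x41c\xff"))` produces the chunks one at a time instead of buffering the whole value.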
I didn't want to reinvent so many wheels. But profiling the wire
protocol revealed that constructing io.BytesIO() instances to
temporarily hold results has a non-trivial overhead.
We're talking >15% of execution time for operations like
"transfer the fulltexts of all files in a revision." So I can
justify this effort.
Fortunately, CBOR is a relatively straightforward format. And we have
a reference implementation in the repo we can test against.
Differential Revision: https://phab.mercurial-scm.org/D3303
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Sat, 14 Apr 2018 16:36:15 -0700 |
parents | 6798536454e6 |
children |
#!/bin/sh
#
# Use this script to generate branches.svndump
#

mkdir temp
cd temp

mkdir project-orig
cd project-orig
mkdir trunk
mkdir branches
cd ..

svnadmin create svn-repo
svnurl=file://`pwd`/svn-repo

svn import project-orig $svnurl -m "init projA"

svn co $svnurl project
cd project
echo a > trunk/a
echo b > trunk/b
echo c > trunk/c
mkdir trunk/dir
echo e > trunk/dir/e
# Add a file within branches, used to confuse branch detection
echo d > branches/notinbranch
svn add trunk/a trunk/b trunk/c trunk/dir branches/notinbranch
svn ci -m hello
svn up

# Branch to old
svn copy trunk branches/old
svn rm branches/old/c
svn rm branches/old/dir
svn ci -m "branch trunk, remove c and dir"
svn up

# Update trunk
echo a >> trunk/a
svn ci -m "change a"

# Update old branch
echo b >> branches/old/b
svn ci -m "change b"

# Create a cross-branch revision
svn move trunk/b branches/old/c
echo c >> branches/old/c
svn ci -m "move and update c"

# Update old branch again
echo b >> branches/old/b
svn ci -m "change b again"

# Move back and forth between branch of similar names
# This used to generate fake copy records
svn up
svn move branches/old branches/old2
svn ci -m "move to old2"
svn move branches/old2 branches/old
svn ci -m "move back to old"

# Update trunk again
echo a > trunk/a
svn ci -m "last change to a"

# Branch again from a converted revision
svn copy -r 1 $svnurl/trunk branches/old3
svn ci -m "branch trunk@1 into old3"

cd ..

svnadmin dump svn-repo > ../branches.svndump