comparison contrib/python-zstandard/README.rst @ 40121:73fef626dae3
zstandard: vendor python-zstandard 0.10.1
This was just released.
The upstream source distribution from PyPI was extracted. Unwanted
files were removed.
The clang-format ignore list was updated to reflect the new source
of files.
setup.py was updated to pass a new argument to python-zstandard's
function for returning an Extension instance. Upstream had to change
to use relative paths because Python 3.7's packaging doesn't
seem to like absolute paths when defining sources, includes, etc.
The default relative path calculation is relative to setup_zstd.py,
which is in a different directory from Mercurial's setup.py.
The project contains a vendored copy of zstandard 1.3.6. The old
version was 1.3.4.
The API should be backwards compatible and nothing in core should
need adjusting. However, there is a new "chunker" API that we
may find useful in places where we want to emit compressed chunks
of a fixed size.
0.10.0 contains a pair of bug fixes regarding
compressobj() and decompressobj() when block flushing is used. I
actually found these bugs when introducing these APIs in Mercurial!
But existing Mercurial code is not affected because we don't
perform block flushing.
# no-check-commit because 3rd party code has different style guidelines
Differential Revision: https://phab.mercurial-scm.org/D4911
author: Gregory Szorc <gregory.szorc@gmail.com>
date: Mon, 08 Oct 2018 16:27:40 -0700
parents: b1fb341d8a61
children: 675775c33ab6
comparing 40120:89742f1fa6cb with 40121:73fef626dae3
 ``stream_reader(source)`` can be used to obtain an object conforming to the
 ``io.RawIOBase`` interface for reading compressed output as a stream::

     with open(path, 'rb') as fh:
         cctx = zstd.ZstdCompressor()
-        reader = cctx.stream_reader(fh)
-        while True:
-            chunk = reader.read(16384)
-            if not chunk:
-                break
-
-            # Do something with compressed chunk.
-
-Instances can also be used as context managers::
-
-    with open(path, 'rb') as fh:
         with cctx.stream_reader(fh) as reader:
             while True:
                 chunk = reader.read(16384)
                 if not chunk:
                     break

                 # Do something with compressed chunk.

-The stream can only be read within a context manager. When the context
-manager exits, the stream is closed and the underlying resource is
-released and future operations against the compression stream will fail.
+When the context manager exits or ``close()`` is called, the stream is closed,
+underlying resources are released, and future operations against the
+compression stream will fail.

 The ``source`` argument to ``stream_reader()`` can be any object with a
 ``read(size)`` method or any object implementing the *buffer protocol*.

 ``stream_reader()`` accepts a ``size`` argument specifying how large the input

     cctx = zstd.ZstdCompressor()
     cobj = cctx.compressobj(size=6)
     data = cobj.compress(b'foobar')
     data = cobj.flush()
+
+Chunker API
+^^^^^^^^^^^
+
+``chunker(size=None, chunk_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE)`` returns
+an object that can be used to iteratively feed chunks of data into a compressor
+and produce output chunks of a uniform size.
+
+The object returned by ``chunker()`` exposes the following methods:
+
+``compress(data)``
+   Feeds new input data into the compressor.
+
+``flush()``
+   Flushes all data currently in the compressor.
+
+``finish()``
+   Signals the end of input data. No new data can be compressed after this
+   method is called.
+
+``compress()``, ``flush()``, and ``finish()`` all return an iterator of
+``bytes`` instances holding compressed data. The iterator may be empty. Callers
+MUST iterate through all elements of the returned iterator before performing
+another operation on the object.
+
+All chunks emitted by ``compress()`` will have a length of ``chunk_size``.
+
+``flush()`` and ``finish()`` may return a final chunk smaller than
+``chunk_size``.
+
+Here is how the API should be used::
+
+    cctx = zstd.ZstdCompressor()
+    chunker = cctx.chunker(chunk_size=32768)
+
+    with open(path, 'rb') as fh:
+        while True:
+            in_chunk = fh.read(32768)
+            if not in_chunk:
+                break
+
+            for out_chunk in chunker.compress(in_chunk):
+                # Do something with output chunk of size 32768.
+
+        for out_chunk in chunker.finish():
+            # Do something with output chunks that finalize the zstd frame.
+
+The ``chunker()`` API is often a better alternative to ``compressobj()``.
+
+``compressobj()`` will emit output data as it is available. This results in a
+*stream* of output chunks of varying sizes. The consistency of the output chunk
+size with ``chunker()`` is more appropriate for many usages, such as sending
+compressed data to a socket.
+
+``compressobj()`` may also perform extra memory reallocations in order to
+dynamically adjust the sizes of the output chunks. Since ``chunker()`` output
+chunks are all the same size (except for flushed or final chunks), there is
+less memory allocation overhead.

 Batch Compression API
 ^^^^^^^^^^^^^^^^^^^^^

 (Experimental. Not yet supported in CFFI bindings.)
 ``stream_reader(source)`` can be used to obtain an object conforming to the
 ``io.RawIOBase`` interface for reading decompressed output as a stream::

     with open(path, 'rb') as fh:
         dctx = zstd.ZstdDecompressor()
+        reader = dctx.stream_reader(fh)
+        while True:
+            chunk = reader.read(16384)
+            if not chunk:
+                break
+
+            # Do something with decompressed chunk.
+
+The stream can also be used as a context manager::
+
+    with open(path, 'rb') as fh:
+        dctx = zstd.ZstdDecompressor()
         with dctx.stream_reader(fh) as reader:
-            while True:
-                chunk = reader.read(16384)
-                if not chunk:
-                    break
-
-                # Do something with decompressed chunk.
-
-The stream can only be read within a context manager. When the context
-manager exits, the stream is closed and the underlying resource is
-released and future operations against the stream will fail.
+            ...
+
+When used as a context manager, the stream is closed and the underlying
+resources are released when the context manager exits. Future operations
+against the stream will fail.

 The ``source`` argument to ``stream_reader()`` can be any object with a
 ``read(size)`` method or any object implementing the *buffer protocol*.

 If the ``source`` is a stream, you can specify how large ``read()`` requests
 * write_content_size
 * write_checksum
 * write_dict_id
 * job_size
 * overlap_size_log
-* compress_literals
 * force_max_window
 * enable_ldm
 * ldm_hash_log
 * ldm_min_match
 * ldm_bucket_size_log