comparison contrib/python-zstandard/README.rst @ 40121:73fef626dae3

zstandard: vendor python-zstandard 0.10.1

This was just released.

The upstream source distribution from PyPI was extracted. Unwanted files
were removed.

The clang-format ignore list was updated to reflect the new source of files.

setup.py was updated to pass a new argument to python-zstandard's function
for returning an Extension instance. Upstream had to change to use relative
paths because Python 3.7's packaging doesn't seem to like absolute paths
when defining sources, includes, etc. The default relative path calculation
is relative to setup_zstd.py, which is different from the directory of
Mercurial's setup.py.

The project contains a vendored copy of zstandard 1.3.6. The old version
was 1.3.4.

The API should be backwards compatible and nothing in core should need
adjusting. However, there is a new "chunker" API that we may find useful
in places where we want to emit compressed chunks of a fixed size.

There are a pair of bug fixes in 0.10.0 with regards to compressobj()
and decompressobj() when block flushing is used. I actually found these
bugs when introducing these APIs in Mercurial! But existing Mercurial
code is not affected because we don't perform block flushing.

# no-check-commit because 3rd party code has different style guidelines

Differential Revision: https://phab.mercurial-scm.org/D4911
author Gregory Szorc <gregory.szorc@gmail.com>
date Mon, 08 Oct 2018 16:27:40 -0700
parents b1fb341d8a61
children 675775c33ab6

``stream_reader(source)`` can be used to obtain an object conforming to the
``io.RawIOBase`` interface for reading compressed output as a stream::

   with open(path, 'rb') as fh:
       cctx = zstd.ZstdCompressor()
       reader = cctx.stream_reader(fh)
       while True:
           chunk = reader.read(16384)
           if not chunk:
               break

           # Do something with compressed chunk.

Instances can also be used as context managers::

   with open(path, 'rb') as fh:
       with cctx.stream_reader(fh) as reader:
           while True:
               chunk = reader.read(16384)
               if not chunk:
                   break

               # Do something with compressed chunk.

When the context manager exits or ``close()`` is called, the stream is closed,
underlying resources are released, and future operations against the
compression stream will fail.
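
The same teardown can happen without a context manager; a minimal sketch
using the explicit ``close()`` mentioned above, reusing ``fh`` and ``cctx``
from the examples::

   reader = cctx.stream_reader(fh)
   try:
       chunk = reader.read(16384)
   finally:
       # After close(), further operations against the stream will fail.
       reader.close()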

The ``source`` argument to ``stream_reader()`` can be any object with a
``read(size)`` method or any object implementing the *buffer protocol*.
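
For example, compressing from an in-memory buffer (a sketch; ``bytes``
objects implement the *buffer protocol*)::

   cctx = zstd.ZstdCompressor()
   reader = cctx.stream_reader(b'raw input data to compress')
   compressed = reader.read(16384)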

``stream_reader()`` accepts a ``size`` argument specifying how large the input

   cctx = zstd.ZstdCompressor()
   cobj = cctx.compressobj(size=6)
   data = cobj.compress(b'foobar')
   data = cobj.flush()

Chunker API
^^^^^^^^^^^

``chunker(size=None, chunk_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE)`` returns
an object that can be used to iteratively feed chunks of data into a compressor
and produce output chunks of a uniform size.

The object returned by ``chunker()`` exposes the following methods:

``compress(data)``
   Feeds new input data into the compressor.

``flush()``
   Flushes all data currently in the compressor.

``finish()``
   Signals the end of input data. No new data can be compressed after this
   method is called.

``compress()``, ``flush()``, and ``finish()`` all return an iterator of
``bytes`` instances holding compressed data. The iterator may be empty. Callers
MUST iterate through all elements of the returned iterator before performing
another operation on the object.

All chunks emitted by ``compress()`` will have a length of ``chunk_size``.

``flush()`` and ``finish()`` may return a final chunk smaller than
``chunk_size``.

Here is how the API should be used::

   cctx = zstd.ZstdCompressor()
   chunker = cctx.chunker(chunk_size=32768)

   with open(path, 'rb') as fh:
       while True:
           in_chunk = fh.read(32768)
           if not in_chunk:
               break

           for out_chunk in chunker.compress(in_chunk):
               # Do something with output chunk of size 32768.
               pass

       for out_chunk in chunker.finish():
           # Do something with output chunks that finalize the zstd frame.
           pass
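
The example above never exercises ``flush()``. A sketch of a mid-stream
flush, e.g. on a long-lived connection where buffered data must be emitted
before more input arrives (the ``send()`` helper is hypothetical, standing
in for whatever consumes the output)::

   for out_chunk in chunker.flush():
       # Per the note above, the final chunk may be smaller than chunk_size.
       send(out_chunk)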

The ``chunker()`` API is often a better alternative to ``compressobj()``.

``compressobj()`` will emit output data as it becomes available, producing a
*stream* of output chunks of varying sizes. ``chunker()``'s uniform output
chunk size is more appropriate for many usages, such as sending compressed
data to a socket.

``compressobj()`` may also perform extra memory reallocations in order to
dynamically adjust the sizes of the output chunks. Since ``chunker()`` output
chunks are all the same size (except for flushed or final chunks), there is
less memory allocation overhead.
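
For contrast, a sketch of the equivalent loop built on ``compressobj()``;
note that the emitted chunks vary in size::

   cctx = zstd.ZstdCompressor()
   cobj = cctx.compressobj()

   with open(path, 'rb') as fh:
       while True:
           in_chunk = fh.read(32768)
           if not in_chunk:
               break

           out_chunk = cobj.compress(in_chunk)
           if out_chunk:
               # Length varies from call to call and may be zero.
               pass

       # flush() emits any buffered data and finalizes the zstd frame.
       final_chunk = cobj.flush()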

Batch Compression API
^^^^^^^^^^^^^^^^^^^^^

(Experimental. Not yet supported in CFFI bindings.)

``stream_reader(source)`` can be used to obtain an object conforming to the
``io.RawIOBase`` interface for reading decompressed output as a stream::

   with open(path, 'rb') as fh:
       dctx = zstd.ZstdDecompressor()
       reader = dctx.stream_reader(fh)
       while True:
           chunk = reader.read(16384)
           if not chunk:
               break

           # Do something with decompressed chunk.

The stream can also be used as a context manager::

   with open(path, 'rb') as fh:
       dctx = zstd.ZstdDecompressor()
       with dctx.stream_reader(fh) as reader:
           ...

When used as a context manager, the stream is closed and the underlying
resources are released when the context manager exits. Future operations
against the stream will fail.

The ``source`` argument to ``stream_reader()`` can be any object with a
``read(size)`` method or any object implementing the *buffer protocol*.
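
For example, decompressing from an in-memory buffer (a sketch; ``compressed``
is assumed to hold a complete zstd frame, and ``bytes`` implements the
*buffer protocol*)::

   dctx = zstd.ZstdDecompressor()
   reader = dctx.stream_reader(compressed)
   decompressed = reader.read(16384)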

If the ``source`` is a stream, you can specify how large ``read()`` requests

* write_content_size
* write_checksum
* write_dict_id
* job_size
* overlap_size_log
* force_max_window
* enable_ldm
* ldm_hash_log
* ldm_min_match
* ldm_bucket_size_log
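
A hedged sketch of constructing parameters from a few of the keywords listed
above, assuming they map directly onto ``ZstdCompressionParameters`` keyword
arguments and the ``compression_params`` argument of ``ZstdCompressor``::

   import zstandard as zstd

   params = zstd.ZstdCompressionParameters(
       write_checksum=1,   # from the list above
       enable_ldm=1,       # enable long-distance matching
       ldm_hash_log=20,    # from the list above; value chosen for illustration
   )
   cctx = zstd.ZstdCompressor(compression_params=params)
   data = cctx.compress(b'some input data')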