comparison contrib/python-zstandard/README.rst @ 42070:675775c33ab6

zstandard: vendor python-zstandard 0.11

The upstream source distribution from PyPI was extracted. Unwanted
files were removed.

The clang-format ignore list was updated to reflect the new source
of files.

The project contains a vendored copy of zstandard 1.3.8. The old
version was 1.3.6. This should result in some minor performance wins.

test-check-py3-compat.t was updated to reflect now-passing tests on
Python 3.8.

Some HTTP tests were updated to reflect new zstd compression output.

# no-check-commit because 3rd party code has different style guidelines

Differential Revision: https://phab.mercurial-scm.org/D6199
author Gregory Szorc <gregory.szorc@gmail.com>
date Thu, 04 Apr 2019 17:34:43 -0700
parents 73fef626dae3
children 69de49c4e39c
42069:668eff08387f 42070:675775c33ab6
18 | |ci-status| |win-ci-status| 18 | |ci-status| |win-ci-status|
19 19
20 Requirements 20 Requirements
21 ============ 21 ============
22 22
23 This extension is designed to run with Python 2.7, 3.4, 3.5, and 3.6 23 This extension is designed to run with Python 2.7, 3.4, 3.5, 3.6, and 3.7
24 on common platforms (Linux, Windows, and OS X). x86 and x86_64 are well-tested 24 on common platforms (Linux, Windows, and OS X). On PyPy (both PyPy2 and PyPy3) we support version 6.0.0 and above.
25 on Windows. Only x86_64 is well-tested on Linux and macOS. 25 x86 and x86_64 are well-tested on Windows. Only x86_64 is well-tested on Linux and macOS.
26 26
27 Installing 27 Installing
28 ========== 28 ==========
29 29
30 This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard. 30 This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard.
213 if not chunk: 213 if not chunk:
214 break 214 break
215 215
216 # Do something with compressed chunk. 216 # Do something with compressed chunk.
217 217
218 When the context manager exists or ``close()`` is called, the stream is closed, 218 When the context manager exits or ``close()`` is called, the stream is closed,
219 underlying resources are released, and future operations against the compression 219 underlying resources are released, and future operations against the compression
220 stream will fail. 220 stream will fail.
221 221
222 The ``source`` argument to ``stream_reader()`` can be any object with a 222 The ``source`` argument to ``stream_reader()`` can be any object with a
223 ``read(size)`` method or any object implementing the *buffer protocol*. 223 ``read(size)`` method or any object implementing the *buffer protocol*.
249 emitted so far. 249 emitted so far.
250 250
251 Streaming Input API 251 Streaming Input API
252 ^^^^^^^^^^^^^^^^^^^ 252 ^^^^^^^^^^^^^^^^^^^
253 253
254 ``stream_writer(fh)`` (which behaves as a context manager) allows you to *stream* 254 ``stream_writer(fh)`` allows you to *stream* data into a compressor.
255 data into a compressor.:: 255
256 Returned instances implement the ``io.RawIOBase`` interface. Only methods
257 that involve writing will do useful things.
258
259 The argument to ``stream_writer()`` must have a ``write(data)`` method. As
260 compressed data is available, ``write()`` will be called with the compressed
261 data as its argument. Many common Python types implement ``write()``, including
262 open file handles and ``io.BytesIO``.
263
264 The ``write(data)`` method is used to feed data into the compressor.
265
266 The ``flush([flush_mode=FLUSH_BLOCK])`` method can be called to evict whatever
267 data remains within the compressor's internal state into the output object. This
268 may result in 0 or more ``write()`` calls to the output object. This method
269 accepts an optional ``flush_mode`` argument to control the flushing behavior.
270 Its value can be any of the ``FLUSH_*`` constants.
271
272 Both ``write()`` and ``flush()`` return the number of bytes written to the
273 object's ``write()``. In many cases, small inputs do not accumulate enough
274 data to cause a write and ``write()`` will return ``0``.
275
276 Calling ``close()`` will mark the stream as closed and subsequent I/O
277 operations will raise ``ValueError`` (per the documented behavior of
278 ``io.RawIOBase``). ``close()`` will also call ``close()`` on the underlying
279 stream if such a method exists.
280
281 Typical usage is as follows::
282
283 cctx = zstd.ZstdCompressor(level=10)
284 compressor = cctx.stream_writer(fh)
285
286 compressor.write(b'chunk 0\n')
287 compressor.write(b'chunk 1\n')
288 compressor.flush()
289 # Receiver will be able to decode ``chunk 0\nchunk 1\n`` at this point.
290 # Receiver is also expecting more data in the zstd *frame*.
291
292 compressor.write(b'chunk 2\n')
293 compressor.flush(zstd.FLUSH_FRAME)
294 # Receiver will be able to decode ``chunk 0\nchunk 1\nchunk 2``.
295 # Receiver is expecting no more data, as the zstd frame is closed.
296 # Any future calls to ``write()`` at this point will construct a new
297 # zstd frame.
298
299 Instances can be used as context managers. Exiting the context manager is
300 the equivalent of calling ``close()``, which is equivalent to calling
301 ``flush(zstd.FLUSH_FRAME)``::
256 302
257 cctx = zstd.ZstdCompressor(level=10) 303 cctx = zstd.ZstdCompressor(level=10)
258 with cctx.stream_writer(fh) as compressor: 304 with cctx.stream_writer(fh) as compressor:
259 compressor.write(b'chunk 0') 305 compressor.write(b'chunk 0')
260 compressor.write(b'chunk 1') 306 compressor.write(b'chunk 1')
261 ... 307 ...
262 308
263 The argument to ``stream_writer()`` must have a ``write(data)`` method. As 309 .. important::
264 compressed data is available, ``write()`` will be called with the compressed 310
265 data as its argument. Many common Python types implement ``write()``, including 311 If ``flush(FLUSH_FRAME)`` is not called, emitted data doesn't constitute
266 open file handles and ``io.BytesIO``. 312 a full zstd *frame* and consumers of this data may complain about malformed
267 313 input. It is recommended to use instances as a context manager to ensure
268 ``stream_writer()`` returns an object representing a streaming compressor 314 *frames* are properly finished.
269 instance. It **must** be used as a context manager. That object's
270 ``write(data)`` method is used to feed data into the compressor.
271
272 A ``flush()`` method can be called to evict whatever data remains within the
273 compressor's internal state into the output object. This may result in 0 or
274 more ``write()`` calls to the output object.
275
276 Both ``write()`` and ``flush()`` return the number of bytes written to the
277 object's ``write()``. In many cases, small inputs do not accumulate enough
278 data to cause a write and ``write()`` will return ``0``.
279 315
280 If the size of the data being fed to this streaming compressor is known, 316 If the size of the data being fed to this streaming compressor is known,
281 you can declare it before compression begins:: 317 you can declare it before compression begins::
282 318
283 cctx = zstd.ZstdCompressor() 319 cctx = zstd.ZstdCompressor()
307 343
308 cctx = zstd.ZstdCompressor() 344 cctx = zstd.ZstdCompressor()
309 with cctx.stream_writer(fh) as compressor: 345 with cctx.stream_writer(fh) as compressor:
310 ... 346 ...
311 total_written = compressor.tell() 347 total_written = compressor.tell()
348
349 ``stream_writer()`` accepts a ``write_return_read`` boolean argument to control
350 the return value of ``write()``. When ``False`` (the default), ``write()`` returns
351 the number of bytes that were written to the underlying object. When
352 ``True``, ``write()`` returns the number of bytes read from the input that
353 were subsequently written to the compressor. ``True`` is the *proper* behavior
354 for ``write()`` as specified by the ``io.RawIOBase`` interface and will become
355 the default value in a future release.
312 356
313 Streaming Output API 357 Streaming Output API
314 ^^^^^^^^^^^^^^^^^^^^ 358 ^^^^^^^^^^^^^^^^^^^^
315 359
316 ``read_to_iter(reader)`` provides a mechanism to stream data out of a 360 ``read_to_iter(reader)`` provides a mechanism to stream data out of a
652 will raise ``ValueError`` if attempted. 696 will raise ``ValueError`` if attempted.
653 697
654 ``tell()`` returns the number of decompressed bytes read so far. 698 ``tell()`` returns the number of decompressed bytes read so far.
655 699
656 Not all I/O methods are implemented. Notably missing is support for 700 Not all I/O methods are implemented. Notably missing is support for
657 ``readline()``, ``readlines()``, and linewise iteration support. Support for 701 ``readline()``, ``readlines()``, and linewise iteration support. This is
658 these is planned for a future release. 702 because streams operate on binary data - not text data. If you want to
703 convert decompressed output to text, you can chain an ``io.TextIOWrapper``
704 to the stream::
705
706 with open(path, 'rb') as fh:
707 dctx = zstd.ZstdDecompressor()
708 stream_reader = dctx.stream_reader(fh)
709 text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8')
710
711 for line in text_stream:
712 ...
713
714 The ``read_across_frames`` argument to ``stream_reader()`` controls the
715 behavior of read operations when the end of a zstd *frame* is encountered.
716 When ``False`` (the default), a read will complete when the end of a
717 zstd *frame* is encountered. When ``True``, a read can potentially
718 return data spanning multiple zstd *frames*.
659 719
660 Streaming Input API 720 Streaming Input API
661 ^^^^^^^^^^^^^^^^^^^ 721 ^^^^^^^^^^^^^^^^^^^
662 722
663 ``stream_writer(fh)`` can be used to incrementally send compressed data to a 723 ``stream_writer(fh)`` allows you to *stream* data into a decompressor.
664 decompressor.:: 724
725 Returned instances implement the ``io.RawIOBase`` interface. Only methods
726 that involve writing will do useful things.
727
728 The argument to ``stream_writer()`` is typically an object that also implements
729 ``io.RawIOBase``. But any object with a ``write(data)`` method will work. Many
730 common Python types conform to this interface, including open file handles
731 and ``io.BytesIO``.
732
733 Behavior is similar to ``ZstdCompressor.stream_writer()``: compressed data
734 is sent to the decompressor by calling ``write(data)`` and decompressed
735 output is written to the underlying stream by calling its ``write(data)``
736 method::
665 737
666 dctx = zstd.ZstdDecompressor() 738 dctx = zstd.ZstdDecompressor()
667 with dctx.stream_writer(fh) as decompressor: 739 decompressor = dctx.stream_writer(fh)
668 decompressor.write(compressed_data) 740
669 741 decompressor.write(compressed_data)
670 This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to 742 ...
671 the decompressor by calling ``write(data)`` and decompressed output is written 743
672 to the output object by calling its ``write(data)`` method.
673 744
674 Calls to ``write()`` will return the number of bytes written to the output 745 Calls to ``write()`` will return the number of bytes written to the output
675 object. Not all inputs will result in bytes being written, so return values 746 object. Not all inputs will result in bytes being written, so return values
676 of ``0`` are possible. 747 of ``0`` are possible.
677 748
749 Like the ``stream_writer()`` compressor, instances can be used as context
750 managers. However, context managers add no extra special behavior and offer
751 little to no benefit.
752
753 Calling ``close()`` will mark the stream as closed and subsequent I/O operations
754 will raise ``ValueError`` (per the documented behavior of ``io.RawIOBase``).
755 ``close()`` will also call ``close()`` on the underlying stream if such a
756 method exists.
757
678 The size of chunks being ``write()`` to the destination can be specified:: 758 The size of chunks being ``write()`` to the destination can be specified::
679 759
680 dctx = zstd.ZstdDecompressor() 760 dctx = zstd.ZstdDecompressor()
681 with dctx.stream_writer(fh, write_size=16384) as decompressor: 761 with dctx.stream_writer(fh, write_size=16384) as decompressor:
682 pass 762 pass
684 You can see how much memory is being used by the decompressor:: 764 You can see how much memory is being used by the decompressor::
685 765
686 dctx = zstd.ZstdDecompressor() 766 dctx = zstd.ZstdDecompressor()
687 with dctx.stream_writer(fh) as decompressor: 767 with dctx.stream_writer(fh) as decompressor:
688 byte_size = decompressor.memory_size() 768 byte_size = decompressor.memory_size()
769
770 ``stream_writer()`` accepts a ``write_return_read`` boolean argument to control
771 the return value of ``write()``. When ``False`` (the default), ``write()``
772 returns the number of bytes that were written to the underlying stream.
773 When ``True``, ``write()`` returns the number of bytes read from the input.
774 ``True`` is the *proper* behavior for ``write()`` as specified by the
775 ``io.RawIOBase`` interface and will become the default in a future release.
689 776
690 Streaming Output API 777 Streaming Output API
691 ^^^^^^^^^^^^^^^^^^^^ 778 ^^^^^^^^^^^^^^^^^^^^
692 779
693 ``read_to_iter(fh)`` provides a mechanism to stream decompressed data out of a 780 ``read_to_iter(fh)`` provides a mechanism to stream decompressed data out of a
788 .. note:: 875 .. note::
789 876
790 Because calls to ``decompress()`` may need to perform multiple 877 Because calls to ``decompress()`` may need to perform multiple
791 memory (re)allocations, this streaming decompression API isn't as 878 memory (re)allocations, this streaming decompression API isn't as
792 efficient as other APIs. 879 efficient as other APIs.
880
881 For compatibility with the standard library APIs, instances expose a
882 ``flush([length=None])`` method. This method no-ops and has no meaningful
883 side-effects, making it safe to call any time.
793 884
794 Batch Decompression API 885 Batch Decompression API
795 ^^^^^^^^^^^^^^^^^^^^^^^ 886 ^^^^^^^^^^^^^^^^^^^^^^^
796 887
797 (Experimental. Not yet supported in CFFI bindings.) 888 (Experimental. Not yet supported in CFFI bindings.)
1145 * hash_log 1236 * hash_log
1146 * chain_log 1237 * chain_log
1147 * search_log 1238 * search_log
1148 * min_match 1239 * min_match
1149 * target_length 1240 * target_length
1150 * compression_strategy 1241 * strategy
1242 * compression_strategy (deprecated: same as ``strategy``)
1151 * write_content_size 1243 * write_content_size
1152 * write_checksum 1244 * write_checksum
1153 * write_dict_id 1245 * write_dict_id
1154 * job_size 1246 * job_size
1155 * overlap_size_log 1247 * overlap_log
1248 * overlap_size_log (deprecated: same as ``overlap_log``)
1156 * force_max_window 1249 * force_max_window
1157 * enable_ldm 1250 * enable_ldm
1158 * ldm_hash_log 1251 * ldm_hash_log
1159 * ldm_min_match 1252 * ldm_min_match
1160 * ldm_bucket_size_log 1253 * ldm_bucket_size_log
1161 * ldm_hash_every_log 1254 * ldm_hash_rate_log
1255 * ldm_hash_every_log (deprecated: same as ``ldm_hash_rate_log``)
1162 * threads 1256 * threads
1163 1257
1164 Some of these are very low-level settings. It may help to consult the official 1258 Some of these are very low-level settings. It may help to consult the official
1165 zstandard documentation for their behavior. Look for the ``ZSTD_p_*`` constants 1259 zstandard documentation for their behavior. Look for the ``ZSTD_p_*`` constants
1166 in ``zstd.h`` (https://github.com/facebook/zstd/blob/dev/lib/zstd.h). 1260 in ``zstd.h`` (https://github.com/facebook/zstd/blob/dev/lib/zstd.h).
1237 1331
1238 FRAME_HEADER 1332 FRAME_HEADER
1239 bytes containing header of the Zstandard frame 1333 bytes containing header of the Zstandard frame
1240 MAGIC_NUMBER 1334 MAGIC_NUMBER
1241 Frame header as an integer 1335 Frame header as an integer
1336
1337 FLUSH_BLOCK
1338 Flushing behavior that flushes a zstd block. A decompressor will
1339 be able to decode all data fed into the compressor so far.
1340 FLUSH_FRAME
1341 Flushing behavior that ends a zstd frame. Any new data fed
1342 to the compressor will start a new frame.
1242 1343
1243 CONTENTSIZE_UNKNOWN 1344 CONTENTSIZE_UNKNOWN
1244 Value for content size when the content size is unknown. 1345 Value for content size when the content size is unknown.
1245 CONTENTSIZE_ERROR 1346 CONTENTSIZE_ERROR
1246 Value for content size when content size couldn't be determined. 1347 Value for content size when content size couldn't be determined.
1259 Maximum value for compression parameter 1360 Maximum value for compression parameter
1260 SEARCHLOG_MIN 1361 SEARCHLOG_MIN
1261 Minimum value for compression parameter 1362 Minimum value for compression parameter
1262 SEARCHLOG_MAX 1363 SEARCHLOG_MAX
1263 Maximum value for compression parameter 1364 Maximum value for compression parameter
1365 MINMATCH_MIN
1366 Minimum value for compression parameter
1367 MINMATCH_MAX
1368 Maximum value for compression parameter
1264 SEARCHLENGTH_MIN 1369 SEARCHLENGTH_MIN
1265 Minimum value for compression parameter 1370 Minimum value for compression parameter
1371
1372 Deprecated: use ``MINMATCH_MIN``
1266 SEARCHLENGTH_MAX 1373 SEARCHLENGTH_MAX
1267 Maximum value for compression parameter 1374 Maximum value for compression parameter
1375
1376 Deprecated: use ``MINMATCH_MAX``
1268 TARGETLENGTH_MIN 1377 TARGETLENGTH_MIN
1269 Minimum value for compression parameter 1378 Minimum value for compression parameter
1270 STRATEGY_FAST 1379 STRATEGY_FAST
1271 Compression strategy 1380 Compression strategy
1272 STRATEGY_DFAST 1381 STRATEGY_DFAST
1281 Compression strategy 1390 Compression strategy
1282 STRATEGY_BTOPT 1391 STRATEGY_BTOPT
1283 Compression strategy 1392 Compression strategy
1284 STRATEGY_BTULTRA 1393 STRATEGY_BTULTRA
1285 Compression strategy 1394 Compression strategy
1395 STRATEGY_BTULTRA2
1396 Compression strategy
1286 1397
1287 FORMAT_ZSTD1 1398 FORMAT_ZSTD1
1288 Zstandard frame format 1399 Zstandard frame format
1289 FORMAT_ZSTD1_MAGICLESS 1400 FORMAT_ZSTD1_MAGICLESS
1290 Zstandard frame format without magic header 1401 Zstandard frame format without magic header