comparison contrib/python-zstandard/README.rst @ 42070:675775c33ab6
zstandard: vendor python-zstandard 0.11
The upstream source distribution from PyPI was extracted. Unwanted
files were removed.
The clang-format ignore list was updated to reflect the new source
of files.
The project contains a vendored copy of zstandard 1.3.8. The old
version was 1.3.6. This should result in some minor performance wins.
test-check-py3-compat.t was updated to reflect now-passing tests on
Python 3.8.
Some HTTP tests were updated to reflect new zstd compression output.
# no-check-commit because 3rd party code has different style guidelines
Differential Revision: https://phab.mercurial-scm.org/D6199
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Thu, 04 Apr 2019 17:34:43 -0700 |
parents | 73fef626dae3 |
children | 69de49c4e39c |
42069:668eff08387f | 42070:675775c33ab6 |
---|---|
18 | |ci-status| |win-ci-status| | 18 | |ci-status| |win-ci-status| |
19 | 19 |
20 Requirements | 20 Requirements |
21 ============ | 21 ============ |
22 | 22 |
23 This extension is designed to run with Python 2.7, 3.4, 3.5, and 3.6 | 23 This extension is designed to run with Python 2.7, 3.4, 3.5, 3.6, and 3.7 |
24 on common platforms (Linux, Windows, and OS X). x86 and x86_64 are well-tested | 24 on common platforms (Linux, Windows, and OS X). On PyPy (both PyPy2 and PyPy3) we support version 6.0.0 and above. |
25 on Windows. Only x86_64 is well-tested on Linux and macOS. | 25 x86 and x86_64 are well-tested on Windows. Only x86_64 is well-tested on Linux and macOS. |
26 | 26 |
27 Installing | 27 Installing |
28 ========== | 28 ========== |
29 | 29 |
30 This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard. | 30 This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard. |
213 if not chunk: | 213 if not chunk: |
214 break | 214 break |
215 | 215 |
216 # Do something with compressed chunk. | 216 # Do something with compressed chunk. |
217 | 217 |
218 When the context manager exists or ``close()`` is called, the stream is closed, | 218 When the context manager exits or ``close()`` is called, the stream is closed, |
219 underlying resources are released, and future operations against the compression | 219 underlying resources are released, and future operations against the compression |
220 stream will fail. | 220 stream will fail. |
221 | 221 |
222 The ``source`` argument to ``stream_reader()`` can be any object with a | 222 The ``source`` argument to ``stream_reader()`` can be any object with a |
223 ``read(size)`` method or any object implementing the *buffer protocol*. | 223 ``read(size)`` method or any object implementing the *buffer protocol*. |
249 emitted so far. | 249 emitted so far. |
250 | 250 |
251 Streaming Input API | 251 Streaming Input API |
252 ^^^^^^^^^^^^^^^^^^^ | 252 ^^^^^^^^^^^^^^^^^^^ |
253 | 253 |
254 ``stream_writer(fh)`` (which behaves as a context manager) allows you to *stream* | 254 ``stream_writer(fh)`` allows you to *stream* data into a compressor. |
255 data into a compressor.:: | 255 |
256 Returned instances implement the ``io.RawIOBase`` interface. Only methods | |
257 that involve writing will do useful things. | |
258 | |
259 The argument to ``stream_writer()`` must have a ``write(data)`` method. As | |
260 compressed data is available, ``write()`` will be called with the compressed | |
261 data as its argument. Many common Python types implement ``write()``, including | |
262 open file handles and ``io.BytesIO``. | |
263 | |
264 The ``write(data)`` method is used to feed data into the compressor. | |
265 | |
266 The ``flush([flush_mode=FLUSH_BLOCK])`` method can be called to evict whatever | |
267 data remains within the compressor's internal state into the output object. This | |
268 may result in 0 or more ``write()`` calls to the output object. This method | |
269 accepts an optional ``flush_mode`` argument to control the flushing behavior. | |
270 Its value can be any of the ``FLUSH_*`` constants. | |
271 | |
272 Both ``write()`` and ``flush()`` return the number of bytes written to the | |
273 object's ``write()``. In many cases, small inputs do not accumulate enough | |
274 data to cause a write and ``write()`` will return ``0``. | |
275 | |
276 Calling ``close()`` will mark the stream as closed and subsequent I/O | |
277 operations will raise ``ValueError`` (per the documented behavior of | |
278 ``io.RawIOBase``). ``close()`` will also call ``close()`` on the underlying | |
279 stream if such a method exists. | |
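The closed-stream behavior described above is inherited from the standard ``io`` base classes. The following is a minimal sketch of that behavior, assuming ``io.BytesIO`` as a stand-in for the writer; it is not a zstd stream and does no compression.

```python
import io

# Stand-in stream: io.BytesIO follows the same io base-class contract
# for closed streams that the text above refers to.
stream = io.BytesIO()
stream.write(b"chunk 0\n")
stream.close()

try:
    # Any I/O after close() is expected to raise ValueError.
    stream.write(b"chunk 1\n")
    raised = False
except ValueError:
    raised = True
```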
280 | |
281 Typical usage is as follows:: | |
282 | |
283 cctx = zstd.ZstdCompressor(level=10) | |
284 compressor = cctx.stream_writer(fh) | |
285 | |
286 compressor.write(b'chunk 0\n') | |
287 compressor.write(b'chunk 1\n') | |
288 compressor.flush() | |
289 # Receiver will be able to decode ``chunk 0\nchunk 1\n`` at this point. | |
290 # Receiver is also expecting more data in the zstd *frame*. | |
291 | |
292 compressor.write(b'chunk 2\n') | |
293 compressor.flush(zstd.FLUSH_FRAME) | |
294 # Receiver will be able to decode ``chunk 0\nchunk 1\nchunk 2``. | |
295 # Receiver is expecting no more data, as the zstd frame is closed. | |
296 # Any future calls to ``write()`` at this point will construct a new | |
297 # zstd frame. | |
298 | |
299 Instances can be used as context managers. Exiting the context manager is | |
300 the equivalent of calling ``close()``, which is equivalent to calling | |
301 ``flush(zstd.FLUSH_FRAME)``:: | |
256 | 302 |
257 cctx = zstd.ZstdCompressor(level=10) | 303 cctx = zstd.ZstdCompressor(level=10) |
258 with cctx.stream_writer(fh) as compressor: | 304 with cctx.stream_writer(fh) as compressor: |
259 compressor.write(b'chunk 0') | 305 compressor.write(b'chunk 0') |
260 compressor.write(b'chunk 1') | 306 compressor.write(b'chunk 1') |
261 ... | 307 ... |
262 | 308 |
263 The argument to ``stream_writer()`` must have a ``write(data)`` method. As | 309 .. important:: |
264 compressed data is available, ``write()`` will be called with the compressed | 310 |
265 data as its argument. Many common Python types implement ``write()``, including | 311 If ``flush(FLUSH_FRAME)`` is not called, emitted data doesn't constitute |
266 open file handles and ``io.BytesIO``. | 312 a full zstd *frame* and consumers of this data may complain about malformed |
267 | 313 input. It is recommended to use instances as a context manager to ensure |
268 ``stream_writer()`` returns an object representing a streaming compressor | 314 *frames* are properly finished. |
269 instance. It **must** be used as a context manager. That object's | |
270 ``write(data)`` method is used to feed data into the compressor. | |
271 | |
272 A ``flush()`` method can be called to evict whatever data remains within the | |
273 compressor's internal state into the output object. This may result in 0 or | |
274 more ``write()`` calls to the output object. | |
275 | |
276 Both ``write()`` and ``flush()`` return the number of bytes written to the | |
277 object's ``write()``. In many cases, small inputs do not accumulate enough | |
278 data to cause a write and ``write()`` will return ``0``. | |
279 | 315 |
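The difference between flushing a block and finishing a frame can be illustrated without the zstd bindings. The sketch below uses the standard library's ``zlib`` as a stand-in: ``Z_SYNC_FLUSH`` plays the role of ``FLUSH_BLOCK`` and ``Z_FINISH`` plays the role of ``FLUSH_FRAME``. This is an analogy only; zlib streams are not zstd frames, and none of these names come from python-zstandard.

```python
import zlib

# Stand-in only: zlib replaces the zstd compressor for illustration.
compressor = zlib.compressobj()
emitted = compressor.compress(b"chunk 0\n")
emitted += compressor.compress(b"chunk 1\n")
emitted += compressor.flush(zlib.Z_SYNC_FLUSH)  # analogous to flush(FLUSH_BLOCK)

# The receiver can decode everything fed so far, but the stream is still open.
receiver = zlib.decompressobj()
decoded_so_far = receiver.decompress(emitted)
stream_ended_early = receiver.eof

tail = compressor.compress(b"chunk 2\n")
tail += compressor.flush(zlib.Z_FINISH)  # analogous to flush(FLUSH_FRAME)

# Only after the finishing flush does the receiver see a complete stream.
decoded_rest = receiver.decompress(tail)
stream_ended = receiver.eof
```

If the finishing flush were skipped, the receiver would never observe the end of the stream, which mirrors the malformed-input warning above.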
280 If the size of the data being fed to this streaming compressor is known, | 316 If the size of the data being fed to this streaming compressor is known, |
281 you can declare it before compression begins:: | 317 you can declare it before compression begins:: |
282 | 318 |
283 cctx = zstd.ZstdCompressor() | 319 cctx = zstd.ZstdCompressor() |
307 | 343 |
308 cctx = zstd.ZstdCompressor() | 344 cctx = zstd.ZstdCompressor() |
309 with cctx.stream_writer(fh) as compressor: | 345 with cctx.stream_writer(fh) as compressor: |
310 ... | 346 ... |
311 total_written = compressor.tell() | 347 total_written = compressor.tell() |
348 | |
349 ``stream_writer()`` accepts a ``write_return_read`` boolean argument to control | |
350 the return value of ``write()``. When ``False`` (the default), ``write()`` returns | |
351 the number of bytes that were written to the underlying object. When | |
352 ``True``, ``write()`` returns the number of bytes read from the input that | |
353 were subsequently written to the compressor. ``True`` is the *proper* behavior | |
354 for ``write()`` as specified by the ``io.RawIOBase`` interface and will become | |
355 the default value in a future release. | |
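The two return-value conventions can be sketched with a small wrapper. ``SketchWriter`` below is hypothetical and is not the class returned by ``stream_writer()``; it uses ``zlib`` as a stand-in compressor purely to show how the result of ``write()`` differs between the two modes.

```python
import io
import zlib


class SketchWriter(io.RawIOBase):
    """Hypothetical writer; zlib stands in for the zstd compressor."""

    def __init__(self, dest, write_return_read=False):
        self._dest = dest
        self._compressor = zlib.compressobj()
        self._write_return_read = write_return_read

    def writable(self):
        return True

    def write(self, data):
        out = self._compressor.compress(data)
        # Small inputs may produce little or no output yet.
        written = self._dest.write(out) if out else 0
        # True: bytes consumed from the input (io.RawIOBase convention).
        # False: bytes written to the underlying object.
        return len(data) if self._write_return_read else written


legacy = SketchWriter(io.BytesIO())
proper = SketchWriter(io.BytesIO(), write_return_read=True)
legacy_result = legacy.write(b"chunk 0")
proper_result = proper.write(b"chunk 0")
```

With ``write_return_read=True`` the result always equals the length of the input, so callers that loop until all bytes are consumed behave correctly. In the other mode the result depends on how much compressed output happened to be emitted.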
312 | 356 |
313 Streaming Output API | 357 Streaming Output API |
314 ^^^^^^^^^^^^^^^^^^^^ | 358 ^^^^^^^^^^^^^^^^^^^^ |
315 | 359 |
316 ``read_to_iter(reader)`` provides a mechanism to stream data out of a | 360 ``read_to_iter(reader)`` provides a mechanism to stream data out of a |
652 will raise ``ValueError`` if attempted. | 696 will raise ``ValueError`` if attempted. |
653 | 697 |
654 ``tell()`` returns the number of decompressed bytes read so far. | 698 ``tell()`` returns the number of decompressed bytes read so far. |
655 | 699 |
656 Not all I/O methods are implemented. Notably missing is support for | 700 Not all I/O methods are implemented. Notably missing is support for |
657 ``readline()``, ``readlines()``, and linewise iteration support. Support for | 701 ``readline()``, ``readlines()``, and linewise iteration support. This is |
658 these is planned for a future release. | 702 because streams operate on binary data - not text data. If you want to |
703 convert decompressed output to text, you can chain an ``io.TextIOWrapper`` | |
704 to the stream:: | |
705 | |
706 with open(path, 'rb') as fh: | |
707 dctx = zstd.ZstdDecompressor() | |
708 stream_reader = dctx.stream_reader(fh) | |
709 text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8') | |
710 | |
711 for line in text_stream: | |
712 ... | |
713 | |
714 The ``read_across_frames`` argument to ``stream_reader()`` controls the | |
715 behavior of read operations when the end of a zstd *frame* is encountered. | |
716 When ``False`` (the default), a read will complete when the end of a | |
717 zstd *frame* is encountered. When ``True``, a read can potentially | |
718 return data spanning multiple zstd *frames*. | |
659 | 719 |
660 Streaming Input API | 720 Streaming Input API |
661 ^^^^^^^^^^^^^^^^^^^ | 721 ^^^^^^^^^^^^^^^^^^^ |
662 | 722 |
663 ``stream_writer(fh)`` can be used to incrementally send compressed data to a | 723 ``stream_writer(fh)`` allows you to *stream* data into a decompressor. |
664 decompressor.:: | 724 |
725 Returned instances implement the ``io.RawIOBase`` interface. Only methods | |
726 that involve writing will do useful things. | |
727 | |
728 The argument to ``stream_writer()`` is typically an object that also implements | |
729 ``io.RawIOBase``. But any object with a ``write(data)`` method will work. Many | |
730 common Python types conform to this interface, including open file handles | |
731 and ``io.BytesIO``. | |
732 | |
733 Behavior is similar to ``ZstdCompressor.stream_writer()``: compressed data | |
734 is sent to the decompressor by calling ``write(data)`` and decompressed | |
735 output is written to the underlying stream by calling its ``write(data)`` | |
736 method.:: | |
665 | 737 |
666 dctx = zstd.ZstdDecompressor() | 738 dctx = zstd.ZstdDecompressor() |
667 with dctx.stream_writer(fh) as decompressor: | 739 decompressor = dctx.stream_writer(fh) |
668 decompressor.write(compressed_data) | 740 |
669 | 741 decompressor.write(compressed_data) |
670 This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to | 742 ... |
671 the decompressor by calling ``write(data)`` and decompressed output is written | 743 |
672 to the output object by calling its ``write(data)`` method. | |
673 | 744 |
674 Calls to ``write()`` will return the number of bytes written to the output | 745 Calls to ``write()`` will return the number of bytes written to the output |
675 object. Not all inputs will result in bytes being written, so return values | 746 object. Not all inputs will result in bytes being written, so return values |
676 of ``0`` are possible. | 747 of ``0`` are possible. |
677 | 748 |
749 Like the ``stream_writer()`` compressor, instances can be used as context | |
750 managers. However, context managers add no extra special behavior and offer | |
751 little to no benefit to being used. | |
752 | |
753 Calling ``close()`` will mark the stream as closed and subsequent I/O operations | |
754 will raise ``ValueError`` (per the documented behavior of ``io.RawIOBase``). | |
755 ``close()`` will also call ``close()`` on the underlying stream if such a | |
756 method exists. | |
757 | |
678 The size of chunks being ``write()`` to the destination can be specified:: | 758 The size of chunks being ``write()`` to the destination can be specified:: |
679 | 759 |
680 dctx = zstd.ZstdDecompressor() | 760 dctx = zstd.ZstdDecompressor() |
681 with dctx.stream_writer(fh, write_size=16384) as decompressor: | 761 with dctx.stream_writer(fh, write_size=16384) as decompressor: |
682 pass | 762 pass |
684 You can see how much memory is being used by the decompressor:: | 764 You can see how much memory is being used by the decompressor:: |
685 | 765 |
686 dctx = zstd.ZstdDecompressor() | 766 dctx = zstd.ZstdDecompressor() |
687 with dctx.stream_writer(fh) as decompressor: | 767 with dctx.stream_writer(fh) as decompressor: |
688 byte_size = decompressor.memory_size() | 768 byte_size = decompressor.memory_size() |
769 | |
770 ``stream_writer()`` accepts a ``write_return_read`` boolean argument to control | |
771 the return value of ``write()``. When ``False`` (the default), ``write()`` | |
772 returns the number of bytes that were written to the underlying stream. | |
773 When ``True``, ``write()`` returns the number of bytes read from the input. | |
774 ``True`` is the *proper* behavior for ``write()`` as specified by the | |
775 ``io.RawIOBase`` interface and will become the default in a future release. | |
689 | 776 |
690 Streaming Output API | 777 Streaming Output API |
691 ^^^^^^^^^^^^^^^^^^^^ | 778 ^^^^^^^^^^^^^^^^^^^^ |
692 | 779 |
693 ``read_to_iter(fh)`` provides a mechanism to stream decompressed data out of a | 780 ``read_to_iter(fh)`` provides a mechanism to stream decompressed data out of a |
788 .. note:: | 875 .. note:: |
789 | 876 |
790 Because calls to ``decompress()`` may need to perform multiple | 877 Because calls to ``decompress()`` may need to perform multiple |
791 memory (re)allocations, this streaming decompression API isn't as | 878 memory (re)allocations, this streaming decompression API isn't as |
792 efficient as other APIs. | 879 efficient as other APIs. |
880 | |
881 For compatibility with the standard library APIs, instances expose a | |
882 ``flush([length=None])`` method. This method no-ops and has no meaningful | |
883 side-effects, making it safe to call any time. | |
793 | 884 |
794 Batch Decompression API | 885 Batch Decompression API |
795 ^^^^^^^^^^^^^^^^^^^^^^^ | 886 ^^^^^^^^^^^^^^^^^^^^^^^ |
796 | 887 |
797 (Experimental. Not yet supported in CFFI bindings.) | 888 (Experimental. Not yet supported in CFFI bindings.) |
1145 * hash_log | 1236 * hash_log |
1146 * chain_log | 1237 * chain_log |
1147 * search_log | 1238 * search_log |
1148 * min_match | 1239 * min_match |
1149 * target_length | 1240 * target_length |
1150 * compression_strategy | 1241 * strategy |
1242 * compression_strategy (deprecated: same as ``strategy``) | |
1151 * write_content_size | 1243 * write_content_size |
1152 * write_checksum | 1244 * write_checksum |
1153 * write_dict_id | 1245 * write_dict_id |
1154 * job_size | 1246 * job_size |
1155 * overlap_size_log | 1247 * overlap_log |
1248 * overlap_size_log (deprecated: same as ``overlap_log``) | |
1156 * force_max_window | 1249 * force_max_window |
1157 * enable_ldm | 1250 * enable_ldm |
1158 * ldm_hash_log | 1251 * ldm_hash_log |
1159 * ldm_min_match | 1252 * ldm_min_match |
1160 * ldm_bucket_size_log | 1253 * ldm_bucket_size_log |
1161 * ldm_hash_every_log | 1254 * ldm_hash_rate_log |
1255 * ldm_hash_every_log (deprecated: same as ``ldm_hash_rate_log``) | |
1162 * threads | 1256 * threads |
1163 | 1257 |
1164 Some of these are very low-level settings. It may help to consult the official | 1258 Some of these are very low-level settings. It may help to consult the official |
1165 zstandard documentation for their behavior. Look for the ``ZSTD_p_*`` constants | 1259 zstandard documentation for their behavior. Look for the ``ZSTD_p_*`` constants |
1166 in ``zstd.h`` (https://github.com/facebook/zstd/blob/dev/lib/zstd.h). | 1260 in ``zstd.h`` (https://github.com/facebook/zstd/blob/dev/lib/zstd.h). |
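Callers that still pass the deprecated parameter names may want to translate them before constructing compression parameters. The helper below is a hypothetical sketch and is not part of python-zstandard; only the three alias pairs are taken from the list above.

```python
import warnings

# Alias pairs taken from the parameter list above (deprecated -> current).
DEPRECATED_ALIASES = {
    "compression_strategy": "strategy",
    "overlap_size_log": "overlap_log",
    "ldm_hash_every_log": "ldm_hash_rate_log",
}


def normalize_params(**kwargs):
    """Hypothetical helper: rewrite deprecated names to their current ones."""
    normalized = {}
    for name, value in kwargs.items():
        current = DEPRECATED_ALIASES.get(name, name)
        if current != name:
            warnings.warn(
                f"{name} is deprecated; use {current}",
                DeprecationWarning,
                stacklevel=2,
            )
        if current in normalized:
            # Both the old and the new spelling were supplied.
            raise ValueError(f"{current} specified more than once")
        normalized[current] = value
    return normalized
```

For example, ``normalize_params(compression_strategy=1, overlap_log=6)`` yields a dict keyed by ``strategy`` and ``overlap_log`` and emits one ``DeprecationWarning``.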
1237 | 1331 |
1238 FRAME_HEADER | 1332 FRAME_HEADER |
1239 bytes containing header of the Zstandard frame | 1333 bytes containing header of the Zstandard frame |
1240 MAGIC_NUMBER | 1334 MAGIC_NUMBER |
1241 Frame header as an integer | 1335 Frame header as an integer |
1336 | |
1337 FLUSH_BLOCK | |
1338 Flushing behavior that flushes a zstd block. A decompressor will | |
1339 be able to decode all data fed into the compressor so far. | |
1340 FLUSH_FRAME | |
1341 Flushing behavior that ends a zstd frame. Any new data fed | |
1342 to the compressor will start a new frame. | |
1242 | 1343 |
1243 CONTENTSIZE_UNKNOWN | 1344 CONTENTSIZE_UNKNOWN |
1244 Value for content size when the content size is unknown. | 1345 Value for content size when the content size is unknown. |
1245 CONTENTSIZE_ERROR | 1346 CONTENTSIZE_ERROR |
1246 Value for content size when content size couldn't be determined. | 1347 Value for content size when content size couldn't be determined. |
1259 Maximum value for compression parameter | 1360 Maximum value for compression parameter |
1260 SEARCHLOG_MIN | 1361 SEARCHLOG_MIN |
1261 Minimum value for compression parameter | 1362 Minimum value for compression parameter |
1262 SEARCHLOG_MAX | 1363 SEARCHLOG_MAX |
1263 Maximum value for compression parameter | 1364 Maximum value for compression parameter |
1365 MINMATCH_MIN | |
1366 Minimum value for compression parameter | |
1367 MINMATCH_MAX | |
1368 Maximum value for compression parameter | |
1264 SEARCHLENGTH_MIN | 1369 SEARCHLENGTH_MIN |
1265 Minimum value for compression parameter | 1370 Minimum value for compression parameter |
1371 | |
1372 Deprecated: use ``MINMATCH_MIN`` | |
1266 SEARCHLENGTH_MAX | 1373 SEARCHLENGTH_MAX |
1267 Maximum value for compression parameter | 1374 Maximum value for compression parameter |
1375 | |
1376 Deprecated: use ``MINMATCH_MAX`` | |
1268 TARGETLENGTH_MIN | 1377 TARGETLENGTH_MIN |
1269 Minimum value for compression parameter | 1378 Minimum value for compression parameter |
1270 STRATEGY_FAST | 1379 STRATEGY_FAST |
1271 Compression strategy | 1380 Compression strategy |
1272 STRATEGY_DFAST | 1381 STRATEGY_DFAST |
1281 Compression strategy | 1390 Compression strategy |
1282 STRATEGY_BTOPT | 1391 STRATEGY_BTOPT |
1283 Compression strategy | 1392 Compression strategy |
1284 STRATEGY_BTULTRA | 1393 STRATEGY_BTULTRA |
1285 Compression strategy | 1394 Compression strategy |
1395 STRATEGY_BTULTRA2 | |
1396 Compression strategy | |
1286 | 1397 |
1287 FORMAT_ZSTD1 | 1398 FORMAT_ZSTD1 |
1288 Zstandard frame format | 1399 Zstandard frame format |
1289 FORMAT_ZSTD1_MAGICLESS | 1400 FORMAT_ZSTD1_MAGICLESS |
1290 Zstandard frame format without magic header | 1401 Zstandard frame format without magic header |