contrib/python-zstandard/README.rst
       
================
python-zstandard
================

This project provides a Python C extension for interfacing with the
`Zstandard <http://www.zstd.net>`_ compression library.

The primary goal of the extension is to provide a Pythonic interface to
the underlying C API. This means exposing most of the features and flexibility
of the C API while not sacrificing the usability or safety that Python provides.

|  |ci-status| |win-ci-status|
       
State of Project
================

The project is officially in beta state. The author is reasonably satisfied
with the current API and that functionality works as advertised. There
may be some backwards incompatible changes before 1.0, though the author
does not intend to make any major changes to the Python API.

There is continuous integration for Python versions 2.6, 2.7, and 3.3+
on Linux x86_64 and Windows x86 and x86_64. The author is reasonably
confident the extension is stable and works as advertised on these
platforms.
       
Expected Changes
----------------

The author is reasonably confident in the current state of what's
implemented on the ``ZstdCompressor`` and ``ZstdDecompressor`` types.
Those APIs likely won't change significantly. Some low-level behavior
(such as naming and types expected by arguments) may change.

There will likely be arguments added to control the input and output
buffer sizes (currently, certain operations read and write in chunk
sizes using zstd's preferred defaults).

There should be an API that accepts an object that conforms to the buffer
interface and returns an iterator over compressed or decompressed output.

The author is on the fence as to whether to support the extremely
low-level compression and decompression APIs. It could be useful to
support compression without the framing headers, but the author doesn't
believe that is a high priority at this time.

The CFFI bindings are half-baked and need to be finished.
       
Requirements
============

This extension is designed to run with Python 2.6, 2.7, 3.3, 3.4, and 3.5
on common platforms (Linux, Windows, and OS X). Only x86_64 is currently
well-tested as an architecture.

Installing
==========

This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard.
So, to install this package::

   $ pip install zstandard

Binary wheels are made available for some platforms. If you need to
install from a source distribution, all you should need is a working C
compiler and the Python development headers/libraries. On many Linux
distributions, you can install a ``python-dev`` or ``python-devel``
package to provide these dependencies.

Packages are also uploaded to Anaconda Cloud at
https://anaconda.org/indygreg/zstandard. See that URL for how to install
this package with ``conda``.
       
Performance
===========

Very crude and non-scientific benchmarking (most benchmarks fall in this
category because proper benchmarking is hard) shows that the Python bindings
perform within 10% of the native C implementation.

The following table compares the performance of compressing and decompressing
a 1.1 GB tar file comprised of the files in a Firefox source checkout. Values
obtained with the ``zstd`` program are on the left. The remaining columns detail
performance of various compression APIs in the Python bindings.

+-------+-----------------+-----------------+-----------------+---------------+
| Level | Native          | Simple          | Stream In       | Stream Out    |
|       | Comp / Decomp   | Comp / Decomp   | Comp / Decomp   | Comp          |
+=======+=================+=================+=================+===============+
|   1   | 490 / 1338 MB/s | 458 / 1266 MB/s | 407 / 1156 MB/s |  405 MB/s     |
+-------+-----------------+-----------------+-----------------+---------------+
|   2   | 412 / 1288 MB/s | 381 / 1203 MB/s | 345 / 1128 MB/s |  349 MB/s     |
+-------+-----------------+-----------------+-----------------+---------------+
|   3   | 342 / 1312 MB/s | 319 / 1182 MB/s | 285 / 1165 MB/s |  287 MB/s     |
+-------+-----------------+-----------------+-----------------+---------------+
|  11   |  64 / 1506 MB/s |  66 / 1436 MB/s |  56 / 1342 MB/s |   57 MB/s     |
+-------+-----------------+-----------------+-----------------+---------------+

Again, these numbers are very unscientific, but they show that Python is
capable of compressing at several hundred MB/s and decompressing at over
1 GB/s.
       
Comparison to Other Python Bindings
===================================

https://pypi.python.org/pypi/zstd is an alternative Python binding to
Zstandard. At the time this was written, the latest release of that
package (1.0.0.2) had the following significant differences from this package:

* It only exposes the simple API for compression and decompression operations.
  This extension exposes the streaming API, dictionary training, and more.
* It adds a custom framing header to compressed data and there is no way to
  disable it. This means that data produced with that module cannot be used by
  other Zstandard implementations.
       
Bundling of Zstandard Source Code
=================================

The source repository for this project contains a vendored copy of the
Zstandard source code. This is done for a few reasons.

First, Zstandard is relatively new and not yet widely available as a system
package. Providing a copy of the source code enables the Python C extension
to be compiled without requiring the user to obtain the Zstandard source code
separately.

Second, Zstandard has both a stable *public* API and an *experimental* API.
The *experimental* API is actually quite useful (it contains functionality
for training dictionaries, for example), so it is something we wish to expose
to Python. However, the *experimental* API is only available via static
linking. Furthermore, the *experimental* API can change at any time. So,
control over the exact version of the Zstandard library linked against is
important to ensure known behavior.
       
Instructions for Building and Testing
=====================================

Once you have the source code, the extension can be built via setup.py::

   $ python setup.py build_ext

We recommend testing with ``nose``::

   $ nosetests

A Tox configuration is present to test against multiple Python versions::

   $ tox

Tests use the ``hypothesis`` Python package to perform fuzzing. If you
don't have it, those tests won't run.

There is also an experimental CFFI module. You need the ``cffi`` Python
package installed to build and test that.

To create a virtualenv with all development dependencies, do something
like the following::

  # Python 2
  $ virtualenv venv

  # Python 3
  $ python3 -m venv venv

  $ source venv/bin/activate
  $ pip install cffi hypothesis nose tox
       
API
===

The compiled C extension provides a ``zstd`` Python module. This module
exposes the following interfaces.

ZstdCompressor
--------------

The ``ZstdCompressor`` class provides an interface for performing
compression operations.

Each instance is associated with parameters that control compression
behavior. These come from the following named arguments (all optional):

level
   Integer compression level. Valid values are between 1 and 22.
dict_data
   Compression dictionary to use.

   Note: When using dictionary data and ``compress()`` is called multiple
   times, the ``CompressionParameters`` derived from an integer compression
   ``level`` and the first compressed data's size will be reused for all
   subsequent operations. This may not be desirable if source data size
   varies significantly.
compression_params
   A ``CompressionParameters`` instance (overrides the ``level`` value).
write_checksum
   Whether a 4-byte checksum should be written with the compressed data.
   Defaults to False. If True, the decompressor can verify that decompressed
   data matches the original input data.
write_content_size
   Whether the size of the uncompressed data will be written into the
   header of compressed data. Defaults to False. The data will only be
   written if the compressor knows the size of the input data. This is
   likely not true for streaming compression.
write_dict_id
   Whether to write the dictionary ID into the compressed data.
   Defaults to True. The dictionary ID is only written if a dictionary
   is being used.
       
Simple API
^^^^^^^^^^

``compress(data)`` compresses and returns data as a one-shot operation::

   cctx = zstd.ZstdCompressor()
   compressed = cctx.compress(b'data to compress')
       
Streaming Input API
^^^^^^^^^^^^^^^^^^^

``write_to(fh)`` (which behaves as a context manager) allows you to *stream*
data into a compressor::

   cctx = zstd.ZstdCompressor(level=10)
   with cctx.write_to(fh) as compressor:
       compressor.write(b'chunk 0')
       compressor.write(b'chunk 1')
       ...

The argument to ``write_to()`` must have a ``write(data)`` method. As
compressed data is available, ``write()`` will be called with the compressed
data as its argument. Many common Python types implement ``write()``, including
open file handles and ``io.BytesIO``.

``write_to()`` returns an object representing a streaming compressor instance.
It **must** be used as a context manager. That object's ``write(data)`` method
is used to feed data into the compressor.

If the size of the data being fed to this streaming compressor is known,
you can declare it before compression begins::

   cctx = zstd.ZstdCompressor()
   with cctx.write_to(fh, size=data_len) as compressor:
       compressor.write(chunk0)
       compressor.write(chunk1)
       ...

Declaring the size of the source data allows compression parameters to
be tuned. And if ``write_content_size`` is used, it also results in the
content size being written into the frame header of the output data.

The size of chunks written to the destination can be specified::

    cctx = zstd.ZstdCompressor()
    with cctx.write_to(fh, write_size=32768) as compressor:
        ...

To see how much memory is being used by the streaming compressor::

    cctx = zstd.ZstdCompressor()
    with cctx.write_to(fh) as compressor:
        ...
        byte_size = compressor.memory_size()
       
Streaming Output API
^^^^^^^^^^^^^^^^^^^^

``read_from(reader)`` provides a mechanism to stream data out of a compressor
as an iterator of data chunks::

   cctx = zstd.ZstdCompressor()
   for chunk in cctx.read_from(fh):
       # Do something with emitted data.
       pass

``read_from()`` accepts an object that has a ``read(size)`` method or conforms
to the buffer protocol. (``bytes`` and ``memoryview`` are 2 common types that
provide the buffer protocol.)
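
The buffer-protocol point can be illustrated with the standard library alone.
This minimal sketch (independent of zstd) shows that slicing a ``memoryview``
does not copy the underlying ``bytes``, which is why passing one to an API
like ``read_from()`` can avoid redundant allocations:

```python
# bytes and memoryview both provide the buffer protocol. Slicing a
# memoryview creates a new view over the same underlying object
# rather than copying the payload.
data = b'x' * 1048576          # 1 MiB payload
view = memoryview(data)
window = view[4:]              # no copy of the megabyte of data
assert len(window) == len(data) - 4
assert window.obj is data      # still backed by the original bytes
```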
       
Uncompressed data is fetched from the source either by calling ``read(size)``
or by fetching a slice of data from the object directly (in the case where
the buffer protocol is being used). The returned iterator consists of chunks
of compressed data.

Like ``write_to()``, ``read_from()`` also accepts a ``size`` argument
declaring the size of the input stream::

    cctx = zstd.ZstdCompressor()
    for chunk in cctx.read_from(fh, size=some_int):
        pass

You can also control the size that data is ``read()`` from the source and
the ideal size of output chunks::

    cctx = zstd.ZstdCompressor()
    for chunk in cctx.read_from(fh, read_size=16384, write_size=8192):
        pass
       
Stream Copying API
^^^^^^^^^^^^^^^^^^

``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while
compressing it::

   cctx = zstd.ZstdCompressor()
   cctx.copy_stream(ifh, ofh)

For example, say you wish to compress a file::

   cctx = zstd.ZstdCompressor()
   with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
       cctx.copy_stream(ifh, ofh)

It is also possible to declare the size of the source stream::

   cctx = zstd.ZstdCompressor()
   cctx.copy_stream(ifh, ofh, size=len_of_input)

You can also specify the size of the chunks ``read()`` from and written to
the streams::

   cctx = zstd.ZstdCompressor()
   cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384)

The stream copier returns a 2-tuple of bytes read and written::

   cctx = zstd.ZstdCompressor()
   read_count, write_count = cctx.copy_stream(ifh, ofh)
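
As a rough illustration of what a stream-copying helper does under the hood,
here is a minimal sketch using the stdlib ``zlib`` module in place of zstd.
The helper name and its internals are our own illustration, not this
package's implementation; only the loop shape, the ``read_size`` parameter,
and the returned 2-tuple mirror the description above:

```python
import io
import zlib

def copy_stream(ifh, ofh, read_size=8192):
    """Sketch of a compressing stream copy: read chunks, compress,
    write, and report a (bytes_read, bytes_written) 2-tuple."""
    cobj = zlib.compressobj()
    read_count = write_count = 0
    while True:
        chunk = ifh.read(read_size)
        if not chunk:
            break
        read_count += len(chunk)
        out = cobj.compress(chunk)
        ofh.write(out)
        write_count += len(out)
    tail = cobj.flush()  # emit any buffered data and finish the stream
    ofh.write(tail)
    write_count += len(tail)
    return read_count, write_count

src = io.BytesIO(b'data' * 1000)
dst = io.BytesIO()
read_count, write_count = copy_stream(src, dst)
assert read_count == 4000
assert zlib.decompress(dst.getvalue()) == b'data' * 1000
```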
       
Compressor API
^^^^^^^^^^^^^^

``compressobj()`` returns an object that exposes ``compress(data)`` and
``flush()`` methods. Each returns compressed data or an empty bytes object.

The purpose of ``compressobj()`` is to provide an API-compatible interface
with ``zlib.compressobj`` and ``bz2.BZ2Compressor``. This allows callers to
swap in different compressor objects while using the same API.
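
The API compatibility can be sketched with the standard library: the helper
below (an illustrative function of ours, not part of this package) works
unchanged with ``zlib.compressobj()``, ``bz2.BZ2Compressor()``, or a
``cctx.compressobj()`` from this extension:

```python
import bz2
import zlib

def compress_all(cobj, chunks):
    # Works with any object exposing compress(data) and flush(),
    # including the compressobj() described above.
    out = [cobj.compress(chunk) for chunk in chunks]
    out.append(cobj.flush())
    return b''.join(out)

payload = [b'raw input 0', b'raw input 1']
z = compress_all(zlib.compressobj(), payload)
b = compress_all(bz2.BZ2Compressor(), payload)
assert zlib.decompress(z) == b''.join(payload)
assert bz2.decompress(b) == b''.join(payload)
```

Writing code against this shared shape is what lets an application switch
compression formats without touching its I/O logic.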
       
Once ``flush()`` is called, the compressor will no longer accept new data
to ``compress()``. ``flush()`` **must** be called to end the compression
context. If not called, the returned data may be incomplete.

Here is how this API should be used::

   cctx = zstd.ZstdCompressor()
   cobj = cctx.compressobj()
   data = cobj.compress(b'raw input 0')
   data = cobj.compress(b'raw input 1')
   data = cobj.flush()

For best performance results, keep input chunks under 256KB. This avoids
extra allocations for a large output object.

It is possible to declare the input size of the data that will be fed into
the compressor::

   cctx = zstd.ZstdCompressor()
   cobj = cctx.compressobj(size=6)
   data = cobj.compress(b'foobar')
   data = cobj.flush()
       
ZstdDecompressor
----------------

The ``ZstdDecompressor`` class provides an interface for performing
decompression.

Each instance is associated with parameters that control decompression. These
come from the following named arguments (all optional):

dict_data
   Compression dictionary to use.

The interface of this class is very similar to ``ZstdCompressor`` (by design).
       
Simple API
^^^^^^^^^^

``decompress(data)`` can be used to decompress an entire compressed zstd
frame in a single operation::

    dctx = zstd.ZstdDecompressor()
    decompressed = dctx.decompress(data)

By default, ``decompress(data)`` will only work on data written with the content
size encoded in its header. This can be achieved by creating a
``ZstdCompressor`` with ``write_content_size=True``. If compressed data without
an embedded content size is seen, ``zstd.ZstdError`` will be raised.

If the compressed data doesn't have its content size embedded within it,
decompression can be attempted by specifying the ``max_output_size``
argument::

    dctx = zstd.ZstdDecompressor()
    uncompressed = dctx.decompress(data, max_output_size=1048576)

Ideally, ``max_output_size`` will be identical to the decompressed output
size.

If ``max_output_size`` is too small to hold the decompressed data,
``zstd.ZstdError`` will be raised.

If ``max_output_size`` is larger than the decompressed data, the allocated
output buffer will be resized to only use the space required.

Please note that an allocation of the requested ``max_output_size`` will be
performed every time the method is called. Setting this to a very large value
could result in a lot of work for the memory allocator and may result in
``MemoryError`` being raised if the allocation fails.

If the exact size of decompressed data is unknown, it is **strongly**
recommended to use a streaming API.
       
Streaming Input API
^^^^^^^^^^^^^^^^^^^

``write_to(fh)`` can be used to incrementally send compressed data to a
decompressor::

    dctx = zstd.ZstdDecompressor()
    with dctx.write_to(fh) as decompressor:
        decompressor.write(compressed_data)

This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to
the decompressor by calling ``write(data)`` and decompressed output is written
to the output object by calling its ``write(data)`` method.

The size of chunks written to the destination can be specified::

    dctx = zstd.ZstdDecompressor()
    with dctx.write_to(fh, write_size=16384) as decompressor:
        pass

You can see how much memory is being used by the decompressor::

    dctx = zstd.ZstdDecompressor()
    with dctx.write_to(fh) as decompressor:
        byte_size = decompressor.memory_size()
       
Streaming Output API
^^^^^^^^^^^^^^^^^^^^

``read_from(fh)`` provides a mechanism to stream decompressed data out of a
compressed source as an iterator of data chunks::

    dctx = zstd.ZstdDecompressor()
    for chunk in dctx.read_from(fh):
        # Do something with original data.
        pass

``read_from()`` accepts either a) an object with a ``read(size)`` method that
will return compressed bytes, or b) an object conforming to the buffer
protocol that can expose its data as a contiguous range of bytes. The
``bytes`` and ``memoryview`` types expose this buffer protocol.

``read_from()`` returns an iterator whose elements are chunks of the
decompressed data.

The size of requested ``read()`` from the source can be specified::

    dctx = zstd.ZstdDecompressor()
    for chunk in dctx.read_from(fh, read_size=16384):
        pass

It is also possible to skip leading bytes in the input data::

    dctx = zstd.ZstdDecompressor()
    for chunk in dctx.read_from(fh, skip_bytes=1):
        pass

Skipping leading bytes is useful if the source data contains extra
*header* data but you want to avoid the overhead of making a buffer copy
or allocating a new ``memoryview`` object in order to decompress the data.

Similarly to ``ZstdCompressor.read_from()``, the consumer of the iterator
controls when data is decompressed. If the iterator isn't consumed,
decompression is put on hold.

When ``read_from()`` is passed an object conforming to the buffer protocol,
the behavior may seem similar to what occurs when the simple decompression
API is used. However, this API works when the decompressed size is unknown.
Furthermore, if feeding large inputs, the decompressor will work in chunks
instead of performing a single operation.
       
Stream Copying API
^^^^^^^^^^^^^^^^^^

``copy_stream(ifh, ofh)`` can be used to copy data across 2 streams while
performing decompression::

    dctx = zstd.ZstdDecompressor()
    dctx.copy_stream(ifh, ofh)

For example, to decompress a file to another file::

    dctx = zstd.ZstdDecompressor()
    with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
        dctx.copy_stream(ifh, ofh)

The size of the chunks ``read()`` from and written to the streams can be
specified::

    dctx = zstd.ZstdDecompressor()
    dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384)
       
Decompressor API
^^^^^^^^^^^^^^^^

``decompressobj()`` returns an object that exposes a ``decompress(data)``
method. Compressed data chunks are fed into ``decompress(data)`` and
uncompressed output (or an empty bytes object) is returned. Output from
subsequent calls needs to be concatenated to reassemble the full decompressed
byte sequence.
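
The feed-and-concatenate pattern looks like this. The sketch uses the stdlib
``zlib.decompressobj()``, which the text names as the API-compatible model
for this method; the chunk size of 64 is an arbitrary choice for illustration:

```python
import zlib

original = b'hello world' * 100
compressed = zlib.compress(original)

dobj = zlib.decompressobj()
# Feed compressed data in small chunks and concatenate the outputs
# (individual calls may return empty bytes) to reassemble the original.
parts = [dobj.decompress(compressed[i:i + 64])
         for i in range(0, len(compressed), 64)]
result = b''.join(parts) + dobj.flush()
assert result == original
```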
       
The purpose of ``decompressobj()`` is to provide an API-compatible interface
with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers
to swap in different decompressor objects while using the same API.

Each object is single use: once an input frame is decoded, ``decompress()``
can no longer be called.

Here is how this API should be used::

   dctx = zstd.ZstdDecompressor()
   dobj = dctx.decompressobj()
   data = dobj.decompress(compressed_chunk_0)
   data = dobj.decompress(compressed_chunk_1)
       
Choosing an API
---------------

Various forms of compression and decompression APIs are provided because each
is suited to different use cases.

The simple/one-shot APIs are useful for small data, when the decompressed
data size is known (either recorded in the zstd frame header via
``write_content_size`` or known via an out-of-band mechanism, such as a file
size).

A limitation of the simple APIs is that input or output data must fit in memory.
And unless using advanced tricks with Python *buffer objects*, both input and
output must fit in memory simultaneously.

Another limitation is that compression or decompression is performed as a single
operation. So if you feed large input, it could take a long time for the
function to return.

The streaming APIs do not have the limitations of the simple API. The cost of
this is that they are more complex to use than a single function call.

The streaming APIs put the caller in control of compression and decompression
behavior by allowing them to directly control either the input or output side
of the operation.

With the streaming input APIs, the caller feeds data into the compressor or
decompressor as they see fit. Output data will only be written after the caller
has explicitly written data.

With the streaming output APIs, the caller consumes output from the compressor
or decompressor as they see fit. The compressor or decompressor will only
consume data from the source when the caller is ready to receive it.

One end of the streaming APIs involves a file-like object that must
``write()`` output data or ``read()`` input data. Depending on what the
backing storage for these objects is, those operations may not complete quickly.
For example, when streaming compressed data to a file, the ``write()`` into
a streaming compressor could result in a ``write()`` to the filesystem, which
may take a long time to finish due to slow I/O on the filesystem. So, there
may be overhead in streaming APIs beyond the compression and decompression
operations.
       
Dictionary Creation and Management
----------------------------------

Zstandard allows *dictionaries* to be used when compressing and
decompressing data. The idea is that if you are compressing a lot of similar
data, you can precompute common properties of that data (such as recurring
byte sequences) to achieve better compression ratios.
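
The benefit is easy to see even with the standard library: ``zlib`` supports
seeding a compressor with a preset dictionary via its ``zdict`` argument,
which serves the same purpose as zstd's dictionaries. The dictionary and
sample payload below are made-up illustrations:

```python
import zlib

# A "dictionary" of byte sequences expected to recur in the data.
dictionary = b'{"name": "", "value": ""}'
sample = b'{"name": "example", "value": "42"}'

plain = zlib.compress(sample)

cobj = zlib.compressobj(zdict=dictionary)
with_dict = cobj.compress(sample) + cobj.flush()

# Matches against the preset dictionary shrink the output, and a
# decompressor seeded with the same dictionary recovers the input.
assert len(with_dict) < len(plain)
assert zlib.decompressobj(zdict=dictionary).decompress(with_dict) == sample
```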
       
   577 In Python, compression dictionaries are represented as the
       
   578 ``ZstdCompressionDict`` type.
       
   579 
       
   580 Instances can be constructed from bytes::
       
   581 
       
   582    dict_data = zstd.ZstdCompressionDict(data)
       
   583 
       
   584 More interestingly, instances can be created by *training* on sample data::
       
   585 
       
   586    dict_data = zstd.train_dictionary(size, samples)
       
   587 
       
   588 This takes a list of bytes instances and creates and returns a
       
   589 ``ZstdCompressionDict``.
       
   590 
       
   591 You can see how many bytes are in the dictionary by calling ``len()``::
       
   592 
       
   593    dict_data = zstd.train_dictionary(size, samples)
       
   594    dict_size = len(dict_data)  # will not be larger than ``size``
       
   595 
       
   596 Once you have a dictionary, you can pass it to the objects performing
       
   597 compression and decompression::
       
   598 
       
   599    dict_data = zstd.train_dictionary(16384, samples)
       
   600 
       
   601    cctx = zstd.ZstdCompressor(dict_data=dict_data)
       
   602    for source_data in input_data:
       
   603        compressed = cctx.compress(source_data)
       
   604        # Do something with compressed data.
       
   605 
       
   606    dctx = zstd.ZstdDecompressor(dict_data=dict_data)
       
   607    for compressed_data in input_data:
       
   608        buffer = io.BytesIO()
       
   609        with dctx.write_to(buffer) as decompressor:
       
   610            decompressor.write(compressed_data)
       
   611        # Do something with raw data in ``buffer``.
       
   612 
       
   613 Dictionaries have unique integer IDs. You can retrieve this ID via::
       
   614 
       
   615    dict_id = zstd.dictionary_id(dict_data)
       
   616 
       
   617 You can obtain the raw data in the dict (useful for persisting and constructing
       
   618 a ``ZstdCompressionDict`` later) via ``as_bytes()``::
       
   619 
       
   620    dict_data = zstd.train_dictionary(size, samples)
       
   621    raw_data = dict_data.as_bytes()
       
   622 
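The same preset-dictionary idea exists in Python's built-in ``zlib`` module, which makes for a convenient self-contained illustration of why dictionaries help on small, similar inputs. The sample payloads and dictionary below are made up for the example; only the concept carries over to Zstandard:

```python
import zlib

# A hypothetical preset dictionary containing byte sequences common to
# the kind of payloads we expect to compress.
dictionary = b'{"user": "", "action": "login"}{"user": "", "action": "logout"}'

data = b'{"user": "carol", "action": "login"}'

# Compress with and without the preset dictionary.
plain = zlib.compress(data)
c = zlib.compressobj(zdict=dictionary)
with_dict = c.compress(data) + c.flush()

# The decompressor must be given the same dictionary.
d = zlib.decompressobj(zdict=dictionary)
assert d.decompress(with_dict) == data

# The dictionary-assisted output is smaller because common byte
# sequences are encoded as back-references into the dictionary.
assert len(with_dict) < len(plain)
```

This is also why the decompression examples above pass the same ``dict_data`` to ``ZstdDecompressor``: without the dictionary used at compression time, the data cannot be decoded.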
       
   623 Explicit Compression Parameters
       
   624 -------------------------------
       
   625 
       
   626 Zstandard's integer compression levels along with the input size and dictionary
       
   627 size are converted into a data structure defining multiple parameters to tune
       
   628 behavior of the compression algorithm. It is possible to define this
       
   629 data structure explicitly to have lower-level control over compression behavior.
       
   630 
       
   631 The ``zstd.CompressionParameters`` type represents this data structure.
       
   632 You can see how Zstandard converts compression levels to this data structure
       
   633 by calling ``zstd.get_compression_parameters()``. e.g.::
       
   634 
       
   635     params = zstd.get_compression_parameters(5)
       
   636 
       
   637 This function also accepts the uncompressed data size and dictionary size
       
   638 to adjust parameters::
       
   639 
       
   640     params = zstd.get_compression_parameters(3, source_size=len(data), dict_size=len(dict_data))
       
   641 
       
   642 You can also construct compression parameters from their low-level components::
       
   643 
       
   644     params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST)
       
   645 
       
   646 You can then configure a compressor to use the custom parameters::
       
   647 
       
   648     cctx = zstd.ZstdCompressor(compression_params=params)
       
   649 
       
   650 The members of the ``CompressionParameters`` tuple are as follows:
       
   651 
       
   652 * 0 - Window log
       
   653 * 1 - Chain log
       
   654 * 2 - Hash log
       
   655 * 3 - Search log
       
   656 * 4 - Search length
       
   657 * 5 - Target length
       
   658 * 6 - Strategy (one of the ``zstd.STRATEGY_`` constants)
       
   659 
       
   660 You'll need to read the Zstandard documentation for what these parameters
       
   661 do.
       
   662 
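As a rough intuition for the ``*log`` members: each is a base-2 logarithm of a table or window size, so incrementing one doubles the corresponding size (and typically the memory required). A small arithmetic sketch, not anything this module computes for you:

```python
# A window log of 20 corresponds to a 2**20 byte (1 MiB) match window;
# raising the window log by 1 doubles the window.
window_log = 20
window_size = 2 ** window_log
assert window_size == 1024 * 1024      # 1 MiB
assert 2 ** (window_log + 1) == 2 * window_size
```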
       
   663 Misc Functionality
       
   664 ------------------
       
   665 
       
   666 estimate_compression_context_size(CompressionParameters)
       
   667 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       
   668 
       
   669 Given a ``CompressionParameters`` struct, estimate the memory size required
       
   670 to perform compression.
       
   671 
       
   672 estimate_decompression_context_size()
       
   673 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       
   674 
       
   675 Estimate the memory size requirements for a decompressor instance.
       
   676 
       
   677 Constants
       
   678 ---------
       
   679 
       
   680 The following module constants/attributes are exposed:
       
   681 
       
   682 ZSTD_VERSION
       
   683     This module attribute exposes a 3-tuple of the Zstandard version. e.g.
       
   684     ``(1, 0, 0)``
       
   685 MAX_COMPRESSION_LEVEL
       
   686     Integer max compression level accepted by compression functions
       
   687 COMPRESSION_RECOMMENDED_INPUT_SIZE
       
   688     Recommended chunk size to feed to compressor functions
       
   689 COMPRESSION_RECOMMENDED_OUTPUT_SIZE
       
   690     Recommended chunk size for compression output
       
   691 DECOMPRESSION_RECOMMENDED_INPUT_SIZE
       
   692     Recommended chunk size to feed into decompressor functions
       
   693 DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE
       
   694     Recommended chunk size for decompression output
       
   695 
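A typical use of the recommended sizes is to drive a read loop when streaming data into a compressor or decompressor. The helper below is a generic sketch: ``read_chunks`` is a made-up name, and the 128 KiB constant merely stands in for whatever value the module actually reports:

```python
import io

CHUNK_SIZE = 131072  # stand-in for COMPRESSION_RECOMMENDED_INPUT_SIZE

def read_chunks(fh, size=CHUNK_SIZE):
    """Yield successive reads from a file object until EOF."""
    while True:
        chunk = fh.read(size)
        if not chunk:
            return
        yield chunk

# 300,000 bytes split into two full chunks and a remainder.
chunks = list(read_chunks(io.BytesIO(b'\x00' * 300000)))
assert [len(c) for c in chunks] == [131072, 131072, 37856]
```

Each yielded chunk would then be fed to e.g. a ``write_to`` or ``compressobj`` style API.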
       
   696 FRAME_HEADER
       
   697     Bytes containing the header of a Zstandard frame
       
   698 MAGIC_NUMBER
       
   699     Frame magic number as an integer
       
   700 
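These two constants describe the same value in different forms: the Zstandard frame format specifies the magic number ``0xFD2FB528``, stored little-endian as the first four bytes of every frame. A quick sketch of the relationship; the sniffing helper is illustrative, not part of this module:

```python
import struct

MAGIC_NUMBER = 0xFD2FB528  # per the Zstandard frame format specification
FRAME_HEADER = struct.pack('<I', MAGIC_NUMBER)  # little-endian on the wire

def looks_like_zstd(data):
    """Cheap sniff: does this byte string start with a Zstandard frame?"""
    return data[:4] == FRAME_HEADER

assert FRAME_HEADER == b'\x28\xb5\x2f\xfd'
assert looks_like_zstd(b'\x28\xb5\x2f\xfd' + b'rest-of-frame')
```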
       
   701 WINDOWLOG_MIN

   702     Minimum value for the window log compression parameter

   703 WINDOWLOG_MAX

   704     Maximum value for the window log compression parameter

   705 CHAINLOG_MIN

   706     Minimum value for the chain log compression parameter

   707 CHAINLOG_MAX

   708     Maximum value for the chain log compression parameter

   709 HASHLOG_MIN

   710     Minimum value for the hash log compression parameter

   711 HASHLOG_MAX

   712     Maximum value for the hash log compression parameter

   713 SEARCHLOG_MIN

   714     Minimum value for the search log compression parameter

   715 SEARCHLOG_MAX

   716     Maximum value for the search log compression parameter

   717 SEARCHLENGTH_MIN

   718     Minimum value for the search length compression parameter

   719 SEARCHLENGTH_MAX

   720     Maximum value for the search length compression parameter

   721 TARGETLENGTH_MIN

   722     Minimum value for the target length compression parameter

   723 TARGETLENGTH_MAX

   724     Maximum value for the target length compression parameter
       
   725 STRATEGY_FAST

   726     Compression strategy

   727 STRATEGY_DFAST

   728     Compression strategy

   729 STRATEGY_GREEDY

   730     Compression strategy

   731 STRATEGY_LAZY

   732     Compression strategy

   733 STRATEGY_LAZY2

   734     Compression strategy

   735 STRATEGY_BTLAZY2

   736     Compression strategy

   737 STRATEGY_BTOPT

   738     Compression strategy
       
   739 
       
   740 Note on Zstandard's *Experimental* API
       
   741 ======================================
       
   742 
       
   743 Many of the Zstandard APIs used by this module are marked as *experimental*
       
   744 within the Zstandard project. This includes a large number of useful
       
   745 features, such as compression and frame parameters and parts of dictionary
       
   746 compression.
       
   747 
       
   748 It is unclear how Zstandard's C API will evolve over time, especially with
       
   749 regard to this *experimental* functionality. We will try to maintain
       
   750 backwards compatibility at the Python API level. However, we cannot
       
   751 guarantee this for things not under our control.
       
   752 
       
   753 Since a copy of the Zstandard source code is distributed with this
       
   754 module and since we compile against it, the behavior of a specific
       
   755 version of this module should remain constant over time. So if you

   756 pin the version of this module used in your projects (which is a Python

   757 best practice), you should be insulated from unwanted future changes.
       
   758 
       
   759 Donate
       
   760 ======
       
   761 
       
   762 A lot of time has been invested in this project by the author.
       
   763 
       
   764 If you find this project useful and would like to thank the author for
       
   765 their work, consider donating some money. Any amount is appreciated.
       
   766 
       
   767 .. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif
       
   768     :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted
       
   769     :alt: Donate via PayPal
       
   770 
       
   771 .. |ci-status| image:: https://travis-ci.org/indygreg/python-zstandard.svg?branch=master
       
   772     :target: https://travis-ci.org/indygreg/python-zstandard
       
   773 
       
   774 .. |win-ci-status| image:: https://ci.appveyor.com/api/projects/status/github/indygreg/python-zstandard?svg=true
       
   775     :target: https://ci.appveyor.com/project/indygreg/python-zstandard
       
   776     :alt: Windows build status