comparison contrib/python-zstandard/README.rst @ 30822:b54a2984cdd4

zstd: vendor python-zstandard 0.6.0

Commit 63c68d6f5fc8de4afd9bde81b13b537beb4e47e8 from
https://github.com/indygreg/python-zstandard is imported without
modifications (other than removing unwanted files).

This includes minor performance and feature improvements. It also changes
the vendored zstd library from 1.1.1 to 1.1.2.

# no-check-commit
author Gregory Szorc <gregory.szorc@gmail.com>
date Sat, 14 Jan 2017 19:41:43 -0800
parents b86a448a2965
children c32454d69b85
30821:7005c03f7387 30822:b54a2984cdd4
1 ================ 1 ================
2 python-zstandard 2 python-zstandard
3 ================ 3 ================
4 4
5 This project provides a Python C extension for interfacing with the 5 This project provides Python bindings for interfacing with the
6 `Zstandard <http://www.zstd.net>`_ compression library. 6 `Zstandard <http://www.zstd.net>`_ compression library. A C extension
7 and CFFI interface are provided.
7 8
8 The primary goal of the extension is to provide a Pythonic interface to 9 The primary goal of the extension is to provide a Pythonic interface to
9 the underlying C API. This means exposing most of the features and flexibility 10 the underlying C API. This means exposing most of the features and flexibility
10 of the C API while not sacrificing usability or safety that Python provides. 11 of the C API while not sacrificing usability or safety that Python provides.
12
13 The canonical home for this project is
14 https://github.com/indygreg/python-zstandard.
11 15
12 | |ci-status| |win-ci-status| 16 | |ci-status| |win-ci-status|
13 17
14 State of Project 18 State of Project
15 ================ 19 ================
203 write_dict_id 207 write_dict_id
204 Whether to write the dictionary ID into the compressed data. 208 Whether to write the dictionary ID into the compressed data.
205 Defaults to True. The dictionary ID is only written if a dictionary 209 Defaults to True. The dictionary ID is only written if a dictionary
206 is being used. 210 is being used.
207 211
212 Unless specified otherwise, assume that no two methods of ``ZstdCompressor``
213 instances can be called from multiple Python threads simultaneously. In other
214 words, assume instances are not thread safe unless stated otherwise.
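For example, code that compresses on several threads can give each thread its
own compressor instance rather than sharing one. The following is only a
sketch; the ``compress_chunk`` helper, the ``threading`` usage, and the sample
data are illustrative, not part of this library's API::

   import threading

   def compress_chunk(data, results, index):
       # Each thread constructs its own ZstdCompressor instead of
       # sharing a single instance across threads.
       cctx = zstd.ZstdCompressor()
       results[index] = cctx.compress(data)

   chunks = [b'chunk 0', b'chunk 1']
   results = [None] * len(chunks)
   threads = [threading.Thread(target=compress_chunk, args=(chunk, results, i))
              for i, chunk in enumerate(chunks)]
   for t in threads:
       t.start()
   for t in threads:
       t.join()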
215
208 Simple API 216 Simple API
209 ^^^^^^^^^^ 217 ^^^^^^^^^^
210 218
211 ``compress(data)`` compresses and returns data as a one-shot operation.:: 219 ``compress(data)`` compresses and returns data as a one-shot operation.::
212 220
213 cctx = zstd.ZsdCompressor() 221 cctx = zstd.ZstdCompressor()
214 compressed = cctx.compress(b'data to compress') 222 compressed = cctx.compress(b'data to compress')
223
224 Unless ``compression_params`` or ``dict_data`` are passed to the
225 ``ZstdCompressor``, each invocation of ``compress()`` will calculate the
226 optimal compression parameters for the configured compression ``level`` and
227 input data size (some parameters are fine-tuned for small input sizes).
228
229 If a compression dictionary is being used, the compression parameters
230 determined from the first input's size will be reused for subsequent
231 operations.
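As a sketch of that behavior, assume ``dict_data`` is a compression
dictionary (e.g. obtained from ``zstd.train_dictionary()``) and that
``first_input`` and ``second_input`` are placeholder byte strings::

   cctx = zstd.ZstdCompressor(dict_data=dict_data)
   # Compression parameters are derived from the size of this first input...
   compressed_a = cctx.compress(first_input)
   # ...and reused here, even if the second input has a different size.
   compressed_b = cctx.compress(second_input)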
232
233 There is currently a deficiency in zstd's C APIs that makes it difficult
234 to round trip empty inputs when ``write_content_size=True``. Attempting
235 this will raise a ``ValueError`` unless ``allow_empty=True`` is passed
236 to ``compress()``.
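For example, compressing an empty input with content sizes enabled requires
opting in explicitly (a minimal sketch)::

   cctx = zstd.ZstdCompressor(write_content_size=True)
   compressed = cctx.compress(b'', allow_empty=True)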
215 237
216 Streaming Input API 238 Streaming Input API
217 ^^^^^^^^^^^^^^^^^^^ 239 ^^^^^^^^^^^^^^^^^^^
218 240
219 ``write_to(fh)`` (which behaves as a context manager) allows you to *stream* 241 ``write_to(fh)`` (which behaves as a context manager) allows you to *stream*
224 compressor.write(b'chunk 0') 246 compressor.write(b'chunk 0')
225 compressor.write(b'chunk 1') 247 compressor.write(b'chunk 1')
226 ... 248 ...
227 249
228 The argument to ``write_to()`` must have a ``write(data)`` method. As 250 The argument to ``write_to()`` must have a ``write(data)`` method. As
229 compressed data is available, ``write()`` will be called with the comrpessed 251 compressed data is available, ``write()`` will be called with the compressed
230 data as its argument. Many common Python types implement ``write()``, including 252 data as its argument. Many common Python types implement ``write()``, including
231 open file handles and ``io.BytesIO``. 253 open file handles and ``io.BytesIO``.
232 254
233 ``write_to()`` returns an object representing a streaming compressor instance. 255 ``write_to()`` returns an object representing a streaming compressor instance.
234 It **must** be used as a context manager. That object's ``write(data)`` method 256 It **must** be used as a context manager. That object's ``write(data)`` method
235 is used to feed data into the compressor. 257 is used to feed data into the compressor.
258
259 A ``flush()`` method can be called to evict whatever data remains within the
260 compressor's internal state into the output object. This may result in 0 or
261 more ``write()`` calls to the output object.
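For example, to push buffered data to the output before writing more input
(a sketch, assuming ``fh`` is an open, writable file object)::

   cctx = zstd.ZstdCompressor()
   with cctx.write_to(fh) as compressor:
       compressor.write(b'chunk 0')
       # Evict whatever the compressor has buffered into fh now.
       compressor.flush()
       compressor.write(b'chunk 1')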
236 262
237 If the size of the data being fed to this streaming compressor is known, 263 If the size of the data being fed to this streaming compressor is known,
238 you can declare it before compression begins:: 264 you can declare it before compression begins::
239 265
240 cctx = zstd.ZstdCompressor() 266 cctx = zstd.ZstdCompressor()
277 Uncompressed data is fetched from the source either by calling ``read(size)`` 303 Uncompressed data is fetched from the source either by calling ``read(size)``
278 or by fetching a slice of data from the object directly (in the case where 304 or by fetching a slice of data from the object directly (in the case where
279 the buffer protocol is being used). The returned iterator consists of chunks 305 the buffer protocol is being used). The returned iterator consists of chunks
280 of compressed data. 306 of compressed data.
281 307
308 If reading from the source via ``read()``, ``read()`` will be called until
309 it raises or returns an empty bytes (``b''``). It is perfectly valid for
310 the source to deliver fewer bytes than were requested by ``read(size)``.
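For instance, any object exposing a conforming ``read()``, such as
``io.BytesIO``, can serve as the source (a minimal sketch)::

   import io

   source = io.BytesIO(b'data to compress')
   cctx = zstd.ZstdCompressor()
   for chunk in cctx.read_from(source):
       # chunk is a bytes instance of compressed data.
       pass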
311
282 Like ``write_to()``, ``read_from()`` also accepts a ``size`` argument 312 Like ``write_to()``, ``read_from()`` also accepts a ``size`` argument
283 declaring the size of the input stream:: 313 declaring the size of the input stream::
284 314
285 cctx = zstd.ZstdCompressor() 315 cctx = zstd.ZstdCompressor()
286 for chunk in cctx.read_from(fh, size=some_int): 316 for chunk in cctx.read_from(fh, size=some_int):
291 321
292 cctx = zstd.ZstdCompressor() 322 cctx = zstd.ZstdCompressor()
293 for chunk in cctx.read_from(fh, read_size=16384, write_size=8192): 323 for chunk in cctx.read_from(fh, read_size=16384, write_size=8192):
294 pass 324 pass
295 325
326 Unlike ``write_to()``, ``read_from()`` does not give direct control over the
327 sizes of chunks fed into the compressor. Instead, chunk sizes will be whatever
328 the object being read from delivers. These will often be of a uniform size.
329
296 Stream Copying API 330 Stream Copying API
297 ^^^^^^^^^^^^^^^^^^ 331 ^^^^^^^^^^^^^^^^^^
298 332
299 ``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while 333 ``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while
300 compressing it.:: 334 compressing it.::
332 366
333 The purpose of ``compressobj()`` is to provide an API-compatible interface 367 The purpose of ``compressobj()`` is to provide an API-compatible interface
334 with ``zlib.compressobj`` and ``bz2.BZ2Compressor``. This allows callers to 368 with ``zlib.compressobj`` and ``bz2.BZ2Compressor``. This allows callers to
335 swap in different compressor objects while using the same API. 369 swap in different compressor objects while using the same API.
336 370
337 Once ``flush()`` is called, the compressor will no longer accept new data 371 ``flush()`` accepts an optional argument indicating how to end the stream.
338 to ``compress()``. ``flush()`` **must** be called to end the compression 372 ``zstd.COMPRESSOBJ_FLUSH_FINISH`` (the default) ends the compression stream.
339 context. If not called, the returned data may be incomplete. 373 Once this type of flush is performed, ``compress()`` and ``flush()`` can
374 no longer be called. This type of flush **must** be called to end the
375 compression context. If not called, returned data may be incomplete.
376
377 A ``zstd.COMPRESSOBJ_FLUSH_BLOCK`` argument to ``flush()`` will flush a
378 zstd block. Flushes of this type can be performed multiple times. The next
379 call to ``compress()`` will begin a new zstd block.
340 380
341 Here is how this API should be used:: 381 Here is how this API should be used::
342 382
343 cctx = zstd.ZstdCompressor() 383 cctx = zstd.ZstdCompressor()
344 cobj = cctx.compressobj() 384 cobj = cctx.compressobj()
345 data = cobj.compress(b'raw input 0') 385 data = cobj.compress(b'raw input 0')
346 data = cobj.compress(b'raw input 1') 386 data = cobj.compress(b'raw input 1')
347 data = cobj.flush() 387 data = cobj.flush()
348 388
389 Or to flush blocks::
390
391 cctx = zstd.ZstdCompressor()
392 cobj = cctx.compressobj()
393 data = cobj.compress(b'chunk in first block')
394 data = cobj.flush(zstd.COMPRESSOBJ_FLUSH_BLOCK)
395 data = cobj.compress(b'chunk in second block')
396 data = cobj.flush()
397
349 For best performance results, keep input chunks under 256KB. This avoids 398 For best performance results, keep input chunks under 256KB. This avoids
350 extra allocations for a large output object. 399 extra allocations for a large output object.
351 400
352 It is possible to declare the input size of the data that will be fed into 401 It is possible to declare the input size of the data that will be fed into
353 the compressor:: 402 the compressor::
368 417
369 dict_data 418 dict_data
370 Compression dictionary to use. 419 Compression dictionary to use.
371 420
372 The interface of this class is very similar to ``ZstdCompressor`` (by design). 421 The interface of this class is very similar to ``ZstdCompressor`` (by design).
422
423 Unless specified otherwise, assume that no two methods of ``ZstdDecompressor``
424 instances can be called from multiple Python threads simultaneously. In other
425 words, assume instances are not thread safe unless stated otherwise.
373 426
374 Simple API 427 Simple API
375 ^^^^^^^^^^ 428 ^^^^^^^^^^
376 429
377 ``decompress(data)`` can be used to decompress an entire compressed zstd 430 ``decompress(data)`` can be used to decompress an entire compressed zstd