annotate contrib/python-zstandard/README.rst @ 46472:98e39f04d60e
upgrade: implement partial upgrade for upgrading persistent-nodemap
Upgrading repositories to use persistent nodemap should be fast and easy as it
requires only two things:
1) Updating the requirements
2) Writing a persistent-nodemap on disk
For both of the steps above, we don't need to edit existing revlogs.
This patch makes upgrade perform only the two steps mentioned above if we are
only upgrading to use the persistent-nodemap feature.
Since `nodemap.persist_nodemap()` assumes that a nodemap file already exists for
the given revlog when it is called, this patch adds a `force` argument
to create the file if it does not exist, which is the case in our upgrade scenario.
The test changes demonstrate that we no longer write nodemap files for manifest
after upgrade which I think is desirable.
Differential Revision: https://phab.mercurial-scm.org/D9936
author | Pulkit Goyal <7895pulkit@gmail.com> |
---|---|
date | Mon, 01 Feb 2021 00:02:00 +0530 |
parents | de7838053207 |
children |
================
python-zstandard
================

This project provides Python bindings for interfacing with the
`Zstandard <http://www.zstd.net>`_ compression library. A C extension
and CFFI interface are provided.

The primary goal of the project is to provide a rich interface to the
underlying C API through a Pythonic interface while not sacrificing
performance. This means exposing most of the features and flexibility
of the C API while not sacrificing usability or safety that Python provides.

The canonical home for this project lives in a Mercurial repository run by
the author. For convenience, that repository is frequently synchronized to
https://github.com/indygreg/python-zstandard.

|  |ci-status|

Requirements
============

This extension is designed to run with Python 2.7, 3.5, 3.6, 3.7, and 3.8
on common platforms (Linux, Windows, and OS X). On PyPy (both PyPy2 and PyPy3) we support version 6.0.0 and above.
x86 and x86_64 are well-tested on Windows. Only x86_64 is well-tested on Linux and macOS.

Installing
==========

This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard.
So, to install this package::

   $ pip install zstandard

Binary wheels are made available for some platforms. If you need to
install from a source distribution, all you should need is a working C
compiler and the Python development headers/libraries. On many Linux
distributions, you can install a ``python-dev`` or ``python-devel``
package to provide these dependencies.

Packages are also uploaded to Anaconda Cloud at
https://anaconda.org/indygreg/zstandard. See that URL for how to install
this package with ``conda``.

Performance
===========

zstandard is a highly tunable compression algorithm. In its default settings
(compression level 3), it will be faster at compression and decompression and
will have better compression ratios than zlib on most data sets. When tuned
for speed, it approaches lz4's speed and ratios. When tuned for compression
ratio, it approaches lzma ratios and compression speed, but decompression
speed is much faster. See the official zstandard documentation for more.

zstandard and this library support multi-threaded compression. There is a
mechanism to compress large inputs using multiple threads.

The performance of this library is usually very similar to what the zstandard
C API can deliver. Overhead in this library is due to general Python overhead
and can't easily be avoided by *any* zstandard Python binding. This library
exposes multiple APIs for performing compression and decompression so callers
can pick an API suitable for their need. Contrast with the compression
modules in Python's standard library (like ``zlib``), which only offer limited
mechanisms for performing operations. The API flexibility means consumers can
choose to use APIs that facilitate zero copying or minimize Python object
creation and garbage collection overhead.

This library is capable of single-threaded throughputs well over 1 GB/s. For
exact numbers, measure yourself. The source code repository has a ``bench.py``
script that can be used to measure things.
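
For a quick measurement without the repository's ``bench.py``, a rough timing loop along the following lines may be enough. This is only a sketch for Python 3: the helper ``throughput_mb_s`` and the sample input are hypothetical and not part of the library, and the figures depend heavily on the input data and the machine.

```python
import time
import zlib

try:
    import zstandard as zstd  # third-party; optional for this sketch
except ImportError:
    zstd = None


def throughput_mb_s(compress, data, repeats=5):
    """Return the best observed throughput of ``compress(data)`` in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        compress(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse clock on tiny inputs.
    return len(data) / max(best, 1e-9) / 1e6


# Hypothetical sample input: about 4 MB of repetitive text.
data = b"python-zstandard benchmark sample line\n" * 100_000

results = {"zlib level 6": throughput_mb_s(lambda d: zlib.compress(d, 6), data)}
if zstd is not None:
    cctx = zstd.ZstdCompressor(level=3)
    results["zstd level 3"] = throughput_mb_s(cctx.compress, data)

for name, rate in results.items():
    print("%-14s %10.1f MB/s" % (name, rate))
```

Highly repetitive input like this flatters every compressor, so representative data should be substituted before drawing conclusions.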

API
===

To interface with Zstandard, simply import the ``zstandard`` module::

   import zstandard

It is a popular convention to alias the module as a different name for
brevity::

   import zstandard as zstd

This module attempts to import and use either the C extension or CFFI
implementation. On Python platforms known to support C extensions (like
CPython), it raises an ImportError if the C extension cannot be imported.
On Python platforms known to not support C extensions (like PyPy), it only
attempts to import the CFFI implementation and raises ImportError if that
can't be done. On other platforms, it first tries to import the C extension
then falls back to CFFI if that fails and raises ImportError if CFFI fails.

To change the module import behavior, a ``PYTHON_ZSTANDARD_IMPORT_POLICY``
environment variable can be set. The following values are accepted:

default
   The behavior described above.
cffi_fallback
   Always try to import the C extension then fall back to CFFI if that
   fails.
cext
   Only attempt to import the C extension.
cffi
   Only attempt to import the CFFI implementation.

In addition, the ``zstandard`` module exports a ``backend`` attribute
containing the string name of the backend being used. It will be one
of ``cext`` or ``cffi`` (for *C extension* and *cffi*, respectively).
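
The import policy and the ``backend`` attribute can be exercised together as in the sketch below. It assumes the variable is set from within the same process before ``zstandard`` is first imported; setting it in the shell before starting Python should work equally well.

```python
import os

# The policy is consulted when the module is first imported, so it has to be
# in the environment before that import happens.
os.environ["PYTHON_ZSTANDARD_IMPORT_POLICY"] = "cffi_fallback"

try:
    import zstandard as zstd
except ImportError as exc:
    # Raised when none of the backends permitted by the policy can be loaded.
    zstd = None
    print("zstandard is unavailable: %s" % exc)
else:
    print("zstandard backend in use: %s" % zstd.backend)
```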

The types, functions, and attributes exposed by the ``zstandard`` module
are documented in the sections below.

.. note::

   The documentation in this section makes references to various zstd
   concepts and functionality. The source repository contains a
   ``docs/concepts.rst`` file explaining these in more detail.

ZstdCompressor
--------------

The ``ZstdCompressor`` class provides an interface for performing
compression operations. Each instance is essentially a wrapper around a
``ZSTD_CCtx`` from the C API.

Each instance is associated with parameters that control compression
behavior. These come from the following named arguments (all optional):

level
   Integer compression level. Valid values are between 1 and 22.
dict_data
   Compression dictionary to use.

   Note: When using dictionary data and ``compress()`` is called multiple
   times, the ``ZstdCompressionParameters`` derived from an integer
   compression ``level`` and the first compressed data's size will be reused
   for all subsequent operations. This may not be desirable if source data
   size varies significantly.
compression_params
   A ``ZstdCompressionParameters`` instance defining compression settings.
write_checksum
   Whether a 4 byte checksum should be written with the compressed data.
   Defaults to False. If True, the decompressor can verify that decompressed
   data matches the original input data.
write_content_size
   Whether the size of the uncompressed data will be written into the
   header of compressed data. Defaults to True. The data will only be
   written if the compressor knows the size of the input data. This is
   often not true for streaming compression.
write_dict_id
   Whether to write the dictionary ID into the compressed data.
   Defaults to True. The dictionary ID is only written if a dictionary
   is being used.
threads
   Enables and sets the number of threads to use for multi-threaded compression
   operations. Defaults to 0, which means to use single-threaded compression.
   Negative values will resolve to the number of logical CPUs in the system.
   Read below for more info on multi-threaded compression. This argument only
   controls thread count for operations that operate on individual pieces of
   data. APIs that spawn multiple threads for working on multiple pieces of
   data have their own ``threads`` argument.

``compression_params`` is mutually exclusive with ``level``, ``write_checksum``,
``write_content_size``, ``write_dict_id``, and ``threads``.
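
The two ways of configuring a compressor can be compared with a sketch like the one below. The helper names are hypothetical, and ``ZstdCompressionParameters.from_level()`` and ``get_frame_parameters()`` are used on the assumption that they behave as described elsewhere in this library's documentation.

```python
try:
    import zstandard as zstd  # third-party; the guard keeps the sketch importable
except ImportError:
    zstd = None


def compress_with_checksum(data, level=10):
    """Compress ``data`` using individual keyword arguments."""
    cctx = zstd.ZstdCompressor(level=level, write_checksum=True)
    return cctx.compress(data)


def compress_with_params(data, level=10):
    """Compress ``data`` with comparable settings bundled into one object."""
    params = zstd.ZstdCompressionParameters.from_level(level, write_checksum=True)
    # ``level`` and ``write_checksum`` are not passed again here, since they
    # are mutually exclusive with ``compression_params``.
    cctx = zstd.ZstdCompressor(compression_params=params)
    return cctx.compress(data)


if zstd is not None:
    payload = b"example payload " * 1024
    for frame in (compress_with_checksum(payload), compress_with_params(payload)):
        # Inspect the frame header to see which optional fields were written.
        header = zstd.get_frame_parameters(frame)
        print(len(frame), header.content_size, header.has_checksum)
```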

Unless specified otherwise, assume that no two methods of ``ZstdCompressor``
instances can be called from multiple Python threads simultaneously. In other
words, assume instances are not thread safe unless stated otherwise.

Utility Methods
^^^^^^^^^^^^^^^

``frame_progression()`` returns a 3-tuple containing the number of bytes
ingested, consumed, and produced by the current compression operation.

``memory_size()`` obtains the memory utilization of the underlying zstd
compression context, in bytes::

   cctx = zstd.ZstdCompressor()
   memory = cctx.memory_size()
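
``frame_progression()`` is most informative while a compression operation is still in flight. The sketch below takes a snapshot in the middle of a streaming operation; it assumes the ``compressobj()`` API described elsewhere in this library's documentation, and ``progression_snapshot`` is a hypothetical helper name.

```python
try:
    import zstandard as zstd  # third-party; the guard keeps the sketch importable
except ImportError:
    zstd = None


def progression_snapshot(data):
    """Return the mid-stream progression tuple and the context's memory size."""
    cctx = zstd.ZstdCompressor()
    cobj = cctx.compressobj()
    cobj.compress(data)
    # Bytes ingested, consumed, and produced so far for the current frame.
    progress = cctx.frame_progression()
    cobj.flush()
    return progress, cctx.memory_size()


if zstd is not None:
    (ingested, consumed, produced), memory = progression_snapshot(
        b"some input data " * 4096
    )
    print(ingested, consumed, produced, memory)
```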

Simple API
^^^^^^^^^^

``compress(data)`` compresses and returns data as a one-shot operation::

   cctx = zstd.ZstdCompressor()
   compressed = cctx.compress(b'data to compress')

The ``data`` argument can be any object that implements the *buffer protocol*.
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
190 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
191 Stream Reader API |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
192 ^^^^^^^^^^^^^^^^^ |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
193 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
194 ``stream_reader(source)`` can be used to obtain an object conforming to the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
195 ``io.RawIOBase`` interface for reading compressed output as a stream:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
196 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
197 with open(path, 'rb') as fh: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
198 cctx = zstd.ZstdCompressor() |
40121
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
199 reader = cctx.stream_reader(fh) |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
200 while True: |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
201 chunk = reader.read(16384) |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
202 if not chunk: |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
203 break |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
204 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
205 # Do something with compressed chunk. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
206 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
207 Instances can also be used as context managers:: |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
208 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
209 with open(path, 'rb') as fh: |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
210 with cctx.stream_reader(fh) as reader: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
211 while True: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
212 chunk = reader.read(16384) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
213 if not chunk: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
214 break |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
215 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
216 # Do something with compressed chunk. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
217 |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
218 When the context manager exits or ``close()`` is called, the stream is closed, |
40121
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
219 underlying resources are released, and future operations against the compression |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
220 stream will fail. |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
221 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
222 The ``source`` argument to ``stream_reader()`` can be any object with a |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
223 ``read(size)`` method or any object implementing the *buffer protocol*. |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
224 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
225 ``stream_reader()`` accepts a ``size`` argument specifying how large the input |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
226 stream is. This is used to adjust compression parameters so they are |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
227 tailored to the source size:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
228 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
229 with open(path, 'rb') as fh: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
230 cctx = zstd.ZstdCompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
231 with cctx.stream_reader(fh, size=os.stat(path).st_size) as reader: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
232 ... |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
233 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
234 If the ``source`` is a stream, you can specify how large ``read()`` requests |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
235 to that stream should be via the ``read_size`` argument. It defaults to |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
236 ``zstandard.COMPRESSION_RECOMMENDED_INPUT_SIZE``:: |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
237 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
238 with open(path, 'rb') as fh: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
239 cctx = zstd.ZstdCompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
240 # Will perform fh.read(8192) when obtaining data to feed into the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
241 # compressor. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
242 with cctx.stream_reader(fh, read_size=8192) as reader: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
243 ... |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
244 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
245 The stream returned by ``stream_reader()`` is neither writable nor seekable |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
246 (even if the underlying source is seekable). ``readline()`` and |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
247 ``readlines()`` are not implemented because they don't make sense for |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
248 compressed data. ``tell()`` returns the number of compressed bytes |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
249 emitted so far. |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
250 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
251 Streaming Input API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
252 ^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
253 |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
254 ``stream_writer(fh)`` allows you to *stream* data into a compressor. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
255 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
256 Returned instances implement the ``io.RawIOBase`` interface. Only methods |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
257 that involve writing will do useful things. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
258 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
259 The argument to ``stream_writer()`` must have a ``write(data)`` method. As |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
260 compressed data is available, ``write()`` will be called with the compressed |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
261 data as its argument. Many common Python types implement ``write()``, including |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
262 open file handles and ``io.BytesIO``. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
263 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
264 The ``write(data)`` method is used to feed data into the compressor. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
265 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
266 The ``flush([flush_mode=FLUSH_BLOCK])`` method can be called to evict whatever |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
267 data remains within the compressor's internal state into the output object. This |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
268 may result in 0 or more ``write()`` calls to the output object. This method |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
269 accepts an optional ``flush_mode`` argument to control the flushing behavior. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
270 Its value can be any of the ``FLUSH_*`` constants. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
271 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
272 Both ``write()`` and ``flush()`` return the number of bytes written to the |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
273 object's ``write()``. In many cases, small inputs do not accumulate enough |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
274 data to cause a write and ``write()`` will return ``0``. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
275 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
276 Calling ``close()`` will mark the stream as closed and subsequent I/O |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
277 operations will raise ``ValueError`` (per the documented behavior of |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
278 ``io.RawIOBase``). ``close()`` will also call ``close()`` on the underlying |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
279 stream if such a method exists. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
280 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
281 Typical usage is as follows:: |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
282 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
283 cctx = zstd.ZstdCompressor(level=10) |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
284 compressor = cctx.stream_writer(fh) |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
285 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
286 compressor.write(b'chunk 0\n') |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
287 compressor.write(b'chunk 1\n') |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
288 compressor.flush() |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
289 # Receiver will be able to decode ``chunk 0\nchunk 1\n`` at this point. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
290 # Receiver is also expecting more data in the zstd *frame*. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
291 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
292 compressor.write(b'chunk 2\n') |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
293 compressor.flush(zstd.FLUSH_FRAME) |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
294 # Receiver will be able to decode ``chunk 0\nchunk 1\nchunk 2``. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
295 # Receiver is expecting no more data, as the zstd frame is closed. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
296 # Any future calls to ``write()`` at this point will construct a new |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
297 # zstd frame. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
298 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
299 Instances can be used as context managers. Exiting the context manager is |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
300 the equivalent of calling ``close()``, which is equivalent to calling |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
301 ``flush(zstd.FLUSH_FRAME)``:: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
302 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
303 cctx = zstd.ZstdCompressor(level=10) |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
304 with cctx.stream_writer(fh) as compressor: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
305 compressor.write(b'chunk 0') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
306 compressor.write(b'chunk 1') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
307 ... |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
308 |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
309 .. important:: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
310 |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
311 If ``flush(FLUSH_FRAME)`` is not called, emitted data doesn't constitute |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
312 a full zstd *frame* and consumers of this data may complain about malformed |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
313 input. It is recommended to use instances as a context manager to ensure |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
314 *frames* are properly finished. |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
315 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
316 If the size of the data being fed to this streaming compressor is known, |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
317 you can declare it before compression begins:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
318 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
319 cctx = zstd.ZstdCompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
320 with cctx.stream_writer(fh, size=data_len) as compressor: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
321 compressor.write(chunk0) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
322 compressor.write(chunk1) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
323 ... |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
324 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
325 Declaring the size of the source data allows compression parameters to |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
326 be tuned. And if ``write_content_size`` is used, it also results in the |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
327 content size being written into the frame header of the output data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
328 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
329 The size of the chunks passed to the destination's ``write()`` can be specified:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
330 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
331 cctx = zstd.ZstdCompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
332 with cctx.stream_writer(fh, write_size=32768) as compressor: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
333 ... |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
334 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
335 To see how much memory is being used by the streaming compressor:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
336 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
337 cctx = zstd.ZstdCompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
338 with cctx.stream_writer(fh) as compressor: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
339 ... |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
340 byte_size = compressor.memory_size() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
341 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
342 The total number of bytes written so far is exposed via ``tell()``:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
343 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
344 cctx = zstd.ZstdCompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
345 with cctx.stream_writer(fh) as compressor: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
346 ... |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
347 total_written = compressor.tell() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
348 |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
349 ``stream_writer()`` accepts a ``write_return_read`` boolean argument to control |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
350 the return value of ``write()``. When ``False`` (the default), ``write()`` returns |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
351 the number of bytes that were written to the underlying object. When |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
352 ``True``, ``write()`` returns the number of bytes read from the input that |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
353 were subsequently written to the compressor. ``True`` is the *proper* behavior |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
354 for ``write()`` as specified by the ``io.RawIOBase`` interface and will become |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
355 the default value in a future release. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
356 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
357 Streaming Output API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
358 ^^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
359 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
360 ``read_to_iter(reader)`` provides a mechanism to stream data out of a |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
361 compressor as an iterator of data chunks:: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
362 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
363 cctx = zstd.ZstdCompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
364 for chunk in cctx.read_to_iter(fh): |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
365 # Do something with emitted data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
366 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
367 ``read_to_iter()`` accepts an object that has a ``read(size)`` method or |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
368 conforms to the buffer protocol. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
369 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
370 Uncompressed data is fetched from the source either by calling ``read(size)`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
371 or by fetching a slice of data from the object directly (in the case where |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
372 the buffer protocol is being used). The returned iterator consists of chunks |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
373 of compressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
374 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
375 If reading from the source via ``read()``, ``read()`` will be called until |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
376 it raises or returns an empty bytes object (``b''``). It is perfectly valid for |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
377 the source to deliver fewer bytes than were requested by ``read(size)``. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
378 |
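
The read protocol described above can be sketched with the standard library alone. This is an illustration of the contract, not the library's implementation; ``iter_reads`` and ``ShortReader`` are hypothetical names introduced for this example.

```python
import io


def iter_reads(source, read_size=8192):
    """Yield chunks from source.read() until it returns empty bytes."""
    while True:
        chunk = source.read(read_size)
        if not chunk:
            # b'' signals the end of the input.
            break
        # A chunk shorter than read_size is acceptable; keep reading.
        yield chunk


class ShortReader:
    """Hypothetical source that never returns more than 3 bytes per read."""

    def __init__(self, data):
        self._fh = io.BytesIO(data)

    def read(self, size):
        return self._fh.read(min(size, 3))


chunks = list(iter_reads(ShortReader(b"hello world"), read_size=8))
```

Every chunk here is shorter than the requested 8 bytes, yet the loop still consumes the whole input because it stops only on an empty result.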
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
379 Like ``stream_writer()``, ``read_to_iter()`` also accepts a ``size`` argument |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
380 declaring the size of the input stream:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
381 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
382 cctx = zstd.ZstdCompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
383 for chunk in cctx.read_to_iter(fh, size=some_int): |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
384 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
385 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
386 You can also control the size of each ``read()`` from the source and |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
387 the ideal size of output chunks:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
388 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
389 cctx = zstd.ZstdCompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
390 for chunk in cctx.read_to_iter(fh, read_size=16384, write_size=8192): |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
391 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
392 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
393 Unlike ``stream_writer()``, ``read_to_iter()`` does not give direct control |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
394 over the sizes of chunks fed into the compressor. Instead, chunk sizes will |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
395 be whatever the object being read from delivers. These will often be of a |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
396 uniform size. |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
397 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
398 Stream Copying API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
399 ^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
400 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
401 ``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
402 compressing it:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
403 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
404 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
405 cctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
406 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
407 For example, say you wish to compress a file:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
408 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
409 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
410 with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
411 cctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
412 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
413 It is also possible to declare the size of the source stream:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
414 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
415 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
416 cctx.copy_stream(ifh, ofh, size=len_of_input) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
417 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
418 You can also specify the size of the chunks that are ``read()`` and ``write()`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
419 from and to the streams:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
420 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
421 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
422 cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
423 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
424 The stream copier returns a 2-tuple of bytes read and written:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
425 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
426 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
427 read_count, write_count = cctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
428 |
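
To make the returned 2-tuple concrete, here is a minimal sketch of a stream copier that counts bytes in and out. It assumes ``zlib`` as a stand-in compressor so that it runs without python-zstandard installed; ``copy_stream_sketch`` is a hypothetical helper, not the library's ``copy_stream()``.

```python
import io
import zlib


def copy_stream_sketch(ifh, ofh, read_size=32768):
    """Compress ifh into ofh and return (bytes_read, bytes_written)."""
    cobj = zlib.compressobj()  # stand-in for a zstd compressor (assumption)
    read_count = 0
    write_count = 0
    while True:
        chunk = ifh.read(read_size)
        if not chunk:
            break
        read_count += len(chunk)
        out = cobj.compress(chunk)
        if out:
            ofh.write(out)
            write_count += len(out)
    # Finish the stream; flush() returns any buffered compressed data.
    out = cobj.flush()
    ofh.write(out)
    write_count += len(out)
    return read_count, write_count


ifh = io.BytesIO(b"a" * 100000)
ofh = io.BytesIO()
read_count, write_count = copy_stream_sketch(ifh, ofh)
```

The first element counts uncompressed bytes consumed from the source; the second counts compressed bytes written to the destination.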
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
429 Compressor API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
430 ^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
431 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
432 ``compressobj()`` returns an object that exposes ``compress(data)`` and |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
433 ``flush()`` methods. Each returns compressed data or an empty ``bytes`` object. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
434 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
435 The purpose of ``compressobj()`` is to provide an API-compatible interface |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
436 with ``zlib.compressobj``, ``bz2.BZ2Compressor``, etc. This allows callers to |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
437 swap in different compressor objects while using the same API. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
438 |
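
The benefit of a shared interface can be shown with a helper that relies only on ``compress(data)`` and ``flush()``. The sketch below exercises it with the standard library's ``zlib`` and ``bz2`` compressors; ``compress_all`` is a hypothetical name. Per the description above, the object returned by ``compressobj()`` is meant to fit the same pattern, though that is not exercised here.

```python
import bz2
import zlib


def compress_all(cobj, chunks):
    """Feed chunks to any object exposing compress(data) and flush()."""
    out = []
    for chunk in chunks:
        data = cobj.compress(chunk)
        if data:
            # compress() may return empty bytes while data is buffered.
            out.append(data)
    out.append(cobj.flush())
    return b"".join(out)


chunks = [b"raw input 0", b"raw input 1"]
zlib_frame = compress_all(zlib.compressobj(), chunks)
bz2_frame = compress_all(bz2.BZ2Compressor(), chunks)
```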
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
439 ``flush()`` accepts an optional argument indicating how to end the stream. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
440 ``zstd.COMPRESSOBJ_FLUSH_FINISH`` (the default) ends the compression stream. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
441 Once this type of flush is performed, ``compress()`` and ``flush()`` can |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
442 no longer be called. This type of flush **must** be called to end the |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
443 compression context. If not called, returned data may be incomplete. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
444 |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
445 A ``zstd.COMPRESSOBJ_FLUSH_BLOCK`` argument to ``flush()`` will flush a |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
446 zstd block. Flushes of this type can be performed multiple times. The next |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
447 call to ``compress()`` will begin a new zstd block. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
448 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
449 Here is how this API should be used:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
450 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
451 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
452 cobj = cctx.compressobj() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
453 data = cobj.compress(b'raw input 0') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
454 data = cobj.compress(b'raw input 1') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
455 data = cobj.flush() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
456 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
457 Or to flush blocks:: |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
458 |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
459 cctx = zstd.ZstdCompressor() |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
460 cobj = cctx.compressobj() |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
461 data = cobj.compress(b'chunk in first block') |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
462 data = cobj.flush(zstd.COMPRESSOBJ_FLUSH_BLOCK) |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
463 data = cobj.compress(b'chunk in second block') |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
464 data = cobj.flush() |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
465 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
466 For best performance results, keep input chunks under 256KB. This avoids |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
467 extra allocations for a large output object. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
468 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
469 It is possible to declare the input size of the data that will be fed into |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
470 the compressor:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
471 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
472 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
473 cobj = cctx.compressobj(size=6) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
474 data = cobj.compress(b'foobar') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
475 data = cobj.flush() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
476 |
40121
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
477 Chunker API |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
478 ^^^^^^^^^^^ |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
479 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
480 ``chunker(size=None, chunk_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE)`` returns |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
481 an object that can be used to iteratively feed chunks of data into a compressor |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
482 and produce output chunks of a uniform size. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
483 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
484 The object returned by ``chunker()`` exposes the following methods: |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
485 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
486 ``compress(data)`` |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
487 Feeds new input data into the compressor. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
488 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
489 ``flush()`` |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
490 Flushes all data currently in the compressor. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
491 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
492 ``finish()`` |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
493 Signals the end of input data. No new data can be compressed after this |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
494 method is called. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
495 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
496 ``compress()``, ``flush()``, and ``finish()`` all return an iterator of |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
497 ``bytes`` instances holding compressed data. The iterator may be empty. Callers |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
498 MUST iterate through all elements of the returned iterator before performing |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
499 another operation on the object. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
500 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
501 All chunks emitted by ``compress()`` will have a length of ``chunk_size``. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
502 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
503 ``flush()`` and ``finish()`` may return a final chunk smaller than |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
504 ``chunk_size``. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
505 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
506 Here is how the API should be used:: |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
507 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
508 cctx = zstd.ZstdCompressor() |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
509 chunker = cctx.chunker(chunk_size=32768) |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
510 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
511 with open(path, 'rb') as fh: |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
512 while True: |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
513 in_chunk = fh.read(32768) |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
514 if not in_chunk: |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
515 break |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
516 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
517 for out_chunk in chunker.compress(in_chunk): |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
518 pass  # Do something with output chunk of size 32768. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
519 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
520 for out_chunk in chunker.finish(): |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
521 pass  # Do something with output chunks that finalize the zstd frame. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
522 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
523 The ``chunker()`` API is often a better alternative to ``compressobj()``. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
524 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
525 ``compressobj()`` will emit output data as it is available. This results in a |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
526 *stream* of output chunks of varying sizes. The consistency of the output chunk |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
527 size with ``chunker()`` is more appropriate for many usages, such as sending |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
528 compressed data to a socket. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
529 |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
530 ``compressobj()`` may also perform extra memory reallocations in order to |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
531 dynamically adjust the sizes of the output chunks. Since ``chunker()`` output |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
532 chunks are all the same size (except for flushed or final chunks), there is |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
533 less memory allocation overhead. |
73fef626dae3
zstandard: vendor python-zstandard 0.10.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
37495
diff
changeset
|
534 |
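
The idea of uniform output chunks can be sketched by buffering a compressor's output and releasing it in fixed-size pieces. This is an illustration only: it assumes ``zlib`` as the underlying compressor, and ``FixedSizeChunker`` is a hypothetical class, not how ``chunker()`` is implemented.

```python
import zlib


class FixedSizeChunker:
    """Hypothetical wrapper that emits compressed data in fixed-size chunks."""

    def __init__(self, cobj, chunk_size):
        self._cobj = cobj
        self._chunk_size = chunk_size
        self._pending = bytearray()

    def _drain(self, final=False):
        # Emit as many full-size chunks as are currently buffered.
        while len(self._pending) >= self._chunk_size:
            yield bytes(self._pending[:self._chunk_size])
            del self._pending[:self._chunk_size]
        # Only the final call may emit a short trailing chunk.
        if final and self._pending:
            yield bytes(self._pending)
            self._pending.clear()

    def compress(self, data):
        self._pending += self._cobj.compress(data)
        return self._drain()

    def finish(self):
        self._pending += self._cobj.flush()
        return self._drain(final=True)


chunker = FixedSizeChunker(zlib.compressobj(), chunk_size=16)
payload = bytes(range(256)) * 64

out_chunks = []
for offset in range(0, len(payload), 1000):
    out_chunks.extend(chunker.compress(payload[offset:offset + 1000]))
out_chunks.extend(chunker.finish())
```

Every chunk except possibly the last has exactly ``chunk_size`` bytes, which matches the behavior described for ``compress()`` and ``finish()`` above.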
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
535 Batch Compression API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
536 ^^^^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
537 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
538 (Experimental. Not yet supported in CFFI bindings.) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
539 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
540 ``multi_compress_to_buffer(data, [threads=0])`` performs compression of multiple |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
541 inputs as a single operation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
542 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
543 Data to be compressed can be passed as a ``BufferWithSegmentsCollection``, a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
544 ``BufferWithSegments``, or a list containing bytes-like objects. Each element of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
545 the container will be compressed individually using the configured parameters |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
546 on the ``ZstdCompressor`` instance. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
547 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
548 The ``threads`` argument controls how many threads to use for compression. The |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
549 default is ``0`` which means to use a single thread. Negative values use the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
550 number of logical CPUs in the machine. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
551 |
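
One way to read the ``threads`` semantics is the small helper below. It is a sketch of the rule as documented here, not code taken from the library; ``resolve_thread_count`` is a hypothetical name, and the pass-through of positive values is an assumption.

```python
import os


def resolve_thread_count(threads=0):
    """Map a ``threads`` argument to an effective worker count."""
    if threads < 0:
        # Negative: use the number of logical CPUs (fall back to 1 if unknown).
        return os.cpu_count() or 1
    if threads == 0:
        # Default: a single thread.
        return 1
    # Positive values are assumed to be used as given.
    return threads
```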
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
552 The function returns a ``BufferWithSegmentsCollection``. This type represents |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
553 N discrete memory allocations, each holding 1 or more compressed frames. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
554 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
555 Output data is written to shared memory buffers. This means that unlike |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
556 regular Python objects, a reference to *any* object within the collection |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
557 keeps the shared buffer and therefore memory backing it alive. This can have |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
558 undesirable effects on process memory usage. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
559 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
560 The API and behavior of this function are experimental and will likely change. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
561 Known deficiencies include: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
562 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
563 * If asked to use multiple threads, it will always spawn that many threads, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
564 even if the input is too small to use them. It should automatically lower |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
565 the thread count when the extra threads would just add overhead. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
566 * The buffer allocation strategy is fixed. There is room to make it dynamic, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
567 perhaps even to allow one output buffer per input, facilitating a variation |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
568 of the API to return a list without the adverse effects of shared memory |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
569 buffers. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
570 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
ZstdDecompressor
----------------

The ``ZstdDecompressor`` class provides an interface for performing
decompression. It is effectively a wrapper around the ``ZSTD_DCtx`` type from
the C API.

Each instance is associated with parameters that control decompression. These
come from the following named arguments (all optional):

dict_data
   Compression dictionary to use.
max_window_size
   Sets an upper limit on the window size for decompression operations in
   kibibytes. This setting can be used to prevent large memory allocations
   for inputs using large compression windows.
format
   Set the format of data for the decoder. By default, this is
   ``zstd.FORMAT_ZSTD1``. It can be set to ``zstd.FORMAT_ZSTD1_MAGICLESS`` to
   allow decoding frames without the 4-byte magic header. Not all decompression
   APIs support this mode.

The interface of this class is very similar to ``ZstdCompressor`` (by design).

Unless specified otherwise, assume that no two methods of ``ZstdDecompressor``
instances can be called from multiple Python threads simultaneously. In other
words, assume instances are not thread safe unless stated otherwise.

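One way to respect the threading constraint is to give every thread its own
instance. The following is a minimal sketch rather than anything provided by
the library: the helper names ``get_decompressor`` and ``decompress_all`` are
hypothetical, and the package is assumed to be importable as ``zstandard``.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

try:
    import zstandard as zstd
except ImportError:  # third-party package; the helpers below require it
    zstd = None

# Each thread stores its own decompressor here.
_local = threading.local()


def get_decompressor():
    """Return a ZstdDecompressor owned by the calling thread (hypothetical helper)."""
    dctx = getattr(_local, "dctx", None)
    if dctx is None:
        dctx = _local.dctx = zstd.ZstdDecompressor()
    return dctx


def decompress_all(frames, workers=4):
    """Decompress frames in a thread pool without sharing an instance."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: get_decompressor().decompress(f), frames))
```

Instances are reused across calls within a thread, so the cost of creating a
decompression context is paid once per worker.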

Utility Methods
^^^^^^^^^^^^^^^

``memory_size()`` obtains the size of the underlying zstd decompression context,
in bytes.::

    dctx = zstd.ZstdDecompressor()
    size = dctx.memory_size()

Simple API
^^^^^^^^^^

``decompress(data)`` can be used to decompress an entire compressed zstd
frame in a single operation.::

    dctx = zstd.ZstdDecompressor()
    decompressed = dctx.decompress(data)

By default, ``decompress(data)`` will only work on data written with the content
size encoded in its header (this is the default behavior of
``ZstdCompressor().compress()`` but may not be true for streaming compression). If
compressed data without an embedded content size is seen, ``zstd.ZstdError`` will
be raised.

If the compressed data doesn't have its content size embedded within it,
decompression can be attempted by specifying the ``max_output_size``
argument.::

    dctx = zstd.ZstdDecompressor()
    uncompressed = dctx.decompress(data, max_output_size=1048576)

Ideally, ``max_output_size`` will be identical to the decompressed output
size.

If ``max_output_size`` is too small to hold the decompressed data,
``zstd.ZstdError`` will be raised.

If ``max_output_size`` is larger than the decompressed data, the allocated
output buffer will be resized to only use the space required.

Please note that an allocation of the requested ``max_output_size`` will be
performed every time the method is called. Setting it to a very large value
could result in a lot of work for the memory allocator and may result in
``MemoryError`` being raised if the allocation fails.

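The two calling styles above can be combined into a fallback. This is a
minimal sketch under stated assumptions: ``decompress_frame`` and its
``size_limit`` parameter are hypothetical names, and the package is assumed to
be importable as ``zstandard``.

```python
try:
    import zstandard as zstd
except ImportError:  # third-party package; the helper below requires it
    zstd = None


def decompress_frame(frame, size_limit=1048576):
    """Decompress one frame, capping output only when the size is unknown."""
    dctx = zstd.ZstdDecompressor()
    try:
        # Succeeds when the frame header records the content size.
        return dctx.decompress(frame)
    except zstd.ZstdError:
        # Retry with an explicit cap. ZstdError is raised again if the
        # cap is too small or the data is not a valid frame.
        return dctx.decompress(frame, max_output_size=size_limit)
```

Because the retry allocates ``size_limit`` bytes on every call, the cap should
stay close to the largest output the caller actually expects.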

.. important::

   If the exact size of decompressed data is unknown (not passed in explicitly
   and not stored in the zstandard frame), using a streaming API is encouraged
   for performance reasons.

Stream Reader API
^^^^^^^^^^^^^^^^^

``stream_reader(source)`` can be used to obtain an object conforming to the
``io.RawIOBase`` interface for reading decompressed output as a stream::

    with open(path, 'rb') as fh:
        dctx = zstd.ZstdDecompressor()
        reader = dctx.stream_reader(fh)
        while True:
            chunk = reader.read(16384)
            if not chunk:
                break

            # Do something with decompressed chunk.

The stream can also be used as a context manager::

    with open(path, 'rb') as fh:
        dctx = zstd.ZstdDecompressor()
        with dctx.stream_reader(fh) as reader:
            ...

When used as a context manager, the stream is closed and the underlying
resources are released when the context manager exits. Future operations against
the stream will fail.

The ``source`` argument to ``stream_reader()`` can be any object with a
``read(size)`` method or any object implementing the *buffer protocol*.

If the ``source`` is a stream, you can specify how large ``read()`` requests
to that stream should be via the ``read_size`` argument. It defaults to
``zstandard.DECOMPRESSION_RECOMMENDED_INPUT_SIZE``.::

    with open(path, 'rb') as fh:
        dctx = zstd.ZstdDecompressor()
        # Will perform fh.read(8192) when obtaining data for the decompressor.
        with dctx.stream_reader(fh, read_size=8192) as reader:
            ...

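Since ``source`` may be either a stream or a bytes-like object, the read loop
can be wrapped in one generator that serves both. This is a sketch, not
library code: ``iter_decompressed`` is a hypothetical helper and the package is
assumed to be importable as ``zstandard``.

```python
import io

try:
    import zstandard as zstd
except ImportError:  # third-party package; the helper below requires it
    zstd = None


def iter_decompressed(source, chunk_size=16384, read_size=8192):
    """Yield decompressed chunks from a file-like or bytes-like source."""
    dctx = zstd.ZstdDecompressor()
    # read_size only matters when source exposes read().
    with dctx.stream_reader(source, read_size=read_size) as reader:
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

A caller might use it as ``for chunk in iter_decompressed(io.BytesIO(frame))``
or pass the compressed ``bytes`` directly.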

The stream returned by ``stream_reader()`` is not writable.

The stream returned by ``stream_reader()`` is *partially* seekable.
Absolute and relative positions (``SEEK_SET`` and ``SEEK_CUR``) forward
of the current position are allowed. Offsets behind the current read
position and offsets relative to the end of stream are not allowed and
will raise ``ValueError`` if attempted.

``tell()`` returns the number of decompressed bytes read so far.

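The seeking rules can be exercised as follows. This is a sketch with
hypothetical helper names (``read_from_offset``, ``backward_seek_is_rejected``)
that assumes the package is importable as ``zstandard``. It catches ``OSError``
in addition to the documented ``ValueError`` on the assumption that other
releases may raise a different exception type.

```python
import os

try:
    import zstandard as zstd
except ImportError:  # third-party package; the helpers below require it
    zstd = None


def read_from_offset(frame, offset, length):
    """Seek forward to offset, then return (tell(), up to length bytes)."""
    dctx = zstd.ZstdDecompressor()
    with dctx.stream_reader(frame) as reader:
        # Forward seeks decompress and discard the skipped bytes.
        reader.seek(offset, os.SEEK_SET)
        position = reader.tell()
        chunks = []
        remaining = length
        while remaining:
            chunk = reader.read(remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return position, b"".join(chunks)


def backward_seek_is_rejected(frame):
    """Return True if seeking behind the read position raises."""
    dctx = zstd.ZstdDecompressor()
    with dctx.stream_reader(frame) as reader:
        reader.read(16)
        try:
            reader.seek(0, os.SEEK_SET)
        except (ValueError, OSError):
            return True
    return False
```

A forward seek still costs a full decompression of the skipped range, so it
saves memory and copying rather than CPU time.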

Not all I/O methods are implemented. Notably missing is support for
``readline()``, ``readlines()``, and linewise iteration. This is
because streams operate on binary data, not text data. If you want to
convert decompressed output to text, you can chain an ``io.TextIOWrapper``
to the stream::

    with open(path, 'rb') as fh:
        dctx = zstd.ZstdDecompressor()
        stream_reader = dctx.stream_reader(fh)
        text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8')

        for line in text_stream:
            ...

The ``read_across_frames`` argument to ``stream_reader()`` controls the
behavior of read operations when the end of a zstd *frame* is encountered.
When ``False`` (the default), a read will complete when the end of a
zstd *frame* is encountered. When ``True``, a read can potentially
return data spanning multiple zstd *frames*.

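The difference shows up when two frames are concatenated in one input. In the
sketch below (``read_chunks`` is a hypothetical helper; the package is assumed
to be importable as ``zstandard``), the first ``read()`` stops at the first
frame boundary when ``read_across_frames`` is ``False``, while with ``True``
the collected chunks cover both frames.

```python
try:
    import zstandard as zstd
except ImportError:  # third-party package; the helper below requires it
    zstd = None


def read_chunks(data, read_across_frames, size=8192):
    """Return the list of chunks produced by repeated read(size) calls."""
    dctx = zstd.ZstdDecompressor()
    reader = dctx.stream_reader(data, read_across_frames=read_across_frames)
    chunks = []
    while True:
        chunk = reader.read(size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks
```

For example, given ``two = cctx.compress(b"foo") + cctx.compress(b"bar")``,
the first element of ``read_chunks(two, False)`` holds only the first frame's
data even though ``size`` is larger than both frames combined.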

Streaming Input API
^^^^^^^^^^^^^^^^^^^

``stream_writer(fh)`` allows you to *stream* data into a decompressor.

Returned instances implement the ``io.RawIOBase`` interface. Only methods
that involve writing will do useful things.

The argument to ``stream_writer()`` is typically an object that also implements
``io.RawIOBase``. But any object with a ``write(data)`` method will work. Many
common Python types conform to this interface, including open file handles
and ``io.BytesIO``.

Behavior is similar to ``ZstdCompressor.stream_writer()``: compressed data
is sent to the decompressor by calling ``write(data)`` and decompressed
output is written to the underlying stream by calling its ``write(data)``
method.::

    dctx = zstd.ZstdDecompressor()
    decompressor = dctx.stream_writer(fh)

    decompressor.write(compressed_data)
    ...

Calls to ``write()`` will return the number of bytes written to the output
object. Not all inputs will result in bytes being written, so return values
of ``0`` are possible.

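As an illustration of feeding compressed input piecewise, the sketch below
writes arbitrary slices of a frame into a ``stream_writer()`` backed by
``io.BytesIO``. ``decompress_chunks`` is a hypothetical helper and the package
is assumed to be importable as ``zstandard``. The writer is deliberately
neither closed nor used as a context manager, so the destination buffer stays
readable afterwards.

```python
import io

try:
    import zstandard as zstd
except ImportError:  # third-party package; the helper below requires it
    zstd = None


def decompress_chunks(chunks):
    """Feed compressed chunks to a stream writer and return the output."""
    dest = io.BytesIO()
    dctx = zstd.ZstdDecompressor()
    writer = dctx.stream_writer(dest)
    for chunk in chunks:
        # A write may emit nothing until enough input has arrived.
        writer.write(chunk)
    return dest.getvalue()
```

The chunk boundaries do not need to line up with anything in the compressed
format; the decompressor keeps its state between ``write()`` calls.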
749 Like the ``stream_writer()`` compressor, instances can be used as context |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
750 managers. However, context managers add no extra special behavior and offer |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
751 little to no benefit to being used. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
752 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
753 Calling ``close()`` will mark the stream as closed and subsequent I/O operations |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
754 will raise ``ValueError`` (per the documented behavior of ``io.RawIOBase``). |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
755 ``close()`` will also call ``close()`` on the underlying stream if such a |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
756 method exists. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
757 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
758 The size of chunks being ``write()`` to the destination can be specified:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
759 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
760 dctx = zstd.ZstdDecompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
761 with dctx.stream_writer(fh, write_size=16384) as decompressor: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
762 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
763 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
764 You can see how much memory is being used by the decompressor:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
765 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
766 dctx = zstd.ZstdDecompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
767 with dctx.stream_writer(fh) as decompressor: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
768 byte_size = decompressor.memory_size() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
769 |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
770 ``stream_writer()`` accepts a ``write_return_read`` boolean argument to control |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
771 the return value of ``write()``. When ``False`` (the default), ``write()`` |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
772 returns the number of bytes written to the underlying stream. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
773 When ``True``, ``write()`` returns the number of bytes read from the input. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
774 ``True`` is the *proper* behavior for ``write()`` as specified by the |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
775 ``io.RawIOBase`` interface and will become the default in a future release. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
776 |
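A minimal sketch of why the return value matters. Generic code written against ``io.RawIOBase`` usually treats the value returned by ``write()`` as the number of input bytes consumed and resumes from that offset. ``write_fully`` is a hypothetical helper, and ``io.BytesIO`` stands in for any conforming stream; the zstd bindings are not used here. A ``stream_writer()`` created with ``write_return_read=True`` is expected to fit a loop like this, while the default return value counts output bytes and would not.

```python
import io


def write_fully(stream, data):
    """Write all of ``data`` to ``stream``, retrying after partial writes.

    Assumes ``stream.write()`` follows the ``io.RawIOBase`` contract and
    returns the number of bytes it consumed from the input.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        consumed = stream.write(view[total:])
        if not consumed:
            # None or 0 means the stream accepted nothing; stop instead of spinning.
            raise IOError("stream accepted no data")
        total += consumed
    return total


# io.BytesIO is only a stand-in for a stream that honors the contract.
sink = io.BytesIO()
written = write_fully(sink, b"example payload")
```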
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
777 Streaming Output API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
778 ^^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
779 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
780 ``read_to_iter(fh)`` provides a mechanism to stream decompressed data out of a |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
781 compressed source as an iterator of data chunks:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
782 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
783 dctx = zstd.ZstdDecompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
784 for chunk in dctx.read_to_iter(fh): |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
785 # Do something with original data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
786 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
787 ``read_to_iter()`` accepts an object with a ``read(size)`` method that will |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
788 return compressed bytes or an object conforming to the buffer protocol that |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
789 can expose its data as a contiguous range of bytes. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
790 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
791 ``read_to_iter()`` returns an iterator whose elements are chunks of the |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
792 decompressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
793 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
794 The size of each ``read()`` requested from the source can be specified:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
795 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
796 dctx = zstd.ZstdDecompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
797 for chunk in dctx.read_to_iter(fh, read_size=16384): |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
798 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
799 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
800 It is also possible to skip leading bytes in the input data:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
801 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
802 dctx = zstd.ZstdDecompressor() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
803 for chunk in dctx.read_to_iter(fh, skip_bytes=1): |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
804 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
805 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
806 .. tip:: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
807 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
808 Skipping leading bytes is useful if the source data contains extra |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
809 *header* data. Traditionally, you would need to create a slice or |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
810 ``memoryview`` of the data you want to decompress. This would create |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
811 overhead. It is more efficient to pass the offset into this API. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
812 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
813 Similarly to ``ZstdCompressor.read_to_iter()``, the consumer of the iterator |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
814 controls when data is decompressed. If the iterator isn't consumed, |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
815 decompression is put on hold. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
816 |
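The consumer-driven behavior described above can be sketched with an ordinary generator. This is an illustration only: ``iter_decompressed`` is a hypothetical helper, and ``zlib`` from the standard library stands in for zstd so the example runs without the bindings. It is not how ``read_to_iter()`` is implemented.

```python
import io
import zlib


def iter_decompressed(fh, read_size=16384):
    """Yield decompressed chunks from ``fh``, reading only on demand."""
    dobj = zlib.decompressobj()  # stand-in codec, not the zstd decompressor
    while True:
        compressed = fh.read(read_size)
        if not compressed:
            break
        chunk = dobj.decompress(compressed)
        if chunk:
            yield chunk
    tail = dobj.flush()
    if tail:
        yield tail


original = b"abc" * 10000
source = io.BytesIO(zlib.compress(original))

chunks = iter_decompressed(source, read_size=16)
position_before = source.tell()  # creating the iterator reads nothing
first = next(chunks)             # the first read happens here
position_after = source.tell()
rest = b"".join(chunks)          # draining the iterator finishes the work
```

Nothing is read from ``source`` until ``next()`` is called, which mirrors the statement that decompression is put on hold while the iterator is not consumed.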
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
817 When ``read_to_iter()`` is passed an object conforming to the buffer protocol, |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
818 the behavior may seem similar to what occurs when the simple decompression |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
819 API is used. However, this API works when the decompressed size is unknown. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
820 Furthermore, if feeding large inputs, the decompressor will work in chunks |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
821 instead of performing a single operation. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
822 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
823 Stream Copying API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
824 ^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
825 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
826 ``copy_stream(ifh, ofh)`` can be used to copy data between two streams while |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
827 performing decompression:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
828 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
829 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
830 dctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
831 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
832 For example, to decompress a file to another file:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
833 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
834 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
835 with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
836 dctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
837 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
838 The sizes of the ``read()`` and ``write()`` calls made on the two streams |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
839 can be specified:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
840 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
841 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
842 dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
843 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
844 Decompressor API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
845 ^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
846 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
847 ``decompressobj()`` returns an object that exposes a ``decompress(data)`` |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
848 method. Compressed data chunks are fed into ``decompress(data)`` and |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
849 uncompressed output (or an empty ``bytes`` object) is returned. Output from subsequent |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
850 calls needs to be concatenated to reassemble the full decompressed byte |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
851 sequence. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
852 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
853 The purpose of ``decompressobj()`` is to provide an API-compatible interface |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
854 with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
855 to swap in different decompressor objects while using the same API. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
856 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
857 Each object is single use: once an input frame is decoded, ``decompress()`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
858 can no longer be called. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
859 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
860 Here is how this API should be used:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
861 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
862 dctx = zstd.ZstdDecompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
863 dobj = dctx.decompressobj() |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
864 data = dobj.decompress(compressed_chunk_0) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
865 data += dobj.decompress(compressed_chunk_1) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
866 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
867 By default, calls to ``decompress()`` write output data in chunks of size |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
868 ``DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE``. These chunks are concatenated |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
869 before being returned to the caller. It is possible to define the size of |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
870 these temporary chunks by passing ``write_size`` to ``decompressobj()``:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
871 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
872 dctx = zstd.ZstdDecompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
873 dobj = dctx.decompressobj(write_size=1048576) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
874 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
875 .. note:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
876 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
877 Because calls to ``decompress()`` may need to perform multiple |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
878 memory (re)allocations, this streaming decompression API isn't as |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
879 efficient as other APIs. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
880 |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
881 For compatibility with the standard library APIs, instances expose a |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
882 ``flush([length=None])`` method. This method is a no-op and has no meaningful |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
883 side-effects, making it safe to call any time. |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
884 |
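The API compatibility described in this section can be illustrated with the two standard library decompressors it names. This is a sketch: ``decompress_chunks`` and ``split`` are hypothetical helpers, and the zstd object is not exercised. Per the text above, an object returned by ``dctx.decompressobj()`` exposes the same ``decompress(data)`` method, so it would presumably be usable in the same position.

```python
import bz2
import zlib


def decompress_chunks(dobj, chunks):
    """Feed compressed chunks to ``dobj`` and concatenate the returned pieces."""
    return b"".join(dobj.decompress(chunk) for chunk in chunks)


def split(data, size):
    """Cut ``data`` into consecutive pieces of at most ``size`` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


payload = b"the same caller code works for each codec\n" * 100

# Only the decompressor object changes between the two calls.
from_zlib = decompress_chunks(zlib.decompressobj(), split(zlib.compress(payload), 7))
from_bz2 = decompress_chunks(bz2.BZ2Decompressor(), split(bz2.compress(payload), 7))
```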
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
885 Batch Decompression API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
886 ^^^^^^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
887 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
888 (Experimental. Not yet supported in CFFI bindings.) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
889 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
890 ``multi_decompress_to_buffer()`` performs decompression of multiple |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
891 frames as a single operation and returns a ``BufferWithSegmentsCollection`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
892 containing decompressed data for all inputs. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
893 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
894 Compressed frames can be passed to the function as a ``BufferWithSegments``, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
895 a ``BufferWithSegmentsCollection``, or as a list containing objects that |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
896 conform to the buffer protocol. For best performance, pass a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
897 ``BufferWithSegmentsCollection`` or a ``BufferWithSegments``, as |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
898 minimal input validation will be done for that type. If calling from |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
899 Python (as opposed to C), constructing one of these instances may add |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
900 overhead that cancels out the savings from skipping validation for list |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
901 inputs:: |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
902 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
903 dctx = zstd.ZstdDecompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
904 results = dctx.multi_decompress_to_buffer([b'...', b'...']) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
905 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
906 The decompressed size of each frame MUST be discoverable. It can either be |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
907 embedded within the zstd frame (``write_content_size=True`` argument to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
908 ``ZstdCompressor``) or passed in via the ``decompressed_sizes`` argument. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
909 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
910 The ``decompressed_sizes`` argument is an object conforming to the buffer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
911 protocol which holds an array of 64-bit unsigned integers in the machine's |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
912 native format defining the decompressed sizes of each frame. If this argument |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
913 is passed, it avoids having to scan each frame for its decompressed size. |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
914 This frame scanning can add noticeable overhead in some scenarios:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
915 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
916 frames = [...] |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
917 sizes = struct.pack('=QQQQ', len0, len1, len2, len3) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
918 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
919 dctx = zstd.ZstdDecompressor() |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
920 results = dctx.multi_decompress_to_buffer(frames, decompressed_sizes=sizes) |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
921 |
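The example above hard-codes four sizes in the format string. When the number of frames is only known at run time, the buffer can be built from a list. A minimal sketch, where ``pack_decompressed_sizes`` is a hypothetical helper name and the sizes are made-up values:

```python
import struct


def pack_decompressed_sizes(sizes):
    """Pack ``sizes`` as unsigned 64-bit integers in native byte order."""
    # '=' selects native byte order with standard sizes, so each 'Q' is 8 bytes.
    return struct.pack('=%dQ' % len(sizes), *sizes)


# Illustrative values only; real sizes must match the frames being decompressed.
sizes = pack_decompressed_sizes([11, 4096, 1 << 32])
```

The resulting ``bytes`` object conforms to the buffer protocol, so it has the shape the ``decompressed_sizes`` argument is described as accepting.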
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
922 The ``threads`` argument controls the number of threads to use to perform |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
923 decompression operations. The default (``0``) or the value ``1`` means to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
924 use a single thread. Negative values use the number of logical CPUs in the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
925 machine. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
926 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
927 .. note:: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
928 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
929 It is possible to pass a ``mmap.mmap()`` instance into this function by |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
930 wrapping it with a ``BufferWithSegments`` instance (which will define the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
931 offsets of frames within the memory mapped region). |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
932 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
933 This function is logically equivalent to performing ``dctx.decompress()`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
934 on each input frame and returning the result. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
935 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
936 This function exists to perform decompression on multiple frames as fast |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
937 as possible by having as little overhead as possible. Since decompression is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
938 performed as a single operation and since the decompressed output is stored in |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
939 a single buffer, extra memory allocations, Python objects, and Python function |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
940 calls are avoided. This is ideal for scenarios where callers know up front that |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
941 they need to access data for multiple frames, such as when *delta chains* are |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
942 being used. |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
943 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
944 Currently, the implementation always spawns multiple threads when requested, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
945 even if the amount of work to do is small. In the future, it will be smarter |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
946 about avoiding threads and their associated overhead when the amount of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
947 work to do is small. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
948 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
949 Prefix Dictionary Chain Decompression |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
950 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
951 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
952 ``decompress_content_dict_chain(frames)`` performs decompression of a list of |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
953 zstd frames produced using chained *prefix* dictionary compression. Such |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
954 a list of frames is produced by compressing discrete inputs where each |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
955 non-initial input is compressed with a *prefix* dictionary consisting of the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
956 content of the previous input. |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
957 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
958 For example, say you have the following inputs:: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
959 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
960 inputs = [b'input 1', b'input 2', b'input 3'] |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
961 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
962 The zstd frame chain consists of: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
963 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
964 1. ``b'input 1'`` compressed in standalone/discrete mode |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
965 2. ``b'input 2'`` compressed using ``b'input 1'`` as a *prefix* dictionary |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
966 3. ``b'input 3'`` compressed using ``b'input 2'`` as a *prefix* dictionary |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
967 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
968 Each zstd frame **must** have the content size written. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
969 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
970 The following Python code can be used to produce a *prefix dictionary chain*:: |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
971 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
972 def make_chain(inputs): |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
973     frames = [] |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
974 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
975     # First frame is compressed in standalone/discrete mode. |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
976     zctx = zstd.ZstdCompressor() |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
977     frames.append(zctx.compress(inputs[0])) |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
978 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
979     # Subsequent frames use the previous fulltext as a prefix dictionary |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
980     for i, raw in enumerate(inputs[1:]): |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
981         dict_data = zstd.ZstdCompressionDict( |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
982             inputs[i], dict_type=zstd.DICT_TYPE_RAWCONTENT) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
983         zctx = zstd.ZstdCompressor(dict_data=dict_data) |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
984         frames.append(zctx.compress(raw)) |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
985 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
986     return frames |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
987 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
988 ``decompress_content_dict_chain()`` returns the uncompressed data of the last |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
989 element in the input chain. |
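
The chaining logic can be sketched with the standard library alone. This is an illustration only, not the zstd implementation: it assumes ``zlib`` preset dictionaries (``zdict``) as a stand-in for zstd *prefix* dictionaries, and the helper names ``make_zlib_chain`` and ``decompress_zlib_chain`` are hypothetical. It shows why only the last fulltext comes out: each decoded element is needed solely as the dictionary for the next frame.

```python
import zlib


def make_zlib_chain(inputs):
    """Compress inputs so each non-initial one uses the previous input as dictionary."""
    # First element is compressed standalone.
    frames = [zlib.compress(inputs[0])]
    for previous, raw in zip(inputs, inputs[1:]):
        # zlib's preset dictionary plays the role of the zstd prefix dictionary.
        compressor = zlib.compressobj(zdict=previous)
        frames.append(compressor.compress(raw) + compressor.flush())
    return frames


def decompress_zlib_chain(frames):
    """Walk the chain and return the fulltext of the last element."""
    data = zlib.decompress(frames[0])
    for frame in frames[1:]:
        # The previously decoded fulltext is the dictionary for this frame.
        decompressor = zlib.decompressobj(zdict=data)
        data = decompressor.decompress(frame) + decompressor.flush()
    return data


chain = make_zlib_chain([b'input 1', b'input 2', b'input 3'])
last = decompress_zlib_chain(chain)
```

Every intermediate fulltext is materialized in Python here; per the text above, ``decompress_content_dict_chain()`` exists to avoid that overhead.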
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
990 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
991 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
992 .. note:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
993 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
994 It is possible to implement *prefix dictionary chain* decompression |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
995 on top of other APIs. However, this function will likely be faster - |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
996 especially for long input chains - as it avoids the overhead of instantiating |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
997 and passing around intermediate objects between C and Python. |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
998 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
999 Multi-Threaded Compression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1000 -------------------------- |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1001 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1002 ``ZstdCompressor`` accepts a ``threads`` argument that controls the number |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1003 of threads to use for compression. The way this works is that input is split |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1004 into segments and each segment is fed into a worker pool for compression. Once |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1005 a segment is compressed, it is flushed/appended to the output. |
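
The split, compress, and append flow described above can be sketched as follows. This is a hedged illustration built on assumptions: it uses ``zlib`` and a Python ``ThreadPoolExecutor`` rather than zstd's C-level worker pool, and ``compress_segments`` / ``decompress_segments`` are hypothetical helpers. Unlike zstd, which emits a single frame, this sketch keeps one independent stream per segment.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_segments(data, segment_size=1 << 20, threads=4):
    """Split data into fixed-size segments and compress them on a worker pool."""
    segments = [data[i:i + segment_size]
                for i in range(0, len(data), segment_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so output order matches input order.
        return list(pool.map(zlib.compress, segments))


def decompress_segments(compressed):
    """Decompress each segment and concatenate the results."""
    return b"".join(zlib.decompress(chunk) for chunk in compressed)


payload = bytes(range(256)) * 20000  # 5,120,000 bytes of sample data
pieces = compress_segments(payload)
restored = decompress_segments(pieces)
```

Because each segment starts with fresh compression state, matches cannot reach back into earlier segments, which is one intuition for why segmented output tends to be somewhat larger.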
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1006 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1007 .. note:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1008 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1009 These threads are created at the C layer and are not Python threads. So they |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1010 work outside the GIL. It is therefore possible to saturate multiple CPU cores |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1011 from Python. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1012 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1013 The segment size for multi-threaded compression is chosen from the window size |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1014 of the compressor. This is derived from the ``window_log`` attribute of a |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1015 ``ZstdCompressionParameters`` instance. By default, segment sizes are in the 1+MB |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1016 range. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1017 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1018 If multi-threaded compression is requested and the input is smaller than the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1019 configured segment size, only a single compression thread will be used. If the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1020 input is smaller than the segment size multiplied by the thread pool size or |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1021 if data cannot be delivered to the compressor fast enough, not all requested |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1022 compressor threads may be active simultaneously. |
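
As a rough way to reason about the paragraph above, the number of workers that can be busy at once is bounded by the number of segments in the input. The helper below is a hypothetical back-of-the-envelope estimate, not part of the library, and it ignores the case where input is delivered too slowly.

```python
import math


def max_active_workers(input_size, segment_size, threads):
    """Upper bound on simultaneously busy workers for a given input size."""
    if input_size <= 0:
        return 0
    # At most one worker per segment, capped by the requested thread count.
    return min(threads, math.ceil(input_size / segment_size))


small = max_active_workers(512 * 1024, 1 << 20, 4)   # input below one segment
medium = max_active_workers(3 << 20, 1 << 20, 4)     # three segments, four threads
large = max_active_workers(16 << 20, 1 << 20, 4)     # more segments than threads
```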
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1023 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1024 Compared to non-multi-threaded compression, multi-threaded compression has |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1025 higher per-operation overhead. This includes extra memory operations, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1026 thread creation, lock acquisition, etc. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1027 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1028 Due to the nature of multi-threaded compression using *N* compression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1029 *states*, the output from multi-threaded compression will likely be larger |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1030 than non-multi-threaded compression. The difference is usually small. But |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1031 there is a CPU/wall time versus size trade-off that may warrant investigation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1032 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1033 Output from multi-threaded compression does not require any special handling |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1034 on the decompression side. To the decompressor, data generated by a single |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1035 threaded compressor looks the same as data generated by a multi-threaded |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1036 compressor and does not require any special handling or additional resource |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1037 requirements. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1038 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1039 Dictionary Creation and Management |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1040 ---------------------------------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1041 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1042 Compression dictionaries are represented with the ``ZstdCompressionDict`` type. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1043 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1044 Instances can be constructed from bytes:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1045 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1046 dict_data = zstd.ZstdCompressionDict(data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1047 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1048 It is possible to construct a dictionary from *any* data. If the data doesn't |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1049 begin with a magic header, it will be treated as a *prefix* dictionary. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1050 *Prefix* dictionaries allow compression operations to reference raw data |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1051 within the dictionary. |
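
To make the effect of referencing raw dictionary content concrete, here is a small sketch that assumes ``zlib`` preset dictionaries (``zdict``) as a stand-in for a zstd *prefix* dictionary. The helper names and the sample records are made up for illustration.

```python
import zlib


def compress_with_prefix(data, prefix=None):
    """Compress data, optionally priming the compressor with raw prefix bytes."""
    if prefix is None:
        return zlib.compress(data)
    compressor = zlib.compressobj(zdict=prefix)
    return compressor.compress(data) + compressor.flush()


def decompress_with_prefix(blob, prefix=None):
    """Reverse compress_with_prefix(); the same prefix must be supplied."""
    if prefix is None:
        return zlib.decompress(blob)
    decompressor = zlib.decompressobj(zdict=prefix)
    return decompressor.decompress(blob) + decompressor.flush()


prefix = b'{"user": "alice", "action": "login", "status": "ok"}'
sample = b'{"user": "bob", "action": "login", "status": "ok"}'

plain = compress_with_prefix(sample)
primed = compress_with_prefix(sample, prefix)
```

The primed output is shorter because most of ``sample`` can be encoded as back-references into ``prefix``. The cost is that the decompressor needs the identical prefix bytes.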
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1052 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1053 It is possible to force the use of *prefix* dictionaries or to require a |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1054 dictionary header:: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1055 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1056 dict_data = zstd.ZstdCompressionDict(data, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1057 dict_type=zstd.DICT_TYPE_RAWCONTENT) |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1058 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1059 dict_data = zstd.ZstdCompressionDict(data, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1060 dict_type=zstd.DICT_TYPE_FULLDICT) |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1061 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1062 You can see how many bytes are in the dictionary by calling ``len()``:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1063 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1064 dict_data = zstd.train_dictionary(size, samples) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1065 dict_size = len(dict_data) # will not be larger than ``size`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1066 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1067 Once you have a dictionary, you can pass it to the objects performing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1068 compression and decompression:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1069 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1070 dict_data = zstd.train_dictionary(131072, samples) |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1071 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1072 cctx = zstd.ZstdCompressor(dict_data=dict_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1073 for source_data in input_data: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1074     compressed = cctx.compress(source_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1075     # Do something with compressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1076 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1077 dctx = zstd.ZstdDecompressor(dict_data=dict_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1078 for compressed_data in input_data: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1079     buffer = io.BytesIO() |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1080     with dctx.stream_writer(buffer) as decompressor: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1081         decompressor.write(compressed_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1082     # Do something with raw data in ``buffer``. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1083 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1084 Dictionaries have unique integer IDs. You can retrieve this ID via:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1085 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1086 dict_id = zstd.dictionary_id(dict_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1087 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1088 You can obtain the raw data in the dict (useful for persisting and constructing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1089 a ``ZstdCompressionDict`` later) via ``as_bytes()``:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1090 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1091 dict_data = zstd.train_dictionary(size, samples) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1092 raw_data = dict_data.as_bytes() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1093 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1094 By default, when a ``ZstdCompressionDict`` is *attached* to a |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1095 ``ZstdCompressor``, each ``ZstdCompressor`` performs work to prepare the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1096 dictionary for use. This is fine if only 1 compression operation is being |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1097 performed or if the ``ZstdCompressor`` is being reused for multiple operations. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1098 But if multiple ``ZstdCompressor`` instances are being used with the dictionary, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1099 this can add overhead. |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1100 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1101 It is possible to *precompute* the dictionary so it can readily be consumed |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1102 by multiple ``ZstdCompressor`` instances:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1103 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1104 d = zstd.ZstdCompressionDict(data) |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1105 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1106 # Precompute for compression level 3. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1107 d.precompute_compress(level=3) |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1108 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1109 # Precompute with specific compression parameters. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1110 params = zstd.ZstdCompressionParameters(...) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1111 d.precompute_compress(compression_params=params) |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1112 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1113 .. note:: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1114 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1115 When a dictionary is precomputed, the compression parameters used to |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1116 precompute the dictionary overwrite some of the compression parameters |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1117 specified to ``ZstdCompressor.__init__``. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1118 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1119 Training Dictionaries |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1120 ^^^^^^^^^^^^^^^^^^^^^ |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1121 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1122 Unless using *prefix* dictionaries, dictionary data is produced by *training* |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1123 on existing data:: |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1124 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1125 dict_data = zstd.train_dictionary(size, samples) |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1126 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1127 This takes a target dictionary size and a list of bytes instances, then creates and |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1128 returns a ``ZstdCompressionDict``. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1129 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1130 The dictionary training mechanism is known as *cover*. More details about it are |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1131 available in the paper *Effective Construction of Relative Lempel-Ziv |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1132 Dictionaries* (authors: Liao, Petri, Moffat, Wirth). |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1133 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1134 The cover algorithm takes parameters ``k`` and ``d``. These are the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1135 *segment size* and *dmer size*, respectively. The returned dictionary |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1136 instance created by this function has ``k`` and ``d`` attributes |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1137 containing the values for these parameters. If a ``ZstdCompressionDict`` |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1138 is constructed from raw bytes data (a content-only dictionary), the |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1139 ``k`` and ``d`` attributes will be ``0``. |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1140 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1141 The segment and dmer size parameters to the cover algorithm can either be |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1142 specified manually or ``train_dictionary()`` can try multiple values |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1143 and pick the best one, where *best* means the smallest compressed data size. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1144 This latter mode is called *optimization* mode. |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1145 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1146 If none of ``k``, ``d``, ``steps``, ``threads``, ``level``, ``notifications``, |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1147 or ``dict_id`` (basically anything from the underlying ``ZDICT_cover_params_t`` |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1148 struct) are defined, *optimization* mode is used with default parameter |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1149 values. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1150 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1151 If ``steps`` or ``threads`` are defined, then *optimization* mode is engaged |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1152 with explicit control over those parameters. Specifying ``threads=0`` or |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1153 ``threads=1`` can be used to engage *optimization* mode if other parameters |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1154 are not defined. |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1155 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1156 Otherwise, non-*optimization* mode is used with the parameters specified. |
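
The three rules above can be read as a small decision procedure. The sketch
below is a plain-Python paraphrase of this text, not the extension's actual
code. The function name ``cover_mode`` is hypothetical, and the sketch assumes
that an argument counts as "defined" when it is passed as a non-zero value.

```python
def cover_mode(k=0, d=0, steps=0, threads=0, level=0, notifications=0, dict_id=0):
    """Return the training mode selected by the rules described above.

    Assumption: an argument is "defined" when it is non-zero.
    """
    # Rule 1: nothing defined -> optimization mode with default parameters.
    if not any((k, d, steps, threads, level, notifications, dict_id)):
        return "optimization (default parameters)"
    # Rule 2: steps or threads defined -> optimization mode with explicit control.
    if steps or threads:
        return "optimization (explicit steps/threads)"
    # Rule 3: everything else -> train once with the given parameters.
    return "non-optimization"


print(cover_mode())
print(cover_mode(threads=1))
print(cover_mode(k=64, d=8))
```

Under that assumption, passing only ``k`` and ``d`` is the one case of the
three shown that trains once with exactly the values given.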

This function takes the following arguments:

dict_size
   Target size in bytes of the dictionary to generate.
samples
   A list of bytes holding samples the dictionary will be trained from.
k
   Parameter to cover algorithm defining the segment size. A reasonable range
   is [16, 2048+].
d
   Parameter to cover algorithm defining the dmer size. A reasonable range is
   [6, 16]. ``d`` must be less than or equal to ``k``.
dict_id
   Integer dictionary ID for the produced dictionary. Default is 0, which uses
   a random value.
steps
   Number of steps through ``k`` values to perform when trying parameter
   variations.
threads
   Number of threads to use when trying parameter variations. Default is 0,
   which means to use a single thread. A negative value can be specified to
   use as many threads as there are detected logical CPUs.
level
   Integer target compression level when trying parameter variations.
notifications
   Controls writing of informational messages to ``stderr``. ``0`` (the
   default) means to write nothing. ``1`` writes errors. ``2`` writes
   progression info. ``3`` writes more details. And ``4`` writes all info.

Explicit Compression Parameters
-------------------------------

Zstandard offers a high-level *compression level* that maps to lower-level
compression parameters. For many consumers, this numeric level is the only
compression setting you'll need to touch.

But for advanced use cases, it might be desirable to tweak these lower-level
settings.

The ``ZstdCompressionParameters`` type represents these low-level compression
settings.

Instances of this type can be constructed from a myriad of keyword arguments
(defined below) for complete low-level control over each adjustable
compression setting.

From a higher level, one can construct a ``ZstdCompressionParameters`` instance
given a desired compression level and target input and dictionary size
using ``ZstdCompressionParameters.from_level()``. e.g.::

    # Derive compression settings for compression level 7.
    params = zstd.ZstdCompressionParameters.from_level(7)

    # With an input size of 1MB
    params = zstd.ZstdCompressionParameters.from_level(7, source_size=1048576)

Using ``from_level()``, it is also possible to override individual compression
parameters or to define additional settings that aren't automatically derived.
e.g.::

    params = zstd.ZstdCompressionParameters.from_level(4, window_log=10)
    params = zstd.ZstdCompressionParameters.from_level(5, threads=4)

Or you can define low-level compression settings directly::

    params = zstd.ZstdCompressionParameters(window_log=12, enable_ldm=True)

Once a ``ZstdCompressionParameters`` instance is obtained, it can be used to
configure a compressor::

    cctx = zstd.ZstdCompressor(compression_params=params)

The named arguments and attributes of ``ZstdCompressionParameters`` are as
follows:

* format
* compression_level
* window_log
* hash_log
* chain_log
* search_log
* min_match
* target_length
* strategy
* compression_strategy (deprecated: same as ``strategy``)
* write_content_size
* write_checksum
* write_dict_id
* job_size
* overlap_log
* overlap_size_log (deprecated: same as ``overlap_log``)
* force_max_window
* enable_ldm
* ldm_hash_log
* ldm_min_match
* ldm_bucket_size_log
* ldm_hash_rate_log
* ldm_hash_every_log (deprecated: same as ``ldm_hash_rate_log``)
* threads

Some of these are very low-level settings. It may help to consult the official
zstandard documentation for their behavior. Look for the ``ZSTD_c_*`` constants
(named ``ZSTD_p_*`` in older zstd releases) in ``zstd.h``
(https://github.com/facebook/zstd/blob/dev/lib/zstd.h).
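
As one concrete reading of the ``*_log`` naming, ``window_log`` is the base-2
logarithm of the window size in bytes. The sketch below converts the
``window_log`` values used in the examples above into byte sizes. The helper
name is hypothetical and not part of this module, and the other ``*_log``
settings have their own meanings, which ``zstd.h`` describes.

```python
def window_size_bytes(window_log):
    """Window size in bytes for a given window_log (a power-of-two exponent)."""
    return 1 << window_log


# Values taken from the from_level() and constructor examples above.
for window_log in (10, 12):
    print(window_log, window_size_bytes(window_log))
```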

Frame Inspection
----------------

Data emitted from zstd compression is encapsulated in a *frame*. This frame
begins with a 4 byte *magic number* header followed by 2 to 14 bytes describing
the frame in more detail. For more info, see
https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md.

``zstd.get_frame_parameters(data)`` parses a zstd *frame* header from a bytes
instance and returns a ``FrameParameters`` object describing the frame.

Depending on which fields are present in the frame and their values, the
length of the frame parameters varies. If insufficient bytes are passed
in to fully parse the frame parameters, ``ZstdError`` is raised. To ensure
frame parameters can be parsed, pass in at least 18 bytes.
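
The 18 byte figure follows from the frame format: 4 bytes of magic number plus
at most 14 bytes of frame header. The sketch below computes that length from
the first 5 bytes of a frame using only the standard library, following the
field layout in the format specification linked above. It illustrates the
format and is not the code behind ``zstd.get_frame_parameters()`` or
``zstd.frame_header_size()``. The function name is hypothetical.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # magic number, stored little-endian in the frame


def frame_header_length(data):
    """Return the length of magic number + frame header, from the first 5 bytes."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = data[4]                      # Frame_Header_Descriptor byte
    fcs_flag = descriptor >> 6                # Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1    # Single_Segment_flag
    dict_id_flag = descriptor & 3             # Dictionary_ID_flag
    window_bytes = 0 if single_segment else 1
    dict_id_bytes = (0, 1, 2, 4)[dict_id_flag]
    fcs_bytes = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    return 4 + 1 + window_bytes + dict_id_bytes + fcs_bytes


smallest = struct.pack("<IB", ZSTD_MAGIC, 0x20)  # single segment, 1 byte size field
largest = struct.pack("<IB", ZSTD_MAGIC, 0xC3)   # window, 4 byte dict ID, 8 byte size
print(frame_header_length(smallest), frame_header_length(largest))
```

The two descriptors chosen here exercise the shortest and longest layouts,
which is why passing 18 bytes is always enough to parse the header.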

``FrameParameters`` instances have the following attributes:

content_size
   Integer size of original, uncompressed content. This will be ``0`` if the
   original content size isn't written to the frame (controlled with the
   ``write_content_size`` argument to ``ZstdCompressor``) or if the input
   content size was ``0``.

window_size
   Integer size of maximum back-reference distance in compressed data.

dict_id
   Integer of dictionary ID used for compression. ``0`` if no dictionary
   ID was used or if the dictionary ID was ``0``.

has_checksum
   Bool indicating whether a 4 byte content checksum is stored at the end
   of the frame.

``zstd.frame_header_size(data)`` returns the size of the zstandard frame
header.

``zstd.frame_content_size(data)`` returns the content size as parsed from
the frame header. ``-1`` means the content size is unknown. ``0`` means
an empty frame. The value is whatever the frame's producer wrote into the
header: it is usually correct, but it is not verified and may be wrong for
corrupt or untrusted input.

Misc Functionality
------------------

estimate_decompression_context_size()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Estimate the memory size requirements for a decompressor instance.

Constants
---------

The following module constants/attributes are exposed:

ZSTD_VERSION
   This module attribute exposes a 3-tuple of the Zstandard version. e.g.
   ``(1, 0, 0)``
MAX_COMPRESSION_LEVEL
   Integer max compression level accepted by compression functions
COMPRESSION_RECOMMENDED_INPUT_SIZE
   Recommended chunk size to feed to compressor functions
COMPRESSION_RECOMMENDED_OUTPUT_SIZE
   Recommended chunk size for compression output
DECOMPRESSION_RECOMMENDED_INPUT_SIZE
   Recommended chunk size to feed into decompressor functions
DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE
   Recommended chunk size for decompression output

FRAME_HEADER
   bytes containing the 4 byte *magic number* that begins a Zstandard frame
MAGIC_NUMBER
   The same magic number as an integer
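
These two constants describe the same 4 bytes. A minimal sketch of the
relationship, assuming the magic number value ``0xFD2FB528`` from the
Zstandard format specification and using only the standard library. The
variable names are local to the example and are not the module's attributes.

```python
import struct

magic_number = 0xFD2FB528                       # value from the format specification
frame_header = struct.pack("<I", magic_number)  # stored little-endian in the frame

print(frame_header.hex())
print(int.from_bytes(frame_header, "little") == magic_number)
```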

FLUSH_BLOCK
   Flush mode that ends the current zstd block. A decompressor will
   be able to decode all data fed into the compressor so far.
FLUSH_FRAME
   Flush mode that ends the current zstd frame. Any new data fed
   to the compressor will start a new frame.

CONTENTSIZE_UNKNOWN
   Value for content size when the content size is unknown.
CONTENTSIZE_ERROR
   Value for content size when content size couldn't be determined.

WINDOWLOG_MIN
   Minimum value for the ``window_log`` compression parameter
WINDOWLOG_MAX
   Maximum value for the ``window_log`` compression parameter
CHAINLOG_MIN
   Minimum value for the ``chain_log`` compression parameter
CHAINLOG_MAX
|
1356 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1357 HASHLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1358 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1359 HASHLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1360 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1361 SEARCHLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1362 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1363 SEARCHLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1364 Maximum value for compression parameter |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1365 MINMATCH_MIN |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1366 Minimum value for compression parameter |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1367 MINMATCH_MAX |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1368 Maximum value for compression parameter |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1369 SEARCHLENGTH_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1370 Minimum value for compression parameter |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1371 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1372 Deprecated: use ``MINMATCH_MIN`` |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1373 SEARCHLENGTH_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1374 Maximum value for compression parameter |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1375 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1376 Deprecated: use ``MINMATCH_MAX`` |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1377 TARGETLENGTH_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1378 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1379 STRATEGY_FAST |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1380 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1381 STRATEGY_DFAST |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1382 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1383 STRATEGY_GREEDY |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1384 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1385 STRATEGY_LAZY |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1386 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1387 STRATEGY_LAZY2 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1388 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1389 STRATEGY_BTLAZY2 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1390 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1391 STRATEGY_BTOPT |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1392 Compression strategy |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1393 STRATEGY_BTULTRA |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1394 Compression strategy |
42070
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1395 STRATEGY_BTULTRA2 |
675775c33ab6
zstandard: vendor python-zstandard 0.11
Gregory Szorc <gregory.szorc@gmail.com>
parents:
40121
diff
changeset
|
1396 Compression strategy |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1397 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1398 FORMAT_ZSTD1 |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1399 Zstandard frame format |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1400 FORMAT_ZSTD1_MAGICLESS |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1401 Zstandard frame format without magic header |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1402 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1403 Performance Considerations |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1404 -------------------------- |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1405 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1406 The ``ZstdCompressor`` and ``ZstdDecompressor`` types maintain state to a |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1407 persistent compression or decompression *context*. Reusing a ``ZstdCompressor`` |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1408 or ``ZstdDecompressor`` instance for multiple operations is faster than |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1409 instantiating a new ``ZstdCompressor`` or ``ZstdDecompressor`` for each |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1410 operation. The differences are magnified as the size of data decreases. For |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1411 example, the difference between *context* reuse and non-reuse for 100,000 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1412 100 byte inputs will be significant (possibly over 10x faster to reuse contexts) |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1413 whereas 10 100,000,000 byte inputs will be more similar in speed (because the |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1414 time spent doing compression dwarfs time spent creating new *contexts*). |
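The reuse pattern described above can be sketched as follows. This is a minimal illustration, not a benchmark: the ``compress_all`` helper and the sample inputs are hypothetical, the import name ``zstandard`` is an assumption (a vendored copy may be exposed under a different name), and the guarded import only lets the sketch load where the library is absent.

```python
try:
    import zstandard as zstd  # assumption: the package is importable under this name
except ImportError:
    zstd = None  # lets the sketch load where the library is not installed


def compress_all(inputs, reuse_context=True, level=3):
    """Compress each input, sharing one compressor or creating one per input."""
    if reuse_context:
        # One ZstdCompressor (and so one context) serves every input.
        cctx = zstd.ZstdCompressor(level=level)
        return [cctx.compress(data) for data in inputs]
    # A new ZstdCompressor (and context) is created for every input.
    return [zstd.ZstdCompressor(level=level).compress(data) for data in inputs]


if zstd is not None:
    # Many small inputs: the case where context reuse matters most.
    inputs = [b"payload %d " % i * 8 for i in range(1000)]
    reused = compress_all(inputs, reuse_context=True)
    fresh = compress_all(inputs, reuse_context=False)

    dctx = zstd.ZstdDecompressor()  # the decompressor is reused as well
    assert [dctx.decompress(frame) for frame in reused] == inputs
    assert [dctx.decompress(frame) for frame in fresh] == inputs
```

Both paths produce valid frames. Wrapping each call in a timer such as ``time.perf_counter()`` is one way to measure the difference on a given machine.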
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1415 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1416 Buffer Types |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1417 ------------ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1418 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1419 The API exposes a handful of custom types for interfacing with memory buffers. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1420 The primary goal of these types is to facilitate efficient multi-object |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1421 operations. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1422 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1423 The essential idea is to have a single memory allocation provide backing |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1424 storage for multiple logical objects. This has 2 main advantages: fewer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1425 allocations and optimal memory access patterns. This avoids having to allocate |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1426 a Python object for each logical object and furthermore ensures that access of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1427 data for objects can be sequential (read: fast) in memory. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1428 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1429 BufferWithSegments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1430 ^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1431 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1432 The ``BufferWithSegments`` type represents a memory buffer containing N |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1433 discrete items of known lengths (segments). It is essentially a fixed size |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1434 memory address and an array of 2-tuples of ``(offset, length)`` 64-bit |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1435 unsigned native endian integers defining the byte offset and length of each |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1436 segment within the buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1437 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1438 Instances behave like containers. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1439 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1440 ``len()`` returns the number of segments within the instance. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1441 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1442 ``o[index]`` or ``__getitem__`` obtains a ``BufferSegment`` representing an |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1443 individual segment within the backing buffer. That returned object references |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1444 (not copies) memory. This means that iterating all objects doesn't copy |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1445 data within the buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1446 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1447 The ``.size`` attribute contains the total size in bytes of the backing |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1448 buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1449 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1450 Instances conform to the buffer protocol. So a reference to the backing bytes |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1451 can be obtained via ``memoryview(o)``. A *copy* of the backing bytes can also |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1452 be obtained via ``.tobytes()``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1453 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1454 The ``.segments`` attribute exposes the array of ``(offset, length)`` for |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1455 segments within the buffer. It is a ``BufferSegments`` type. |
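To make the layout concrete, here is a pure-Python model of the idea: one backing buffer plus a packed array of ``(offset, length)`` pairs. The ``SegmentedBuffer`` class is a hypothetical stand-in written for illustration; it is not the library's implementation and only mimics the behavior described above.

```python
import struct

# (offset, length) as two native-endian unsigned 64-bit integers.
SEGMENT = struct.Struct("=QQ")


class SegmentedBuffer:
    """Hypothetical model: one backing buffer plus packed segment descriptors."""

    def __init__(self, data, segments):
        self._data = memoryview(data)
        self._segments = memoryview(segments)
        self.size = len(self._data)  # total size of the backing buffer in bytes

    def __len__(self):
        # Number of segments, not number of bytes.
        return len(self._segments) // SEGMENT.size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        offset, length = SEGMENT.unpack_from(self._segments, index * SEGMENT.size)
        # Slicing a memoryview references the backing memory; nothing is copied.
        return self._data[offset:offset + length]


data = b"foobarbazz"
segments = b"".join(SEGMENT.pack(o, n) for o, n in [(0, 3), (3, 3), (6, 4)])
buf = SegmentedBuffer(data, segments)

first = buf[0]                  # a view over bytes 0..2 of the backing buffer
copied = buf[2].tobytes()       # an explicit copy of the third segment
```

Iterating over ``buf`` yields one view per segment, so walking all segments touches the backing buffer sequentially without allocating a copy for each item.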
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1456 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1457 BufferSegment |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1458 ^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1459 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1460 The ``BufferSegment`` type represents a segment within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1461 It is essentially a reference to N bytes within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1462 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1463 ``len()`` returns the length of the segment in bytes. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1464 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1465 ``.offset`` contains the byte offset of this segment within its parent |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1466 ``BufferWithSegments`` instance. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1467 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1468 The object conforms to the buffer protocol. ``.tobytes()`` can be called to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1469 obtain a ``bytes`` instance with a copy of the backing bytes. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1470 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1471 BufferSegments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1472 ^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1473 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1474 This type represents an array of ``(offset, length)`` integers defining segments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1475 within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1476 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1477 The array members are 64-bit unsigned integers using host/native byte order. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1478 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1479 Instances conform to the buffer protocol. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1480 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1481 BufferWithSegmentsCollection |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1482 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1483 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1484 The ``BufferWithSegmentsCollection`` type represents a virtual spanning view |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1485 of multiple ``BufferWithSegments`` instances. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1486 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1487 Instances are constructed from 1 or more ``BufferWithSegments`` instances. The |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1488 resulting object behaves like an ordered sequence whose members are the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1489 segments within each ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1490 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1491 ``len()`` returns the number of segments within all ``BufferWithSegments`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1492 instances. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1493 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1494 ``o[index]`` and ``__getitem__(index)`` return the ``BufferSegment`` at |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1495 that offset as if all ``BufferWithSegments`` instances were a single |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1496 entity. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1497 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1498 If the object is composed of 2 ``BufferWithSegments`` instances with the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1499 first having 2 segments and the second having 3 segments, then ``b[0]`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1500 and ``b[1]`` access segments in the first object and ``b[2]``, ``b[3]``, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1501 and ``b[4]`` access segments from the second. |
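The index mapping in this example can be sketched in a few lines. The ``SpanningView`` class below is a hypothetical model, not the library's type, and plain lists stand in for the ``BufferWithSegments`` instances; only the flat-index arithmetic is the point.

```python
from bisect import bisect_right
from itertools import accumulate


class SpanningView:
    """Hypothetical model of a collection spanning several segmented buffers."""

    def __init__(self, *buffers):
        if not buffers:
            raise ValueError("at least one buffer is required")
        self._buffers = buffers
        # Running totals of segment counts, e.g. [2, 5] for sizes 2 and 3.
        self._ends = list(accumulate(len(b) for b in buffers))

    def __len__(self):
        # Total number of segments across all buffers.
        return self._ends[-1]

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first buffer whose running total exceeds the flat index.
        which = bisect_right(self._ends, index)
        start = self._ends[which - 1] if which else 0
        return self._buffers[which][index - start]


first = [b"a0", b"a1"]            # stand-in for a buffer with 2 segments
second = [b"b0", b"b1", b"b2"]    # stand-in for a buffer with 3 segments
b = SpanningView(first, second)

in_first = (b[0], b[1])           # resolved against the first buffer
in_second = (b[2], b[3], b[4])    # resolved against the second buffer
```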
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1502 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1503 Choosing an API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1504 =============== |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1505 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1506 There are multiple APIs for performing compression and decompression. This is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1507 because different applications have different needs and the library wants to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1508 facilitate optimal use in as many use cases as possible. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1509 |
37495
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1510 At a high level, APIs are divided into *one-shot* and *streaming*: either you |
b1fb341d8a61
zstandard: vendor python-zstandard 0.9.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
31796
diff
changeset
|
1511 are operating on all data at once or you operate on it piecemeal. |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1512 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1513 The *one-shot* APIs are useful for small data, where the input or output |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1514 size is known. (The size can come from a buffer length, file size, or |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1515 the zstd frame header.) A limitation of the *one-shot* APIs is that |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1516 input and output must fit in memory simultaneously. For, say, a 4 GB input, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1517 this is often not feasible. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1518 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1519 The *one-shot* APIs also perform all work as a single operation. So, if you |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1520 feed it large input, it could take a long time for the function to return. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1521 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1522 The streaming APIs do not have the limitations of the simple API. But the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1523 price you pay for this flexibility is that they are more complex than a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1524 single function call. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1525 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1526 The streaming APIs put the caller in control of compression and decompression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1527 behavior by allowing them to directly control either the input or output side |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1528 of the operation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1529 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1530 With the *streaming input*, *compressor*, and *decompressor* APIs, the caller |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1531 has full control over the input to the compression or decompression stream. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1532 They can directly choose when new data is operated on. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1533 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1534 With the *streaming output* APIs, the caller has full control over the output |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1535 of the compression or decompression stream. It can choose when to receive |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1536 new data. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1537 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
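The difference between controlling the input side and controlling the output
side can be sketched as follows. This is only an illustration and not
python-zstandard code: the standard library's ``zlib`` stands in for zstd so
the sketch runs without third-party packages, and the helper names
``compress_push`` and ``compress_pull`` are hypothetical.

```python
import io
import zlib


def compress_push(chunks, sink):
    """Input-driven: the caller decides when each chunk is fed in."""
    cobj = zlib.compressobj()  # stand-in for a zstd streaming compressor
    for chunk in chunks:
        sink.write(cobj.compress(chunk))
    sink.write(cobj.flush())


def compress_pull(source, read_size=8192):
    """Output-driven: the source is read only when the caller asks for output."""
    cobj = zlib.compressobj()  # stand-in for a zstd streaming compressor
    while True:
        chunk = source.read(read_size)
        if not chunk:
            break
        out = cobj.compress(chunk)
        if out:
            yield out
    yield cobj.flush()


data = b"streaming example " * 1000

# Push style: the caller hands over data in two pieces of its choosing.
sink = io.BytesIO()
compress_push([data[:5000], data[5000:]], sink)
pushed = sink.getvalue()

# Pull style: nothing is read from the source until the generator is iterated.
pulled = b"".join(compress_pull(io.BytesIO(data)))
```

In the push style the caller's loop sets the pace. In the pull style the
consumer of the generator sets the pace, and reads from the source happen as a
side effect of asking for more output.
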
When using the *streaming* APIs that operate on file-like or stream objects,
it is important to consider what happens in that object when I/O is requested.
There is potential for long pauses as data is read from or written to the
underlying stream (say from interacting with a filesystem or network). This
could add considerable overhead.

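One way to see how much time goes to the underlying stream is to wrap it and
time each read. The sketch below is an assumption-laden illustration: the
``TimedReader`` class is hypothetical, and ``zlib`` from the standard library
stands in for zstd so the example is self-contained.

```python
import io
import time
import zlib


class TimedReader:
    """Wrap a file-like object and record the time spent inside read()."""

    def __init__(self, raw):
        self._raw = raw
        self.read_calls = 0
        self.seconds_in_read = 0.0

    def read(self, size=-1):
        start = time.perf_counter()
        try:
            return self._raw.read(size)
        finally:
            self.read_calls += 1
            self.seconds_in_read += time.perf_counter() - start


source = TimedReader(io.BytesIO(b"x" * 100_000))
cobj = zlib.compressobj()  # stand-in for a zstd streaming compressor
compressed = bytearray()
while True:
    chunk = source.read(16384)
    if not chunk:
        break
    compressed += cobj.compress(chunk)
compressed += cobj.flush()
```

With an in-memory source the recorded time is negligible. With a network or
slow disk source, ``seconds_in_read`` may well dominate the total, which is the
overhead the paragraph above warns about.
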
Thread Safety
=============

``ZstdCompressor`` and ``ZstdDecompressor`` instances make no guarantees
about thread safety. Do not operate on the same ``ZstdCompressor`` or
``ZstdDecompressor`` instance simultaneously from different threads. It is
fine to have different threads call into a single instance, just not at the
same time.

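If several threads must share one instance, one option is to serialize access
with a lock. The sketch below assumes only that the wrapped object has a
``compress(data)`` method. ``LockedCompressor`` and ``ZlibOneShot`` are
hypothetical names, and ``zlib`` stands in for zstd so the example runs with
the standard library alone.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class ZlibOneShot:
    """Stand-in for an object with a compress() method (e.g. a ZstdCompressor)."""

    def compress(self, data):
        return zlib.compress(data)


class LockedCompressor:
    """Serialize calls so only one thread uses the wrapped instance at a time."""

    def __init__(self, compressor):
        self._compressor = compressor
        self._lock = threading.Lock()

    def compress(self, data):
        with self._lock:
            return self._compressor.compress(data)


shared = LockedCompressor(ZlibOneShot())
payloads = [bytes([i]) * 1000 for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(shared.compress, payloads))
```

The lock removes any parallelism on the shared instance. Giving each thread its
own instance avoids that cost when the extra memory is acceptable.
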
Some operations, such as streaming operations, require multiple function
calls to complete. A single ``ZstdCompressor`` or ``ZstdDecompressor`` cannot
be used for simultaneously active operations. For example, you must not start
a streaming operation when another streaming operation is already active.

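The safe pattern is one instance per active operation. A minimal sketch,
assuming a hypothetical ``new_compressor`` factory: ``zlib.compressobj`` stands
in here for whatever object drives a single zstd stream. Two streams are
interleaved, and each has its own object.

```python
import zlib


def new_compressor():
    """Hypothetical factory: returns a fresh object for exactly one stream."""
    return zlib.compressobj()  # stand-in for a dedicated zstd instance


stream_a = new_compressor()
stream_b = new_compressor()

out_a = bytearray()
out_b = bytearray()
for i in range(4):
    # The two operations are active at the same time, but never share state.
    out_a += stream_a.compress(b"alpha-%d " % i)
    out_b += stream_b.compress(b"beta-%d " % i)
out_a += stream_a.flush()
out_b += stream_b.flush()
```
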
The C extension releases the GIL during non-trivial calls into the zstd C
API. Non-trivial calls are notably compression and decompression. Trivial
calls are things like parsing frame parameters. Where the GIL is released
is considered an implementation detail and can change in any release.

APIs that accept bytes-like objects don't enforce that the underlying object
is read-only. However, it is assumed that the passed object is read-only for
the duration of the function call. It is possible to pass a mutable object
(like a ``bytearray``) to e.g. ``ZstdCompressor.compress()``, have the GIL
released, and mutate the object from another thread. Such a race condition
is a bug in the consumer of python-zstandard. Most Python data types are
immutable, so unless you are doing something fancy, you don't need to
worry about this.

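When a mutable buffer might be changed by another thread, one defensive option
is to compress an immutable snapshot instead. The sketch below uses
``zlib.compress`` as a stand-in for ``ZstdCompressor.compress()`` so that it
runs with the standard library alone. The copy costs memory and time, so
whether it is worthwhile depends on the application.

```python
import zlib

buf = bytearray(b"payload " * 100)

# Take an immutable snapshot; later changes to buf cannot affect it.
snapshot = bytes(buf)
compressed = zlib.compress(snapshot)  # stand-in for ZstdCompressor.compress()

# Mutating the original buffer afterwards is harmless to the snapshot.
buf[0:7] = b"CHANGED"
restored = zlib.decompress(compressed)
```
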
Note on Zstandard's *Experimental* API
======================================

Many of the Zstandard APIs used by this module are marked as *experimental*
within the Zstandard project.

It is unclear how Zstandard's C API will evolve over time, especially with
regard to this *experimental* functionality. We will try to maintain
backwards compatibility at the Python API level. However, we cannot
guarantee this for things not under our control.

Since a copy of the Zstandard source code is distributed with this
module and since we compile against it, the behavior of a specific
version of this module should be constant for all of time. So if you
pin the version of this module used in your projects (which is a Python
best practice), you should be shielded from unwanted future changes.

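A pin can also be checked at runtime. The sketch below is an illustration
under stated assumptions: the ``is_pinned`` helper is hypothetical, the
distribution name ``zstandard`` is assumed, and ``0.12.0`` is a placeholder
rather than a recommended version. It relies only on ``importlib.metadata``
from the standard library (Python 3.8 and later).

```python
from importlib import metadata


def is_pinned(dist_name, expected_version):
    """Return True only if dist_name is installed at exactly expected_version."""
    try:
        return metadata.version(dist_name) == expected_version
    except metadata.PackageNotFoundError:
        # Not installed as a distribution at all, so the pin is not satisfied.
        return False


# "zstandard" is an assumed distribution name; "0.12.0" is a placeholder pin.
pin_ok = is_pinned("zstandard", "0.12.0")
```

A check like this only sees installed distributions. A copy of the module that
is vendored into another project's source tree would not be found by it.
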
Donate
======

A lot of time has been invested into this project by the author.

If you find this project useful and would like to thank the author for
their work, consider donating some money. Any amount is appreciated.

.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif
    :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted
    :alt: Donate via PayPal

.. |ci-status| image:: https://dev.azure.com/gregoryszorc/python-zstandard/_apis/build/status/indygreg.python-zstandard?branchName=master
    :target: https://dev.azure.com/gregoryszorc/python-zstandard/_apis/build/status/indygreg.python-zstandard?branchName=master