Mercurial > hg
annotate contrib/python-zstandard/README.rst @ 36979:b9a6ee2066f9 stable
tests: demonstrate aborted rebase strips commits that didn't need rebasing
I haven't verified, but this has probably been broken ever since I
added the feature in 78496ac30025 (rebase: allow rebase even if some
revisions need no rebase (BC) (issue5422), 2017-05-11).
Differential Revision: https://phab.mercurial-scm.org/D2877
author | Martin von Zweigbergk <martinvonz@google.com> |
---|---|
date | Thu, 15 Mar 2018 21:51:33 -0700 |
parents | e0dc40530c5a |
children | b1fb341d8a61 |
rev | line source |
---|---|
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1 ================ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
2 python-zstandard |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
3 ================ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
4 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
5 This project provides Python bindings for interfacing with the |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
6 `Zstandard <http://www.zstd.net>`_ compression library. A C extension |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
7 and CFFI interface are provided. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
8 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
9 The primary goal of the project is to provide a rich interface to the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
10 underlying C API through a Pythonic interface while not sacrificing |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
11 performance. This means exposing most of the features and flexibility |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
12 of the C API while not sacrificing usability or safety that Python provides. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
13 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
14 The canonical home for this project is |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
15 https://github.com/indygreg/python-zstandard. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
16 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
17 | |ci-status| |win-ci-status| |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
18 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
19 State of Project |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
20 ================ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
21 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
22 The project is officially in beta state. The author is reasonably satisfied |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
23 that functionality works as advertised. **There will be some backwards |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
24 incompatible changes before 1.0, probably in the 0.9 release.** This may |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
25 involve renaming the main module from *zstd* to *zstandard* and renaming |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
26 various types and methods. Pin the package version to prevent unwanted |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
27 breakage when this change occurs! |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
28 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
29 This project is vendored and distributed with Mercurial 4.1, where it is |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
30 used in a production capacity. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
31 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
32 There is continuous integration for Python versions 2.6, 2.7, and 3.3+ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
33 on Linux x86_x64 and Windows x86 and x86_64. The author is reasonably |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
34 confident the extension is stable and works as advertised on these |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
35 platforms. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
36 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
37 The CFFI bindings are mostly feature complete. Where a feature is implemented |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
38 in CFFI, unit tests run against both C extension and CFFI implementation to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
39 ensure behavior parity. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
40 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
41 Expected Changes |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
42 ---------------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
43 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
44 The author is reasonably confident in the current state of what's |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
45 implemented on the ``ZstdCompressor`` and ``ZstdDecompressor`` types. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
46 Those APIs likely won't change significantly. Some low-level behavior |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
47 (such as naming and types expected by arguments) may change. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
48 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
49 There will likely be arguments added to control the input and output |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
50 buffer sizes (currently, certain operations read and write in chunk |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
51 sizes using zstd's preferred defaults). |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
52 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
53 There should be an API that accepts an object that conforms to the buffer |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
54 interface and returns an iterator over compressed or decompressed output. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
55 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
56 There should be an API that exposes an ``io.RawIOBase`` interface to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
57 compressor and decompressor streams, like how ``gzip.GzipFile`` from |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
58 the standard library works (issue 13). |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
59 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
60 The author is on the fence as to whether to support the extremely |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
61 low level compression and decompression APIs. It could be useful to |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
62 support compression without the framing headers. But the author doesn't |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
63 believe it a high priority at this time. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
64 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
65 There will likely be a refactoring of the module names. Currently, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
66 ``zstd`` is a C extension and ``zstd_cffi`` is the CFFI interface. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
67 This means that all code for the C extension must be implemented in |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
68 C. ``zstd`` may be converted to a Python module so code can be reused |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
69 between CFFI and C and so not all code in the C extension has to be C. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
70 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
71 Requirements |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
72 ============ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
73 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
74 This extension is designed to run with Python 2.6, 2.7, 3.3, 3.4, 3.5, and |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
75 3.6 on common platforms (Linux, Windows, and OS X). Only x86_64 is |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
76 currently well-tested as an architecture. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
77 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
78 Installing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
79 ========== |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
80 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
81 This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
82 So, to install this package:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
83 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
84 $ pip install zstandard |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
85 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
86 Binary wheels are made available for some platforms. If you need to |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
87 install from a source distribution, all you should need is a working C |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
88 compiler and the Python development headers/libraries. On many Linux |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
89 distributions, you can install a ``python-dev`` or ``python-devel`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
90 package to provide these dependencies. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
91 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
92 Packages are also uploaded to Anaconda Cloud at |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
93 https://anaconda.org/indygreg/zstandard. See that URL for how to install |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
94 this package with ``conda``. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
95 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
96 Performance |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
97 =========== |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
98 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
99 Very crude and non-scientific benchmarking (most benchmarks fall in this |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
100 category because proper benchmarking is hard) show that the Python bindings |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
101 perform within 10% of the native C implementation. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
102 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
103 The following table compares the performance of compressing and decompressing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
104 a 1.1 GB tar file comprised of the files in a Firefox source checkout. Values |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
105 obtained with the ``zstd`` program are on the left. The remaining columns detail |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
106 performance of various compression APIs in the Python bindings. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
107 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
108 +-------+-----------------+-----------------+-----------------+---------------+ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
109 | Level | Native | Simple | Stream In | Stream Out | |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
110 | | Comp / Decomp | Comp / Decomp | Comp / Decomp | Comp | |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
111 +=======+=================+=================+=================+===============+ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
112 | 1 | 490 / 1338 MB/s | 458 / 1266 MB/s | 407 / 1156 MB/s | 405 MB/s | |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
113 +-------+-----------------+-----------------+-----------------+---------------+ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
114 | 2 | 412 / 1288 MB/s | 381 / 1203 MB/s | 345 / 1128 MB/s | 349 MB/s | |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
115 +-------+-----------------+-----------------+-----------------+---------------+ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
116 | 3 | 342 / 1312 MB/s | 319 / 1182 MB/s | 285 / 1165 MB/s | 287 MB/s | |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
117 +-------+-----------------+-----------------+-----------------+---------------+ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
118 | 11 | 64 / 1506 MB/s | 66 / 1436 MB/s | 56 / 1342 MB/s | 57 MB/s | |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
119 +-------+-----------------+-----------------+-----------------+---------------+ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
120 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
121 Again, these are very unscientific. But it shows that Python is capable of |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
122 compressing at several hundred MB/s and decompressing at over 1 GB/s. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
123 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
124 Comparison to Other Python Bindings |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
125 =================================== |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
126 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
127 https://pypi.python.org/pypi/zstd is an alternate Python binding to |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
128 Zstandard. At the time this was written, the latest release of that |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
129 package (1.1.2) only exposed the simple APIs for compression and decompression. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
130 This package exposes much more of the zstd API, including streaming and |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
131 dictionary compression. This package also has CFFI support. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
132 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
133 Bundling of Zstandard Source Code |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
134 ================================= |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
135 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
136 The source repository for this project contains a vendored copy of the |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
137 Zstandard source code. This is done for a few reasons. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
138 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
139 First, Zstandard is relatively new and not yet widely available as a system |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
140 package. Providing a copy of the source code enables the Python C extension |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
141 to be compiled without requiring the user to obtain the Zstandard source code |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
142 separately. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
143 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
144 Second, Zstandard has both a stable *public* API and an *experimental* API. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
145 The *experimental* API is actually quite useful (contains functionality for |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
146 training dictionaries for example), so it is something we wish to expose to |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
147 Python. However, the *experimental* API is only available via static linking. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
148 Furthermore, the *experimental* API can change at any time. So, control over |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
149 the exact version of the Zstandard library linked against is important to |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
150 ensure known behavior. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
151 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
152 Instructions for Building and Testing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
153 ===================================== |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
154 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
155 Once you have the source code, the extension can be built via setup.py:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
156 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
157 $ python setup.py build_ext |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
158 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
159 We recommend testing with ``nose``:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
160 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
161 $ nosetests |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
162 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
163 A Tox configuration is present to test against multiple Python versions:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
164 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
165 $ tox |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
166 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
167 Tests use the ``hypothesis`` Python package to perform fuzzing. If you |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
168 don't have it, those tests won't run. Since the fuzzing tests take longer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
169 to execute than normal tests, you'll need to opt in to running them by |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
170 setting the ``ZSTD_SLOW_TESTS`` environment variable. This is set |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
171 automatically when using ``tox``. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
172 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
173 The ``cffi`` Python package needs to be installed in order to build the CFFI |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
174 bindings. If it isn't present, the CFFI bindings won't be built. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
175 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
176 To create a virtualenv with all development dependencies, do something |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
177 like the following:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
178 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
179 # Python 2 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
180 $ virtualenv venv |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
181 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
182 # Python 3 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
183 $ python3 -m venv venv |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
184 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
185 $ source venv/bin/activate |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
186 $ pip install cffi hypothesis nose tox |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
187 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
188 API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
189 === |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
190 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
191 The compiled C extension provides a ``zstd`` Python module. The CFFI |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
192 bindings provide a ``zstd_cffi`` module. Both provide an identical API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
193 interface. The types, functions, and attributes exposed by these modules |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
194 are documented in the sections below. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
195 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
196 .. note:: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
197 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
198 The documentation in this section makes references to various zstd |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
199 concepts and functionality. The ``Concepts`` section below explains |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
200 these concepts in more detail. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
201 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
202 ZstdCompressor |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
203 -------------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
204 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
205 The ``ZstdCompressor`` class provides an interface for performing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
206 compression operations. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
207 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
208 Each instance is associated with parameters that control compression |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
209 behavior. These come from the following named arguments (all optional): |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
210 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
211 level |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
212 Integer compression level. Valid values are between 1 and 22. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
213 dict_data |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
214 Compression dictionary to use. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
215 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
216 Note: When using dictionary data and ``compress()`` is called multiple |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
217 times, the ``CompressionParameters`` derived from an integer compression |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
218 ``level`` and the first compressed data's size will be reused for all |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
219 subsequent operations. This may not be desirable if source data size |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
220 varies significantly. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
221 compression_params |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
222 A ``CompressionParameters`` instance (overrides the ``level`` value). |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
223 write_checksum |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
224 Whether a 4 byte checksum should be written with the compressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
225 Defaults to False. If True, the decompressor can verify that decompressed |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
226 data matches the original input data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
227 write_content_size |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
228 Whether the size of the uncompressed data will be written into the |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
229 header of compressed data. Defaults to False. The data will only be |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
230 written if the compressor knows the size of the input data. This is |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
231 likely not true for streaming compression. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
232 write_dict_id |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
233 Whether to write the dictionary ID into the compressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
234 Defaults to True. The dictionary ID is only written if a dictionary |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
235 is being used. |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
236 threads |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
237 Enables and sets the number of threads to use for multi-threaded compression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
238 operations. Defaults to 0, which means to use single-threaded compression. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
239 Negative values will resolve to the number of logical CPUs in the system. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
240 Read below for more info on multi-threaded compression. This argument only |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
241 controls thread count for operations that operate on individual pieces of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
242 data. APIs that spawn multiple threads for working on multiple pieces of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
243 data have their own ``threads`` argument. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
244 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
245 Unless specified otherwise, assume that no two methods of ``ZstdCompressor`` |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
246 instances can be called from multiple Python threads simultaneously. In other |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
247 words, assume instances are not thread safe unless stated otherwise. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
248 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
249 Simple API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
250 ^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
251 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
252 ``compress(data)`` compresses and returns data as a one-shot operation.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
253 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
254 cctx = zstd.ZstdCompressor() |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
255 compressed = cctx.compress(b'data to compress') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
256 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
257 The ``data`` argument can be any object that implements the *buffer protocol*. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
258 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
259 Unless ``compression_params`` or ``dict_data`` are passed to the |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
260 ``ZstdCompressor``, each invocation of ``compress()`` will calculate the |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
261 optimal compression parameters for the configured compression ``level`` and |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
262 input data size (some parameters are fine-tuned for small input sizes). |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
263 |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
264 If a compression dictionary is being used, the compression parameters |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
265 determined from the first input's size will be reused for subsequent |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
266 operations. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
267 |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
268 There is currently a deficiency in zstd's C APIs that makes it difficult |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
269 to round trip empty inputs when ``write_content_size=True``. Attempting |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
270 this will raise a ``ValueError`` unless ``allow_empty=True`` is passed |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
271 to ``compress()``. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
272 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
273 Streaming Input API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
274 ^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
275 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
276 ``write_to(fh)`` (which behaves as a context manager) allows you to *stream* |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
277 data into a compressor.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
278 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
279 cctx = zstd.ZstdCompressor(level=10) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
280 with cctx.write_to(fh) as compressor: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
281 compressor.write(b'chunk 0') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
282 compressor.write(b'chunk 1') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
283 ... |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
284 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
285 The argument to ``write_to()`` must have a ``write(data)`` method. As |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
286 compressed data is available, ``write()`` will be called with the compressed |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
287 data as its argument. Many common Python types implement ``write()``, including |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
288 open file handles and ``io.BytesIO``. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
289 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
290 ``write_to()`` returns an object representing a streaming compressor instance. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
291 It **must** be used as a context manager. That object's ``write(data)`` method |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
292 is used to feed data into the compressor. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
293 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
294 A ``flush()`` method can be called to evict whatever data remains within the |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
295 compressor's internal state into the output object. This may result in 0 or |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
296 more ``write()`` calls to the output object. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
297 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
298 Both ``write()`` and ``flush()`` return the number of bytes written to the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
299 object's ``write()``. In many cases, small inputs do not accumulate enough |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
300 data to cause a write and ``write()`` will return ``0``. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
301 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
302 If the size of the data being fed to this streaming compressor is known, |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
303 you can declare it before compression begins:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
304 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
305 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
306 with cctx.write_to(fh, size=data_len) as compressor: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
307 compressor.write(chunk0) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
308 compressor.write(chunk1) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
309 ... |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
310 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
311 Declaring the size of the source data allows compression parameters to |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
312 be tuned. And if ``write_content_size`` is used, it also results in the |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
313 content size being written into the frame header of the output data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
314 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
315 The size of chunks being ``write()`` to the destination can be specified:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
316 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
317 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
318 with cctx.write_to(fh, write_size=32768) as compressor: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
319 ... |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
320 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
321 To see how much memory is being used by the streaming compressor:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
322 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
323 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
324 with cctx.write_to(fh) as compressor: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
325 ... |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
326 byte_size = compressor.memory_size() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
327 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
328 Streaming Output API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
329 ^^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
330 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
331 ``read_from(reader)`` provides a mechanism to stream data out of a compressor |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
332 as an iterator of data chunks.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
333 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
334 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
335 for chunk in cctx.read_from(fh): |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
336 # Do something with emitted data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
337 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
338 ``read_from()`` accepts an object that has a ``read(size)`` method or conforms |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
339 to the buffer protocol. (``bytes`` and ``memoryview`` are 2 common types that |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
340 provide the buffer protocol.) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
341 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
342 Uncompressed data is fetched from the source either by calling ``read(size)`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
343 or by fetching a slice of data from the object directly (in the case where |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
344 the buffer protocol is being used). The returned iterator consists of chunks |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
345 of compressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
346 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
347 If reading from the source via ``read()``, ``read()`` will be called until |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
348 it raises or returns an empty bytes (``b''``). It is perfectly valid for |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
349 the source to deliver fewer bytes than were what requested by ``read(size)``. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
350 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
351 Like ``write_to()``, ``read_from()`` also accepts a ``size`` argument |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
352 declaring the size of the input stream:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
353 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
354 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
355 for chunk in cctx.read_from(fh, size=some_int): |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
356 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
357 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
358 You can also control the size that data is ``read()`` from the source and |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
359 the ideal size of output chunks:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
360 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
361 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
362 for chunk in cctx.read_from(fh, read_size=16384, write_size=8192): |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
363 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
364 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
365 Unlike ``write_to()``, ``read_from()`` does not give direct control over the |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
366 sizes of chunks fed into the compressor. Instead, chunk sizes will be whatever |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
367 the object being read from delivers. These will often be of a uniform size. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
368 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
369 Stream Copying API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
370 ^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
371 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
372 ``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
373 compressing it.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
374 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
375 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
376 cctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
377 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
378 For example, say you wish to compress a file:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
379 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
380 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
381 with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
382 cctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
383 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
384 It is also possible to declare the size of the source stream:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
385 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
386 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
387 cctx.copy_stream(ifh, ofh, size=len_of_input) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
388 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
389 You can also specify how large the chunks that are ``read()`` and ``write()`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
390 from and to the streams:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
391 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
392 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
393 cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
394 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
395 The stream copier returns a 2-tuple of bytes read and written:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
396 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
397 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
398 read_count, write_count = cctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
399 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
400 Compressor API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
401 ^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
402 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
403 ``compressobj()`` returns an object that exposes ``compress(data)`` and |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
404 ``flush()`` methods. Each returns compressed data or an empty bytes. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
405 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
406 The purpose of ``compressobj()`` is to provide an API-compatible interface |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
407 with ``zlib.compressobj`` and ``bz2.BZ2Compressor``. This allows callers to |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
408 swap in different compressor objects while using the same API. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
409 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
410 ``flush()`` accepts an optional argument indicating how to end the stream. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
411 ``zstd.COMPRESSOBJ_FLUSH_FINISH`` (the default) ends the compression stream. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
412 Once this type of flush is performed, ``compress()`` and ``flush()`` can |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
413 no longer be called. This type of flush **must** be called to end the |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
414 compression context. If not called, returned data may be incomplete. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
415 |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
416 A ``zstd.COMPRESSOBJ_FLUSH_BLOCK`` argument to ``flush()`` will flush a |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
417 zstd block. Flushes of this type can be performed multiple times. The next |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
418 call to ``compress()`` will begin a new zstd block. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
419 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
420 Here is how this API should be used:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
421 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
422 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
423 cobj = cctx.compressobj() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
424 data = cobj.compress(b'raw input 0') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
425 data = cobj.compress(b'raw input 1') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
426 data = cobj.flush() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
427 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
428 Or to flush blocks:: |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
429 |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
430 cctx.zstd.ZstdCompressor() |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
431 cobj = cctx.compressobj() |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
432 data = cobj.compress(b'chunk in first block') |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
433 data = cobj.flush(zstd.COMPRESSOBJ_FLUSH_BLOCK) |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
434 data = cobj.compress(b'chunk in second block') |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
435 data = cobj.flush() |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
436 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
437 For best performance results, keep input chunks under 256KB. This avoids |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
438 extra allocations for a large output object. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
439 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
440 It is possible to declare the input size of the data that will be fed into |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
441 the compressor:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
442 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
443 cctx = zstd.ZstdCompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
444 cobj = cctx.compressobj(size=6) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
445 data = cobj.compress(b'foobar') |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
446 data = cobj.flush() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
447 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
448 Batch Compression API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
449 ^^^^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
450 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
451 (Experimental. Not yet supported in CFFI bindings.) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
452 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
453 ``multi_compress_to_buffer(data, [threads=0])`` performs compression of multiple |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
454 inputs as a single operation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
455 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
456 Data to be compressed can be passed as a ``BufferWithSegmentsCollection``, a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
457 ``BufferWithSegments``, or a list containing byte like objects. Each element of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
458 the container will be compressed individually using the configured parameters |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
459 on the ``ZstdCompressor`` instance. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
460 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
461 The ``threads`` argument controls how many threads to use for compression. The |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
462 default is ``0`` which means to use a single thread. Negative values use the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
463 number of logical CPUs in the machine. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
464 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
465 The function returns a ``BufferWithSegmentsCollection``. This type represents |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
466 N discrete memory allocations, eaching holding 1 or more compressed frames. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
467 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
468 Output data is written to shared memory buffers. This means that unlike |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
469 regular Python objects, a reference to *any* object within the collection |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
470 keeps the shared buffer and therefore memory backing it alive. This can have |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
471 undesirable effects on process memory usage. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
472 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
473 The API and behavior of this function is experimental and will likely change. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
474 Known deficiencies include: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
475 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
476 * If asked to use multiple threads, it will always spawn that many threads, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
477 even if the input is too small to use them. It should automatically lower |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
478 the thread count when the extra threads would just add overhead. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
479 * The buffer allocation strategy is fixed. There is room to make it dynamic, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
480 perhaps even to allow one output buffer per input, facilitating a variation |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
481 of the API to return a list without the adverse effects of shared memory |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
482 buffers. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
483 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
484 ZstdDecompressor |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
485 ---------------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
486 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
487 The ``ZstdDecompressor`` class provides an interface for performing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
488 decompression. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
489 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
490 Each instance is associated with parameters that control decompression. These |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
491 come from the following named arguments (all optional): |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
492 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
493 dict_data |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
494 Compression dictionary to use. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
495 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
496 The interface of this class is very similar to ``ZstdCompressor`` (by design). |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
497 |
30822
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
498 Unless specified otherwise, assume that no two methods of ``ZstdDecompressor`` |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
499 instances can be called from multiple Python threads simultaneously. In other |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
500 words, assume instances are not thread safe unless stated otherwise. |
b54a2984cdd4
zstd: vendor python-zstandard 0.6.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30435
diff
changeset
|
501 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
502 Simple API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
503 ^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
504 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
505 ``decompress(data)`` can be used to decompress an entire compressed zstd |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
506 frame in a single operation.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
507 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
508 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
509 decompressed = dctx.decompress(data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
510 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
511 By default, ``decompress(data)`` will only work on data written with the content |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
512 size encoded in its header. This can be achieved by creating a |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
513 ``ZstdCompressor`` with ``write_content_size=True``. If compressed data without |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
514 an embedded content size is seen, ``zstd.ZstdError`` will be raised. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
515 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
516 If the compressed data doesn't have its content size embedded within it, |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
517 decompression can be attempted by specifying the ``max_output_size`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
518 argument.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
519 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
520 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
521 uncompressed = dctx.decompress(data, max_output_size=1048576) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
522 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
523 Ideally, ``max_output_size`` will be identical to the decompressed output |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
524 size. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
525 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
526 If ``max_output_size`` is too small to hold the decompressed data, |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
527 ``zstd.ZstdError`` will be raised. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
528 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
529 If ``max_output_size`` is larger than the decompressed data, the allocated |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
530 output buffer will be resized to only use the space required. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
531 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
532 Please note that an allocation of the requested ``max_output_size`` will be |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
533 performed every time the method is called. Setting to a very large value could |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
534 result in a lot of work for the memory allocator and may result in |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
535 ``MemoryError`` being raised if the allocation fails. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
536 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
537 If the exact size of decompressed data is unknown, it is **strongly** |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
538 recommended to use a streaming API. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
539 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
540 Streaming Input API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
541 ^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
542 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
543 ``write_to(fh)`` can be used to incrementally send compressed data to a |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
544 decompressor.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
545 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
546 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
547 with dctx.write_to(fh) as decompressor: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
548 decompressor.write(compressed_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
549 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
550 This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
551 the decompressor by calling ``write(data)`` and decompressed output is written |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
552 to the output object by calling its ``write(data)`` method. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
553 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
554 Calls to ``write()`` will return the number of bytes written to the output |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
555 object. Not all inputs will result in bytes being written, so return values |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
556 of ``0`` are possible. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
557 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
558 The size of chunks being ``write()`` to the destination can be specified:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
559 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
560 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
561 with dctx.write_to(fh, write_size=16384) as decompressor: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
562 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
563 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
564 You can see how much memory is being used by the decompressor:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
565 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
566 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
567 with dctx.write_to(fh) as decompressor: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
568 byte_size = decompressor.memory_size() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
569 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
570 Streaming Output API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
571 ^^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
572 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
573 ``read_from(fh)`` provides a mechanism to stream decompressed data out of a |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
574 compressed source as an iterator of data chunks.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
575 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
576 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
577 for chunk in dctx.read_from(fh): |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
578 # Do something with original data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
579 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
580 ``read_from()`` accepts a) an object with a ``read(size)`` method that will |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
581 return compressed bytes b) an object conforming to the buffer protocol that |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
582 can expose its data as a contiguous range of bytes. The ``bytes`` and |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
583 ``memoryview`` types expose this buffer protocol. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
584 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
585 ``read_from()`` returns an iterator whose elements are chunks of the |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
586 decompressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
587 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
588 The size of requested ``read()`` from the source can be specified:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
589 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
590 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
591 for chunk in dctx.read_from(fh, read_size=16384): |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
592 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
593 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
594 It is also possible to skip leading bytes in the input data:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
595 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
596 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
597 for chunk in dctx.read_from(fh, skip_bytes=1): |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
598 pass |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
599 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
600 Skipping leading bytes is useful if the source data contains extra |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
601 *header* data but you want to avoid the overhead of making a buffer copy |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
602 or allocating a new ``memoryview`` object in order to decompress the data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
603 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
604 Similarly to ``ZstdCompressor.read_from()``, the consumer of the iterator |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
605 controls when data is decompressed. If the iterator isn't consumed, |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
606 decompression is put on hold. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
607 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
608 When ``read_from()`` is passed an object conforming to the buffer protocol, |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
609 the behavior may seem similar to what occurs when the simple decompression |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
610 API is used. However, this API works when the decompressed size is unknown. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
611 Furthermore, if feeding large inputs, the decompressor will work in chunks |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
612 instead of performing a single operation. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
613 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
614 Stream Copying API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
615 ^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
616 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
617 ``copy_stream(ifh, ofh)`` can be used to copy data across 2 streams while |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
618 performing decompression.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
619 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
620 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
621 dctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
622 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
623 e.g. to decompress a file to another file:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
624 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
625 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
626 with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
627 dctx.copy_stream(ifh, ofh) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
628 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
629 The size of chunks being ``read()`` and ``write()`` from and to the streams |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
630 can be specified:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
631 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
632 dctx = zstd.ZstdDecompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
633 dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
634 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
635 Decompressor API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
636 ^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
637 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
638 ``decompressobj()`` returns an object that exposes a ``decompress(data)`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
639 methods. Compressed data chunks are fed into ``decompress(data)`` and |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
640 uncompressed output (or an empty bytes) is returned. Output from subsequent |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
641 calls needs to be concatenated to reassemble the full decompressed byte |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
642 sequence. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
643 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
644 The purpose of ``decompressobj()`` is to provide an API-compatible interface |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
645 with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
646 to swap in different decompressor objects while using the same API. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
647 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
648 Each object is single use: once an input frame is decoded, ``decompress()`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
649 can no longer be called. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
650 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
651 Here is how this API should be used:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
652 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
653 dctx = zstd.ZstdDeompressor() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
654 dobj = cctx.decompressobj() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
655 data = dobj.decompress(compressed_chunk_0) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
656 data = dobj.decompress(compressed_chunk_1) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
657 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
658 Batch Decompression API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
659 ^^^^^^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
660 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
661 (Experimental. Not yet supported in CFFI bindings.) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
662 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
663 ``multi_decompress_to_buffer()`` performs decompression of multiple |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
664 frames as a single operation and returns a ``BufferWithSegmentsCollection`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
665 containing decompressed data for all inputs. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
666 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
667 Compressed frames can be passed to the function as a ``BufferWithSegments``, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
668 a ``BufferWithSegmentsCollection``, or as a list containing objects that |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
669 conform to the buffer protocol. For best performance, pass a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
670 ``BufferWithSegmentsCollection`` or a ``BufferWithSegments``, as |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
671 minimal input validation will be done for that type. If calling from |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
672 Python (as opposed to C), constructing one of these instances may add |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
673 overhead cancelling out the performance overhead of validation for list |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
674 inputs. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
675 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
676 The decompressed size of each frame must be discoverable. It can either be |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
677 embedded within the zstd frame (``write_content_size=True`` argument to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
678 ``ZstdCompressor``) or passed in via the ``decompressed_sizes`` argument. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
679 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
680 The ``decompressed_sizes`` argument is an object conforming to the buffer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
681 protocol which holds an array of 64-bit unsigned integers in the machine's |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
682 native format defining the decompressed sizes of each frame. If this argument |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
683 is passed, it avoids having to scan each frame for its decompressed size. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
684 This frame scanning can add noticeable overhead in some scenarios. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
685 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
686 The ``threads`` argument controls the number of threads to use to perform |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
687 decompression operations. The default (``0``) or the value ``1`` means to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
688 use a single thread. Negative values use the number of logical CPUs in the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
689 machine. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
690 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
691 .. note:: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
692 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
693 It is possible to pass a ``mmap.mmap()`` instance into this function by |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
694 wrapping it with a ``BufferWithSegments`` instance (which will define the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
695 offsets of frames within the memory mapped region). |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
696 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
697 This function is logically equivalent to performing ``dctx.decompress()`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
698 on each input frame and returning the result. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
699 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
700 This function exists to perform decompression on multiple frames as fast |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
701 as possible by having as little overhead as possible. Since decompression is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
702 performed as a single operation and since the decompressed output is stored in |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
703 a single buffer, extra memory allocations, Python objects, and Python function |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
704 calls are avoided. This is ideal for scenarios where callers need to access |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
705 decompressed data for multiple frames. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
706 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
707 Currently, the implementation always spawns multiple threads when requested, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
708 even if the amount of work to do is small. In the future, it will be smarter |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
709 about avoiding threads and their associated overhead when the amount of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
710 work to do is small. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
711 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
712 Content-Only Dictionary Chain Decompression |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
713 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
714 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
715 ``decompress_content_dict_chain(frames)`` performs decompression of a list of |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
716 zstd frames produced using chained *content-only* dictionary compression. Such |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
717 a list of frames is produced by compressing discrete inputs where each |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
718 non-initial input is compressed with a *content-only* dictionary consisting |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
719 of the content of the previous input. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
720 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
721 For example, say you have the following inputs:: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
722 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
723 inputs = [b'input 1', b'input 2', b'input 3'] |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
724 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
725 The zstd frame chain consists of: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
726 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
727 1. ``b'input 1'`` compressed in standalone/discrete mode |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
728 2. ``b'input 2'`` compressed using ``b'input 1'`` as a *content-only* dictionary |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
729 3. ``b'input 3'`` compressed using ``b'input 2'`` as a *content-only* dictionary |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
730 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
731 Each zstd frame **must** have the content size written. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
732 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
733 The following Python code can be used to produce a *content-only dictionary |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
734 chain*:: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
735 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
736 def make_chain(inputs): |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
737 frames = [] |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
738 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
739 # First frame is compressed in standalone/discrete mode. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
740 zctx = zstd.ZstdCompressor(write_content_size=True) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
741 frames.append(zctx.compress(inputs[0])) |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
742 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
743 # Subsequent frames use the previous fulltext as a content-only dictionary |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
744 for i, raw in enumerate(inputs[1:]): |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
745 dict_data = zstd.ZstdCompressionDict(inputs[i]) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
746 zctx = zstd.ZstdCompressor(write_content_size=True, dict_data=dict_data) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
747 frames.append(zctx.compress(raw)) |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
748 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
749 return frames |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
750 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
751 ``decompress_content_dict_chain()`` returns the uncompressed data of the last |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
752 element in the input chain. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
753 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
754 It is possible to implement *content-only dictionary chain* decompression |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
755 on top of other Python APIs. However, this function will likely be significantly |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
756 faster, especially for long input chains, as it avoids the overhead of |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
757 instantiating and passing around intermediate objects between C and Python. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
758 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
759 Multi-Threaded Compression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
760 -------------------------- |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
761 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
762 ``ZstdCompressor`` accepts a ``threads`` argument that controls the number |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
763 of threads to use for compression. The way this works is that input is split |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
764 into segments and each segment is fed into a worker pool for compression. Once |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
765 a segment is compressed, it is flushed/appended to the output. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
766 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
767 The segment size for multi-threaded compression is chosen from the window size |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
768 of the compressor. This is derived from the ``window_log`` attribute of a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
769 ``CompressionParameters`` instance. By default, segment sizes are in the 1+MB |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
770 range. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
771 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
772 If multi-threaded compression is requested and the input is smaller than the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
773 configured segment size, only a single compression thread will be used. If the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
774 input is smaller than the segment size multiplied by the thread pool size or |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
775 if data cannot be delivered to the compressor fast enough, not all requested |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
776 compressor threads may be active simultaneously. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
777 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
778 Compared to non-multi-threaded compression, multi-threaded compression has |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
779 higher per-operation overhead. This includes extra memory operations, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
780 thread creation, lock acquisition, etc. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
781 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
782 Due to the nature of multi-threaded compression using *N* compression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
783 *states*, the output from multi-threaded compression will likely be larger |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
784 than non-multi-threaded compression. The difference is usually small. But |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
785 there is a CPU/wall time versus size trade off that may warrant investigation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
786 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
787 Output from multi-threaded compression does not require any special handling |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
788 on the decompression side. In other words, any zstd decompressor should be able |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
789 to consume data produced with multi-threaded compression. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
790 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
791 Dictionary Creation and Management |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
792 ---------------------------------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
793 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
794 Compression dictionaries are represented as the ``ZstdCompressionDict`` type. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
795 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
796 Instances can be constructed from bytes:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
797 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
798 dict_data = zstd.ZstdCompressionDict(data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
799 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
800 It is possible to construct a dictionary from *any* data. Unless the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
801 data begins with a magic header, the dictionary will be treated as |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
802 *content-only*. *Content-only* dictionaries allow compression operations |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
803 that follow to reference raw data within the content. For one use of |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
804 *content-only* dictionaries, see |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
805 ``ZstdDecompressor.decompress_content_dict_chain()``. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
806 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
807 More interestingly, instances can be created by *training* on sample data:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
808 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
809 dict_data = zstd.train_dictionary(size, samples) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
810 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
811 This takes a list of bytes instances and creates and returns a |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
812 ``ZstdCompressionDict``. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
813 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
814 You can see how many bytes are in the dictionary by calling ``len()``:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
815 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
816 dict_data = zstd.train_dictionary(size, samples) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
817 dict_size = len(dict_data) # will not be larger than ``size`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
818 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
819 Once you have a dictionary, you can pass it to the objects performing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
820 compression and decompression:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
821 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
822 dict_data = zstd.train_dictionary(16384, samples) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
823 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
824 cctx = zstd.ZstdCompressor(dict_data=dict_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
825 for source_data in input_data: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
826 compressed = cctx.compress(source_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
827 # Do something with compressed data. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
828 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
829 dctx = zstd.ZstdDecompressor(dict_data=dict_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
830 for compressed_data in input_data: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
831 buffer = io.BytesIO() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
832 with dctx.write_to(buffer) as decompressor: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
833 decompressor.write(compressed_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
834 # Do something with raw data in ``buffer``. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
835 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
836 Dictionaries have unique integer IDs. You can retrieve this ID via:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
837 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
838 dict_id = zstd.dictionary_id(dict_data) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
839 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
840 You can obtain the raw data in the dict (useful for persisting and constructing |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
841 a ``ZstdCompressionDict`` later) via ``as_bytes()``:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
842 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
843 dict_data = zstd.train_dictionary(size, samples) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
844 raw_data = dict_data.as_bytes() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
845 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
846 The following named arguments to ``train_dictionary`` can also be used |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
847 to further control dictionary generation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
848 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
849 selectivity |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
850 Integer selectivity level. Default is 9. Larger values yield more data in |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
851 dictionary. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
852 level |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
853 Integer compression level. Default is 6. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
854 dict_id |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
855 Integer dictionary ID for the produced dictionary. Default is 0, which |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
856 means to use a random value. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
857 notifications |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
858 Controls writing of informational messages to ``stderr``. ``0`` (the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
859 default) means to write nothing. ``1`` writes errors. ``2`` writes |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
860 progression info. ``3`` writes more details. And ``4`` writes all info. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
861 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
862 Cover Dictionaries |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
863 ^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
864 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
865 An alternate dictionary training mechanism named *cover* is also available. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
866 More details about this training mechanism are available in the paper |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
867 *Effective Construction of Relative Lempel-Ziv Dictionaries* (authors: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
868 Liao, Petri, Moffat, Wirth). |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
869 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
870 To use this mechanism, use ``zstd.train_cover_dictionary()`` instead of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
871 ``zstd.train_dictionary()``. The function behaves nearly the same except |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
872 its arguments are different and the returned dictionary will contain ``k`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
873 and ``d`` attributes reflecting the parameters to the cover algorithm. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
874 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
875 .. note:: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
876 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
877 The ``k`` and ``d`` attributes are only populated on dictionary |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
878 instances created by this function. If a ``ZstdCompressionDict`` is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
879 constructed from raw bytes data, the ``k`` and ``d`` attributes will |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
880 be ``0``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
881 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
882 The segment and dmer size parameters to the cover algorithm can either be |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
883 specified manually or you can ask ``train_cover_dictionary()`` to try |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
884 multiple values and pick the best one, where *best* means the smallest |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
885 compressed data size. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
886 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
887 In manual mode, the ``k`` and ``d`` arguments must be specified or a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
888 ``ZstdError`` will be raised. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
889 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
890 In automatic mode (triggered by specifying ``optimize=True``), ``k`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
891 and ``d`` are optional. If a value isn't specified, then default values for |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
892 both are tested. The ``steps`` argument can control the number of steps |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
893 through ``k`` values. The ``level`` argument defines the compression level |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
894 that will be used when testing the compressed size. And ``threads`` can |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
895 specify the number of threads to use for concurrent operation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
896 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
897 This function takes the following arguments: |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
898 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
899 dict_size |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
900 Target size in bytes of the dictionary to generate. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
901 samples |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
902 A list of bytes holding samples the dictionary will be trained from. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
903 k |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
904 Parameter to cover algorithm defining the segment size. A reasonable range |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
905 is [16, 2048+]. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
906 d |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
907 Parameter to cover algorithm defining the dmer size. A reasonable range is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
908 [6, 16]. ``d`` must be less than or equal to ``k``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
909 dict_id |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
910 Integer dictionary ID for the produced dictionary. Default is 0, which uses |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
911 a random value. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
912 optimize |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
913 When true, test dictionary generation with multiple parameters. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
914 level |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
915 Integer target compression level when testing compression with |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
916 ``optimize=True``. Default is 1. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
917 steps |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
918 Number of steps through ``k`` values to perform when ``optimize=True``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
919 Default is 32. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
920 threads |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
921 Number of threads to use when ``optimize=True``. Default is 0, which means |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
922 to use a single thread. A negative value can be specified to use as many |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
923 threads as there are detected logical CPUs. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
924 notifications |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
925 Controls writing of informational messages to ``stderr``. See the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
926 documentation for ``train_dictionary()`` for more. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
927 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
928 Explicit Compression Parameters |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
929 ------------------------------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
930 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
931 Zstandard's integer compression levels along with the input size and dictionary |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
932 size are converted into a data structure defining multiple parameters to tune |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
933 behavior of the compression algorithm. It is possible to use define this |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
934 data structure explicitly to have lower-level control over compression behavior. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
935 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
936 The ``zstd.CompressionParameters`` type represents this data structure. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
937 You can see how Zstandard converts compression levels to this data structure |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
938 by calling ``zstd.get_compression_parameters()``. e.g.:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
939 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
940 params = zstd.get_compression_parameters(5) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
941 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
942 This function also accepts the uncompressed data size and dictionary size |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
943 to adjust parameters:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
944 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
945 params = zstd.get_compression_parameters(3, source_size=len(data), dict_size=len(dict_data)) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
946 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
947 You can also construct compression parameters from their low-level components:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
948 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
949 params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
950 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
951 You can then configure a compressor to use the custom parameters:: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
952 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
953 cctx = zstd.ZstdCompressor(compression_params=params) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
954 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
955 The members/attributes of ``CompressionParameters`` instances are as follows:: |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
956 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
957 * window_log |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
958 * chain_log |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
959 * hash_log |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
960 * search_log |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
961 * search_length |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
962 * target_length |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
963 * strategy |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
964 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
965 This is the order the arguments are passed to the constructor if not using |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
966 named arguments. |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
967 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
968 You'll need to read the Zstandard documentation for what these parameters |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
969 do. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
970 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
971 Frame Inspection |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
972 ---------------- |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
973 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
974 Data emitted from zstd compression is encapsulated in a *frame*. This frame |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
975 begins with a 4 byte *magic number* header followed by 2 to 14 bytes describing |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
976 the frame in more detail. For more info, see |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
977 https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
978 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
979 ``zstd.get_frame_parameters(data)`` parses a zstd *frame* header from a bytes |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
980 instance and return a ``FrameParameters`` object describing the frame. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
981 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
982 Depending on which fields are present in the frame and their values, the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
983 length of the frame parameters varies. If insufficient bytes are passed |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
984 in to fully parse the frame parameters, ``ZstdError`` is raised. To ensure |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
985 frame parameters can be parsed, pass in at least 18 bytes. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
986 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
987 ``FrameParameters`` instances have the following attributes: |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
988 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
989 content_size |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
990 Integer size of original, uncompressed content. This will be ``0`` if the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
991 original content size isn't written to the frame (controlled with the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
992 ``write_content_size`` argument to ``ZstdCompressor``) or if the input |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
993 content size was ``0``. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
994 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
995 window_size |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
996 Integer size of maximum back-reference distance in compressed data. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
997 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
998 dict_id |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
999 Integer of dictionary ID used for compression. ``0`` if no dictionary |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1000 ID was used or if the dictionary ID was ``0``. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1001 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1002 has_checksum |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1003 Bool indicating whether a 4 byte content checksum is stored at the end |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1004 of the frame. |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1005 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1006 Misc Functionality |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1007 ------------------ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1008 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1009 estimate_compression_context_size(CompressionParameters) |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1010 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1011 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1012 Given a ``CompressionParameters`` struct, estimate the memory size required |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1013 to perform compression. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1014 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1015 estimate_decompression_context_size() |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1016 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1017 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1018 Estimate the memory size requirements for a decompressor instance. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1019 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1020 Constants |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1021 --------- |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1022 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1023 The following module constants/attributes are exposed: |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1024 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1025 ZSTD_VERSION |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1026 This module attribute exposes a 3-tuple of the Zstandard version. e.g. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1027 ``(1, 0, 0)`` |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1028 MAX_COMPRESSION_LEVEL |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1029 Integer max compression level accepted by compression functions |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1030 COMPRESSION_RECOMMENDED_INPUT_SIZE |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1031 Recommended chunk size to feed to compressor functions |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1032 COMPRESSION_RECOMMENDED_OUTPUT_SIZE |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1033 Recommended chunk size for compression output |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1034 DECOMPRESSION_RECOMMENDED_INPUT_SIZE |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1035 Recommended chunk size to feed into decompresor functions |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1036 DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1037 Recommended chunk size for decompression output |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1038 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1039 FRAME_HEADER |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1040 bytes containing header of the Zstandard frame |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1041 MAGIC_NUMBER |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1042 Frame header as an integer |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1043 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1044 WINDOWLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1045 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1046 WINDOWLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1047 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1048 CHAINLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1049 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1050 CHAINLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1051 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1052 HASHLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1053 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1054 HASHLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1055 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1056 SEARCHLOG_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1057 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1058 SEARCHLOG_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1059 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1060 SEARCHLENGTH_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1061 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1062 SEARCHLENGTH_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1063 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1064 TARGETLENGTH_MIN |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1065 Minimum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1066 TARGETLENGTH_MAX |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1067 Maximum value for compression parameter |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1068 STRATEGY_FAST |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1069 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1070 STRATEGY_DFAST |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1071 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1072 STRATEGY_GREEDY |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1073 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1074 STRATEGY_LAZY |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1075 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1076 STRATEGY_LAZY2 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1077 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1078 STRATEGY_BTLAZY2 |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1079 Compression strategy |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1080 STRATEGY_BTOPT |
30895
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1081 Compression strategy |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1082 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1083 Performance Considerations |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1084 -------------------------- |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1085 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1086 The ``ZstdCompressor`` and ``ZstdDecompressor`` types maintain state to a |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1087 persistent compression or decompression *context*. Reusing a ``ZstdCompressor`` |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1088 or ``ZstdDecompressor`` instance for multiple operations is faster than |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1089 instantiating a new ``ZstdCompressor`` or ``ZstdDecompressor`` for each |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1090 operation. The differences are magnified as the size of data decreases. For |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1091 example, the difference between *context* reuse and non-reuse for 100,000 |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1092 100 byte inputs will be significant (possiby over 10x faster to reuse contexts) |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1093 whereas 10 1,000,000 byte inputs will be more similar in speed (because the |
c32454d69b85
zstd: vendor python-zstandard 0.7.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30822
diff
changeset
|
1094 time spent doing compression dwarfs time spent creating new *contexts*). |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1095 |
31796
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1096 Buffer Types |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1097 ------------ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1098 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1099 The API exposes a handful of custom types for interfacing with memory buffers. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1100 The primary goal of these types is to facilitate efficient multi-object |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1101 operations. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1102 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1103 The essential idea is to have a single memory allocation provide backing |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1104 storage for multiple logical objects. This has 2 main advantages: fewer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1105 allocations and optimal memory access patterns. This avoids having to allocate |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1106 a Python object for each logical object and furthermore ensures that access of |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1107 data for objects can be sequential (read: fast) in memory. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1108 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1109 BufferWithSegments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1110 ^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1111 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1112 The ``BufferWithSegments`` type represents a memory buffer containing N |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1113 discrete items of known lengths (segments). It is essentially a fixed size |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1114 memory address and an array of 2-tuples of ``(offset, length)`` 64-bit |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1115 unsigned native endian integers defining the byte offset and length of each |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1116 segment within the buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1117 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1118 Instances behave like containers. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1119 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1120 ``len()`` returns the number of segments within the instance. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1121 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1122 ``o[index]`` or ``__getitem__`` obtains a ``BufferSegment`` representing an |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1123 individual segment within the backing buffer. That returned object references |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1124 (not copies) memory. This means that iterating all objects doesn't copy |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1125 data within the buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1126 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1127 The ``.size`` attribute contains the total size in bytes of the backing |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1128 buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1129 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1130 Instances conform to the buffer protocol. So a reference to the backing bytes |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1131 can be obtained via ``memoryview(o)``. A *copy* of the backing bytes can also |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1132 be obtained via ``.tobytes()``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1133 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1134 The ``.segments`` attribute exposes the array of ``(offset, length)`` for |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1135 segments within the buffer. It is a ``BufferSegments`` type. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1136 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1137 BufferSegment |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1138 ^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1139 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1140 The ``BufferSegment`` type represents a segment within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1141 It is essentially a reference to N bytes within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1142 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1143 ``len()`` returns the length of the segment in bytes. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1144 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1145 ``.offset`` contains the byte offset of this segment within its parent |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1146 ``BufferWithSegments`` instance. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1147 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1148 The object conforms to the buffer protocol. ``.tobytes()`` can be called to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1149 obtain a ``bytes`` instance with a copy of the backing bytes. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1150 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1151 BufferSegments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1152 ^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1153 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1154 This type represents an array of ``(offset, length)`` integers defining segments |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1155 within a ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1156 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1157 The array members are 64-bit unsigned integers using host/native bit order. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1158 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1159 Instances conform to the buffer protocol. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1160 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1161 BufferWithSegmentsCollection |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1162 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1163 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1164 The ``BufferWithSegmentsCollection`` type represents a virtual spanning view |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1165 of multiple ``BufferWithSegments`` instances. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1166 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1167 Instances are constructed from 1 or more ``BufferWithSegments`` instances. The |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1168 resulting object behaves like an ordered sequence whose members are the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1169 segments within each ``BufferWithSegments``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1170 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1171 ``len()`` returns the number of segments within all ``BufferWithSegments`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1172 instances. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1173 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1174 ``o[index]`` and ``__getitem__(index)`` return the ``BufferSegment`` at |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1175 that offset as if all ``BufferWithSegments`` instances were a single |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1176 entity. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1177 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1178 If the object is composed of 2 ``BufferWithSegments`` instances with the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1179 first having 2 segments and the second have 3 segments, then ``b[0]`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1180 and ``b[1]`` access segments in the first object and ``b[2]``, ``b[3]``, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1181 and ``b[4]`` access segments from the second. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1182 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1183 Choosing an API |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1184 =============== |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1185 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1186 There are multiple APIs for performing compression and decompression. This is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1187 because different applications have different needs and the library wants to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1188 facilitate optimal use in as many use cases as possible. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1189 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1190 From a high-level, APIs are divided into *one-shot* and *streaming*. See |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1191 the ``Concepts`` section for a description of how these are different at |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1192 the C layer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1193 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1194 The *one-shot* APIs are useful for small data, where the input or output |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1195 size is known. (The size can come from a buffer length, file size, or |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1196 stored in the zstd frame header.) A limitation of the *one-shot* APIs is that |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1197 input and output must fit in memory simultaneously. For say a 4 GB input, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1198 this is often not feasible. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1199 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1200 The *one-shot* APIs also perform all work as a single operation. So, if you |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1201 feed it large input, it could take a long time for the function to return. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1202 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1203 The streaming APIs do not have the limitations of the simple API. But the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1204 price you pay for this flexibility is that they are more complex than a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1205 single function call. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1206 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1207 The streaming APIs put the caller in control of compression and decompression |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1208 behavior by allowing them to directly control either the input or output side |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1209 of the operation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1210 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1211 With the *streaming input*, *compressor*, and *decompressor* APIs, the caller |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1212 has full control over the input to the compression or decompression stream. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1213 They can directly choose when new data is operated on. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1214 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1215 With the *streaming ouput* APIs, the caller has full control over the output |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1216 of the compression or decompression stream. It can choose when to receive |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1217 new data. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1218 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1219 When using the *streaming* APIs that operate on file-like or stream objects, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1220 it is important to consider what happens in that object when I/O is requested. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1221 There is potential for long pauses as data is read or written from the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1222 underlying stream (say from interacting with a filesystem or network). This |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1223 could add considerable overhead. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1224 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1225 Concepts |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1226 ======== |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1227 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1228 It is important to have a basic understanding of how Zstandard works in order |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1229 to optimally use this library. In addition, there are some low-level Python |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1230 concepts that are worth explaining to aid understanding. This section aims to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1231 provide that knowledge. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1232 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1233 Zstandard Frames and Compression Format |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1234 --------------------------------------- |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1235 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1236 Compressed zstandard data almost always exists within a container called a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1237 *frame*. (For the technically curious, see the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1238 `specification <https://github.com/facebook/zstd/blob/3bee41a70eaf343fbcae3637b3f6edbe52f35ed8/doc/zstd_compression_format.md>_.) |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1239 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1240 The frame contains a header and optional trailer. The header contains a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1241 magic number to self-identify as a zstd frame and a description of the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1242 compressed data that follows. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1243 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1244 Among other things, the frame *optionally* contains the size of the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1245 decompressed data the frame represents, a 32-bit checksum of the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1246 decompressed data (to facilitate verification during decompression), |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1247 and the ID of the dictionary used to compress the data. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1248 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1249 Storing the original content size in the frame (``write_content_size=True`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1250 to ``ZstdCompressor``) is important for performance in some scenarios. Having |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1251 the decompressed size stored there (or storing it elsewhere) allows |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1252 decompression to perform a single memory allocation that is exactly sized to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1253 the output. This is faster than continuously growing a memory buffer to hold |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1254 output. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1255 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1256 Compression and Decompression Contexts |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1257 -------------------------------------- |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1258 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1259 In order to perform a compression or decompression operation with the zstd |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1260 C API, you need what's called a *context*. A context essentially holds |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1261 configuration and state for a compression or decompression operation. For |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1262 example, a compression context holds the configured compression level. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1263 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1264 Contexts can be reused for multiple operations. Since creating and |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1265 destroying contexts is not free, there are performance advantages to |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1266 reusing contexts. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1267 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1268 The ``ZstdCompressor`` and ``ZstdDecompressor`` types are essentially |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1269 wrappers around these contexts in the zstd C API. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1270 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1271 One-shot And Streaming Operations |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1272 --------------------------------- |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1273 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1274 A compression or decompression operation can either be performed as a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1275 single *one-shot* operation or as a continuous *streaming* operation. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1276 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1277 In one-shot mode (the *simple* APIs provided by the Python interface), |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1278 **all** input is handed to the compressor or decompressor as a single buffer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1279 and **all** output is returned as a single buffer. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1280 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1281 In streaming mode, input is delivered to the compressor or decompressor as |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1282 a series of chunks via multiple function calls. Likewise, output is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1283 obtained in chunks as well. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1284 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1285 Streaming operations require an additional *stream* object to be created |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1286 to track the operation. These are logical extensions of *context* |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1287 instances. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1288 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1289 There are advantages and disadvantages to each mode of operation. There |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1290 are scenarios where certain modes can't be used. See the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1291 ``Choosing an API`` section for more. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1292 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1293 Dictionaries |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1294 ------------ |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1295 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1296 A compression *dictionary* is essentially data used to seed the compressor |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1297 state so it can achieve better compression. The idea is that if you are |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1298 compressing a lot of similar pieces of data (e.g. JSON documents or anything |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1299 sharing similar structure), then you can find common patterns across multiple |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1300 objects then leverage those common patterns during compression and |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1301 decompression operations to achieve better compression ratios. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1302 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1303 Dictionary compression is generally only useful for small inputs - data no |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1304 larger than a few kilobytes. The upper bound on this range is highly dependent |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1305 on the input data and the dictionary. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1306 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1307 Python Buffer Protocol |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1308 ---------------------- |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1309 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1310 Many functions in the library operate on objects that implement Python's |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1311 `buffer protocol <https://docs.python.org/3.6/c-api/buffer.html>`_. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1312 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1313 The *buffer protocol* is an internal implementation detail of a Python |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1314 type that allows instances of that type (objects) to be exposed as a raw |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1315 pointer (or buffer) in the C API. In other words, it allows objects to be |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1316 exposed as an array of bytes. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1317 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1318 From the perspective of the C API, objects implementing the *buffer protocol* |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1319 all look the same: they are just a pointer to a memory address of a defined |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1320 length. This allows the C API to be largely type agnostic when accessing their |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1321 data. This allows custom types to be passed in without first converting them |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1322 to a specific type. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1323 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1324 Many Python types implement the buffer protocol. These include ``bytes`` |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1325 (``str`` on Python 2), ``bytearray``, ``array.array``, ``io.BytesIO``, |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1326 ``mmap.mmap``, and ``memoryview``. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1327 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1328 ``python-zstandard`` APIs that accept objects conforming to the buffer |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1329 protocol require that the buffer is *C contiguous* and has a single |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1330 dimension (``ndim==1``). This is usually the case. An example of where it |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1331 is not is a Numpy matrix type. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1332 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1333 Requiring Output Sizes for Non-Streaming Decompression APIs |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1334 ----------------------------------------------------------- |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1335 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1336 Non-streaming decompression APIs require that either the output size is |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1337 explicitly defined (either in the zstd frame header or passed into the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1338 function) or that a max output size is specified. This restriction is for |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1339 your safety. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1340 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1341 The *one-shot* decompression APIs store the decompressed result in a |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1342 single buffer. This means that a buffer needs to be pre-allocated to hold |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1343 the result. If the decompressed size is not known, then there is no universal |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1344 good default size to use. Any default will fail or will be highly sub-optimal |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1345 in some scenarios (it will either be too small or will put stress on the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1346 memory allocator to allocate a too large block). |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1347 |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1348 A *helpful* API may retry decompression with buffers of increasing size. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1349 While useful, there are obvious performance disadvantages, namely redoing |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1350 decompression N times until it works. In addition, there is a security |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1351 concern. Say the input came from highly compressible data, like 1 GB of the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1352 same byte value. The output size could be several magnitudes larger than the |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1353 input size. An input of <100KB could decompress to >1GB. Without a bounds |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1354 restriction on the decompressed size, certain inputs could exhaust all system |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1355 memory. That's not good and is why the maximum output size is limited. |
e0dc40530c5a
zstd: vendor python-zstandard 0.8.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
30895
diff
changeset
|
1356 |
30435
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1357 Note on Zstandard's *Experimental* API |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1358 ====================================== |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1359 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1360 Many of the Zstandard APIs used by this module are marked as *experimental* |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1361 within the Zstandard project. This includes a large number of useful |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1362 features, such as compression and frame parameters and parts of dictionary |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1363 compression. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1364 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1365 It is unclear how Zstandard's C API will evolve over time, especially with |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1366 regards to this *experimental* functionality. We will try to maintain |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1367 backwards compatibility at the Python API level. However, we cannot |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1368 guarantee this for things not under our control. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1369 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1370 Since a copy of the Zstandard source code is distributed with this |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1371 module and since we compile against it, the behavior of a specific |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1372 version of this module should be constant for all of time. So if you |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1373 pin the version of this module used in your projects (which is a Python |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1374 best practice), you should be buffered from unwanted future changes. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1375 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1376 Donate |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1377 ====== |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1378 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1379 A lot of time has been invested into this project by the author. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1380 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1381 If you find this project useful and would like to thank the author for |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1382 their work, consider donating some money. Any amount is appreciated. |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1383 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1384 .. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1385 :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard¤cy_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1386 :alt: Donate via PayPal |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1387 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1388 .. |ci-status| image:: https://travis-ci.org/indygreg/python-zstandard.svg?branch=master |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1389 :target: https://travis-ci.org/indygreg/python-zstandard |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1390 |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1391 .. |win-ci-status| image:: https://ci.appveyor.com/api/projects/status/github/indygreg/python-zstandard?svg=true |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1392 :target: https://ci.appveyor.com/project/indygreg/python-zstandard |
b86a448a2965
zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
|
1393 :alt: Windows build status |