Mercurial > hg
changeset 30435:b86a448a2965
zstd: vendor python-zstandard 0.5.0
As the commit message for the previous changeset says, we wish
for zstd to be a 1st class citizen in Mercurial. To make that
happen, we need to enable Python to talk to the zstd C API. And
that requires bindings.
This commit vendors a copy of existing Python bindings. Why do we
need to vendor? As the commit message of the previous commit says,
relying on systems in the wild to have the bindings or zstd present
is a losing proposition. By distributing the zstd and bindings with
Mercurial, we significantly increase our chances that zstd will
work. Since zstd will deliver a better end-user experience by
achieving better performance, this benefits our users. Another
reason is that the Python bindings still aren't stable and the
API is somewhat fluid. While Mercurial could be coded to target
multiple versions of the Python bindings, it is safer to bundle
an explicit, known working version.
The added Python bindings are mostly a fully-featured interface
to the zstd C API. They allow one-shot operations, streaming,
reading and writing from objects implements the file object
protocol, dictionary compression, control over low-level compression
parameters, and more. The Python bindings work on Python 2.6,
2.7, and 3.3+ and have been tested on Linux and Windows. There are
CFFI bindings, but they are lacking compared to the C extension.
Upstream work will be needed before we can support zstd with PyPy.
But it will be possible.
The files added in this commit come from Git commit
e637c1b214d5f869cf8116c550dcae23ec13b677 from
https://github.com/indygreg/python-zstandard and are added without
modifications. Some files from the upstream repository have been
omitted, namely files related to continuous integration.
In the spirit of full disclosure, I'm the maintainer of the
"python-zstandard" project and have authored 100% of the code
added in this commit. Unfortunately, the Python bindings have
not been formally code reviewed by anyone. While I've tested
much of the code thoroughly (I even have tests that fuzz APIs),
there's a good chance there are bugs, memory leaks, not well
thought out APIs, etc. If someone wants to review the code and
send feedback to the GitHub project, it would be greatly
appreciated.
Despite my involvement with both projects, my opinions of code
style differ from Mercurial's. The code in this commit introduces
numerous code style violations in Mercurial's linters. So, the code
is excluded from most lints. However, some violations I agree with.
These have been added to the known violations ignore list for now.
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/LICENSE Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,27 @@ +Copyright (c) 2016, Gregory Szorc +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its contributors +may be used to endorse or promote products derived from this software without +specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/MANIFEST.in Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,2 @@ +graft zstd +include make_cffi.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/NEWS.rst Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,63 @@ +Version History +=============== + +0.5.0 (released 2016-11-10) +--------------------------- + +* Vendored version of zstd updated to 1.1.1. +* Continuous integration for Python 3.6 and 3.7 +* Continuous integration for Conda +* Added compression and decompression APIs providing similar interfaces + to the standard library ``zlib`` and ``bz2`` modules. This allows + coding to a common interface. +* ``zstd.__version__` is now defined. +* ``read_from()`` on various APIs now accepts objects implementing the buffer + protocol. +* ``read_from()`` has gained a ``skip_bytes`` argument. This allows callers + to pass in an existing buffer with a header without having to create a + slice or a new object. +* Implemented ``ZstdCompressionDict.as_bytes()``. +* Python's memory allocator is now used instead of ``malloc()``. +* Low-level zstd data structures are reused in more instances, cutting down + on overhead for certain operations. +* ``distutils`` boilerplate for obtaining an ``Extension`` instance + has now been refactored into a standalone ``setup_zstd.py`` file. This + allows other projects with ``setup.py`` files to reuse the + ``distutils`` code for this project without copying code. +* The monolithic ``zstd.c`` file has been split into a header file defining + types and separate ``.c`` source files for the implementation. + +History of the Project +====================== + +2016-08-31 - Zstandard 1.0.0 is released and Gregory starts hacking on a +Python extension for use by the Mercurial project. A very hacky prototype +is sent to the mercurial-devel list for RFC. + +2016-09-03 - Most functionality from Zstandard C API implemented. Source +code published on https://github.com/indygreg/python-zstandard. Travis-CI +automation configured. 0.0.1 release on PyPI. + +2016-09-05 - After the API was rounded out a bit and support for Python +2.6 and 2.7 was added, version 0.1 was released to PyPI. + +2016-09-05 - After the compressor and decompressor APIs were changed, 0.2 +was released to PyPI. + +2016-09-10 - 0.3 is released with a bunch of new features. ZstdCompressor +now accepts arguments controlling frame parameters. The source size can now +be declared when performing streaming compression. ZstdDecompressor.decompress() +is implemented. Compression dictionaries are now cached when using the simple +compression and decompression APIs. Memory size APIs added. +ZstdCompressor.read_from() and ZstdDecompressor.read_from() have been +implemented. This rounds out the major compression/decompression APIs planned +by the author. + +2016-10-02 - 0.3.3 is released with a bug fix for read_from not fully +decoding a zstd frame (issue #2). + +2016-10-02 - 0.4.0 is released with zstd 1.1.0, support for custom read and +write buffer sizes, and a few bug fixes involving failure to read/write +all data when buffer sizes were too small to hold remaining data. + +2016-11-10 - 0.5.0 is released with zstd 1.1.1 and other enhancements.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/README.rst Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,776 @@ +================ +python-zstandard +================ + +This project provides a Python C extension for interfacing with the +`Zstandard <http://www.zstd.net>`_ compression library. + +The primary goal of the extension is to provide a Pythonic interface to +the underlying C API. This means exposing most of the features and flexibility +of the C API while not sacrificing usability or safety that Python provides. + +| |ci-status| |win-ci-status| + +State of Project +================ + +The project is officially in beta state. The author is reasonably satisfied +with the current API and that functionality works as advertised. There +may be some backwards incompatible changes before 1.0. Though the author +does not intend to make any major changes to the Python API. + +There is continuous integration for Python versions 2.6, 2.7, and 3.3+ +on Linux x86_x64 and Windows x86 and x86_64. The author is reasonably +confident the extension is stable and works as advertised on these +platforms. + +Expected Changes +---------------- + +The author is reasonably confident in the current state of what's +implemented on the ``ZstdCompressor`` and ``ZstdDecompressor`` types. +Those APIs likely won't change significantly. Some low-level behavior +(such as naming and types expected by arguments) may change. + +There will likely be arguments added to control the input and output +buffer sizes (currently, certain operations read and write in chunk +sizes using zstd's preferred defaults). + +There should be an API that accepts an object that conforms to the buffer +interface and returns an iterator over compressed or decompressed output. + +The author is on the fence as to whether to support the extremely +low level compression and decompression APIs. It could be useful to +support compression without the framing headers. But the author doesn't +believe it a high priority at this time. + +The CFFI bindings are half-baked and need to be finished. + +Requirements +============ + +This extension is designed to run with Python 2.6, 2.7, 3.3, 3.4, and 3.5 +on common platforms (Linux, Windows, and OS X). Only x86_64 is currently +well-tested as an architecture. + +Installing +========== + +This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard. +So, to install this package:: + + $ pip install zstandard + +Binary wheels are made available for some platforms. If you need to +install from a source distribution, all you should need is a working C +compiler and the Python development headers/libraries. On many Linux +distributions, you can install a ``python-dev`` or ``python-devel`` +package to provide these dependencies. + +Packages are also uploaded to Anaconda Cloud at +https://anaconda.org/indygreg/zstandard. See that URL for how to install +this package with ``conda``. + +Performance +=========== + +Very crude and non-scientific benchmarking (most benchmarks fall in this +category because proper benchmarking is hard) show that the Python bindings +perform within 10% of the native C implementation. + +The following table compares the performance of compressing and decompressing +a 1.1 GB tar file comprised of the files in a Firefox source checkout. Values +obtained with the ``zstd`` program are on the left. The remaining columns detail +performance of various compression APIs in the Python bindings. + ++-------+-----------------+-----------------+-----------------+---------------+ +| Level | Native | Simple | Stream In | Stream Out | +| | Comp / Decomp | Comp / Decomp | Comp / Decomp | Comp | ++=======+=================+=================+=================+===============+ +| 1 | 490 / 1338 MB/s | 458 / 1266 MB/s | 407 / 1156 MB/s | 405 MB/s | ++-------+-----------------+-----------------+-----------------+---------------+ +| 2 | 412 / 1288 MB/s | 381 / 1203 MB/s | 345 / 1128 MB/s | 349 MB/s | ++-------+-----------------+-----------------+-----------------+---------------+ +| 3 | 342 / 1312 MB/s | 319 / 1182 MB/s | 285 / 1165 MB/s | 287 MB/s | ++-------+-----------------+-----------------+-----------------+---------------+ +| 11 | 64 / 1506 MB/s | 66 / 1436 MB/s | 56 / 1342 MB/s | 57 MB/s | ++-------+-----------------+-----------------+-----------------+---------------+ + +Again, these are very unscientific. But it shows that Python is capable of +compressing at several hundred MB/s and decompressing at over 1 GB/s. + +Comparison to Other Python Bindings +=================================== + +https://pypi.python.org/pypi/zstd is an alternative Python binding to +Zstandard. At the time this was written, the latest release of that +package (1.0.0.2) had the following significant differences from this package: + +* It only exposes the simple API for compression and decompression operations. + This extension exposes the streaming API, dictionary training, and more. +* It adds a custom framing header to compressed data and there is no way to + disable it. This means that data produced with that module cannot be used by + other Zstandard implementations. + +Bundling of Zstandard Source Code +================================= + +The source repository for this project contains a vendored copy of the +Zstandard source code. This is done for a few reasons. + +First, Zstandard is relatively new and not yet widely available as a system +package. Providing a copy of the source code enables the Python C extension +to be compiled without requiring the user to obtain the Zstandard source code +separately. + +Second, Zstandard has both a stable *public* API and an *experimental* API. +The *experimental* API is actually quite useful (contains functionality for +training dictionaries for example), so it is something we wish to expose to +Python. However, the *experimental* API is only available via static linking. +Furthermore, the *experimental* API can change at any time. So, control over +the exact version of the Zstandard library linked against is important to +ensure known behavior. + +Instructions for Building and Testing +===================================== + +Once you have the source code, the extension can be built via setup.py:: + + $ python setup.py build_ext + +We recommend testing with ``nose``:: + + $ nosetests + +A Tox configuration is present to test against multiple Python versions:: + + $ tox + +Tests use the ``hypothesis`` Python package to perform fuzzing. If you +don't have it, those tests won't run. + +There is also an experimental CFFI module. You need the ``cffi`` Python +package installed to build and test that. + +To create a virtualenv with all development dependencies, do something +like the following:: + + # Python 2 + $ virtualenv venv + + # Python 3 + $ python3 -m venv venv + + $ source venv/bin/activate + $ pip install cffi hypothesis nose tox + +API +=== + +The compiled C extension provides a ``zstd`` Python module. This module +exposes the following interfaces. + +ZstdCompressor +-------------- + +The ``ZstdCompressor`` class provides an interface for performing +compression operations. + +Each instance is associated with parameters that control compression +behavior. These come from the following named arguments (all optional): + +level + Integer compression level. Valid values are between 1 and 22. +dict_data + Compression dictionary to use. + + Note: When using dictionary data and ``compress()`` is called multiple + times, the ``CompressionParameters`` derived from an integer compression + ``level`` and the first compressed data's size will be reused for all + subsequent operations. This may not be desirable if source data size + varies significantly. +compression_params + A ``CompressionParameters`` instance (overrides the ``level`` value). +write_checksum + Whether a 4 byte checksum should be written with the compressed data. + Defaults to False. If True, the decompressor can verify that decompressed + data matches the original input data. +write_content_size + Whether the size of the uncompressed data will be written into the + header of compressed data. Defaults to False. The data will only be + written if the compressor knows the size of the input data. This is + likely not true for streaming compression. +write_dict_id + Whether to write the dictionary ID into the compressed data. + Defaults to True. The dictionary ID is only written if a dictionary + is being used. + +Simple API +^^^^^^^^^^ + +``compress(data)`` compresses and returns data as a one-shot operation.:: + + cctx = zstd.ZsdCompressor() + compressed = cctx.compress(b'data to compress') + +Streaming Input API +^^^^^^^^^^^^^^^^^^^ + +``write_to(fh)`` (which behaves as a context manager) allows you to *stream* +data into a compressor.:: + + cctx = zstd.ZstdCompressor(level=10) + with cctx.write_to(fh) as compressor: + compressor.write(b'chunk 0') + compressor.write(b'chunk 1') + ... + +The argument to ``write_to()`` must have a ``write(data)`` method. As +compressed data is available, ``write()`` will be called with the comrpessed +data as its argument. Many common Python types implement ``write()``, including +open file handles and ``io.BytesIO``. + +``write_to()`` returns an object representing a streaming compressor instance. +It **must** be used as a context manager. That object's ``write(data)`` method +is used to feed data into the compressor. + +If the size of the data being fed to this streaming compressor is known, +you can declare it before compression begins:: + + cctx = zstd.ZstdCompressor() + with cctx.write_to(fh, size=data_len) as compressor: + compressor.write(chunk0) + compressor.write(chunk1) + ... + +Declaring the size of the source data allows compression parameters to +be tuned. And if ``write_content_size`` is used, it also results in the +content size being written into the frame header of the output data. + +The size of chunks being ``write()`` to the destination can be specified:: + + cctx = zstd.ZstdCompressor() + with cctx.write_to(fh, write_size=32768) as compressor: + ... + +To see how much memory is being used by the streaming compressor:: + + cctx = zstd.ZstdCompressor() + with cctx.write_to(fh) as compressor: + ... + byte_size = compressor.memory_size() + +Streaming Output API +^^^^^^^^^^^^^^^^^^^^ + +``read_from(reader)`` provides a mechanism to stream data out of a compressor +as an iterator of data chunks.:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_from(fh): + # Do something with emitted data. + +``read_from()`` accepts an object that has a ``read(size)`` method or conforms +to the buffer protocol. (``bytes`` and ``memoryview`` are 2 common types that +provide the buffer protocol.) + +Uncompressed data is fetched from the source either by calling ``read(size)`` +or by fetching a slice of data from the object directly (in the case where +the buffer protocol is being used). The returned iterator consists of chunks +of compressed data. + +Like ``write_to()``, ``read_from()`` also accepts a ``size`` argument +declaring the size of the input stream:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_from(fh, size=some_int): + pass + +You can also control the size that data is ``read()`` from the source and +the ideal size of output chunks:: + + cctx = zstd.ZstdCompressor() + for chunk in cctx.read_from(fh, read_size=16384, write_size=8192): + pass + +Stream Copying API +^^^^^^^^^^^^^^^^^^ + +``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while +compressing it.:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh) + +For example, say you wish to compress a file:: + + cctx = zstd.ZstdCompressor() + with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: + cctx.copy_stream(ifh, ofh) + +It is also possible to declare the size of the source stream:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh, size=len_of_input) + +You can also specify how large the chunks that are ``read()`` and ``write()`` +from and to the streams:: + + cctx = zstd.ZstdCompressor() + cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384) + +The stream copier returns a 2-tuple of bytes read and written:: + + cctx = zstd.ZstdCompressor() + read_count, write_count = cctx.copy_stream(ifh, ofh) + +Compressor API +^^^^^^^^^^^^^^ + +``compressobj()`` returns an object that exposes ``compress(data)`` and +``flush()`` methods. Each returns compressed data or an empty bytes. + +The purpose of ``compressobj()`` is to provide an API-compatible interface +with ``zlib.compressobj`` and ``bz2.BZ2Compressor``. This allows callers to +swap in different compressor objects while using the same API. + +Once ``flush()`` is called, the compressor will no longer accept new data +to ``compress()``. ``flush()`` **must** be called to end the compression +context. If not called, the returned data may be incomplete. + +Here is how this API should be used:: + + cctx = zstd.ZstdCompressor() + cobj = cctx.compressobj() + data = cobj.compress(b'raw input 0') + data = cobj.compress(b'raw input 1') + data = cobj.flush() + +For best performance results, keep input chunks under 256KB. This avoids +extra allocations for a large output object. + +It is possible to declare the input size of the data that will be fed into +the compressor:: + + cctx = zstd.ZstdCompressor() + cobj = cctx.compressobj(size=6) + data = cobj.compress(b'foobar') + data = cobj.flush() + +ZstdDecompressor +---------------- + +The ``ZstdDecompressor`` class provides an interface for performing +decompression. + +Each instance is associated with parameters that control decompression. These +come from the following named arguments (all optional): + +dict_data + Compression dictionary to use. + +The interface of this class is very similar to ``ZstdCompressor`` (by design). + +Simple API +^^^^^^^^^^ + +``decompress(data)`` can be used to decompress an entire compressed zstd +frame in a single operation.:: + + dctx = zstd.ZstdDecompressor() + decompressed = dctx.decompress(data) + +By default, ``decompress(data)`` will only work on data written with the content +size encoded in its header. This can be achieved by creating a +``ZstdCompressor`` with ``write_content_size=True``. If compressed data without +an embedded content size is seen, ``zstd.ZstdError`` will be raised. + +If the compressed data doesn't have its content size embedded within it, +decompression can be attempted by specifying the ``max_output_size`` +argument.:: + + dctx = zstd.ZstdDecompressor() + uncompressed = dctx.decompress(data, max_output_size=1048576) + +Ideally, ``max_output_size`` will be identical to the decompressed output +size. + +If ``max_output_size`` is too small to hold the decompressed data, +``zstd.ZstdError`` will be raised. + +If ``max_output_size`` is larger than the decompressed data, the allocated +output buffer will be resized to only use the space required. + +Please note that an allocation of the requested ``max_output_size`` will be +performed every time the method is called. Setting to a very large value could +result in a lot of work for the memory allocator and may result in +``MemoryError`` being raised if the allocation fails. + +If the exact size of decompressed data is unknown, it is **strongly** +recommended to use a streaming API. + +Streaming Input API +^^^^^^^^^^^^^^^^^^^ + +``write_to(fh)`` can be used to incrementally send compressed data to a +decompressor.:: + + dctx = zstd.ZstdDecompressor() + with dctx.write_to(fh) as decompressor: + decompressor.write(compressed_data) + +This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to +the decompressor by calling ``write(data)`` and decompressed output is written +to the output object by calling its ``write(data)`` method. + +The size of chunks being ``write()`` to the destination can be specified:: + + dctx = zstd.ZstdDecompressor() + with dctx.write_to(fh, write_size=16384) as decompressor: + pass + +You can see how much memory is being used by the decompressor:: + + dctx = zstd.ZstdDecompressor() + with dctx.write_to(fh) as decompressor: + byte_size = decompressor.memory_size() + +Streaming Output API +^^^^^^^^^^^^^^^^^^^^ + +``read_from(fh)`` provides a mechanism to stream decompressed data out of a +compressed source as an iterator of data chunks.:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_from(fh): + # Do something with original data. + +``read_from()`` accepts a) an object with a ``read(size)`` method that will +return compressed bytes b) an object conforming to the buffer protocol that +can expose its data as a contiguous range of bytes. The ``bytes`` and +``memoryview`` types expose this buffer protocol. + +``read_from()`` returns an iterator whose elements are chunks of the +decompressed data. + +The size of requested ``read()`` from the source can be specified:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_from(fh, read_size=16384): + pass + +It is also possible to skip leading bytes in the input data:: + + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_from(fh, skip_bytes=1): + pass + +Skipping leading bytes is useful if the source data contains extra +*header* data but you want to avoid the overhead of making a buffer copy +or allocating a new ``memoryview`` object in order to decompress the data. + +Similarly to ``ZstdCompressor.read_from()``, the consumer of the iterator +controls when data is decompressed. If the iterator isn't consumed, +decompression is put on hold. + +When ``read_from()`` is passed an object conforming to the buffer protocol, +the behavior may seem similar to what occurs when the simple decompression +API is used. However, this API works when the decompressed size is unknown. +Furthermore, if feeding large inputs, the decompressor will work in chunks +instead of performing a single operation. + +Stream Copying API +^^^^^^^^^^^^^^^^^^ + +``copy_stream(ifh, ofh)`` can be used to copy data across 2 streams while +performing decompression.:: + + dctx = zstd.ZstdDecompressor() + dctx.copy_stream(ifh, ofh) + +e.g. to decompress a file to another file:: + + dctx = zstd.ZstdDecompressor() + with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: + dctx.copy_stream(ifh, ofh) + +The size of chunks being ``read()`` and ``write()`` from and to the streams +can be specified:: + + dctx = zstd.ZstdDecompressor() + dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384) + +Decompressor API +^^^^^^^^^^^^^^^^ + +``decompressobj()`` returns an object that exposes a ``decompress(data)`` +methods. Compressed data chunks are fed into ``decompress(data)`` and +uncompressed output (or an empty bytes) is returned. Output from subsequent +calls needs to be concatenated to reassemble the full decompressed byte +sequence. + +The purpose of ``decompressobj()`` is to provide an API-compatible interface +with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers +to swap in different decompressor objects while using the same API. + +Each object is single use: once an input frame is decoded, ``decompress()`` +can no longer be called. + +Here is how this API should be used:: + + dctx = zstd.ZstdDeompressor() + dobj = cctx.decompressobj() + data = dobj.decompress(compressed_chunk_0) + data = dobj.decompress(compressed_chunk_1) + +Choosing an API +--------------- + +Various forms of compression and decompression APIs are provided because each +are suitable for different use cases. + +The simple/one-shot APIs are useful for small data, when the decompressed +data size is known (either recorded in the zstd frame header via +``write_content_size`` or known via an out-of-band mechanism, such as a file +size). + +A limitation of the simple APIs is that input or output data must fit in memory. +And unless using advanced tricks with Python *buffer objects*, both input and +output must fit in memory simultaneously. + +Another limitation is that compression or decompression is performed as a single +operation. So if you feed large input, it could take a long time for the +function to return. + +The streaming APIs do not have the limitations of the simple API. The cost to +this is they are more complex to use than a single function call. + +The streaming APIs put the caller in control of compression and decompression +behavior by allowing them to directly control either the input or output side +of the operation. + +With the streaming input APIs, the caller feeds data into the compressor or +decompressor as they see fit. Output data will only be written after the caller +has explicitly written data. + +With the streaming output APIs, the caller consumes output from the compressor +or decompressor as they see fit. The compressor or decompressor will only +consume data from the source when the caller is ready to receive it. + +One end of the streaming APIs involves a file-like object that must +``write()`` output data or ``read()`` input data. Depending on what the +backing storage for these objects is, those operations may not complete quickly. +For example, when streaming compressed data to a file, the ``write()`` into +a streaming compressor could result in a ``write()`` to the filesystem, which +may take a long time to finish due to slow I/O on the filesystem. So, there +may be overhead in streaming APIs beyond the compression and decompression +operations. + +Dictionary Creation and Management +---------------------------------- + +Zstandard allows *dictionaries* to be used when compressing and +decompressing data. The idea is that if you are compressing a lot of similar +data, you can precompute common properties of that data (such as recurring +byte sequences) to achieve better compression ratios. + +In Python, compression dictionaries are represented as the +``ZstdCompressionDict`` type. + +Instances can be constructed from bytes:: + + dict_data = zstd.ZstdCompressionDict(data) + +More interestingly, instances can be created by *training* on sample data:: + + dict_data = zstd.train_dictionary(size, samples) + +This takes a list of bytes instances and creates and returns a +``ZstdCompressionDict``. + +You can see how many bytes are in the dictionary by calling ``len()``:: + + dict_data = zstd.train_dictionary(size, samples) + dict_size = len(dict_data) # will not be larger than ``size`` + +Once you have a dictionary, you can pass it to the objects performing +compression and decompression:: + + dict_data = zstd.train_dictionary(16384, samples) + + cctx = zstd.ZstdCompressor(dict_data=dict_data) + for source_data in input_data: + compressed = cctx.compress(source_data) + # Do something with compressed data. + + dctx = zstd.ZstdDecompressor(dict_data=dict_data) + for compressed_data in input_data: + buffer = io.BytesIO() + with dctx.write_to(buffer) as decompressor: + decompressor.write(compressed_data) + # Do something with raw data in ``buffer``. + +Dictionaries have unique integer IDs. You can retrieve this ID via:: + + dict_id = zstd.dictionary_id(dict_data) + +You can obtain the raw data in the dict (useful for persisting and constructing +a ``ZstdCompressionDict`` later) via ``as_bytes()``:: + + dict_data = zstd.train_dictionary(size, samples) + raw_data = dict_data.as_bytes() + +Explicit Compression Parameters +------------------------------- + +Zstandard's integer compression levels along with the input size and dictionary +size are converted into a data structure defining multiple parameters to tune +behavior of the compression algorithm. It is possible to use define this +data structure explicitly to have lower-level control over compression behavior. + +The ``zstd.CompressionParameters`` type represents this data structure. +You can see how Zstandard converts compression levels to this data structure +by calling ``zstd.get_compression_parameters()``. e.g.:: + + params = zstd.get_compression_parameters(5) + +This function also accepts the uncompressed data size and dictionary size +to adjust parameters:: + + params = zstd.get_compression_parameters(3, source_size=len(data), dict_size=len(dict_data)) + +You can also construct compression parameters from their low-level components:: + + params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST) + +You can then configure a compressor to use the custom parameters:: + + cctx = zstd.ZstdCompressor(compression_params=params) + +The members of the ``CompressionParameters`` tuple are as follows:: + +* 0 - Window log +* 1 - Chain log +* 2 - Hash log +* 3 - Search log +* 4 - Search length +* 5 - Target length +* 6 - Strategy (one of the ``zstd.STRATEGY_`` constants) + +You'll need to read the Zstandard documentation for what these parameters +do. + +Misc Functionality +------------------ + +estimate_compression_context_size(CompressionParameters) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Given a ``CompressionParameters`` struct, estimate the memory size required +to perform compression. + +estimate_decompression_context_size() +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Estimate the memory size requirements for a decompressor instance. + +Constants +--------- + +The following module constants/attributes are exposed: + +ZSTD_VERSION + This module attribute exposes a 3-tuple of the Zstandard version. e.g. + ``(1, 0, 0)`` +MAX_COMPRESSION_LEVEL + Integer max compression level accepted by compression functions +COMPRESSION_RECOMMENDED_INPUT_SIZE + Recommended chunk size to feed to compressor functions +COMPRESSION_RECOMMENDED_OUTPUT_SIZE + Recommended chunk size for compression output +DECOMPRESSION_RECOMMENDED_INPUT_SIZE + Recommended chunk size to feed into decompresor functions +DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE + Recommended chunk size for decompression output + +FRAME_HEADER + bytes containing header of the Zstandard frame +MAGIC_NUMBER + Frame header as an integer + +WINDOWLOG_MIN + Minimum value for compression parameter +WINDOWLOG_MAX + Maximum value for compression parameter +CHAINLOG_MIN + Minimum value for compression parameter +CHAINLOG_MAX + Maximum value for compression parameter +HASHLOG_MIN + Minimum value for compression parameter +HASHLOG_MAX + Maximum value for compression parameter +SEARCHLOG_MIN + Minimum value for compression parameter +SEARCHLOG_MAX + Maximum value for compression parameter +SEARCHLENGTH_MIN + Minimum value for compression parameter +SEARCHLENGTH_MAX + Maximum value for compression parameter +TARGETLENGTH_MIN + Minimum value for compression parameter +TARGETLENGTH_MAX + Maximum value for compression parameter +STRATEGY_FAST + Compression strategory +STRATEGY_DFAST + Compression strategory +STRATEGY_GREEDY + Compression strategory +STRATEGY_LAZY + Compression strategory +STRATEGY_LAZY2 + Compression strategory +STRATEGY_BTLAZY2 + Compression strategory +STRATEGY_BTOPT + Compression strategory + +Note on Zstandard's *Experimental* API +====================================== + +Many of the Zstandard APIs used by this module are marked as *experimental* +within the Zstandard project. This includes a large number of useful +features, such as compression and frame parameters and parts of dictionary +compression. + +It is unclear how Zstandard's C API will evolve over time, especially with +regards to this *experimental* functionality. We will try to maintain +backwards compatibility at the Python API level. However, we cannot +guarantee this for things not under our control. + +Since a copy of the Zstandard source code is distributed with this +module and since we compile against it, the behavior of a specific +version of this module should be constant for all of time. So if you +pin the version of this module used in your projects (which is a Python +best practice), you should be buffered from unwanted future changes. + +Donate +====== + +A lot of time has been invested into this project by the author. + +If you find this project useful and would like to thank the author for +their work, consider donating some money. Any amount is appreciated. + +.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif + :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard¤cy_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted + :alt: Donate via PayPal + +.. |ci-status| image:: https://travis-ci.org/indygreg/python-zstandard.svg?branch=master + :target: https://travis-ci.org/indygreg/python-zstandard + +.. |win-ci-status| image:: https://ci.appveyor.com/api/projects/status/github/indygreg/python-zstandard?svg=true + :target: https://ci.appveyor.com/project/indygreg/python-zstandard + :alt: Windows build status
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/compressiondict.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,247 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs) { + static char *kwlist[] = { "dict_size", "samples", "parameters", NULL }; + size_t capacity; + PyObject* samples; + Py_ssize_t samplesLen; + PyObject* parameters = NULL; + ZDICT_params_t zparams; + Py_ssize_t sampleIndex; + Py_ssize_t sampleSize; + PyObject* sampleItem; + size_t zresult; + void* sampleBuffer; + void* sampleOffset; + size_t samplesSize = 0; + size_t* sampleSizes; + void* dict; + ZstdCompressionDict* result; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "nO!|O!", kwlist, + &capacity, + &PyList_Type, &samples, + (PyObject*)&DictParametersType, ¶meters)) { + return NULL; + } + + /* Validate parameters first since it is easiest. */ + zparams.selectivityLevel = 0; + zparams.compressionLevel = 0; + zparams.notificationLevel = 0; + zparams.dictID = 0; + zparams.reserved[0] = 0; + zparams.reserved[1] = 0; + + if (parameters) { + /* TODO validate data ranges */ + zparams.selectivityLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 0)); + zparams.compressionLevel = PyLong_AsLong(PyTuple_GetItem(parameters, 1)); + zparams.notificationLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 2)); + zparams.dictID = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 3)); + } + + /* Figure out the size of the raw samples */ + samplesLen = PyList_Size(samples); + for (sampleIndex = 0; sampleIndex < samplesLen; sampleIndex++) { + sampleItem = PyList_GetItem(samples, sampleIndex); + if (!PyBytes_Check(sampleItem)) { + PyErr_SetString(PyExc_ValueError, "samples must be bytes"); + /* TODO probably need to perform DECREF here */ + return NULL; + } + samplesSize += PyBytes_GET_SIZE(sampleItem); + } + + /* Now that we know the total size of the raw simples, we can allocate + a buffer for the raw data */ + sampleBuffer = malloc(samplesSize); + if (!sampleBuffer) { + PyErr_NoMemory(); + return NULL; + } + sampleSizes = malloc(samplesLen * sizeof(size_t)); + if (!sampleSizes) { + free(sampleBuffer); + PyErr_NoMemory(); + return NULL; + } + + sampleOffset = sampleBuffer; + /* Now iterate again and assemble the samples in the buffer */ + for (sampleIndex = 0; sampleIndex < samplesLen; sampleIndex++) { + sampleItem = PyList_GetItem(samples, sampleIndex); + sampleSize = PyBytes_GET_SIZE(sampleItem); + sampleSizes[sampleIndex] = sampleSize; + memcpy(sampleOffset, PyBytes_AS_STRING(sampleItem), sampleSize); + sampleOffset = (char*)sampleOffset + sampleSize; + } + + dict = malloc(capacity); + if (!dict) { + free(sampleSizes); + free(sampleBuffer); + PyErr_NoMemory(); + return NULL; + } + + zresult = ZDICT_trainFromBuffer_advanced(dict, capacity, + sampleBuffer, sampleSizes, (unsigned int)samplesLen, + zparams); + if (ZDICT_isError(zresult)) { + PyErr_Format(ZstdError, "Cannot train dict: %s", ZDICT_getErrorName(zresult)); + free(dict); + free(sampleSizes); + free(sampleBuffer); + return NULL; + } + + result = PyObject_New(ZstdCompressionDict, &ZstdCompressionDictType); + if (!result) { + return NULL; + } + + result->dictData = dict; + result->dictSize = zresult; + return result; +} + + +PyDoc_STRVAR(ZstdCompressionDict__doc__, +"ZstdCompressionDict(data) - Represents a computed compression dictionary\n" +"\n" +"This type holds the results of a computed Zstandard compression dictionary.\n" +"Instances are obtained by calling ``train_dictionary()`` or by passing bytes\n" +"obtained from another source into the constructor.\n" +); + +static int ZstdCompressionDict_init(ZstdCompressionDict* self, PyObject* args) { + const char* source; + Py_ssize_t sourceSize; + + self->dictData = NULL; + self->dictSize = 0; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { +#else + if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { +#endif + return -1; + } + + self->dictData = malloc(sourceSize); + if (!self->dictData) { + PyErr_NoMemory(); + return -1; + } + + memcpy(self->dictData, source, sourceSize); + self->dictSize = sourceSize; + + return 0; + } + +static void ZstdCompressionDict_dealloc(ZstdCompressionDict* self) { + if (self->dictData) { + free(self->dictData); + self->dictData = NULL; + } + + PyObject_Del(self); +} + +static PyObject* ZstdCompressionDict_dict_id(ZstdCompressionDict* self) { + unsigned dictID = ZDICT_getDictID(self->dictData, self->dictSize); + + return PyLong_FromLong(dictID); +} + +static PyObject* ZstdCompressionDict_as_bytes(ZstdCompressionDict* self) { + return PyBytes_FromStringAndSize(self->dictData, self->dictSize); +} + +static PyMethodDef ZstdCompressionDict_methods[] = { + { "dict_id", (PyCFunction)ZstdCompressionDict_dict_id, METH_NOARGS, + PyDoc_STR("dict_id() -- obtain the numeric dictionary ID") }, + { "as_bytes", (PyCFunction)ZstdCompressionDict_as_bytes, METH_NOARGS, + PyDoc_STR("as_bytes() -- obtain the raw bytes constituting the dictionary data") }, + { NULL, NULL } +}; + +static Py_ssize_t ZstdCompressionDict_length(ZstdCompressionDict* self) { + return self->dictSize; +} + +static PySequenceMethods ZstdCompressionDict_sq = { + (lenfunc)ZstdCompressionDict_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + 0, /* sq_item */ + 0, /* sq_ass_item */ + 0, /* sq_contains */ + 0, /* sq_inplace_concat */ + 0 /* sq_inplace_repeat */ +}; + +PyTypeObject ZstdCompressionDictType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionDict", /* tp_name */ + sizeof(ZstdCompressionDict), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionDict_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + &ZstdCompressionDict_sq, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressionDict__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressionDict_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)ZstdCompressionDict_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressiondict_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressionDictType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionDictType) < 0) { + return; + } + + Py_INCREF((PyObject*)&ZstdCompressionDictType); + PyModule_AddObject(mod, "ZstdCompressionDict", + (PyObject*)&ZstdCompressionDictType); +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/compressionparams.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,226 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams) { + zparams->windowLog = params->windowLog; + zparams->chainLog = params->chainLog; + zparams->hashLog = params->hashLog; + zparams->searchLog = params->searchLog; + zparams->searchLength = params->searchLength; + zparams->targetLength = params->targetLength; + zparams->strategy = params->strategy; +} + +CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args) { + int compressionLevel; + unsigned PY_LONG_LONG sourceSize = 0; + Py_ssize_t dictSize = 0; + ZSTD_compressionParameters params; + CompressionParametersObject* result; + + if (!PyArg_ParseTuple(args, "i|Kn", &compressionLevel, &sourceSize, &dictSize)) { + return NULL; + } + + params = ZSTD_getCParams(compressionLevel, sourceSize, dictSize); + + result = PyObject_New(CompressionParametersObject, &CompressionParametersType); + if (!result) { + return NULL; + } + + result->windowLog = params.windowLog; + result->chainLog = params.chainLog; + result->hashLog = params.hashLog; + result->searchLog = params.searchLog; + result->searchLength = params.searchLength; + result->targetLength = params.targetLength; + result->strategy = params.strategy; + + return result; +} + +PyObject* estimate_compression_context_size(PyObject* self, PyObject* args) { + CompressionParametersObject* params; + ZSTD_compressionParameters zparams; + PyObject* result; + + if (!PyArg_ParseTuple(args, "O!", &CompressionParametersType, ¶ms)) { + return NULL; + } + + ztopy_compression_parameters(params, &zparams); + result = PyLong_FromSize_t(ZSTD_estimateCCtxSize(zparams)); + return result; +} + +PyDoc_STRVAR(CompressionParameters__doc__, +"CompressionParameters: low-level control over zstd compression"); + +static PyObject* CompressionParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) { + CompressionParametersObject* self; + unsigned windowLog; + unsigned chainLog; + unsigned hashLog; + unsigned searchLog; + unsigned searchLength; + unsigned targetLength; + unsigned strategy; + + if (!PyArg_ParseTuple(args, "IIIIIII", &windowLog, &chainLog, &hashLog, &searchLog, + &searchLength, &targetLength, &strategy)) { + return NULL; + } + + if (windowLog < ZSTD_WINDOWLOG_MIN || windowLog > ZSTD_WINDOWLOG_MAX) { + PyErr_SetString(PyExc_ValueError, "invalid window log value"); + return NULL; + } + + if (chainLog < ZSTD_CHAINLOG_MIN || chainLog > ZSTD_CHAINLOG_MAX) { + PyErr_SetString(PyExc_ValueError, "invalid chain log value"); + return NULL; + } + + if (hashLog < ZSTD_HASHLOG_MIN || hashLog > ZSTD_HASHLOG_MAX) { + PyErr_SetString(PyExc_ValueError, "invalid hash log value"); + return NULL; + } + + if (searchLog < ZSTD_SEARCHLOG_MIN || searchLog > ZSTD_SEARCHLOG_MAX) { + PyErr_SetString(PyExc_ValueError, "invalid search log value"); + return NULL; + } + + if (searchLength < ZSTD_SEARCHLENGTH_MIN || searchLength > ZSTD_SEARCHLENGTH_MAX) { + PyErr_SetString(PyExc_ValueError, "invalid search length value"); + return NULL; + } + + if (targetLength < ZSTD_TARGETLENGTH_MIN || targetLength > ZSTD_TARGETLENGTH_MAX) { + PyErr_SetString(PyExc_ValueError, "invalid target length value"); + return NULL; + } + + if (strategy < ZSTD_fast || strategy > ZSTD_btopt) { + PyErr_SetString(PyExc_ValueError, "invalid strategy value"); + return NULL; + } + + self = (CompressionParametersObject*)subtype->tp_alloc(subtype, 1); + if (!self) { + return NULL; + } + + self->windowLog = windowLog; + self->chainLog = chainLog; + self->hashLog = hashLog; + self->searchLog = searchLog; + self->searchLength = searchLength; + self->targetLength = targetLength; + self->strategy = strategy; + + return (PyObject*)self; +} + +static void CompressionParameters_dealloc(PyObject* self) { + PyObject_Del(self); +} + +static Py_ssize_t CompressionParameters_length(PyObject* self) { + return 7; +}; + +static PyObject* CompressionParameters_item(PyObject* o, Py_ssize_t i) { + CompressionParametersObject* self = (CompressionParametersObject*)o; + + switch (i) { + case 0: + return PyLong_FromLong(self->windowLog); + case 1: + return PyLong_FromLong(self->chainLog); + case 2: + return PyLong_FromLong(self->hashLog); + case 3: + return PyLong_FromLong(self->searchLog); + case 4: + return PyLong_FromLong(self->searchLength); + case 5: + return PyLong_FromLong(self->targetLength); + case 6: + return PyLong_FromLong(self->strategy); + default: + PyErr_SetString(PyExc_IndexError, "index out of range"); + return NULL; + } +} + +static PySequenceMethods CompressionParameters_sq = { + CompressionParameters_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + CompressionParameters_item, /* sq_item */ + 0, /* sq_ass_item */ + 0, /* sq_contains */ + 0, /* sq_inplace_concat */ + 0 /* sq_inplace_repeat */ +}; + +PyTypeObject CompressionParametersType = { + PyVarObject_HEAD_INIT(NULL, 0) + "CompressionParameters", /* tp_name */ + sizeof(CompressionParametersObject), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)CompressionParameters_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + &CompressionParameters_sq, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + CompressionParameters__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + CompressionParameters_new, /* tp_new */ +}; + +void compressionparams_module_init(PyObject* mod) { + Py_TYPE(&CompressionParametersType) = &PyType_Type; + if (PyType_Ready(&CompressionParametersType) < 0) { + return; + } + + Py_IncRef((PyObject*)&CompressionParametersType); + PyModule_AddObject(mod, "CompressionParameters", + (PyObject*)&CompressionParametersType); +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/compressionwriter.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,235 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdCompresssionWriter__doc__, +"""A context manager used for writing compressed output to a writer.\n" +); + +static void ZstdCompressionWriter_dealloc(ZstdCompressionWriter* self) { + Py_XDECREF(self->compressor); + Py_XDECREF(self->writer); + + if (self->cstream) { + ZSTD_freeCStream(self->cstream); + self->cstream = NULL; + } + + PyObject_Del(self); +} + +static PyObject* ZstdCompressionWriter_enter(ZstdCompressionWriter* self) { + if (self->entered) { + PyErr_SetString(ZstdError, "cannot __enter__ multiple times"); + return NULL; + } + + self->cstream = CStream_from_ZstdCompressor(self->compressor, self->sourceSize); + if (!self->cstream) { + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return (PyObject*)self; +} + +static PyObject* ZstdCompressionWriter_exit(ZstdCompressionWriter* self, PyObject* args) { + PyObject* exc_type; + PyObject* exc_value; + PyObject* exc_tb; + size_t zresult; + + ZSTD_outBuffer output; + PyObject* res; + + if (!PyArg_ParseTuple(args, "OOO", &exc_type, &exc_value, &exc_tb)) { + return NULL; + } + + self->entered = 0; + + if (self->cstream && exc_type == Py_None && exc_value == Py_None && + exc_tb == Py_None) { + + output.dst = malloc(self->outSize); + if (!output.dst) { + return PyErr_NoMemory(); + } + output.size = self->outSize; + output.pos = 0; + + while (1) { + zresult = ZSTD_endStream(self->cstream, &output); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + free(output.dst); + return NULL; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + res = PyObject_CallMethod(self->writer, "write", "y#", +#else + res = PyObject_CallMethod(self->writer, "write", "s#", +#endif + output.dst, output.pos); + Py_XDECREF(res); + } + + if (!zresult) { + break; + } + + output.pos = 0; + } + + free(output.dst); + ZSTD_freeCStream(self->cstream); + self->cstream = NULL; + } + + Py_RETURN_FALSE; +} + +static PyObject* ZstdCompressionWriter_memory_size(ZstdCompressionWriter* self) { + if (!self->cstream) { + PyErr_SetString(ZstdError, "cannot determine size of an inactive compressor; " + "call when a context manager is active"); + return NULL; + } + + return PyLong_FromSize_t(ZSTD_sizeof_CStream(self->cstream)); +} + +static PyObject* ZstdCompressionWriter_write(ZstdCompressionWriter* self, PyObject* args) { + const char* source; + Py_ssize_t sourceSize; + size_t zresult; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + PyObject* res; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { +#else + if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { +#endif + return NULL; + } + + if (!self->entered) { + PyErr_SetString(ZstdError, "compress must be called from an active context manager"); + return NULL; + } + + output.dst = malloc(self->outSize); + if (!output.dst) { + return PyErr_NoMemory(); + } + output.size = self->outSize; + output.pos = 0; + + input.src = source; + input.size = sourceSize; + input.pos = 0; + + while ((ssize_t)input.pos < sourceSize) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream(self->cstream, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + free(output.dst); + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + /* Copy data from output buffer to writer. */ + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + res = PyObject_CallMethod(self->writer, "write", "y#", +#else + res = PyObject_CallMethod(self->writer, "write", "s#", +#endif + output.dst, output.pos); + Py_XDECREF(res); + } + output.pos = 0; + } + + free(output.dst); + + /* TODO return bytes written */ + Py_RETURN_NONE; + } + +static PyMethodDef ZstdCompressionWriter_methods[] = { + { "__enter__", (PyCFunction)ZstdCompressionWriter_enter, METH_NOARGS, + PyDoc_STR("Enter a compression context.") }, + { "__exit__", (PyCFunction)ZstdCompressionWriter_exit, METH_VARARGS, + PyDoc_STR("Exit a compression context.") }, + { "memory_size", (PyCFunction)ZstdCompressionWriter_memory_size, METH_NOARGS, + PyDoc_STR("Obtain the memory size of the underlying compressor") }, + { "write", (PyCFunction)ZstdCompressionWriter_write, METH_VARARGS, + PyDoc_STR("Compress data") }, + { NULL, NULL } +}; + +PyTypeObject ZstdCompressionWriterType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionWriter", /* tp_name */ + sizeof(ZstdCompressionWriter), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionWriter_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompresssionWriter__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressionWriter_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressionwriter_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressionWriterType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionWriterType) < 0) { + return; + } +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/compressobj.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,205 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdCompressionObj__doc__, +"Perform compression using a standard library compatible API.\n" +); + +static void ZstdCompressionObj_dealloc(ZstdCompressionObj* self) { + PyMem_Free(self->output.dst); + self->output.dst = NULL; + + if (self->cstream) { + ZSTD_freeCStream(self->cstream); + self->cstream = NULL; + } + + Py_XDECREF(self->compressor); + + PyObject_Del(self); +} + +static PyObject* ZstdCompressionObj_compress(ZstdCompressionObj* self, PyObject* args) { + const char* source; + Py_ssize_t sourceSize; + ZSTD_inBuffer input; + size_t zresult; + PyObject* result = NULL; + Py_ssize_t resultSize = 0; + + if (self->flushed) { + PyErr_SetString(ZstdError, "cannot call compress() after flush() has been called"); + return NULL; + } + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { +#else + if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { +#endif + return NULL; + } + + input.src = source; + input.size = sourceSize; + input.pos = 0; + + while ((ssize_t)input.pos < sourceSize) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream(self->cstream, &self->output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + if (self->output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + if (-1 == _PyBytes_Resize(&result, resultSize + self->output.pos)) { + return NULL; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, + self->output.dst, self->output.pos); + } + else { + result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + if (!result) { + return NULL; + } + } + + self->output.pos = 0; + } + } + + if (result) { + return result; + } + else { + return PyBytes_FromString(""); + } +} + +static PyObject* ZstdCompressionObj_flush(ZstdCompressionObj* self) { + size_t zresult; + PyObject* result = NULL; + Py_ssize_t resultSize = 0; + + if (self->flushed) { + PyErr_SetString(ZstdError, "flush() already called"); + return NULL; + } + + self->flushed = 1; + + while (1) { + zresult = ZSTD_endStream(self->cstream, &self->output); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + if (self->output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + if (-1 == _PyBytes_Resize(&result, resultSize + self->output.pos)) { + return NULL; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, + self->output.dst, self->output.pos); + } + else { + result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + if (!result) { + return NULL; + } + } + + self->output.pos = 0; + } + + if (!zresult) { + break; + } + } + + ZSTD_freeCStream(self->cstream); + self->cstream = NULL; + + if (result) { + return result; + } + else { + return PyBytes_FromString(""); + } +} + +static PyMethodDef ZstdCompressionObj_methods[] = { + { "compress", (PyCFunction)ZstdCompressionObj_compress, METH_VARARGS, + PyDoc_STR("compress data") }, + { "flush", (PyCFunction)ZstdCompressionObj_flush, METH_NOARGS, + PyDoc_STR("finish compression operation") }, + { NULL, NULL } +}; + +PyTypeObject ZstdCompressionObjType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressionObj", /* tp_name */ + sizeof(ZstdCompressionObj), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressionObj_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressionObj__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressionObj_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressobj_module_init(PyObject* module) { + Py_TYPE(&ZstdCompressionObjType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressionObjType) < 0) { + return; + } +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/compressor.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,757 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +/** +* Initialize a zstd CStream from a ZstdCompressor instance. +* +* Returns a ZSTD_CStream on success or NULL on failure. If NULL, a Python +* exception will be set. +*/ +ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize) { + ZSTD_CStream* cstream; + ZSTD_parameters zparams; + void* dictData = NULL; + size_t dictSize = 0; + size_t zresult; + + cstream = ZSTD_createCStream(); + if (!cstream) { + PyErr_SetString(ZstdError, "cannot create CStream"); + return NULL; + } + + if (compressor->dict) { + dictData = compressor->dict->dictData; + dictSize = compressor->dict->dictSize; + } + + memset(&zparams, 0, sizeof(zparams)); + if (compressor->cparams) { + ztopy_compression_parameters(compressor->cparams, &zparams.cParams); + /* Do NOT call ZSTD_adjustCParams() here because the compression params + come from the user. */ + } + else { + zparams.cParams = ZSTD_getCParams(compressor->compressionLevel, sourceSize, dictSize); + } + + zparams.fParams = compressor->fparams; + + zresult = ZSTD_initCStream_advanced(cstream, dictData, dictSize, zparams, sourceSize); + + if (ZSTD_isError(zresult)) { + ZSTD_freeCStream(cstream); + PyErr_Format(ZstdError, "cannot init CStream: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + return cstream; +} + + +PyDoc_STRVAR(ZstdCompressor__doc__, +"ZstdCompressor(level=None, dict_data=None, compression_params=None)\n" +"\n" +"Create an object used to perform Zstandard compression.\n" +"\n" +"An instance can compress data various ways. Instances can be used multiple\n" +"times. Each compression operation will use the compression parameters\n" +"defined at construction time.\n" +"\n" +"Compression can be configured via the following names arguments:\n" +"\n" +"level\n" +" Integer compression level.\n" +"dict_data\n" +" A ``ZstdCompressionDict`` to be used to compress with dictionary data.\n" +"compression_params\n" +" A ``CompressionParameters`` instance defining low-level compression" +" parameters. If defined, this will overwrite the ``level`` argument.\n" +"write_checksum\n" +" If True, a 4 byte content checksum will be written with the compressed\n" +" data, allowing the decompressor to perform content verification.\n" +"write_content_size\n" +" If True, the decompressed content size will be included in the header of\n" +" the compressed data. This data will only be written if the compressor\n" +" knows the size of the input data.\n" +"write_dict_id\n" +" Determines whether the dictionary ID will be written into the compressed\n" +" data. Defaults to True. Only adds content to the compressed data if\n" +" a dictionary is being used.\n" +); + +static int ZstdCompressor_init(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "level", + "dict_data", + "compression_params", + "write_checksum", + "write_content_size", + "write_dict_id", + NULL + }; + + int level = 3; + ZstdCompressionDict* dict = NULL; + CompressionParametersObject* params = NULL; + PyObject* writeChecksum = NULL; + PyObject* writeContentSize = NULL; + PyObject* writeDictID = NULL; + + self->dict = NULL; + self->cparams = NULL; + self->cdict = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iO!O!OOO", kwlist, + &level, &ZstdCompressionDictType, &dict, + &CompressionParametersType, ¶ms, + &writeChecksum, &writeContentSize, &writeDictID)) { + return -1; + } + + if (level < 1) { + PyErr_SetString(PyExc_ValueError, "level must be greater than 0"); + return -1; + } + + if (level > ZSTD_maxCLevel()) { + PyErr_Format(PyExc_ValueError, "level must be less than %d", + ZSTD_maxCLevel() + 1); + return -1; + } + + self->compressionLevel = level; + + if (dict) { + self->dict = dict; + Py_INCREF(dict); + } + + if (params) { + self->cparams = params; + Py_INCREF(params); + } + + memset(&self->fparams, 0, sizeof(self->fparams)); + + if (writeChecksum && PyObject_IsTrue(writeChecksum)) { + self->fparams.checksumFlag = 1; + } + if (writeContentSize && PyObject_IsTrue(writeContentSize)) { + self->fparams.contentSizeFlag = 1; + } + if (writeDictID && PyObject_Not(writeDictID)) { + self->fparams.noDictIDFlag = 1; + } + + return 0; +} + +static void ZstdCompressor_dealloc(ZstdCompressor* self) { + Py_XDECREF(self->cparams); + Py_XDECREF(self->dict); + + if (self->cdict) { + ZSTD_freeCDict(self->cdict); + self->cdict = NULL; + } + + PyObject_Del(self); +} + +PyDoc_STRVAR(ZstdCompressor_copy_stream__doc__, +"copy_stream(ifh, ofh[, size=0, read_size=default, write_size=default])\n" +"compress data between streams\n" +"\n" +"Data will be read from ``ifh``, compressed, and written to ``ofh``.\n" +"``ifh`` must have a ``read(size)`` method. ``ofh`` must have a ``write(data)``\n" +"method.\n" +"\n" +"An optional ``size`` argument specifies the size of the source stream.\n" +"If defined, compression parameters will be tuned based on the size.\n" +"\n" +"Optional arguments ``read_size`` and ``write_size`` define the chunk sizes\n" +"of ``read()`` and ``write()`` operations, respectively. By default, they use\n" +"the default compression stream input and output sizes, respectively.\n" +); + +static PyObject* ZstdCompressor_copy_stream(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "ifh", + "ofh", + "size", + "read_size", + "write_size", + NULL + }; + + PyObject* source; + PyObject* dest; + Py_ssize_t sourceSize = 0; + size_t inSize = ZSTD_CStreamInSize(); + size_t outSize = ZSTD_CStreamOutSize(); + ZSTD_CStream* cstream; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t totalRead = 0; + Py_ssize_t totalWrite = 0; + char* readBuffer; + Py_ssize_t readSize; + PyObject* readResult; + PyObject* res = NULL; + size_t zresult; + PyObject* writeResult; + PyObject* totalReadPy; + PyObject* totalWritePy; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nkk", kwlist, &source, &dest, &sourceSize, + &inSize, &outSize)) { + return NULL; + } + + if (!PyObject_HasAttrString(source, "read")) { + PyErr_SetString(PyExc_ValueError, "first argument must have a read() method"); + return NULL; + } + + if (!PyObject_HasAttrString(dest, "write")) { + PyErr_SetString(PyExc_ValueError, "second argument must have a write() method"); + return NULL; + } + + cstream = CStream_from_ZstdCompressor(self, sourceSize); + if (!cstream) { + res = NULL; + goto finally; + } + + output.dst = PyMem_Malloc(outSize); + if (!output.dst) { + PyErr_NoMemory(); + res = NULL; + goto finally; + } + output.size = outSize; + output.pos = 0; + + while (1) { + /* Try to read from source stream. */ + readResult = PyObject_CallMethod(source, "read", "n", inSize); + if (!readResult) { + PyErr_SetString(ZstdError, "could not read() from source"); + goto finally; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + + /* If no data was read, we're at EOF. */ + if (0 == readSize) { + break; + } + + totalRead += readSize; + + /* Send data to compressor */ + input.src = readBuffer; + input.size = readSize; + input.pos = 0; + + while (input.pos < input.size) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream(cstream, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + res = NULL; + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + goto finally; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + writeResult = PyObject_CallMethod(dest, "write", "y#", +#else + writeResult = PyObject_CallMethod(dest, "write", "s#", +#endif + output.dst, output.pos); + Py_XDECREF(writeResult); + totalWrite += output.pos; + output.pos = 0; + } + } + } + + /* We've finished reading. Now flush the compressor stream. */ + while (1) { + zresult = ZSTD_endStream(cstream, &output); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + res = NULL; + goto finally; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + writeResult = PyObject_CallMethod(dest, "write", "y#", +#else + writeResult = PyObject_CallMethod(dest, "write", "s#", +#endif + output.dst, output.pos); + totalWrite += output.pos; + Py_XDECREF(writeResult); + output.pos = 0; + } + + if (!zresult) { + break; + } + } + + ZSTD_freeCStream(cstream); + cstream = NULL; + + totalReadPy = PyLong_FromSsize_t(totalRead); + totalWritePy = PyLong_FromSsize_t(totalWrite); + res = PyTuple_Pack(2, totalReadPy, totalWritePy); + Py_DecRef(totalReadPy); + Py_DecRef(totalWritePy); + +finally: + if (output.dst) { + PyMem_Free(output.dst); + } + + if (cstream) { + ZSTD_freeCStream(cstream); + } + + return res; +} + +PyDoc_STRVAR(ZstdCompressor_compress__doc__, +"compress(data)\n" +"\n" +"Compress data in a single operation.\n" +"\n" +"This is the simplest mechanism to perform compression: simply pass in a\n" +"value and get a compressed value back. It is almost the most prone to abuse.\n" +"The input and output values must fit in memory, so passing in very large\n" +"values can result in excessive memory usage. For this reason, one of the\n" +"streaming based APIs is preferred for larger values.\n" +); + +static PyObject* ZstdCompressor_compress(ZstdCompressor* self, PyObject* args) { + const char* source; + Py_ssize_t sourceSize; + size_t destSize; + ZSTD_CCtx* cctx; + PyObject* output; + char* dest; + void* dictData = NULL; + size_t dictSize = 0; + size_t zresult; + ZSTD_parameters zparams; + ZSTD_customMem zmem; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { +#else + if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { +#endif + return NULL; + } + + destSize = ZSTD_compressBound(sourceSize); + output = PyBytes_FromStringAndSize(NULL, destSize); + if (!output) { + return NULL; + } + + dest = PyBytes_AsString(output); + + cctx = ZSTD_createCCtx(); + if (!cctx) { + Py_DECREF(output); + PyErr_SetString(ZstdError, "could not create CCtx"); + return NULL; + } + + if (self->dict) { + dictData = self->dict->dictData; + dictSize = self->dict->dictSize; + } + + memset(&zparams, 0, sizeof(zparams)); + if (!self->cparams) { + zparams.cParams = ZSTD_getCParams(self->compressionLevel, sourceSize, dictSize); + } + else { + ztopy_compression_parameters(self->cparams, &zparams.cParams); + /* Do NOT call ZSTD_adjustCParams() here because the compression params + come from the user. */ + } + + zparams.fParams = self->fparams; + + /* The raw dict data has to be processed before it can be used. Since this + adds overhead - especially if multiple dictionary compression operations + are performed on the same ZstdCompressor instance - we create a + ZSTD_CDict once and reuse it for all operations. */ + + /* TODO the zparams (which can be derived from the source data size) used + on first invocation are effectively reused for subsequent operations. This + may not be appropriate if input sizes vary significantly and could affect + chosen compression parameters. + https://github.com/facebook/zstd/issues/358 tracks this issue. */ + if (dictData && !self->cdict) { + Py_BEGIN_ALLOW_THREADS + memset(&zmem, 0, sizeof(zmem)); + self->cdict = ZSTD_createCDict_advanced(dictData, dictSize, zparams, zmem); + Py_END_ALLOW_THREADS + + if (!self->cdict) { + Py_DECREF(output); + ZSTD_freeCCtx(cctx); + PyErr_SetString(ZstdError, "could not create compression dictionary"); + return NULL; + } + } + + Py_BEGIN_ALLOW_THREADS + /* By avoiding ZSTD_compress(), we don't necessarily write out content + size. This means the argument to ZstdCompressor to control frame + parameters is honored. */ + if (self->cdict) { + zresult = ZSTD_compress_usingCDict(cctx, dest, destSize, + source, sourceSize, self->cdict); + } + else { + zresult = ZSTD_compress_advanced(cctx, dest, destSize, + source, sourceSize, dictData, dictSize, zparams); + } + Py_END_ALLOW_THREADS + + ZSTD_freeCCtx(cctx); + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "cannot compress: %s", ZSTD_getErrorName(zresult)); + Py_CLEAR(output); + return NULL; + } + else { + Py_SIZE(output) = zresult; + } + + return output; +} + +PyDoc_STRVAR(ZstdCompressionObj__doc__, +"compressobj()\n" +"\n" +"Return an object exposing ``compress(data)`` and ``flush()`` methods.\n" +"\n" +"The returned object exposes an API similar to ``zlib.compressobj`` and\n" +"``bz2.BZ2Compressor`` so that callers can swap in the zstd compressor\n" +"without changing how compression is performed.\n" +); + +static ZstdCompressionObj* ZstdCompressor_compressobj(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "size", + NULL + }; + + Py_ssize_t inSize = 0; + size_t outSize = ZSTD_CStreamOutSize(); + ZstdCompressionObj* result = PyObject_New(ZstdCompressionObj, &ZstdCompressionObjType); + if (!result) { + return NULL; + } + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &inSize)) { + return NULL; + } + + result->cstream = CStream_from_ZstdCompressor(self, inSize); + if (!result->cstream) { + Py_DECREF(result); + return NULL; + } + + result->output.dst = PyMem_Malloc(outSize); + if (!result->output.dst) { + PyErr_NoMemory(); + Py_DECREF(result); + return NULL; + } + result->output.size = outSize; + result->output.pos = 0; + + result->compressor = self; + Py_INCREF(result->compressor); + + result->flushed = 0; + + return result; +} + +PyDoc_STRVAR(ZstdCompressor_read_from__doc__, +"read_from(reader, [size=0, read_size=default, write_size=default])\n" +"Read uncompress data from a reader and return an iterator\n" +"\n" +"Returns an iterator of compressed data produced from reading from ``reader``.\n" +"\n" +"Uncompressed data will be obtained from ``reader`` by calling the\n" +"``read(size)`` method of it. The source data will be streamed into a\n" +"compressor. As compressed data is available, it will be exposed to the\n" +"iterator.\n" +"\n" +"Data is read from the source in chunks of ``read_size``. Compressed chunks\n" +"are at most ``write_size`` bytes. Both values default to the zstd input and\n" +"and output defaults, respectively.\n" +"\n" +"The caller is partially in control of how fast data is fed into the\n" +"compressor by how it consumes the returned iterator. The compressor will\n" +"not consume from the reader unless the caller consumes from the iterator.\n" +); + +static ZstdCompressorIterator* ZstdCompressor_read_from(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "reader", + "size", + "read_size", + "write_size", + NULL + }; + + PyObject* reader; + Py_ssize_t sourceSize = 0; + size_t inSize = ZSTD_CStreamInSize(); + size_t outSize = ZSTD_CStreamOutSize(); + ZstdCompressorIterator* result; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nkk", kwlist, &reader, &sourceSize, + &inSize, &outSize)) { + return NULL; + } + + result = PyObject_New(ZstdCompressorIterator, &ZstdCompressorIteratorType); + if (!result) { + return NULL; + } + + result->compressor = NULL; + result->reader = NULL; + result->buffer = NULL; + result->cstream = NULL; + result->input.src = NULL; + result->output.dst = NULL; + result->readResult = NULL; + + if (PyObject_HasAttrString(reader, "read")) { + result->reader = reader; + Py_INCREF(result->reader); + } + else if (1 == PyObject_CheckBuffer(reader)) { + result->buffer = PyMem_Malloc(sizeof(Py_buffer)); + if (!result->buffer) { + goto except; + } + + memset(result->buffer, 0, sizeof(Py_buffer)); + + if (0 != PyObject_GetBuffer(reader, result->buffer, PyBUF_CONTIG_RO)) { + goto except; + } + + result->bufferOffset = 0; + sourceSize = result->buffer->len; + } + else { + PyErr_SetString(PyExc_ValueError, + "must pass an object with a read() method or conforms to buffer protocol"); + goto except; + } + + result->compressor = self; + Py_INCREF(result->compressor); + + result->sourceSize = sourceSize; + result->cstream = CStream_from_ZstdCompressor(self, sourceSize); + if (!result->cstream) { + goto except; + } + + result->inSize = inSize; + result->outSize = outSize; + + result->output.dst = PyMem_Malloc(outSize); + if (!result->output.dst) { + PyErr_NoMemory(); + goto except; + } + result->output.size = outSize; + result->output.pos = 0; + + result->input.src = NULL; + result->input.size = 0; + result->input.pos = 0; + + result->finishedInput = 0; + result->finishedOutput = 0; + + goto finally; + +except: + if (result->cstream) { + ZSTD_freeCStream(result->cstream); + result->cstream = NULL; + } + + Py_DecRef((PyObject*)result->compressor); + Py_DecRef(result->reader); + + Py_DECREF(result); + result = NULL; + +finally: + return result; +} + +PyDoc_STRVAR(ZstdCompressor_write_to___doc__, +"Create a context manager to write compressed data to an object.\n" +"\n" +"The passed object must have a ``write()`` method.\n" +"\n" +"The caller feeds input data to the object by calling ``compress(data)``.\n" +"Compressed data is written to the argument given to this function.\n" +"\n" +"The function takes an optional ``size`` argument indicating the total size\n" +"of the eventual input. If specified, the size will influence compression\n" +"parameter tuning and could result in the size being written into the\n" +"header of the compressed data.\n" +"\n" +"An optional ``write_size`` argument is also accepted. It defines the maximum\n" +"byte size of chunks fed to ``write()``. By default, it uses the zstd default\n" +"for a compressor output stream.\n" +); + +static ZstdCompressionWriter* ZstdCompressor_write_to(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "writer", + "size", + "write_size", + NULL + }; + + PyObject* writer; + ZstdCompressionWriter* result; + Py_ssize_t sourceSize = 0; + size_t outSize = ZSTD_CStreamOutSize(); + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nk", kwlist, &writer, &sourceSize, + &outSize)) { + return NULL; + } + + if (!PyObject_HasAttrString(writer, "write")) { + PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method"); + return NULL; + } + + result = PyObject_New(ZstdCompressionWriter, &ZstdCompressionWriterType); + if (!result) { + return NULL; + } + + result->compressor = self; + Py_INCREF(result->compressor); + + result->writer = writer; + Py_INCREF(result->writer); + + result->sourceSize = sourceSize; + + result->outSize = outSize; + + result->entered = 0; + result->cstream = NULL; + + return result; +} + +static PyMethodDef ZstdCompressor_methods[] = { + { "compress", (PyCFunction)ZstdCompressor_compress, METH_VARARGS, + ZstdCompressor_compress__doc__ }, + { "compressobj", (PyCFunction)ZstdCompressor_compressobj, + METH_VARARGS | METH_KEYWORDS, ZstdCompressionObj__doc__ }, + { "copy_stream", (PyCFunction)ZstdCompressor_copy_stream, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_copy_stream__doc__ }, + { "read_from", (PyCFunction)ZstdCompressor_read_from, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_read_from__doc__ }, + { "write_to", (PyCFunction)ZstdCompressor_write_to, + METH_VARARGS | METH_KEYWORDS, ZstdCompressor_write_to___doc__ }, + { NULL, NULL } +}; + +PyTypeObject ZstdCompressorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressor", /* tp_name */ + sizeof(ZstdCompressor), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressor_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressor__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdCompressor_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)ZstdCompressor_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressor_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressorType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressorType) < 0) { + return; + } + + Py_INCREF((PyObject*)&ZstdCompressorType); + PyModule_AddObject(mod, "ZstdCompressor", + (PyObject*)&ZstdCompressorType); +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/compressoriterator.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,234 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +#define min(a, b) (((a) < (b)) ? (a) : (b)) + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdCompressorIterator__doc__, +"Represents an iterator of compressed data.\n" +); + +static void ZstdCompressorIterator_dealloc(ZstdCompressorIterator* self) { + Py_XDECREF(self->readResult); + Py_XDECREF(self->compressor); + Py_XDECREF(self->reader); + + if (self->buffer) { + PyBuffer_Release(self->buffer); + PyMem_FREE(self->buffer); + self->buffer = NULL; + } + + if (self->cstream) { + ZSTD_freeCStream(self->cstream); + self->cstream = NULL; + } + + if (self->output.dst) { + PyMem_Free(self->output.dst); + self->output.dst = NULL; + } + + PyObject_Del(self); +} + +static PyObject* ZstdCompressorIterator_iter(PyObject* self) { + Py_INCREF(self); + return self; +} + +static PyObject* ZstdCompressorIterator_iternext(ZstdCompressorIterator* self) { + size_t zresult; + PyObject* readResult = NULL; + PyObject* chunk; + char* readBuffer; + Py_ssize_t readSize = 0; + Py_ssize_t bufferRemaining; + + if (self->finishedOutput) { + PyErr_SetString(PyExc_StopIteration, "output flushed"); + return NULL; + } + +feedcompressor: + + /* If we have data left in the input, consume it. */ + if (self->input.pos < self->input.size) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream(self->cstream, &self->output, &self->input); + Py_END_ALLOW_THREADS + + /* Release the Python object holding the input buffer. */ + if (self->input.pos == self->input.size) { + self->input.src = NULL; + self->input.pos = 0; + self->input.size = 0; + Py_DECREF(self->readResult); + self->readResult = NULL; + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + /* If it produced output data, emit it. */ + if (self->output.pos) { + chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; + } + } + + /* We should never have output data sitting around after a previous call. */ + assert(self->output.pos == 0); + + /* The code above should have either emitted a chunk and returned or consumed + the entire input buffer. So the state of the input buffer is not + relevant. */ + if (!self->finishedInput) { + if (self->reader) { + readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize); + if (!readResult) { + PyErr_SetString(ZstdError, "could not read() from source"); + return NULL; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + } + else { + assert(self->buffer && self->buffer->buf); + + /* Only support contiguous C arrays. */ + assert(self->buffer->strides == NULL && self->buffer->suboffsets == NULL); + assert(self->buffer->itemsize == 1); + + readBuffer = (char*)self->buffer->buf + self->bufferOffset; + bufferRemaining = self->buffer->len - self->bufferOffset; + readSize = min(bufferRemaining, (Py_ssize_t)self->inSize); + self->bufferOffset += readSize; + } + + if (0 == readSize) { + Py_XDECREF(readResult); + self->finishedInput = 1; + } + else { + self->readResult = readResult; + } + } + + /* EOF */ + if (0 == readSize) { + zresult = ZSTD_endStream(self->cstream, &self->output); + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "error ending compression stream: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + assert(self->output.pos); + + if (0 == zresult) { + self->finishedOutput = 1; + } + + chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; + } + + /* New data from reader. Feed into compressor. */ + self->input.src = readBuffer; + self->input.size = readSize; + self->input.pos = 0; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_compressStream(self->cstream, &self->output, &self->input); + Py_END_ALLOW_THREADS + + /* The input buffer currently points to memory managed by Python + (readBuffer). This object was allocated by this function. If it wasn't + fully consumed, we need to release it in a subsequent function call. + If it is fully consumed, do that now. + */ + if (self->input.pos == self->input.size) { + self->input.src = NULL; + self->input.pos = 0; + self->input.size = 0; + Py_XDECREF(self->readResult); + self->readResult = NULL; + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); + return NULL; + } + + assert(self->input.pos <= self->input.size); + + /* If we didn't write anything, start the process over. */ + if (0 == self->output.pos) { + goto feedcompressor; + } + + chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); + self->output.pos = 0; + return chunk; +} + +PyTypeObject ZstdCompressorIteratorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdCompressorIterator", /* tp_name */ + sizeof(ZstdCompressorIterator), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdCompressorIterator_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdCompressorIterator__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + ZstdCompressorIterator_iter, /* tp_iter */ + (iternextfunc)ZstdCompressorIterator_iternext, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void compressoriterator_module_init(PyObject* mod) { + Py_TYPE(&ZstdCompressorIteratorType) = &PyType_Type; + if (PyType_Ready(&ZstdCompressorIteratorType) < 0) { + return; + } +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/constants.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,84 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +static char frame_header[] = { + '\x28', + '\xb5', + '\x2f', + '\xfd', +}; + +void constants_module_init(PyObject* mod) { + PyObject* version; + PyObject* zstdVersion; + PyObject* frameHeader; + +#if PY_MAJOR_VERSION >= 3 + version = PyUnicode_FromString(PYTHON_ZSTANDARD_VERSION); +#else + version = PyString_FromString(PYTHON_ZSTANDARD_VERSION); +#endif + Py_INCREF(version); + PyModule_AddObject(mod, "__version__", version); + + ZstdError = PyErr_NewException("zstd.ZstdError", NULL, NULL); + PyModule_AddObject(mod, "ZstdError", ZstdError); + + /* For now, the version is a simple tuple instead of a dedicated type. */ + zstdVersion = PyTuple_New(3); + PyTuple_SetItem(zstdVersion, 0, PyLong_FromLong(ZSTD_VERSION_MAJOR)); + PyTuple_SetItem(zstdVersion, 1, PyLong_FromLong(ZSTD_VERSION_MINOR)); + PyTuple_SetItem(zstdVersion, 2, PyLong_FromLong(ZSTD_VERSION_RELEASE)); + Py_IncRef(zstdVersion); + PyModule_AddObject(mod, "ZSTD_VERSION", zstdVersion); + + frameHeader = PyBytes_FromStringAndSize(frame_header, sizeof(frame_header)); + if (frameHeader) { + PyModule_AddObject(mod, "FRAME_HEADER", frameHeader); + } + else { + PyErr_Format(PyExc_ValueError, "could not create frame header object"); + } + + PyModule_AddIntConstant(mod, "MAX_COMPRESSION_LEVEL", ZSTD_maxCLevel()); + PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_INPUT_SIZE", + (long)ZSTD_CStreamInSize()); + PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_OUTPUT_SIZE", + (long)ZSTD_CStreamOutSize()); + PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_INPUT_SIZE", + (long)ZSTD_DStreamInSize()); + PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE", + (long)ZSTD_DStreamOutSize()); + + PyModule_AddIntConstant(mod, "MAGIC_NUMBER", ZSTD_MAGICNUMBER); + PyModule_AddIntConstant(mod, "WINDOWLOG_MIN", ZSTD_WINDOWLOG_MIN); + PyModule_AddIntConstant(mod, "WINDOWLOG_MAX", ZSTD_WINDOWLOG_MAX); + PyModule_AddIntConstant(mod, "CHAINLOG_MIN", ZSTD_CHAINLOG_MIN); + PyModule_AddIntConstant(mod, "CHAINLOG_MAX", ZSTD_CHAINLOG_MAX); + PyModule_AddIntConstant(mod, "HASHLOG_MIN", ZSTD_HASHLOG_MIN); + PyModule_AddIntConstant(mod, "HASHLOG_MAX", ZSTD_HASHLOG_MAX); + PyModule_AddIntConstant(mod, "HASHLOG3_MAX", ZSTD_HASHLOG3_MAX); + PyModule_AddIntConstant(mod, "SEARCHLOG_MIN", ZSTD_SEARCHLOG_MIN); + PyModule_AddIntConstant(mod, "SEARCHLOG_MAX", ZSTD_SEARCHLOG_MAX); + PyModule_AddIntConstant(mod, "SEARCHLENGTH_MIN", ZSTD_SEARCHLENGTH_MIN); + PyModule_AddIntConstant(mod, "SEARCHLENGTH_MAX", ZSTD_SEARCHLENGTH_MAX); + PyModule_AddIntConstant(mod, "TARGETLENGTH_MIN", ZSTD_TARGETLENGTH_MIN); + PyModule_AddIntConstant(mod, "TARGETLENGTH_MAX", ZSTD_TARGETLENGTH_MAX); + + PyModule_AddIntConstant(mod, "STRATEGY_FAST", ZSTD_fast); + PyModule_AddIntConstant(mod, "STRATEGY_DFAST", ZSTD_dfast); + PyModule_AddIntConstant(mod, "STRATEGY_GREEDY", ZSTD_greedy); + PyModule_AddIntConstant(mod, "STRATEGY_LAZY", ZSTD_lazy); + PyModule_AddIntConstant(mod, "STRATEGY_LAZY2", ZSTD_lazy2); + PyModule_AddIntConstant(mod, "STRATEGY_BTLAZY2", ZSTD_btlazy2); + PyModule_AddIntConstant(mod, "STRATEGY_BTOPT", ZSTD_btopt); +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/decompressionwriter.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,187 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdDecompressionWriter__doc, +"""A context manager used for writing decompressed output.\n" +); + +static void ZstdDecompressionWriter_dealloc(ZstdDecompressionWriter* self) { + Py_XDECREF(self->decompressor); + Py_XDECREF(self->writer); + + if (self->dstream) { + ZSTD_freeDStream(self->dstream); + self->dstream = NULL; + } + + PyObject_Del(self); +} + +static PyObject* ZstdDecompressionWriter_enter(ZstdDecompressionWriter* self) { + if (self->entered) { + PyErr_SetString(ZstdError, "cannot __enter__ multiple times"); + return NULL; + } + + self->dstream = DStream_from_ZstdDecompressor(self->decompressor); + if (!self->dstream) { + return NULL; + } + + self->entered = 1; + + Py_INCREF(self); + return (PyObject*)self; +} + +static PyObject* ZstdDecompressionWriter_exit(ZstdDecompressionWriter* self, PyObject* args) { + self->entered = 0; + + if (self->dstream) { + ZSTD_freeDStream(self->dstream); + self->dstream = NULL; + } + + Py_RETURN_FALSE; +} + +static PyObject* ZstdDecompressionWriter_memory_size(ZstdDecompressionWriter* self) { + if (!self->dstream) { + PyErr_SetString(ZstdError, "cannot determine size of inactive decompressor; " + "call when context manager is active"); + return NULL; + } + + return PyLong_FromSize_t(ZSTD_sizeof_DStream(self->dstream)); +} + +static PyObject* ZstdDecompressionWriter_write(ZstdDecompressionWriter* self, PyObject* args) { + const char* source; + Py_ssize_t sourceSize; + size_t zresult = 0; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + PyObject* res; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { +#else + if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { +#endif + return NULL; + } + + if (!self->entered) { + PyErr_SetString(ZstdError, "write must be called from an active context manager"); + return NULL; + } + + output.dst = malloc(self->outSize); + if (!output.dst) { + return PyErr_NoMemory(); + } + output.size = self->outSize; + output.pos = 0; + + input.src = source; + input.size = sourceSize; + input.pos = 0; + + while ((ssize_t)input.pos < sourceSize) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->dstream, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + free(output.dst); + PyErr_Format(ZstdError, "zstd decompress error: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + res = PyObject_CallMethod(self->writer, "write", "y#", +#else + res = PyObject_CallMethod(self->writer, "write", "s#", +#endif + output.dst, output.pos); + Py_XDECREF(res); + output.pos = 0; + } + } + + free(output.dst); + + /* TODO return bytes written */ + Py_RETURN_NONE; + } + +static PyMethodDef ZstdDecompressionWriter_methods[] = { + { "__enter__", (PyCFunction)ZstdDecompressionWriter_enter, METH_NOARGS, + PyDoc_STR("Enter a decompression context.") }, + { "__exit__", (PyCFunction)ZstdDecompressionWriter_exit, METH_VARARGS, + PyDoc_STR("Exit a decompression context.") }, + { "memory_size", (PyCFunction)ZstdDecompressionWriter_memory_size, METH_NOARGS, + PyDoc_STR("Obtain the memory size in bytes of the underlying decompressor.") }, + { "write", (PyCFunction)ZstdDecompressionWriter_write, METH_VARARGS, + PyDoc_STR("Compress data") }, + { NULL, NULL } +}; + +PyTypeObject ZstdDecompressionWriterType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressionWriter", /* tp_name */ + sizeof(ZstdDecompressionWriter),/* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdDecompressionWriter_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdDecompressionWriter__doc, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + ZstdDecompressionWriter_methods,/* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void decompressionwriter_module_init(PyObject* mod) { + Py_TYPE(&ZstdDecompressionWriterType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressionWriterType) < 0) { + return; + } +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/decompressobj.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,170 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +PyDoc_STRVAR(DecompressionObj__doc__, +"Perform decompression using a standard library compatible API.\n" +); + +static void DecompressionObj_dealloc(ZstdDecompressionObj* self) { + if (self->dstream) { + ZSTD_freeDStream(self->dstream); + self->dstream = NULL; + } + + Py_XDECREF(self->decompressor); + + PyObject_Del(self); +} + +static PyObject* DecompressionObj_decompress(ZstdDecompressionObj* self, PyObject* args) { + const char* source; + Py_ssize_t sourceSize; + size_t zresult; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + size_t outSize = ZSTD_DStreamOutSize(); + PyObject* result = NULL; + Py_ssize_t resultSize = 0; + + if (self->finished) { + PyErr_SetString(ZstdError, "cannot use a decompressobj multiple times"); + return NULL; + } + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTuple(args, "y#", +#else + if (!PyArg_ParseTuple(args, "s#", +#endif + &source, &sourceSize)) { + return NULL; + } + + input.src = source; + input.size = sourceSize; + input.pos = 0; + + output.dst = PyMem_Malloc(outSize); + if (!output.dst) { + PyErr_NoMemory(); + return NULL; + } + output.size = outSize; + output.pos = 0; + + /* Read input until exhausted. */ + while (input.pos < input.size) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->dstream, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd decompressor error: %s", + ZSTD_getErrorName(zresult)); + result = NULL; + goto finally; + } + + if (0 == zresult) { + self->finished = 1; + } + + if (output.pos) { + if (result) { + resultSize = PyBytes_GET_SIZE(result); + if (-1 == _PyBytes_Resize(&result, resultSize + output.pos)) { + goto except; + } + + memcpy(PyBytes_AS_STRING(result) + resultSize, + output.dst, output.pos); + } + else { + result = PyBytes_FromStringAndSize(output.dst, output.pos); + if (!result) { + goto except; + } + } + + output.pos = 0; + } + } + + if (!result) { + result = PyBytes_FromString(""); + } + + goto finally; + +except: + Py_DecRef(result); + result = NULL; + +finally: + PyMem_Free(output.dst); + + return result; +} + +static PyMethodDef DecompressionObj_methods[] = { + { "decompress", (PyCFunction)DecompressionObj_decompress, + METH_VARARGS, PyDoc_STR("decompress data") }, + { NULL, NULL } +}; + +PyTypeObject ZstdDecompressionObjType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressionObj", /* tp_name */ + sizeof(ZstdDecompressionObj), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)DecompressionObj_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + DecompressionObj__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + DecompressionObj_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void decompressobj_module_init(PyObject* module) { + Py_TYPE(&ZstdDecompressionObjType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressionObjType) < 0) { + return; + } +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/decompressor.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,669 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +extern PyObject* ZstdError; + +ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor) { + ZSTD_DStream* dstream; + void* dictData = NULL; + size_t dictSize = 0; + size_t zresult; + + dstream = ZSTD_createDStream(); + if (!dstream) { + PyErr_SetString(ZstdError, "could not create DStream"); + return NULL; + } + + if (decompressor->dict) { + dictData = decompressor->dict->dictData; + dictSize = decompressor->dict->dictSize; + } + + if (dictData) { + zresult = ZSTD_initDStream_usingDict(dstream, dictData, dictSize); + } + else { + zresult = ZSTD_initDStream(dstream); + } + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "could not initialize DStream: %s", + ZSTD_getErrorName(zresult)); + return NULL; + } + + return dstream; +} + +PyDoc_STRVAR(Decompressor__doc__, +"ZstdDecompressor(dict_data=None)\n" +"\n" +"Create an object used to perform Zstandard decompression.\n" +"\n" +"An instance can perform multiple decompression operations." +); + +static int Decompressor_init(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "dict_data", + NULL + }; + + ZstdCompressionDict* dict = NULL; + + self->refdctx = NULL; + self->dict = NULL; + self->ddict = NULL; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O!", kwlist, + &ZstdCompressionDictType, &dict)) { + return -1; + } + + /* Instead of creating a ZSTD_DCtx for every decompression operation, + we create an instance at object creation time and recycle it via + ZSTD_copyDCTx() on each use. This means each use is a malloc+memcpy + instead of a malloc+init. */ + /* TODO lazily initialize the reference ZSTD_DCtx on first use since + not instances of ZstdDecompressor will use a ZSTD_DCtx. */ + self->refdctx = ZSTD_createDCtx(); + if (!self->refdctx) { + PyErr_NoMemory(); + goto except; + } + + if (dict) { + self->dict = dict; + Py_INCREF(dict); + } + + return 0; + +except: + if (self->refdctx) { + ZSTD_freeDCtx(self->refdctx); + self->refdctx = NULL; + } + + return -1; +} + +static void Decompressor_dealloc(ZstdDecompressor* self) { + if (self->refdctx) { + ZSTD_freeDCtx(self->refdctx); + } + + Py_XDECREF(self->dict); + + if (self->ddict) { + ZSTD_freeDDict(self->ddict); + self->ddict = NULL; + } + + PyObject_Del(self); +} + +PyDoc_STRVAR(Decompressor_copy_stream__doc__, + "copy_stream(ifh, ofh[, read_size=default, write_size=default]) -- decompress data between streams\n" + "\n" + "Compressed data will be read from ``ifh``, decompressed, and written to\n" + "``ofh``. ``ifh`` must have a ``read(size)`` method. ``ofh`` must have a\n" + "``write(data)`` method.\n" + "\n" + "The optional ``read_size`` and ``write_size`` arguments control the chunk\n" + "size of data that is ``read()`` and ``write()`` between streams. They default\n" + "to the default input and output sizes of zstd decompressor streams.\n" +); + +static PyObject* Decompressor_copy_stream(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "ifh", + "ofh", + "read_size", + "write_size", + NULL + }; + + PyObject* source; + PyObject* dest; + size_t inSize = ZSTD_DStreamInSize(); + size_t outSize = ZSTD_DStreamOutSize(); + ZSTD_DStream* dstream; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t totalRead = 0; + Py_ssize_t totalWrite = 0; + char* readBuffer; + Py_ssize_t readSize; + PyObject* readResult; + PyObject* res = NULL; + size_t zresult = 0; + PyObject* writeResult; + PyObject* totalReadPy; + PyObject* totalWritePy; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|kk", kwlist, &source, + &dest, &inSize, &outSize)) { + return NULL; + } + + if (!PyObject_HasAttrString(source, "read")) { + PyErr_SetString(PyExc_ValueError, "first argument must have a read() method"); + return NULL; + } + + if (!PyObject_HasAttrString(dest, "write")) { + PyErr_SetString(PyExc_ValueError, "second argument must have a write() method"); + return NULL; + } + + dstream = DStream_from_ZstdDecompressor(self); + if (!dstream) { + res = NULL; + goto finally; + } + + output.dst = PyMem_Malloc(outSize); + if (!output.dst) { + PyErr_NoMemory(); + res = NULL; + goto finally; + } + output.size = outSize; + output.pos = 0; + + /* Read source stream until EOF */ + while (1) { + readResult = PyObject_CallMethod(source, "read", "n", inSize); + if (!readResult) { + PyErr_SetString(ZstdError, "could not read() from source"); + goto finally; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + + /* If no data was read, we're at EOF. */ + if (0 == readSize) { + break; + } + + totalRead += readSize; + + /* Send data to decompressor */ + input.src = readBuffer; + input.size = readSize; + input.pos = 0; + + while (input.pos < input.size) { + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(dstream, &output, &input); + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "zstd decompressor error: %s", + ZSTD_getErrorName(zresult)); + res = NULL; + goto finally; + } + + if (output.pos) { +#if PY_MAJOR_VERSION >= 3 + writeResult = PyObject_CallMethod(dest, "write", "y#", +#else + writeResult = PyObject_CallMethod(dest, "write", "s#", +#endif + output.dst, output.pos); + + Py_XDECREF(writeResult); + totalWrite += output.pos; + output.pos = 0; + } + } + } + + /* Source stream is exhausted. Finish up. */ + + ZSTD_freeDStream(dstream); + dstream = NULL; + + totalReadPy = PyLong_FromSsize_t(totalRead); + totalWritePy = PyLong_FromSsize_t(totalWrite); + res = PyTuple_Pack(2, totalReadPy, totalWritePy); + Py_DecRef(totalReadPy); + Py_DecRef(totalWritePy); + + finally: + if (output.dst) { + PyMem_Free(output.dst); + } + + if (dstream) { + ZSTD_freeDStream(dstream); + } + + return res; +} + +PyDoc_STRVAR(Decompressor_decompress__doc__, +"decompress(data[, max_output_size=None]) -- Decompress data in its entirety\n" +"\n" +"This method will decompress the entirety of the argument and return the\n" +"result.\n" +"\n" +"The input bytes are expected to contain a full Zstandard frame (something\n" +"compressed with ``ZstdCompressor.compress()`` or similar). If the input does\n" +"not contain a full frame, an exception will be raised.\n" +"\n" +"If the frame header of the compressed data does not contain the content size\n" +"``max_output_size`` must be specified or ``ZstdError`` will be raised. An\n" +"allocation of size ``max_output_size`` will be performed and an attempt will\n" +"be made to perform decompression into that buffer. If the buffer is too\n" +"small or cannot be allocated, ``ZstdError`` will be raised. The buffer will\n" +"be resized if it is too large.\n" +"\n" +"Uncompressed data could be much larger than compressed data. As a result,\n" +"calling this function could result in a very large memory allocation being\n" +"performed to hold the uncompressed data. Therefore it is **highly**\n" +"recommended to use a streaming decompression method instead of this one.\n" +); + +PyObject* Decompressor_decompress(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "data", + "max_output_size", + NULL + }; + + const char* source; + Py_ssize_t sourceSize; + Py_ssize_t maxOutputSize = 0; + unsigned long long decompressedSize; + size_t destCapacity; + PyObject* result = NULL; + ZSTD_DCtx* dctx = NULL; + void* dictData = NULL; + size_t dictSize = 0; + size_t zresult; + +#if PY_MAJOR_VERSION >= 3 + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|n", kwlist, +#else + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|n", kwlist, +#endif + &source, &sourceSize, &maxOutputSize)) { + return NULL; + } + + dctx = PyMem_Malloc(ZSTD_sizeof_DCtx(self->refdctx)); + if (!dctx) { + PyErr_NoMemory(); + return NULL; + } + + ZSTD_copyDCtx(dctx, self->refdctx); + + if (self->dict) { + dictData = self->dict->dictData; + dictSize = self->dict->dictSize; + } + + if (dictData && !self->ddict) { + Py_BEGIN_ALLOW_THREADS + self->ddict = ZSTD_createDDict(dictData, dictSize); + Py_END_ALLOW_THREADS + + if (!self->ddict) { + PyErr_SetString(ZstdError, "could not create decompression dict"); + goto except; + } + } + + decompressedSize = ZSTD_getDecompressedSize(source, sourceSize); + /* 0 returned if content size not in the zstd frame header */ + if (0 == decompressedSize) { + if (0 == maxOutputSize) { + PyErr_SetString(ZstdError, "input data invalid or missing content size " + "in frame header"); + goto except; + } + else { + result = PyBytes_FromStringAndSize(NULL, maxOutputSize); + destCapacity = maxOutputSize; + } + } + else { + result = PyBytes_FromStringAndSize(NULL, decompressedSize); + destCapacity = decompressedSize; + } + + if (!result) { + goto except; + } + + Py_BEGIN_ALLOW_THREADS + if (self->ddict) { + zresult = ZSTD_decompress_usingDDict(dctx, PyBytes_AsString(result), destCapacity, + source, sourceSize, self->ddict); + } + else { + zresult = ZSTD_decompressDCtx(dctx, PyBytes_AsString(result), destCapacity, source, sourceSize); + } + Py_END_ALLOW_THREADS + + if (ZSTD_isError(zresult)) { + PyErr_Format(ZstdError, "decompression error: %s", ZSTD_getErrorName(zresult)); + goto except; + } + else if (decompressedSize && zresult != decompressedSize) { + PyErr_Format(ZstdError, "decompression error: decompressed %zu bytes; expected %llu", + zresult, decompressedSize); + goto except; + } + else if (zresult < destCapacity) { + if (_PyBytes_Resize(&result, zresult)) { + goto except; + } + } + + goto finally; + +except: + Py_DecRef(result); + result = NULL; + +finally: + if (dctx) { + PyMem_FREE(dctx); + } + + return result; +} + +PyDoc_STRVAR(Decompressor_decompressobj__doc__, +"decompressobj()\n" +"\n" +"Incrementally feed data into a decompressor.\n" +"\n" +"The returned object exposes a ``decompress(data)`` method. This makes it\n" +"compatible with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor`` so that\n" +"callers can swap in the zstd decompressor while using the same API.\n" +); + +static ZstdDecompressionObj* Decompressor_decompressobj(ZstdDecompressor* self) { + ZstdDecompressionObj* result = PyObject_New(ZstdDecompressionObj, &ZstdDecompressionObjType); + if (!result) { + return NULL; + } + + result->dstream = DStream_from_ZstdDecompressor(self); + if (!result->dstream) { + Py_DecRef((PyObject*)result); + return NULL; + } + + result->decompressor = self; + Py_INCREF(result->decompressor); + + result->finished = 0; + + return result; +} + +PyDoc_STRVAR(Decompressor_read_from__doc__, +"read_from(reader[, read_size=default, write_size=default, skip_bytes=0])\n" +"Read compressed data and return an iterator\n" +"\n" +"Returns an iterator of decompressed data chunks produced from reading from\n" +"the ``reader``.\n" +"\n" +"Compressed data will be obtained from ``reader`` by calling the\n" +"``read(size)`` method of it. The source data will be streamed into a\n" +"decompressor. As decompressed data is available, it will be exposed to the\n" +"returned iterator.\n" +"\n" +"Data is ``read()`` in chunks of size ``read_size`` and exposed to the\n" +"iterator in chunks of size ``write_size``. The default values are the input\n" +"and output sizes for a zstd streaming decompressor.\n" +"\n" +"There is also support for skipping the first ``skip_bytes`` of data from\n" +"the source.\n" +); + +static ZstdDecompressorIterator* Decompressor_read_from(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "reader", + "read_size", + "write_size", + "skip_bytes", + NULL + }; + + PyObject* reader; + size_t inSize = ZSTD_DStreamInSize(); + size_t outSize = ZSTD_DStreamOutSize(); + ZstdDecompressorIterator* result; + size_t skipBytes = 0; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kkk", kwlist, &reader, + &inSize, &outSize, &skipBytes)) { + return NULL; + } + + if (skipBytes >= inSize) { + PyErr_SetString(PyExc_ValueError, + "skip_bytes must be smaller than read_size"); + return NULL; + } + + result = PyObject_New(ZstdDecompressorIterator, &ZstdDecompressorIteratorType); + if (!result) { + return NULL; + } + + result->decompressor = NULL; + result->reader = NULL; + result->buffer = NULL; + result->dstream = NULL; + result->input.src = NULL; + result->output.dst = NULL; + + if (PyObject_HasAttrString(reader, "read")) { + result->reader = reader; + Py_INCREF(result->reader); + } + else if (1 == PyObject_CheckBuffer(reader)) { + /* Object claims it is a buffer. Try to get a handle to it. */ + result->buffer = PyMem_Malloc(sizeof(Py_buffer)); + if (!result->buffer) { + goto except; + } + + memset(result->buffer, 0, sizeof(Py_buffer)); + + if (0 != PyObject_GetBuffer(reader, result->buffer, PyBUF_CONTIG_RO)) { + goto except; + } + + result->bufferOffset = 0; + } + else { + PyErr_SetString(PyExc_ValueError, + "must pass an object with a read() method or conforms to buffer protocol"); + goto except; + } + + result->decompressor = self; + Py_INCREF(result->decompressor); + + result->inSize = inSize; + result->outSize = outSize; + result->skipBytes = skipBytes; + + result->dstream = DStream_from_ZstdDecompressor(self); + if (!result->dstream) { + goto except; + } + + result->input.src = PyMem_Malloc(inSize); + if (!result->input.src) { + PyErr_NoMemory(); + goto except; + } + result->input.size = 0; + result->input.pos = 0; + + result->output.dst = NULL; + result->output.size = 0; + result->output.pos = 0; + + result->readCount = 0; + result->finishedInput = 0; + result->finishedOutput = 0; + + goto finally; + +except: + if (result->reader) { + Py_DECREF(result->reader); + result->reader = NULL; + } + + if (result->buffer) { + PyBuffer_Release(result->buffer); + Py_DECREF(result->buffer); + result->buffer = NULL; + } + + Py_DECREF(result); + result = NULL; + +finally: + + return result; +} + +PyDoc_STRVAR(Decompressor_write_to__doc__, +"Create a context manager to write decompressed data to an object.\n" +"\n" +"The passed object must have a ``write()`` method.\n" +"\n" +"The caller feeds intput data to the object by calling ``write(data)``.\n" +"Decompressed data is written to the argument given as it is decompressed.\n" +"\n" +"An optional ``write_size`` argument defines the size of chunks to\n" +"``write()`` to the writer. It defaults to the default output size for a zstd\n" +"streaming decompressor.\n" +); + +static ZstdDecompressionWriter* Decompressor_write_to(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { + static char* kwlist[] = { + "writer", + "write_size", + NULL + }; + + PyObject* writer; + size_t outSize = ZSTD_DStreamOutSize(); + ZstdDecompressionWriter* result; + + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|k", kwlist, &writer, &outSize)) { + return NULL; + } + + if (!PyObject_HasAttrString(writer, "write")) { + PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method"); + return NULL; + } + + result = PyObject_New(ZstdDecompressionWriter, &ZstdDecompressionWriterType); + if (!result) { + return NULL; + } + + result->decompressor = self; + Py_INCREF(result->decompressor); + + result->writer = writer; + Py_INCREF(result->writer); + + result->outSize = outSize; + + result->entered = 0; + result->dstream = NULL; + + return result; +} + +static PyMethodDef Decompressor_methods[] = { + { "copy_stream", (PyCFunction)Decompressor_copy_stream, METH_VARARGS | METH_KEYWORDS, + Decompressor_copy_stream__doc__ }, + { "decompress", (PyCFunction)Decompressor_decompress, METH_VARARGS | METH_KEYWORDS, + Decompressor_decompress__doc__ }, + { "decompressobj", (PyCFunction)Decompressor_decompressobj, METH_NOARGS, + Decompressor_decompressobj__doc__ }, + { "read_from", (PyCFunction)Decompressor_read_from, METH_VARARGS | METH_KEYWORDS, + Decompressor_read_from__doc__ }, + { "write_to", (PyCFunction)Decompressor_write_to, METH_VARARGS | METH_KEYWORDS, + Decompressor_write_to__doc__ }, + { NULL, NULL } +}; + +PyTypeObject ZstdDecompressorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressor", /* tp_name */ + sizeof(ZstdDecompressor), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)Decompressor_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + Decompressor__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + Decompressor_methods, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + (initproc)Decompressor_init, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void decompressor_module_init(PyObject* mod) { + Py_TYPE(&ZstdDecompressorType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressorType) < 0) { + return; + } + + Py_INCREF((PyObject*)&ZstdDecompressorType); + PyModule_AddObject(mod, "ZstdDecompressor", + (PyObject*)&ZstdDecompressorType); +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/decompressoriterator.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,254 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +#define min(a, b) (((a) < (b)) ? (a) : (b)) + +extern PyObject* ZstdError; + +PyDoc_STRVAR(ZstdDecompressorIterator__doc__, +"Represents an iterator of decompressed data.\n" +); + +static void ZstdDecompressorIterator_dealloc(ZstdDecompressorIterator* self) { + Py_XDECREF(self->decompressor); + Py_XDECREF(self->reader); + + if (self->buffer) { + PyBuffer_Release(self->buffer); + PyMem_FREE(self->buffer); + self->buffer = NULL; + } + + if (self->dstream) { + ZSTD_freeDStream(self->dstream); + self->dstream = NULL; + } + + if (self->input.src) { + PyMem_Free((void*)self->input.src); + self->input.src = NULL; + } + + PyObject_Del(self); +} + +static PyObject* ZstdDecompressorIterator_iter(PyObject* self) { + Py_INCREF(self); + return self; +} + +static DecompressorIteratorResult read_decompressor_iterator(ZstdDecompressorIterator* self) { + size_t zresult; + PyObject* chunk; + DecompressorIteratorResult result; + size_t oldInputPos = self->input.pos; + + result.chunk = NULL; + + chunk = PyBytes_FromStringAndSize(NULL, self->outSize); + if (!chunk) { + result.errored = 1; + return result; + } + + self->output.dst = PyBytes_AsString(chunk); + self->output.size = self->outSize; + self->output.pos = 0; + + Py_BEGIN_ALLOW_THREADS + zresult = ZSTD_decompressStream(self->dstream, &self->output, &self->input); + Py_END_ALLOW_THREADS + + /* We're done with the pointer. Nullify to prevent anyone from getting a + handle on a Python object. */ + self->output.dst = NULL; + + if (ZSTD_isError(zresult)) { + Py_DECREF(chunk); + PyErr_Format(ZstdError, "zstd decompress error: %s", + ZSTD_getErrorName(zresult)); + result.errored = 1; + return result; + } + + self->readCount += self->input.pos - oldInputPos; + + /* Frame is fully decoded. Input exhausted and output sitting in buffer. */ + if (0 == zresult) { + self->finishedInput = 1; + self->finishedOutput = 1; + } + + /* If it produced output data, return it. */ + if (self->output.pos) { + if (self->output.pos < self->outSize) { + if (_PyBytes_Resize(&chunk, self->output.pos)) { + result.errored = 1; + return result; + } + } + } + else { + Py_DECREF(chunk); + chunk = NULL; + } + + result.errored = 0; + result.chunk = chunk; + + return result; +} + +static PyObject* ZstdDecompressorIterator_iternext(ZstdDecompressorIterator* self) { + PyObject* readResult = NULL; + char* readBuffer; + Py_ssize_t readSize; + Py_ssize_t bufferRemaining; + DecompressorIteratorResult result; + + if (self->finishedOutput) { + PyErr_SetString(PyExc_StopIteration, "output flushed"); + return NULL; + } + + /* If we have data left in the input, consume it. */ + if (self->input.pos < self->input.size) { + result = read_decompressor_iterator(self); + if (result.chunk || result.errored) { + return result.chunk; + } + + /* Else fall through to get more data from input. */ + } + +read_from_source: + + if (!self->finishedInput) { + if (self->reader) { + readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize); + if (!readResult) { + return NULL; + } + + PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); + } + else { + assert(self->buffer && self->buffer->buf); + + /* Only support contiguous C arrays for now */ + assert(self->buffer->strides == NULL && self->buffer->suboffsets == NULL); + assert(self->buffer->itemsize == 1); + + /* TODO avoid memcpy() below */ + readBuffer = (char *)self->buffer->buf + self->bufferOffset; + bufferRemaining = self->buffer->len - self->bufferOffset; + readSize = min(bufferRemaining, (Py_ssize_t)self->inSize); + self->bufferOffset += readSize; + } + + if (readSize) { + if (!self->readCount && self->skipBytes) { + assert(self->skipBytes < self->inSize); + if ((Py_ssize_t)self->skipBytes >= readSize) { + PyErr_SetString(PyExc_ValueError, + "skip_bytes larger than first input chunk; " + "this scenario is currently unsupported"); + Py_DecRef(readResult); + return NULL; + } + + readBuffer = readBuffer + self->skipBytes; + readSize -= self->skipBytes; + } + + /* Copy input into previously allocated buffer because it can live longer + than a single function call and we don't want to keep a ref to a Python + object around. This could be changed... */ + memcpy((void*)self->input.src, readBuffer, readSize); + self->input.size = readSize; + self->input.pos = 0; + } + /* No bytes on first read must mean an empty input stream. */ + else if (!self->readCount) { + self->finishedInput = 1; + self->finishedOutput = 1; + Py_DecRef(readResult); + PyErr_SetString(PyExc_StopIteration, "empty input"); + return NULL; + } + else { + self->finishedInput = 1; + } + + /* We've copied the data managed by memory. Discard the Python object. */ + Py_DecRef(readResult); + } + + result = read_decompressor_iterator(self); + if (result.errored || result.chunk) { + return result.chunk; + } + + /* No new output data. Try again unless we know there is no more data. */ + if (!self->finishedInput) { + goto read_from_source; + } + + PyErr_SetString(PyExc_StopIteration, "input exhausted"); + return NULL; +} + +PyTypeObject ZstdDecompressorIteratorType = { + PyVarObject_HEAD_INIT(NULL, 0) + "zstd.ZstdDecompressorIterator", /* tp_name */ + sizeof(ZstdDecompressorIterator), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)ZstdDecompressorIterator_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ + ZstdDecompressorIterator__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + ZstdDecompressorIterator_iter, /* tp_iter */ + (iternextfunc)ZstdDecompressorIterator_iternext, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + PyType_GenericNew, /* tp_new */ +}; + +void decompressoriterator_module_init(PyObject* mod) { + Py_TYPE(&ZstdDecompressorIteratorType) = &PyType_Type; + if (PyType_Ready(&ZstdDecompressorIteratorType) < 0) { + return; + } +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/dictparams.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,125 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#include "python-zstandard.h" + +PyDoc_STRVAR(DictParameters__doc__, +"DictParameters: low-level control over dictionary generation"); + +static PyObject* DictParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) { + DictParametersObject* self; + unsigned selectivityLevel; + int compressionLevel; + unsigned notificationLevel; + unsigned dictID; + + if (!PyArg_ParseTuple(args, "IiII", &selectivityLevel, &compressionLevel, + ¬ificationLevel, &dictID)) { + return NULL; + } + + self = (DictParametersObject*)subtype->tp_alloc(subtype, 1); + if (!self) { + return NULL; + } + + self->selectivityLevel = selectivityLevel; + self->compressionLevel = compressionLevel; + self->notificationLevel = notificationLevel; + self->dictID = dictID; + + return (PyObject*)self; +} + +static void DictParameters_dealloc(PyObject* self) { + PyObject_Del(self); +} + +static Py_ssize_t DictParameters_length(PyObject* self) { + return 4; +}; + +static PyObject* DictParameters_item(PyObject* o, Py_ssize_t i) { + DictParametersObject* self = (DictParametersObject*)o; + + switch (i) { + case 0: + return PyLong_FromLong(self->selectivityLevel); + case 1: + return PyLong_FromLong(self->compressionLevel); + case 2: + return PyLong_FromLong(self->notificationLevel); + case 3: + return PyLong_FromLong(self->dictID); + default: + PyErr_SetString(PyExc_IndexError, "index out of range"); + return NULL; + } +} + +static PySequenceMethods DictParameters_sq = { + DictParameters_length, /* sq_length */ + 0, /* sq_concat */ + 0, /* sq_repeat */ + DictParameters_item, /* sq_item */ + 0, /* sq_ass_item */ + 0, /* sq_contains */ + 0, /* sq_inplace_concat */ + 0 /* sq_inplace_repeat */ +}; + +PyTypeObject DictParametersType = { + PyVarObject_HEAD_INIT(NULL, 0) + "DictParameters", /* tp_name */ + sizeof(DictParametersObject), /* tp_basicsize */ + 0, /* tp_itemsize */ + (destructor)DictParameters_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + &DictParameters_sq, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + 0, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + DictParameters__doc__, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + DictParameters_new, /* tp_new */ +}; + +void dictparams_module_init(PyObject* mod) { + Py_TYPE(&DictParametersType) = &PyType_Type; + if (PyType_Ready(&DictParametersType) < 0) { + return; + } + + Py_IncRef((PyObject*)&DictParametersType); + PyModule_AddObject(mod, "DictParameters", (PyObject*)&DictParametersType); +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/c-ext/python-zstandard.h Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,172 @@ +/** +* Copyright (c) 2016-present, Gregory Szorc +* All rights reserved. +* +* This software may be modified and distributed under the terms +* of the BSD license. See the LICENSE file for details. +*/ + +#define PY_SSIZE_T_CLEAN +#include <Python.h> + +#define ZSTD_STATIC_LINKING_ONLY +#define ZDICT_STATIC_LINKING_ONLY +#include "mem.h" +#include "zstd.h" +#include "zdict.h" + +#define PYTHON_ZSTANDARD_VERSION "0.5.0" + +typedef struct { + PyObject_HEAD + unsigned windowLog; + unsigned chainLog; + unsigned hashLog; + unsigned searchLog; + unsigned searchLength; + unsigned targetLength; + ZSTD_strategy strategy; +} CompressionParametersObject; + +extern PyTypeObject CompressionParametersType; + +typedef struct { + PyObject_HEAD + unsigned selectivityLevel; + int compressionLevel; + unsigned notificationLevel; + unsigned dictID; +} DictParametersObject; + +extern PyTypeObject DictParametersType; + +typedef struct { + PyObject_HEAD + + void* dictData; + size_t dictSize; +} ZstdCompressionDict; + +extern PyTypeObject ZstdCompressionDictType; + +typedef struct { + PyObject_HEAD + + int compressionLevel; + ZstdCompressionDict* dict; + ZSTD_CDict* cdict; + CompressionParametersObject* cparams; + ZSTD_frameParameters fparams; +} ZstdCompressor; + +extern PyTypeObject ZstdCompressorType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor* compressor; + ZSTD_CStream* cstream; + ZSTD_outBuffer output; + int flushed; +} ZstdCompressionObj; + +extern PyTypeObject ZstdCompressionObjType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor* compressor; + PyObject* writer; + Py_ssize_t sourceSize; + size_t outSize; + ZSTD_CStream* cstream; + int entered; +} ZstdCompressionWriter; + +extern PyTypeObject ZstdCompressionWriterType; + +typedef struct { + PyObject_HEAD + + ZstdCompressor* compressor; + PyObject* reader; + Py_buffer* buffer; + Py_ssize_t bufferOffset; + Py_ssize_t sourceSize; + size_t inSize; + size_t outSize; + + ZSTD_CStream* cstream; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + int finishedOutput; + int finishedInput; + PyObject* readResult; +} ZstdCompressorIterator; + +extern PyTypeObject ZstdCompressorIteratorType; + +typedef struct { + PyObject_HEAD + + ZSTD_DCtx* refdctx; + + ZstdCompressionDict* dict; + ZSTD_DDict* ddict; +} ZstdDecompressor; + +extern PyTypeObject ZstdDecompressorType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor* decompressor; + ZSTD_DStream* dstream; + int finished; +} ZstdDecompressionObj; + +extern PyTypeObject ZstdDecompressionObjType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor* decompressor; + PyObject* writer; + size_t outSize; + ZSTD_DStream* dstream; + int entered; +} ZstdDecompressionWriter; + +extern PyTypeObject ZstdDecompressionWriterType; + +typedef struct { + PyObject_HEAD + + ZstdDecompressor* decompressor; + PyObject* reader; + Py_buffer* buffer; + Py_ssize_t bufferOffset; + size_t inSize; + size_t outSize; + size_t skipBytes; + ZSTD_DStream* dstream; + ZSTD_inBuffer input; + ZSTD_outBuffer output; + Py_ssize_t readCount; + int finishedInput; + int finishedOutput; +} ZstdDecompressorIterator; + +extern PyTypeObject ZstdDecompressorIteratorType; + +typedef struct { + int errored; + PyObject* chunk; +} DecompressorIteratorResult; + +void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams); +CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args); +PyObject* estimate_compression_context_size(PyObject* self, PyObject* args); +ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize); +ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor); +ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs);
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/make_cffi.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,110 @@ +# Copyright (c) 2016-present, Gregory Szorc +# All rights reserved. +# +# This software may be modified and distributed under the terms +# of the BSD license. See the LICENSE file for details. + +from __future__ import absolute_import + +import cffi +import os + + +HERE = os.path.abspath(os.path.dirname(__file__)) + +SOURCES = ['zstd/%s' % p for p in ( + 'common/entropy_common.c', + 'common/error_private.c', + 'common/fse_decompress.c', + 'common/xxhash.c', + 'common/zstd_common.c', + 'compress/fse_compress.c', + 'compress/huf_compress.c', + 'compress/zbuff_compress.c', + 'compress/zstd_compress.c', + 'decompress/huf_decompress.c', + 'decompress/zbuff_decompress.c', + 'decompress/zstd_decompress.c', + 'dictBuilder/divsufsort.c', + 'dictBuilder/zdict.c', +)] + +INCLUDE_DIRS = [os.path.join(HERE, d) for d in ( + 'zstd', + 'zstd/common', + 'zstd/compress', + 'zstd/decompress', + 'zstd/dictBuilder', +)] + +with open(os.path.join(HERE, 'zstd', 'zstd.h'), 'rb') as fh: + zstd_h = fh.read() + +ffi = cffi.FFI() +ffi.set_source('_zstd_cffi', ''' +/* needed for typedefs like U32 references in zstd.h */ +#include "mem.h" +#define ZSTD_STATIC_LINKING_ONLY +#include "zstd.h" +''', + sources=SOURCES, include_dirs=INCLUDE_DIRS) + +# Rather than define the API definitions from zstd.h inline, munge the +# source in a way that cdef() will accept. +lines = zstd_h.splitlines() +lines = [l.rstrip() for l in lines if l.strip()] + +# Strip preprocessor directives - they aren't important for our needs. +lines = [l for l in lines + if not l.startswith((b'#if', b'#else', b'#endif', b'#include'))] + +# Remove extern C block +lines = [l for l in lines if l not in (b'extern "C" {', b'}')] + +# The version #defines don't parse and aren't necessary. Strip them. +lines = [l for l in lines if not l.startswith(( + b'#define ZSTD_H_235446', + b'#define ZSTD_LIB_VERSION', + b'#define ZSTD_QUOTE', + b'#define ZSTD_EXPAND_AND_QUOTE', + b'#define ZSTD_VERSION_STRING', + b'#define ZSTD_VERSION_NUMBER'))] + +# The C parser also doesn't like some constant defines referencing +# other constants. +# TODO we pick the 64-bit constants here. We should assert somewhere +# we're compiling for 64-bit. +def fix_constants(l): + if l.startswith(b'#define ZSTD_WINDOWLOG_MAX '): + return b'#define ZSTD_WINDOWLOG_MAX 27' + elif l.startswith(b'#define ZSTD_CHAINLOG_MAX '): + return b'#define ZSTD_CHAINLOG_MAX 28' + elif l.startswith(b'#define ZSTD_HASHLOG_MAX '): + return b'#define ZSTD_HASHLOG_MAX 27' + elif l.startswith(b'#define ZSTD_CHAINLOG_MAX '): + return b'#define ZSTD_CHAINLOG_MAX 28' + elif l.startswith(b'#define ZSTD_CHAINLOG_MIN '): + return b'#define ZSTD_CHAINLOG_MIN 6' + elif l.startswith(b'#define ZSTD_SEARCHLOG_MAX '): + return b'#define ZSTD_SEARCHLOG_MAX 26' + elif l.startswith(b'#define ZSTD_BLOCKSIZE_ABSOLUTEMAX '): + return b'#define ZSTD_BLOCKSIZE_ABSOLUTEMAX 131072' + else: + return l +lines = map(fix_constants, lines) + +# ZSTDLIB_API isn't handled correctly. Strip it. +lines = [l for l in lines if not l.startswith(b'# define ZSTDLIB_API')] +def strip_api(l): + if l.startswith(b'ZSTDLIB_API '): + return l[len(b'ZSTDLIB_API '):] + else: + return l +lines = map(strip_api, lines) + +source = b'\n'.join(lines) +ffi.cdef(source.decode('latin1')) + + +if __name__ == '__main__': + ffi.compile()
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/setup.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,62 @@ +#!/usr/bin/env python +# Copyright (c) 2016-present, Gregory Szorc +# All rights reserved. +# +# This software may be modified and distributed under the terms +# of the BSD license. See the LICENSE file for details. + +from setuptools import setup + +try: + import cffi +except ImportError: + cffi = None + +import setup_zstd + +# Code for obtaining the Extension instance is in its own module to +# facilitate reuse in other projects. +extensions = [setup_zstd.get_c_extension()] + +if cffi: + import make_cffi + extensions.append(make_cffi.ffi.distutils_extension()) + +version = None + +with open('c-ext/python-zstandard.h', 'r') as fh: + for line in fh: + if not line.startswith('#define PYTHON_ZSTANDARD_VERSION'): + continue + + version = line.split()[2][1:-1] + break + +if not version: + raise Exception('could not resolve package version; ' + 'this should never happen') + +setup( + name='zstandard', + version=version, + description='Zstandard bindings for Python', + long_description=open('README.rst', 'r').read(), + url='https://github.com/indygreg/python-zstandard', + author='Gregory Szorc', + author_email='gregory.szorc@gmail.com', + license='BSD', + classifiers=[ + 'Development Status :: 4 - Beta', + 'Intended Audience :: Developers', + 'License :: OSI Approved :: BSD License', + 'Programming Language :: C', + 'Programming Language :: Python :: 2.6', + 'Programming Language :: Python :: 2.7', + 'Programming Language :: Python :: 3.3', + 'Programming Language :: Python :: 3.4', + 'Programming Language :: Python :: 3.5', + ], + keywords='zstandard zstd compression', + ext_modules=extensions, + test_suite='tests', +)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/setup_zstd.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,64 @@ +# Copyright (c) 2016-present, Gregory Szorc +# All rights reserved. +# +# This software may be modified and distributed under the terms +# of the BSD license. See the LICENSE file for details. + +import os +from distutils.extension import Extension + + +zstd_sources = ['zstd/%s' % p for p in ( + 'common/entropy_common.c', + 'common/error_private.c', + 'common/fse_decompress.c', + 'common/xxhash.c', + 'common/zstd_common.c', + 'compress/fse_compress.c', + 'compress/huf_compress.c', + 'compress/zbuff_compress.c', + 'compress/zstd_compress.c', + 'decompress/huf_decompress.c', + 'decompress/zbuff_decompress.c', + 'decompress/zstd_decompress.c', + 'dictBuilder/divsufsort.c', + 'dictBuilder/zdict.c', +)] + + +zstd_includes = [ + 'c-ext', + 'zstd', + 'zstd/common', + 'zstd/compress', + 'zstd/decompress', + 'zstd/dictBuilder', +] + +ext_sources = [ + 'zstd.c', + 'c-ext/compressiondict.c', + 'c-ext/compressobj.c', + 'c-ext/compressor.c', + 'c-ext/compressoriterator.c', + 'c-ext/compressionparams.c', + 'c-ext/compressionwriter.c', + 'c-ext/constants.c', + 'c-ext/decompressobj.c', + 'c-ext/decompressor.c', + 'c-ext/decompressoriterator.c', + 'c-ext/decompressionwriter.c', + 'c-ext/dictparams.c', +] + + +def get_c_extension(name='zstd'): + """Obtain a distutils.extension.Extension for the C extension.""" + root = os.path.abspath(os.path.dirname(__file__)) + + sources = [os.path.join(root, p) for p in zstd_sources + ext_sources] + include_dirs = [os.path.join(root, d) for d in zstd_includes] + + # TODO compile with optimizations. + return Extension(name, sources, + include_dirs=include_dirs)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/common.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,15 @@ +import io + +class OpCountingBytesIO(io.BytesIO): + def __init__(self, *args, **kwargs): + self._read_count = 0 + self._write_count = 0 + return super(OpCountingBytesIO, self).__init__(*args, **kwargs) + + def read(self, *args): + self._read_count += 1 + return super(OpCountingBytesIO, self).read(*args) + + def write(self, data): + self._write_count += 1 + return super(OpCountingBytesIO, self).write(data)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/test_cffi.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,35 @@ +import io + +try: + import unittest2 as unittest +except ImportError: + import unittest + +import zstd + +try: + import zstd_cffi +except ImportError: + raise unittest.SkipTest('cffi version of zstd not available') + + +class TestCFFIWriteToToCDecompressor(unittest.TestCase): + def test_simple(self): + orig = io.BytesIO() + orig.write(b'foo') + orig.write(b'bar') + orig.write(b'foobar' * 16384) + + dest = io.BytesIO() + cctx = zstd_cffi.ZstdCompressor() + with cctx.write_to(dest) as compressor: + compressor.write(orig.getvalue()) + + uncompressed = io.BytesIO() + dctx = zstd.ZstdDecompressor() + with dctx.write_to(uncompressed) as decompressor: + decompressor.write(dest.getvalue()) + + self.assertEqual(uncompressed.getvalue(), orig.getvalue()) + +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/test_compressor.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,465 @@ +import hashlib +import io +import struct +import sys + +try: + import unittest2 as unittest +except ImportError: + import unittest + +import zstd + +from .common import OpCountingBytesIO + + +if sys.version_info[0] >= 3: + next = lambda it: it.__next__() +else: + next = lambda it: it.next() + + +class TestCompressor(unittest.TestCase): + def test_level_bounds(self): + with self.assertRaises(ValueError): + zstd.ZstdCompressor(level=0) + + with self.assertRaises(ValueError): + zstd.ZstdCompressor(level=23) + + +class TestCompressor_compress(unittest.TestCase): + def test_compress_empty(self): + cctx = zstd.ZstdCompressor(level=1) + cctx.compress(b'') + + cctx = zstd.ZstdCompressor(level=22) + cctx.compress(b'') + + def test_compress_empty(self): + cctx = zstd.ZstdCompressor(level=1) + self.assertEqual(cctx.compress(b''), + b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') + + def test_compress_large(self): + chunks = [] + for i in range(255): + chunks.append(struct.Struct('>B').pack(i) * 16384) + + cctx = zstd.ZstdCompressor(level=3) + result = cctx.compress(b''.join(chunks)) + self.assertEqual(len(result), 999) + self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd') + + def test_write_checksum(self): + cctx = zstd.ZstdCompressor(level=1) + no_checksum = cctx.compress(b'foobar') + cctx = zstd.ZstdCompressor(level=1, write_checksum=True) + with_checksum = cctx.compress(b'foobar') + + self.assertEqual(len(with_checksum), len(no_checksum) + 4) + + def test_write_content_size(self): + cctx = zstd.ZstdCompressor(level=1) + no_size = cctx.compress(b'foobar' * 256) + cctx = zstd.ZstdCompressor(level=1, write_content_size=True) + with_size = cctx.compress(b'foobar' * 256) + + self.assertEqual(len(with_size), len(no_size) + 1) + + def test_no_dict_id(self): + samples = [] + for i in range(128): + samples.append(b'foo' * 64) + samples.append(b'bar' * 64) + samples.append(b'foobar' * 64) + + d = zstd.train_dictionary(1024, samples) + + cctx = zstd.ZstdCompressor(level=1, dict_data=d) + with_dict_id = cctx.compress(b'foobarfoobar') + + cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False) + no_dict_id = cctx.compress(b'foobarfoobar') + + self.assertEqual(len(with_dict_id), len(no_dict_id) + 4) + + def test_compress_dict_multiple(self): + samples = [] + for i in range(128): + samples.append(b'foo' * 64) + samples.append(b'bar' * 64) + samples.append(b'foobar' * 64) + + d = zstd.train_dictionary(8192, samples) + + cctx = zstd.ZstdCompressor(level=1, dict_data=d) + + for i in range(32): + cctx.compress(b'foo bar foobar foo bar foobar') + + +class TestCompressor_compressobj(unittest.TestCase): + def test_compressobj_empty(self): + cctx = zstd.ZstdCompressor(level=1) + cobj = cctx.compressobj() + self.assertEqual(cobj.compress(b''), b'') + self.assertEqual(cobj.flush(), + b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') + + def test_compressobj_large(self): + chunks = [] + for i in range(255): + chunks.append(struct.Struct('>B').pack(i) * 16384) + + cctx = zstd.ZstdCompressor(level=3) + cobj = cctx.compressobj() + + result = cobj.compress(b''.join(chunks)) + cobj.flush() + self.assertEqual(len(result), 999) + self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd') + + def test_write_checksum(self): + cctx = zstd.ZstdCompressor(level=1) + cobj = cctx.compressobj() + no_checksum = cobj.compress(b'foobar') + cobj.flush() + cctx = zstd.ZstdCompressor(level=1, write_checksum=True) + cobj = cctx.compressobj() + with_checksum = cobj.compress(b'foobar') + cobj.flush() + + self.assertEqual(len(with_checksum), len(no_checksum) + 4) + + def test_write_content_size(self): + cctx = zstd.ZstdCompressor(level=1) + cobj = cctx.compressobj(size=len(b'foobar' * 256)) + no_size = cobj.compress(b'foobar' * 256) + cobj.flush() + cctx = zstd.ZstdCompressor(level=1, write_content_size=True) + cobj = cctx.compressobj(size=len(b'foobar' * 256)) + with_size = cobj.compress(b'foobar' * 256) + cobj.flush() + + self.assertEqual(len(with_size), len(no_size) + 1) + + def test_compress_after_flush(self): + cctx = zstd.ZstdCompressor() + cobj = cctx.compressobj() + + cobj.compress(b'foo') + cobj.flush() + + with self.assertRaisesRegexp(zstd.ZstdError, 'cannot call compress\(\) after flush'): + cobj.compress(b'foo') + + with self.assertRaisesRegexp(zstd.ZstdError, 'flush\(\) already called'): + cobj.flush() + + +class TestCompressor_copy_stream(unittest.TestCase): + def test_no_read(self): + source = object() + dest = io.BytesIO() + + cctx = zstd.ZstdCompressor() + with self.assertRaises(ValueError): + cctx.copy_stream(source, dest) + + def test_no_write(self): + source = io.BytesIO() + dest = object() + + cctx = zstd.ZstdCompressor() + with self.assertRaises(ValueError): + cctx.copy_stream(source, dest) + + def test_empty(self): + source = io.BytesIO() + dest = io.BytesIO() + + cctx = zstd.ZstdCompressor(level=1) + r, w = cctx.copy_stream(source, dest) + self.assertEqual(int(r), 0) + self.assertEqual(w, 9) + + self.assertEqual(dest.getvalue(), + b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') + + def test_large_data(self): + source = io.BytesIO() + for i in range(255): + source.write(struct.Struct('>B').pack(i) * 16384) + source.seek(0) + + dest = io.BytesIO() + cctx = zstd.ZstdCompressor() + r, w = cctx.copy_stream(source, dest) + + self.assertEqual(r, 255 * 16384) + self.assertEqual(w, 999) + + def test_write_checksum(self): + source = io.BytesIO(b'foobar') + no_checksum = io.BytesIO() + + cctx = zstd.ZstdCompressor(level=1) + cctx.copy_stream(source, no_checksum) + + source.seek(0) + with_checksum = io.BytesIO() + cctx = zstd.ZstdCompressor(level=1, write_checksum=True) + cctx.copy_stream(source, with_checksum) + + self.assertEqual(len(with_checksum.getvalue()), + len(no_checksum.getvalue()) + 4) + + def test_write_content_size(self): + source = io.BytesIO(b'foobar' * 256) + no_size = io.BytesIO() + + cctx = zstd.ZstdCompressor(level=1) + cctx.copy_stream(source, no_size) + + source.seek(0) + with_size = io.BytesIO() + cctx = zstd.ZstdCompressor(level=1, write_content_size=True) + cctx.copy_stream(source, with_size) + + # Source content size is unknown, so no content size written. + self.assertEqual(len(with_size.getvalue()), + len(no_size.getvalue())) + + source.seek(0) + with_size = io.BytesIO() + cctx.copy_stream(source, with_size, size=len(source.getvalue())) + + # We specified source size, so content size header is present. + self.assertEqual(len(with_size.getvalue()), + len(no_size.getvalue()) + 1) + + def test_read_write_size(self): + source = OpCountingBytesIO(b'foobarfoobar') + dest = OpCountingBytesIO() + cctx = zstd.ZstdCompressor() + r, w = cctx.copy_stream(source, dest, read_size=1, write_size=1) + + self.assertEqual(r, len(source.getvalue())) + self.assertEqual(w, 21) + self.assertEqual(source._read_count, len(source.getvalue()) + 1) + self.assertEqual(dest._write_count, len(dest.getvalue())) + + +def compress(data, level): + buffer = io.BytesIO() + cctx = zstd.ZstdCompressor(level=level) + with cctx.write_to(buffer) as compressor: + compressor.write(data) + return buffer.getvalue() + + +class TestCompressor_write_to(unittest.TestCase): + def test_empty(self): + self.assertEqual(compress(b'', 1), + b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') + + def test_multiple_compress(self): + buffer = io.BytesIO() + cctx = zstd.ZstdCompressor(level=5) + with cctx.write_to(buffer) as compressor: + compressor.write(b'foo') + compressor.write(b'bar') + compressor.write(b'x' * 8192) + + result = buffer.getvalue() + self.assertEqual(result, + b'\x28\xb5\x2f\xfd\x00\x50\x75\x00\x00\x38\x66\x6f' + b'\x6f\x62\x61\x72\x78\x01\x00\xfc\xdf\x03\x23') + + def test_dictionary(self): + samples = [] + for i in range(128): + samples.append(b'foo' * 64) + samples.append(b'bar' * 64) + samples.append(b'foobar' * 64) + + d = zstd.train_dictionary(8192, samples) + + buffer = io.BytesIO() + cctx = zstd.ZstdCompressor(level=9, dict_data=d) + with cctx.write_to(buffer) as compressor: + compressor.write(b'foo') + compressor.write(b'bar') + compressor.write(b'foo' * 16384) + + compressed = buffer.getvalue() + h = hashlib.sha1(compressed).hexdigest() + self.assertEqual(h, '1c5bcd25181bcd8c1a73ea8773323e0056129f92') + + def test_compression_params(self): + params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST) + + buffer = io.BytesIO() + cctx = zstd.ZstdCompressor(compression_params=params) + with cctx.write_to(buffer) as compressor: + compressor.write(b'foo') + compressor.write(b'bar') + compressor.write(b'foobar' * 16384) + + compressed = buffer.getvalue() + h = hashlib.sha1(compressed).hexdigest() + self.assertEqual(h, '1ae31f270ed7de14235221a604b31ecd517ebd99') + + def test_write_checksum(self): + no_checksum = io.BytesIO() + cctx = zstd.ZstdCompressor(level=1) + with cctx.write_to(no_checksum) as compressor: + compressor.write(b'foobar') + + with_checksum = io.BytesIO() + cctx = zstd.ZstdCompressor(level=1, write_checksum=True) + with cctx.write_to(with_checksum) as compressor: + compressor.write(b'foobar') + + self.assertEqual(len(with_checksum.getvalue()), + len(no_checksum.getvalue()) + 4) + + def test_write_content_size(self): + no_size = io.BytesIO() + cctx = zstd.ZstdCompressor(level=1) + with cctx.write_to(no_size) as compressor: + compressor.write(b'foobar' * 256) + + with_size = io.BytesIO() + cctx = zstd.ZstdCompressor(level=1, write_content_size=True) + with cctx.write_to(with_size) as compressor: + compressor.write(b'foobar' * 256) + + # Source size is not known in streaming mode, so header not + # written. + self.assertEqual(len(with_size.getvalue()), + len(no_size.getvalue())) + + # Declaring size will write the header. + with_size = io.BytesIO() + with cctx.write_to(with_size, size=len(b'foobar' * 256)) as compressor: + compressor.write(b'foobar' * 256) + + self.assertEqual(len(with_size.getvalue()), + len(no_size.getvalue()) + 1) + + def test_no_dict_id(self): + samples = [] + for i in range(128): + samples.append(b'foo' * 64) + samples.append(b'bar' * 64) + samples.append(b'foobar' * 64) + + d = zstd.train_dictionary(1024, samples) + + with_dict_id = io.BytesIO() + cctx = zstd.ZstdCompressor(level=1, dict_data=d) + with cctx.write_to(with_dict_id) as compressor: + compressor.write(b'foobarfoobar') + + cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False) + no_dict_id = io.BytesIO() + with cctx.write_to(no_dict_id) as compressor: + compressor.write(b'foobarfoobar') + + self.assertEqual(len(with_dict_id.getvalue()), + len(no_dict_id.getvalue()) + 4) + + def test_memory_size(self): + cctx = zstd.ZstdCompressor(level=3) + buffer = io.BytesIO() + with cctx.write_to(buffer) as compressor: + size = compressor.memory_size() + + self.assertGreater(size, 100000) + + def test_write_size(self): + cctx = zstd.ZstdCompressor(level=3) + dest = OpCountingBytesIO() + with cctx.write_to(dest, write_size=1) as compressor: + compressor.write(b'foo') + compressor.write(b'bar') + compressor.write(b'foobar') + + self.assertEqual(len(dest.getvalue()), dest._write_count) + + +class TestCompressor_read_from(unittest.TestCase): + def test_type_validation(self): + cctx = zstd.ZstdCompressor() + + # Object with read() works. + cctx.read_from(io.BytesIO()) + + # Buffer protocol works. + cctx.read_from(b'foobar') + + with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'): + cctx.read_from(True) + + def test_read_empty(self): + cctx = zstd.ZstdCompressor(level=1) + + source = io.BytesIO() + it = cctx.read_from(source) + chunks = list(it) + self.assertEqual(len(chunks), 1) + compressed = b''.join(chunks) + self.assertEqual(compressed, b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') + + # And again with the buffer protocol. + it = cctx.read_from(b'') + chunks = list(it) + self.assertEqual(len(chunks), 1) + compressed2 = b''.join(chunks) + self.assertEqual(compressed2, compressed) + + def test_read_large(self): + cctx = zstd.ZstdCompressor(level=1) + + source = io.BytesIO() + source.write(b'f' * zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE) + source.write(b'o') + source.seek(0) + + # Creating an iterator should not perform any compression until + # first read. + it = cctx.read_from(source, size=len(source.getvalue())) + self.assertEqual(source.tell(), 0) + + # We should have exactly 2 output chunks. + chunks = [] + chunk = next(it) + self.assertIsNotNone(chunk) + self.assertEqual(source.tell(), zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE) + chunks.append(chunk) + chunk = next(it) + self.assertIsNotNone(chunk) + chunks.append(chunk) + + self.assertEqual(source.tell(), len(source.getvalue())) + + with self.assertRaises(StopIteration): + next(it) + + # And again for good measure. + with self.assertRaises(StopIteration): + next(it) + + # We should get the same output as the one-shot compression mechanism. + self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue())) + + # Now check the buffer protocol. + it = cctx.read_from(source.getvalue()) + chunks = list(it) + self.assertEqual(len(chunks), 2) + self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue())) + + def test_read_write_size(self): + source = OpCountingBytesIO(b'foobarfoobar') + cctx = zstd.ZstdCompressor(level=3) + for chunk in cctx.read_from(source, read_size=1, write_size=1): + self.assertEqual(len(chunk), 1) + + self.assertEqual(source._read_count, len(source.getvalue()) + 1)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/test_data_structures.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,107 @@ +import io + +try: + import unittest2 as unittest +except ImportError: + import unittest + +try: + import hypothesis + import hypothesis.strategies as strategies +except ImportError: + hypothesis = None + +import zstd + +class TestCompressionParameters(unittest.TestCase): + def test_init_bad_arg_type(self): + with self.assertRaises(TypeError): + zstd.CompressionParameters() + + with self.assertRaises(TypeError): + zstd.CompressionParameters(0, 1) + + def test_bounds(self): + zstd.CompressionParameters(zstd.WINDOWLOG_MIN, + zstd.CHAINLOG_MIN, + zstd.HASHLOG_MIN, + zstd.SEARCHLOG_MIN, + zstd.SEARCHLENGTH_MIN, + zstd.TARGETLENGTH_MIN, + zstd.STRATEGY_FAST) + + zstd.CompressionParameters(zstd.WINDOWLOG_MAX, + zstd.CHAINLOG_MAX, + zstd.HASHLOG_MAX, + zstd.SEARCHLOG_MAX, + zstd.SEARCHLENGTH_MAX, + zstd.TARGETLENGTH_MAX, + zstd.STRATEGY_BTOPT) + + def test_get_compression_parameters(self): + p = zstd.get_compression_parameters(1) + self.assertIsInstance(p, zstd.CompressionParameters) + + self.assertEqual(p[0], 19) + +if hypothesis: + s_windowlog = strategies.integers(min_value=zstd.WINDOWLOG_MIN, + max_value=zstd.WINDOWLOG_MAX) + s_chainlog = strategies.integers(min_value=zstd.CHAINLOG_MIN, + max_value=zstd.CHAINLOG_MAX) + s_hashlog = strategies.integers(min_value=zstd.HASHLOG_MIN, + max_value=zstd.HASHLOG_MAX) + s_searchlog = strategies.integers(min_value=zstd.SEARCHLOG_MIN, + max_value=zstd.SEARCHLOG_MAX) + s_searchlength = strategies.integers(min_value=zstd.SEARCHLENGTH_MIN, + max_value=zstd.SEARCHLENGTH_MAX) + s_targetlength = strategies.integers(min_value=zstd.TARGETLENGTH_MIN, + max_value=zstd.TARGETLENGTH_MAX) + s_strategy = strategies.sampled_from((zstd.STRATEGY_FAST, + zstd.STRATEGY_DFAST, + zstd.STRATEGY_GREEDY, + zstd.STRATEGY_LAZY, + zstd.STRATEGY_LAZY2, + zstd.STRATEGY_BTLAZY2, + zstd.STRATEGY_BTOPT)) + + class TestCompressionParametersHypothesis(unittest.TestCase): + @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog, + s_searchlength, s_targetlength, s_strategy) + def test_valid_init(self, windowlog, chainlog, hashlog, searchlog, + searchlength, targetlength, strategy): + p = zstd.CompressionParameters(windowlog, chainlog, hashlog, + searchlog, searchlength, + targetlength, strategy) + self.assertEqual(tuple(p), + (windowlog, chainlog, hashlog, searchlog, + searchlength, targetlength, strategy)) + + # Verify we can instantiate a compressor with the supplied values. + # ZSTD_checkCParams moves the goal posts on us from what's advertised + # in the constants. So move along with them. + if searchlength == zstd.SEARCHLENGTH_MIN and strategy in (zstd.STRATEGY_FAST, zstd.STRATEGY_GREEDY): + searchlength += 1 + p = zstd.CompressionParameters(windowlog, chainlog, hashlog, + searchlog, searchlength, + targetlength, strategy) + elif searchlength == zstd.SEARCHLENGTH_MAX and strategy != zstd.STRATEGY_FAST: + searchlength -= 1 + p = zstd.CompressionParameters(windowlog, chainlog, hashlog, + searchlog, searchlength, + targetlength, strategy) + + cctx = zstd.ZstdCompressor(compression_params=p) + with cctx.write_to(io.BytesIO()): + pass + + @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog, + s_searchlength, s_targetlength, s_strategy) + def test_estimate_compression_context_size(self, windowlog, chainlog, + hashlog, searchlog, + searchlength, targetlength, + strategy): + p = zstd.CompressionParameters(windowlog, chainlog, hashlog, + searchlog, searchlength, + targetlength, strategy) + size = zstd.estimate_compression_context_size(p)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/test_decompressor.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,478 @@ +import io +import random +import struct +import sys + +try: + import unittest2 as unittest +except ImportError: + import unittest + +import zstd + +from .common import OpCountingBytesIO + + +if sys.version_info[0] >= 3: + next = lambda it: it.__next__() +else: + next = lambda it: it.next() + + +class TestDecompressor_decompress(unittest.TestCase): + def test_empty_input(self): + dctx = zstd.ZstdDecompressor() + + with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'): + dctx.decompress(b'') + + def test_invalid_input(self): + dctx = zstd.ZstdDecompressor() + + with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'): + dctx.decompress(b'foobar') + + def test_no_content_size_in_frame(self): + cctx = zstd.ZstdCompressor(write_content_size=False) + compressed = cctx.compress(b'foobar') + + dctx = zstd.ZstdDecompressor() + with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'): + dctx.decompress(compressed) + + def test_content_size_present(self): + cctx = zstd.ZstdCompressor(write_content_size=True) + compressed = cctx.compress(b'foobar') + + dctx = zstd.ZstdDecompressor() + decompressed = dctx.decompress(compressed) + self.assertEqual(decompressed, b'foobar') + + def test_max_output_size(self): + cctx = zstd.ZstdCompressor(write_content_size=False) + source = b'foobar' * 256 + compressed = cctx.compress(source) + + dctx = zstd.ZstdDecompressor() + # Will fit into buffer exactly the size of input. + decompressed = dctx.decompress(compressed, max_output_size=len(source)) + self.assertEqual(decompressed, source) + + # Input size - 1 fails + with self.assertRaisesRegexp(zstd.ZstdError, 'Destination buffer is too small'): + dctx.decompress(compressed, max_output_size=len(source) - 1) + + # Input size + 1 works + decompressed = dctx.decompress(compressed, max_output_size=len(source) + 1) + self.assertEqual(decompressed, source) + + # A much larger buffer works. + decompressed = dctx.decompress(compressed, max_output_size=len(source) * 64) + self.assertEqual(decompressed, source) + + def test_stupidly_large_output_buffer(self): + cctx = zstd.ZstdCompressor(write_content_size=False) + compressed = cctx.compress(b'foobar' * 256) + dctx = zstd.ZstdDecompressor() + + # Will get OverflowError on some Python distributions that can't + # handle really large integers. + with self.assertRaises((MemoryError, OverflowError)): + dctx.decompress(compressed, max_output_size=2**62) + + def test_dictionary(self): + samples = [] + for i in range(128): + samples.append(b'foo' * 64) + samples.append(b'bar' * 64) + samples.append(b'foobar' * 64) + + d = zstd.train_dictionary(8192, samples) + + orig = b'foobar' * 16384 + cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_content_size=True) + compressed = cctx.compress(orig) + + dctx = zstd.ZstdDecompressor(dict_data=d) + decompressed = dctx.decompress(compressed) + + self.assertEqual(decompressed, orig) + + def test_dictionary_multiple(self): + samples = [] + for i in range(128): + samples.append(b'foo' * 64) + samples.append(b'bar' * 64) + samples.append(b'foobar' * 64) + + d = zstd.train_dictionary(8192, samples) + + sources = (b'foobar' * 8192, b'foo' * 8192, b'bar' * 8192) + compressed = [] + cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_content_size=True) + for source in sources: + compressed.append(cctx.compress(source)) + + dctx = zstd.ZstdDecompressor(dict_data=d) + for i in range(len(sources)): + decompressed = dctx.decompress(compressed[i]) + self.assertEqual(decompressed, sources[i]) + + +class TestDecompressor_copy_stream(unittest.TestCase): + def test_no_read(self): + source = object() + dest = io.BytesIO() + + dctx = zstd.ZstdDecompressor() + with self.assertRaises(ValueError): + dctx.copy_stream(source, dest) + + def test_no_write(self): + source = io.BytesIO() + dest = object() + + dctx = zstd.ZstdDecompressor() + with self.assertRaises(ValueError): + dctx.copy_stream(source, dest) + + def test_empty(self): + source = io.BytesIO() + dest = io.BytesIO() + + dctx = zstd.ZstdDecompressor() + # TODO should this raise an error? + r, w = dctx.copy_stream(source, dest) + + self.assertEqual(r, 0) + self.assertEqual(w, 0) + self.assertEqual(dest.getvalue(), b'') + + def test_large_data(self): + source = io.BytesIO() + for i in range(255): + source.write(struct.Struct('>B').pack(i) * 16384) + source.seek(0) + + compressed = io.BytesIO() + cctx = zstd.ZstdCompressor() + cctx.copy_stream(source, compressed) + + compressed.seek(0) + dest = io.BytesIO() + dctx = zstd.ZstdDecompressor() + r, w = dctx.copy_stream(compressed, dest) + + self.assertEqual(r, len(compressed.getvalue())) + self.assertEqual(w, len(source.getvalue())) + + def test_read_write_size(self): + source = OpCountingBytesIO(zstd.ZstdCompressor().compress( + b'foobarfoobar')) + + dest = OpCountingBytesIO() + dctx = zstd.ZstdDecompressor() + r, w = dctx.copy_stream(source, dest, read_size=1, write_size=1) + + self.assertEqual(r, len(source.getvalue())) + self.assertEqual(w, len(b'foobarfoobar')) + self.assertEqual(source._read_count, len(source.getvalue()) + 1) + self.assertEqual(dest._write_count, len(dest.getvalue())) + + +class TestDecompressor_decompressobj(unittest.TestCase): + def test_simple(self): + data = zstd.ZstdCompressor(level=1).compress(b'foobar') + + dctx = zstd.ZstdDecompressor() + dobj = dctx.decompressobj() + self.assertEqual(dobj.decompress(data), b'foobar') + + def test_reuse(self): + data = zstd.ZstdCompressor(level=1).compress(b'foobar') + + dctx = zstd.ZstdDecompressor() + dobj = dctx.decompressobj() + dobj.decompress(data) + + with self.assertRaisesRegexp(zstd.ZstdError, 'cannot use a decompressobj'): + dobj.decompress(data) + + +def decompress_via_writer(data): + buffer = io.BytesIO() + dctx = zstd.ZstdDecompressor() + with dctx.write_to(buffer) as decompressor: + decompressor.write(data) + return buffer.getvalue() + + +class TestDecompressor_write_to(unittest.TestCase): + def test_empty_roundtrip(self): + cctx = zstd.ZstdCompressor() + empty = cctx.compress(b'') + self.assertEqual(decompress_via_writer(empty), b'') + + def test_large_roundtrip(self): + chunks = [] + for i in range(255): + chunks.append(struct.Struct('>B').pack(i) * 16384) + orig = b''.join(chunks) + cctx = zstd.ZstdCompressor() + compressed = cctx.compress(orig) + + self.assertEqual(decompress_via_writer(compressed), orig) + + def test_multiple_calls(self): + chunks = [] + for i in range(255): + for j in range(255): + chunks.append(struct.Struct('>B').pack(j) * i) + + orig = b''.join(chunks) + cctx = zstd.ZstdCompressor() + compressed = cctx.compress(orig) + + buffer = io.BytesIO() + dctx = zstd.ZstdDecompressor() + with dctx.write_to(buffer) as decompressor: + pos = 0 + while pos < len(compressed): + pos2 = pos + 8192 + decompressor.write(compressed[pos:pos2]) + pos += 8192 + self.assertEqual(buffer.getvalue(), orig) + + def test_dictionary(self): + samples = [] + for i in range(128): + samples.append(b'foo' * 64) + samples.append(b'bar' * 64) + samples.append(b'foobar' * 64) + + d = zstd.train_dictionary(8192, samples) + + orig = b'foobar' * 16384 + buffer = io.BytesIO() + cctx = zstd.ZstdCompressor(dict_data=d) + with cctx.write_to(buffer) as compressor: + compressor.write(orig) + + compressed = buffer.getvalue() + buffer = io.BytesIO() + + dctx = zstd.ZstdDecompressor(dict_data=d) + with dctx.write_to(buffer) as decompressor: + decompressor.write(compressed) + + self.assertEqual(buffer.getvalue(), orig) + + def test_memory_size(self): + dctx = zstd.ZstdDecompressor() + buffer = io.BytesIO() + with dctx.write_to(buffer) as decompressor: + size = decompressor.memory_size() + + self.assertGreater(size, 100000) + + def test_write_size(self): + source = zstd.ZstdCompressor().compress(b'foobarfoobar') + dest = OpCountingBytesIO() + dctx = zstd.ZstdDecompressor() + with dctx.write_to(dest, write_size=1) as decompressor: + s = struct.Struct('>B') + for c in source: + if not isinstance(c, str): + c = s.pack(c) + decompressor.write(c) + + + self.assertEqual(dest.getvalue(), b'foobarfoobar') + self.assertEqual(dest._write_count, len(dest.getvalue())) + + +class TestDecompressor_read_from(unittest.TestCase): + def test_type_validation(self): + dctx = zstd.ZstdDecompressor() + + # Object with read() works. + dctx.read_from(io.BytesIO()) + + # Buffer protocol works. + dctx.read_from(b'foobar') + + with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'): + dctx.read_from(True) + + def test_empty_input(self): + dctx = zstd.ZstdDecompressor() + + source = io.BytesIO() + it = dctx.read_from(source) + # TODO this is arguably wrong. Should get an error about missing frame foo. + with self.assertRaises(StopIteration): + next(it) + + it = dctx.read_from(b'') + with self.assertRaises(StopIteration): + next(it) + + def test_invalid_input(self): + dctx = zstd.ZstdDecompressor() + + source = io.BytesIO(b'foobar') + it = dctx.read_from(source) + with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'): + next(it) + + it = dctx.read_from(b'foobar') + with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'): + next(it) + + def test_empty_roundtrip(self): + cctx = zstd.ZstdCompressor(level=1, write_content_size=False) + empty = cctx.compress(b'') + + source = io.BytesIO(empty) + source.seek(0) + + dctx = zstd.ZstdDecompressor() + it = dctx.read_from(source) + + # No chunks should be emitted since there is no data. + with self.assertRaises(StopIteration): + next(it) + + # Again for good measure. + with self.assertRaises(StopIteration): + next(it) + + def test_skip_bytes_too_large(self): + dctx = zstd.ZstdDecompressor() + + with self.assertRaisesRegexp(ValueError, 'skip_bytes must be smaller than read_size'): + dctx.read_from(b'', skip_bytes=1, read_size=1) + + with self.assertRaisesRegexp(ValueError, 'skip_bytes larger than first input chunk'): + b''.join(dctx.read_from(b'foobar', skip_bytes=10)) + + def test_skip_bytes(self): + cctx = zstd.ZstdCompressor(write_content_size=False) + compressed = cctx.compress(b'foobar') + + dctx = zstd.ZstdDecompressor() + output = b''.join(dctx.read_from(b'hdr' + compressed, skip_bytes=3)) + self.assertEqual(output, b'foobar') + + def test_large_output(self): + source = io.BytesIO() + source.write(b'f' * zstd.DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE) + source.write(b'o') + source.seek(0) + + cctx = zstd.ZstdCompressor(level=1) + compressed = io.BytesIO(cctx.compress(source.getvalue())) + compressed.seek(0) + + dctx = zstd.ZstdDecompressor() + it = dctx.read_from(compressed) + + chunks = [] + chunks.append(next(it)) + chunks.append(next(it)) + + with self.assertRaises(StopIteration): + next(it) + + decompressed = b''.join(chunks) + self.assertEqual(decompressed, source.getvalue()) + + # And again with buffer protocol. + it = dctx.read_from(compressed.getvalue()) + chunks = [] + chunks.append(next(it)) + chunks.append(next(it)) + + with self.assertRaises(StopIteration): + next(it) + + decompressed = b''.join(chunks) + self.assertEqual(decompressed, source.getvalue()) + + def test_large_input(self): + bytes = list(struct.Struct('>B').pack(i) for i in range(256)) + compressed = io.BytesIO() + input_size = 0 + cctx = zstd.ZstdCompressor(level=1) + with cctx.write_to(compressed) as compressor: + while True: + compressor.write(random.choice(bytes)) + input_size += 1 + + have_compressed = len(compressed.getvalue()) > zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE + have_raw = input_size > zstd.DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE * 2 + if have_compressed and have_raw: + break + + compressed.seek(0) + self.assertGreater(len(compressed.getvalue()), + zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE) + + dctx = zstd.ZstdDecompressor() + it = dctx.read_from(compressed) + + chunks = [] + chunks.append(next(it)) + chunks.append(next(it)) + chunks.append(next(it)) + + with self.assertRaises(StopIteration): + next(it) + + decompressed = b''.join(chunks) + self.assertEqual(len(decompressed), input_size) + + # And again with buffer protocol. + it = dctx.read_from(compressed.getvalue()) + + chunks = [] + chunks.append(next(it)) + chunks.append(next(it)) + chunks.append(next(it)) + + with self.assertRaises(StopIteration): + next(it) + + decompressed = b''.join(chunks) + self.assertEqual(len(decompressed), input_size) + + def test_interesting(self): + # Found this edge case via fuzzing. + cctx = zstd.ZstdCompressor(level=1) + + source = io.BytesIO() + + compressed = io.BytesIO() + with cctx.write_to(compressed) as compressor: + for i in range(256): + chunk = b'\0' * 1024 + compressor.write(chunk) + source.write(chunk) + + dctx = zstd.ZstdDecompressor() + + simple = dctx.decompress(compressed.getvalue(), + max_output_size=len(source.getvalue())) + self.assertEqual(simple, source.getvalue()) + + compressed.seek(0) + streamed = b''.join(dctx.read_from(compressed)) + self.assertEqual(streamed, source.getvalue()) + + def test_read_write_size(self): + source = OpCountingBytesIO(zstd.ZstdCompressor().compress(b'foobarfoobar')) + dctx = zstd.ZstdDecompressor() + for chunk in dctx.read_from(source, read_size=1, write_size=1): + self.assertEqual(len(chunk), 1) + + self.assertEqual(source._read_count, len(source.getvalue()))
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/test_estimate_sizes.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,17 @@ +try: + import unittest2 as unittest +except ImportError: + import unittest + +import zstd + + +class TestSizes(unittest.TestCase): + def test_decompression_size(self): + size = zstd.estimate_decompression_context_size() + self.assertGreater(size, 100000) + + def test_compression_size(self): + params = zstd.get_compression_parameters(3) + size = zstd.estimate_compression_context_size(params) + self.assertGreater(size, 100000)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/test_module_attributes.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,48 @@ +from __future__ import unicode_literals + +try: + import unittest2 as unittest +except ImportError: + import unittest + +import zstd + +class TestModuleAttributes(unittest.TestCase): + def test_version(self): + self.assertEqual(zstd.ZSTD_VERSION, (1, 1, 1)) + + def test_constants(self): + self.assertEqual(zstd.MAX_COMPRESSION_LEVEL, 22) + self.assertEqual(zstd.FRAME_HEADER, b'\x28\xb5\x2f\xfd') + + def test_hasattr(self): + attrs = ( + 'COMPRESSION_RECOMMENDED_INPUT_SIZE', + 'COMPRESSION_RECOMMENDED_OUTPUT_SIZE', + 'DECOMPRESSION_RECOMMENDED_INPUT_SIZE', + 'DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE', + 'MAGIC_NUMBER', + 'WINDOWLOG_MIN', + 'WINDOWLOG_MAX', + 'CHAINLOG_MIN', + 'CHAINLOG_MAX', + 'HASHLOG_MIN', + 'HASHLOG_MAX', + 'HASHLOG3_MAX', + 'SEARCHLOG_MIN', + 'SEARCHLOG_MAX', + 'SEARCHLENGTH_MIN', + 'SEARCHLENGTH_MAX', + 'TARGETLENGTH_MIN', + 'TARGETLENGTH_MAX', + 'STRATEGY_FAST', + 'STRATEGY_DFAST', + 'STRATEGY_GREEDY', + 'STRATEGY_LAZY', + 'STRATEGY_LAZY2', + 'STRATEGY_BTLAZY2', + 'STRATEGY_BTOPT', + ) + + for a in attrs: + self.assertTrue(hasattr(zstd, a))
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/test_roundtrip.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,64 @@ +import io + +try: + import unittest2 as unittest +except ImportError: + import unittest + +try: + import hypothesis + import hypothesis.strategies as strategies +except ImportError: + raise unittest.SkipTest('hypothesis not available') + +import zstd + + +compression_levels = strategies.integers(min_value=1, max_value=22) + + +class TestRoundTrip(unittest.TestCase): + @hypothesis.given(strategies.binary(), compression_levels) + def test_compress_write_to(self, data, level): + """Random data from compress() roundtrips via write_to.""" + cctx = zstd.ZstdCompressor(level=level) + compressed = cctx.compress(data) + + buffer = io.BytesIO() + dctx = zstd.ZstdDecompressor() + with dctx.write_to(buffer) as decompressor: + decompressor.write(compressed) + + self.assertEqual(buffer.getvalue(), data) + + @hypothesis.given(strategies.binary(), compression_levels) + def test_compressor_write_to_decompressor_write_to(self, data, level): + """Random data from compressor write_to roundtrips via write_to.""" + compress_buffer = io.BytesIO() + decompressed_buffer = io.BytesIO() + + cctx = zstd.ZstdCompressor(level=level) + with cctx.write_to(compress_buffer) as compressor: + compressor.write(data) + + dctx = zstd.ZstdDecompressor() + with dctx.write_to(decompressed_buffer) as decompressor: + decompressor.write(compress_buffer.getvalue()) + + self.assertEqual(decompressed_buffer.getvalue(), data) + + @hypothesis.given(strategies.binary(average_size=1048576)) + @hypothesis.settings(perform_health_check=False) + def test_compressor_write_to_decompressor_write_to_larger(self, data): + compress_buffer = io.BytesIO() + decompressed_buffer = io.BytesIO() + + cctx = zstd.ZstdCompressor(level=5) + with cctx.write_to(compress_buffer) as compressor: + compressor.write(data) + + dctx = zstd.ZstdDecompressor() + with dctx.write_to(decompressed_buffer) as decompressor: + decompressor.write(compress_buffer.getvalue()) + + self.assertEqual(decompressed_buffer.getvalue(), data)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/tests/test_train_dictionary.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,46 @@ +import sys + +try: + import unittest2 as unittest +except ImportError: + import unittest + +import zstd + + +if sys.version_info[0] >= 3: + int_type = int +else: + int_type = long + + +class TestTrainDictionary(unittest.TestCase): + def test_no_args(self): + with self.assertRaises(TypeError): + zstd.train_dictionary() + + def test_bad_args(self): + with self.assertRaises(TypeError): + zstd.train_dictionary(8192, u'foo') + + with self.assertRaises(ValueError): + zstd.train_dictionary(8192, [u'foo']) + + def test_basic(self): + samples = [] + for i in range(128): + samples.append(b'foo' * 64) + samples.append(b'bar' * 64) + samples.append(b'foobar' * 64) + samples.append(b'baz' * 64) + samples.append(b'foobaz' * 64) + samples.append(b'bazfoo' * 64) + + d = zstd.train_dictionary(8192, samples) + self.assertLessEqual(len(d), 8192) + + dict_id = d.dict_id() + self.assertIsInstance(dict_id, int_type) + + data = d.as_bytes() + self.assertEqual(data[0:4], b'\x37\xa4\x30\xec')
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/zstd.c Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,112 @@ +/** + * Copyright (c) 2016-present, Gregory Szorc + * All rights reserved. + * + * This software may be modified and distributed under the terms + * of the BSD license. See the LICENSE file for details. + */ + +/* A Python C extension for Zstandard. */ + +#include "python-zstandard.h" + +PyObject *ZstdError; + +PyDoc_STRVAR(estimate_compression_context_size__doc__, +"estimate_compression_context_size(compression_parameters)\n" +"\n" +"Give the amount of memory allocated for a compression context given a\n" +"CompressionParameters instance"); + +PyDoc_STRVAR(estimate_decompression_context_size__doc__, +"estimate_decompression_context_size()\n" +"\n" +"Estimate the amount of memory allocated to a decompression context.\n" +); + +static PyObject* estimate_decompression_context_size(PyObject* self) { + return PyLong_FromSize_t(ZSTD_estimateDCtxSize()); +} + +PyDoc_STRVAR(get_compression_parameters__doc__, +"get_compression_parameters(compression_level[, source_size[, dict_size]])\n" +"\n" +"Obtains a ``CompressionParameters`` instance from a compression level and\n" +"optional input size and dictionary size"); + +PyDoc_STRVAR(train_dictionary__doc__, +"train_dictionary(dict_size, samples)\n" +"\n" +"Train a dictionary from sample data.\n" +"\n" +"A compression dictionary of size ``dict_size`` will be created from the\n" +"iterable of samples provided by ``samples``.\n" +"\n" +"The raw dictionary content will be returned\n"); + +static char zstd_doc[] = "Interface to zstandard"; + +static PyMethodDef zstd_methods[] = { + { "estimate_compression_context_size", (PyCFunction)estimate_compression_context_size, + METH_VARARGS, estimate_compression_context_size__doc__ }, + { "estimate_decompression_context_size", (PyCFunction)estimate_decompression_context_size, + METH_NOARGS, estimate_decompression_context_size__doc__ }, + { "get_compression_parameters", (PyCFunction)get_compression_parameters, + METH_VARARGS, get_compression_parameters__doc__ }, + { "train_dictionary", (PyCFunction)train_dictionary, + METH_VARARGS | METH_KEYWORDS, train_dictionary__doc__ }, + { NULL, NULL } +}; + +void compressobj_module_init(PyObject* mod); +void compressor_module_init(PyObject* mod); +void compressionparams_module_init(PyObject* mod); +void constants_module_init(PyObject* mod); +void dictparams_module_init(PyObject* mod); +void compressiondict_module_init(PyObject* mod); +void compressionwriter_module_init(PyObject* mod); +void compressoriterator_module_init(PyObject* mod); +void decompressor_module_init(PyObject* mod); +void decompressobj_module_init(PyObject* mod); +void decompressionwriter_module_init(PyObject* mod); +void decompressoriterator_module_init(PyObject* mod); + +void zstd_module_init(PyObject* m) { + compressionparams_module_init(m); + dictparams_module_init(m); + compressiondict_module_init(m); + compressobj_module_init(m); + compressor_module_init(m); + compressionwriter_module_init(m); + compressoriterator_module_init(m); + constants_module_init(m); + decompressor_module_init(m); + decompressobj_module_init(m); + decompressionwriter_module_init(m); + decompressoriterator_module_init(m); +} + +#if PY_MAJOR_VERSION >= 3 +static struct PyModuleDef zstd_module = { + PyModuleDef_HEAD_INIT, + "zstd", + zstd_doc, + -1, + zstd_methods +}; + +PyMODINIT_FUNC PyInit_zstd(void) { + PyObject *m = PyModule_Create(&zstd_module); + if (m) { + zstd_module_init(m); + } + return m; +} +#else +PyMODINIT_FUNC initzstd(void) { + PyObject *m = Py_InitModule3("zstd", zstd_methods, zstd_doc); + if (m) { + zstd_module_init(m); + } +} +#endif
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/contrib/python-zstandard/zstd_cffi.py Thu Nov 10 22:15:58 2016 -0800 @@ -0,0 +1,152 @@ +# Copyright (c) 2016-present, Gregory Szorc +# All rights reserved. +# +# This software may be modified and distributed under the terms +# of the BSD license. See the LICENSE file for details. + +"""Python interface to the Zstandard (zstd) compression library.""" + +from __future__ import absolute_import, unicode_literals + +import io + +from _zstd_cffi import ( + ffi, + lib, +) + + +_CSTREAM_IN_SIZE = lib.ZSTD_CStreamInSize() +_CSTREAM_OUT_SIZE = lib.ZSTD_CStreamOutSize() + + +class _ZstdCompressionWriter(object): + def __init__(self, cstream, writer): + self._cstream = cstream + self._writer = writer + + def __enter__(self): + return self + + def __exit__(self, exc_type, exc_value, exc_tb): + if not exc_type and not exc_value and not exc_tb: + out_buffer = ffi.new('ZSTD_outBuffer *') + out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE) + out_buffer.size = _CSTREAM_OUT_SIZE + out_buffer.pos = 0 + + while True: + res = lib.ZSTD_endStream(self._cstream, out_buffer) + if lib.ZSTD_isError(res): + raise Exception('error ending compression stream: %s' % lib.ZSTD_getErrorName) + + if out_buffer.pos: + self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) + out_buffer.pos = 0 + + if res == 0: + break + + return False + + def write(self, data): + out_buffer = ffi.new('ZSTD_outBuffer *') + out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE) + out_buffer.size = _CSTREAM_OUT_SIZE + out_buffer.pos = 0 + + # TODO can we reuse existing memory? + in_buffer = ffi.new('ZSTD_inBuffer *') + in_buffer.src = ffi.new('char[]', data) + in_buffer.size = len(data) + in_buffer.pos = 0 + while in_buffer.pos < in_buffer.size: + res = lib.ZSTD_compressStream(self._cstream, out_buffer, in_buffer) + if lib.ZSTD_isError(res): + raise Exception('zstd compress error: %s' % lib.ZSTD_getErrorName(res)) + + if out_buffer.pos: + self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) + out_buffer.pos = 0 + + +class ZstdCompressor(object): + def __init__(self, level=3, dict_data=None, compression_params=None): + if dict_data: + raise Exception('dict_data not yet supported') + if compression_params: + raise Exception('compression_params not yet supported') + + self._compression_level = level + + def compress(self, data): + # Just use the stream API for now. + output = io.BytesIO() + with self.write_to(output) as compressor: + compressor.write(data) + return output.getvalue() + + def copy_stream(self, ifh, ofh): + cstream = self._get_cstream() + + in_buffer = ffi.new('ZSTD_inBuffer *') + out_buffer = ffi.new('ZSTD_outBuffer *') + + out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE) + out_buffer.size = _CSTREAM_OUT_SIZE + out_buffer.pos = 0 + + total_read, total_write = 0, 0 + + while True: + data = ifh.read(_CSTREAM_IN_SIZE) + if not data: + break + + total_read += len(data) + + in_buffer.src = ffi.new('char[]', data) + in_buffer.size = len(data) + in_buffer.pos = 0 + + while in_buffer.pos < in_buffer.size: + res = lib.ZSTD_compressStream(cstream, out_buffer, in_buffer) + if lib.ZSTD_isError(res): + raise Exception('zstd compress error: %s' % + lib.ZSTD_getErrorName(res)) + + if out_buffer.pos: + ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) + total_write = out_buffer.pos + out_buffer.pos = 0 + + # We've finished reading. Flush the compressor. + while True: + res = lib.ZSTD_endStream(cstream, out_buffer) + if lib.ZSTD_isError(res): + raise Exception('error ending compression stream: %s' % + lib.ZSTD_getErrorName(res)) + + if out_buffer.pos: + ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) + total_write += out_buffer.pos + out_buffer.pos = 0 + + if res == 0: + break + + return total_read, total_write + + def write_to(self, writer): + return _ZstdCompressionWriter(self._get_cstream(), writer) + + def _get_cstream(self): + cstream = lib.ZSTD_createCStream() + cstream = ffi.gc(cstream, lib.ZSTD_freeCStream) + + res = lib.ZSTD_initCStream(cstream, self._compression_level) + if lib.ZSTD_isError(res): + raise Exception('cannot init CStream: %s' % + lib.ZSTD_getErrorName(res)) + + return cstream
--- a/tests/test-check-code.t Thu Nov 10 21:45:29 2016 -0800 +++ b/tests/test-check-code.t Thu Nov 10 22:15:58 2016 -0800 @@ -7,7 +7,7 @@ New errors are not allowed. Warnings are strongly discouraged. (The writing "no-che?k-code" is for not skipping this file when checking.) - $ hg locate | sed 's-\\-/-g' | + $ hg locate -X contrib/python-zstandard | sed 's-\\-/-g' | > xargs "$check_code" --warnings --per-file=0 || false Skipping hgext/fsmonitor/pywatchman/__init__.py it has no-che?k-code (glob) Skipping hgext/fsmonitor/pywatchman/bser.c it has no-che?k-code (glob)
--- a/tests/test-check-module-imports.t Thu Nov 10 21:45:29 2016 -0800 +++ b/tests/test-check-module-imports.t Thu Nov 10 22:15:58 2016 -0800 @@ -159,6 +159,7 @@ $ hg locate 'set:**.py or grep(r"^#!.*?python")' \ > 'tests/**.t' \ > -X contrib/debugshell.py \ + > -X contrib/python-zstandard/ \ > -X contrib/win32/hgwebdir_wsgi.py \ > -X doc/gendoc.py \ > -X doc/hgmanpage.py \
--- a/tests/test-check-py3-compat.t Thu Nov 10 21:45:29 2016 -0800 +++ b/tests/test-check-py3-compat.t Thu Nov 10 22:15:58 2016 -0800 @@ -4,6 +4,17 @@ $ cd "$TESTDIR"/.. $ hg files 'set:(**.py)' | sed 's|\\|/|g' | xargs python contrib/check-py3-compat.py + contrib/python-zstandard/setup.py not using absolute_import + contrib/python-zstandard/setup_zstd.py not using absolute_import + contrib/python-zstandard/tests/common.py not using absolute_import + contrib/python-zstandard/tests/test_cffi.py not using absolute_import + contrib/python-zstandard/tests/test_compressor.py not using absolute_import + contrib/python-zstandard/tests/test_data_structures.py not using absolute_import + contrib/python-zstandard/tests/test_decompressor.py not using absolute_import + contrib/python-zstandard/tests/test_estimate_sizes.py not using absolute_import + contrib/python-zstandard/tests/test_module_attributes.py not using absolute_import + contrib/python-zstandard/tests/test_roundtrip.py not using absolute_import + contrib/python-zstandard/tests/test_train_dictionary.py not using absolute_import hgext/fsmonitor/pywatchman/__init__.py not using absolute_import hgext/fsmonitor/pywatchman/__init__.py requires print_function hgext/fsmonitor/pywatchman/capabilities.py not using absolute_import
--- a/tests/test-check-pyflakes.t Thu Nov 10 21:45:29 2016 -0800 +++ b/tests/test-check-pyflakes.t Thu Nov 10 22:15:58 2016 -0800 @@ -10,6 +10,6 @@ > -X mercurial/pycompat.py \ > 2>/dev/null \ > | xargs pyflakes 2>/dev/null | "$TESTDIR/filterpyflakes.py" + contrib/python-zstandard/tests/test_data_structures.py:107: local variable 'size' is assigned to but never used tests/filterpyflakes.py:39: undefined name 'undefinedname' -