zstandard: vendor python-zstandard 0.12
The upstream source distribution from PyPI was extracted. Unwanted
files were removed.
The clang-format ignore list was updated to reflect the new source
files.
test-repo-compengines.t was updated to reflect a change in behavior
of the zstd library.
python-zstandard bundles its own copy of the zstandard C library.
This upgrade takes that bundled library from 1.3.8 to 1.4.3, which
should result in some minor performance wins.
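
As a quick sanity check of the upgrade, a minimal sketch (assuming the
vendored module is importable as zstd, the way the bundled test suite
imports it) that verifies the new versions and does a round trip:

    import zstd

    # Versions after this change (mirrors test_module_attributes.py below).
    assert zstd.ZSTD_VERSION == (1, 4, 3)
    assert zstd.__version__ == '0.12.0'

    # Compress and decompress through the upgraded library.
    cctx = zstd.ZstdCompressor(level=3)
    frame = cctx.compress(b'data' * 1024)
    dctx = zstd.ZstdDecompressor()
    assert dctx.decompress(frame) == b'data' * 1024
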
# no-check-commit because 3rd party code has different style guidelines
Differential Revision: https://phab.mercurial-scm.org/D6858
--- a/contrib/clang-format-ignorelist Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/clang-format-ignorelist Sun Sep 15 20:04:00 2019 -0700
@@ -49,6 +49,10 @@
contrib/python-zstandard/zstd/compress/huf_compress.c
contrib/python-zstandard/zstd/compress/zstd_compress.c
contrib/python-zstandard/zstd/compress/zstd_compress_internal.h
+contrib/python-zstandard/zstd/compress/zstd_compress_literals.c
+contrib/python-zstandard/zstd/compress/zstd_compress_literals.h
+contrib/python-zstandard/zstd/compress/zstd_compress_sequences.c
+contrib/python-zstandard/zstd/compress/zstd_compress_sequences.h
contrib/python-zstandard/zstd/compress/zstd_double_fast.c
contrib/python-zstandard/zstd/compress/zstd_double_fast.h
contrib/python-zstandard/zstd/compress/zstd_fast.c
--- a/contrib/python-zstandard/NEWS.rst Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/NEWS.rst Sun Sep 15 20:04:00 2019 -0700
@@ -44,6 +44,7 @@
zstd API.
* Expose ``ZSTD_CLEVEL_DEFAULT`` constant.
* Support ``ZSTD_p_forceAttachDict`` compression parameter.
+* Support ``ZSTD_c_literalCompressionMode`` compression parameter.
* Use ``ZSTD_CCtx_getParameter()``/``ZSTD_CCtxParam_getParameter()`` for retrieving
compression parameters.
* Consider exposing ``ZSTDMT_toFlushNow()``.
@@ -66,10 +67,39 @@
* API for ensuring max memory ceiling isn't exceeded.
* Move off nose for testing.
+0.12.0 (released 2019-09-15)
+============================
+
+Backwards Compatibility Notes
+-----------------------------
+
+* Support for Python 3.4 has been dropped since Python 3.4 is no longer
+ a supported Python version upstream. (But it will likely continue to
+ work until Python 2.7 support is dropped and we port to Python 3.5+
+ APIs.)
+
+Bug Fixes
+---------
+
+* Fix ``ZstdDecompressor.__init__`` on 64-bit big-endian systems (#91).
+* Fix memory leak in ``ZstdDecompressionReader.seek()`` (#82).
+
+Changes
+-------
+
+* CI transitioned to Azure Pipelines (from AppVeyor and Travis CI).
+* Switched to ``pytest`` for running tests (from ``nose``).
+* Bundled zstandard library upgraded from 1.3.8 to 1.4.3.
+
+0.11.1 (released 2019-05-14)
+============================
+
+* Fix memory leak in ``ZstdDecompressionReader.seek()`` (#82).
+
0.11.0 (released 2019-02-24)
============================
-Backwards Compatibility Nodes
+Backwards Compatibility Notes
-----------------------------
* ``ZstdDecompressor.read()`` now allows reading sizes of ``-1`` or ``0``
--- a/contrib/python-zstandard/README.rst Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/README.rst Sun Sep 15 20:04:00 2019 -0700
@@ -15,7 +15,7 @@
the author. For convenience, that repository is frequently synchronized to
https://github.com/indygreg/python-zstandard.
-| |ci-status| |win-ci-status|
+| |ci-status|
Requirements
============
@@ -1598,9 +1598,5 @@
:target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted
:alt: Donate via PayPal
-.. |ci-status| image:: https://travis-ci.org/indygreg/python-zstandard.svg?branch=master
- :target: https://travis-ci.org/indygreg/python-zstandard
-
-.. |win-ci-status| image:: https://ci.appveyor.com/api/projects/status/github/indygreg/python-zstandard?svg=true
- :target: https://ci.appveyor.com/project/indygreg/python-zstandard
- :alt: Windows build status
+.. |ci-status| image:: https://dev.azure.com/gregoryszorc/python-zstandard/_apis/build/status/indygreg.python-zstandard?branchName=master
+ :target: https://dev.azure.com/gregoryszorc/python-zstandard/_apis/build/status/indygreg.python-zstandard?branchName=master
--- a/contrib/python-zstandard/c-ext/compressionparams.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/c-ext/compressionparams.c Sun Sep 15 20:04:00 2019 -0700
@@ -11,7 +11,7 @@
extern PyObject* ZstdError;
int set_parameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int value) {
- size_t zresult = ZSTD_CCtxParam_setParameter(params, param, value);
+ size_t zresult = ZSTD_CCtxParams_setParameter(params, param, value);
if (ZSTD_isError(zresult)) {
PyErr_Format(ZstdError, "unable to set compression context parameter: %s",
ZSTD_getErrorName(zresult));
@@ -25,11 +25,11 @@
#define TRY_COPY_PARAMETER(source, dest, param) { \
int result; \
- size_t zresult = ZSTD_CCtxParam_getParameter(source, param, &result); \
+ size_t zresult = ZSTD_CCtxParams_getParameter(source, param, &result); \
if (ZSTD_isError(zresult)) { \
return 1; \
} \
- zresult = ZSTD_CCtxParam_setParameter(dest, param, result); \
+ zresult = ZSTD_CCtxParams_setParameter(dest, param, result); \
if (ZSTD_isError(zresult)) { \
return 1; \
} \
@@ -78,7 +78,7 @@
}
#define TRY_GET_PARAMETER(params, param, value) { \
- size_t zresult = ZSTD_CCtxParam_getParameter(params, param, value); \
+ size_t zresult = ZSTD_CCtxParams_getParameter(params, param, value); \
if (ZSTD_isError(zresult)) { \
PyErr_Format(ZstdError, "unable to retrieve parameter: %s", ZSTD_getErrorName(zresult)); \
return 1; \
@@ -436,7 +436,7 @@
int result; \
size_t zresult; \
ZstdCompressionParametersObject* p = (ZstdCompressionParametersObject*)(self); \
- zresult = ZSTD_CCtxParam_getParameter(p->params, param, &result); \
+ zresult = ZSTD_CCtxParams_getParameter(p->params, param, &result); \
if (ZSTD_isError(zresult)) { \
PyErr_Format(ZstdError, "unable to get compression parameter: %s", \
ZSTD_getErrorName(zresult)); \
--- a/contrib/python-zstandard/c-ext/decompressionreader.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/c-ext/decompressionreader.c Sun Sep 15 20:04:00 2019 -0700
@@ -653,6 +653,8 @@
readSize = PyBytes_GET_SIZE(readResult);
+ Py_CLEAR(readResult);
+
/* Empty read means EOF. */
if (!readSize) {
break;
--- a/contrib/python-zstandard/c-ext/python-zstandard.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/c-ext/python-zstandard.h Sun Sep 15 20:04:00 2019 -0700
@@ -16,7 +16,7 @@
#include <zdict.h>
/* Remember to change the string in zstandard/__init__ as well */
-#define PYTHON_ZSTANDARD_VERSION "0.11.0"
+#define PYTHON_ZSTANDARD_VERSION "0.12.0"
typedef enum {
compressorobj_flush_finish,
--- a/contrib/python-zstandard/make_cffi.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/make_cffi.py Sun Sep 15 20:04:00 2019 -0700
@@ -29,6 +29,8 @@
'compress/hist.c',
'compress/huf_compress.c',
'compress/zstd_compress.c',
+ 'compress/zstd_compress_literals.c',
+ 'compress/zstd_compress_sequences.c',
'compress/zstd_double_fast.c',
'compress/zstd_fast.c',
'compress/zstd_lazy.c',
@@ -119,7 +121,11 @@
os.close(fd)
try:
- process = subprocess.Popen(args + [input_file], stdout=subprocess.PIPE)
+ env = dict(os.environ)
+ if getattr(compiler, '_paths', None):
+ env['PATH'] = compiler._paths
+ process = subprocess.Popen(args + [input_file], stdout=subprocess.PIPE,
+ env=env)
output = process.communicate()[0]
ret = process.poll()
if ret:
--- a/contrib/python-zstandard/setup.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/setup.py Sun Sep 15 20:04:00 2019 -0700
@@ -100,7 +100,6 @@
'License :: OSI Approved :: BSD License',
'Programming Language :: C',
'Programming Language :: Python :: 2.7',
- 'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
--- a/contrib/python-zstandard/setup_zstd.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/setup_zstd.py Sun Sep 15 20:04:00 2019 -0700
@@ -22,6 +22,8 @@
'compress/fse_compress.c',
'compress/hist.c',
'compress/huf_compress.c',
+ 'compress/zstd_compress_literals.c',
+ 'compress/zstd_compress_sequences.c',
'compress/zstd_compress.c',
'compress/zstd_double_fast.c',
'compress/zstd_fast.c',
--- a/contrib/python-zstandard/tests/test_compressor.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/tests/test_compressor.py Sun Sep 15 20:04:00 2019 -0700
@@ -1038,7 +1038,7 @@
d = zstd.train_dictionary(8192, samples)
h = hashlib.sha1(d.as_bytes()).hexdigest()
- self.assertEqual(h, '88ca0d38332aff379d4ced166a51c280a7679aad')
+ self.assertEqual(h, '7a2e59a876db958f74257141045af8f912e00d4e')
buffer = NonClosingBytesIO()
cctx = zstd.ZstdCompressor(level=9, dict_data=d)
@@ -1056,7 +1056,7 @@
self.assertFalse(params.has_checksum)
h = hashlib.sha1(compressed).hexdigest()
- self.assertEqual(h, '8703b4316f274d26697ea5dd480f29c08e85d940')
+ self.assertEqual(h, '0a7c05635061f58039727cdbe76388c6f4cfef06')
source = b'foo' + b'bar' + (b'foo' * 16384)
@@ -1091,7 +1091,7 @@
self.assertFalse(params.has_checksum)
h = hashlib.sha1(compressed).hexdigest()
- self.assertEqual(h, '2a8111d72eb5004cdcecbdac37da9f26720d30ef')
+ self.assertEqual(h, 'dd4bb7d37c1a0235b38a2f6b462814376843ef0b')
def test_write_checksum(self):
no_checksum = NonClosingBytesIO()
--- a/contrib/python-zstandard/tests/test_data_structures.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/tests/test_data_structures.py Sun Sep 15 20:04:00 2019 -0700
@@ -100,7 +100,7 @@
strategy=zstd.STRATEGY_DFAST)
# 32-bit has slightly different values from 64-bit.
- self.assertAlmostEqual(p.estimated_compression_context_size(), 1294072,
+ self.assertAlmostEqual(p.estimated_compression_context_size(), 1294144,
delta=250)
def test_strategy(self):
--- a/contrib/python-zstandard/tests/test_module_attributes.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/tests/test_module_attributes.py Sun Sep 15 20:04:00 2019 -0700
@@ -12,9 +12,9 @@
@make_cffi
class TestModuleAttributes(unittest.TestCase):
def test_version(self):
- self.assertEqual(zstd.ZSTD_VERSION, (1, 3, 8))
+ self.assertEqual(zstd.ZSTD_VERSION, (1, 4, 3))
- self.assertEqual(zstd.__version__, '0.11.0')
+ self.assertEqual(zstd.__version__, '0.12.0')
def test_constants(self):
self.assertEqual(zstd.MAX_COMPRESSION_LEVEL, 22)
--- a/contrib/python-zstandard/tests/test_train_dictionary.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/tests/test_train_dictionary.py Sun Sep 15 20:04:00 2019 -0700
@@ -7,6 +7,7 @@
from . common import (
generate_samples,
make_cffi,
+ random_input_data,
)
if sys.version_info[0] >= 3:
@@ -29,7 +30,7 @@
zstd.train_dictionary(8192, [u'foo'])
def test_no_params(self):
- d = zstd.train_dictionary(8192, generate_samples())
+ d = zstd.train_dictionary(8192, random_input_data())
self.assertIsInstance(d.dict_id(), int_type)
# The dictionary ID may be different across platforms.
--- a/contrib/python-zstandard/zstandard/__init__.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstandard/__init__.py Sun Sep 15 20:04:00 2019 -0700
@@ -62,4 +62,4 @@
'cext, or cffi' % _module_policy)
# Keep this in sync with python-zstandard.h.
-__version__ = '0.11.0'
+__version__ = '0.12.0'
--- a/contrib/python-zstandard/zstandard/cffi.py Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstandard/cffi.py Sun Sep 15 20:04:00 2019 -0700
@@ -416,7 +416,7 @@
def _set_compression_parameter(params, param, value):
- zresult = lib.ZSTD_CCtxParam_setParameter(params, param, value)
+ zresult = lib.ZSTD_CCtxParams_setParameter(params, param, value)
if lib.ZSTD_isError(zresult):
raise ZstdError('unable to set compression context parameter: %s' %
_zstd_error(zresult))
@@ -425,7 +425,7 @@
def _get_compression_parameter(params, param):
result = ffi.new('int *')
- zresult = lib.ZSTD_CCtxParam_getParameter(params, param, result)
+ zresult = lib.ZSTD_CCtxParams_getParameter(params, param, result)
if lib.ZSTD_isError(zresult):
raise ZstdError('unable to get compression context parameter: %s' %
_zstd_error(zresult))
--- a/contrib/python-zstandard/zstd.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd.c Sun Sep 15 20:04:00 2019 -0700
@@ -210,7 +210,7 @@
We detect this mismatch here and refuse to load the module if this
scenario is detected.
*/
- if (ZSTD_VERSION_NUMBER != 10308 || ZSTD_versionNumber() != 10308) {
+ if (ZSTD_VERSION_NUMBER != 10403 || ZSTD_versionNumber() != 10403) {
PyErr_SetString(PyExc_ImportError, "zstd C API mismatch; Python bindings not compiled against expected zstd version");
return;
}
--- a/contrib/python-zstandard/zstd/common/bitstream.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/common/bitstream.h Sun Sep 15 20:04:00 2019 -0700
@@ -57,6 +57,8 @@
=========================================*/
#if defined(__BMI__) && defined(__GNUC__)
# include <immintrin.h> /* support for bextr (experimental) */
+#elif defined(__ICCARM__)
+# include <intrinsics.h>
#endif
#define STREAM_ACCUMULATOR_MIN_32 25
@@ -163,6 +165,8 @@
return (unsigned) r;
# elif defined(__GNUC__) && (__GNUC__ >= 3) /* Use GCC Intrinsic */
return 31 - __builtin_clz (val);
+# elif defined(__ICCARM__) /* IAR Intrinsic */
+ return 31 - __CLZ(val);
# else /* Software version */
static const unsigned DeBruijnClz[32] = { 0, 9, 1, 10, 13, 21, 2, 29,
11, 14, 16, 18, 22, 25, 3, 30,
--- a/contrib/python-zstandard/zstd/common/compiler.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/common/compiler.h Sun Sep 15 20:04:00 2019 -0700
@@ -23,7 +23,7 @@
# define INLINE_KEYWORD
#endif
-#if defined(__GNUC__)
+#if defined(__GNUC__) || defined(__ICCARM__)
# define FORCE_INLINE_ATTR __attribute__((always_inline))
#elif defined(_MSC_VER)
# define FORCE_INLINE_ATTR __forceinline
@@ -40,7 +40,7 @@
/**
* FORCE_INLINE_TEMPLATE is used to define C "templates", which take constant
- * parameters. They must be inlined for the compiler to elimininate the constant
+ * parameters. They must be inlined for the compiler to eliminate the constant
* branches.
*/
#define FORCE_INLINE_TEMPLATE static INLINE_KEYWORD FORCE_INLINE_ATTR
@@ -65,7 +65,7 @@
#ifdef _MSC_VER
# define FORCE_NOINLINE static __declspec(noinline)
#else
-# ifdef __GNUC__
+# if defined(__GNUC__) || defined(__ICCARM__)
# define FORCE_NOINLINE static __attribute__((__noinline__))
# else
# define FORCE_NOINLINE static
@@ -76,7 +76,7 @@
#ifndef __has_attribute
#define __has_attribute(x) 0 /* Compatibility with non-clang compilers. */
#endif
-#if defined(__GNUC__)
+#if defined(__GNUC__) || defined(__ICCARM__)
# define TARGET_ATTRIBUTE(target) __attribute__((__target__(target)))
#else
# define TARGET_ATTRIBUTE(target)
@@ -127,6 +127,13 @@
} \
}
+/* vectorization */
+#if !defined(__clang__) && defined(__GNUC__)
+# define DONT_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
+#else
+# define DONT_VECTORIZE
+#endif
+
/* disable warnings */
#ifdef _MSC_VER /* Visual Studio */
# include <intrin.h> /* For Visual 2005 */
--- a/contrib/python-zstandard/zstd/common/fse.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/common/fse.h Sun Sep 15 20:04:00 2019 -0700
@@ -358,7 +358,7 @@
typedef enum {
FSE_repeat_none, /**< Cannot use the previous table */
FSE_repeat_check, /**< Can use the previous table but it must be checked */
- FSE_repeat_valid /**< Can use the previous table and it is asumed to be valid */
+ FSE_repeat_valid /**< Can use the previous table and it is assumed to be valid */
} FSE_repeat;
/* *****************************************
--- a/contrib/python-zstandard/zstd/common/mem.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/common/mem.h Sun Sep 15 20:04:00 2019 -0700
@@ -102,7 +102,7 @@
#ifndef MEM_FORCE_MEMORY_ACCESS /* can be defined externally, on command line for example */
# if defined(__GNUC__) && ( defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) )
# define MEM_FORCE_MEMORY_ACCESS 2
-# elif defined(__INTEL_COMPILER) || defined(__GNUC__)
+# elif defined(__INTEL_COMPILER) || defined(__GNUC__) || defined(__ICCARM__)
# define MEM_FORCE_MEMORY_ACCESS 1
# endif
#endif
--- a/contrib/python-zstandard/zstd/common/threading.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/common/threading.c Sun Sep 15 20:04:00 2019 -0700
@@ -14,8 +14,8 @@
* This file will hold wrapper for systems, which do not support pthreads
*/
-/* create fake symbol to avoid empty trnaslation unit warning */
-int g_ZSTD_threading_useles_symbol;
+/* create fake symbol to avoid empty translation unit warning */
+int g_ZSTD_threading_useless_symbol;
#if defined(ZSTD_MULTITHREAD) && defined(_WIN32)
--- a/contrib/python-zstandard/zstd/common/xxhash.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/common/xxhash.c Sun Sep 15 20:04:00 2019 -0700
@@ -53,7 +53,8 @@
# if defined(__GNUC__) && ( defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) )
# define XXH_FORCE_MEMORY_ACCESS 2
# elif (defined(__INTEL_COMPILER) && !defined(WIN32)) || \
- (defined(__GNUC__) && ( defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7S__) ))
+ (defined(__GNUC__) && ( defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7S__) )) || \
+ defined(__ICCARM__)
# define XXH_FORCE_MEMORY_ACCESS 1
# endif
#endif
@@ -66,10 +67,10 @@
/* #define XXH_ACCEPT_NULL_INPUT_POINTER 1 */
/*!XXH_FORCE_NATIVE_FORMAT :
- * By default, xxHash library provides endian-independant Hash values, based on little-endian convention.
+ * By default, xxHash library provides endian-independent Hash values, based on little-endian convention.
* Results are therefore identical for little-endian and big-endian CPU.
* This comes at a performance cost for big-endian CPU, since some swapping is required to emulate little-endian format.
- * Should endian-independance be of no importance for your application, you may set the #define below to 1,
+ * Should endian-independence be of no importance for your application, you may set the #define below to 1,
* to improve speed for Big-endian CPU.
* This option has no impact on Little_Endian CPU.
*/
@@ -120,7 +121,7 @@
# define INLINE_KEYWORD
#endif
-#if defined(__GNUC__)
+#if defined(__GNUC__) || defined(__ICCARM__)
# define FORCE_INLINE_ATTR __attribute__((always_inline))
#elif defined(_MSC_VER)
# define FORCE_INLINE_ATTR __forceinline
@@ -206,7 +207,12 @@
# define XXH_rotl32(x,r) _rotl(x,r)
# define XXH_rotl64(x,r) _rotl64(x,r)
#else
+#if defined(__ICCARM__)
+# include <intrinsics.h>
+# define XXH_rotl32(x,r) __ROR(x,(32 - r))
+#else
# define XXH_rotl32(x,r) ((x << r) | (x >> (32 - r)))
+#endif
# define XXH_rotl64(x,r) ((x << r) | (x >> (64 - r)))
#endif
--- a/contrib/python-zstandard/zstd/common/zstd_internal.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/common/zstd_internal.h Sun Sep 15 20:04:00 2019 -0700
@@ -34,7 +34,6 @@
#endif
#include "xxhash.h" /* XXH_reset, update, digest */
-
#if defined (__cplusplus)
extern "C" {
#endif
@@ -53,8 +52,50 @@
#undef MAX
#define MIN(a,b) ((a)<(b) ? (a) : (b))
#define MAX(a,b) ((a)>(b) ? (a) : (b))
-#define CHECK_F(f) { size_t const errcod = f; if (ERR_isError(errcod)) return errcod; } /* check and Forward error code */
-#define CHECK_E(f, e) { size_t const errcod = f; if (ERR_isError(errcod)) return ERROR(e); } /* check and send Error code */
+
+/**
+ * Return the specified error if the condition evaluates to true.
+ *
+ * In debug modes, prints additional information.
+ * In order to do that (particularly, printing the conditional that failed),
+ * this can't just wrap RETURN_ERROR().
+ */
+#define RETURN_ERROR_IF(cond, err, ...) \
+ if (cond) { \
+ RAWLOG(3, "%s:%d: ERROR!: check %s failed, returning %s", __FILE__, __LINE__, ZSTD_QUOTE(cond), ZSTD_QUOTE(ERROR(err))); \
+ RAWLOG(3, ": " __VA_ARGS__); \
+ RAWLOG(3, "\n"); \
+ return ERROR(err); \
+ }
+
+/**
+ * Unconditionally return the specified error.
+ *
+ * In debug modes, prints additional information.
+ */
+#define RETURN_ERROR(err, ...) \
+ do { \
+ RAWLOG(3, "%s:%d: ERROR!: unconditional check failed, returning %s", __FILE__, __LINE__, ZSTD_QUOTE(ERROR(err))); \
+ RAWLOG(3, ": " __VA_ARGS__); \
+ RAWLOG(3, "\n"); \
+ return ERROR(err); \
+ } while(0);
+
+/**
+ * If the provided expression evaluates to an error code, returns that error code.
+ *
+ * In debug modes, prints additional information.
+ */
+#define FORWARD_IF_ERROR(err, ...) \
+ do { \
+ size_t const err_code = (err); \
+ if (ERR_isError(err_code)) { \
+ RAWLOG(3, "%s:%d: ERROR!: forwarding error in %s: %s", __FILE__, __LINE__, ZSTD_QUOTE(err), ERR_getErrorName(err_code)); \
+ RAWLOG(3, ": " __VA_ARGS__); \
+ RAWLOG(3, "\n"); \
+ return err_code; \
+ } \
+ } while(0);
/*-*************************************
@@ -151,19 +192,72 @@
* Shared functions to include for inlining
*********************************************/
static void ZSTD_copy8(void* dst, const void* src) { memcpy(dst, src, 8); }
+
#define COPY8(d,s) { ZSTD_copy8(d,s); d+=8; s+=8; }
+static void ZSTD_copy16(void* dst, const void* src) { memcpy(dst, src, 16); }
+#define COPY16(d,s) { ZSTD_copy16(d,s); d+=16; s+=16; }
+
+#define WILDCOPY_OVERLENGTH 8
+#define VECLEN 16
+
+typedef enum {
+ ZSTD_no_overlap,
+ ZSTD_overlap_src_before_dst,
+ /* ZSTD_overlap_dst_before_src, */
+} ZSTD_overlap_e;
/*! ZSTD_wildcopy() :
* custom version of memcpy(), can overwrite up to WILDCOPY_OVERLENGTH bytes (if length==0) */
-#define WILDCOPY_OVERLENGTH 8
-MEM_STATIC void ZSTD_wildcopy(void* dst, const void* src, ptrdiff_t length)
+MEM_STATIC FORCE_INLINE_ATTR DONT_VECTORIZE
+void ZSTD_wildcopy(void* dst, const void* src, ptrdiff_t length, ZSTD_overlap_e ovtype)
{
+ ptrdiff_t diff = (BYTE*)dst - (const BYTE*)src;
const BYTE* ip = (const BYTE*)src;
BYTE* op = (BYTE*)dst;
BYTE* const oend = op + length;
- do
- COPY8(op, ip)
- while (op < oend);
+
+ assert(diff >= 8 || (ovtype == ZSTD_no_overlap && diff < -8));
+ if (length < VECLEN || (ovtype == ZSTD_overlap_src_before_dst && diff < VECLEN)) {
+ do
+ COPY8(op, ip)
+ while (op < oend);
+ }
+ else {
+ if ((length & 8) == 0)
+ COPY8(op, ip);
+ do {
+ COPY16(op, ip);
+ }
+ while (op < oend);
+ }
+}
+
+/*! ZSTD_wildcopy_16min() :
+ * same semantics as ZSTD_wildcopy() except guaranteed to be able to copy 16 bytes at the start */
+MEM_STATIC FORCE_INLINE_ATTR DONT_VECTORIZE
+void ZSTD_wildcopy_16min(void* dst, const void* src, ptrdiff_t length, ZSTD_overlap_e ovtype)
+{
+ ptrdiff_t diff = (BYTE*)dst - (const BYTE*)src;
+ const BYTE* ip = (const BYTE*)src;
+ BYTE* op = (BYTE*)dst;
+ BYTE* const oend = op + length;
+
+ assert(length >= 8);
+ assert(diff >= 8 || (ovtype == ZSTD_no_overlap && diff < -8));
+
+ if (ovtype == ZSTD_overlap_src_before_dst && diff < VECLEN) {
+ do
+ COPY8(op, ip)
+ while (op < oend);
+ }
+ else {
+ if ((length & 8) == 0)
+ COPY8(op, ip);
+ do {
+ COPY16(op, ip);
+ }
+ while (op < oend);
+ }
}
MEM_STATIC void ZSTD_wildcopy_e(void* dst, const void* src, void* dstEnd) /* should be faster for decoding, but strangely, not verified on all platform */
@@ -200,6 +294,17 @@
U32 longLengthPos;
} seqStore_t;
+/**
+ * Contains the compressed frame size and an upper-bound for the decompressed frame size.
+ * Note: before using `compressedSize`, check for errors using ZSTD_isError().
+ * similarly, before using `decompressedBound`, check for errors using:
+ * `decompressedBound != ZSTD_CONTENTSIZE_ERROR`
+ */
+typedef struct {
+ size_t compressedSize;
+ unsigned long long decompressedBound;
+} ZSTD_frameSizeInfo; /* decompress & legacy */
+
const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx); /* compress & dictBuilder */
void ZSTD_seqToCodes(const seqStore_t* seqStorePtr); /* compress, dictBuilder, decodeCorpus (shouldn't get its definition from here) */
@@ -219,6 +324,8 @@
return (unsigned)r;
# elif defined(__GNUC__) && (__GNUC__ >= 3) /* GCC Intrinsic */
return 31 - __builtin_clz(val);
+# elif defined(__ICCARM__) /* IAR Intrinsic */
+ return 31 - __CLZ(val);
# else /* Software version */
static const U32 DeBruijnClz[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 };
U32 v = val;
--- a/contrib/python-zstandard/zstd/compress/fse_compress.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/fse_compress.c Sun Sep 15 20:04:00 2019 -0700
@@ -129,9 +129,9 @@
{ U32 position = 0;
U32 symbol;
for (symbol=0; symbol<=maxSymbolValue; symbol++) {
- int nbOccurences;
+ int nbOccurrences;
int const freq = normalizedCounter[symbol];
- for (nbOccurences=0; nbOccurences<freq; nbOccurences++) {
+ for (nbOccurrences=0; nbOccurrences<freq; nbOccurrences++) {
tableSymbol[position] = (FSE_FUNCTION_TYPE)symbol;
position = (position + step) & tableMask;
while (position > highThreshold)
--- a/contrib/python-zstandard/zstd/compress/zstd_compress.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstd_compress.c Sun Sep 15 20:04:00 2019 -0700
@@ -21,6 +21,8 @@
#define HUF_STATIC_LINKING_ONLY
#include "huf.h"
#include "zstd_compress_internal.h"
+#include "zstd_compress_sequences.h"
+#include "zstd_compress_literals.h"
#include "zstd_fast.h"
#include "zstd_double_fast.h"
#include "zstd_lazy.h"
@@ -103,12 +105,31 @@
return cctx;
}
+/**
+ * Clears and frees all of the dictionaries in the CCtx.
+ */
+static void ZSTD_clearAllDicts(ZSTD_CCtx* cctx)
+{
+ ZSTD_free(cctx->localDict.dictBuffer, cctx->customMem);
+ ZSTD_freeCDict(cctx->localDict.cdict);
+ memset(&cctx->localDict, 0, sizeof(cctx->localDict));
+ memset(&cctx->prefixDict, 0, sizeof(cctx->prefixDict));
+ cctx->cdict = NULL;
+}
+
+static size_t ZSTD_sizeof_localDict(ZSTD_localDict dict)
+{
+ size_t const bufferSize = dict.dictBuffer != NULL ? dict.dictSize : 0;
+ size_t const cdictSize = ZSTD_sizeof_CDict(dict.cdict);
+ return bufferSize + cdictSize;
+}
+
static void ZSTD_freeCCtxContent(ZSTD_CCtx* cctx)
{
assert(cctx != NULL);
assert(cctx->staticSize == 0);
ZSTD_free(cctx->workSpace, cctx->customMem); cctx->workSpace = NULL;
- ZSTD_freeCDict(cctx->cdictLocal); cctx->cdictLocal = NULL;
+ ZSTD_clearAllDicts(cctx);
#ifdef ZSTD_MULTITHREAD
ZSTDMT_freeCCtx(cctx->mtctx); cctx->mtctx = NULL;
#endif
@@ -117,7 +138,8 @@
size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx)
{
if (cctx==NULL) return 0; /* support free on NULL */
- if (cctx->staticSize) return ERROR(memory_allocation); /* not compatible with static CCtx */
+ RETURN_ERROR_IF(cctx->staticSize, memory_allocation,
+ "not compatible with static CCtx");
ZSTD_freeCCtxContent(cctx);
ZSTD_free(cctx, cctx->customMem);
return 0;
@@ -139,7 +161,7 @@
{
if (cctx==NULL) return 0; /* support sizeof on NULL */
return sizeof(*cctx) + cctx->workSpaceSize
- + ZSTD_sizeof_CDict(cctx->cdictLocal)
+ + ZSTD_sizeof_localDict(cctx->localDict)
+ ZSTD_sizeof_mtctx(cctx);
}
@@ -195,7 +217,7 @@
}
size_t ZSTD_CCtxParams_init(ZSTD_CCtx_params* cctxParams, int compressionLevel) {
- if (!cctxParams) { return ERROR(GENERIC); }
+ RETURN_ERROR_IF(!cctxParams, GENERIC);
memset(cctxParams, 0, sizeof(*cctxParams));
cctxParams->compressionLevel = compressionLevel;
cctxParams->fParams.contentSizeFlag = 1;
@@ -204,8 +226,8 @@
size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* cctxParams, ZSTD_parameters params)
{
- if (!cctxParams) { return ERROR(GENERIC); }
- CHECK_F( ZSTD_checkCParams(params.cParams) );
+ RETURN_ERROR_IF(!cctxParams, GENERIC);
+ FORWARD_IF_ERROR( ZSTD_checkCParams(params.cParams) );
memset(cctxParams, 0, sizeof(*cctxParams));
cctxParams->cParams = params.cParams;
cctxParams->fParams = params.fParams;
@@ -359,6 +381,17 @@
bounds.upperBound = ZSTD_dictForceCopy; /* note : how to ensure at compile time that this is the highest value enum ? */
return bounds;
+ case ZSTD_c_literalCompressionMode:
+ ZSTD_STATIC_ASSERT(ZSTD_lcm_auto < ZSTD_lcm_huffman && ZSTD_lcm_huffman < ZSTD_lcm_uncompressed);
+ bounds.lowerBound = ZSTD_lcm_auto;
+ bounds.upperBound = ZSTD_lcm_uncompressed;
+ return bounds;
+
+ case ZSTD_c_targetCBlockSize:
+ bounds.lowerBound = ZSTD_TARGETCBLOCKSIZE_MIN;
+ bounds.upperBound = ZSTD_TARGETCBLOCKSIZE_MAX;
+ return bounds;
+
default:
{ ZSTD_bounds const boundError = { ERROR(parameter_unsupported), 0, 0 };
return boundError;
@@ -366,22 +399,22 @@
}
}
-/* ZSTD_cParam_withinBounds:
- * @return 1 if value is within cParam bounds,
- * 0 otherwise */
-static int ZSTD_cParam_withinBounds(ZSTD_cParameter cParam, int value)
+/* ZSTD_cParam_clampBounds:
+ * Clamps the value into the bounded range.
+ */
+static size_t ZSTD_cParam_clampBounds(ZSTD_cParameter cParam, int* value)
{
ZSTD_bounds const bounds = ZSTD_cParam_getBounds(cParam);
- if (ZSTD_isError(bounds.error)) return 0;
- if (value < bounds.lowerBound) return 0;
- if (value > bounds.upperBound) return 0;
- return 1;
+ if (ZSTD_isError(bounds.error)) return bounds.error;
+ if (*value < bounds.lowerBound) *value = bounds.lowerBound;
+ if (*value > bounds.upperBound) *value = bounds.upperBound;
+ return 0;
}
-#define BOUNDCHECK(cParam, val) { \
- if (!ZSTD_cParam_withinBounds(cParam,val)) { \
- return ERROR(parameter_outOfBound); \
-} }
+#define BOUNDCHECK(cParam, val) { \
+ RETURN_ERROR_IF(!ZSTD_cParam_withinBounds(cParam,val), \
+ parameter_outOfBound); \
+}
static int ZSTD_isUpdateAuthorized(ZSTD_cParameter param)
@@ -413,6 +446,8 @@
case ZSTD_c_ldmBucketSizeLog:
case ZSTD_c_ldmHashRateLog:
case ZSTD_c_forceAttachDict:
+ case ZSTD_c_literalCompressionMode:
+ case ZSTD_c_targetCBlockSize:
default:
return 0;
}
@@ -425,18 +460,17 @@
if (ZSTD_isUpdateAuthorized(param)) {
cctx->cParamsChanged = 1;
} else {
- return ERROR(stage_wrong);
+ RETURN_ERROR(stage_wrong);
} }
switch(param)
{
- case ZSTD_c_format :
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
+ case ZSTD_c_nbWorkers:
+ RETURN_ERROR_IF((value!=0) && cctx->staticSize, parameter_unsupported,
+ "MT not compatible with static alloc");
+ break;
case ZSTD_c_compressionLevel:
- if (cctx->cdict) return ERROR(stage_wrong);
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
-
case ZSTD_c_windowLog:
case ZSTD_c_hashLog:
case ZSTD_c_chainLog:
@@ -444,49 +478,33 @@
case ZSTD_c_minMatch:
case ZSTD_c_targetLength:
case ZSTD_c_strategy:
- if (cctx->cdict) return ERROR(stage_wrong);
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
-
+ case ZSTD_c_ldmHashRateLog:
+ case ZSTD_c_format:
case ZSTD_c_contentSizeFlag:
case ZSTD_c_checksumFlag:
case ZSTD_c_dictIDFlag:
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
-
- case ZSTD_c_forceMaxWindow : /* Force back-references to remain < windowSize,
- * even when referencing into Dictionary content.
- * default : 0 when using a CDict, 1 when using a Prefix */
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
-
+ case ZSTD_c_forceMaxWindow:
case ZSTD_c_forceAttachDict:
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
-
- case ZSTD_c_nbWorkers:
- if ((value!=0) && cctx->staticSize) {
- return ERROR(parameter_unsupported); /* MT not compatible with static alloc */
- }
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
-
+ case ZSTD_c_literalCompressionMode:
case ZSTD_c_jobSize:
case ZSTD_c_overlapLog:
case ZSTD_c_rsyncable:
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
-
case ZSTD_c_enableLongDistanceMatching:
case ZSTD_c_ldmHashLog:
case ZSTD_c_ldmMinMatch:
case ZSTD_c_ldmBucketSizeLog:
- case ZSTD_c_ldmHashRateLog:
- if (cctx->cdict) return ERROR(stage_wrong);
- return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
-
- default: return ERROR(parameter_unsupported);
+ case ZSTD_c_targetCBlockSize:
+ break;
+
+ default: RETURN_ERROR(parameter_unsupported);
}
+ return ZSTD_CCtxParams_setParameter(&cctx->requestedParams, param, value);
}
-size_t ZSTD_CCtxParam_setParameter(ZSTD_CCtx_params* CCtxParams,
- ZSTD_cParameter param, int value)
+size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params* CCtxParams,
+ ZSTD_cParameter param, int value)
{
- DEBUGLOG(4, "ZSTD_CCtxParam_setParameter (%i, %i)", (int)param, value);
+ DEBUGLOG(4, "ZSTD_CCtxParams_setParameter (%i, %i)", (int)param, value);
switch(param)
{
case ZSTD_c_format :
@@ -495,11 +513,9 @@
return (size_t)CCtxParams->format;
case ZSTD_c_compressionLevel : {
- int cLevel = value;
- if (cLevel > ZSTD_maxCLevel()) cLevel = ZSTD_maxCLevel();
- if (cLevel < ZSTD_minCLevel()) cLevel = ZSTD_minCLevel();
- if (cLevel) { /* 0 : does not change current level */
- CCtxParams->compressionLevel = cLevel;
+ FORWARD_IF_ERROR(ZSTD_cParam_clampBounds(param, &value));
+ if (value) { /* 0 : does not change current level */
+ CCtxParams->compressionLevel = value;
}
if (CCtxParams->compressionLevel >= 0) return CCtxParams->compressionLevel;
return 0; /* return type (size_t) cannot represent negative values */
@@ -573,33 +589,55 @@
return CCtxParams->attachDictPref;
}
+ case ZSTD_c_literalCompressionMode : {
+ const ZSTD_literalCompressionMode_e lcm = (ZSTD_literalCompressionMode_e)value;
+ BOUNDCHECK(ZSTD_c_literalCompressionMode, lcm);
+ CCtxParams->literalCompressionMode = lcm;
+ return CCtxParams->literalCompressionMode;
+ }
+
case ZSTD_c_nbWorkers :
#ifndef ZSTD_MULTITHREAD
- if (value!=0) return ERROR(parameter_unsupported);
+ RETURN_ERROR_IF(value!=0, parameter_unsupported, "not compiled with multithreading");
return 0;
#else
- return ZSTDMT_CCtxParam_setNbWorkers(CCtxParams, value);
+ FORWARD_IF_ERROR(ZSTD_cParam_clampBounds(param, &value));
+ CCtxParams->nbWorkers = value;
+ return CCtxParams->nbWorkers;
#endif
case ZSTD_c_jobSize :
#ifndef ZSTD_MULTITHREAD
- return ERROR(parameter_unsupported);
+ RETURN_ERROR_IF(value!=0, parameter_unsupported, "not compiled with multithreading");
+ return 0;
#else
- return ZSTDMT_CCtxParam_setMTCtxParameter(CCtxParams, ZSTDMT_p_jobSize, value);
+ /* Adjust to the minimum non-default value. */
+ if (value != 0 && value < ZSTDMT_JOBSIZE_MIN)
+ value = ZSTDMT_JOBSIZE_MIN;
+ FORWARD_IF_ERROR(ZSTD_cParam_clampBounds(param, &value));
+ assert(value >= 0);
+ CCtxParams->jobSize = value;
+ return CCtxParams->jobSize;
#endif
case ZSTD_c_overlapLog :
#ifndef ZSTD_MULTITHREAD
- return ERROR(parameter_unsupported);
+ RETURN_ERROR_IF(value!=0, parameter_unsupported, "not compiled with multithreading");
+ return 0;
#else
- return ZSTDMT_CCtxParam_setMTCtxParameter(CCtxParams, ZSTDMT_p_overlapLog, value);
+ FORWARD_IF_ERROR(ZSTD_cParam_clampBounds(ZSTD_c_overlapLog, &value));
+ CCtxParams->overlapLog = value;
+ return CCtxParams->overlapLog;
#endif
case ZSTD_c_rsyncable :
#ifndef ZSTD_MULTITHREAD
- return ERROR(parameter_unsupported);
+ RETURN_ERROR_IF(value!=0, parameter_unsupported, "not compiled with multithreading");
+ return 0;
#else
- return ZSTDMT_CCtxParam_setMTCtxParameter(CCtxParams, ZSTDMT_p_rsyncable, value);
+ FORWARD_IF_ERROR(ZSTD_cParam_clampBounds(ZSTD_c_overlapLog, &value));
+ CCtxParams->rsyncable = value;
+ return CCtxParams->rsyncable;
#endif
case ZSTD_c_enableLongDistanceMatching :
@@ -625,21 +663,27 @@
return CCtxParams->ldmParams.bucketSizeLog;
case ZSTD_c_ldmHashRateLog :
- if (value > ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN)
- return ERROR(parameter_outOfBound);
+ RETURN_ERROR_IF(value > ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN,
+ parameter_outOfBound);
CCtxParams->ldmParams.hashRateLog = value;
return CCtxParams->ldmParams.hashRateLog;
- default: return ERROR(parameter_unsupported);
+ case ZSTD_c_targetCBlockSize :
+ if (value!=0) /* 0 ==> default */
+ BOUNDCHECK(ZSTD_c_targetCBlockSize, value);
+ CCtxParams->targetCBlockSize = value;
+ return CCtxParams->targetCBlockSize;
+
+ default: RETURN_ERROR(parameter_unsupported, "unknown parameter");
}
}
size_t ZSTD_CCtx_getParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int* value)
{
- return ZSTD_CCtxParam_getParameter(&cctx->requestedParams, param, value);
+ return ZSTD_CCtxParams_getParameter(&cctx->requestedParams, param, value);
}
-size_t ZSTD_CCtxParam_getParameter(
+size_t ZSTD_CCtxParams_getParameter(
ZSTD_CCtx_params* CCtxParams, ZSTD_cParameter param, int* value)
{
switch(param)
@@ -651,13 +695,13 @@
*value = CCtxParams->compressionLevel;
break;
case ZSTD_c_windowLog :
- *value = CCtxParams->cParams.windowLog;
+ *value = (int)CCtxParams->cParams.windowLog;
break;
case ZSTD_c_hashLog :
- *value = CCtxParams->cParams.hashLog;
+ *value = (int)CCtxParams->cParams.hashLog;
break;
case ZSTD_c_chainLog :
- *value = CCtxParams->cParams.chainLog;
+ *value = (int)CCtxParams->cParams.chainLog;
break;
case ZSTD_c_searchLog :
*value = CCtxParams->cParams.searchLog;
@@ -686,6 +730,9 @@
case ZSTD_c_forceAttachDict :
*value = CCtxParams->attachDictPref;
break;
+ case ZSTD_c_literalCompressionMode :
+ *value = CCtxParams->literalCompressionMode;
+ break;
case ZSTD_c_nbWorkers :
#ifndef ZSTD_MULTITHREAD
assert(CCtxParams->nbWorkers == 0);
@@ -694,7 +741,7 @@
break;
case ZSTD_c_jobSize :
#ifndef ZSTD_MULTITHREAD
- return ERROR(parameter_unsupported);
+ RETURN_ERROR(parameter_unsupported, "not compiled with multithreading");
#else
assert(CCtxParams->jobSize <= INT_MAX);
*value = (int)CCtxParams->jobSize;
@@ -702,14 +749,14 @@
#endif
case ZSTD_c_overlapLog :
#ifndef ZSTD_MULTITHREAD
- return ERROR(parameter_unsupported);
+ RETURN_ERROR(parameter_unsupported, "not compiled with multithreading");
#else
*value = CCtxParams->overlapLog;
break;
#endif
case ZSTD_c_rsyncable :
#ifndef ZSTD_MULTITHREAD
- return ERROR(parameter_unsupported);
+ RETURN_ERROR(parameter_unsupported, "not compiled with multithreading");
#else
*value = CCtxParams->rsyncable;
break;
@@ -729,7 +776,10 @@
case ZSTD_c_ldmHashRateLog :
*value = CCtxParams->ldmParams.hashRateLog;
break;
- default: return ERROR(parameter_unsupported);
+ case ZSTD_c_targetCBlockSize :
+ *value = (int)CCtxParams->targetCBlockSize;
+ break;
+ default: RETURN_ERROR(parameter_unsupported, "unknown parameter");
}
return 0;
}
@@ -745,8 +795,8 @@
ZSTD_CCtx* cctx, const ZSTD_CCtx_params* params)
{
DEBUGLOG(4, "ZSTD_CCtx_setParametersUsingCCtxParams");
- if (cctx->streamStage != zcss_init) return ERROR(stage_wrong);
- if (cctx->cdict) return ERROR(stage_wrong);
+ RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong);
+ RETURN_ERROR_IF(cctx->cdict, stage_wrong);
cctx->requestedParams = *params;
return 0;
@@ -755,33 +805,71 @@
ZSTDLIB_API size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long pledgedSrcSize)
{
DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %u bytes", (U32)pledgedSrcSize);
- if (cctx->streamStage != zcss_init) return ERROR(stage_wrong);
+ RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong);
cctx->pledgedSrcSizePlusOne = pledgedSrcSize+1;
return 0;
}
+/**
+ * Initializes the local dict using the requested parameters.
+ * NOTE: This does not use the pledged src size, because it may be used for more
+ * than one compression.
+ */
+static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx)
+{
+ ZSTD_localDict* const dl = &cctx->localDict;
+ ZSTD_compressionParameters const cParams = ZSTD_getCParamsFromCCtxParams(
+ &cctx->requestedParams, 0, dl->dictSize);
+ if (dl->dict == NULL) {
+ /* No local dictionary. */
+ assert(dl->dictBuffer == NULL);
+ assert(dl->cdict == NULL);
+ assert(dl->dictSize == 0);
+ return 0;
+ }
+ if (dl->cdict != NULL) {
+ assert(cctx->cdict == dl->cdict);
+ /* Local dictionary already initialized. */
+ return 0;
+ }
+ assert(dl->dictSize > 0);
+ assert(cctx->cdict == NULL);
+ assert(cctx->prefixDict.dict == NULL);
+
+ dl->cdict = ZSTD_createCDict_advanced(
+ dl->dict,
+ dl->dictSize,
+ ZSTD_dlm_byRef,
+ dl->dictContentType,
+ cParams,
+ cctx->customMem);
+ RETURN_ERROR_IF(!dl->cdict, memory_allocation);
+ cctx->cdict = dl->cdict;
+ return 0;
+}
+
size_t ZSTD_CCtx_loadDictionary_advanced(
ZSTD_CCtx* cctx, const void* dict, size_t dictSize,
ZSTD_dictLoadMethod_e dictLoadMethod, ZSTD_dictContentType_e dictContentType)
{
- if (cctx->streamStage != zcss_init) return ERROR(stage_wrong);
- if (cctx->staticSize) return ERROR(memory_allocation); /* no malloc for static CCtx */
+ RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong);
+ RETURN_ERROR_IF(cctx->staticSize, memory_allocation,
+ "no malloc for static CCtx");
DEBUGLOG(4, "ZSTD_CCtx_loadDictionary_advanced (size: %u)", (U32)dictSize);
- ZSTD_freeCDict(cctx->cdictLocal); /* in case one already exists */
- if (dict==NULL || dictSize==0) { /* no dictionary mode */
- cctx->cdictLocal = NULL;
- cctx->cdict = NULL;
+ ZSTD_clearAllDicts(cctx); /* in case one already exists */
+ if (dict == NULL || dictSize == 0) /* no dictionary mode */
+ return 0;
+ if (dictLoadMethod == ZSTD_dlm_byRef) {
+ cctx->localDict.dict = dict;
} else {
- ZSTD_compressionParameters const cParams =
- ZSTD_getCParamsFromCCtxParams(&cctx->requestedParams, cctx->pledgedSrcSizePlusOne-1, dictSize);
- cctx->cdictLocal = ZSTD_createCDict_advanced(
- dict, dictSize,
- dictLoadMethod, dictContentType,
- cParams, cctx->customMem);
- cctx->cdict = cctx->cdictLocal;
- if (cctx->cdictLocal == NULL)
- return ERROR(memory_allocation);
+ void* dictBuffer = ZSTD_malloc(dictSize, cctx->customMem);
+ RETURN_ERROR_IF(!dictBuffer, memory_allocation);
+ memcpy(dictBuffer, dict, dictSize);
+ cctx->localDict.dictBuffer = dictBuffer;
+ cctx->localDict.dict = dictBuffer;
}
+ cctx->localDict.dictSize = dictSize;
+ cctx->localDict.dictContentType = dictContentType;
return 0;
}
@@ -801,9 +889,10 @@
size_t ZSTD_CCtx_refCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict)
{
- if (cctx->streamStage != zcss_init) return ERROR(stage_wrong);
+ RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong);
+ /* Free the existing local cdict (if any) to save memory. */
+ ZSTD_clearAllDicts(cctx);
cctx->cdict = cdict;
- memset(&cctx->prefixDict, 0, sizeof(cctx->prefixDict)); /* exclusive */
return 0;
}
@@ -815,8 +904,8 @@
size_t ZSTD_CCtx_refPrefix_advanced(
ZSTD_CCtx* cctx, const void* prefix, size_t prefixSize, ZSTD_dictContentType_e dictContentType)
{
- if (cctx->streamStage != zcss_init) return ERROR(stage_wrong);
- cctx->cdict = NULL; /* prefix discards any prior cdict */
+ RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong);
+ ZSTD_clearAllDicts(cctx);
cctx->prefixDict.dict = prefix;
cctx->prefixDict.dictSize = prefixSize;
cctx->prefixDict.dictContentType = dictContentType;
@@ -834,8 +923,8 @@
}
if ( (reset == ZSTD_reset_parameters)
|| (reset == ZSTD_reset_session_and_parameters) ) {
- if (cctx->streamStage != zcss_init) return ERROR(stage_wrong);
- cctx->cdict = NULL;
+ RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong);
+ ZSTD_clearAllDicts(cctx);
return ZSTD_CCtxParams_reset(&cctx->requestedParams);
}
return 0;
@@ -847,12 +936,12 @@
@return : 0, or an error code if one value is beyond authorized range */
size_t ZSTD_checkCParams(ZSTD_compressionParameters cParams)
{
- BOUNDCHECK(ZSTD_c_windowLog, cParams.windowLog);
- BOUNDCHECK(ZSTD_c_chainLog, cParams.chainLog);
- BOUNDCHECK(ZSTD_c_hashLog, cParams.hashLog);
- BOUNDCHECK(ZSTD_c_searchLog, cParams.searchLog);
- BOUNDCHECK(ZSTD_c_minMatch, cParams.minMatch);
- BOUNDCHECK(ZSTD_c_targetLength,cParams.targetLength);
+ BOUNDCHECK(ZSTD_c_windowLog, (int)cParams.windowLog);
+ BOUNDCHECK(ZSTD_c_chainLog, (int)cParams.chainLog);
+ BOUNDCHECK(ZSTD_c_hashLog, (int)cParams.hashLog);
+ BOUNDCHECK(ZSTD_c_searchLog, (int)cParams.searchLog);
+ BOUNDCHECK(ZSTD_c_minMatch, (int)cParams.minMatch);
+ BOUNDCHECK(ZSTD_c_targetLength,(int)cParams.targetLength);
BOUNDCHECK(ZSTD_c_strategy, cParams.strategy);
return 0;
}
@@ -868,7 +957,7 @@
if ((int)val<bounds.lowerBound) val=(type)bounds.lowerBound; \
else if ((int)val>bounds.upperBound) val=(type)bounds.upperBound; \
}
-# define CLAMP(cParam, val) CLAMP_TYPE(cParam, val, int)
+# define CLAMP(cParam, val) CLAMP_TYPE(cParam, val, unsigned)
CLAMP(ZSTD_c_windowLog, cParams.windowLog);
CLAMP(ZSTD_c_chainLog, cParams.chainLog);
CLAMP(ZSTD_c_hashLog, cParams.hashLog);
@@ -888,10 +977,11 @@
}
/** ZSTD_adjustCParams_internal() :
- optimize `cPar` for a given input (`srcSize` and `dictSize`).
- mostly downsizing to reduce memory consumption and initialization latency.
- Both `srcSize` and `dictSize` are optional (use 0 if unknown).
- Note : cPar is assumed validated. Use ZSTD_checkCParams() to ensure this condition. */
+ * optimize `cPar` for a specified input (`srcSize` and `dictSize`).
+ * mostly downsize to reduce memory consumption and initialization latency.
+ * `srcSize` can be ZSTD_CONTENTSIZE_UNKNOWN when not known.
+ * note : for the time being, `srcSize==0` means "unknown" too, for compatibility with older convention.
+ * condition : cPar is presumed validated (can be checked using ZSTD_checkCParams()). */
static ZSTD_compressionParameters
ZSTD_adjustCParams_internal(ZSTD_compressionParameters cPar,
unsigned long long srcSize,
@@ -901,7 +991,7 @@
static const U64 maxWindowResize = 1ULL << (ZSTD_WINDOWLOG_MAX-1);
assert(ZSTD_checkCParams(cPar)==0);
- if (dictSize && (srcSize+1<2) /* srcSize unknown */ )
+ if (dictSize && (srcSize+1<2) /* ZSTD_CONTENTSIZE_UNKNOWN and 0 mean "unknown" */ )
srcSize = minSrcSize; /* presumed small when there is a dictionary */
else if (srcSize == 0)
srcSize = ZSTD_CONTENTSIZE_UNKNOWN; /* 0 == unknown : presumed large */
@@ -922,7 +1012,7 @@
}
if (cPar.windowLog < ZSTD_WINDOWLOG_ABSOLUTEMIN)
- cPar.windowLog = ZSTD_WINDOWLOG_ABSOLUTEMIN; /* required for frame header */
+ cPar.windowLog = ZSTD_WINDOWLOG_ABSOLUTEMIN; /* minimum wlog required for valid frame header */
return cPar;
}
@@ -932,7 +1022,7 @@
unsigned long long srcSize,
size_t dictSize)
{
- cPar = ZSTD_clampCParams(cPar);
+ cPar = ZSTD_clampCParams(cPar); /* resulting cPar is necessarily valid (all parameters within range) */
return ZSTD_adjustCParams_internal(cPar, srcSize, dictSize);
}
@@ -973,8 +1063,7 @@
size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
{
- /* Estimate CCtx size is supported for single-threaded compression only. */
- if (params->nbWorkers > 0) { return ERROR(GENERIC); }
+ RETURN_ERROR_IF(params->nbWorkers > 0, GENERIC, "Estimate CCtx size is supported for single-threaded compression only.");
{ ZSTD_compressionParameters const cParams =
ZSTD_getCParamsFromCCtxParams(params, 0, 0);
size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << cParams.windowLog);
@@ -1022,10 +1111,12 @@
size_t ZSTD_estimateCStreamSize_usingCCtxParams(const ZSTD_CCtx_params* params)
{
- if (params->nbWorkers > 0) { return ERROR(GENERIC); }
- { size_t const CCtxSize = ZSTD_estimateCCtxSize_usingCCtxParams(params);
- size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << params->cParams.windowLog);
- size_t const inBuffSize = ((size_t)1 << params->cParams.windowLog) + blockSize;
+ RETURN_ERROR_IF(params->nbWorkers > 0, GENERIC, "Estimate CCtx size is supported for single-threaded compression only.");
+ { ZSTD_compressionParameters const cParams =
+ ZSTD_getCParamsFromCCtxParams(params, 0, 0);
+ size_t const CCtxSize = ZSTD_estimateCCtxSize_usingCCtxParams(params);
+ size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << cParams.windowLog);
+ size_t const inBuffSize = ((size_t)1 << cParams.windowLog) + blockSize;
size_t const outBuffSize = ZSTD_compressBound(blockSize) + 1;
size_t const streamingSize = inBuffSize + outBuffSize;
@@ -1197,15 +1288,14 @@
}
/*! ZSTD_invalidateMatchState()
- * Invalidate all the matches in the match finder tables.
- * Requires nextSrc and base to be set (can be NULL).
+ * Invalidate all the matches in the match finder tables.
+ * Requires nextSrc and base to be set (can be NULL).
*/
static void ZSTD_invalidateMatchState(ZSTD_matchState_t* ms)
{
ZSTD_window_clear(&ms->window);
ms->nextToUpdate = ms->window.dictLimit;
- ms->nextToUpdate3 = ms->window.dictLimit;
ms->loadedDictEnd = 0;
ms->opt.litLengthSum = 0; /* force reset of btopt stats */
ms->dictMatchState = NULL;
@@ -1242,15 +1332,17 @@
typedef enum { ZSTDcrp_continue, ZSTDcrp_noMemset } ZSTD_compResetPolicy_e;
+typedef enum { ZSTD_resetTarget_CDict, ZSTD_resetTarget_CCtx } ZSTD_resetTarget_e;
+
static void*
ZSTD_reset_matchState(ZSTD_matchState_t* ms,
void* ptr,
const ZSTD_compressionParameters* cParams,
- ZSTD_compResetPolicy_e const crp, U32 const forCCtx)
+ ZSTD_compResetPolicy_e const crp, ZSTD_resetTarget_e const forWho)
{
size_t const chainSize = (cParams->strategy == ZSTD_fast) ? 0 : ((size_t)1 << cParams->chainLog);
size_t const hSize = ((size_t)1) << cParams->hashLog;
- U32 const hashLog3 = (forCCtx && cParams->minMatch==3) ? MIN(ZSTD_HASHLOG3_MAX, cParams->windowLog) : 0;
+ U32 const hashLog3 = ((forWho == ZSTD_resetTarget_CCtx) && cParams->minMatch==3) ? MIN(ZSTD_HASHLOG3_MAX, cParams->windowLog) : 0;
size_t const h3Size = ((size_t)1) << hashLog3;
size_t const tableSpace = (chainSize + hSize + h3Size) * sizeof(U32);
@@ -1264,7 +1356,7 @@
ZSTD_invalidateMatchState(ms);
/* opt parser space */
- if (forCCtx && (cParams->strategy >= ZSTD_btopt)) {
+ if ((forWho == ZSTD_resetTarget_CCtx) && (cParams->strategy >= ZSTD_btopt)) {
DEBUGLOG(4, "reserving optimal parser space");
ms->opt.litFreq = (unsigned*)ptr;
ms->opt.litLengthFreq = ms->opt.litFreq + (1<<Litbits);
@@ -1292,6 +1384,19 @@
return ptr;
}
+/* ZSTD_indexTooCloseToMax() :
+ * minor optimization : prefer memset() rather than reduceIndex()
+ * which is measurably slow in some circumstances (reported for Visual Studio).
+ * Works when re-using a context for a lot of smallish inputs :
+ * if all inputs are smaller than ZSTD_INDEXOVERFLOW_MARGIN,
+ * memset() will be triggered before reduceIndex().
+ */
+#define ZSTD_INDEXOVERFLOW_MARGIN (16 MB)
+static int ZSTD_indexTooCloseToMax(ZSTD_window_t w)
+{
+ return (size_t)(w.nextSrc - w.base) > (ZSTD_CURRENT_MAX - ZSTD_INDEXOVERFLOW_MARGIN);
+}
+
#define ZSTD_WORKSPACETOOLARGE_FACTOR 3 /* define "workspace is too large" as this number of times larger than needed */
#define ZSTD_WORKSPACETOOLARGE_MAXDURATION 128 /* when workspace is continuously too large
* during at least this number of times,
@@ -1303,7 +1408,7 @@
note : `params` are assumed fully validated at this stage */
static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
ZSTD_CCtx_params params,
- U64 pledgedSrcSize,
+ U64 const pledgedSrcSize,
ZSTD_compResetPolicy_e const crp,
ZSTD_buffered_policy_e const zbuff)
{
@@ -1315,13 +1420,21 @@
if (ZSTD_equivalentParams(zc->appliedParams, params,
zc->inBuffSize,
zc->seqStore.maxNbSeq, zc->seqStore.maxNbLit,
- zbuff, pledgedSrcSize)) {
- DEBUGLOG(4, "ZSTD_equivalentParams()==1 -> continue mode (wLog1=%u, blockSize1=%zu)",
- zc->appliedParams.cParams.windowLog, zc->blockSize);
+ zbuff, pledgedSrcSize) ) {
+ DEBUGLOG(4, "ZSTD_equivalentParams()==1 -> consider continue mode");
zc->workSpaceOversizedDuration += (zc->workSpaceOversizedDuration > 0); /* if it was too large, it still is */
- if (zc->workSpaceOversizedDuration <= ZSTD_WORKSPACETOOLARGE_MAXDURATION)
+ if (zc->workSpaceOversizedDuration <= ZSTD_WORKSPACETOOLARGE_MAXDURATION) {
+ DEBUGLOG(4, "continue mode confirmed (wLog1=%u, blockSize1=%zu)",
+ zc->appliedParams.cParams.windowLog, zc->blockSize);
+ if (ZSTD_indexTooCloseToMax(zc->blockState.matchState.window)) {
+ /* prefer a reset, faster than a rescale */
+ ZSTD_reset_matchState(&zc->blockState.matchState,
+ zc->entropyWorkspace + HUF_WORKSPACE_SIZE_U32,
+ &params.cParams,
+ crp, ZSTD_resetTarget_CCtx);
+ }
return ZSTD_continueCCtx(zc, params, pledgedSrcSize);
- } }
+ } } }
DEBUGLOG(4, "ZSTD_equivalentParams()==0 -> reset CCtx");
if (params.ldmParams.enableLdm) {
@@ -1364,16 +1477,16 @@
DEBUGLOG(4, "windowSize: %zu - blockSize: %zu", windowSize, blockSize);
if (workSpaceTooSmall || workSpaceWasteful) {
- DEBUGLOG(4, "Need to resize workSpaceSize from %zuKB to %zuKB",
+ DEBUGLOG(4, "Resize workSpaceSize from %zuKB to %zuKB",
zc->workSpaceSize >> 10,
neededSpace >> 10);
- /* static cctx : no resize, error out */
- if (zc->staticSize) return ERROR(memory_allocation);
+
+ RETURN_ERROR_IF(zc->staticSize, memory_allocation, "static cctx : no resize");
zc->workSpaceSize = 0;
ZSTD_free(zc->workSpace, zc->customMem);
zc->workSpace = ZSTD_malloc(neededSpace, zc->customMem);
- if (zc->workSpace == NULL) return ERROR(memory_allocation);
+ RETURN_ERROR_IF(zc->workSpace == NULL, memory_allocation);
zc->workSpaceSize = neededSpace;
zc->workSpaceOversizedDuration = 0;
@@ -1406,7 +1519,10 @@
ZSTD_reset_compressedBlockState(zc->blockState.prevCBlock);
- ptr = zc->entropyWorkspace + HUF_WORKSPACE_SIZE_U32;
+ ptr = ZSTD_reset_matchState(&zc->blockState.matchState,
+ zc->entropyWorkspace + HUF_WORKSPACE_SIZE_U32,
+ &params.cParams,
+ crp, ZSTD_resetTarget_CCtx);
/* ldm hash table */
/* initialize bucketOffsets table later for pointer alignment */
@@ -1424,8 +1540,6 @@
}
assert(((size_t)ptr & 3) == 0); /* ensure ptr is properly aligned */
- ptr = ZSTD_reset_matchState(&zc->blockState.matchState, ptr, ¶ms.cParams, crp, /* forCCtx */ 1);
-
/* sequences storage */
zc->seqStore.maxNbSeq = maxNbSeq;
zc->seqStore.sequencesStart = (seqDef*)ptr;
@@ -1502,15 +1616,14 @@
* handled in _enforceMaxDist */
}
-static size_t ZSTD_resetCCtx_byAttachingCDict(
- ZSTD_CCtx* cctx,
- const ZSTD_CDict* cdict,
- ZSTD_CCtx_params params,
- U64 pledgedSrcSize,
- ZSTD_buffered_policy_e zbuff)
+static size_t
+ZSTD_resetCCtx_byAttachingCDict(ZSTD_CCtx* cctx,
+ const ZSTD_CDict* cdict,
+ ZSTD_CCtx_params params,
+ U64 pledgedSrcSize,
+ ZSTD_buffered_policy_e zbuff)
{
- {
- const ZSTD_compressionParameters *cdict_cParams = &cdict->matchState.cParams;
+ { const ZSTD_compressionParameters* const cdict_cParams = &cdict->matchState.cParams;
unsigned const windowLog = params.cParams.windowLog;
assert(windowLog != 0);
/* Resize working context table params for input only, since the dict
@@ -1522,8 +1635,7 @@
assert(cctx->appliedParams.cParams.strategy == cdict_cParams->strategy);
}
- {
- const U32 cdictEnd = (U32)( cdict->matchState.window.nextSrc
+ { const U32 cdictEnd = (U32)( cdict->matchState.window.nextSrc
- cdict->matchState.window.base);
const U32 cdictLen = cdictEnd - cdict->matchState.window.dictLimit;
if (cdictLen == 0) {
@@ -1540,9 +1652,9 @@
cctx->blockState.matchState.window.base + cdictEnd;
ZSTD_window_clear(&cctx->blockState.matchState.window);
}
+ /* loadedDictEnd is expressed within the referential of the active context */
cctx->blockState.matchState.loadedDictEnd = cctx->blockState.matchState.window.dictLimit;
- }
- }
+ } }
cctx->dictID = cdict->dictID;
@@ -1596,7 +1708,6 @@
ZSTD_matchState_t* dstMatchState = &cctx->blockState.matchState;
dstMatchState->window = srcMatchState->window;
dstMatchState->nextToUpdate = srcMatchState->nextToUpdate;
- dstMatchState->nextToUpdate3= srcMatchState->nextToUpdate3;
dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd;
}
@@ -1644,7 +1755,7 @@
ZSTD_buffered_policy_e zbuff)
{
DEBUGLOG(5, "ZSTD_copyCCtx_internal");
- if (srcCCtx->stage!=ZSTDcs_init) return ERROR(stage_wrong);
+ RETURN_ERROR_IF(srcCCtx->stage!=ZSTDcs_init, stage_wrong);
memcpy(&dstCCtx->customMem, &srcCCtx->customMem, sizeof(ZSTD_customMem));
{ ZSTD_CCtx_params params = dstCCtx->requestedParams;
@@ -1676,7 +1787,6 @@
ZSTD_matchState_t* dstMatchState = &dstCCtx->blockState.matchState;
dstMatchState->window = srcMatchState->window;
dstMatchState->nextToUpdate = srcMatchState->nextToUpdate;
- dstMatchState->nextToUpdate3= srcMatchState->nextToUpdate3;
dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd;
}
dstCCtx->dictID = srcCCtx->dictID;
@@ -1746,16 +1856,15 @@
/*! ZSTD_reduceIndex() :
* rescale all indexes to avoid future overflow (indexes are U32) */
-static void ZSTD_reduceIndex (ZSTD_CCtx* zc, const U32 reducerValue)
+static void ZSTD_reduceIndex (ZSTD_matchState_t* ms, ZSTD_CCtx_params const* params, const U32 reducerValue)
{
- ZSTD_matchState_t* const ms = &zc->blockState.matchState;
- { U32 const hSize = (U32)1 << zc->appliedParams.cParams.hashLog;
+ { U32 const hSize = (U32)1 << params->cParams.hashLog;
ZSTD_reduceTable(ms->hashTable, hSize, reducerValue);
}
- if (zc->appliedParams.cParams.strategy != ZSTD_fast) {
- U32 const chainSize = (U32)1 << zc->appliedParams.cParams.chainLog;
- if (zc->appliedParams.cParams.strategy == ZSTD_btlazy2)
+ if (params->cParams.strategy != ZSTD_fast) {
+ U32 const chainSize = (U32)1 << params->cParams.chainLog;
+ if (params->cParams.strategy == ZSTD_btlazy2)
ZSTD_reduceTable_btlazy2(ms->chainTable, chainSize, reducerValue);
else
ZSTD_reduceTable(ms->chainTable, chainSize, reducerValue);
@@ -1777,161 +1886,13 @@
static size_t ZSTD_noCompressBlock (void* dst, size_t dstCapacity, const void* src, size_t srcSize, U32 lastBlock)
{
U32 const cBlockHeader24 = lastBlock + (((U32)bt_raw)<<1) + (U32)(srcSize << 3);
- if (srcSize + ZSTD_blockHeaderSize > dstCapacity) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(srcSize + ZSTD_blockHeaderSize > dstCapacity,
+ dstSize_tooSmall);
MEM_writeLE24(dst, cBlockHeader24);
memcpy((BYTE*)dst + ZSTD_blockHeaderSize, src, srcSize);
return ZSTD_blockHeaderSize + srcSize;
}
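
[editor's note] The hunks above and below show a conversion repeated throughout this upgrade: bare "if (...) return ERROR(code);" checks become RETURN_ERROR_IF(), and "if (ZSTD_isError(x)) return x;" becomes FORWARD_IF_ERROR(). The vendored definitions live in zstd's internal error-handling headers; the sketch below only approximates their shape (logging details omitted) and is not the vendored code.

    /* Approximate shape of the new error macros (sketch, not the vendored
     * definitions -- see zstd's internal headers for the real ones). */
    #define RETURN_ERROR_IF(cond, err, ...) \
        if (cond) { return ERROR(err); }

    #define FORWARD_IF_ERROR(err)              \
        do {                                   \
            size_t const err_code = (err);     \
            if (ERR_isError(err_code))         \
                return err_code;               \
        } while (0)
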
-static size_t ZSTD_noCompressLiterals (void* dst, size_t dstCapacity, const void* src, size_t srcSize)
-{
- BYTE* const ostart = (BYTE* const)dst;
- U32 const flSize = 1 + (srcSize>31) + (srcSize>4095);
-
- if (srcSize + flSize > dstCapacity) return ERROR(dstSize_tooSmall);
-
- switch(flSize)
- {
- case 1: /* 2 - 1 - 5 */
- ostart[0] = (BYTE)((U32)set_basic + (srcSize<<3));
- break;
- case 2: /* 2 - 2 - 12 */
- MEM_writeLE16(ostart, (U16)((U32)set_basic + (1<<2) + (srcSize<<4)));
- break;
- case 3: /* 2 - 2 - 20 */
- MEM_writeLE32(ostart, (U32)((U32)set_basic + (3<<2) + (srcSize<<4)));
- break;
- default: /* not necessary : flSize is {1,2,3} */
- assert(0);
- }
-
- memcpy(ostart + flSize, src, srcSize);
- return srcSize + flSize;
-}
-
-static size_t ZSTD_compressRleLiteralsBlock (void* dst, size_t dstCapacity, const void* src, size_t srcSize)
-{
- BYTE* const ostart = (BYTE* const)dst;
- U32 const flSize = 1 + (srcSize>31) + (srcSize>4095);
-
- (void)dstCapacity; /* dstCapacity already guaranteed to be >=4, hence large enough */
-
- switch(flSize)
- {
- case 1: /* 2 - 1 - 5 */
- ostart[0] = (BYTE)((U32)set_rle + (srcSize<<3));
- break;
- case 2: /* 2 - 2 - 12 */
- MEM_writeLE16(ostart, (U16)((U32)set_rle + (1<<2) + (srcSize<<4)));
- break;
- case 3: /* 2 - 2 - 20 */
- MEM_writeLE32(ostart, (U32)((U32)set_rle + (3<<2) + (srcSize<<4)));
- break;
- default: /* not necessary : flSize is {1,2,3} */
- assert(0);
- }
-
- ostart[flSize] = *(const BYTE*)src;
- return flSize+1;
-}
-
-
-/* ZSTD_minGain() :
- * minimum compression required
- * to generate a compress block or a compressed literals section.
- * note : use same formula for both situations */
-static size_t ZSTD_minGain(size_t srcSize, ZSTD_strategy strat)
-{
- U32 const minlog = (strat>=ZSTD_btultra) ? (U32)(strat) - 1 : 6;
- ZSTD_STATIC_ASSERT(ZSTD_btultra == 8);
- assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, strat));
- return (srcSize >> minlog) + 2;
-}
-
-static size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const* prevHuf,
- ZSTD_hufCTables_t* nextHuf,
- ZSTD_strategy strategy, int disableLiteralCompression,
- void* dst, size_t dstCapacity,
- const void* src, size_t srcSize,
- void* workspace, size_t wkspSize,
- const int bmi2)
-{
- size_t const minGain = ZSTD_minGain(srcSize, strategy);
- size_t const lhSize = 3 + (srcSize >= 1 KB) + (srcSize >= 16 KB);
- BYTE* const ostart = (BYTE*)dst;
- U32 singleStream = srcSize < 256;
- symbolEncodingType_e hType = set_compressed;
- size_t cLitSize;
-
- DEBUGLOG(5,"ZSTD_compressLiterals (disableLiteralCompression=%i)",
- disableLiteralCompression);
-
- /* Prepare nextEntropy assuming reusing the existing table */
- memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
-
- if (disableLiteralCompression)
- return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize);
-
- /* small ? don't even attempt compression (speed opt) */
-# define COMPRESS_LITERALS_SIZE_MIN 63
- { size_t const minLitSize = (prevHuf->repeatMode == HUF_repeat_valid) ? 6 : COMPRESS_LITERALS_SIZE_MIN;
- if (srcSize <= minLitSize) return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize);
- }
-
- if (dstCapacity < lhSize+1) return ERROR(dstSize_tooSmall); /* not enough space for compression */
- { HUF_repeat repeat = prevHuf->repeatMode;
- int const preferRepeat = strategy < ZSTD_lazy ? srcSize <= 1024 : 0;
- if (repeat == HUF_repeat_valid && lhSize == 3) singleStream = 1;
- cLitSize = singleStream ? HUF_compress1X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11,
- workspace, wkspSize, (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2)
- : HUF_compress4X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11,
- workspace, wkspSize, (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2);
- if (repeat != HUF_repeat_none) {
- /* reused the existing table */
- hType = set_repeat;
- }
- }
-
- if ((cLitSize==0) | (cLitSize >= srcSize - minGain) | ERR_isError(cLitSize)) {
- memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
- return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize);
- }
- if (cLitSize==1) {
- memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
- return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, srcSize);
- }
-
- if (hType == set_compressed) {
- /* using a newly constructed table */
- nextHuf->repeatMode = HUF_repeat_check;
- }
-
- /* Build header */
- switch(lhSize)
- {
- case 3: /* 2 - 2 - 10 - 10 */
- { U32 const lhc = hType + ((!singleStream) << 2) + ((U32)srcSize<<4) + ((U32)cLitSize<<14);
- MEM_writeLE24(ostart, lhc);
- break;
- }
- case 4: /* 2 - 2 - 14 - 14 */
- { U32 const lhc = hType + (2 << 2) + ((U32)srcSize<<4) + ((U32)cLitSize<<18);
- MEM_writeLE32(ostart, lhc);
- break;
- }
- case 5: /* 2 - 2 - 18 - 18 */
- { U32 const lhc = hType + (3 << 2) + ((U32)srcSize<<4) + ((U32)cLitSize<<22);
- MEM_writeLE32(ostart, lhc);
- ostart[4] = (BYTE)(cLitSize >> 10);
- break;
- }
- default: /* not possible : lhSize is {3,4,5} */
- assert(0);
- }
- return lhSize+cLitSize;
-}
-
-
void ZSTD_seqToCodes(const seqStore_t* seqStorePtr)
{
const seqDef* const sequences = seqStorePtr->sequencesStart;
@@ -1954,418 +1915,19 @@
mlCodeTable[seqStorePtr->longLengthPos] = MaxML;
}
-
-/**
- * -log2(x / 256) lookup table for x in [0, 256).
- * If x == 0: Return 0
- * Else: Return floor(-log2(x / 256) * 256)
- */
-static unsigned const kInverseProbabiltyLog256[256] = {
- 0, 2048, 1792, 1642, 1536, 1453, 1386, 1329, 1280, 1236, 1197, 1162,
- 1130, 1100, 1073, 1047, 1024, 1001, 980, 960, 941, 923, 906, 889,
- 874, 859, 844, 830, 817, 804, 791, 779, 768, 756, 745, 734,
- 724, 714, 704, 694, 685, 676, 667, 658, 650, 642, 633, 626,
- 618, 610, 603, 595, 588, 581, 574, 567, 561, 554, 548, 542,
- 535, 529, 523, 517, 512, 506, 500, 495, 489, 484, 478, 473,
- 468, 463, 458, 453, 448, 443, 438, 434, 429, 424, 420, 415,
- 411, 407, 402, 398, 394, 390, 386, 382, 377, 373, 370, 366,
- 362, 358, 354, 350, 347, 343, 339, 336, 332, 329, 325, 322,
- 318, 315, 311, 308, 305, 302, 298, 295, 292, 289, 286, 282,
- 279, 276, 273, 270, 267, 264, 261, 258, 256, 253, 250, 247,
- 244, 241, 239, 236, 233, 230, 228, 225, 222, 220, 217, 215,
- 212, 209, 207, 204, 202, 199, 197, 194, 192, 190, 187, 185,
- 182, 180, 178, 175, 173, 171, 168, 166, 164, 162, 159, 157,
- 155, 153, 151, 149, 146, 144, 142, 140, 138, 136, 134, 132,
- 130, 128, 126, 123, 121, 119, 117, 115, 114, 112, 110, 108,
- 106, 104, 102, 100, 98, 96, 94, 93, 91, 89, 87, 85,
- 83, 82, 80, 78, 76, 74, 73, 71, 69, 67, 66, 64,
- 62, 61, 59, 57, 55, 54, 52, 50, 49, 47, 46, 44,
- 42, 41, 39, 37, 36, 34, 33, 31, 30, 28, 26, 25,
- 23, 22, 20, 19, 17, 16, 14, 13, 11, 10, 8, 7,
- 5, 4, 2, 1,
-};
-
-
-/**
- * Returns the cost in bits of encoding the distribution described by count
- * using the entropy bound.
- */
-static size_t ZSTD_entropyCost(unsigned const* count, unsigned const max, size_t const total)
-{
- unsigned cost = 0;
- unsigned s;
- for (s = 0; s <= max; ++s) {
- unsigned norm = (unsigned)((256 * count[s]) / total);
- if (count[s] != 0 && norm == 0)
- norm = 1;
- assert(count[s] < total);
- cost += count[s] * kInverseProbabiltyLog256[norm];
- }
- return cost >> 8;
-}
-
-
-/**
- * Returns the cost in bits of encoding the distribution in count using the
- * table described by norm. The max symbol support by norm is assumed >= max.
- * norm must be valid for every symbol with non-zero probability in count.
- */
-static size_t ZSTD_crossEntropyCost(short const* norm, unsigned accuracyLog,
- unsigned const* count, unsigned const max)
+static int ZSTD_disableLiteralsCompression(const ZSTD_CCtx_params* cctxParams)
{
- unsigned const shift = 8 - accuracyLog;
- size_t cost = 0;
- unsigned s;
- assert(accuracyLog <= 8);
- for (s = 0; s <= max; ++s) {
- unsigned const normAcc = norm[s] != -1 ? norm[s] : 1;
- unsigned const norm256 = normAcc << shift;
- assert(norm256 > 0);
- assert(norm256 < 256);
- cost += count[s] * kInverseProbabiltyLog256[norm256];
- }
- return cost >> 8;
-}
-
-
-static unsigned ZSTD_getFSEMaxSymbolValue(FSE_CTable const* ctable) {
- void const* ptr = ctable;
- U16 const* u16ptr = (U16 const*)ptr;
- U32 const maxSymbolValue = MEM_read16(u16ptr + 1);
- return maxSymbolValue;
-}
-
-
-/**
- * Returns the cost in bits of encoding the distribution in count using ctable.
- * Returns an error if ctable cannot represent all the symbols in count.
- */
-static size_t ZSTD_fseBitCost(
- FSE_CTable const* ctable,
- unsigned const* count,
- unsigned const max)
-{
- unsigned const kAccuracyLog = 8;
- size_t cost = 0;
- unsigned s;
- FSE_CState_t cstate;
- FSE_initCState(&cstate, ctable);
- if (ZSTD_getFSEMaxSymbolValue(ctable) < max) {
- DEBUGLOG(5, "Repeat FSE_CTable has maxSymbolValue %u < %u",
- ZSTD_getFSEMaxSymbolValue(ctable), max);
- return ERROR(GENERIC);
- }
- for (s = 0; s <= max; ++s) {
- unsigned const tableLog = cstate.stateLog;
- unsigned const badCost = (tableLog + 1) << kAccuracyLog;
- unsigned const bitCost = FSE_bitCost(cstate.symbolTT, tableLog, s, kAccuracyLog);
- if (count[s] == 0)
- continue;
- if (bitCost >= badCost) {
- DEBUGLOG(5, "Repeat FSE_CTable has Prob[%u] == 0", s);
- return ERROR(GENERIC);
- }
- cost += count[s] * bitCost;
- }
- return cost >> kAccuracyLog;
-}
-
-/**
- * Returns the cost in bytes of encoding the normalized count header.
- * Returns an error if any of the helper functions return an error.
- */
-static size_t ZSTD_NCountCost(unsigned const* count, unsigned const max,
- size_t const nbSeq, unsigned const FSELog)
-{
- BYTE wksp[FSE_NCOUNTBOUND];
- S16 norm[MaxSeq + 1];
- const U32 tableLog = FSE_optimalTableLog(FSELog, nbSeq, max);
- CHECK_F(FSE_normalizeCount(norm, tableLog, count, nbSeq, max));
- return FSE_writeNCount(wksp, sizeof(wksp), norm, max, tableLog);
-}
-
-
-typedef enum {
- ZSTD_defaultDisallowed = 0,
- ZSTD_defaultAllowed = 1
-} ZSTD_defaultPolicy_e;
-
-MEM_STATIC symbolEncodingType_e
-ZSTD_selectEncodingType(
- FSE_repeat* repeatMode, unsigned const* count, unsigned const max,
- size_t const mostFrequent, size_t nbSeq, unsigned const FSELog,
- FSE_CTable const* prevCTable,
- short const* defaultNorm, U32 defaultNormLog,
- ZSTD_defaultPolicy_e const isDefaultAllowed,
- ZSTD_strategy const strategy)
-{
- ZSTD_STATIC_ASSERT(ZSTD_defaultDisallowed == 0 && ZSTD_defaultAllowed != 0);
- if (mostFrequent == nbSeq) {
- *repeatMode = FSE_repeat_none;
- if (isDefaultAllowed && nbSeq <= 2) {
- /* Prefer set_basic over set_rle when there are 2 or less symbols,
- * since RLE uses 1 byte, but set_basic uses 5-6 bits per symbol.
- * If basic encoding isn't possible, always choose RLE.
- */
- DEBUGLOG(5, "Selected set_basic");
- return set_basic;
- }
- DEBUGLOG(5, "Selected set_rle");
- return set_rle;
+ switch (cctxParams->literalCompressionMode) {
+ case ZSTD_lcm_huffman:
+ return 0;
+ case ZSTD_lcm_uncompressed:
+ return 1;
+ default:
+ assert(0 /* impossible: pre-validated */);
+ /* fall-through */
+ case ZSTD_lcm_auto:
+ return (cctxParams->cParams.strategy == ZSTD_fast) && (cctxParams->cParams.targetLength > 0);
}
- if (strategy < ZSTD_lazy) {
- if (isDefaultAllowed) {
- size_t const staticFse_nbSeq_max = 1000;
- size_t const mult = 10 - strategy;
- size_t const baseLog = 3;
- size_t const dynamicFse_nbSeq_min = (((size_t)1 << defaultNormLog) * mult) >> baseLog; /* 28-36 for offset, 56-72 for lengths */
- assert(defaultNormLog >= 5 && defaultNormLog <= 6); /* xx_DEFAULTNORMLOG */
- assert(mult <= 9 && mult >= 7);
- if ( (*repeatMode == FSE_repeat_valid)
- && (nbSeq < staticFse_nbSeq_max) ) {
- DEBUGLOG(5, "Selected set_repeat");
- return set_repeat;
- }
- if ( (nbSeq < dynamicFse_nbSeq_min)
- || (mostFrequent < (nbSeq >> (defaultNormLog-1))) ) {
- DEBUGLOG(5, "Selected set_basic");
- /* The format allows default tables to be repeated, but it isn't useful.
- * When using simple heuristics to select encoding type, we don't want
- * to confuse these tables with dictionaries. When running more careful
- * analysis, we don't need to waste time checking both repeating tables
- * and default tables.
- */
- *repeatMode = FSE_repeat_none;
- return set_basic;
- }
- }
- } else {
- size_t const basicCost = isDefaultAllowed ? ZSTD_crossEntropyCost(defaultNorm, defaultNormLog, count, max) : ERROR(GENERIC);
- size_t const repeatCost = *repeatMode != FSE_repeat_none ? ZSTD_fseBitCost(prevCTable, count, max) : ERROR(GENERIC);
- size_t const NCountCost = ZSTD_NCountCost(count, max, nbSeq, FSELog);
- size_t const compressedCost = (NCountCost << 3) + ZSTD_entropyCost(count, max, nbSeq);
-
- if (isDefaultAllowed) {
- assert(!ZSTD_isError(basicCost));
- assert(!(*repeatMode == FSE_repeat_valid && ZSTD_isError(repeatCost)));
- }
- assert(!ZSTD_isError(NCountCost));
- assert(compressedCost < ERROR(maxCode));
- DEBUGLOG(5, "Estimated bit costs: basic=%u\trepeat=%u\tcompressed=%u",
- (unsigned)basicCost, (unsigned)repeatCost, (unsigned)compressedCost);
- if (basicCost <= repeatCost && basicCost <= compressedCost) {
- DEBUGLOG(5, "Selected set_basic");
- assert(isDefaultAllowed);
- *repeatMode = FSE_repeat_none;
- return set_basic;
- }
- if (repeatCost <= compressedCost) {
- DEBUGLOG(5, "Selected set_repeat");
- assert(!ZSTD_isError(repeatCost));
- return set_repeat;
- }
- assert(compressedCost < basicCost && compressedCost < repeatCost);
- }
- DEBUGLOG(5, "Selected set_compressed");
- *repeatMode = FSE_repeat_check;
- return set_compressed;
-}
-
-MEM_STATIC size_t
-ZSTD_buildCTable(void* dst, size_t dstCapacity,
- FSE_CTable* nextCTable, U32 FSELog, symbolEncodingType_e type,
- unsigned* count, U32 max,
- const BYTE* codeTable, size_t nbSeq,
- const S16* defaultNorm, U32 defaultNormLog, U32 defaultMax,
- const FSE_CTable* prevCTable, size_t prevCTableSize,
- void* workspace, size_t workspaceSize)
-{
- BYTE* op = (BYTE*)dst;
- const BYTE* const oend = op + dstCapacity;
- DEBUGLOG(6, "ZSTD_buildCTable (dstCapacity=%u)", (unsigned)dstCapacity);
-
- switch (type) {
- case set_rle:
- CHECK_F(FSE_buildCTable_rle(nextCTable, (BYTE)max));
- if (dstCapacity==0) return ERROR(dstSize_tooSmall);
- *op = codeTable[0];
- return 1;
- case set_repeat:
- memcpy(nextCTable, prevCTable, prevCTableSize);
- return 0;
- case set_basic:
- CHECK_F(FSE_buildCTable_wksp(nextCTable, defaultNorm, defaultMax, defaultNormLog, workspace, workspaceSize)); /* note : could be pre-calculated */
- return 0;
- case set_compressed: {
- S16 norm[MaxSeq + 1];
- size_t nbSeq_1 = nbSeq;
- const U32 tableLog = FSE_optimalTableLog(FSELog, nbSeq, max);
- if (count[codeTable[nbSeq-1]] > 1) {
- count[codeTable[nbSeq-1]]--;
- nbSeq_1--;
- }
- assert(nbSeq_1 > 1);
- CHECK_F(FSE_normalizeCount(norm, tableLog, count, nbSeq_1, max));
- { size_t const NCountSize = FSE_writeNCount(op, oend - op, norm, max, tableLog); /* overflow protected */
- if (FSE_isError(NCountSize)) return NCountSize;
- CHECK_F(FSE_buildCTable_wksp(nextCTable, norm, max, tableLog, workspace, workspaceSize));
- return NCountSize;
- }
- }
- default: return assert(0), ERROR(GENERIC);
- }
-}
-
-FORCE_INLINE_TEMPLATE size_t
-ZSTD_encodeSequences_body(
- void* dst, size_t dstCapacity,
- FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
- FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
- FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
- seqDef const* sequences, size_t nbSeq, int longOffsets)
-{
- BIT_CStream_t blockStream;
- FSE_CState_t stateMatchLength;
- FSE_CState_t stateOffsetBits;
- FSE_CState_t stateLitLength;
-
- CHECK_E(BIT_initCStream(&blockStream, dst, dstCapacity), dstSize_tooSmall); /* not enough space remaining */
- DEBUGLOG(6, "available space for bitstream : %i (dstCapacity=%u)",
- (int)(blockStream.endPtr - blockStream.startPtr),
- (unsigned)dstCapacity);
-
- /* first symbols */
- FSE_initCState2(&stateMatchLength, CTable_MatchLength, mlCodeTable[nbSeq-1]);
- FSE_initCState2(&stateOffsetBits, CTable_OffsetBits, ofCodeTable[nbSeq-1]);
- FSE_initCState2(&stateLitLength, CTable_LitLength, llCodeTable[nbSeq-1]);
- BIT_addBits(&blockStream, sequences[nbSeq-1].litLength, LL_bits[llCodeTable[nbSeq-1]]);
- if (MEM_32bits()) BIT_flushBits(&blockStream);
- BIT_addBits(&blockStream, sequences[nbSeq-1].matchLength, ML_bits[mlCodeTable[nbSeq-1]]);
- if (MEM_32bits()) BIT_flushBits(&blockStream);
- if (longOffsets) {
- U32 const ofBits = ofCodeTable[nbSeq-1];
- int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN-1);
- if (extraBits) {
- BIT_addBits(&blockStream, sequences[nbSeq-1].offset, extraBits);
- BIT_flushBits(&blockStream);
- }
- BIT_addBits(&blockStream, sequences[nbSeq-1].offset >> extraBits,
- ofBits - extraBits);
- } else {
- BIT_addBits(&blockStream, sequences[nbSeq-1].offset, ofCodeTable[nbSeq-1]);
- }
- BIT_flushBits(&blockStream);
-
- { size_t n;
- for (n=nbSeq-2 ; n<nbSeq ; n--) { /* intentional underflow */
- BYTE const llCode = llCodeTable[n];
- BYTE const ofCode = ofCodeTable[n];
- BYTE const mlCode = mlCodeTable[n];
- U32 const llBits = LL_bits[llCode];
- U32 const ofBits = ofCode;
- U32 const mlBits = ML_bits[mlCode];
- DEBUGLOG(6, "encoding: litlen:%2u - matchlen:%2u - offCode:%7u",
- (unsigned)sequences[n].litLength,
- (unsigned)sequences[n].matchLength + MINMATCH,
- (unsigned)sequences[n].offset);
- /* 32b*/ /* 64b*/
- /* (7)*/ /* (7)*/
- FSE_encodeSymbol(&blockStream, &stateOffsetBits, ofCode); /* 15 */ /* 15 */
- FSE_encodeSymbol(&blockStream, &stateMatchLength, mlCode); /* 24 */ /* 24 */
- if (MEM_32bits()) BIT_flushBits(&blockStream); /* (7)*/
- FSE_encodeSymbol(&blockStream, &stateLitLength, llCode); /* 16 */ /* 33 */
- if (MEM_32bits() || (ofBits+mlBits+llBits >= 64-7-(LLFSELog+MLFSELog+OffFSELog)))
- BIT_flushBits(&blockStream); /* (7)*/
- BIT_addBits(&blockStream, sequences[n].litLength, llBits);
- if (MEM_32bits() && ((llBits+mlBits)>24)) BIT_flushBits(&blockStream);
- BIT_addBits(&blockStream, sequences[n].matchLength, mlBits);
- if (MEM_32bits() || (ofBits+mlBits+llBits > 56)) BIT_flushBits(&blockStream);
- if (longOffsets) {
- int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN-1);
- if (extraBits) {
- BIT_addBits(&blockStream, sequences[n].offset, extraBits);
- BIT_flushBits(&blockStream); /* (7)*/
- }
- BIT_addBits(&blockStream, sequences[n].offset >> extraBits,
- ofBits - extraBits); /* 31 */
- } else {
- BIT_addBits(&blockStream, sequences[n].offset, ofBits); /* 31 */
- }
- BIT_flushBits(&blockStream); /* (7)*/
- DEBUGLOG(7, "remaining space : %i", (int)(blockStream.endPtr - blockStream.ptr));
- } }
-
- DEBUGLOG(6, "ZSTD_encodeSequences: flushing ML state with %u bits", stateMatchLength.stateLog);
- FSE_flushCState(&blockStream, &stateMatchLength);
- DEBUGLOG(6, "ZSTD_encodeSequences: flushing Off state with %u bits", stateOffsetBits.stateLog);
- FSE_flushCState(&blockStream, &stateOffsetBits);
- DEBUGLOG(6, "ZSTD_encodeSequences: flushing LL state with %u bits", stateLitLength.stateLog);
- FSE_flushCState(&blockStream, &stateLitLength);
-
- { size_t const streamSize = BIT_closeCStream(&blockStream);
- if (streamSize==0) return ERROR(dstSize_tooSmall); /* not enough space */
- return streamSize;
- }
-}
-
-static size_t
-ZSTD_encodeSequences_default(
- void* dst, size_t dstCapacity,
- FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
- FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
- FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
- seqDef const* sequences, size_t nbSeq, int longOffsets)
-{
- return ZSTD_encodeSequences_body(dst, dstCapacity,
- CTable_MatchLength, mlCodeTable,
- CTable_OffsetBits, ofCodeTable,
- CTable_LitLength, llCodeTable,
- sequences, nbSeq, longOffsets);
-}
-
-
-#if DYNAMIC_BMI2
-
-static TARGET_ATTRIBUTE("bmi2") size_t
-ZSTD_encodeSequences_bmi2(
- void* dst, size_t dstCapacity,
- FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
- FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
- FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
- seqDef const* sequences, size_t nbSeq, int longOffsets)
-{
- return ZSTD_encodeSequences_body(dst, dstCapacity,
- CTable_MatchLength, mlCodeTable,
- CTable_OffsetBits, ofCodeTable,
- CTable_LitLength, llCodeTable,
- sequences, nbSeq, longOffsets);
-}
-
-#endif
-
-static size_t ZSTD_encodeSequences(
- void* dst, size_t dstCapacity,
- FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
- FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
- FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
- seqDef const* sequences, size_t nbSeq, int longOffsets, int bmi2)
-{
- DEBUGLOG(5, "ZSTD_encodeSequences: dstCapacity = %u", (unsigned)dstCapacity);
-#if DYNAMIC_BMI2
- if (bmi2) {
- return ZSTD_encodeSequences_bmi2(dst, dstCapacity,
- CTable_MatchLength, mlCodeTable,
- CTable_OffsetBits, ofCodeTable,
- CTable_LitLength, llCodeTable,
- sequences, nbSeq, longOffsets);
- }
-#endif
- (void)bmi2;
- return ZSTD_encodeSequences_default(dst, dstCapacity,
- CTable_MatchLength, mlCodeTable,
- CTable_OffsetBits, ofCodeTable,
- CTable_LitLength, llCodeTable,
- sequences, nbSeq, longOffsets);
}
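
[editor's note] The new ZSTD_disableLiteralsCompression() above makes the literals decision explicit: the old hard-coded heuristic (strategy == ZSTD_fast with targetLength > 0, visible in the removed line in the next hunk) survives only as the ZSTD_lcm_auto default, while callers may now force Huffman or uncompressed literals. A usage sketch, assuming the ZSTD_c_literalCompressionMode parameter is reachable from the public API (it is experimental in this zstd version, so ZSTD_STATIC_LINKING_ONLY may be required before including zstd.h):

    /* usage sketch; error checks omitted for brevity */
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_literalCompressionMode,
                           ZSTD_lcm_uncompressed);  /* never compress literals */
    /* ... compress with ZSTD_compress2() / ZSTD_compressStream2() as usual ... */
    ZSTD_freeCCtx(cctx);
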
/* ZSTD_compressSequences_internal():
@@ -2393,46 +1955,48 @@
BYTE* const ostart = (BYTE*)dst;
BYTE* const oend = ostart + dstCapacity;
BYTE* op = ostart;
- size_t const nbSeq = seqStorePtr->sequences - seqStorePtr->sequencesStart;
+ size_t const nbSeq = (size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
BYTE* seqHead;
BYTE* lastNCount = NULL;
+ DEBUGLOG(5, "ZSTD_compressSequences_internal (nbSeq=%zu)", nbSeq);
ZSTD_STATIC_ASSERT(HUF_WORKSPACE_SIZE >= (1<<MAX(MLFSELog,LLFSELog)));
- DEBUGLOG(5, "ZSTD_compressSequences_internal");
/* Compress literals */
{ const BYTE* const literals = seqStorePtr->litStart;
- size_t const litSize = seqStorePtr->lit - literals;
- int const disableLiteralCompression = (cctxParams->cParams.strategy == ZSTD_fast) && (cctxParams->cParams.targetLength > 0);
+ size_t const litSize = (size_t)(seqStorePtr->lit - literals);
size_t const cSize = ZSTD_compressLiterals(
&prevEntropy->huf, &nextEntropy->huf,
- cctxParams->cParams.strategy, disableLiteralCompression,
+ cctxParams->cParams.strategy,
+ ZSTD_disableLiteralsCompression(cctxParams),
op, dstCapacity,
literals, litSize,
workspace, wkspSize,
bmi2);
- if (ZSTD_isError(cSize))
- return cSize;
+ FORWARD_IF_ERROR(cSize);
assert(cSize <= dstCapacity);
op += cSize;
}
/* Sequences Header */
- if ((oend-op) < 3 /*max nbSeq Size*/ + 1 /*seqHead*/) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF((oend-op) < 3 /*max nbSeq Size*/ + 1 /*seqHead*/,
+ dstSize_tooSmall);
if (nbSeq < 0x7F)
*op++ = (BYTE)nbSeq;
else if (nbSeq < LONGNBSEQ)
op[0] = (BYTE)((nbSeq>>8) + 0x80), op[1] = (BYTE)nbSeq, op+=2;
else
op[0]=0xFF, MEM_writeLE16(op+1, (U16)(nbSeq - LONGNBSEQ)), op+=3;
+ assert(op <= oend);
if (nbSeq==0) {
/* Copy the old tables over as if we repeated them */
memcpy(&nextEntropy->fse, &prevEntropy->fse, sizeof(prevEntropy->fse));
- return op - ostart;
+ return (size_t)(op - ostart);
}
/* seqHead : flags for FSE encoding type */
seqHead = op++;
+ assert(op <= oend);
/* convert length/distances into codes */
ZSTD_seqToCodes(seqStorePtr);
@@ -2448,14 +2012,15 @@
ZSTD_defaultAllowed, strategy);
assert(set_basic < set_compressed && set_rle < set_compressed);
assert(!(LLtype < set_compressed && nextEntropy->fse.litlength_repeatMode != FSE_repeat_none)); /* We don't copy tables */
- { size_t const countSize = ZSTD_buildCTable(op, oend - op, CTable_LitLength, LLFSELog, (symbolEncodingType_e)LLtype,
+ { size_t const countSize = ZSTD_buildCTable(op, (size_t)(oend - op), CTable_LitLength, LLFSELog, (symbolEncodingType_e)LLtype,
count, max, llCodeTable, nbSeq, LL_defaultNorm, LL_defaultNormLog, MaxLL,
prevEntropy->fse.litlengthCTable, sizeof(prevEntropy->fse.litlengthCTable),
workspace, wkspSize);
- if (ZSTD_isError(countSize)) return countSize;
+ FORWARD_IF_ERROR(countSize);
if (LLtype == set_compressed)
lastNCount = op;
op += countSize;
+ assert(op <= oend);
} }
/* build CTable for Offsets */
{ unsigned max = MaxOff;
@@ -2470,14 +2035,15 @@
OF_defaultNorm, OF_defaultNormLog,
defaultPolicy, strategy);
assert(!(Offtype < set_compressed && nextEntropy->fse.offcode_repeatMode != FSE_repeat_none)); /* We don't copy tables */
- { size_t const countSize = ZSTD_buildCTable(op, oend - op, CTable_OffsetBits, OffFSELog, (symbolEncodingType_e)Offtype,
+ { size_t const countSize = ZSTD_buildCTable(op, (size_t)(oend - op), CTable_OffsetBits, OffFSELog, (symbolEncodingType_e)Offtype,
count, max, ofCodeTable, nbSeq, OF_defaultNorm, OF_defaultNormLog, DefaultMaxOff,
prevEntropy->fse.offcodeCTable, sizeof(prevEntropy->fse.offcodeCTable),
workspace, wkspSize);
- if (ZSTD_isError(countSize)) return countSize;
+ FORWARD_IF_ERROR(countSize);
if (Offtype == set_compressed)
lastNCount = op;
op += countSize;
+ assert(op <= oend);
} }
/* build CTable for MatchLengths */
{ unsigned max = MaxML;
@@ -2490,29 +2056,31 @@
ML_defaultNorm, ML_defaultNormLog,
ZSTD_defaultAllowed, strategy);
assert(!(MLtype < set_compressed && nextEntropy->fse.matchlength_repeatMode != FSE_repeat_none)); /* We don't copy tables */
- { size_t const countSize = ZSTD_buildCTable(op, oend - op, CTable_MatchLength, MLFSELog, (symbolEncodingType_e)MLtype,
+ { size_t const countSize = ZSTD_buildCTable(op, (size_t)(oend - op), CTable_MatchLength, MLFSELog, (symbolEncodingType_e)MLtype,
count, max, mlCodeTable, nbSeq, ML_defaultNorm, ML_defaultNormLog, MaxML,
prevEntropy->fse.matchlengthCTable, sizeof(prevEntropy->fse.matchlengthCTable),
workspace, wkspSize);
- if (ZSTD_isError(countSize)) return countSize;
+ FORWARD_IF_ERROR(countSize);
if (MLtype == set_compressed)
lastNCount = op;
op += countSize;
+ assert(op <= oend);
} }
*seqHead = (BYTE)((LLtype<<6) + (Offtype<<4) + (MLtype<<2));
{ size_t const bitstreamSize = ZSTD_encodeSequences(
- op, oend - op,
+ op, (size_t)(oend - op),
CTable_MatchLength, mlCodeTable,
CTable_OffsetBits, ofCodeTable,
CTable_LitLength, llCodeTable,
sequences, nbSeq,
longOffsets, bmi2);
- if (ZSTD_isError(bitstreamSize)) return bitstreamSize;
+ FORWARD_IF_ERROR(bitstreamSize);
op += bitstreamSize;
+ assert(op <= oend);
/* zstd versions <= 1.3.4 mistakenly report corruption when
- * FSE_readNCount() recieves a buffer < 4 bytes.
+ * FSE_readNCount() receives a buffer < 4 bytes.
* Fixed by https://github.com/facebook/zstd/pull/1146.
* This can happen when the last set_compressed table present is 2
* bytes and the bitstream is only one byte.
@@ -2529,7 +2097,7 @@
}
DEBUGLOG(5, "compressed block size : %u", (unsigned)(op - ostart));
- return op - ostart;
+ return (size_t)(op - ostart);
}
MEM_STATIC size_t
@@ -2552,7 +2120,7 @@
*/
if ((cSize == ERROR(dstSize_tooSmall)) & (srcSize <= dstCapacity))
return 0; /* block not compressed */
- if (ZSTD_isError(cSize)) return cSize;
+ FORWARD_IF_ERROR(cSize);
/* Check compressibility */
{ size_t const maxCSize = srcSize - ZSTD_minGain(srcSize, cctxParams->cParams.strategy);
@@ -2622,27 +2190,24 @@
ssPtr->longLengthID = 0;
}
-static size_t ZSTD_compressBlock_internal(ZSTD_CCtx* zc,
- void* dst, size_t dstCapacity,
- const void* src, size_t srcSize)
+typedef enum { ZSTDbss_compress, ZSTDbss_noCompress } ZSTD_buildSeqStore_e;
+
+static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
{
ZSTD_matchState_t* const ms = &zc->blockState.matchState;
- size_t cSize;
- DEBUGLOG(5, "ZSTD_compressBlock_internal (dstCapacity=%u, dictLimit=%u, nextToUpdate=%u)",
- (unsigned)dstCapacity, (unsigned)ms->window.dictLimit, (unsigned)ms->nextToUpdate);
+ DEBUGLOG(5, "ZSTD_buildSeqStore (srcSize=%zu)", srcSize);
assert(srcSize <= ZSTD_BLOCKSIZE_MAX);
-
/* Assert that we have correctly flushed the ctx params into the ms's copy */
ZSTD_assertEqualCParams(zc->appliedParams.cParams, ms->cParams);
-
if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1) {
ZSTD_ldm_skipSequences(&zc->externSeqStore, srcSize, zc->appliedParams.cParams.minMatch);
- cSize = 0;
- goto out; /* don't even attempt compression below a certain srcSize */
+ return ZSTDbss_noCompress; /* don't even attempt compression below a certain srcSize */
}
ZSTD_resetSeqStore(&(zc->seqStore));
- ms->opt.symbolCosts = &zc->blockState.prevCBlock->entropy; /* required for optimal parser to read stats from dictionary */
-
+ /* required for optimal parser to read stats from dictionary */
+ ms->opt.symbolCosts = &zc->blockState.prevCBlock->entropy;
+ /* tell the optimal parser how we expect to compress literals */
+ ms->opt.literalCompressionMode = zc->appliedParams.literalCompressionMode;
/* a gap between an attached dict and the current window is not safe,
* they must remain adjacent,
* and when that stops being the case, the dict must be unset */
@@ -2679,7 +2244,7 @@
ldmSeqStore.seq = zc->ldmSequences;
ldmSeqStore.capacity = zc->maxNbLdmSequences;
/* Updates ldmSeqStore.size */
- CHECK_F(ZSTD_ldm_generateSequences(&zc->ldmState, &ldmSeqStore,
+ FORWARD_IF_ERROR(ZSTD_ldm_generateSequences(&zc->ldmState, &ldmSeqStore,
&zc->appliedParams.ldmParams,
src, srcSize));
/* Updates ldmSeqStore.pos */
@@ -2696,6 +2261,22 @@
{ const BYTE* const lastLiterals = (const BYTE*)src + srcSize - lastLLSize;
ZSTD_storeLastLiterals(&zc->seqStore, lastLiterals, lastLLSize);
} }
+ return ZSTDbss_compress;
+}
+
+static size_t ZSTD_compressBlock_internal(ZSTD_CCtx* zc,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize)
+{
+ size_t cSize;
+ DEBUGLOG(5, "ZSTD_compressBlock_internal (dstCapacity=%u, dictLimit=%u, nextToUpdate=%u)",
+ (unsigned)dstCapacity, (unsigned)zc->blockState.matchState.window.dictLimit,
+ (unsigned)zc->blockState.matchState.nextToUpdate);
+
+ { const size_t bss = ZSTD_buildSeqStore(zc, src, srcSize);
+ FORWARD_IF_ERROR(bss);
+ if (bss == ZSTDbss_noCompress) { cSize = 0; goto out; }
+ }
/* encode sequences and literals */
cSize = ZSTD_compressSequences(&zc->seqStore,
@@ -2724,6 +2305,25 @@
}
+static void ZSTD_overflowCorrectIfNeeded(ZSTD_matchState_t* ms, ZSTD_CCtx_params const* params, void const* ip, void const* iend)
+{
+ if (ZSTD_window_needOverflowCorrection(ms->window, iend)) {
+ U32 const maxDist = (U32)1 << params->cParams.windowLog;
+ U32 const cycleLog = ZSTD_cycleLog(params->cParams.chainLog, params->cParams.strategy);
+ U32 const correction = ZSTD_window_correctOverflow(&ms->window, cycleLog, maxDist, ip);
+ ZSTD_STATIC_ASSERT(ZSTD_CHAINLOG_MAX <= 30);
+ ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX_32 <= 30);
+ ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX <= 31);
+ ZSTD_reduceIndex(ms, params, correction);
+ if (ms->nextToUpdate < correction) ms->nextToUpdate = 0;
+ else ms->nextToUpdate -= correction;
+ /* invalidate dictionaries on overflow correction */
+ ms->loadedDictEnd = 0;
+ ms->dictMatchState = NULL;
+ }
+}
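
[editor's note] ZSTD_overflowCorrectIfNeeded() consolidates three previously duplicated copies of the index-overflow correction: frame chunking, single-block mode, and dictionary loading. The later hunks reduce each call site to a single line, roughly:

    /* the three consolidated call sites, simplified from the hunks below */
    ZSTD_overflowCorrectIfNeeded(ms, &cctx->appliedParams, ip, ip + blockSize);  /* frame path */
    ZSTD_overflowCorrectIfNeeded(ms, &cctx->appliedParams, src,
                                 (BYTE const*)src + srcSize);                    /* block mode */
    ZSTD_overflowCorrectIfNeeded(ms, params, ip, ichunk);                        /* dict loading */
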
+
+
/*! ZSTD_compress_frameChunk() :
* Compress a chunk of data into one or multiple blocks.
* All blocks will be terminated, all input will be consumed.
@@ -2742,7 +2342,7 @@
BYTE* const ostart = (BYTE*)dst;
BYTE* op = ostart;
U32 const maxDist = (U32)1 << cctx->appliedParams.cParams.windowLog;
- assert(cctx->appliedParams.cParams.windowLog <= 31);
+ assert(cctx->appliedParams.cParams.windowLog <= ZSTD_WINDOWLOG_MAX);
DEBUGLOG(5, "ZSTD_compress_frameChunk (blockSize=%u)", (unsigned)blockSize);
if (cctx->appliedParams.fParams.checksumFlag && srcSize)
@@ -2752,33 +2352,25 @@
ZSTD_matchState_t* const ms = &cctx->blockState.matchState;
U32 const lastBlock = lastFrameChunk & (blockSize >= remaining);
- if (dstCapacity < ZSTD_blockHeaderSize + MIN_CBLOCK_SIZE)
- return ERROR(dstSize_tooSmall); /* not enough space to store compressed block */
+ RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize + MIN_CBLOCK_SIZE,
+ dstSize_tooSmall,
+ "not enough space to store compressed block");
if (remaining < blockSize) blockSize = remaining;
- if (ZSTD_window_needOverflowCorrection(ms->window, ip + blockSize)) {
- U32 const cycleLog = ZSTD_cycleLog(cctx->appliedParams.cParams.chainLog, cctx->appliedParams.cParams.strategy);
- U32 const correction = ZSTD_window_correctOverflow(&ms->window, cycleLog, maxDist, ip);
- ZSTD_STATIC_ASSERT(ZSTD_CHAINLOG_MAX <= 30);
- ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX_32 <= 30);
- ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX <= 31);
- ZSTD_reduceIndex(cctx, correction);
- if (ms->nextToUpdate < correction) ms->nextToUpdate = 0;
- else ms->nextToUpdate -= correction;
- ms->loadedDictEnd = 0;
- ms->dictMatchState = NULL;
- }
- ZSTD_window_enforceMaxDist(&ms->window, ip + blockSize, maxDist, &ms->loadedDictEnd, &ms->dictMatchState);
+ ZSTD_overflowCorrectIfNeeded(ms, &cctx->appliedParams, ip, ip + blockSize);
+ ZSTD_checkDictValidity(&ms->window, ip + blockSize, maxDist, &ms->loadedDictEnd, &ms->dictMatchState);
+
+ /* Ensure hash/chain table insertion resumes no sooner than lowlimit */
if (ms->nextToUpdate < ms->window.lowLimit) ms->nextToUpdate = ms->window.lowLimit;
{ size_t cSize = ZSTD_compressBlock_internal(cctx,
op+ZSTD_blockHeaderSize, dstCapacity-ZSTD_blockHeaderSize,
ip, blockSize);
- if (ZSTD_isError(cSize)) return cSize;
+ FORWARD_IF_ERROR(cSize);
if (cSize == 0) { /* block is not compressible */
cSize = ZSTD_noCompressBlock(op, dstCapacity, ip, blockSize, lastBlock);
- if (ZSTD_isError(cSize)) return cSize;
+ FORWARD_IF_ERROR(cSize);
} else {
U32 const cBlockHeader24 = lastBlock + (((U32)bt_compressed)<<1) + (U32)(cSize << 3);
MEM_writeLE24(op, cBlockHeader24);
@@ -2796,7 +2388,7 @@
} }
if (lastFrameChunk && (op>ostart)) cctx->stage = ZSTDcs_ending;
- return op-ostart;
+ return (size_t)(op-ostart);
}
@@ -2811,11 +2403,11 @@
BYTE const windowLogByte = (BYTE)((params.cParams.windowLog - ZSTD_WINDOWLOG_ABSOLUTEMIN) << 3);
U32 const fcsCode = params.fParams.contentSizeFlag ?
(pledgedSrcSize>=256) + (pledgedSrcSize>=65536+256) + (pledgedSrcSize>=0xFFFFFFFFU) : 0; /* 0-3 */
- BYTE const frameHeaderDecriptionByte = (BYTE)(dictIDSizeCode + (checksumFlag<<2) + (singleSegment<<5) + (fcsCode<<6) );
+ BYTE const frameHeaderDescriptionByte = (BYTE)(dictIDSizeCode + (checksumFlag<<2) + (singleSegment<<5) + (fcsCode<<6) );
size_t pos=0;
assert(!(params.fParams.contentSizeFlag && pledgedSrcSize == ZSTD_CONTENTSIZE_UNKNOWN));
- if (dstCapacity < ZSTD_FRAMEHEADERSIZE_MAX) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(dstCapacity < ZSTD_FRAMEHEADERSIZE_MAX, dstSize_tooSmall);
DEBUGLOG(4, "ZSTD_writeFrameHeader : dictIDFlag : %u ; dictID : %u ; dictIDSizeCode : %u",
!params.fParams.noDictIDFlag, (unsigned)dictID, (unsigned)dictIDSizeCode);
@@ -2823,7 +2415,7 @@
MEM_writeLE32(dst, ZSTD_MAGICNUMBER);
pos = 4;
}
- op[pos++] = frameHeaderDecriptionByte;
+ op[pos++] = frameHeaderDescriptionByte;
if (!singleSegment) op[pos++] = windowLogByte;
switch(dictIDSizeCode)
{
@@ -2847,11 +2439,11 @@
/* ZSTD_writeLastEmptyBlock() :
* output an empty Block with end-of-frame mark to complete a frame
* @return : size of data written into `dst` (== ZSTD_blockHeaderSize (defined in zstd_internal.h))
- * or an error code if `dstCapcity` is too small (<ZSTD_blockHeaderSize)
+ * or an error code if `dstCapacity` is too small (<ZSTD_blockHeaderSize)
*/
size_t ZSTD_writeLastEmptyBlock(void* dst, size_t dstCapacity)
{
- if (dstCapacity < ZSTD_blockHeaderSize) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize, dstSize_tooSmall);
{ U32 const cBlockHeader24 = 1 /*lastBlock*/ + (((U32)bt_raw)<<1); /* 0 size */
MEM_writeLE24(dst, cBlockHeader24);
return ZSTD_blockHeaderSize;
@@ -2860,10 +2452,9 @@
size_t ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_t nbSeq)
{
- if (cctx->stage != ZSTDcs_init)
- return ERROR(stage_wrong);
- if (cctx->appliedParams.ldmParams.enableLdm)
- return ERROR(parameter_unsupported);
+ RETURN_ERROR_IF(cctx->stage != ZSTDcs_init, stage_wrong);
+ RETURN_ERROR_IF(cctx->appliedParams.ldmParams.enableLdm,
+ parameter_unsupported);
cctx->externSeqStore.seq = seq;
cctx->externSeqStore.size = nbSeq;
cctx->externSeqStore.capacity = nbSeq;
@@ -2882,12 +2473,14 @@
DEBUGLOG(5, "ZSTD_compressContinue_internal, stage: %u, srcSize: %u",
cctx->stage, (unsigned)srcSize);
- if (cctx->stage==ZSTDcs_created) return ERROR(stage_wrong); /* missing init (ZSTD_compressBegin) */
+ RETURN_ERROR_IF(cctx->stage==ZSTDcs_created, stage_wrong,
+ "missing init (ZSTD_compressBegin)");
if (frame && (cctx->stage==ZSTDcs_init)) {
fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, cctx->appliedParams,
cctx->pledgedSrcSizePlusOne-1, cctx->dictID);
- if (ZSTD_isError(fhSize)) return fhSize;
+ FORWARD_IF_ERROR(fhSize);
+ assert(fhSize <= dstCapacity);
dstCapacity -= fhSize;
dst = (char*)dst + fhSize;
cctx->stage = ZSTDcs_ongoing;
@@ -2904,35 +2497,25 @@
if (!frame) {
/* overflow check and correction for block mode */
- if (ZSTD_window_needOverflowCorrection(ms->window, (const char*)src + srcSize)) {
- U32 const cycleLog = ZSTD_cycleLog(cctx->appliedParams.cParams.chainLog, cctx->appliedParams.cParams.strategy);
- U32 const correction = ZSTD_window_correctOverflow(&ms->window, cycleLog, 1 << cctx->appliedParams.cParams.windowLog, src);
- ZSTD_STATIC_ASSERT(ZSTD_CHAINLOG_MAX <= 30);
- ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX_32 <= 30);
- ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX <= 31);
- ZSTD_reduceIndex(cctx, correction);
- if (ms->nextToUpdate < correction) ms->nextToUpdate = 0;
- else ms->nextToUpdate -= correction;
- ms->loadedDictEnd = 0;
- ms->dictMatchState = NULL;
- }
+ ZSTD_overflowCorrectIfNeeded(ms, &cctx->appliedParams, src, (BYTE const*)src + srcSize);
}
DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (unsigned)cctx->blockSize);
{ size_t const cSize = frame ?
ZSTD_compress_frameChunk (cctx, dst, dstCapacity, src, srcSize, lastFrameChunk) :
ZSTD_compressBlock_internal (cctx, dst, dstCapacity, src, srcSize);
- if (ZSTD_isError(cSize)) return cSize;
+ FORWARD_IF_ERROR(cSize);
cctx->consumedSrcSize += srcSize;
cctx->producedCSize += (cSize + fhSize);
assert(!(cctx->appliedParams.fParams.contentSizeFlag && cctx->pledgedSrcSizePlusOne == 0));
if (cctx->pledgedSrcSizePlusOne != 0) { /* control src size */
ZSTD_STATIC_ASSERT(ZSTD_CONTENTSIZE_UNKNOWN == (unsigned long long)-1);
- if (cctx->consumedSrcSize+1 > cctx->pledgedSrcSizePlusOne) {
- DEBUGLOG(4, "error : pledgedSrcSize = %u, while realSrcSize >= %u",
- (unsigned)cctx->pledgedSrcSizePlusOne-1, (unsigned)cctx->consumedSrcSize);
- return ERROR(srcSize_wrong);
- }
+ RETURN_ERROR_IF(
+ cctx->consumedSrcSize+1 > cctx->pledgedSrcSizePlusOne,
+ srcSize_wrong,
+ "error : pledgedSrcSize = %u, while realSrcSize >= %u",
+ (unsigned)cctx->pledgedSrcSizePlusOne-1,
+ (unsigned)cctx->consumedSrcSize);
}
return cSize + fhSize;
}
@@ -2956,8 +2539,9 @@
size_t ZSTD_compressBlock(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
{
- size_t const blockSizeMax = ZSTD_getBlockSize(cctx);
- if (srcSize > blockSizeMax) return ERROR(srcSize_wrong);
+ DEBUGLOG(5, "ZSTD_compressBlock: srcSize = %u", (unsigned)srcSize);
+ { size_t const blockSizeMax = ZSTD_getBlockSize(cctx);
+ RETURN_ERROR_IF(srcSize > blockSizeMax, srcSize_wrong); }
return ZSTD_compressContinue_internal(cctx, dst, dstCapacity, src, srcSize, 0 /* frame mode */, 0 /* last chunk */);
}
@@ -2970,7 +2554,7 @@
const void* src, size_t srcSize,
ZSTD_dictTableLoadMethod_e dtlm)
{
- const BYTE* const ip = (const BYTE*) src;
+ const BYTE* ip = (const BYTE*) src;
const BYTE* const iend = ip + srcSize;
ZSTD_window_update(&ms->window, src, srcSize);
@@ -2981,32 +2565,42 @@
if (srcSize <= HASH_READ_SIZE) return 0;
- switch(params->cParams.strategy)
- {
- case ZSTD_fast:
- ZSTD_fillHashTable(ms, iend, dtlm);
- break;
- case ZSTD_dfast:
- ZSTD_fillDoubleHashTable(ms, iend, dtlm);
- break;
-
- case ZSTD_greedy:
- case ZSTD_lazy:
- case ZSTD_lazy2:
- if (srcSize >= HASH_READ_SIZE)
- ZSTD_insertAndFindFirstIndex(ms, iend-HASH_READ_SIZE);
- break;
-
- case ZSTD_btlazy2: /* we want the dictionary table fully sorted */
- case ZSTD_btopt:
- case ZSTD_btultra:
- case ZSTD_btultra2:
- if (srcSize >= HASH_READ_SIZE)
- ZSTD_updateTree(ms, iend-HASH_READ_SIZE, iend);
- break;
-
- default:
- assert(0); /* not possible : not a valid strategy id */
+ while (iend - ip > HASH_READ_SIZE) {
+ size_t const remaining = (size_t)(iend - ip);
+ size_t const chunk = MIN(remaining, ZSTD_CHUNKSIZE_MAX);
+ const BYTE* const ichunk = ip + chunk;
+
+ ZSTD_overflowCorrectIfNeeded(ms, params, ip, ichunk);
+
+ switch(params->cParams.strategy)
+ {
+ case ZSTD_fast:
+ ZSTD_fillHashTable(ms, ichunk, dtlm);
+ break;
+ case ZSTD_dfast:
+ ZSTD_fillDoubleHashTable(ms, ichunk, dtlm);
+ break;
+
+ case ZSTD_greedy:
+ case ZSTD_lazy:
+ case ZSTD_lazy2:
+ if (chunk >= HASH_READ_SIZE)
+ ZSTD_insertAndFindFirstIndex(ms, ichunk-HASH_READ_SIZE);
+ break;
+
+ case ZSTD_btlazy2: /* we want the dictionary table fully sorted */
+ case ZSTD_btopt:
+ case ZSTD_btultra:
+ case ZSTD_btultra2:
+ if (chunk >= HASH_READ_SIZE)
+ ZSTD_updateTree(ms, ichunk-HASH_READ_SIZE, ichunk);
+ break;
+
+ default:
+ assert(0); /* not possible : not a valid strategy id */
+ }
+
+ ip = ichunk;
}
ms->nextToUpdate = (U32)(iend - ms->window.base);
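
[editor's note] Dictionary content is no longer indexed in a single pass: the new loop above walks it in chunks of at most ZSTD_CHUNKSIZE_MAX, running overflow correction before each chunk, so dictionaries larger than the 32-bit index space can load safely. The generic shape, as a sketch (process() is a hypothetical stand-in for the strategy-specific table fill):

    /* chunked-walk sketch of the loop above */
    const BYTE* p = (const BYTE*)src;
    const BYTE* const end = p + srcSize;
    while (end - p > HASH_READ_SIZE) {
        size_t const chunk = MIN((size_t)(end - p), ZSTD_CHUNKSIZE_MAX);
        ZSTD_overflowCorrectIfNeeded(ms, params, p, p + chunk);
        process(ms, p + chunk);   /* hypothetical: fill hash/chain/bt tables up to p + chunk */
        p += chunk;
    }
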
@@ -3020,9 +2614,9 @@
NOTE: This behavior is not standard and could be improved in the future. */
static size_t ZSTD_checkDictNCount(short* normalizedCounter, unsigned dictMaxSymbolValue, unsigned maxSymbolValue) {
U32 s;
- if (dictMaxSymbolValue < maxSymbolValue) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(dictMaxSymbolValue < maxSymbolValue, dictionary_corrupted);
for (s = 0; s <= maxSymbolValue; ++s) {
- if (normalizedCounter[s] == 0) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(normalizedCounter[s] == 0, dictionary_corrupted);
}
return 0;
}
@@ -3060,53 +2654,56 @@
{ unsigned maxSymbolValue = 255;
size_t const hufHeaderSize = HUF_readCTable((HUF_CElt*)bs->entropy.huf.CTable, &maxSymbolValue, dictPtr, dictEnd-dictPtr);
- if (HUF_isError(hufHeaderSize)) return ERROR(dictionary_corrupted);
- if (maxSymbolValue < 255) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(HUF_isError(hufHeaderSize), dictionary_corrupted);
+ RETURN_ERROR_IF(maxSymbolValue < 255, dictionary_corrupted);
dictPtr += hufHeaderSize;
}
{ unsigned offcodeLog;
size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr);
- if (FSE_isError(offcodeHeaderSize)) return ERROR(dictionary_corrupted);
- if (offcodeLog > OffFSELog) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(FSE_isError(offcodeHeaderSize), dictionary_corrupted);
+ RETURN_ERROR_IF(offcodeLog > OffFSELog, dictionary_corrupted);
/* Defer checking offcodeMaxValue because we need to know the size of the dictionary content */
/* fill all offset symbols to avoid garbage at end of table */
- CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.offcodeCTable,
- offcodeNCount, MaxOff, offcodeLog,
- workspace, HUF_WORKSPACE_SIZE),
- dictionary_corrupted);
+ RETURN_ERROR_IF(FSE_isError(FSE_buildCTable_wksp(
+ bs->entropy.fse.offcodeCTable,
+ offcodeNCount, MaxOff, offcodeLog,
+ workspace, HUF_WORKSPACE_SIZE)),
+ dictionary_corrupted);
dictPtr += offcodeHeaderSize;
}
{ short matchlengthNCount[MaxML+1];
unsigned matchlengthMaxValue = MaxML, matchlengthLog;
size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd-dictPtr);
- if (FSE_isError(matchlengthHeaderSize)) return ERROR(dictionary_corrupted);
- if (matchlengthLog > MLFSELog) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(FSE_isError(matchlengthHeaderSize), dictionary_corrupted);
+ RETURN_ERROR_IF(matchlengthLog > MLFSELog, dictionary_corrupted);
/* Every match length code must have non-zero probability */
- CHECK_F( ZSTD_checkDictNCount(matchlengthNCount, matchlengthMaxValue, MaxML));
- CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.matchlengthCTable,
- matchlengthNCount, matchlengthMaxValue, matchlengthLog,
- workspace, HUF_WORKSPACE_SIZE),
- dictionary_corrupted);
+ FORWARD_IF_ERROR( ZSTD_checkDictNCount(matchlengthNCount, matchlengthMaxValue, MaxML));
+ RETURN_ERROR_IF(FSE_isError(FSE_buildCTable_wksp(
+ bs->entropy.fse.matchlengthCTable,
+ matchlengthNCount, matchlengthMaxValue, matchlengthLog,
+ workspace, HUF_WORKSPACE_SIZE)),
+ dictionary_corrupted);
dictPtr += matchlengthHeaderSize;
}
{ short litlengthNCount[MaxLL+1];
unsigned litlengthMaxValue = MaxLL, litlengthLog;
size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd-dictPtr);
- if (FSE_isError(litlengthHeaderSize)) return ERROR(dictionary_corrupted);
- if (litlengthLog > LLFSELog) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(FSE_isError(litlengthHeaderSize), dictionary_corrupted);
+ RETURN_ERROR_IF(litlengthLog > LLFSELog, dictionary_corrupted);
/* Every literal length code must have non-zero probability */
- CHECK_F( ZSTD_checkDictNCount(litlengthNCount, litlengthMaxValue, MaxLL));
- CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.litlengthCTable,
- litlengthNCount, litlengthMaxValue, litlengthLog,
- workspace, HUF_WORKSPACE_SIZE),
- dictionary_corrupted);
+ FORWARD_IF_ERROR( ZSTD_checkDictNCount(litlengthNCount, litlengthMaxValue, MaxLL));
+ RETURN_ERROR_IF(FSE_isError(FSE_buildCTable_wksp(
+ bs->entropy.fse.litlengthCTable,
+ litlengthNCount, litlengthMaxValue, litlengthLog,
+ workspace, HUF_WORKSPACE_SIZE)),
+ dictionary_corrupted);
dictPtr += litlengthHeaderSize;
}
- if (dictPtr+12 > dictEnd) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(dictPtr+12 > dictEnd, dictionary_corrupted);
bs->rep[0] = MEM_readLE32(dictPtr+0);
bs->rep[1] = MEM_readLE32(dictPtr+4);
bs->rep[2] = MEM_readLE32(dictPtr+8);
@@ -3119,19 +2716,19 @@
offcodeMax = ZSTD_highbit32(maxOffset); /* Calculate minimum offset code required to represent maxOffset */
}
/* All offset values <= dictContentSize + 128 KB must be representable */
- CHECK_F (ZSTD_checkDictNCount(offcodeNCount, offcodeMaxValue, MIN(offcodeMax, MaxOff)));
+ FORWARD_IF_ERROR(ZSTD_checkDictNCount(offcodeNCount, offcodeMaxValue, MIN(offcodeMax, MaxOff)));
/* All repCodes must be <= dictContentSize and != 0*/
{ U32 u;
for (u=0; u<3; u++) {
- if (bs->rep[u] == 0) return ERROR(dictionary_corrupted);
- if (bs->rep[u] > dictContentSize) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(bs->rep[u] == 0, dictionary_corrupted);
+ RETURN_ERROR_IF(bs->rep[u] > dictContentSize, dictionary_corrupted);
} }
bs->entropy.huf.repeatMode = HUF_repeat_valid;
bs->entropy.fse.offcode_repeatMode = FSE_repeat_valid;
bs->entropy.fse.matchlength_repeatMode = FSE_repeat_valid;
bs->entropy.fse.litlength_repeatMode = FSE_repeat_valid;
- CHECK_F(ZSTD_loadDictionaryContent(ms, params, dictPtr, dictContentSize, dtlm));
+ FORWARD_IF_ERROR(ZSTD_loadDictionaryContent(ms, params, dictPtr, dictContentSize, dtlm));
return dictID;
}
}
@@ -3161,8 +2758,7 @@
DEBUGLOG(4, "raw content dictionary detected");
return ZSTD_loadDictionaryContent(ms, params, dict, dictSize, dtlm);
}
- if (dictContentType == ZSTD_dct_fullDict)
- return ERROR(dictionary_wrong);
+ RETURN_ERROR_IF(dictContentType == ZSTD_dct_fullDict, dictionary_wrong);
assert(0); /* impossible */
}
@@ -3189,14 +2785,13 @@
return ZSTD_resetCCtx_usingCDict(cctx, cdict, params, pledgedSrcSize, zbuff);
}
- CHECK_F( ZSTD_resetCCtx_internal(cctx, params, pledgedSrcSize,
+ FORWARD_IF_ERROR( ZSTD_resetCCtx_internal(cctx, params, pledgedSrcSize,
ZSTDcrp_continue, zbuff) );
- {
- size_t const dictID = ZSTD_compress_insertDictionary(
+ { size_t const dictID = ZSTD_compress_insertDictionary(
cctx->blockState.prevCBlock, &cctx->blockState.matchState,
            &params, dict, dictSize, dictContentType, dtlm, cctx->entropyWorkspace);
- if (ZSTD_isError(dictID)) return dictID;
- assert(dictID <= (size_t)(U32)-1);
+ FORWARD_IF_ERROR(dictID);
+ assert(dictID <= UINT_MAX);
cctx->dictID = (U32)dictID;
}
return 0;
@@ -3212,7 +2807,7 @@
{
DEBUGLOG(4, "ZSTD_compressBegin_advanced_internal: wlog=%u", params.cParams.windowLog);
/* compression parameters verification and optimization */
- CHECK_F( ZSTD_checkCParams(params.cParams) );
+ FORWARD_IF_ERROR( ZSTD_checkCParams(params.cParams) );
return ZSTD_compressBegin_internal(cctx,
dict, dictSize, dictContentType, dtlm,
cdict,
@@ -3260,12 +2855,12 @@
size_t fhSize = 0;
DEBUGLOG(4, "ZSTD_writeEpilogue");
- if (cctx->stage == ZSTDcs_created) return ERROR(stage_wrong); /* init missing */
+ RETURN_ERROR_IF(cctx->stage == ZSTDcs_created, stage_wrong, "init missing");
/* special case : empty frame */
if (cctx->stage == ZSTDcs_init) {
fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, cctx->appliedParams, 0, 0);
- if (ZSTD_isError(fhSize)) return fhSize;
+ FORWARD_IF_ERROR(fhSize);
dstCapacity -= fhSize;
op += fhSize;
cctx->stage = ZSTDcs_ongoing;
@@ -3274,7 +2869,7 @@
if (cctx->stage != ZSTDcs_ending) {
/* write one last empty block, make it the "last" block */
U32 const cBlockHeader24 = 1 /* last block */ + (((U32)bt_raw)<<1) + 0;
- if (dstCapacity<4) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(dstCapacity<4, dstSize_tooSmall);
MEM_writeLE32(op, cBlockHeader24);
op += ZSTD_blockHeaderSize;
dstCapacity -= ZSTD_blockHeaderSize;
@@ -3282,7 +2877,7 @@
if (cctx->appliedParams.fParams.checksumFlag) {
U32 const checksum = (U32) XXH64_digest(&cctx->xxhState);
- if (dstCapacity<4) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(dstCapacity<4, dstSize_tooSmall);
DEBUGLOG(4, "ZSTD_writeEpilogue: write checksum : %08X", (unsigned)checksum);
MEM_writeLE32(op, checksum);
op += 4;
@@ -3300,18 +2895,20 @@
size_t const cSize = ZSTD_compressContinue_internal(cctx,
dst, dstCapacity, src, srcSize,
1 /* frame mode */, 1 /* last chunk */);
- if (ZSTD_isError(cSize)) return cSize;
+ FORWARD_IF_ERROR(cSize);
endResult = ZSTD_writeEpilogue(cctx, (char*)dst + cSize, dstCapacity-cSize);
- if (ZSTD_isError(endResult)) return endResult;
+ FORWARD_IF_ERROR(endResult);
assert(!(cctx->appliedParams.fParams.contentSizeFlag && cctx->pledgedSrcSizePlusOne == 0));
if (cctx->pledgedSrcSizePlusOne != 0) { /* control src size */
ZSTD_STATIC_ASSERT(ZSTD_CONTENTSIZE_UNKNOWN == (unsigned long long)-1);
DEBUGLOG(4, "end of frame : controlling src size");
- if (cctx->pledgedSrcSizePlusOne != cctx->consumedSrcSize+1) {
- DEBUGLOG(4, "error : pledgedSrcSize = %u, while realSrcSize = %u",
- (unsigned)cctx->pledgedSrcSizePlusOne-1, (unsigned)cctx->consumedSrcSize);
- return ERROR(srcSize_wrong);
- } }
+ RETURN_ERROR_IF(
+ cctx->pledgedSrcSizePlusOne != cctx->consumedSrcSize+1,
+ srcSize_wrong,
+ "error : pledgedSrcSize = %u, while realSrcSize = %u",
+ (unsigned)cctx->pledgedSrcSizePlusOne-1,
+ (unsigned)cctx->consumedSrcSize);
+ }
return cSize + endResult;
}
@@ -3339,7 +2936,7 @@
ZSTD_parameters params)
{
DEBUGLOG(4, "ZSTD_compress_advanced");
- CHECK_F(ZSTD_checkCParams(params.cParams));
+ FORWARD_IF_ERROR(ZSTD_checkCParams(params.cParams));
return ZSTD_compress_internal(cctx,
dst, dstCapacity,
src, srcSize,
@@ -3356,7 +2953,7 @@
ZSTD_CCtx_params params)
{
DEBUGLOG(4, "ZSTD_compress_advanced_internal (srcSize:%u)", (unsigned)srcSize);
- CHECK_F( ZSTD_compressBegin_internal(cctx,
+ FORWARD_IF_ERROR( ZSTD_compressBegin_internal(cctx,
dict, dictSize, ZSTD_dct_auto, ZSTD_dtlm_fast, NULL,
params, srcSize, ZSTDb_not_buffered) );
return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
@@ -3440,17 +3037,17 @@
void* const internalBuffer = ZSTD_malloc(dictSize, cdict->customMem);
cdict->dictBuffer = internalBuffer;
cdict->dictContent = internalBuffer;
- if (!internalBuffer) return ERROR(memory_allocation);
+ RETURN_ERROR_IF(!internalBuffer, memory_allocation);
memcpy(internalBuffer, dictBuffer, dictSize);
}
cdict->dictContentSize = dictSize;
/* Reset the state to no dictionary */
ZSTD_reset_compressedBlockState(&cdict->cBlockState);
- { void* const end = ZSTD_reset_matchState(
- &cdict->matchState,
- (U32*)cdict->workspace + HUF_WORKSPACE_SIZE_U32,
- &cParams, ZSTDcrp_continue, /* forCCtx */ 0);
+ { void* const end = ZSTD_reset_matchState(&cdict->matchState,
+ (U32*)cdict->workspace + HUF_WORKSPACE_SIZE_U32,
+ &cParams,
+ ZSTDcrp_continue, ZSTD_resetTarget_CDict);
assert(end == (char*)cdict->workspace + cdict->workspaceSize);
(void)end;
}
@@ -3466,7 +3063,7 @@
&cdict->cBlockState, &cdict->matchState, ¶ms,
cdict->dictContent, cdict->dictContentSize,
dictContentType, ZSTD_dtlm_full, cdict->workspace);
- if (ZSTD_isError(dictID)) return dictID;
+ FORWARD_IF_ERROR(dictID);
assert(dictID <= (size_t)(U32)-1);
cdict->dictID = (U32)dictID;
}
@@ -3596,7 +3193,7 @@
ZSTD_frameParameters const fParams, unsigned long long const pledgedSrcSize)
{
DEBUGLOG(4, "ZSTD_compressBegin_usingCDict_advanced");
- if (cdict==NULL) return ERROR(dictionary_wrong);
+ RETURN_ERROR_IF(cdict==NULL, dictionary_wrong);
{ ZSTD_CCtx_params params = cctx->requestedParams;
params.cParams = ZSTD_getCParamsFromCDict(cdict);
/* Increase window log to fit the entire dictionary and source if the
@@ -3632,7 +3229,7 @@
const void* src, size_t srcSize,
const ZSTD_CDict* cdict, ZSTD_frameParameters fParams)
{
- CHECK_F (ZSTD_compressBegin_usingCDict_advanced(cctx, cdict, fParams, srcSize)); /* will check if cdict != NULL */
+ FORWARD_IF_ERROR(ZSTD_compressBegin_usingCDict_advanced(cctx, cdict, fParams, srcSize)); /* will check if cdict != NULL */
return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
}
@@ -3700,7 +3297,7 @@
assert(!ZSTD_isError(ZSTD_checkCParams(params.cParams)));
assert(!((dict) && (cdict))); /* either dict or cdict, not both */
- CHECK_F( ZSTD_compressBegin_internal(cctx,
+ FORWARD_IF_ERROR( ZSTD_compressBegin_internal(cctx,
dict, dictSize, dictContentType, ZSTD_dtlm_fast,
cdict,
params, pledgedSrcSize,
@@ -3718,13 +3315,17 @@
/* ZSTD_resetCStream():
* pledgedSrcSize == 0 means "unknown" */
-size_t ZSTD_resetCStream(ZSTD_CStream* zcs, unsigned long long pledgedSrcSize)
+size_t ZSTD_resetCStream(ZSTD_CStream* zcs, unsigned long long pss)
{
- ZSTD_CCtx_params params = zcs->requestedParams;
+ /* temporary : 0 interpreted as "unknown" during transition period.
+ * Users willing to specify "unknown" **must** use ZSTD_CONTENTSIZE_UNKNOWN.
+ * 0 will be interpreted as "empty" in the future.
+ */
+ U64 const pledgedSrcSize = (pss==0) ? ZSTD_CONTENTSIZE_UNKNOWN : pss;
DEBUGLOG(4, "ZSTD_resetCStream: pledgedSrcSize = %u", (unsigned)pledgedSrcSize);
- if (pledgedSrcSize==0) pledgedSrcSize = ZSTD_CONTENTSIZE_UNKNOWN;
- params.fParams.contentSizeFlag = 1;
- return ZSTD_resetCStream_internal(zcs, NULL, 0, ZSTD_dct_auto, zcs->cdict, params, pledgedSrcSize);
+ FORWARD_IF_ERROR( ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize) );
+ return 0;
}
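
[editor's note] ZSTD_resetCStream() is now a two-call shim over the newer advanced API instead of a re-init through ZSTD_resetCStream_internal(). Application code migrating off the deprecated entry point would do the equivalent explicitly, stating "unknown" rather than passing 0 (a sketch; applications do not have FORWARD_IF_ERROR, so plain error checks are shown):

    /* application-side sketch: explicit replacement for ZSTD_resetCStream(zcs, 0) */
    size_t err = ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
    if (ZSTD_isError(err)) return err;
    err = ZSTD_CCtx_setPledgedSrcSize(zcs, ZSTD_CONTENTSIZE_UNKNOWN);
    if (ZSTD_isError(err)) return err;
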
/*! ZSTD_initCStream_internal() :
@@ -3736,32 +3337,18 @@
ZSTD_CCtx_params params, unsigned long long pledgedSrcSize)
{
DEBUGLOG(4, "ZSTD_initCStream_internal");
- params.cParams = ZSTD_getCParamsFromCCtxParams(¶ms, pledgedSrcSize, dictSize);
+ FORWARD_IF_ERROR( ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize) );
assert(!ZSTD_isError(ZSTD_checkCParams(params.cParams)));
+ zcs->requestedParams = params;
assert(!((dict) && (cdict))); /* either dict or cdict, not both */
-
- if (dict && dictSize >= 8) {
- DEBUGLOG(4, "loading dictionary of size %u", (unsigned)dictSize);
- if (zcs->staticSize) { /* static CCtx : never uses malloc */
- /* incompatible with internal cdict creation */
- return ERROR(memory_allocation);
- }
- ZSTD_freeCDict(zcs->cdictLocal);
- zcs->cdictLocal = ZSTD_createCDict_advanced(dict, dictSize,
- ZSTD_dlm_byCopy, ZSTD_dct_auto,
- params.cParams, zcs->customMem);
- zcs->cdict = zcs->cdictLocal;
- if (zcs->cdictLocal == NULL) return ERROR(memory_allocation);
+ if (dict) {
+ FORWARD_IF_ERROR( ZSTD_CCtx_loadDictionary(zcs, dict, dictSize) );
} else {
- if (cdict) {
- params.cParams = ZSTD_getCParamsFromCDict(cdict); /* cParams are enforced from cdict; it includes windowLog */
- }
- ZSTD_freeCDict(zcs->cdictLocal);
- zcs->cdictLocal = NULL;
- zcs->cdict = cdict;
+ /* Dictionary is cleared if !cdict */
+ FORWARD_IF_ERROR( ZSTD_CCtx_refCDict(zcs, cdict) );
}
-
- return ZSTD_resetCStream_internal(zcs, NULL, 0, ZSTD_dct_auto, zcs->cdict, params, pledgedSrcSize);
+ return 0;
}
/* ZSTD_initCStream_usingCDict_advanced() :
@@ -3772,22 +3359,20 @@
unsigned long long pledgedSrcSize)
{
DEBUGLOG(4, "ZSTD_initCStream_usingCDict_advanced");
- if (!cdict) return ERROR(dictionary_wrong); /* cannot handle NULL cdict (does not know what to do) */
- { ZSTD_CCtx_params params = zcs->requestedParams;
- params.cParams = ZSTD_getCParamsFromCDict(cdict);
- params.fParams = fParams;
- return ZSTD_initCStream_internal(zcs,
- NULL, 0, cdict,
- params, pledgedSrcSize);
- }
+ FORWARD_IF_ERROR( ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize) );
+ zcs->requestedParams.fParams = fParams;
+ FORWARD_IF_ERROR( ZSTD_CCtx_refCDict(zcs, cdict) );
+ return 0;
}
/* note : cdict must outlive compression session */
size_t ZSTD_initCStream_usingCDict(ZSTD_CStream* zcs, const ZSTD_CDict* cdict)
{
- ZSTD_frameParameters const fParams = { 0 /* contentSizeFlag */, 0 /* checksum */, 0 /* hideDictID */ };
DEBUGLOG(4, "ZSTD_initCStream_usingCDict");
- return ZSTD_initCStream_usingCDict_advanced(zcs, cdict, fParams, ZSTD_CONTENTSIZE_UNKNOWN); /* note : will check that cdict != NULL */
+ FORWARD_IF_ERROR( ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_refCDict(zcs, cdict) );
+ return 0;
}
@@ -3797,33 +3382,53 @@
* dict is loaded with default parameters ZSTD_dm_auto and ZSTD_dlm_byCopy. */
size_t ZSTD_initCStream_advanced(ZSTD_CStream* zcs,
const void* dict, size_t dictSize,
- ZSTD_parameters params, unsigned long long pledgedSrcSize)
+ ZSTD_parameters params, unsigned long long pss)
{
- DEBUGLOG(4, "ZSTD_initCStream_advanced: pledgedSrcSize=%u, flag=%u",
- (unsigned)pledgedSrcSize, params.fParams.contentSizeFlag);
- CHECK_F( ZSTD_checkCParams(params.cParams) );
- if ((pledgedSrcSize==0) && (params.fParams.contentSizeFlag==0)) pledgedSrcSize = ZSTD_CONTENTSIZE_UNKNOWN; /* for compatibility with older programs relying on this behavior. Users should now specify ZSTD_CONTENTSIZE_UNKNOWN. This line will be removed in the future. */
+ /* for compatibility with older programs relying on this behavior.
+ * Users should now specify ZSTD_CONTENTSIZE_UNKNOWN.
+ * This line will be removed in the future.
+ */
+ U64 const pledgedSrcSize = (pss==0 && params.fParams.contentSizeFlag==0) ? ZSTD_CONTENTSIZE_UNKNOWN : pss;
+ DEBUGLOG(4, "ZSTD_initCStream_advanced");
+ FORWARD_IF_ERROR( ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize) );
+ FORWARD_IF_ERROR( ZSTD_checkCParams(params.cParams) );
zcs->requestedParams = ZSTD_assignParamsToCCtxParams(zcs->requestedParams, params);
- return ZSTD_initCStream_internal(zcs, dict, dictSize, NULL /*cdict*/, zcs->requestedParams, pledgedSrcSize);
+ FORWARD_IF_ERROR( ZSTD_CCtx_loadDictionary(zcs, dict, dictSize) );
+ return 0;
}
size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs, const void* dict, size_t dictSize, int compressionLevel)
{
- ZSTD_CCtxParams_init(&zcs->requestedParams, compressionLevel);
- return ZSTD_initCStream_internal(zcs, dict, dictSize, NULL, zcs->requestedParams, ZSTD_CONTENTSIZE_UNKNOWN);
+ DEBUGLOG(4, "ZSTD_initCStream_usingDict");
+ FORWARD_IF_ERROR( ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_setParameter(zcs, ZSTD_c_compressionLevel, compressionLevel) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_loadDictionary(zcs, dict, dictSize) );
+ return 0;
}
size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs, int compressionLevel, unsigned long long pss)
{
- U64 const pledgedSrcSize = (pss==0) ? ZSTD_CONTENTSIZE_UNKNOWN : pss; /* temporary : 0 interpreted as "unknown" during transition period. Users willing to specify "unknown" **must** use ZSTD_CONTENTSIZE_UNKNOWN. `0` will be interpreted as "empty" in the future */
- ZSTD_CCtxParams_init(&zcs->requestedParams, compressionLevel);
- return ZSTD_initCStream_internal(zcs, NULL, 0, NULL, zcs->requestedParams, pledgedSrcSize);
+ /* temporary : 0 interpreted as "unknown" during transition period.
+     * Users wishing to specify "unknown" **must** use ZSTD_CONTENTSIZE_UNKNOWN.
+ * 0 will be interpreted as "empty" in the future.
+ */
+ U64 const pledgedSrcSize = (pss==0) ? ZSTD_CONTENTSIZE_UNKNOWN : pss;
+ DEBUGLOG(4, "ZSTD_initCStream_srcSize");
+ FORWARD_IF_ERROR( ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_refCDict(zcs, NULL) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_setParameter(zcs, ZSTD_c_compressionLevel, compressionLevel) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize) );
+ return 0;
}
size_t ZSTD_initCStream(ZSTD_CStream* zcs, int compressionLevel)
{
DEBUGLOG(4, "ZSTD_initCStream");
- return ZSTD_initCStream_srcSize(zcs, compressionLevel, ZSTD_CONTENTSIZE_UNKNOWN);
+ FORWARD_IF_ERROR( ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_refCDict(zcs, NULL) );
+ FORWARD_IF_ERROR( ZSTD_CCtx_setParameter(zcs, ZSTD_c_compressionLevel, compressionLevel) );
+ return 0;
}
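
All of the legacy initializers above now delegate to the same parameter-setting API. A self-contained usage sketch of that API (editor's illustration against the public zstd interface; error handling condensed):

    #include <zstd.h>

    /* One-shot streaming compression via the advanced API that the legacy
     * ZSTD_initCStream*() entry points now sit on top of. */
    static size_t compress_once(void* dst, size_t dstCap,
                                const void* src, size_t srcSize, int level)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        ZSTD_inBuffer  in  = { src, srcSize, 0 };
        ZSTD_outBuffer out = { dst, dstCap, 0 };
        size_t written = (size_t)-1;   /* sentinel: failure */
        if (cctx == NULL) return written;
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);
        ZSTD_CCtx_setPledgedSrcSize(cctx, srcSize);
        /* ZSTD_e_end: compress, flush, and write the epilogue in one call;
         * a return value of 0 means the frame is complete. */
        if (ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end) == 0)
            written = out.pos;
        ZSTD_freeCCtx(cctx);
        return written;
    }
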
/*====== Compression ======*/
@@ -3847,10 +3452,9 @@
* internal function for all *compressStream*() variants
- * non-static, because can be called from zstdmt_compress.c
* @return : hint size for next input */
-size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
- ZSTD_outBuffer* output,
- ZSTD_inBuffer* input,
- ZSTD_EndDirective const flushMode)
+static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
+ ZSTD_outBuffer* output,
+ ZSTD_inBuffer* input,
+ ZSTD_EndDirective const flushMode)
{
const char* const istart = (const char*)input->src;
const char* const iend = istart + input->size;
@@ -3873,8 +3478,7 @@
switch(zcs->streamStage)
{
case zcss_init:
- /* call ZSTD_initCStream() first ! */
- return ERROR(init_missing);
+ RETURN_ERROR(init_missing, "call ZSTD_initCStream() first!");
case zcss_load:
if ( (flushMode == ZSTD_e_end)
@@ -3884,7 +3488,7 @@
size_t const cSize = ZSTD_compressEnd(zcs,
op, oend-op, ip, iend-ip);
DEBUGLOG(4, "ZSTD_compressEnd : cSize=%u", (unsigned)cSize);
- if (ZSTD_isError(cSize)) return cSize;
+ FORWARD_IF_ERROR(cSize);
ip = iend;
op += cSize;
zcs->frameEnded = 1;
@@ -3925,7 +3529,7 @@
zcs->inBuff + zcs->inToCompress, iSize) :
ZSTD_compressContinue(zcs, cDst, oSize,
zcs->inBuff + zcs->inToCompress, iSize);
- if (ZSTD_isError(cSize)) return cSize;
+ FORWARD_IF_ERROR(cSize);
zcs->frameEnded = lastBlock;
/* prepare next block */
zcs->inBuffTarget = zcs->inBuffPos + zcs->blockSize;
@@ -3953,7 +3557,7 @@
case zcss_flush:
DEBUGLOG(5, "flush stage");
{ size_t const toFlush = zcs->outBuffContentSize - zcs->outBuffFlushedSize;
- size_t const flushed = ZSTD_limitCopy(op, oend-op,
+ size_t const flushed = ZSTD_limitCopy(op, (size_t)(oend-op),
zcs->outBuff + zcs->outBuffFlushedSize, toFlush);
DEBUGLOG(5, "toFlush: %u into %u ==> flushed: %u",
(unsigned)toFlush, (unsigned)(oend-op), (unsigned)flushed);
@@ -4001,7 +3605,7 @@
size_t ZSTD_compressStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output, ZSTD_inBuffer* input)
{
- CHECK_F( ZSTD_compressStream2(zcs, output, input, ZSTD_e_continue) );
+ FORWARD_IF_ERROR( ZSTD_compressStream2(zcs, output, input, ZSTD_e_continue) );
return ZSTD_nextInputSizeHint_MTorST(zcs);
}
@@ -4013,14 +3617,15 @@
{
DEBUGLOG(5, "ZSTD_compressStream2, endOp=%u ", (unsigned)endOp);
/* check conditions */
- if (output->pos > output->size) return ERROR(GENERIC);
- if (input->pos > input->size) return ERROR(GENERIC);
+ RETURN_ERROR_IF(output->pos > output->size, GENERIC);
+ RETURN_ERROR_IF(input->pos > input->size, GENERIC);
assert(cctx!=NULL);
/* transparent initialization stage */
if (cctx->streamStage == zcss_init) {
ZSTD_CCtx_params params = cctx->requestedParams;
ZSTD_prefixDict const prefixDict = cctx->prefixDict;
+ FORWARD_IF_ERROR( ZSTD_initLocalDict(cctx) ); /* Init the local dict if present. */
memset(&cctx->prefixDict, 0, sizeof(cctx->prefixDict)); /* single usage */
assert(prefixDict.dict==NULL || cctx->cdict==NULL); /* only one can be set */
DEBUGLOG(4, "ZSTD_compressStream2 : transparent init stage");
@@ -4039,11 +3644,11 @@
DEBUGLOG(4, "ZSTD_compressStream2: creating new mtctx for nbWorkers=%u",
params.nbWorkers);
cctx->mtctx = ZSTDMT_createCCtx_advanced(params.nbWorkers, cctx->customMem);
- if (cctx->mtctx == NULL) return ERROR(memory_allocation);
+ RETURN_ERROR_IF(cctx->mtctx == NULL, memory_allocation);
}
/* mt compression */
DEBUGLOG(4, "call ZSTDMT_initCStream_internal as nbWorkers=%u", params.nbWorkers);
- CHECK_F( ZSTDMT_initCStream_internal(
+ FORWARD_IF_ERROR( ZSTDMT_initCStream_internal(
cctx->mtctx,
prefixDict.dict, prefixDict.dictSize, ZSTD_dct_rawContent,
cctx->cdict, params, cctx->pledgedSrcSizePlusOne-1) );
@@ -4051,7 +3656,7 @@
cctx->appliedParams.nbWorkers = params.nbWorkers;
} else
#endif
- { CHECK_F( ZSTD_resetCStream_internal(cctx,
+ { FORWARD_IF_ERROR( ZSTD_resetCStream_internal(cctx,
prefixDict.dict, prefixDict.dictSize, prefixDict.dictContentType,
cctx->cdict,
params, cctx->pledgedSrcSizePlusOne-1) );
@@ -4063,20 +3668,30 @@
/* compression stage */
#ifdef ZSTD_MULTITHREAD
if (cctx->appliedParams.nbWorkers > 0) {
+ int const forceMaxProgress = (endOp == ZSTD_e_flush || endOp == ZSTD_e_end);
+ size_t flushMin;
+ assert(forceMaxProgress || endOp == ZSTD_e_continue /* Protection for a new flush type */);
if (cctx->cParamsChanged) {
ZSTDMT_updateCParams_whileCompressing(cctx->mtctx, &cctx->requestedParams);
cctx->cParamsChanged = 0;
}
- { size_t const flushMin = ZSTDMT_compressStream_generic(cctx->mtctx, output, input, endOp);
+ do {
+ flushMin = ZSTDMT_compressStream_generic(cctx->mtctx, output, input, endOp);
if ( ZSTD_isError(flushMin)
|| (endOp == ZSTD_e_end && flushMin == 0) ) { /* compression completed */
ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
}
- DEBUGLOG(5, "completed ZSTD_compressStream2 delegating to ZSTDMT_compressStream_generic");
- return flushMin;
- } }
+ FORWARD_IF_ERROR(flushMin);
+ } while (forceMaxProgress && flushMin != 0 && output->pos < output->size);
+ DEBUGLOG(5, "completed ZSTD_compressStream2 delegating to ZSTDMT_compressStream_generic");
+ /* Either we don't require maximum forward progress, we've finished the
+ * flush, or we are out of output space.
+ */
+ assert(!forceMaxProgress || flushMin == 0 || output->pos == output->size);
+ return flushMin;
+ }
#endif
- CHECK_F( ZSTD_compressStream_generic(cctx, output, input, endOp) );
+ FORWARD_IF_ERROR( ZSTD_compressStream_generic(cctx, output, input, endOp) );
DEBUGLOG(5, "completed ZSTD_compressStream2");
return cctx->outBuffContentSize - cctx->outBuffFlushedSize; /* remaining to flush */
}
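
The new do/while above gives ZSTD_e_flush and ZSTD_e_end a stronger contract in multithreaded mode: each call either finishes the requested flush or fills the output buffer. A caller-side drain loop, sketched with the cctx/out/in names from the previous example (write_out is a hypothetical sink):

    size_t remaining;
    do {
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(remaining)) break;   /* propagate the error */
        if (out.pos == out.size) {            /* output full: drain it */
            write_out(out.dst, out.pos);      /* hypothetical sink */
            out.pos = 0;
        }
    } while (remaining != 0);                 /* 0 => frame completed */
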
@@ -4107,10 +3722,10 @@
dst, dstCapacity, &oPos,
src, srcSize, &iPos,
ZSTD_e_end);
- if (ZSTD_isError(result)) return result;
+ FORWARD_IF_ERROR(result);
if (result != 0) { /* compression not completed, due to lack of output space */
assert(oPos == dstCapacity);
- return ERROR(dstSize_tooSmall);
+ RETURN_ERROR(dstSize_tooSmall);
}
assert(iPos == srcSize); /* all input is expected consumed */
return oPos;
@@ -4132,11 +3747,11 @@
{
ZSTD_inBuffer input = { NULL, 0, 0 };
size_t const remainingToFlush = ZSTD_compressStream2(zcs, output, &input, ZSTD_e_end);
- CHECK_F( remainingToFlush );
+ FORWARD_IF_ERROR( remainingToFlush );
if (zcs->appliedParams.nbWorkers > 0) return remainingToFlush; /* minimal estimation */
/* single thread mode : attempt to calculate remaining to flush more precisely */
{ size_t const lastBlockSize = zcs->frameEnded ? 0 : ZSTD_BLOCKHEADERSIZE;
- size_t const checksumSize = zcs->frameEnded ? 0 : zcs->appliedParams.fParams.checksumFlag * 4;
+ size_t const checksumSize = (size_t)(zcs->frameEnded ? 0 : zcs->appliedParams.fParams.checksumFlag * 4);
size_t const toFlush = remainingToFlush + lastBlockSize + checksumSize;
DEBUGLOG(4, "ZSTD_endStream : remaining to flush : %u", (unsigned)toFlush);
return toFlush;
@@ -4151,7 +3766,7 @@
int ZSTD_minCLevel(void) { return (int)-ZSTD_TARGETLENGTH_MAX; }
static const ZSTD_compressionParameters ZSTD_defaultCParameters[4][ZSTD_MAX_CLEVEL+1] = {
-{ /* "default" - guarantees a monotonically increasing memory budget */
+{ /* "default" - for any srcSize > 256 KB */
/* W, C, H, S, L, TL, strat */
{ 19, 12, 13, 1, 6, 1, ZSTD_fast }, /* base for negative levels */
{ 19, 13, 14, 1, 7, 0, ZSTD_fast }, /* level 1 */
@@ -4258,13 +3873,13 @@
};
/*! ZSTD_getCParams() :
-* @return ZSTD_compressionParameters structure for a selected compression level, srcSize and dictSize.
-* Size values are optional, provide 0 if not known or unused */
+ * @return ZSTD_compressionParameters structure for a selected compression level, srcSize and dictSize.
+ * Size values are optional, provide 0 if not known or unused */
ZSTD_compressionParameters ZSTD_getCParams(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize)
{
size_t const addedSize = srcSizeHint ? 0 : 500;
- U64 const rSize = srcSizeHint+dictSize ? srcSizeHint+dictSize+addedSize : (U64)-1;
- U32 const tableID = (rSize <= 256 KB) + (rSize <= 128 KB) + (rSize <= 16 KB); /* intentional underflow for srcSizeHint == 0 */
+ U64 const rSize = srcSizeHint+dictSize ? srcSizeHint+dictSize+addedSize : ZSTD_CONTENTSIZE_UNKNOWN; /* intentional overflow for srcSizeHint == ZSTD_CONTENTSIZE_UNKNOWN */
+ U32 const tableID = (rSize <= 256 KB) + (rSize <= 128 KB) + (rSize <= 16 KB);
int row = compressionLevel;
DEBUGLOG(5, "ZSTD_getCParams (cLevel=%i)", compressionLevel);
if (compressionLevel == 0) row = ZSTD_CLEVEL_DEFAULT; /* 0 == default */
@@ -4272,13 +3887,14 @@
if (compressionLevel > ZSTD_MAX_CLEVEL) row = ZSTD_MAX_CLEVEL;
{ ZSTD_compressionParameters cp = ZSTD_defaultCParameters[tableID][row];
if (compressionLevel < 0) cp.targetLength = (unsigned)(-compressionLevel); /* acceleration factor */
- return ZSTD_adjustCParams_internal(cp, srcSizeHint, dictSize);
+ return ZSTD_adjustCParams_internal(cp, srcSizeHint, dictSize); /* refine parameters based on srcSize & dictSize */
}
}
/*! ZSTD_getParams() :
-* same as ZSTD_getCParams(), but @return a `ZSTD_parameters` object (instead of `ZSTD_compressionParameters`).
-* All fields of `ZSTD_frameParameters` are set to default (0) */
+ * same idea as ZSTD_getCParams()
+ * @return a `ZSTD_parameters` structure (instead of `ZSTD_compressionParameters`).
+ * Fields of `ZSTD_frameParameters` are set to default values */
ZSTD_parameters ZSTD_getParams(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize) {
ZSTD_parameters params;
ZSTD_compressionParameters const cParams = ZSTD_getCParams(compressionLevel, srcSizeHint, dictSize);
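
Worked examples of the table selection above (arithmetic only, checked by hand): with srcSizeHint=10000 and dictSize=0, rSize=10000, which satisfies all three comparisons, so tableID=3 (the smallest-input table); with srcSizeHint=0 and dictSize=0, rSize becomes ZSTD_CONTENTSIZE_UNKNOWN, every comparison fails, and tableID=0 (the default table). The addedSize=500 fudge only matters when sizing from a dictionary alone:

    ZSTD_compressionParameters const small = ZSTD_getCParams(3, 10000, 0); /* tableID 3 */
    ZSTD_compressionParameters const deflt = ZSTD_getCParams(3, 0, 0);     /* tableID 0 */
    ZSTD_compressionParameters const dictd = ZSTD_getCParams(3, 0, 8000);  /* rSize = 8500, tableID 3 */
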
--- a/contrib/python-zstandard/zstd/compress/zstd_compress_internal.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstd_compress_internal.h Sun Sep 15 20:04:00 2019 -0700
@@ -33,13 +33,13 @@
***************************************/
#define kSearchStrength 8
#define HASH_READ_SIZE 8
-#define ZSTD_DUBT_UNSORTED_MARK 1 /* For btlazy2 strategy, index 1 now means "unsorted".
+#define ZSTD_DUBT_UNSORTED_MARK 1 /* For btlazy2 strategy, index ZSTD_DUBT_UNSORTED_MARK==1 means "unsorted".
It could be confused for a real successor at index "1", if sorted as larger than its predecessor.
It's not a big deal though : candidate will just be sorted again.
- Additionnally, candidate position 1 will be lost.
+ Additionally, candidate position 1 will be lost.
But candidate 1 cannot hide a large tree of candidates, so it's a minimal loss.
- The benefit is that ZSTD_DUBT_UNSORTED_MARK cannot be misdhandled after table re-use with a different strategy
- Constant required by ZSTD_compressBlock_btlazy2() and ZSTD_reduceTable_internal() */
+ The benefit is that ZSTD_DUBT_UNSORTED_MARK cannot be mishandled after table re-use with a different strategy.
+ This constant is required by ZSTD_compressBlock_btlazy2() and ZSTD_reduceTable_internal() */
/*-*************************************
@@ -55,6 +55,14 @@
} ZSTD_prefixDict;
typedef struct {
+ void* dictBuffer;
+ void const* dict;
+ size_t dictSize;
+ ZSTD_dictContentType_e dictContentType;
+ ZSTD_CDict* cdict;
+} ZSTD_localDict;
+
+typedef struct {
U32 CTable[HUF_CTABLE_SIZE_U32(255)];
HUF_repeat repeatMode;
} ZSTD_hufCTables_t;
@@ -107,6 +115,7 @@
U32 offCodeSumBasePrice; /* to compare to log2(offreq) */
ZSTD_OptPrice_e priceType; /* prices can be determined dynamically, or follow a pre-defined cost structure */
const ZSTD_entropyCTables_t* symbolCosts; /* pre-calculated dictionary statistics */
+ ZSTD_literalCompressionMode_e literalCompressionMode;
} optState_t;
typedef struct {
@@ -119,21 +128,26 @@
BYTE const* base; /* All regular indexes relative to this position */
BYTE const* dictBase; /* extDict indexes relative to this position */
U32 dictLimit; /* below that point, need extDict */
- U32 lowLimit; /* below that point, no more data */
+ U32 lowLimit; /* below that point, no more valid data */
} ZSTD_window_t;
typedef struct ZSTD_matchState_t ZSTD_matchState_t;
struct ZSTD_matchState_t {
ZSTD_window_t window; /* State for window round buffer management */
- U32 loadedDictEnd; /* index of end of dictionary */
+ U32 loadedDictEnd; /* index of end of dictionary, within context's referential.
+ * When loadedDictEnd != 0, a dictionary is in use, and still valid.
+ * This relies on a mechanism to set loadedDictEnd=0 when dictionary is no longer within distance.
+ * Such mechanism is provided within ZSTD_window_enforceMaxDist() and ZSTD_checkDictValidity().
+ * When dict referential is copied into active context (i.e. not attached),
+ * loadedDictEnd == dictSize, since referential starts from zero.
+ */
U32 nextToUpdate; /* index from which to continue table update */
- U32 nextToUpdate3; /* index from which to continue table update */
- U32 hashLog3; /* dispatch table : larger == faster, more memory */
+ U32 hashLog3; /* dispatch table for matches of len==3 : larger == faster, more memory */
U32* hashTable;
U32* hashTable3;
U32* chainTable;
optState_t opt; /* optimal parser state */
- const ZSTD_matchState_t * dictMatchState;
+ const ZSTD_matchState_t* dictMatchState;
ZSTD_compressionParameters cParams;
};
@@ -186,8 +200,12 @@
int compressionLevel;
int forceWindow; /* force back-references to respect limit of
* 1<<wLog, even for dictionary */
+ size_t targetCBlockSize; /* Tries to fit compressed block size to be around targetCBlockSize.
+ * No target when targetCBlockSize == 0.
+ * There is no guarantee on compressed block size */
ZSTD_dictAttachPref_e attachDictPref;
+ ZSTD_literalCompressionMode_e literalCompressionMode;
/* Multithreading: used to pass parameters to mtctx */
int nbWorkers;
@@ -243,7 +261,7 @@
U32 frameEnded;
/* Dictionary */
- ZSTD_CDict* cdictLocal;
+ ZSTD_localDict localDict;
const ZSTD_CDict* cdict;
ZSTD_prefixDict prefixDict; /* single-usage dictionary */
@@ -295,6 +313,30 @@
return (mlBase > 127) ? ZSTD_highbit32(mlBase) + ML_deltaCode : ML_Code[mlBase];
}
+/* ZSTD_cParam_withinBounds:
+ * @return 1 if value is within cParam bounds,
+ * 0 otherwise */
+MEM_STATIC int ZSTD_cParam_withinBounds(ZSTD_cParameter cParam, int value)
+{
+ ZSTD_bounds const bounds = ZSTD_cParam_getBounds(cParam);
+ if (ZSTD_isError(bounds.error)) return 0;
+ if (value < bounds.lowerBound) return 0;
+ if (value > bounds.upperBound) return 0;
+ return 1;
+}
+
+/* ZSTD_minGain() :
+ * minimum compression required
+ * to generate a compressed block or a compressed literals section.
+ * note : use same formula for both situations */
+MEM_STATIC size_t ZSTD_minGain(size_t srcSize, ZSTD_strategy strat)
+{
+ U32 const minlog = (strat>=ZSTD_btultra) ? (U32)(strat) - 1 : 6;
+ ZSTD_STATIC_ASSERT(ZSTD_btultra == 8);
+ assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, strat));
+ return (srcSize >> minlog) + 2;
+}
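
Worked arithmetic for ZSTD_minGain(), assuming the usual enum values ZSTD_fast == 1 and ZSTD_btultra2 == 9 (both within strategy bounds): a 64 KB input under a fast strategy must save at least (65536 >> 6) + 2 = 1026 bytes for the compressed form to be kept, while btultra2 lowers the bar to (65536 >> 8) + 2 = 258 bytes:

    assert(ZSTD_minGain(65536, ZSTD_fast)     == (65536 >> 6) + 2);  /* 1026 */
    assert(ZSTD_minGain(65536, ZSTD_btultra2) == (65536 >> 8) + 2);  /*  258 */
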
+
/*! ZSTD_storeSeq() :
* Store a sequence (literal length, literals, offset code and match length code) into seqStore_t.
* `offsetCode` : distance to match + 3 (values 1-3 are repCodes).
@@ -314,7 +356,7 @@
/* copy Literals */
assert(seqStorePtr->maxNbLit <= 128 KB);
assert(seqStorePtr->lit + litLength <= seqStorePtr->litStart + seqStorePtr->maxNbLit);
- ZSTD_wildcopy(seqStorePtr->lit, literals, litLength);
+ ZSTD_wildcopy(seqStorePtr->lit, literals, (ptrdiff_t)litLength, ZSTD_no_overlap);
seqStorePtr->lit += litLength;
/* literal Length */
@@ -554,6 +596,9 @@
/*-*************************************
* Round buffer management
***************************************/
+#if (ZSTD_WINDOWLOG_MAX_64 > 31)
+# error "ZSTD_WINDOWLOG_MAX is too large : would overflow ZSTD_CURRENT_MAX"
+#endif
/* Max current allowed */
#define ZSTD_CURRENT_MAX ((3U << 29) + (1U << ZSTD_WINDOWLOG_MAX))
/* Maximum chunk size before overflow correction needs to be called again */
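
The new compile-time guard is plain overflow arithmetic: at the current ceiling of 31, the constant still fits, but one more bit would make the shift itself undefined for a 32-bit U32, so anything above 31 is rejected:

    /* ZSTD_WINDOWLOG_MAX == 31 : (3U << 29) + (1U << 31) == 0xE0000000u, fits a U32 */
    /* ZSTD_WINDOWLOG_MAX == 32 : 1U << 32 is undefined behavior for a 32-bit type   */
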
@@ -665,31 +710,49 @@
* Updates lowLimit so that:
* (srcEnd - base) - lowLimit == maxDist + loadedDictEnd
*
- * This allows a simple check that index >= lowLimit to see if index is valid.
- * This must be called before a block compression call, with srcEnd as the block
- * source end.
+ * It ensures index is valid as long as index >= lowLimit.
+ * This must be called before a block compression call.
+ *
+ * loadedDictEnd is only defined if a dictionary is in use for current compression.
+ * As the name implies, loadedDictEnd represents the index at end of dictionary.
+ * The value lies within context's referential, it can be directly compared to blockEndIdx.
*
- * If loadedDictEndPtr is not NULL, we set it to zero once we update lowLimit.
- * This is because dictionaries are allowed to be referenced as long as the last
- * byte of the dictionary is in the window, but once they are out of range,
- * they cannot be referenced. If loadedDictEndPtr is NULL, we use
- * loadedDictEnd == 0.
+ * If loadedDictEndPtr is NULL, no dictionary is in use, and we use loadedDictEnd == 0.
+ * If loadedDictEndPtr is not NULL, we set it to zero after updating lowLimit.
+ * This is because dictionaries are allowed to be referenced fully
+ * as long as the last byte of the dictionary is in the window.
+ * Once input has progressed beyond window size, dictionary cannot be referenced anymore.
*
- * In normal dict mode, the dict is between lowLimit and dictLimit. In
- * dictMatchState mode, lowLimit and dictLimit are the same, and the dictionary
- * is below them. forceWindow and dictMatchState are therefore incompatible.
+ * In normal dict mode, the dictionary lies between lowLimit and dictLimit.
+ * In dictMatchState mode, lowLimit and dictLimit are the same,
+ * and the dictionary is below them.
+ * forceWindow and dictMatchState are therefore incompatible.
*/
MEM_STATIC void
ZSTD_window_enforceMaxDist(ZSTD_window_t* window,
- void const* srcEnd,
- U32 maxDist,
- U32* loadedDictEndPtr,
+ const void* blockEnd,
+ U32 maxDist,
+ U32* loadedDictEndPtr,
const ZSTD_matchState_t** dictMatchStatePtr)
{
- U32 const blockEndIdx = (U32)((BYTE const*)srcEnd - window->base);
- U32 loadedDictEnd = (loadedDictEndPtr != NULL) ? *loadedDictEndPtr : 0;
- DEBUGLOG(5, "ZSTD_window_enforceMaxDist: blockEndIdx=%u, maxDist=%u",
- (unsigned)blockEndIdx, (unsigned)maxDist);
+ U32 const blockEndIdx = (U32)((BYTE const*)blockEnd - window->base);
+ U32 const loadedDictEnd = (loadedDictEndPtr != NULL) ? *loadedDictEndPtr : 0;
+ DEBUGLOG(5, "ZSTD_window_enforceMaxDist: blockEndIdx=%u, maxDist=%u, loadedDictEnd=%u",
+ (unsigned)blockEndIdx, (unsigned)maxDist, (unsigned)loadedDictEnd);
+
+ /* - When there is no dictionary : loadedDictEnd == 0.
+ In which case, the test (blockEndIdx > maxDist) is merely to avoid
+ overflowing next operation `newLowLimit = blockEndIdx - maxDist`.
+ - When there is a standard dictionary :
+ Index referential is copied from the dictionary,
+ which means it starts from 0.
+ In which case, loadedDictEnd == dictSize,
+ and it makes sense to compare `blockEndIdx > maxDist + dictSize`
+ since `blockEndIdx` also starts from zero.
+ - When there is an attached dictionary :
+ loadedDictEnd is expressed within the referential of the context,
+ so it can be directly compared against blockEndIdx.
+ */
if (blockEndIdx > maxDist + loadedDictEnd) {
U32 const newLowLimit = blockEndIdx - maxDist;
if (window->lowLimit < newLowLimit) window->lowLimit = newLowLimit;
@@ -698,11 +761,45 @@
(unsigned)window->dictLimit, (unsigned)window->lowLimit);
window->dictLimit = window->lowLimit;
}
- if (loadedDictEndPtr)
+ /* On reaching window size, dictionaries are invalidated */
+ if (loadedDictEndPtr) *loadedDictEndPtr = 0;
+ if (dictMatchStatePtr) *dictMatchStatePtr = NULL;
+ }
+}
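
A worked instance of the clamp above, with no dictionary (loadedDictEnd = 0) and illustrative values: at windowLog = 20 the window is 1 MiB, and once a block ends 3 MiB into the frame the reachable range is pulled up to 2 MiB:

    U32 const maxDist     = 1u << 20;              /* windowLog = 20      */
    U32 const blockEndIdx = 3u << 20;              /* block ends at 3 MiB */
    U32 const newLowLimit = blockEndIdx - maxDist; /* 2 MiB : indexes below
                                                      this are no longer
                                                      valid match sources */
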
+
+/* Similar to ZSTD_window_enforceMaxDist(),
+ * but only invalidates dictionary
+ * when input progresses beyond window size.
+ * assumption : loadedDictEndPtr and dictMatchStatePtr are valid (non NULL)
+ * loadedDictEnd uses same referential as window->base
+ * maxDist is the window size */
+MEM_STATIC void
+ZSTD_checkDictValidity(const ZSTD_window_t* window,
+ const void* blockEnd,
+ U32 maxDist,
+ U32* loadedDictEndPtr,
+ const ZSTD_matchState_t** dictMatchStatePtr)
+{
+ assert(loadedDictEndPtr != NULL);
+ assert(dictMatchStatePtr != NULL);
+ { U32 const blockEndIdx = (U32)((BYTE const*)blockEnd - window->base);
+ U32 const loadedDictEnd = *loadedDictEndPtr;
+ DEBUGLOG(5, "ZSTD_checkDictValidity: blockEndIdx=%u, maxDist=%u, loadedDictEnd=%u",
+ (unsigned)blockEndIdx, (unsigned)maxDist, (unsigned)loadedDictEnd);
+ assert(blockEndIdx >= loadedDictEnd);
+
+ if (blockEndIdx > loadedDictEnd + maxDist) {
+ /* On reaching window size, dictionaries are invalidated.
+ * For simplification, if window size is reached anywhere within next block,
+ * the dictionary is invalidated for the full block.
+ */
+ DEBUGLOG(6, "invalidating dictionary for current block (distance > windowSize)");
*loadedDictEndPtr = 0;
- if (dictMatchStatePtr)
*dictMatchStatePtr = NULL;
- }
+ } else {
+ if (*loadedDictEndPtr != 0) {
+ DEBUGLOG(6, "dictionary considered valid for current block");
+ } } }
}
/**
@@ -744,6 +841,17 @@
return contiguous;
}
+MEM_STATIC U32 ZSTD_getLowestMatchIndex(const ZSTD_matchState_t* ms, U32 current, unsigned windowLog)
+{
+ U32 const maxDistance = 1U << windowLog;
+ U32 const lowestValid = ms->window.lowLimit;
+ U32 const withinWindow = (current - lowestValid > maxDistance) ? current - maxDistance : lowestValid;
+ U32 const isDictionary = (ms->loadedDictEnd != 0);
+ U32 const matchLowest = isDictionary ? lowestValid : withinWindow;
+ return matchLowest;
+}
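
Worked example for ZSTD_getLowestMatchIndex() (illustrative values): at current = 5 MiB with windowLog = 20 and no dictionary, matches may start no lower than 4 MiB; while a dictionary is loaded (loadedDictEnd != 0) the full range down to lowLimit stays usable, because validity is then policed by ZSTD_window_enforceMaxDist()/ZSTD_checkDictValidity() instead:

    /* no dictionary, lowestValid = 0 :
     *   current - lowestValid = 5 MiB > 1 MiB  =>  matchLowest = 4 MiB      */
    /* dictionary in use (loadedDictEnd != 0) :
     *   matchLowest = lowestValid, i.e. the dictionary remains reachable    */
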
+
+
/* debug functions */
#if (DEBUGLEVEL>=2)
@@ -806,13 +914,6 @@
void ZSTD_resetSeqStore(seqStore_t* ssPtr);
-/*! ZSTD_compressStream_generic() :
- * Private use only. To be called from zstdmt_compress.c in single-thread mode. */
-size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
- ZSTD_outBuffer* output,
- ZSTD_inBuffer* input,
- ZSTD_EndDirective const flushMode);
-
/*! ZSTD_getCParamsFromCDict() :
* as the name implies */
ZSTD_compressionParameters ZSTD_getCParamsFromCDict(const ZSTD_CDict* cdict);
@@ -839,7 +940,7 @@
/* ZSTD_writeLastEmptyBlock() :
* output an empty Block with end-of-frame mark to complete a frame
* @return : size of data written into `dst` (== ZSTD_blockHeaderSize (defined in zstd_internal.h))
- * or an error code if `dstCapcity` is too small (<ZSTD_blockHeaderSize)
+ * or an error code if `dstCapacity` is too small (<ZSTD_blockHeaderSize)
*/
size_t ZSTD_writeLastEmptyBlock(void* dst, size_t dstCapacity);
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/contrib/python-zstandard/zstd/compress/zstd_compress_literals.c Sun Sep 15 20:04:00 2019 -0700
@@ -0,0 +1,149 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+ /*-*************************************
+ * Dependencies
+ ***************************************/
+#include "zstd_compress_literals.h"
+
+size_t ZSTD_noCompressLiterals (void* dst, size_t dstCapacity, const void* src, size_t srcSize)
+{
+ BYTE* const ostart = (BYTE* const)dst;
+ U32 const flSize = 1 + (srcSize>31) + (srcSize>4095);
+
+ RETURN_ERROR_IF(srcSize + flSize > dstCapacity, dstSize_tooSmall);
+
+ switch(flSize)
+ {
+ case 1: /* 2 - 1 - 5 */
+ ostart[0] = (BYTE)((U32)set_basic + (srcSize<<3));
+ break;
+ case 2: /* 2 - 2 - 12 */
+ MEM_writeLE16(ostart, (U16)((U32)set_basic + (1<<2) + (srcSize<<4)));
+ break;
+ case 3: /* 2 - 2 - 20 */
+ MEM_writeLE32(ostart, (U32)((U32)set_basic + (3<<2) + (srcSize<<4)));
+ break;
+ default: /* not necessary : flSize is {1,2,3} */
+ assert(0);
+ }
+
+ memcpy(ostart + flSize, src, srcSize);
+ return srcSize + flSize;
+}
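
Worked example of the flSize == 1 header layout ("2 - 1 - 5"), assuming set_basic == 0 as in zstd_internal.h: 20 raw literal bytes produce the single header byte 0 + (20 << 3) = 0xA0. Sizes above 31 no longer fit in 5 bits, which is exactly when flSize grows to 2:

    /* srcSize = 20, set_basic = 0 :
     *   header byte = (BYTE)(0 + (20 << 3)) = 0xA0
     *   bits [0-1] literals-block type, bit [2] size format,
     *   bits [3-7] regenerated size (max 31)                  */
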
+
+size_t ZSTD_compressRleLiteralsBlock (void* dst, size_t dstCapacity, const void* src, size_t srcSize)
+{
+ BYTE* const ostart = (BYTE* const)dst;
+ U32 const flSize = 1 + (srcSize>31) + (srcSize>4095);
+
+ (void)dstCapacity; /* dstCapacity already guaranteed to be >=4, hence large enough */
+
+ switch(flSize)
+ {
+ case 1: /* 2 - 1 - 5 */
+ ostart[0] = (BYTE)((U32)set_rle + (srcSize<<3));
+ break;
+ case 2: /* 2 - 2 - 12 */
+ MEM_writeLE16(ostart, (U16)((U32)set_rle + (1<<2) + (srcSize<<4)));
+ break;
+ case 3: /* 2 - 2 - 20 */
+ MEM_writeLE32(ostart, (U32)((U32)set_rle + (3<<2) + (srcSize<<4)));
+ break;
+ default: /* not necessary : flSize is {1,2,3} */
+ assert(0);
+ }
+
+ ostart[flSize] = *(const BYTE*)src;
+ return flSize+1;
+}
+
+size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const* prevHuf,
+ ZSTD_hufCTables_t* nextHuf,
+ ZSTD_strategy strategy, int disableLiteralCompression,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize,
+ void* workspace, size_t wkspSize,
+ const int bmi2)
+{
+ size_t const minGain = ZSTD_minGain(srcSize, strategy);
+ size_t const lhSize = 3 + (srcSize >= 1 KB) + (srcSize >= 16 KB);
+ BYTE* const ostart = (BYTE*)dst;
+ U32 singleStream = srcSize < 256;
+ symbolEncodingType_e hType = set_compressed;
+ size_t cLitSize;
+
+ DEBUGLOG(5,"ZSTD_compressLiterals (disableLiteralCompression=%i)",
+ disableLiteralCompression);
+
+ /* Prepare nextEntropy assuming reusing the existing table */
+ memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
+
+ if (disableLiteralCompression)
+ return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize);
+
+ /* small ? don't even attempt compression (speed opt) */
+# define COMPRESS_LITERALS_SIZE_MIN 63
+ { size_t const minLitSize = (prevHuf->repeatMode == HUF_repeat_valid) ? 6 : COMPRESS_LITERALS_SIZE_MIN;
+ if (srcSize <= minLitSize) return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize);
+ }
+
+ RETURN_ERROR_IF(dstCapacity < lhSize+1, dstSize_tooSmall, "not enough space for compression");
+ { HUF_repeat repeat = prevHuf->repeatMode;
+ int const preferRepeat = strategy < ZSTD_lazy ? srcSize <= 1024 : 0;
+ if (repeat == HUF_repeat_valid && lhSize == 3) singleStream = 1;
+ cLitSize = singleStream ? HUF_compress1X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11,
+ workspace, wkspSize, (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2)
+ : HUF_compress4X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11,
+ workspace, wkspSize, (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2);
+ if (repeat != HUF_repeat_none) {
+ /* reused the existing table */
+ hType = set_repeat;
+ }
+ }
+
+ if ((cLitSize==0) | (cLitSize >= srcSize - minGain) | ERR_isError(cLitSize)) {
+ memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
+ return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize);
+ }
+ if (cLitSize==1) {
+ memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
+ return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, srcSize);
+ }
+
+ if (hType == set_compressed) {
+ /* using a newly constructed table */
+ nextHuf->repeatMode = HUF_repeat_check;
+ }
+
+ /* Build header */
+ switch(lhSize)
+ {
+ case 3: /* 2 - 2 - 10 - 10 */
+ { U32 const lhc = hType + ((!singleStream) << 2) + ((U32)srcSize<<4) + ((U32)cLitSize<<14);
+ MEM_writeLE24(ostart, lhc);
+ break;
+ }
+ case 4: /* 2 - 2 - 14 - 14 */
+ { U32 const lhc = hType + (2 << 2) + ((U32)srcSize<<4) + ((U32)cLitSize<<18);
+ MEM_writeLE32(ostart, lhc);
+ break;
+ }
+ case 5: /* 2 - 2 - 18 - 18 */
+ { U32 const lhc = hType + (3 << 2) + ((U32)srcSize<<4) + ((U32)cLitSize<<22);
+ MEM_writeLE32(ostart, lhc);
+ ostart[4] = (BYTE)(cLitSize >> 10);
+ break;
+ }
+ default: /* not possible : lhSize is {3,4,5} */
+ assert(0);
+ }
+ return lhSize+cLitSize;
+}
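
The fallback order implemented above, condensed as an editor's sketch (paraphrase, not additional code in the patch):

    /* raw        : compression disabled, srcSize <= minLitSize,
     *              or gain below ZSTD_minGain()                     */
    /* rle        : Huffman output collapsed to a single byte        */
    /* repeat     : previous Huffman table reused (hType=set_repeat) */
    /* compressed : fresh table, repeatMode -> HUF_repeat_check      */
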
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/contrib/python-zstandard/zstd/compress/zstd_compress_literals.h Sun Sep 15 20:04:00 2019 -0700
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+#ifndef ZSTD_COMPRESS_LITERALS_H
+#define ZSTD_COMPRESS_LITERALS_H
+
+#include "zstd_compress_internal.h" /* ZSTD_hufCTables_t, ZSTD_minGain() */
+
+
+size_t ZSTD_noCompressLiterals (void* dst, size_t dstCapacity, const void* src, size_t srcSize);
+
+size_t ZSTD_compressRleLiteralsBlock (void* dst, size_t dstCapacity, const void* src, size_t srcSize);
+
+size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const* prevHuf,
+ ZSTD_hufCTables_t* nextHuf,
+ ZSTD_strategy strategy, int disableLiteralCompression,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize,
+ void* workspace, size_t wkspSize,
+ const int bmi2);
+
+#endif /* ZSTD_COMPRESS_LITERALS_H */
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/contrib/python-zstandard/zstd/compress/zstd_compress_sequences.c Sun Sep 15 20:04:00 2019 -0700
@@ -0,0 +1,415 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+ /*-*************************************
+ * Dependencies
+ ***************************************/
+#include "zstd_compress_sequences.h"
+
+/**
+ * -log2(x / 256) lookup table for x in [0, 256).
+ * If x == 0: Return 0
+ * Else: Return floor(-log2(x / 256) * 256)
+ */
+static unsigned const kInverseProbabilityLog256[256] = {
+ 0, 2048, 1792, 1642, 1536, 1453, 1386, 1329, 1280, 1236, 1197, 1162,
+ 1130, 1100, 1073, 1047, 1024, 1001, 980, 960, 941, 923, 906, 889,
+ 874, 859, 844, 830, 817, 804, 791, 779, 768, 756, 745, 734,
+ 724, 714, 704, 694, 685, 676, 667, 658, 650, 642, 633, 626,
+ 618, 610, 603, 595, 588, 581, 574, 567, 561, 554, 548, 542,
+ 535, 529, 523, 517, 512, 506, 500, 495, 489, 484, 478, 473,
+ 468, 463, 458, 453, 448, 443, 438, 434, 429, 424, 420, 415,
+ 411, 407, 402, 398, 394, 390, 386, 382, 377, 373, 370, 366,
+ 362, 358, 354, 350, 347, 343, 339, 336, 332, 329, 325, 322,
+ 318, 315, 311, 308, 305, 302, 298, 295, 292, 289, 286, 282,
+ 279, 276, 273, 270, 267, 264, 261, 258, 256, 253, 250, 247,
+ 244, 241, 239, 236, 233, 230, 228, 225, 222, 220, 217, 215,
+ 212, 209, 207, 204, 202, 199, 197, 194, 192, 190, 187, 185,
+ 182, 180, 178, 175, 173, 171, 168, 166, 164, 162, 159, 157,
+ 155, 153, 151, 149, 146, 144, 142, 140, 138, 136, 134, 132,
+ 130, 128, 126, 123, 121, 119, 117, 115, 114, 112, 110, 108,
+ 106, 104, 102, 100, 98, 96, 94, 93, 91, 89, 87, 85,
+ 83, 82, 80, 78, 76, 74, 73, 71, 69, 67, 66, 64,
+ 62, 61, 59, 57, 55, 54, 52, 50, 49, 47, 46, 44,
+ 42, 41, 39, 37, 36, 34, 33, 31, 30, 28, 26, 25,
+ 23, 22, 20, 19, 17, 16, 14, 13, 11, 10, 8, 7,
+ 5, 4, 2, 1,
+};
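
The table can be regenerated with a few lines of C; this is an editor's sketch for verification only (entry 0 is pinned to 0 by convention, and since the stored value is floor(-log2(x/256) * 256), last-place rounding may differ by 1 on some platforms):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int x;
        printf("0, ");
        for (x = 1; x < 256; x++)
            printf("%u,%s", (unsigned)(-log2(x / 256.0) * 256.0),
                   (x % 12) ? " " : "\n");
        return 0;
    }
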
+
+static unsigned ZSTD_getFSEMaxSymbolValue(FSE_CTable const* ctable) {
+ void const* ptr = ctable;
+ U16 const* u16ptr = (U16 const*)ptr;
+ U32 const maxSymbolValue = MEM_read16(u16ptr + 1);
+ return maxSymbolValue;
+}
+
+/**
+ * Returns the cost in bytes of encoding the normalized count header.
+ * Returns an error if any of the helper functions return an error.
+ */
+static size_t ZSTD_NCountCost(unsigned const* count, unsigned const max,
+ size_t const nbSeq, unsigned const FSELog)
+{
+ BYTE wksp[FSE_NCOUNTBOUND];
+ S16 norm[MaxSeq + 1];
+ const U32 tableLog = FSE_optimalTableLog(FSELog, nbSeq, max);
+ FORWARD_IF_ERROR(FSE_normalizeCount(norm, tableLog, count, nbSeq, max));
+ return FSE_writeNCount(wksp, sizeof(wksp), norm, max, tableLog);
+}
+
+/**
+ * Returns the cost in bits of encoding the distribution described by count
+ * using the entropy bound.
+ */
+static size_t ZSTD_entropyCost(unsigned const* count, unsigned const max, size_t const total)
+{
+ unsigned cost = 0;
+ unsigned s;
+ for (s = 0; s <= max; ++s) {
+ unsigned norm = (unsigned)((256 * count[s]) / total);
+ if (count[s] != 0 && norm == 0)
+ norm = 1;
+ assert(count[s] < total);
+ cost += count[s] * kInverseProbabilityLog256[norm];
+ }
+ return cost >> 8;
+}
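
The computation above is Shannon's bound in 8.8 fixed point: each count[s] is weighted by -log2(count[s]/total) via the lookup table, and the final >> 8 converts back to whole bits. Worked check: two symbols with 128 occurrences each out of 256 normalize to 128, the table entry at index 128 is 256 (exactly 1.0 bit), so the cost is (128*256 + 128*256) >> 8 = 256 bits = 32 bytes, as expected for a balanced binary alphabet:

    unsigned const count[2] = { 128, 128 };
    /* ZSTD_entropyCost(count, 1, 256) == 256 bits (static function,
     * shown here for intuition only) */
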
+
+/**
+ * Returns the cost in bits of encoding the distribution in count using ctable.
+ * Returns an error if ctable cannot represent all the symbols in count.
+ */
+static size_t ZSTD_fseBitCost(
+ FSE_CTable const* ctable,
+ unsigned const* count,
+ unsigned const max)
+{
+ unsigned const kAccuracyLog = 8;
+ size_t cost = 0;
+ unsigned s;
+ FSE_CState_t cstate;
+ FSE_initCState(&cstate, ctable);
+ RETURN_ERROR_IF(ZSTD_getFSEMaxSymbolValue(ctable) < max, GENERIC,
+ "Repeat FSE_CTable has maxSymbolValue %u < %u",
+ ZSTD_getFSEMaxSymbolValue(ctable), max);
+ for (s = 0; s <= max; ++s) {
+ unsigned const tableLog = cstate.stateLog;
+ unsigned const badCost = (tableLog + 1) << kAccuracyLog;
+ unsigned const bitCost = FSE_bitCost(cstate.symbolTT, tableLog, s, kAccuracyLog);
+ if (count[s] == 0)
+ continue;
+ RETURN_ERROR_IF(bitCost >= badCost, GENERIC,
+ "Repeat FSE_CTable has Prob[%u] == 0", s);
+ cost += count[s] * bitCost;
+ }
+ return cost >> kAccuracyLog;
+}
+
+/**
+ * Returns the cost in bits of encoding the distribution in count using the
+ * table described by norm. The max symbol supported by norm is assumed >= max.
+ * norm must be valid for every symbol with non-zero probability in count.
+ */
+static size_t ZSTD_crossEntropyCost(short const* norm, unsigned accuracyLog,
+ unsigned const* count, unsigned const max)
+{
+ unsigned const shift = 8 - accuracyLog;
+ size_t cost = 0;
+ unsigned s;
+ assert(accuracyLog <= 8);
+ for (s = 0; s <= max; ++s) {
+ unsigned const normAcc = norm[s] != -1 ? norm[s] : 1;
+ unsigned const norm256 = normAcc << shift;
+ assert(norm256 > 0);
+ assert(norm256 < 256);
+ cost += count[s] * kInverseProbabilityLog256[norm256];
+ }
+ return cost >> 8;
+}
+
+symbolEncodingType_e
+ZSTD_selectEncodingType(
+ FSE_repeat* repeatMode, unsigned const* count, unsigned const max,
+ size_t const mostFrequent, size_t nbSeq, unsigned const FSELog,
+ FSE_CTable const* prevCTable,
+ short const* defaultNorm, U32 defaultNormLog,
+ ZSTD_defaultPolicy_e const isDefaultAllowed,
+ ZSTD_strategy const strategy)
+{
+ ZSTD_STATIC_ASSERT(ZSTD_defaultDisallowed == 0 && ZSTD_defaultAllowed != 0);
+ if (mostFrequent == nbSeq) {
+ *repeatMode = FSE_repeat_none;
+ if (isDefaultAllowed && nbSeq <= 2) {
+        /* Prefer set_basic over set_rle when there are 2 or fewer symbols,
+ * since RLE uses 1 byte, but set_basic uses 5-6 bits per symbol.
+ * If basic encoding isn't possible, always choose RLE.
+ */
+ DEBUGLOG(5, "Selected set_basic");
+ return set_basic;
+ }
+ DEBUGLOG(5, "Selected set_rle");
+ return set_rle;
+ }
+ if (strategy < ZSTD_lazy) {
+ if (isDefaultAllowed) {
+ size_t const staticFse_nbSeq_max = 1000;
+ size_t const mult = 10 - strategy;
+ size_t const baseLog = 3;
+ size_t const dynamicFse_nbSeq_min = (((size_t)1 << defaultNormLog) * mult) >> baseLog; /* 28-36 for offset, 56-72 for lengths */
+ assert(defaultNormLog >= 5 && defaultNormLog <= 6); /* xx_DEFAULTNORMLOG */
+ assert(mult <= 9 && mult >= 7);
+ if ( (*repeatMode == FSE_repeat_valid)
+ && (nbSeq < staticFse_nbSeq_max) ) {
+ DEBUGLOG(5, "Selected set_repeat");
+ return set_repeat;
+ }
+ if ( (nbSeq < dynamicFse_nbSeq_min)
+ || (mostFrequent < (nbSeq >> (defaultNormLog-1))) ) {
+ DEBUGLOG(5, "Selected set_basic");
+ /* The format allows default tables to be repeated, but it isn't useful.
+ * When using simple heuristics to select encoding type, we don't want
+ * to confuse these tables with dictionaries. When running more careful
+ * analysis, we don't need to waste time checking both repeating tables
+ * and default tables.
+ */
+ *repeatMode = FSE_repeat_none;
+ return set_basic;
+ }
+ }
+ } else {
+ size_t const basicCost = isDefaultAllowed ? ZSTD_crossEntropyCost(defaultNorm, defaultNormLog, count, max) : ERROR(GENERIC);
+ size_t const repeatCost = *repeatMode != FSE_repeat_none ? ZSTD_fseBitCost(prevCTable, count, max) : ERROR(GENERIC);
+ size_t const NCountCost = ZSTD_NCountCost(count, max, nbSeq, FSELog);
+ size_t const compressedCost = (NCountCost << 3) + ZSTD_entropyCost(count, max, nbSeq);
+
+ if (isDefaultAllowed) {
+ assert(!ZSTD_isError(basicCost));
+ assert(!(*repeatMode == FSE_repeat_valid && ZSTD_isError(repeatCost)));
+ }
+ assert(!ZSTD_isError(NCountCost));
+ assert(compressedCost < ERROR(maxCode));
+ DEBUGLOG(5, "Estimated bit costs: basic=%u\trepeat=%u\tcompressed=%u",
+ (unsigned)basicCost, (unsigned)repeatCost, (unsigned)compressedCost);
+ if (basicCost <= repeatCost && basicCost <= compressedCost) {
+ DEBUGLOG(5, "Selected set_basic");
+ assert(isDefaultAllowed);
+ *repeatMode = FSE_repeat_none;
+ return set_basic;
+ }
+ if (repeatCost <= compressedCost) {
+ DEBUGLOG(5, "Selected set_repeat");
+ assert(!ZSTD_isError(repeatCost));
+ return set_repeat;
+ }
+ assert(compressedCost < basicCost && compressedCost < repeatCost);
+ }
+ DEBUGLOG(5, "Selected set_compressed");
+ *repeatMode = FSE_repeat_check;
+ return set_compressed;
+}
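
Condensed decision policy of the function above (editor's paraphrase):

    /* mostFrequent == nbSeq : set_rle, or set_basic for <= 2 sequences    */
    /* strategy <  ZSTD_lazy : cheap thresholds only - repeat if valid and
     *                         nbSeq < 1000, basic if the distribution is
     *                         small/flat, else fall through to compressed */
    /* strategy >= ZSTD_lazy : estimate the bit cost of basic / repeat /
     *                         compressed and take the cheapest            */
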
+
+size_t
+ZSTD_buildCTable(void* dst, size_t dstCapacity,
+ FSE_CTable* nextCTable, U32 FSELog, symbolEncodingType_e type,
+ unsigned* count, U32 max,
+ const BYTE* codeTable, size_t nbSeq,
+ const S16* defaultNorm, U32 defaultNormLog, U32 defaultMax,
+ const FSE_CTable* prevCTable, size_t prevCTableSize,
+ void* workspace, size_t workspaceSize)
+{
+ BYTE* op = (BYTE*)dst;
+ const BYTE* const oend = op + dstCapacity;
+ DEBUGLOG(6, "ZSTD_buildCTable (dstCapacity=%u)", (unsigned)dstCapacity);
+
+ switch (type) {
+ case set_rle:
+ FORWARD_IF_ERROR(FSE_buildCTable_rle(nextCTable, (BYTE)max));
+ RETURN_ERROR_IF(dstCapacity==0, dstSize_tooSmall);
+ *op = codeTable[0];
+ return 1;
+ case set_repeat:
+ memcpy(nextCTable, prevCTable, prevCTableSize);
+ return 0;
+ case set_basic:
+ FORWARD_IF_ERROR(FSE_buildCTable_wksp(nextCTable, defaultNorm, defaultMax, defaultNormLog, workspace, workspaceSize)); /* note : could be pre-calculated */
+ return 0;
+ case set_compressed: {
+ S16 norm[MaxSeq + 1];
+ size_t nbSeq_1 = nbSeq;
+ const U32 tableLog = FSE_optimalTableLog(FSELog, nbSeq, max);
+ if (count[codeTable[nbSeq-1]] > 1) {
+ count[codeTable[nbSeq-1]]--;
+ nbSeq_1--;
+ }
+ assert(nbSeq_1 > 1);
+ FORWARD_IF_ERROR(FSE_normalizeCount(norm, tableLog, count, nbSeq_1, max));
+ { size_t const NCountSize = FSE_writeNCount(op, oend - op, norm, max, tableLog); /* overflow protected */
+ FORWARD_IF_ERROR(NCountSize);
+ FORWARD_IF_ERROR(FSE_buildCTable_wksp(nextCTable, norm, max, tableLog, workspace, workspaceSize));
+ return NCountSize;
+ }
+ }
+ default: assert(0); RETURN_ERROR(GENERIC);
+ }
+}
+
+FORCE_INLINE_TEMPLATE size_t
+ZSTD_encodeSequences_body(
+ void* dst, size_t dstCapacity,
+ FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
+ FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
+ FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
+ seqDef const* sequences, size_t nbSeq, int longOffsets)
+{
+ BIT_CStream_t blockStream;
+ FSE_CState_t stateMatchLength;
+ FSE_CState_t stateOffsetBits;
+ FSE_CState_t stateLitLength;
+
+ RETURN_ERROR_IF(
+ ERR_isError(BIT_initCStream(&blockStream, dst, dstCapacity)),
+ dstSize_tooSmall, "not enough space remaining");
+ DEBUGLOG(6, "available space for bitstream : %i (dstCapacity=%u)",
+ (int)(blockStream.endPtr - blockStream.startPtr),
+ (unsigned)dstCapacity);
+
+ /* first symbols */
+ FSE_initCState2(&stateMatchLength, CTable_MatchLength, mlCodeTable[nbSeq-1]);
+ FSE_initCState2(&stateOffsetBits, CTable_OffsetBits, ofCodeTable[nbSeq-1]);
+ FSE_initCState2(&stateLitLength, CTable_LitLength, llCodeTable[nbSeq-1]);
+ BIT_addBits(&blockStream, sequences[nbSeq-1].litLength, LL_bits[llCodeTable[nbSeq-1]]);
+ if (MEM_32bits()) BIT_flushBits(&blockStream);
+ BIT_addBits(&blockStream, sequences[nbSeq-1].matchLength, ML_bits[mlCodeTable[nbSeq-1]]);
+ if (MEM_32bits()) BIT_flushBits(&blockStream);
+ if (longOffsets) {
+ U32 const ofBits = ofCodeTable[nbSeq-1];
+ int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN-1);
+ if (extraBits) {
+ BIT_addBits(&blockStream, sequences[nbSeq-1].offset, extraBits);
+ BIT_flushBits(&blockStream);
+ }
+ BIT_addBits(&blockStream, sequences[nbSeq-1].offset >> extraBits,
+ ofBits - extraBits);
+ } else {
+ BIT_addBits(&blockStream, sequences[nbSeq-1].offset, ofCodeTable[nbSeq-1]);
+ }
+ BIT_flushBits(&blockStream);
+
+ { size_t n;
+ for (n=nbSeq-2 ; n<nbSeq ; n--) { /* intentional underflow */
+ BYTE const llCode = llCodeTable[n];
+ BYTE const ofCode = ofCodeTable[n];
+ BYTE const mlCode = mlCodeTable[n];
+ U32 const llBits = LL_bits[llCode];
+ U32 const ofBits = ofCode;
+ U32 const mlBits = ML_bits[mlCode];
+ DEBUGLOG(6, "encoding: litlen:%2u - matchlen:%2u - offCode:%7u",
+ (unsigned)sequences[n].litLength,
+ (unsigned)sequences[n].matchLength + MINMATCH,
+ (unsigned)sequences[n].offset);
+ /* 32b*/ /* 64b*/
+ /* (7)*/ /* (7)*/
+ FSE_encodeSymbol(&blockStream, &stateOffsetBits, ofCode); /* 15 */ /* 15 */
+ FSE_encodeSymbol(&blockStream, &stateMatchLength, mlCode); /* 24 */ /* 24 */
+ if (MEM_32bits()) BIT_flushBits(&blockStream); /* (7)*/
+ FSE_encodeSymbol(&blockStream, &stateLitLength, llCode); /* 16 */ /* 33 */
+ if (MEM_32bits() || (ofBits+mlBits+llBits >= 64-7-(LLFSELog+MLFSELog+OffFSELog)))
+ BIT_flushBits(&blockStream); /* (7)*/
+ BIT_addBits(&blockStream, sequences[n].litLength, llBits);
+ if (MEM_32bits() && ((llBits+mlBits)>24)) BIT_flushBits(&blockStream);
+ BIT_addBits(&blockStream, sequences[n].matchLength, mlBits);
+ if (MEM_32bits() || (ofBits+mlBits+llBits > 56)) BIT_flushBits(&blockStream);
+ if (longOffsets) {
+ int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN-1);
+ if (extraBits) {
+ BIT_addBits(&blockStream, sequences[n].offset, extraBits);
+ BIT_flushBits(&blockStream); /* (7)*/
+ }
+ BIT_addBits(&blockStream, sequences[n].offset >> extraBits,
+ ofBits - extraBits); /* 31 */
+ } else {
+ BIT_addBits(&blockStream, sequences[n].offset, ofBits); /* 31 */
+ }
+ BIT_flushBits(&blockStream); /* (7)*/
+ DEBUGLOG(7, "remaining space : %i", (int)(blockStream.endPtr - blockStream.ptr));
+ } }
+
+ DEBUGLOG(6, "ZSTD_encodeSequences: flushing ML state with %u bits", stateMatchLength.stateLog);
+ FSE_flushCState(&blockStream, &stateMatchLength);
+ DEBUGLOG(6, "ZSTD_encodeSequences: flushing Off state with %u bits", stateOffsetBits.stateLog);
+ FSE_flushCState(&blockStream, &stateOffsetBits);
+ DEBUGLOG(6, "ZSTD_encodeSequences: flushing LL state with %u bits", stateLitLength.stateLog);
+ FSE_flushCState(&blockStream, &stateLitLength);
+
+ { size_t const streamSize = BIT_closeCStream(&blockStream);
+ RETURN_ERROR_IF(streamSize==0, dstSize_tooSmall, "not enough space");
+ return streamSize;
+ }
+}
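
The inline "32b"/"64b" columns above are a worst-case bit budget for the accumulator between flushes; this is the editor's reading of the bookkeeping, assuming the usual LLFSELog=9, MLFSELog=9, OffFSELog=8:

    /* First guard: flush when ofBits+mlBits+llBits >= 64-7-26 = 31, i.e.
     * the three queued FSE states (<= 26 bits) plus up to 7 carry bits
     * must still leave room for the raw bit fields that follow; the later
     * "> 56" guard protects the raw offset bits the same way. */
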
+
+static size_t
+ZSTD_encodeSequences_default(
+ void* dst, size_t dstCapacity,
+ FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
+ FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
+ FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
+ seqDef const* sequences, size_t nbSeq, int longOffsets)
+{
+ return ZSTD_encodeSequences_body(dst, dstCapacity,
+ CTable_MatchLength, mlCodeTable,
+ CTable_OffsetBits, ofCodeTable,
+ CTable_LitLength, llCodeTable,
+ sequences, nbSeq, longOffsets);
+}
+
+
+#if DYNAMIC_BMI2
+
+static TARGET_ATTRIBUTE("bmi2") size_t
+ZSTD_encodeSequences_bmi2(
+ void* dst, size_t dstCapacity,
+ FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
+ FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
+ FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
+ seqDef const* sequences, size_t nbSeq, int longOffsets)
+{
+ return ZSTD_encodeSequences_body(dst, dstCapacity,
+ CTable_MatchLength, mlCodeTable,
+ CTable_OffsetBits, ofCodeTable,
+ CTable_LitLength, llCodeTable,
+ sequences, nbSeq, longOffsets);
+}
+
+#endif
+
+size_t ZSTD_encodeSequences(
+ void* dst, size_t dstCapacity,
+ FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
+ FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
+ FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
+ seqDef const* sequences, size_t nbSeq, int longOffsets, int bmi2)
+{
+ DEBUGLOG(5, "ZSTD_encodeSequences: dstCapacity = %u", (unsigned)dstCapacity);
+#if DYNAMIC_BMI2
+ if (bmi2) {
+ return ZSTD_encodeSequences_bmi2(dst, dstCapacity,
+ CTable_MatchLength, mlCodeTable,
+ CTable_OffsetBits, ofCodeTable,
+ CTable_LitLength, llCodeTable,
+ sequences, nbSeq, longOffsets);
+ }
+#endif
+ (void)bmi2;
+ return ZSTD_encodeSequences_default(dst, dstCapacity,
+ CTable_MatchLength, mlCodeTable,
+ CTable_OffsetBits, ofCodeTable,
+ CTable_LitLength, llCodeTable,
+ sequences, nbSeq, longOffsets);
+}
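
The DYNAMIC_BMI2 pair above is the usual zstd runtime-dispatch pattern: one FORCE_INLINE_TEMPLATE body instantiated twice, once with TARGET_ATTRIBUTE("bmi2") so the compiler may emit BMI2 instructions, selected by a flag the callers derive once from cpuid. Sketch of how that flag is obtained (ZSTD_cpuid()/ZSTD_cpuid_bmi2() live in zstd's cpu.h):

    #include "cpu.h"   /* ZSTD_cpuid, ZSTD_cpuid_bmi2 */

    int const bmi2 = ZSTD_cpuid_bmi2(ZSTD_cpuid());  /* cached in the CCtx */
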
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/contrib/python-zstandard/zstd/compress/zstd_compress_sequences.h Sun Sep 15 20:04:00 2019 -0700
@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+#ifndef ZSTD_COMPRESS_SEQUENCES_H
+#define ZSTD_COMPRESS_SEQUENCES_H
+
+#include "fse.h" /* FSE_repeat, FSE_CTable */
+#include "zstd_internal.h" /* symbolEncodingType_e, ZSTD_strategy */
+
+typedef enum {
+ ZSTD_defaultDisallowed = 0,
+ ZSTD_defaultAllowed = 1
+} ZSTD_defaultPolicy_e;
+
+symbolEncodingType_e
+ZSTD_selectEncodingType(
+ FSE_repeat* repeatMode, unsigned const* count, unsigned const max,
+ size_t const mostFrequent, size_t nbSeq, unsigned const FSELog,
+ FSE_CTable const* prevCTable,
+ short const* defaultNorm, U32 defaultNormLog,
+ ZSTD_defaultPolicy_e const isDefaultAllowed,
+ ZSTD_strategy const strategy);
+
+size_t
+ZSTD_buildCTable(void* dst, size_t dstCapacity,
+ FSE_CTable* nextCTable, U32 FSELog, symbolEncodingType_e type,
+ unsigned* count, U32 max,
+ const BYTE* codeTable, size_t nbSeq,
+ const S16* defaultNorm, U32 defaultNormLog, U32 defaultMax,
+ const FSE_CTable* prevCTable, size_t prevCTableSize,
+ void* workspace, size_t workspaceSize);
+
+size_t ZSTD_encodeSequences(
+ void* dst, size_t dstCapacity,
+ FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
+ FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
+ FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
+ seqDef const* sequences, size_t nbSeq, int longOffsets, int bmi2);
+
+#endif /* ZSTD_COMPRESS_SEQUENCES_H */
--- a/contrib/python-zstandard/zstd/compress/zstd_double_fast.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstd_double_fast.c Sun Sep 15 20:04:00 2019 -0700
@@ -43,8 +43,7 @@
/* Only load extra positions for ZSTD_dtlm_full */
if (dtlm == ZSTD_dtlm_fast)
break;
- }
- }
+ } }
}
@@ -63,7 +62,11 @@
const BYTE* const istart = (const BYTE*)src;
const BYTE* ip = istart;
const BYTE* anchor = istart;
- const U32 prefixLowestIndex = ms->window.dictLimit;
+ const U32 endIndex = (U32)((size_t)(istart - base) + srcSize);
+ const U32 lowestValid = ms->window.dictLimit;
+ const U32 maxDistance = 1U << cParams->windowLog;
+ /* presumes that, if there is a dictionary, it must be using Attach mode */
+ const U32 prefixLowestIndex = (endIndex - lowestValid > maxDistance) ? endIndex - maxDistance : lowestValid;
const BYTE* const prefixLowest = base + prefixLowestIndex;
const BYTE* const iend = istart + srcSize;
const BYTE* const ilimit = iend - HASH_READ_SIZE;
@@ -95,8 +98,15 @@
dictCParams->chainLog : hBitsS;
const U32 dictAndPrefixLength = (U32)(ip - prefixLowest + dictEnd - dictStart);
+ DEBUGLOG(5, "ZSTD_compressBlock_doubleFast_generic");
+
assert(dictMode == ZSTD_noDict || dictMode == ZSTD_dictMatchState);
+ /* if a dictionary is attached, it must be within window range */
+ if (dictMode == ZSTD_dictMatchState) {
+ assert(lowestValid + maxDistance >= endIndex);
+ }
+
/* init */
ip += (dictAndPrefixLength == 0);
if (dictMode == ZSTD_noDict) {
@@ -138,7 +148,7 @@
const BYTE* repMatchEnd = repIndex < prefixLowestIndex ? dictEnd : iend;
mLength = ZSTD_count_2segments(ip+1+4, repMatch+4, iend, repMatchEnd, prefixLowest) + 4;
ip++;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, 0, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, 0, mLength-MINMATCH);
goto _match_stored;
}
@@ -147,7 +157,7 @@
&& ((offset_1 > 0) & (MEM_read32(ip+1-offset_1) == MEM_read32(ip+1)))) {
mLength = ZSTD_count(ip+1+4, ip+1+4-offset_1, iend) + 4;
ip++;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, 0, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, 0, mLength-MINMATCH);
goto _match_stored;
}
@@ -170,8 +180,7 @@
offset = (U32)(current - dictMatchIndexL - dictIndexDelta);
while (((ip>anchor) & (dictMatchL>dictStart)) && (ip[-1] == dictMatchL[-1])) { ip--; dictMatchL--; mLength++; } /* catch up */
goto _match_found;
- }
- }
+ } }
if (matchIndexS > prefixLowestIndex) {
/* check prefix short match */
@@ -186,16 +195,14 @@
if (match > dictStart && MEM_read32(match) == MEM_read32(ip)) {
goto _search_next_long;
- }
- }
+ } }
ip += ((ip-anchor) >> kSearchStrength) + 1;
continue;
_search_next_long:
- {
- size_t const hl3 = ZSTD_hashPtr(ip+1, hBitsL, 8);
+ { size_t const hl3 = ZSTD_hashPtr(ip+1, hBitsL, 8);
size_t const dictHLNext = ZSTD_hashPtr(ip+1, dictHBitsL, 8);
U32 const matchIndexL3 = hashLong[hl3];
const BYTE* matchL3 = base + matchIndexL3;
@@ -221,9 +228,7 @@
offset = (U32)(current + 1 - dictMatchIndexL3 - dictIndexDelta);
while (((ip>anchor) & (dictMatchL3>dictStart)) && (ip[-1] == dictMatchL3[-1])) { ip--; dictMatchL3--; mLength++; } /* catch up */
goto _match_found;
- }
- }
- }
+ } } }
/* if no long +1 match, explore the short match we found */
if (dictMode == ZSTD_dictMatchState && matchIndexS < prefixLowestIndex) {
@@ -242,7 +247,7 @@
offset_2 = offset_1;
offset_1 = offset;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
_match_stored:
/* match found */
@@ -250,11 +255,14 @@
anchor = ip;
if (ip <= ilimit) {
- /* Fill Table */
- hashLong[ZSTD_hashPtr(base+current+2, hBitsL, 8)] =
- hashSmall[ZSTD_hashPtr(base+current+2, hBitsS, mls)] = current+2; /* here because current+2 could be > iend-8 */
- hashLong[ZSTD_hashPtr(ip-2, hBitsL, 8)] =
- hashSmall[ZSTD_hashPtr(ip-2, hBitsS, mls)] = (U32)(ip-2-base);
+ /* Complementary insertion */
+ /* done after iLimit test, as candidates could be > iend-8 */
+ { U32 const indexToInsert = current+2;
+ hashLong[ZSTD_hashPtr(base+indexToInsert, hBitsL, 8)] = indexToInsert;
+ hashLong[ZSTD_hashPtr(ip-2, hBitsL, 8)] = (U32)(ip-2-base);
+ hashSmall[ZSTD_hashPtr(base+indexToInsert, hBitsS, mls)] = indexToInsert;
+ hashSmall[ZSTD_hashPtr(ip-1, hBitsS, mls)] = (U32)(ip-1-base);
+ }
/* check immediate repcode */
if (dictMode == ZSTD_dictMatchState) {
@@ -278,8 +286,7 @@
continue;
}
break;
- }
- }
+ } }
if (dictMode == ZSTD_noDict) {
while ( (ip <= ilimit)
@@ -294,14 +301,15 @@
ip += rLength;
anchor = ip;
continue; /* faster when present ... (?) */
- } } } }
+ } } }
+ } /* while (ip < ilimit) */
/* save reps for next block */
rep[0] = offset_1 ? offset_1 : offsetSaved;
rep[1] = offset_2 ? offset_2 : offsetSaved;
/* Return the last literals size */
- return iend - anchor;
+ return (size_t)(iend - anchor);
}
@@ -360,10 +368,13 @@
const BYTE* anchor = istart;
const BYTE* const iend = istart + srcSize;
const BYTE* const ilimit = iend - 8;
- const U32 prefixStartIndex = ms->window.dictLimit;
const BYTE* const base = ms->window.base;
+ const U32 endIndex = (U32)((size_t)(istart - base) + srcSize);
+ const U32 lowLimit = ZSTD_getLowestMatchIndex(ms, endIndex, cParams->windowLog);
+ const U32 dictStartIndex = lowLimit;
+ const U32 dictLimit = ms->window.dictLimit;
+ const U32 prefixStartIndex = (dictLimit > lowLimit) ? dictLimit : lowLimit;
const BYTE* const prefixStart = base + prefixStartIndex;
- const U32 dictStartIndex = ms->window.lowLimit;
const BYTE* const dictBase = ms->window.dictBase;
const BYTE* const dictStart = dictBase + dictStartIndex;
const BYTE* const dictEnd = dictBase + prefixStartIndex;
@@ -371,6 +382,10 @@
DEBUGLOG(5, "ZSTD_compressBlock_doubleFast_extDict_generic (srcSize=%zu)", srcSize);
+ /* if extDict is invalidated due to maxDistance, switch to "regular" variant */
+ if (prefixStartIndex == dictStartIndex)
+ return ZSTD_compressBlock_doubleFast_generic(ms, seqStore, rep, src, srcSize, mls, ZSTD_noDict);
+
/* Search Loop */
while (ip < ilimit) { /* < instead of <=, because (ip+1) */
const size_t hSmall = ZSTD_hashPtr(ip, hBitsS, mls);
@@ -396,7 +411,7 @@
const BYTE* repMatchEnd = repIndex < prefixStartIndex ? dictEnd : iend;
mLength = ZSTD_count_2segments(ip+1+4, repMatch+4, iend, repMatchEnd, prefixStart) + 4;
ip++;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, 0, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, 0, mLength-MINMATCH);
} else {
if ((matchLongIndex > dictStartIndex) && (MEM_read64(matchLong) == MEM_read64(ip))) {
const BYTE* const matchEnd = matchLongIndex < prefixStartIndex ? dictEnd : iend;
@@ -407,7 +422,7 @@
while (((ip>anchor) & (matchLong>lowMatchPtr)) && (ip[-1] == matchLong[-1])) { ip--; matchLong--; mLength++; } /* catch up */
offset_2 = offset_1;
offset_1 = offset;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
} else if ((matchIndex > dictStartIndex) && (MEM_read32(match) == MEM_read32(ip))) {
size_t const h3 = ZSTD_hashPtr(ip+1, hBitsL, 8);
@@ -432,23 +447,27 @@
}
offset_2 = offset_1;
offset_1 = offset;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
} else {
ip += ((ip-anchor) >> kSearchStrength) + 1;
continue;
} }
- /* found a match : store it */
+ /* move to next sequence start */
ip += mLength;
anchor = ip;
if (ip <= ilimit) {
- /* Fill Table */
- hashSmall[ZSTD_hashPtr(base+current+2, hBitsS, mls)] = current+2;
- hashLong[ZSTD_hashPtr(base+current+2, hBitsL, 8)] = current+2;
- hashSmall[ZSTD_hashPtr(ip-2, hBitsS, mls)] = (U32)(ip-2-base);
- hashLong[ZSTD_hashPtr(ip-2, hBitsL, 8)] = (U32)(ip-2-base);
+ /* Complementary insertion */
+ /* done after iLimit test, as candidates could be > iend-8 */
+ { U32 const indexToInsert = current+2;
+ hashLong[ZSTD_hashPtr(base+indexToInsert, hBitsL, 8)] = indexToInsert;
+ hashLong[ZSTD_hashPtr(ip-2, hBitsL, 8)] = (U32)(ip-2-base);
+ hashSmall[ZSTD_hashPtr(base+indexToInsert, hBitsS, mls)] = indexToInsert;
+ hashSmall[ZSTD_hashPtr(ip-1, hBitsS, mls)] = (U32)(ip-1-base);
+ }
+
/* check immediate repcode */
while (ip <= ilimit) {
U32 const current2 = (U32)(ip-base);
@@ -475,7 +494,7 @@
rep[1] = offset_2;
/* Return the last literals size */
- return iend - anchor;
+ return (size_t)(iend - anchor);
}
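
The double-fast hunks above stop reading ms->window.lowLimit directly and instead clamp the lowest usable match index to the maximum match distance, so candidates that have fallen out of the window are never dereferenced. A minimal compilable sketch of that clamp, with hypothetical names standing in for ZSTD_getLowestMatchIndex():

    #include <stdint.h>

    /* Illustrative only: the vendored code computes this inline or via
     * ZSTD_getLowestMatchIndex(). Assumes curr >= lowestValid. */
    static uint32_t lowest_valid_match_index(uint32_t curr,
                                             uint32_t windowLog,
                                             uint32_t lowestValid)
    {
        uint32_t const maxDistance = (uint32_t)1 << windowLog;
        /* indices older than curr - maxDistance lie outside the window */
        return (curr - lowestValid > maxDistance) ? curr - maxDistance
                                                  : lowestValid;
    }

When the clamp collapses the extDict range to nothing (prefixStartIndex == dictStartIndex), the code now falls through to the regular variant instead of scanning an empty dictionary segment.
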
--- a/contrib/python-zstandard/zstd/compress/zstd_fast.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstd_fast.c Sun Sep 15 20:04:00 2019 -0700
@@ -13,7 +13,8 @@
void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
- void const* end, ZSTD_dictTableLoadMethod_e dtlm)
+ const void* const end,
+ ZSTD_dictTableLoadMethod_e dtlm)
{
const ZSTD_compressionParameters* const cParams = &ms->cParams;
U32* const hashTable = ms->hashTable;
@@ -41,11 +42,164 @@
} } } }
}
+
FORCE_INLINE_TEMPLATE
size_t ZSTD_compressBlock_fast_generic(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize,
- U32 const mls, ZSTD_dictMode_e const dictMode)
+ U32 const mls)
+{
+ const ZSTD_compressionParameters* const cParams = &ms->cParams;
+ U32* const hashTable = ms->hashTable;
+ U32 const hlog = cParams->hashLog;
+ /* support stepSize of 0 */
+ size_t const stepSize = cParams->targetLength + !(cParams->targetLength) + 1;
+ const BYTE* const base = ms->window.base;
+ const BYTE* const istart = (const BYTE*)src;
+ /* We check ip0 (ip + 0) and ip1 (ip + 1) each loop */
+ const BYTE* ip0 = istart;
+ const BYTE* ip1;
+ const BYTE* anchor = istart;
+ const U32 endIndex = (U32)((size_t)(istart - base) + srcSize);
+ const U32 maxDistance = 1U << cParams->windowLog;
+ const U32 validStartIndex = ms->window.dictLimit;
+ const U32 prefixStartIndex = (endIndex - validStartIndex > maxDistance) ? endIndex - maxDistance : validStartIndex;
+ const BYTE* const prefixStart = base + prefixStartIndex;
+ const BYTE* const iend = istart + srcSize;
+ const BYTE* const ilimit = iend - HASH_READ_SIZE;
+ U32 offset_1=rep[0], offset_2=rep[1];
+ U32 offsetSaved = 0;
+
+ /* init */
+ DEBUGLOG(5, "ZSTD_compressBlock_fast_generic");
+ ip0 += (ip0 == prefixStart);
+ ip1 = ip0 + 1;
+ {
+ U32 const maxRep = (U32)(ip0 - prefixStart);
+ if (offset_2 > maxRep) offsetSaved = offset_2, offset_2 = 0;
+ if (offset_1 > maxRep) offsetSaved = offset_1, offset_1 = 0;
+ }
+
+ /* Main Search Loop */
+ while (ip1 < ilimit) { /* < instead of <=, because check at ip0+2 */
+ size_t mLength;
+ BYTE const* ip2 = ip0 + 2;
+ size_t const h0 = ZSTD_hashPtr(ip0, hlog, mls);
+ U32 const val0 = MEM_read32(ip0);
+ size_t const h1 = ZSTD_hashPtr(ip1, hlog, mls);
+ U32 const val1 = MEM_read32(ip1);
+ U32 const current0 = (U32)(ip0-base);
+ U32 const current1 = (U32)(ip1-base);
+ U32 const matchIndex0 = hashTable[h0];
+ U32 const matchIndex1 = hashTable[h1];
+ BYTE const* repMatch = ip2-offset_1;
+ const BYTE* match0 = base + matchIndex0;
+ const BYTE* match1 = base + matchIndex1;
+ U32 offcode;
+ hashTable[h0] = current0; /* update hash table */
+ hashTable[h1] = current1; /* update hash table */
+
+ assert(ip0 + 1 == ip1);
+
+ if ((offset_1 > 0) & (MEM_read32(repMatch) == MEM_read32(ip2))) {
+ mLength = ip2[-1] == repMatch[-1] ? 1 : 0;
+ ip0 = ip2 - mLength;
+ match0 = repMatch - mLength;
+ offcode = 0;
+ goto _match;
+ }
+ if ((matchIndex0 > prefixStartIndex) && MEM_read32(match0) == val0) {
+ /* found a regular match */
+ goto _offset;
+ }
+ if ((matchIndex1 > prefixStartIndex) && MEM_read32(match1) == val1) {
+ /* found a regular match after one literal */
+ ip0 = ip1;
+ match0 = match1;
+ goto _offset;
+ }
+ {
+ size_t const step = ((ip0-anchor) >> (kSearchStrength - 1)) + stepSize;
+ assert(step >= 2);
+ ip0 += step;
+ ip1 += step;
+ continue;
+ }
+_offset: /* Requires: ip0, match0 */
+ /* Compute the offset code */
+ offset_2 = offset_1;
+ offset_1 = (U32)(ip0-match0);
+ offcode = offset_1 + ZSTD_REP_MOVE;
+ mLength = 0;
+ /* Count the backwards match length */
+ while (((ip0>anchor) & (match0>prefixStart))
+ && (ip0[-1] == match0[-1])) { ip0--; match0--; mLength++; } /* catch up */
+
+_match: /* Requires: ip0, match0, offcode */
+ /* Count the forward length */
+ mLength += ZSTD_count(ip0+mLength+4, match0+mLength+4, iend) + 4;
+ ZSTD_storeSeq(seqStore, ip0-anchor, anchor, offcode, mLength-MINMATCH);
+ /* match found */
+ ip0 += mLength;
+ anchor = ip0;
+ ip1 = ip0 + 1;
+
+ if (ip0 <= ilimit) {
+ /* Fill Table */
+ assert(base+current0+2 > istart); /* check base overflow */
+ hashTable[ZSTD_hashPtr(base+current0+2, hlog, mls)] = current0+2; /* here because current+2 could be > iend-8 */
+ hashTable[ZSTD_hashPtr(ip0-2, hlog, mls)] = (U32)(ip0-2-base);
+
+ while ( (ip0 <= ilimit)
+ && ( (offset_2>0)
+ & (MEM_read32(ip0) == MEM_read32(ip0 - offset_2)) )) {
+ /* store sequence */
+ size_t const rLength = ZSTD_count(ip0+4, ip0+4-offset_2, iend) + 4;
+ U32 const tmpOff = offset_2; offset_2 = offset_1; offset_1 = tmpOff; /* swap offset_2 <=> offset_1 */
+ hashTable[ZSTD_hashPtr(ip0, hlog, mls)] = (U32)(ip0-base);
+ ip0 += rLength;
+ ip1 = ip0 + 1;
+ ZSTD_storeSeq(seqStore, 0, anchor, 0, rLength-MINMATCH);
+ anchor = ip0;
+ continue; /* faster when present (confirmed on gcc-8) ... (?) */
+ }
+ }
+ }
+
+ /* save reps for next block */
+ rep[0] = offset_1 ? offset_1 : offsetSaved;
+ rep[1] = offset_2 ? offset_2 : offsetSaved;
+
+ /* Return the last literals size */
+ return (size_t)(iend - anchor);
+}
+
+
+size_t ZSTD_compressBlock_fast(
+ ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
+ void const* src, size_t srcSize)
+{
+ ZSTD_compressionParameters const* cParams = &ms->cParams;
+ U32 const mls = cParams->minMatch;
+ assert(ms->dictMatchState == NULL);
+ switch(mls)
+ {
+ default: /* includes case 3 */
+ case 4 :
+ return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 4);
+ case 5 :
+ return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 5);
+ case 6 :
+ return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 6);
+ case 7 :
+ return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 7);
+ }
+}
+
+FORCE_INLINE_TEMPLATE
+size_t ZSTD_compressBlock_fast_dictMatchState_generic(
+ ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
+ void const* src, size_t srcSize, U32 const mls)
{
const ZSTD_compressionParameters* const cParams = &ms->cParams;
U32* const hashTable = ms->hashTable;
@@ -64,46 +218,34 @@
U32 offsetSaved = 0;
const ZSTD_matchState_t* const dms = ms->dictMatchState;
- const ZSTD_compressionParameters* const dictCParams =
- dictMode == ZSTD_dictMatchState ?
- &dms->cParams : NULL;
- const U32* const dictHashTable = dictMode == ZSTD_dictMatchState ?
- dms->hashTable : NULL;
- const U32 dictStartIndex = dictMode == ZSTD_dictMatchState ?
- dms->window.dictLimit : 0;
- const BYTE* const dictBase = dictMode == ZSTD_dictMatchState ?
- dms->window.base : NULL;
- const BYTE* const dictStart = dictMode == ZSTD_dictMatchState ?
- dictBase + dictStartIndex : NULL;
- const BYTE* const dictEnd = dictMode == ZSTD_dictMatchState ?
- dms->window.nextSrc : NULL;
- const U32 dictIndexDelta = dictMode == ZSTD_dictMatchState ?
- prefixStartIndex - (U32)(dictEnd - dictBase) :
- 0;
+ const ZSTD_compressionParameters* const dictCParams = &dms->cParams ;
+ const U32* const dictHashTable = dms->hashTable;
+ const U32 dictStartIndex = dms->window.dictLimit;
+ const BYTE* const dictBase = dms->window.base;
+ const BYTE* const dictStart = dictBase + dictStartIndex;
+ const BYTE* const dictEnd = dms->window.nextSrc;
+ const U32 dictIndexDelta = prefixStartIndex - (U32)(dictEnd - dictBase);
const U32 dictAndPrefixLength = (U32)(ip - prefixStart + dictEnd - dictStart);
- const U32 dictHLog = dictMode == ZSTD_dictMatchState ?
- dictCParams->hashLog : hlog;
-
- assert(dictMode == ZSTD_noDict || dictMode == ZSTD_dictMatchState);
+ const U32 dictHLog = dictCParams->hashLog;
- /* otherwise, we would get index underflow when translating a dict index
- * into a local index */
- assert(dictMode != ZSTD_dictMatchState
- || prefixStartIndex >= (U32)(dictEnd - dictBase));
+ /* if a dictionary is still attached, it necessarily means that
+ * it is within window size. So we just check it. */
+ const U32 maxDistance = 1U << cParams->windowLog;
+ const U32 endIndex = (U32)((size_t)(ip - base) + srcSize);
+ assert(endIndex - prefixStartIndex <= maxDistance);
+ (void)maxDistance; (void)endIndex; /* these variables are not used when assert() is disabled */
+
+ /* ensure there will be no underflow
+ * when translating a dict index into a local index */
+ assert(prefixStartIndex >= (U32)(dictEnd - dictBase));
/* init */
+ DEBUGLOG(5, "ZSTD_compressBlock_fast_dictMatchState_generic");
ip += (dictAndPrefixLength == 0);
- if (dictMode == ZSTD_noDict) {
- U32 const maxRep = (U32)(ip - prefixStart);
- if (offset_2 > maxRep) offsetSaved = offset_2, offset_2 = 0;
- if (offset_1 > maxRep) offsetSaved = offset_1, offset_1 = 0;
- }
- if (dictMode == ZSTD_dictMatchState) {
- /* dictMatchState repCode checks don't currently handle repCode == 0
- * disabling. */
- assert(offset_1 <= dictAndPrefixLength);
- assert(offset_2 <= dictAndPrefixLength);
- }
+ /* dictMatchState repCode checks don't currently handle repCode == 0
+ * disabling. */
+ assert(offset_1 <= dictAndPrefixLength);
+ assert(offset_2 <= dictAndPrefixLength);
/* Main Search Loop */
while (ip < ilimit) { /* < instead of <=, because repcode check at (ip+1) */
@@ -113,50 +255,37 @@
U32 const matchIndex = hashTable[h];
const BYTE* match = base + matchIndex;
const U32 repIndex = current + 1 - offset_1;
- const BYTE* repMatch = (dictMode == ZSTD_dictMatchState
- && repIndex < prefixStartIndex) ?
+ const BYTE* repMatch = (repIndex < prefixStartIndex) ?
dictBase + (repIndex - dictIndexDelta) :
base + repIndex;
hashTable[h] = current; /* update hash table */
- if ( (dictMode == ZSTD_dictMatchState)
- && ((U32)((prefixStartIndex-1) - repIndex) >= 3) /* intentional underflow : ensure repIndex isn't overlapping dict + prefix */
+ if ( ((U32)((prefixStartIndex-1) - repIndex) >= 3) /* intentional underflow : ensure repIndex isn't overlapping dict + prefix */
&& (MEM_read32(repMatch) == MEM_read32(ip+1)) ) {
const BYTE* const repMatchEnd = repIndex < prefixStartIndex ? dictEnd : iend;
mLength = ZSTD_count_2segments(ip+1+4, repMatch+4, iend, repMatchEnd, prefixStart) + 4;
ip++;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, 0, mLength-MINMATCH);
- } else if ( dictMode == ZSTD_noDict
- && ((offset_1 > 0) & (MEM_read32(ip+1-offset_1) == MEM_read32(ip+1)))) {
- mLength = ZSTD_count(ip+1+4, ip+1+4-offset_1, iend) + 4;
- ip++;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, 0, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, 0, mLength-MINMATCH);
} else if ( (matchIndex <= prefixStartIndex) ) {
- if (dictMode == ZSTD_dictMatchState) {
- size_t const dictHash = ZSTD_hashPtr(ip, dictHLog, mls);
- U32 const dictMatchIndex = dictHashTable[dictHash];
- const BYTE* dictMatch = dictBase + dictMatchIndex;
- if (dictMatchIndex <= dictStartIndex ||
- MEM_read32(dictMatch) != MEM_read32(ip)) {
- assert(stepSize >= 1);
- ip += ((ip-anchor) >> kSearchStrength) + stepSize;
- continue;
- } else {
- /* found a dict match */
- U32 const offset = (U32)(current-dictMatchIndex-dictIndexDelta);
- mLength = ZSTD_count_2segments(ip+4, dictMatch+4, iend, dictEnd, prefixStart) + 4;
- while (((ip>anchor) & (dictMatch>dictStart))
- && (ip[-1] == dictMatch[-1])) {
- ip--; dictMatch--; mLength++;
- } /* catch up */
- offset_2 = offset_1;
- offset_1 = offset;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
- }
- } else {
+ size_t const dictHash = ZSTD_hashPtr(ip, dictHLog, mls);
+ U32 const dictMatchIndex = dictHashTable[dictHash];
+ const BYTE* dictMatch = dictBase + dictMatchIndex;
+ if (dictMatchIndex <= dictStartIndex ||
+ MEM_read32(dictMatch) != MEM_read32(ip)) {
assert(stepSize >= 1);
ip += ((ip-anchor) >> kSearchStrength) + stepSize;
continue;
+ } else {
+ /* found a dict match */
+ U32 const offset = (U32)(current-dictMatchIndex-dictIndexDelta);
+ mLength = ZSTD_count_2segments(ip+4, dictMatch+4, iend, dictEnd, prefixStart) + 4;
+ while (((ip>anchor) & (dictMatch>dictStart))
+ && (ip[-1] == dictMatch[-1])) {
+ ip--; dictMatch--; mLength++;
+ } /* catch up */
+ offset_2 = offset_1;
+ offset_1 = offset;
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
}
} else if (MEM_read32(match) != MEM_read32(ip)) {
/* it's not a match, and we're not going to check the dictionary */
@@ -171,7 +300,7 @@
&& (ip[-1] == match[-1])) { ip--; match--; mLength++; } /* catch up */
offset_2 = offset_1;
offset_1 = offset;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
}
/* match found */
@@ -185,70 +314,34 @@
hashTable[ZSTD_hashPtr(ip-2, hlog, mls)] = (U32)(ip-2-base);
/* check immediate repcode */
- if (dictMode == ZSTD_dictMatchState) {
- while (ip <= ilimit) {
- U32 const current2 = (U32)(ip-base);
- U32 const repIndex2 = current2 - offset_2;
- const BYTE* repMatch2 = repIndex2 < prefixStartIndex ?
- dictBase - dictIndexDelta + repIndex2 :
- base + repIndex2;
- if ( ((U32)((prefixStartIndex-1) - (U32)repIndex2) >= 3 /* intentional overflow */)
- && (MEM_read32(repMatch2) == MEM_read32(ip)) ) {
- const BYTE* const repEnd2 = repIndex2 < prefixStartIndex ? dictEnd : iend;
- size_t const repLength2 = ZSTD_count_2segments(ip+4, repMatch2+4, iend, repEnd2, prefixStart) + 4;
- U32 tmpOffset = offset_2; offset_2 = offset_1; offset_1 = tmpOffset; /* swap offset_2 <=> offset_1 */
- ZSTD_storeSeq(seqStore, 0, anchor, 0, repLength2-MINMATCH);
- hashTable[ZSTD_hashPtr(ip, hlog, mls)] = current2;
- ip += repLength2;
- anchor = ip;
- continue;
- }
- break;
+ while (ip <= ilimit) {
+ U32 const current2 = (U32)(ip-base);
+ U32 const repIndex2 = current2 - offset_2;
+ const BYTE* repMatch2 = repIndex2 < prefixStartIndex ?
+ dictBase - dictIndexDelta + repIndex2 :
+ base + repIndex2;
+ if ( ((U32)((prefixStartIndex-1) - (U32)repIndex2) >= 3 /* intentional overflow */)
+ && (MEM_read32(repMatch2) == MEM_read32(ip)) ) {
+ const BYTE* const repEnd2 = repIndex2 < prefixStartIndex ? dictEnd : iend;
+ size_t const repLength2 = ZSTD_count_2segments(ip+4, repMatch2+4, iend, repEnd2, prefixStart) + 4;
+ U32 tmpOffset = offset_2; offset_2 = offset_1; offset_1 = tmpOffset; /* swap offset_2 <=> offset_1 */
+ ZSTD_storeSeq(seqStore, 0, anchor, 0, repLength2-MINMATCH);
+ hashTable[ZSTD_hashPtr(ip, hlog, mls)] = current2;
+ ip += repLength2;
+ anchor = ip;
+ continue;
}
+ break;
}
-
- if (dictMode == ZSTD_noDict) {
- while ( (ip <= ilimit)
- && ( (offset_2>0)
- & (MEM_read32(ip) == MEM_read32(ip - offset_2)) )) {
- /* store sequence */
- size_t const rLength = ZSTD_count(ip+4, ip+4-offset_2, iend) + 4;
- U32 const tmpOff = offset_2; offset_2 = offset_1; offset_1 = tmpOff; /* swap offset_2 <=> offset_1 */
- hashTable[ZSTD_hashPtr(ip, hlog, mls)] = (U32)(ip-base);
- ZSTD_storeSeq(seqStore, 0, anchor, 0, rLength-MINMATCH);
- ip += rLength;
- anchor = ip;
- continue; /* faster when present ... (?) */
- } } } }
+ }
+ }
/* save reps for next block */
rep[0] = offset_1 ? offset_1 : offsetSaved;
rep[1] = offset_2 ? offset_2 : offsetSaved;
/* Return the last literals size */
- return iend - anchor;
-}
-
-
-size_t ZSTD_compressBlock_fast(
- ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
- void const* src, size_t srcSize)
-{
- ZSTD_compressionParameters const* cParams = &ms->cParams;
- U32 const mls = cParams->minMatch;
- assert(ms->dictMatchState == NULL);
- switch(mls)
- {
- default: /* includes case 3 */
- case 4 :
- return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 4, ZSTD_noDict);
- case 5 :
- return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 5, ZSTD_noDict);
- case 6 :
- return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 6, ZSTD_noDict);
- case 7 :
- return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 7, ZSTD_noDict);
- }
+ return (size_t)(iend - anchor);
}
size_t ZSTD_compressBlock_fast_dictMatchState(
@@ -262,13 +355,13 @@
{
default: /* includes case 3 */
case 4 :
- return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 4, ZSTD_dictMatchState);
+ return ZSTD_compressBlock_fast_dictMatchState_generic(ms, seqStore, rep, src, srcSize, 4);
case 5 :
- return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 5, ZSTD_dictMatchState);
+ return ZSTD_compressBlock_fast_dictMatchState_generic(ms, seqStore, rep, src, srcSize, 5);
case 6 :
- return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 6, ZSTD_dictMatchState);
+ return ZSTD_compressBlock_fast_dictMatchState_generic(ms, seqStore, rep, src, srcSize, 6);
case 7 :
- return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, 7, ZSTD_dictMatchState);
+ return ZSTD_compressBlock_fast_dictMatchState_generic(ms, seqStore, rep, src, srcSize, 7);
}
}
@@ -287,15 +380,24 @@
const BYTE* const istart = (const BYTE*)src;
const BYTE* ip = istart;
const BYTE* anchor = istart;
- const U32 dictStartIndex = ms->window.lowLimit;
+ const U32 endIndex = (U32)((size_t)(istart - base) + srcSize);
+ const U32 lowLimit = ZSTD_getLowestMatchIndex(ms, endIndex, cParams->windowLog);
+ const U32 dictStartIndex = lowLimit;
const BYTE* const dictStart = dictBase + dictStartIndex;
- const U32 prefixStartIndex = ms->window.dictLimit;
+ const U32 dictLimit = ms->window.dictLimit;
+ const U32 prefixStartIndex = dictLimit < lowLimit ? lowLimit : dictLimit;
const BYTE* const prefixStart = base + prefixStartIndex;
const BYTE* const dictEnd = dictBase + prefixStartIndex;
const BYTE* const iend = istart + srcSize;
const BYTE* const ilimit = iend - 8;
U32 offset_1=rep[0], offset_2=rep[1];
+ DEBUGLOG(5, "ZSTD_compressBlock_fast_extDict_generic");
+
+ /* switch to "regular" variant if extDict is invalidated due to maxDistance */
+ if (prefixStartIndex == dictStartIndex)
+ return ZSTD_compressBlock_fast_generic(ms, seqStore, rep, src, srcSize, mls);
+
/* Search Loop */
while (ip < ilimit) { /* < instead of <=, because (ip+1) */
const size_t h = ZSTD_hashPtr(ip, hlog, mls);
@@ -312,10 +414,10 @@
if ( (((U32)((prefixStartIndex-1) - repIndex) >= 3) /* intentional underflow */ & (repIndex > dictStartIndex))
&& (MEM_read32(repMatch) == MEM_read32(ip+1)) ) {
- const BYTE* repMatchEnd = repIndex < prefixStartIndex ? dictEnd : iend;
- mLength = ZSTD_count_2segments(ip+1+4, repMatch+4, iend, repMatchEnd, prefixStart) + 4;
+ const BYTE* const repMatchEnd = repIndex < prefixStartIndex ? dictEnd : iend;
+ mLength = ZSTD_count_2segments(ip+1 +4, repMatch +4, iend, repMatchEnd, prefixStart) + 4;
ip++;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, 0, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, 0, mLength-MINMATCH);
} else {
if ( (matchIndex < dictStartIndex) ||
(MEM_read32(match) != MEM_read32(ip)) ) {
@@ -323,15 +425,15 @@
ip += ((ip-anchor) >> kSearchStrength) + stepSize;
continue;
}
- { const BYTE* matchEnd = matchIndex < prefixStartIndex ? dictEnd : iend;
- const BYTE* lowMatchPtr = matchIndex < prefixStartIndex ? dictStart : prefixStart;
+ { const BYTE* const matchEnd = matchIndex < prefixStartIndex ? dictEnd : iend;
+ const BYTE* const lowMatchPtr = matchIndex < prefixStartIndex ? dictStart : prefixStart;
U32 offset;
mLength = ZSTD_count_2segments(ip+4, match+4, iend, matchEnd, prefixStart) + 4;
while (((ip>anchor) & (match>lowMatchPtr)) && (ip[-1] == match[-1])) { ip--; match--; mLength++; } /* catch up */
offset = current - matchIndex;
offset_2 = offset_1;
offset_1 = offset;
- ZSTD_storeSeq(seqStore, ip-anchor, anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
+ ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
} }
/* found a match : store it */
@@ -351,7 +453,7 @@
&& (MEM_read32(repMatch2) == MEM_read32(ip)) ) {
const BYTE* const repEnd2 = repIndex2 < prefixStartIndex ? dictEnd : iend;
size_t const repLength2 = ZSTD_count_2segments(ip+4, repMatch2+4, iend, repEnd2, prefixStart) + 4;
- U32 tmpOffset = offset_2; offset_2 = offset_1; offset_1 = tmpOffset; /* swap offset_2 <=> offset_1 */
+ U32 const tmpOffset = offset_2; offset_2 = offset_1; offset_1 = tmpOffset; /* swap offset_2 <=> offset_1 */
ZSTD_storeSeq(seqStore, 0, anchor, 0, repLength2-MINMATCH);
hashTable[ZSTD_hashPtr(ip, hlog, mls)] = current2;
ip += repLength2;
@@ -366,7 +468,7 @@
rep[1] = offset_2;
/* Return the last literals size */
- return iend - anchor;
+ return (size_t)(iend - anchor);
}
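
The zstd_fast.c rewrite above splits the single dictMode-templated function into dedicated noDict, dictMatchState, and extDict variants, and the new noDict loop probes two adjacent positions (ip0 and ip1 = ip0 + 1) per iteration before stepping. A self-contained sketch of that double probe over a toy hash table; every name here is a stand-in, not the upstream API:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define TOY_HLOG 12

    static uint32_t toy_read32(const uint8_t* p)
    { uint32_t v; memcpy(&v, p, sizeof v); return v; }

    static size_t toy_hash4(const uint8_t* p)
    { return (size_t)((toy_read32(p) * 2654435761u) >> (32 - TOY_HLOG)); }

    /* Probe positions pos and pos+1; return the backward offset of the
     * first verified 4-byte match, or 0 if neither verifies. table[]
     * maps a hash to the last position inserted (0 meaning empty). */
    static size_t toy_probe_two(const uint8_t* base, size_t pos, size_t end,
                                uint32_t table[1u << TOY_HLOG])
    {
        size_t i;
        for (i = 0; i < 2 && pos + i + 4 <= end; i++) {
            size_t const h = toy_hash4(base + pos + i);
            uint32_t const cand = table[h];
            table[h] = (uint32_t)(pos + i); /* insert before verifying, as upstream does */
            if (cand != 0 && toy_read32(base + cand) == toy_read32(base + pos + i))
                return pos + i - cand;
        }
        return 0;
    }

The loop bound changes accordingly (ip1 < ilimit, as the comment in the hunk notes).
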
--- a/contrib/python-zstandard/zstd/compress/zstd_lazy.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstd_lazy.c Sun Sep 15 20:04:00 2019 -0700
@@ -83,7 +83,10 @@
U32* largerPtr = smallerPtr + 1;
U32 matchIndex = *smallerPtr; /* this candidate is unsorted : next sorted candidate is reached through *smallerPtr, while *largerPtr contains previous unsorted candidate (which is already saved and can be overwritten) */
U32 dummy32; /* to be nullified at the end */
- U32 const windowLow = ms->window.lowLimit;
+ U32 const windowValid = ms->window.lowLimit;
+ U32 const maxDistance = 1U << cParams->windowLog;
+ U32 const windowLow = (current - windowValid > maxDistance) ? current - maxDistance : windowValid;
+
DEBUGLOG(8, "ZSTD_insertDUBT1(%u) (dictLimit=%u, lowLimit=%u)",
current, dictLimit, windowLow);
@@ -239,7 +242,7 @@
const BYTE* const base = ms->window.base;
U32 const current = (U32)(ip-base);
- U32 const windowLow = ms->window.lowLimit;
+ U32 const windowLow = ZSTD_getLowestMatchIndex(ms, current, cParams->windowLog);
U32* const bt = ms->chainTable;
U32 const btLog = cParams->chainLog - 1;
@@ -490,8 +493,12 @@
const U32 dictLimit = ms->window.dictLimit;
const BYTE* const prefixStart = base + dictLimit;
const BYTE* const dictEnd = dictBase + dictLimit;
- const U32 lowLimit = ms->window.lowLimit;
const U32 current = (U32)(ip-base);
+ const U32 maxDistance = 1U << cParams->windowLog;
+ const U32 lowestValid = ms->window.lowLimit;
+ const U32 withinMaxDistance = (current - lowestValid > maxDistance) ? current - maxDistance : lowestValid;
+ const U32 isDictionary = (ms->loadedDictEnd != 0);
+ const U32 lowLimit = isDictionary ? lowestValid : withinMaxDistance;
const U32 minChain = current > chainSize ? current - chainSize : 0;
U32 nbAttempts = 1U << cParams->searchLog;
size_t ml=4-1;
@@ -612,12 +619,14 @@
/* *******************************
* Common parser - lazy strategy
*********************************/
-FORCE_INLINE_TEMPLATE
-size_t ZSTD_compressBlock_lazy_generic(
+typedef enum { search_hashChain, search_binaryTree } searchMethod_e;
+
+FORCE_INLINE_TEMPLATE size_t
+ZSTD_compressBlock_lazy_generic(
ZSTD_matchState_t* ms, seqStore_t* seqStore,
U32 rep[ZSTD_REP_NUM],
const void* src, size_t srcSize,
- const U32 searchMethod, const U32 depth,
+ const searchMethod_e searchMethod, const U32 depth,
ZSTD_dictMode_e const dictMode)
{
const BYTE* const istart = (const BYTE*)src;
@@ -633,8 +642,10 @@
ZSTD_matchState_t* ms,
const BYTE* ip, const BYTE* iLimit, size_t* offsetPtr);
searchMax_f const searchMax = dictMode == ZSTD_dictMatchState ?
- (searchMethod ? ZSTD_BtFindBestMatch_dictMatchState_selectMLS : ZSTD_HcFindBestMatch_dictMatchState_selectMLS) :
- (searchMethod ? ZSTD_BtFindBestMatch_selectMLS : ZSTD_HcFindBestMatch_selectMLS);
+ (searchMethod==search_binaryTree ? ZSTD_BtFindBestMatch_dictMatchState_selectMLS
+ : ZSTD_HcFindBestMatch_dictMatchState_selectMLS) :
+ (searchMethod==search_binaryTree ? ZSTD_BtFindBestMatch_selectMLS
+ : ZSTD_HcFindBestMatch_selectMLS);
U32 offset_1 = rep[0], offset_2 = rep[1], savedOffset=0;
const ZSTD_matchState_t* const dms = ms->dictMatchState;
@@ -653,7 +664,6 @@
/* init */
ip += (dictAndPrefixLength == 0);
- ms->nextToUpdate3 = ms->nextToUpdate;
if (dictMode == ZSTD_noDict) {
U32 const maxRep = (U32)(ip - prefixLowest);
if (offset_2 > maxRep) savedOffset = offset_2, offset_2 = 0;
@@ -844,7 +854,7 @@
rep[1] = offset_2 ? offset_2 : savedOffset;
/* Return the last literals size */
- return iend - anchor;
+ return (size_t)(iend - anchor);
}
@@ -852,56 +862,56 @@
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, 1, 2, ZSTD_noDict);
+ return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, search_binaryTree, 2, ZSTD_noDict);
}
size_t ZSTD_compressBlock_lazy2(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, 0, 2, ZSTD_noDict);
+ return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 2, ZSTD_noDict);
}
size_t ZSTD_compressBlock_lazy(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, 0, 1, ZSTD_noDict);
+ return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 1, ZSTD_noDict);
}
size_t ZSTD_compressBlock_greedy(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, 0, 0, ZSTD_noDict);
+ return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 0, ZSTD_noDict);
}
size_t ZSTD_compressBlock_btlazy2_dictMatchState(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, 1, 2, ZSTD_dictMatchState);
+ return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, search_binaryTree, 2, ZSTD_dictMatchState);
}
size_t ZSTD_compressBlock_lazy2_dictMatchState(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, 0, 2, ZSTD_dictMatchState);
+ return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 2, ZSTD_dictMatchState);
}
size_t ZSTD_compressBlock_lazy_dictMatchState(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, 0, 1, ZSTD_dictMatchState);
+ return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 1, ZSTD_dictMatchState);
}
size_t ZSTD_compressBlock_greedy_dictMatchState(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, 0, 0, ZSTD_dictMatchState);
+ return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 0, ZSTD_dictMatchState);
}
@@ -910,7 +920,7 @@
ZSTD_matchState_t* ms, seqStore_t* seqStore,
U32 rep[ZSTD_REP_NUM],
const void* src, size_t srcSize,
- const U32 searchMethod, const U32 depth)
+ const searchMethod_e searchMethod, const U32 depth)
{
const BYTE* const istart = (const BYTE*)src;
const BYTE* ip = istart;
@@ -928,12 +938,11 @@
typedef size_t (*searchMax_f)(
ZSTD_matchState_t* ms,
const BYTE* ip, const BYTE* iLimit, size_t* offsetPtr);
- searchMax_f searchMax = searchMethod ? ZSTD_BtFindBestMatch_extDict_selectMLS : ZSTD_HcFindBestMatch_extDict_selectMLS;
+ searchMax_f searchMax = searchMethod==search_binaryTree ? ZSTD_BtFindBestMatch_extDict_selectMLS : ZSTD_HcFindBestMatch_extDict_selectMLS;
U32 offset_1 = rep[0], offset_2 = rep[1];
/* init */
- ms->nextToUpdate3 = ms->nextToUpdate;
ip += (ip == prefixStart);
/* Match Loop */
@@ -1070,7 +1079,7 @@
rep[1] = offset_2;
/* Return the last literals size */
- return iend - anchor;
+ return (size_t)(iend - anchor);
}
@@ -1078,7 +1087,7 @@
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src, srcSize, 0, 0);
+ return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 0);
}
size_t ZSTD_compressBlock_lazy_extDict(
@@ -1086,7 +1095,7 @@
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src, srcSize, 0, 1);
+ return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 1);
}
size_t ZSTD_compressBlock_lazy2_extDict(
@@ -1094,7 +1103,7 @@
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src, srcSize, 0, 2);
+ return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src, srcSize, search_hashChain, 2);
}
size_t ZSTD_compressBlock_btlazy2_extDict(
@@ -1102,5 +1111,5 @@
void const* src, size_t srcSize)
{
- return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src, srcSize, 1, 2);
+ return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src, srcSize, search_binaryTree, 2);
}
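
In zstd_lazy.c the old U32 searchMethod flag becomes the searchMethod_e enum, so call sites read search_binaryTree / search_hashChain instead of bare 1 / 0. The selection pattern, reduced to a compilable sketch with placeholder match finders:

    #include <stddef.h>

    typedef enum { search_hashChain, search_binaryTree } searchMethod_e;

    typedef size_t (*searchMax_f)(const unsigned char* ip);

    /* placeholder bodies; upstream these are the Hc/Bt best-match finders */
    static size_t hc_find_best(const unsigned char* ip) { (void)ip; return 0; }
    static size_t bt_find_best(const unsigned char* ip) { (void)ip; return 0; }

    /* select the match finder once, outside the hot loop */
    static searchMax_f select_search(searchMethod_e m)
    {
        return (m == search_binaryTree) ? bt_find_best : hc_find_best;
    }

The behavior is unchanged; the enumerator simply documents itself at every call site.
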
--- a/contrib/python-zstandard/zstd/compress/zstd_lazy.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstd_lazy.h Sun Sep 15 20:04:00 2019 -0700
@@ -19,7 +19,7 @@
U32 ZSTD_insertAndFindFirstIndex(ZSTD_matchState_t* ms, const BYTE* ip);
-void ZSTD_preserveUnsortedMark (U32* const table, U32 const size, U32 const reducerValue); /*! used in ZSTD_reduceIndex(). pre-emptively increase value of ZSTD_DUBT_UNSORTED_MARK */
+void ZSTD_preserveUnsortedMark (U32* const table, U32 const size, U32 const reducerValue); /*! used in ZSTD_reduceIndex(). preemptively increase value of ZSTD_DUBT_UNSORTED_MARK */
size_t ZSTD_compressBlock_btlazy2(
ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
--- a/contrib/python-zstandard/zstd/compress/zstd_ldm.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstd_ldm.c Sun Sep 15 20:04:00 2019 -0700
@@ -429,7 +429,7 @@
*/
assert(ldmState->window.nextSrc >= (BYTE const*)src + srcSize);
/* The input could be very large (in zstdmt), so it must be broken up into
- * chunks to enforce the maximmum distance and handle overflow correction.
+ * chunks to enforce the maximum distance and handle overflow correction.
*/
assert(sequences->pos <= sequences->size);
assert(sequences->size <= sequences->capacity);
@@ -447,7 +447,7 @@
if (ZSTD_window_needOverflowCorrection(ldmState->window, chunkEnd)) {
U32 const ldmHSize = 1U << params->hashLog;
U32 const correction = ZSTD_window_correctOverflow(
- &ldmState->window, /* cycleLog */ 0, maxDist, src);
+ &ldmState->window, /* cycleLog */ 0, maxDist, chunkStart);
ZSTD_ldm_reduceTable(ldmState->hashTable, ldmHSize, correction);
}
/* 2. We enforce the maximum offset allowed.
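
The one-line zstd_ldm.c fix above matters because large inputs are processed chunk by chunk: when overflow correction fires partway through, indices must be re-based against the current chunk, not the original src pointer. A toy rendition of the loop shape; both helpers are hypothetical stand-ins for ZSTD_window_needOverflowCorrection() and ZSTD_window_correctOverflow():

    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint32_t lowestIndex; } toy_window_t;

    static int toy_needs_correction(const toy_window_t* w)
    { return w->lowestIndex > 0x7FFFFFFFu; } /* illustrative trigger */

    static void toy_correct(toy_window_t* w, const uint8_t* chunkStart)
    { (void)chunkStart; w->lowestIndex = 0; } /* re-base against chunkStart */

    static void toy_process(toy_window_t* w, const uint8_t* src,
                            size_t srcSize, size_t chunkSize)
    {
        const uint8_t* const srcEnd = src + srcSize;
        const uint8_t* chunkStart = src;
        while (chunkStart < srcEnd) {
            size_t const remaining = (size_t)(srcEnd - chunkStart);
            const uint8_t* const chunkEnd =
                chunkStart + (remaining < chunkSize ? remaining : chunkSize);
            if (toy_needs_correction(w))
                toy_correct(w, chunkStart); /* the fix: chunkStart, not src */
            /* ... find long-range matches in [chunkStart, chunkEnd) ... */
            chunkStart = chunkEnd;
        }
    }
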
--- a/contrib/python-zstandard/zstd/compress/zstd_opt.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstd_opt.c Sun Sep 15 20:04:00 2019 -0700
@@ -64,9 +64,15 @@
}
#endif
+static int ZSTD_compressedLiterals(optState_t const* const optPtr)
+{
+ return optPtr->literalCompressionMode != ZSTD_lcm_uncompressed;
+}
+
static void ZSTD_setBasePrices(optState_t* optPtr, int optLevel)
{
- optPtr->litSumBasePrice = WEIGHT(optPtr->litSum, optLevel);
+ if (ZSTD_compressedLiterals(optPtr))
+ optPtr->litSumBasePrice = WEIGHT(optPtr->litSum, optLevel);
optPtr->litLengthSumBasePrice = WEIGHT(optPtr->litLengthSum, optLevel);
optPtr->matchLengthSumBasePrice = WEIGHT(optPtr->matchLengthSum, optLevel);
optPtr->offCodeSumBasePrice = WEIGHT(optPtr->offCodeSum, optLevel);
@@ -99,6 +105,7 @@
const BYTE* const src, size_t const srcSize,
int const optLevel)
{
+ int const compressedLiterals = ZSTD_compressedLiterals(optPtr);
DEBUGLOG(5, "ZSTD_rescaleFreqs (srcSize=%u)", (unsigned)srcSize);
optPtr->priceType = zop_dynamic;
@@ -113,9 +120,10 @@
/* huffman table presumed generated by dictionary */
optPtr->priceType = zop_dynamic;
- assert(optPtr->litFreq != NULL);
- optPtr->litSum = 0;
- { unsigned lit;
+ if (compressedLiterals) {
+ unsigned lit;
+ assert(optPtr->litFreq != NULL);
+ optPtr->litSum = 0;
for (lit=0; lit<=MaxLit; lit++) {
U32 const scaleLog = 11; /* scale to 2K */
U32 const bitCost = HUF_getNbBits(optPtr->symbolCosts->huf.CTable, lit);
@@ -163,10 +171,11 @@
} else { /* not a dictionary */
assert(optPtr->litFreq != NULL);
- { unsigned lit = MaxLit;
+ if (compressedLiterals) {
+ unsigned lit = MaxLit;
HIST_count_simple(optPtr->litFreq, &lit, src, srcSize); /* use raw first block to init statistics */
+ optPtr->litSum = ZSTD_downscaleStat(optPtr->litFreq, MaxLit, 1);
}
- optPtr->litSum = ZSTD_downscaleStat(optPtr->litFreq, MaxLit, 1);
{ unsigned ll;
for (ll=0; ll<=MaxLL; ll++)
@@ -190,7 +199,8 @@
} else { /* new block : re-use previous statistics, scaled down */
- optPtr->litSum = ZSTD_downscaleStat(optPtr->litFreq, MaxLit, 1);
+ if (compressedLiterals)
+ optPtr->litSum = ZSTD_downscaleStat(optPtr->litFreq, MaxLit, 1);
optPtr->litLengthSum = ZSTD_downscaleStat(optPtr->litLengthFreq, MaxLL, 0);
optPtr->matchLengthSum = ZSTD_downscaleStat(optPtr->matchLengthFreq, MaxML, 0);
optPtr->offCodeSum = ZSTD_downscaleStat(optPtr->offCodeFreq, MaxOff, 0);
@@ -207,6 +217,10 @@
int optLevel)
{
if (litLength == 0) return 0;
+
+ if (!ZSTD_compressedLiterals(optPtr))
+ return (litLength << 3) * BITCOST_MULTIPLIER; /* Uncompressed - 8 bytes per literal. */
+
if (optPtr->priceType == zop_predef)
return (litLength*6) * BITCOST_MULTIPLIER; /* 6 bit per literal - no statistic used */
@@ -241,13 +255,13 @@
* to provide a cost which is directly comparable to a match ending at same position */
static int ZSTD_litLengthContribution(U32 const litLength, const optState_t* const optPtr, int optLevel)
{
- if (optPtr->priceType >= zop_predef) return WEIGHT(litLength, optLevel);
+ if (optPtr->priceType >= zop_predef) return (int)WEIGHT(litLength, optLevel);
/* dynamic statistics */
{ U32 const llCode = ZSTD_LLcode(litLength);
- int const contribution = (LL_bits[llCode] * BITCOST_MULTIPLIER)
- + WEIGHT(optPtr->litLengthFreq[0], optLevel) /* note: log2litLengthSum cancel out */
- - WEIGHT(optPtr->litLengthFreq[llCode], optLevel);
+ int const contribution = (int)(LL_bits[llCode] * BITCOST_MULTIPLIER)
+ + (int)WEIGHT(optPtr->litLengthFreq[0], optLevel) /* note: log2litLengthSum cancel out */
+ - (int)WEIGHT(optPtr->litLengthFreq[llCode], optLevel);
#if 1
return contribution;
#else
@@ -264,7 +278,7 @@
const optState_t* const optPtr,
int optLevel)
{
- int const contribution = ZSTD_rawLiteralsCost(literals, litLength, optPtr, optLevel)
+ int const contribution = (int)ZSTD_rawLiteralsCost(literals, litLength, optPtr, optLevel)
+ ZSTD_litLengthContribution(litLength, optPtr, optLevel);
return contribution;
}
@@ -310,7 +324,8 @@
U32 offsetCode, U32 matchLength)
{
/* literals */
- { U32 u;
+ if (ZSTD_compressedLiterals(optPtr)) {
+ U32 u;
for (u=0; u < litLength; u++)
optPtr->litFreq[literals[u]] += ZSTD_LITFREQ_ADD;
optPtr->litSum += litLength*ZSTD_LITFREQ_ADD;
@@ -357,13 +372,15 @@
/* Update hashTable3 up to ip (excluded)
Assumption : always within prefix (i.e. not within extDict) */
-static U32 ZSTD_insertAndFindFirstIndexHash3 (ZSTD_matchState_t* ms, const BYTE* const ip)
+static U32 ZSTD_insertAndFindFirstIndexHash3 (ZSTD_matchState_t* ms,
+ U32* nextToUpdate3,
+ const BYTE* const ip)
{
U32* const hashTable3 = ms->hashTable3;
U32 const hashLog3 = ms->hashLog3;
const BYTE* const base = ms->window.base;
- U32 idx = ms->nextToUpdate3;
- U32 const target = ms->nextToUpdate3 = (U32)(ip - base);
+ U32 idx = *nextToUpdate3;
+ U32 const target = (U32)(ip - base);
size_t const hash3 = ZSTD_hash3Ptr(ip, hashLog3);
assert(hashLog3 > 0);
@@ -372,6 +389,7 @@
idx++;
}
+ *nextToUpdate3 = target;
return hashTable3[hash3];
}
@@ -488,9 +506,11 @@
} }
*smallerPtr = *largerPtr = 0;
- if (bestLength > 384) return MIN(192, (U32)(bestLength - 384)); /* speed optimization */
- assert(matchEndIdx > current + 8);
- return matchEndIdx - (current + 8);
+ { U32 positions = 0;
+ if (bestLength > 384) positions = MIN(192, (U32)(bestLength - 384)); /* speed optimization */
+ assert(matchEndIdx > current + 8);
+ return MAX(positions, matchEndIdx - (current + 8));
+ }
}
FORCE_INLINE_TEMPLATE
@@ -505,8 +525,13 @@
DEBUGLOG(6, "ZSTD_updateTree_internal, from %u to %u (dictMode:%u)",
idx, target, dictMode);
- while(idx < target)
- idx += ZSTD_insertBt1(ms, base+idx, iend, mls, dictMode == ZSTD_extDict);
+ while(idx < target) {
+ U32 const forward = ZSTD_insertBt1(ms, base+idx, iend, mls, dictMode == ZSTD_extDict);
+ assert(idx < (U32)(idx + forward));
+ idx += forward;
+ }
+ assert((size_t)(ip - base) <= (size_t)(U32)(-1));
+ assert((size_t)(iend - base) <= (size_t)(U32)(-1));
ms->nextToUpdate = target;
}
@@ -516,11 +541,12 @@
FORCE_INLINE_TEMPLATE
U32 ZSTD_insertBtAndGetAllMatches (
+ ZSTD_match_t* matches, /* store result (found matches) in this table (presumed large enough) */
ZSTD_matchState_t* ms,
+ U32* nextToUpdate3,
const BYTE* const ip, const BYTE* const iLimit, const ZSTD_dictMode_e dictMode,
- U32 rep[ZSTD_REP_NUM],
+ const U32 rep[ZSTD_REP_NUM],
U32 const ll0, /* tells if associated literal length is 0 or not. This value must be 0 or 1 */
- ZSTD_match_t* matches,
const U32 lengthToBeat,
U32 const mls /* template */)
{
@@ -541,8 +567,8 @@
U32 const dictLimit = ms->window.dictLimit;
const BYTE* const dictEnd = dictBase + dictLimit;
const BYTE* const prefixStart = base + dictLimit;
- U32 const btLow = btMask >= current ? 0 : current - btMask;
- U32 const windowLow = ms->window.lowLimit;
+ U32 const btLow = (btMask >= current) ? 0 : current - btMask;
+ U32 const windowLow = ZSTD_getLowestMatchIndex(ms, current, cParams->windowLog);
U32 const matchLow = windowLow ? windowLow : 1;
U32* smallerPtr = bt + 2*(current&btMask);
U32* largerPtr = bt + 2*(current&btMask) + 1;
@@ -612,7 +638,7 @@
/* HC3 match finder */
if ((mls == 3) /*static*/ && (bestLength < mls)) {
- U32 const matchIndex3 = ZSTD_insertAndFindFirstIndexHash3(ms, ip);
+ U32 const matchIndex3 = ZSTD_insertAndFindFirstIndexHash3(ms, nextToUpdate3, ip);
if ((matchIndex3 >= matchLow)
& (current - matchIndex3 < (1<<18)) /*heuristic : longer distance likely too expensive*/ ) {
size_t mlen;
@@ -638,9 +664,7 @@
(ip+mlen == iLimit) ) { /* best possible length */
ms->nextToUpdate = current+1; /* skip insertion */
return 1;
- }
- }
- }
+ } } }
/* no dictMatchState lookup: dicts don't have a populated HC3 table */
}
@@ -648,19 +672,21 @@
while (nbCompares-- && (matchIndex >= matchLow)) {
U32* const nextPtr = bt + 2*(matchIndex & btMask);
+ const BYTE* match;
size_t matchLength = MIN(commonLengthSmaller, commonLengthLarger); /* guaranteed minimum nb of common bytes */
- const BYTE* match;
assert(current > matchIndex);
if ((dictMode == ZSTD_noDict) || (dictMode == ZSTD_dictMatchState) || (matchIndex+matchLength >= dictLimit)) {
assert(matchIndex+matchLength >= dictLimit); /* ensure the condition is correct when !extDict */
match = base + matchIndex;
+ if (matchIndex >= dictLimit) assert(memcmp(match, ip, matchLength) == 0); /* ensure early section of match is equal as expected */
matchLength += ZSTD_count(ip+matchLength, match+matchLength, iLimit);
} else {
match = dictBase + matchIndex;
+ assert(memcmp(match, ip, matchLength) == 0); /* ensure early section of match is equal as expected */
matchLength += ZSTD_count_2segments(ip+matchLength, match+matchLength, iLimit, dictEnd, prefixStart);
if (matchIndex+matchLength >= dictLimit)
- match = base + matchIndex; /* prepare for match[matchLength] */
+ match = base + matchIndex; /* prepare for match[matchLength] read */
}
if (matchLength > bestLength) {
@@ -745,10 +771,13 @@
FORCE_INLINE_TEMPLATE U32 ZSTD_BtGetAllMatches (
+ ZSTD_match_t* matches, /* store result (match found, increasing size) in this table */
ZSTD_matchState_t* ms,
+ U32* nextToUpdate3,
const BYTE* ip, const BYTE* const iHighLimit, const ZSTD_dictMode_e dictMode,
- U32 rep[ZSTD_REP_NUM], U32 const ll0,
- ZSTD_match_t* matches, U32 const lengthToBeat)
+ const U32 rep[ZSTD_REP_NUM],
+ U32 const ll0,
+ U32 const lengthToBeat)
{
const ZSTD_compressionParameters* const cParams = &ms->cParams;
U32 const matchLengthSearch = cParams->minMatch;
@@ -757,12 +786,12 @@
ZSTD_updateTree_internal(ms, ip, iHighLimit, matchLengthSearch, dictMode);
switch(matchLengthSearch)
{
- case 3 : return ZSTD_insertBtAndGetAllMatches(ms, ip, iHighLimit, dictMode, rep, ll0, matches, lengthToBeat, 3);
+ case 3 : return ZSTD_insertBtAndGetAllMatches(matches, ms, nextToUpdate3, ip, iHighLimit, dictMode, rep, ll0, lengthToBeat, 3);
default :
- case 4 : return ZSTD_insertBtAndGetAllMatches(ms, ip, iHighLimit, dictMode, rep, ll0, matches, lengthToBeat, 4);
- case 5 : return ZSTD_insertBtAndGetAllMatches(ms, ip, iHighLimit, dictMode, rep, ll0, matches, lengthToBeat, 5);
+ case 4 : return ZSTD_insertBtAndGetAllMatches(matches, ms, nextToUpdate3, ip, iHighLimit, dictMode, rep, ll0, lengthToBeat, 4);
+ case 5 : return ZSTD_insertBtAndGetAllMatches(matches, ms, nextToUpdate3, ip, iHighLimit, dictMode, rep, ll0, lengthToBeat, 5);
case 7 :
- case 6 : return ZSTD_insertBtAndGetAllMatches(ms, ip, iHighLimit, dictMode, rep, ll0, matches, lengthToBeat, 6);
+ case 6 : return ZSTD_insertBtAndGetAllMatches(matches, ms, nextToUpdate3, ip, iHighLimit, dictMode, rep, ll0, lengthToBeat, 6);
}
}
@@ -838,6 +867,7 @@
U32 const sufficient_len = MIN(cParams->targetLength, ZSTD_OPT_NUM -1);
U32 const minMatch = (cParams->minMatch == 3) ? 3 : 4;
+ U32 nextToUpdate3 = ms->nextToUpdate;
ZSTD_optimal_t* const opt = optStatePtr->priceTable;
ZSTD_match_t* const matches = optStatePtr->matchTable;
@@ -847,7 +877,6 @@
DEBUGLOG(5, "ZSTD_compressBlock_opt_generic: current=%u, prefix=%u, nextToUpdate=%u",
(U32)(ip - base), ms->window.dictLimit, ms->nextToUpdate);
assert(optLevel <= 2);
- ms->nextToUpdate3 = ms->nextToUpdate;
ZSTD_rescaleFreqs(optStatePtr, (const BYTE*)src, srcSize, optLevel);
ip += (ip==prefixStart);
@@ -858,7 +887,7 @@
/* find first match */
{ U32 const litlen = (U32)(ip - anchor);
U32 const ll0 = !litlen;
- U32 const nbMatches = ZSTD_BtGetAllMatches(ms, ip, iend, dictMode, rep, ll0, matches, minMatch);
+ U32 const nbMatches = ZSTD_BtGetAllMatches(matches, ms, &nextToUpdate3, ip, iend, dictMode, rep, ll0, minMatch);
if (!nbMatches) { ip++; continue; }
/* initialize opt[0] */
@@ -870,7 +899,7 @@
/* large match -> immediate encoding */
{ U32 const maxML = matches[nbMatches-1].len;
U32 const maxOffset = matches[nbMatches-1].off;
- DEBUGLOG(6, "found %u matches of maxLength=%u and maxOffCode=%u at cPos=%u => start new serie",
+ DEBUGLOG(6, "found %u matches of maxLength=%u and maxOffCode=%u at cPos=%u => start new series",
nbMatches, maxML, maxOffset, (U32)(ip-prefixStart));
if (maxML > sufficient_len) {
@@ -955,7 +984,7 @@
U32 const litlen = (opt[cur].mlen == 0) ? opt[cur].litlen : 0;
U32 const previousPrice = opt[cur].price;
U32 const basePrice = previousPrice + ZSTD_litLengthPrice(0, optStatePtr, optLevel);
- U32 const nbMatches = ZSTD_BtGetAllMatches(ms, inr, iend, dictMode, opt[cur].rep, ll0, matches, minMatch);
+ U32 const nbMatches = ZSTD_BtGetAllMatches(matches, ms, &nextToUpdate3, inr, iend, dictMode, opt[cur].rep, ll0, minMatch);
U32 matchNb;
if (!nbMatches) {
DEBUGLOG(7, "rPos:%u : no match found", cur);
@@ -1079,7 +1108,7 @@
} /* while (ip < ilimit) */
/* Return the last literals size */
- return iend - anchor;
+ return (size_t)(iend - anchor);
}
@@ -1108,7 +1137,8 @@
/* used in 2-pass strategy */
MEM_STATIC void ZSTD_upscaleStats(optState_t* optPtr)
{
- optPtr->litSum = ZSTD_upscaleStat(optPtr->litFreq, MaxLit, 0);
+ if (ZSTD_compressedLiterals(optPtr))
+ optPtr->litSum = ZSTD_upscaleStat(optPtr->litFreq, MaxLit, 0);
optPtr->litLengthSum = ZSTD_upscaleStat(optPtr->litLengthFreq, MaxLL, 0);
optPtr->matchLengthSum = ZSTD_upscaleStat(optPtr->matchLengthFreq, MaxML, 0);
optPtr->offCodeSum = ZSTD_upscaleStat(optPtr->offCodeFreq, MaxOff, 0);
@@ -1117,7 +1147,7 @@
/* ZSTD_initStats_ultra():
* make a first compression pass, just to seed stats with more accurate starting values.
* only works on first block, with no dictionary and no ldm.
- * this function cannot error, hence its constract must be respected.
+ * this function cannot error, hence its contract must be respected.
*/
static void
ZSTD_initStats_ultra(ZSTD_matchState_t* ms,
@@ -1142,7 +1172,6 @@
ms->window.dictLimit += (U32)srcSize;
ms->window.lowLimit = ms->window.dictLimit;
ms->nextToUpdate = ms->window.dictLimit;
- ms->nextToUpdate3 = ms->window.dictLimit;
/* re-inforce weight of collected statistics */
ZSTD_upscaleStats(&ms->opt);
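
The zstd_opt.c hunks gate every literal-statistics update and literal price on the new ZSTD_compressedLiterals() predicate, which reflects the ZSTD_c_literalCompressionMode parameter mentioned in the NEWS entry. The resulting pricing rule, sketched with stand-in names (256 is assumed here for the upstream BITCOST_MULTIPLIER scale):

    #define TOY_BITCOST_MULTIPLIER 256

    typedef enum { toy_lcm_auto, toy_lcm_huffman, toy_lcm_uncompressed } toy_lcm_e;

    /* price of a run of literals, in scaled bits */
    static unsigned toy_literals_price(unsigned litLength, toy_lcm_e mode)
    {
        if (litLength == 0) return 0;
        if (mode == toy_lcm_uncompressed)
            return (litLength << 3) * TOY_BITCOST_MULTIPLIER; /* 8 bits each */
        return litLength * 6 * TOY_BITCOST_MULTIPLIER; /* predef: ~6 bits each */
    }

Gating the updates also avoids maintaining literal histograms that would never be used when literals stay raw.
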
--- a/contrib/python-zstandard/zstd/compress/zstdmt_compress.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstdmt_compress.c Sun Sep 15 20:04:00 2019 -0700
@@ -22,6 +22,7 @@
/* ====== Dependencies ====== */
#include <string.h> /* memcpy, memset */
#include <limits.h> /* INT_MAX, UINT_MAX */
+#include "mem.h" /* MEM_STATIC */
#include "pool.h" /* threadpool */
#include "threading.h" /* mutex */
#include "zstd_compress_internal.h" /* MIN, ERROR, ZSTD_*, ZSTD_highbit32 */
@@ -456,7 +457,7 @@
* Must be acquired after the main mutex when acquiring both.
*/
ZSTD_pthread_mutex_t ldmWindowMutex;
- ZSTD_pthread_cond_t ldmWindowCond; /* Signaled when ldmWindow is udpated */
+ ZSTD_pthread_cond_t ldmWindowCond; /* Signaled when ldmWindow is updated */
ZSTD_window_t ldmWindow; /* A thread-safe copy of ldmState.window */
} serialState_t;
@@ -647,7 +648,7 @@
buffer_t dstBuff = job->dstBuff;
size_t lastCBlockSize = 0;
- /* ressources */
+ /* resources */
if (cctx==NULL) JOB_ERROR(ERROR(memory_allocation));
if (dstBuff.start == NULL) { /* streaming job : doesn't provide a dstBuffer */
dstBuff = ZSTDMT_getBuffer(job->bufPool);
@@ -672,7 +673,7 @@
if (ZSTD_isError(initError)) JOB_ERROR(initError);
} else { /* srcStart points at reloaded section */
U64 const pledgedSrcSize = job->firstJob ? job->fullFrameSize : job->src.size;
- { size_t const forceWindowError = ZSTD_CCtxParam_setParameter(&jobParams, ZSTD_c_forceMaxWindow, !job->firstJob);
+ { size_t const forceWindowError = ZSTD_CCtxParams_setParameter(&jobParams, ZSTD_c_forceMaxWindow, !job->firstJob);
if (ZSTD_isError(forceWindowError)) JOB_ERROR(forceWindowError);
}
{ size_t const initError = ZSTD_compressBegin_advanced_internal(cctx,
@@ -864,14 +865,10 @@
* Internal use only */
size_t ZSTDMT_CCtxParam_setNbWorkers(ZSTD_CCtx_params* params, unsigned nbWorkers)
{
- if (nbWorkers > ZSTDMT_NBWORKERS_MAX) nbWorkers = ZSTDMT_NBWORKERS_MAX;
- params->nbWorkers = nbWorkers;
- params->overlapLog = ZSTDMT_OVERLAPLOG_DEFAULT;
- params->jobSize = 0;
- return nbWorkers;
+ return ZSTD_CCtxParams_setParameter(params, ZSTD_c_nbWorkers, (int)nbWorkers);
}
-ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced(unsigned nbWorkers, ZSTD_customMem cMem)
+MEM_STATIC ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced_internal(unsigned nbWorkers, ZSTD_customMem cMem)
{
ZSTDMT_CCtx* mtctx;
U32 nbJobs = nbWorkers + 2;
@@ -906,6 +903,17 @@
return mtctx;
}
+ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced(unsigned nbWorkers, ZSTD_customMem cMem)
+{
+#ifdef ZSTD_MULTITHREAD
+ return ZSTDMT_createCCtx_advanced_internal(nbWorkers, cMem);
+#else
+ (void)nbWorkers;
+ (void)cMem;
+ return NULL;
+#endif
+}
+
ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbWorkers)
{
return ZSTDMT_createCCtx_advanced(nbWorkers, ZSTD_defaultCMem);
@@ -986,26 +994,13 @@
{
case ZSTDMT_p_jobSize :
DEBUGLOG(4, "ZSTDMT_CCtxParam_setMTCtxParameter : set jobSize to %i", value);
- if ( value != 0 /* default */
- && value < ZSTDMT_JOBSIZE_MIN)
- value = ZSTDMT_JOBSIZE_MIN;
- assert(value >= 0);
- if (value > ZSTDMT_JOBSIZE_MAX) value = ZSTDMT_JOBSIZE_MAX;
- params->jobSize = value;
- return value;
-
+ return ZSTD_CCtxParams_setParameter(params, ZSTD_c_jobSize, value);
case ZSTDMT_p_overlapLog :
DEBUGLOG(4, "ZSTDMT_p_overlapLog : %i", value);
- if (value < ZSTD_OVERLAPLOG_MIN) value = ZSTD_OVERLAPLOG_MIN;
- if (value > ZSTD_OVERLAPLOG_MAX) value = ZSTD_OVERLAPLOG_MAX;
- params->overlapLog = value;
- return value;
-
+ return ZSTD_CCtxParams_setParameter(params, ZSTD_c_overlapLog, value);
case ZSTDMT_p_rsyncable :
- value = (value != 0);
- params->rsyncable = value;
- return value;
-
+ DEBUGLOG(4, "ZSTD_p_rsyncable : %i", value);
+ return ZSTD_CCtxParams_setParameter(params, ZSTD_c_rsyncable, value);
default :
return ERROR(parameter_unsupported);
}
@@ -1021,32 +1016,29 @@
{
switch (parameter) {
case ZSTDMT_p_jobSize:
- assert(mtctx->params.jobSize <= INT_MAX);
- *value = (int)(mtctx->params.jobSize);
- break;
+ return ZSTD_CCtxParams_getParameter(&mtctx->params, ZSTD_c_jobSize, value);
case ZSTDMT_p_overlapLog:
- *value = mtctx->params.overlapLog;
- break;
+ return ZSTD_CCtxParams_getParameter(&mtctx->params, ZSTD_c_overlapLog, value);
case ZSTDMT_p_rsyncable:
- *value = mtctx->params.rsyncable;
- break;
+ return ZSTD_CCtxParams_getParameter(&mtctx->params, ZSTD_c_rsyncable, value);
default:
return ERROR(parameter_unsupported);
}
- return 0;
}
/* Sets parameters relevant to the compression job,
* initializing others to default values. */
static ZSTD_CCtx_params ZSTDMT_initJobCCtxParams(ZSTD_CCtx_params const params)
{
- ZSTD_CCtx_params jobParams;
- memset(&jobParams, 0, sizeof(jobParams));
-
- jobParams.cParams = params.cParams;
- jobParams.fParams = params.fParams;
- jobParams.compressionLevel = params.compressionLevel;
-
+ ZSTD_CCtx_params jobParams = params;
+ /* Clear parameters related to multithreading */
+ jobParams.forceWindow = 0;
+ jobParams.nbWorkers = 0;
+ jobParams.jobSize = 0;
+ jobParams.overlapLog = 0;
+ jobParams.rsyncable = 0;
+ memset(&jobParams.ldmParams, 0, sizeof(ldmParams_t));
+ memset(&jobParams.customMem, 0, sizeof(ZSTD_customMem));
return jobParams;
}
@@ -1056,7 +1048,7 @@
static size_t ZSTDMT_resize(ZSTDMT_CCtx* mtctx, unsigned nbWorkers)
{
if (POOL_resize(mtctx->factory, nbWorkers)) return ERROR(memory_allocation);
- CHECK_F( ZSTDMT_expandJobsTable(mtctx, nbWorkers) );
+ FORWARD_IF_ERROR( ZSTDMT_expandJobsTable(mtctx, nbWorkers) );
mtctx->bufPool = ZSTDMT_expandBufferPool(mtctx->bufPool, nbWorkers);
if (mtctx->bufPool == NULL) return ERROR(memory_allocation);
mtctx->cctxPool = ZSTDMT_expandCCtxPool(mtctx->cctxPool, nbWorkers);
@@ -1137,9 +1129,14 @@
size_t const produced = ZSTD_isError(cResult) ? 0 : cResult;
size_t const flushed = ZSTD_isError(cResult) ? 0 : jobPtr->dstFlushed;
assert(flushed <= produced);
+ assert(jobPtr->consumed <= jobPtr->src.size);
toFlush = produced - flushed;
- if (toFlush==0 && (jobPtr->consumed >= jobPtr->src.size)) {
- /* doneJobID is not-fully-flushed, but toFlush==0 : doneJobID should be compressing some more data */
+ /* if toFlush==0, nothing is available to flush.
+ * However, jobID is expected to still be active:
+ * if jobID was already completed and fully flushed,
+ * ZSTDMT_flushProduced() should have already moved onto next job.
+ * Therefore, some input has not yet been consumed. */
+ if (toFlush==0) {
assert(jobPtr->consumed < jobPtr->src.size);
}
}
@@ -1156,12 +1153,16 @@
static unsigned ZSTDMT_computeTargetJobLog(ZSTD_CCtx_params const params)
{
- if (params.ldmParams.enableLdm)
+ unsigned jobLog;
+ if (params.ldmParams.enableLdm) {
/* In Long Range Mode, the windowLog is typically oversized.
* In which case, it's preferable to determine the jobSize
* based on chainLog instead. */
- return MAX(21, params.cParams.chainLog + 4);
- return MAX(20, params.cParams.windowLog + 2);
+ jobLog = MAX(21, params.cParams.chainLog + 4);
+ } else {
+ jobLog = MAX(20, params.cParams.windowLog + 2);
+ }
+ return MIN(jobLog, (unsigned)ZSTDMT_JOBLOG_MAX);
}
static int ZSTDMT_overlapLog_default(ZSTD_strategy strat)
@@ -1205,7 +1206,7 @@
ovLog = MIN(params.cParams.windowLog, ZSTDMT_computeTargetJobLog(params) - 2)
- overlapRLog;
}
- assert(0 <= ovLog && ovLog <= 30);
+ assert(0 <= ovLog && ovLog <= ZSTD_WINDOWLOG_MAX);
DEBUGLOG(4, "overlapLog : %i", params.overlapLog);
DEBUGLOG(4, "overlap size : %i", 1 << ovLog);
return (ovLog==0) ? 0 : (size_t)1 << ovLog;
@@ -1263,7 +1264,7 @@
if (ZSTDMT_serialState_reset(&mtctx->serial, mtctx->seqPool, params, avgJobSize))
return ERROR(memory_allocation);
- CHECK_F( ZSTDMT_expandJobsTable(mtctx, nbJobs) ); /* only expands if necessary */
+ FORWARD_IF_ERROR( ZSTDMT_expandJobsTable(mtctx, nbJobs) ); /* only expands if necessary */
{ unsigned u;
for (u=0; u<nbJobs; u++) {
@@ -1396,10 +1397,10 @@
/* init */
if (params.nbWorkers != mtctx->params.nbWorkers)
- CHECK_F( ZSTDMT_resize(mtctx, params.nbWorkers) );
+ FORWARD_IF_ERROR( ZSTDMT_resize(mtctx, params.nbWorkers) );
if (params.jobSize != 0 && params.jobSize < ZSTDMT_JOBSIZE_MIN) params.jobSize = ZSTDMT_JOBSIZE_MIN;
- if (params.jobSize > (size_t)ZSTDMT_JOBSIZE_MAX) params.jobSize = ZSTDMT_JOBSIZE_MAX;
+ if (params.jobSize > (size_t)ZSTDMT_JOBSIZE_MAX) params.jobSize = (size_t)ZSTDMT_JOBSIZE_MAX;
mtctx->singleBlockingThread = (pledgedSrcSize <= ZSTDMT_JOBSIZE_MIN); /* do not trigger multi-threading when srcSize is too small */
if (mtctx->singleBlockingThread) {
@@ -1440,6 +1441,8 @@
if (mtctx->targetSectionSize == 0) {
mtctx->targetSectionSize = 1ULL << ZSTDMT_computeTargetJobLog(params);
}
+ assert(mtctx->targetSectionSize <= (size_t)ZSTDMT_JOBSIZE_MAX);
+
if (params.rsyncable) {
/* Aim for the targetsectionSize as the average job size. */
U32 const jobSizeMB = (U32)(mtctx->targetSectionSize >> 20);
@@ -1547,7 +1550,7 @@
/* ZSTDMT_writeLastEmptyBlock()
* Write a single empty block with an end-of-frame to finish a frame.
* Job must be created from streaming variant.
- * This function is always successfull if expected conditions are fulfilled.
+ * This function is always successful if expected conditions are fulfilled.
*/
static void ZSTDMT_writeLastEmptyBlock(ZSTDMT_jobDescription* job)
{
@@ -1987,7 +1990,7 @@
assert(input->pos <= input->size);
if (mtctx->singleBlockingThread) { /* delegate to single-thread (synchronous) */
- return ZSTD_compressStream_generic(mtctx->cctxPool->cctx[0], output, input, endOp);
+ return ZSTD_compressStream2(mtctx->cctxPool->cctx[0], output, input, endOp);
}
if ((mtctx->frameEnded) && (endOp==ZSTD_e_continue)) {
@@ -2051,7 +2054,7 @@
|| ((endOp == ZSTD_e_end) && (!mtctx->frameEnded)) ) { /* must finish the frame with a zero-size block */
size_t const jobSize = mtctx->inBuff.filled;
assert(mtctx->inBuff.filled <= mtctx->targetSectionSize);
- CHECK_F( ZSTDMT_createCompressionJob(mtctx, jobSize, endOp) );
+ FORWARD_IF_ERROR( ZSTDMT_createCompressionJob(mtctx, jobSize, endOp) );
}
/* check for potential compressed data ready to be flushed */
@@ -2065,7 +2068,7 @@
size_t ZSTDMT_compressStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output, ZSTD_inBuffer* input)
{
- CHECK_F( ZSTDMT_compressStream_generic(mtctx, output, input, ZSTD_e_continue) );
+ FORWARD_IF_ERROR( ZSTDMT_compressStream_generic(mtctx, output, input, ZSTD_e_continue) );
/* recommended next input size : fill current input buffer */
return mtctx->targetSectionSize - mtctx->inBuff.filled; /* note : could be zero when input buffer is fully filled and no more availability to create new job */
@@ -2082,7 +2085,7 @@
|| ((endFrame==ZSTD_e_end) && !mtctx->frameEnded)) { /* need a last 0-size block to end frame */
DEBUGLOG(5, "ZSTDMT_flushStream_internal : create a new job (%u bytes, end:%u)",
(U32)srcSize, (U32)endFrame);
- CHECK_F( ZSTDMT_createCompressionJob(mtctx, srcSize, endFrame) );
+ FORWARD_IF_ERROR( ZSTDMT_createCompressionJob(mtctx, srcSize, endFrame) );
}
/* check if there is any data available to flush */
--- a/contrib/python-zstandard/zstd/compress/zstdmt_compress.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/compress/zstdmt_compress.h Sun Sep 15 20:04:00 2019 -0700
@@ -17,10 +17,25 @@
/* Note : This is an internal API.
- * Some methods are still exposed (ZSTDLIB_API),
+ * These APIs used to be exposed with ZSTDLIB_API,
* because it used to be the only way to invoke MT compression.
- * Now, it's recommended to use ZSTD_compress_generic() instead.
- * These methods will stop being exposed in a future version */
+ * Now, it's recommended to use ZSTD_compress2 and ZSTD_compressStream2()
+ * instead.
+ *
+ * If you depend on these APIs and can't switch, then define
+ * ZSTD_LEGACY_MULTITHREADED_API when building the dynamic library.

+ * However, we may completely remove these functions in a future
+ * release, so please switch soon.
+ *
+ * This API requires ZSTD_MULTITHREAD to be defined during compilation,
+ * otherwise ZSTDMT_createCCtx*() will fail.
+ */
+
+#ifdef ZSTD_LEGACY_MULTITHREADED_API
+# define ZSTDMT_API ZSTDLIB_API
+#else
+# define ZSTDMT_API
+#endif
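
For callers that can migrate off the ZSTDMT_* entry points, multithreading is now reached through the regular compression context. A minimal sketch of the recommended replacement (requires a libzstd built with ZSTD_MULTITHREAD; worker count and level here are arbitrary, and the loop assumes `out` can hold the whole frame):

    #include <zstd.h>

    static size_t compress_mt(ZSTD_CCtx* cctx,
                              ZSTD_outBuffer* out, ZSTD_inBuffer* in)
    {
        size_t remaining;
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);         /* enable MT */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
        do {
            remaining = ZSTD_compressStream2(cctx, out, in, ZSTD_e_end);
        } while (remaining != 0 && !ZSTD_isError(remaining));
        return remaining;  /* 0 == frame fully flushed, else error code */
    }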
/* === Dependencies === */
#include <stddef.h> /* size_t */
@@ -35,22 +50,25 @@
#ifndef ZSTDMT_JOBSIZE_MIN
# define ZSTDMT_JOBSIZE_MIN (1 MB)
#endif
+#define ZSTDMT_JOBLOG_MAX (MEM_32bits() ? 29 : 30)
#define ZSTDMT_JOBSIZE_MAX (MEM_32bits() ? (512 MB) : (1024 MB))
/* === Memory management === */
typedef struct ZSTDMT_CCtx_s ZSTDMT_CCtx;
-ZSTDLIB_API ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbWorkers);
-ZSTDLIB_API ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced(unsigned nbWorkers,
+/* Requires ZSTD_MULTITHREAD to be defined during compilation, otherwise it will return NULL. */
+ZSTDMT_API ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbWorkers);
+/* Requires ZSTD_MULTITHREAD to be defined during compilation, otherwise it will return NULL. */
+ZSTDMT_API ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced(unsigned nbWorkers,
ZSTD_customMem cMem);
-ZSTDLIB_API size_t ZSTDMT_freeCCtx(ZSTDMT_CCtx* mtctx);
+ZSTDMT_API size_t ZSTDMT_freeCCtx(ZSTDMT_CCtx* mtctx);
-ZSTDLIB_API size_t ZSTDMT_sizeof_CCtx(ZSTDMT_CCtx* mtctx);
+ZSTDMT_API size_t ZSTDMT_sizeof_CCtx(ZSTDMT_CCtx* mtctx);
/* === Simple one-pass compression function === */
-ZSTDLIB_API size_t ZSTDMT_compressCCtx(ZSTDMT_CCtx* mtctx,
+ZSTDMT_API size_t ZSTDMT_compressCCtx(ZSTDMT_CCtx* mtctx,
void* dst, size_t dstCapacity,
const void* src, size_t srcSize,
int compressionLevel);
@@ -59,31 +77,31 @@
/* === Streaming functions === */
-ZSTDLIB_API size_t ZSTDMT_initCStream(ZSTDMT_CCtx* mtctx, int compressionLevel);
-ZSTDLIB_API size_t ZSTDMT_resetCStream(ZSTDMT_CCtx* mtctx, unsigned long long pledgedSrcSize); /**< if srcSize is not known at reset time, use ZSTD_CONTENTSIZE_UNKNOWN. Note: for compatibility with older programs, 0 means the same as ZSTD_CONTENTSIZE_UNKNOWN, but it will change in the future to mean "empty" */
+ZSTDMT_API size_t ZSTDMT_initCStream(ZSTDMT_CCtx* mtctx, int compressionLevel);
+ZSTDMT_API size_t ZSTDMT_resetCStream(ZSTDMT_CCtx* mtctx, unsigned long long pledgedSrcSize); /**< if srcSize is not known at reset time, use ZSTD_CONTENTSIZE_UNKNOWN. Note: for compatibility with older programs, 0 means the same as ZSTD_CONTENTSIZE_UNKNOWN, but it will change in the future to mean "empty" */
-ZSTDLIB_API size_t ZSTDMT_nextInputSizeHint(const ZSTDMT_CCtx* mtctx);
-ZSTDLIB_API size_t ZSTDMT_compressStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
+ZSTDMT_API size_t ZSTDMT_nextInputSizeHint(const ZSTDMT_CCtx* mtctx);
+ZSTDMT_API size_t ZSTDMT_compressStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
-ZSTDLIB_API size_t ZSTDMT_flushStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output); /**< @return : 0 == all flushed; >0 : still some data to be flushed; or an error code (ZSTD_isError()) */
-ZSTDLIB_API size_t ZSTDMT_endStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output); /**< @return : 0 == all flushed; >0 : still some data to be flushed; or an error code (ZSTD_isError()) */
+ZSTDMT_API size_t ZSTDMT_flushStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output); /**< @return : 0 == all flushed; >0 : still some data to be flushed; or an error code (ZSTD_isError()) */
+ZSTDMT_API size_t ZSTDMT_endStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output); /**< @return : 0 == all flushed; >0 : still some data to be flushed; or an error code (ZSTD_isError()) */
/* === Advanced functions and parameters === */
-ZSTDLIB_API size_t ZSTDMT_compress_advanced(ZSTDMT_CCtx* mtctx,
- void* dst, size_t dstCapacity,
- const void* src, size_t srcSize,
- const ZSTD_CDict* cdict,
- ZSTD_parameters params,
- int overlapLog);
+ZSTDMT_API size_t ZSTDMT_compress_advanced(ZSTDMT_CCtx* mtctx,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize,
+ const ZSTD_CDict* cdict,
+ ZSTD_parameters params,
+ int overlapLog);
-ZSTDLIB_API size_t ZSTDMT_initCStream_advanced(ZSTDMT_CCtx* mtctx,
+ZSTDMT_API size_t ZSTDMT_initCStream_advanced(ZSTDMT_CCtx* mtctx,
const void* dict, size_t dictSize, /* dict can be released after init, a local copy is preserved within zcs */
ZSTD_parameters params,
unsigned long long pledgedSrcSize); /* pledgedSrcSize is optional and can be zero == unknown */
-ZSTDLIB_API size_t ZSTDMT_initCStream_usingCDict(ZSTDMT_CCtx* mtctx,
+ZSTDMT_API size_t ZSTDMT_initCStream_usingCDict(ZSTDMT_CCtx* mtctx,
const ZSTD_CDict* cdict,
ZSTD_frameParameters fparams,
unsigned long long pledgedSrcSize); /* note : zero means empty */
@@ -92,7 +110,7 @@
* List of parameters that can be set using ZSTDMT_setMTCtxParameter() */
typedef enum {
ZSTDMT_p_jobSize, /* Each job is compressed in parallel. By default, this value is dynamically determined depending on compression parameters. Can be set explicitly here. */
- ZSTDMT_p_overlapLog, /* Each job may reload a part of previous job to enhance compressionr ratio; 0 == no overlap, 6(default) == use 1/8th of window, >=9 == use full window. This is a "sticky" parameter : its value will be re-used on next compression job */
+ ZSTDMT_p_overlapLog, /* Each job may reload a part of previous job to enhance compression ratio; 0 == no overlap, 6(default) == use 1/8th of window, >=9 == use full window. This is a "sticky" parameter : its value will be re-used on next compression job */
ZSTDMT_p_rsyncable /* Enables rsyncable mode. */
} ZSTDMT_parameter;
@@ -101,12 +119,12 @@
* The function must be called typically after ZSTD_createCCtx() but __before ZSTDMT_init*() !__
* Parameters not explicitly reset by ZSTDMT_init*() remain the same in consecutive compression sessions.
* @return : 0, or an error code (which can be tested using ZSTD_isError()) */
-ZSTDLIB_API size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, int value);
+ZSTDMT_API size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, int value);
/* ZSTDMT_getMTCtxParameter() :
* Query the ZSTDMT_CCtx for a parameter value.
* @return : 0, or an error code (which can be tested using ZSTD_isError()) */
-ZSTDLIB_API size_t ZSTDMT_getMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, int* value);
+ZSTDMT_API size_t ZSTDMT_getMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, int* value);
/*! ZSTDMT_compressStream_generic() :
@@ -116,7 +134,7 @@
* 0 if fully flushed
* or an error code
* note : needs to be init using any ZSTD_initCStream*() variant */
-ZSTDLIB_API size_t ZSTDMT_compressStream_generic(ZSTDMT_CCtx* mtctx,
+ZSTDMT_API size_t ZSTDMT_compressStream_generic(ZSTDMT_CCtx* mtctx,
ZSTD_outBuffer* output,
ZSTD_inBuffer* input,
ZSTD_EndDirective endOp);
--- a/contrib/python-zstandard/zstd/decompress/zstd_ddict.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/decompress/zstd_ddict.c Sun Sep 15 20:04:00 2019 -0700
@@ -105,9 +105,9 @@
ddict->dictID = MEM_readLE32((const char*)ddict->dictContent + ZSTD_FRAMEIDSIZE);
/* load entropy tables */
- CHECK_E( ZSTD_loadDEntropy(&ddict->entropy,
- ddict->dictContent, ddict->dictSize),
- dictionary_corrupted );
+ RETURN_ERROR_IF(ZSTD_isError(ZSTD_loadDEntropy(
+ &ddict->entropy, ddict->dictContent, ddict->dictSize)),
+ dictionary_corrupted);
ddict->entropyPresent = 1;
return 0;
}
@@ -133,7 +133,7 @@
ddict->entropy.hufTable[0] = (HUF_DTable)((HufLog)*0x1000001); /* cover both little and big endian */
/* parse dictionary content */
- CHECK_F( ZSTD_loadEntropy_intoDDict(ddict, dictContentType) );
+ FORWARD_IF_ERROR( ZSTD_loadEntropy_intoDDict(ddict, dictContentType) );
return 0;
}
--- a/contrib/python-zstandard/zstd/decompress/zstd_decompress.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/decompress/zstd_decompress.c Sun Sep 15 20:04:00 2019 -0700
@@ -106,6 +106,7 @@
dctx->ddictLocal = NULL;
dctx->dictEnd = NULL;
dctx->ddictIsCold = 0;
+ dctx->dictUses = ZSTD_dont_use;
dctx->inBuff = NULL;
dctx->inBuffSize = 0;
dctx->outBuffSize = 0;
@@ -147,13 +148,20 @@
return ZSTD_createDCtx_advanced(ZSTD_defaultCMem);
}
+static void ZSTD_clearDict(ZSTD_DCtx* dctx)
+{
+ ZSTD_freeDDict(dctx->ddictLocal);
+ dctx->ddictLocal = NULL;
+ dctx->ddict = NULL;
+ dctx->dictUses = ZSTD_dont_use;
+}
+
size_t ZSTD_freeDCtx(ZSTD_DCtx* dctx)
{
if (dctx==NULL) return 0; /* support free on NULL */
- if (dctx->staticSize) return ERROR(memory_allocation); /* not compatible with static DCtx */
+ RETURN_ERROR_IF(dctx->staticSize, memory_allocation, "not compatible with static DCtx");
{ ZSTD_customMem const cMem = dctx->customMem;
- ZSTD_freeDDict(dctx->ddictLocal);
- dctx->ddictLocal = NULL;
+ ZSTD_clearDict(dctx);
ZSTD_free(dctx->inBuff, cMem);
dctx->inBuff = NULL;
#if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT >= 1)
@@ -203,7 +211,7 @@
static size_t ZSTD_frameHeaderSize_internal(const void* src, size_t srcSize, ZSTD_format_e format)
{
size_t const minInputSize = ZSTD_startingInputLength(format);
- if (srcSize < minInputSize) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(srcSize < minInputSize, srcSize_wrong);
{ BYTE const fhd = ((const BYTE*)src)[minInputSize-1];
U32 const dictID= fhd & 3;
@@ -238,7 +246,7 @@
memset(zfhPtr, 0, sizeof(*zfhPtr)); /* not strictly necessary, but static analyzers do not understand that zfhPtr is only going to be read if the return value is zero, since they are 2 different signals */
if (srcSize < minInputSize) return minInputSize;
- if (src==NULL) return ERROR(GENERIC); /* invalid parameter */
+ RETURN_ERROR_IF(src==NULL, GENERIC, "invalid parameter");
if ( (format != ZSTD_f_zstd1_magicless)
&& (MEM_readLE32(src) != ZSTD_MAGICNUMBER) ) {
@@ -251,7 +259,7 @@
zfhPtr->frameType = ZSTD_skippableFrame;
return 0;
}
- return ERROR(prefix_unknown);
+ RETURN_ERROR(prefix_unknown);
}
/* ensure there is enough `srcSize` to fully read/decode frame header */
@@ -269,14 +277,13 @@
U64 windowSize = 0;
U32 dictID = 0;
U64 frameContentSize = ZSTD_CONTENTSIZE_UNKNOWN;
- if ((fhdByte & 0x08) != 0)
- return ERROR(frameParameter_unsupported); /* reserved bits, must be zero */
+ RETURN_ERROR_IF((fhdByte & 0x08) != 0, frameParameter_unsupported,
+ "reserved bits, must be zero");
if (!singleSegment) {
BYTE const wlByte = ip[pos++];
U32 const windowLog = (wlByte >> 3) + ZSTD_WINDOWLOG_ABSOLUTEMIN;
- if (windowLog > ZSTD_WINDOWLOG_MAX)
- return ERROR(frameParameter_windowTooLarge);
+ RETURN_ERROR_IF(windowLog > ZSTD_WINDOWLOG_MAX, frameParameter_windowTooLarge);
windowSize = (1ULL << windowLog);
windowSize += (windowSize >> 3) * (wlByte&7);
}
@@ -348,14 +355,16 @@
size_t const skippableHeaderSize = ZSTD_SKIPPABLEHEADERSIZE;
U32 sizeU32;
- if (srcSize < ZSTD_SKIPPABLEHEADERSIZE)
- return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(srcSize < ZSTD_SKIPPABLEHEADERSIZE, srcSize_wrong);
sizeU32 = MEM_readLE32((BYTE const*)src + ZSTD_FRAMEIDSIZE);
- if ((U32)(sizeU32 + ZSTD_SKIPPABLEHEADERSIZE) < sizeU32)
- return ERROR(frameParameter_unsupported);
-
- return skippableHeaderSize + sizeU32;
+ RETURN_ERROR_IF((U32)(sizeU32 + ZSTD_SKIPPABLEHEADERSIZE) < sizeU32,
+ frameParameter_unsupported);
+ {
+ size_t const skippableSize = skippableHeaderSize + sizeU32;
+ RETURN_ERROR_IF(skippableSize > srcSize, srcSize_wrong);
+ return skippableSize;
+ }
}
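
The added bound rejects skippable frames whose declared content size runs past the end of the input. A standalone restatement of the tightened invariant (magic constants copied from zstd.h; the helper name and its little-endian read are assumptions for illustration):

    #include <stdint.h>
    #include <string.h>

    #define MAGIC_SKIPPABLE_START 0x184D2A50U   /* ZSTD_MAGIC_SKIPPABLE_START */
    #define MAGIC_SKIPPABLE_MASK  0xFFFFFFF0U   /* ZSTD_MAGIC_SKIPPABLE_MASK  */

    /* header (8 bytes) + declared content must fit inside srcSize */
    static int skippable_frame_ok(const void* src, size_t srcSize)
    {
        uint32_t magic, content;
        if (srcSize < 8) return 0;
        memcpy(&magic, src, 4);                   /* assumes little-endian host */
        memcpy(&content, (const char*)src + 4, 4);
        if ((magic & MAGIC_SKIPPABLE_MASK) != MAGIC_SKIPPABLE_START) return 0;
        return (size_t)content <= srcSize - 8;
    }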
/** ZSTD_findDecompressedSize() :
@@ -372,11 +381,10 @@
if ((magicNumber & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) {
size_t const skippableSize = readSkippableFrameSize(src, srcSize);
- if (ZSTD_isError(skippableSize))
- return skippableSize;
- if (srcSize < skippableSize) {
+ if (ZSTD_isError(skippableSize)) {
return ZSTD_CONTENTSIZE_ERROR;
}
+ assert(skippableSize <= srcSize);
src = (const BYTE *)src + skippableSize;
srcSize -= skippableSize;
@@ -428,13 +436,91 @@
{
size_t const result = ZSTD_getFrameHeader_advanced(&(dctx->fParams), src, headerSize, dctx->format);
if (ZSTD_isError(result)) return result; /* invalid header */
- if (result>0) return ERROR(srcSize_wrong); /* headerSize too small */
- if (dctx->fParams.dictID && (dctx->dictID != dctx->fParams.dictID))
- return ERROR(dictionary_wrong);
+ RETURN_ERROR_IF(result>0, srcSize_wrong, "headerSize too small");
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+ /* Skip the dictID check in fuzzing mode, because it makes the search
+ * harder.
+ */
+ RETURN_ERROR_IF(dctx->fParams.dictID && (dctx->dictID != dctx->fParams.dictID),
+ dictionary_wrong);
+#endif
if (dctx->fParams.checksumFlag) XXH64_reset(&dctx->xxhState, 0);
return 0;
}
+static ZSTD_frameSizeInfo ZSTD_errorFrameSizeInfo(size_t ret)
+{
+ ZSTD_frameSizeInfo frameSizeInfo;
+ frameSizeInfo.compressedSize = ret;
+ frameSizeInfo.decompressedBound = ZSTD_CONTENTSIZE_ERROR;
+ return frameSizeInfo;
+}
+
+static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(const void* src, size_t srcSize)
+{
+ ZSTD_frameSizeInfo frameSizeInfo;
+ memset(&frameSizeInfo, 0, sizeof(ZSTD_frameSizeInfo));
+
+#if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT >= 1)
+ if (ZSTD_isLegacy(src, srcSize))
+ return ZSTD_findFrameSizeInfoLegacy(src, srcSize);
+#endif
+
+ if ((srcSize >= ZSTD_SKIPPABLEHEADERSIZE)
+ && (MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) {
+ frameSizeInfo.compressedSize = readSkippableFrameSize(src, srcSize);
+ assert(ZSTD_isError(frameSizeInfo.compressedSize) ||
+ frameSizeInfo.compressedSize <= srcSize);
+ return frameSizeInfo;
+ } else {
+ const BYTE* ip = (const BYTE*)src;
+ const BYTE* const ipstart = ip;
+ size_t remainingSize = srcSize;
+ size_t nbBlocks = 0;
+ ZSTD_frameHeader zfh;
+
+ /* Extract Frame Header */
+ { size_t const ret = ZSTD_getFrameHeader(&zfh, src, srcSize);
+ if (ZSTD_isError(ret))
+ return ZSTD_errorFrameSizeInfo(ret);
+ if (ret > 0)
+ return ZSTD_errorFrameSizeInfo(ERROR(srcSize_wrong));
+ }
+
+ ip += zfh.headerSize;
+ remainingSize -= zfh.headerSize;
+
+ /* Iterate over each block */
+ while (1) {
+ blockProperties_t blockProperties;
+ size_t const cBlockSize = ZSTD_getcBlockSize(ip, remainingSize, &blockProperties);
+ if (ZSTD_isError(cBlockSize))
+ return ZSTD_errorFrameSizeInfo(cBlockSize);
+
+ if (ZSTD_blockHeaderSize + cBlockSize > remainingSize)
+ return ZSTD_errorFrameSizeInfo(ERROR(srcSize_wrong));
+
+ ip += ZSTD_blockHeaderSize + cBlockSize;
+ remainingSize -= ZSTD_blockHeaderSize + cBlockSize;
+ nbBlocks++;
+
+ if (blockProperties.lastBlock) break;
+ }
+
+ /* Final frame content checksum */
+ if (zfh.checksumFlag) {
+ if (remainingSize < 4)
+ return ZSTD_errorFrameSizeInfo(ERROR(srcSize_wrong));
+ ip += 4;
+ }
+
+ frameSizeInfo.compressedSize = ip - ipstart;
+ frameSizeInfo.decompressedBound = (zfh.frameContentSize != ZSTD_CONTENTSIZE_UNKNOWN)
+ ? zfh.frameContentSize
+ : nbBlocks * zfh.blockSizeMax;
+ return frameSizeInfo;
+ }
+}
/** ZSTD_findFrameCompressedSize() :
* compatible with legacy mode
@@ -443,52 +529,33 @@
* @return : the compressed size of the frame starting at `src` */
size_t ZSTD_findFrameCompressedSize(const void *src, size_t srcSize)
{
-#if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT >= 1)
- if (ZSTD_isLegacy(src, srcSize))
- return ZSTD_findFrameCompressedSizeLegacy(src, srcSize);
-#endif
- if ( (srcSize >= ZSTD_SKIPPABLEHEADERSIZE)
- && (MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START ) {
- return readSkippableFrameSize(src, srcSize);
- } else {
- const BYTE* ip = (const BYTE*)src;
- const BYTE* const ipstart = ip;
- size_t remainingSize = srcSize;
- ZSTD_frameHeader zfh;
-
- /* Extract Frame Header */
- { size_t const ret = ZSTD_getFrameHeader(&zfh, src, srcSize);
- if (ZSTD_isError(ret)) return ret;
- if (ret > 0) return ERROR(srcSize_wrong);
- }
-
- ip += zfh.headerSize;
- remainingSize -= zfh.headerSize;
-
- /* Loop on each block */
- while (1) {
- blockProperties_t blockProperties;
- size_t const cBlockSize = ZSTD_getcBlockSize(ip, remainingSize, &blockProperties);
- if (ZSTD_isError(cBlockSize)) return cBlockSize;
-
- if (ZSTD_blockHeaderSize + cBlockSize > remainingSize)
- return ERROR(srcSize_wrong);
-
- ip += ZSTD_blockHeaderSize + cBlockSize;
- remainingSize -= ZSTD_blockHeaderSize + cBlockSize;
-
- if (blockProperties.lastBlock) break;
- }
-
- if (zfh.checksumFlag) { /* Final frame content checksum */
- if (remainingSize < 4) return ERROR(srcSize_wrong);
- ip += 4;
- }
-
- return ip - ipstart;
- }
+ ZSTD_frameSizeInfo const frameSizeInfo = ZSTD_findFrameSizeInfo(src, srcSize);
+ return frameSizeInfo.compressedSize;
}
+/** ZSTD_decompressBound() :
+ * compatible with legacy mode
+ * `src` must point to the start of a ZSTD frame or a skippable frame
+ * `srcSize` must be at least as large as the frame it contains
+ * @return : the maximum decompressed size of the compressed source
+ */
+unsigned long long ZSTD_decompressBound(const void* src, size_t srcSize)
+{
+ unsigned long long bound = 0;
+ /* Iterate over each frame */
+ while (srcSize > 0) {
+ ZSTD_frameSizeInfo const frameSizeInfo = ZSTD_findFrameSizeInfo(src, srcSize);
+ size_t const compressedSize = frameSizeInfo.compressedSize;
+ unsigned long long const decompressedBound = frameSizeInfo.decompressedBound;
+ if (ZSTD_isError(compressedSize) || decompressedBound == ZSTD_CONTENTSIZE_ERROR)
+ return ZSTD_CONTENTSIZE_ERROR;
+ assert(srcSize >= compressedSize);
+ src = (const BYTE*)src + compressedSize;
+ srcSize -= compressedSize;
+ bound += decompressedBound;
+ }
+ return bound;
+}
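
ZSTD_decompressBound() is new public (experimental) API in this zstd release. A sketch of the intended call pattern, sizing an output buffer even when frames do not carry a content size (the helper name is hypothetical; error handling is minimal):

    #define ZSTD_STATIC_LINKING_ONLY  /* ZSTD_decompressBound is experimental */
    #include <zstd.h>
    #include <stdlib.h>

    static void* decompress_any(const void* src, size_t srcSize, size_t* outSize)
    {
        unsigned long long const bound = ZSTD_decompressBound(src, srcSize);
        void* dst;
        if (bound == ZSTD_CONTENTSIZE_ERROR) return NULL;
        dst = malloc((size_t)bound);
        if (dst == NULL) return NULL;
        *outSize = ZSTD_decompress(dst, (size_t)bound, src, srcSize);
        return dst;  /* caller checks ZSTD_isError(*outSize) and frees */
    }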
/*-*************************************************************
@@ -507,9 +574,10 @@
}
/** ZSTD_insertBlock() :
- insert `src` block into `dctx` history. Useful to track uncompressed blocks. */
+ * insert `src` block into `dctx` history. Useful to track uncompressed blocks. */
size_t ZSTD_insertBlock(ZSTD_DCtx* dctx, const void* blockStart, size_t blockSize)
{
+ DEBUGLOG(5, "ZSTD_insertBlock: %u bytes", (unsigned)blockSize);
ZSTD_checkContinuity(dctx, blockStart);
dctx->previousDstEnd = (const char*)blockStart + blockSize;
return blockSize;
@@ -522,9 +590,9 @@
DEBUGLOG(5, "ZSTD_copyRawBlock");
if (dst == NULL) {
if (srcSize == 0) return 0;
- return ERROR(dstBuffer_null);
+ RETURN_ERROR(dstBuffer_null);
}
- if (srcSize > dstCapacity) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(srcSize > dstCapacity, dstSize_tooSmall);
memcpy(dst, src, srcSize);
return srcSize;
}
@@ -535,9 +603,9 @@
{
if (dst == NULL) {
if (regenSize == 0) return 0;
- return ERROR(dstBuffer_null);
+ RETURN_ERROR(dstBuffer_null);
}
- if (regenSize > dstCapacity) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(regenSize > dstCapacity, dstSize_tooSmall);
memset(dst, b, regenSize);
return regenSize;
}
@@ -560,15 +628,16 @@
DEBUGLOG(4, "ZSTD_decompressFrame (srcSize:%i)", (int)*srcSizePtr);
/* check */
- if (remainingSrcSize < ZSTD_FRAMEHEADERSIZE_MIN+ZSTD_blockHeaderSize)
- return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(
+ remainingSrcSize < ZSTD_FRAMEHEADERSIZE_MIN+ZSTD_blockHeaderSize,
+ srcSize_wrong);
/* Frame Header */
{ size_t const frameHeaderSize = ZSTD_frameHeaderSize(ip, ZSTD_FRAMEHEADERSIZE_PREFIX);
if (ZSTD_isError(frameHeaderSize)) return frameHeaderSize;
- if (remainingSrcSize < frameHeaderSize+ZSTD_blockHeaderSize)
- return ERROR(srcSize_wrong);
- CHECK_F( ZSTD_decodeFrameHeader(dctx, ip, frameHeaderSize) );
+ RETURN_ERROR_IF(remainingSrcSize < frameHeaderSize+ZSTD_blockHeaderSize,
+ srcSize_wrong);
+ FORWARD_IF_ERROR( ZSTD_decodeFrameHeader(dctx, ip, frameHeaderSize) );
ip += frameHeaderSize; remainingSrcSize -= frameHeaderSize;
}
@@ -581,7 +650,7 @@
ip += ZSTD_blockHeaderSize;
remainingSrcSize -= ZSTD_blockHeaderSize;
- if (cBlockSize > remainingSrcSize) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(cBlockSize > remainingSrcSize, srcSize_wrong);
switch(blockProperties.blockType)
{
@@ -596,7 +665,7 @@
break;
case bt_reserved :
default:
- return ERROR(corruption_detected);
+ RETURN_ERROR(corruption_detected);
}
if (ZSTD_isError(decodedSize)) return decodedSize;
@@ -609,15 +678,15 @@
}
if (dctx->fParams.frameContentSize != ZSTD_CONTENTSIZE_UNKNOWN) {
- if ((U64)(op-ostart) != dctx->fParams.frameContentSize) {
- return ERROR(corruption_detected);
- } }
+ RETURN_ERROR_IF((U64)(op-ostart) != dctx->fParams.frameContentSize,
+ corruption_detected);
+ }
if (dctx->fParams.checksumFlag) { /* Frame content checksum verification */
U32 const checkCalc = (U32)XXH64_digest(&dctx->xxhState);
U32 checkRead;
- if (remainingSrcSize<4) return ERROR(checksum_wrong);
+ RETURN_ERROR_IF(remainingSrcSize<4, checksum_wrong);
checkRead = MEM_readLE32(ip);
- if (checkRead != checkCalc) return ERROR(checksum_wrong);
+ RETURN_ERROR_IF(checkRead != checkCalc, checksum_wrong);
ip += 4;
remainingSrcSize -= 4;
}
@@ -652,8 +721,8 @@
size_t decodedSize;
size_t const frameSize = ZSTD_findFrameCompressedSizeLegacy(src, srcSize);
if (ZSTD_isError(frameSize)) return frameSize;
- /* legacy support is not compatible with static dctx */
- if (dctx->staticSize) return ERROR(memory_allocation);
+ RETURN_ERROR_IF(dctx->staticSize, memory_allocation,
+ "legacy support is not compatible with static dctx");
decodedSize = ZSTD_decompressLegacy(dst, dstCapacity, src, frameSize, dict, dictSize);
if (ZSTD_isError(decodedSize)) return decodedSize;
@@ -674,9 +743,8 @@
(unsigned)magicNumber, ZSTD_MAGICNUMBER);
if ((magicNumber & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) {
size_t const skippableSize = readSkippableFrameSize(src, srcSize);
- if (ZSTD_isError(skippableSize))
- return skippableSize;
- if (srcSize < skippableSize) return ERROR(srcSize_wrong);
+ FORWARD_IF_ERROR(skippableSize);
+ assert(skippableSize <= srcSize);
src = (const BYTE *)src + skippableSize;
srcSize -= skippableSize;
@@ -685,29 +753,29 @@
if (ddict) {
/* we were called from ZSTD_decompress_usingDDict */
- CHECK_F(ZSTD_decompressBegin_usingDDict(dctx, ddict));
+ FORWARD_IF_ERROR(ZSTD_decompressBegin_usingDDict(dctx, ddict));
} else {
/* this will initialize correctly with no dict if dict == NULL, so
* use this in all cases but ddict */
- CHECK_F(ZSTD_decompressBegin_usingDict(dctx, dict, dictSize));
+ FORWARD_IF_ERROR(ZSTD_decompressBegin_usingDict(dctx, dict, dictSize));
}
ZSTD_checkContinuity(dctx, dst);
{ const size_t res = ZSTD_decompressFrame(dctx, dst, dstCapacity,
&src, &srcSize);
- if ( (ZSTD_getErrorCode(res) == ZSTD_error_prefix_unknown)
- && (moreThan1Frame==1) ) {
- /* at least one frame successfully completed,
- * but following bytes are garbage :
- * it's more likely to be a srcSize error,
- * specifying more bytes than compressed size of frame(s).
- * This error message replaces ERROR(prefix_unknown),
- * which would be confusing, as the first header is actually correct.
- * Note that one could be unlucky, it might be a corruption error instead,
- * happening right at the place where we expect zstd magic bytes.
- * But this is _much_ less likely than a srcSize field error. */
- return ERROR(srcSize_wrong);
- }
+ RETURN_ERROR_IF(
+ (ZSTD_getErrorCode(res) == ZSTD_error_prefix_unknown)
+ && (moreThan1Frame==1),
+ srcSize_wrong,
+ "at least one frame successfully completed, but following "
+ "bytes are garbage: it's more likely to be a srcSize error, "
+ "specifying more bytes than compressed size of frame(s). This "
+ "error message replaces ERROR(prefix_unknown), which would be "
+ "confusing, as the first header is actually correct. Note that "
+ "one could be unlucky, it might be a corruption error instead, "
+ "happening right at the place where we expect zstd magic "
+ "bytes. But this is _much_ less likely than a srcSize field "
+ "error.");
if (ZSTD_isError(res)) return res;
assert(res <= dstCapacity);
dst = (BYTE*)dst + res;
@@ -716,7 +784,7 @@
moreThan1Frame = 1;
} /* while (srcSize >= ZSTD_frameHeaderSize_prefix) */
- if (srcSize) return ERROR(srcSize_wrong); /* input not entirely consumed */
+ RETURN_ERROR_IF(srcSize, srcSize_wrong, "input not entirely consumed");
return (BYTE*)dst - (BYTE*)dststart;
}
@@ -730,9 +798,26 @@
}
+static ZSTD_DDict const* ZSTD_getDDict(ZSTD_DCtx* dctx)
+{
+ switch (dctx->dictUses) {
+ default:
+ assert(0 /* Impossible */);
+ /* fall-through */
+ case ZSTD_dont_use:
+ ZSTD_clearDict(dctx);
+ return NULL;
+ case ZSTD_use_indefinitely:
+ return dctx->ddict;
+ case ZSTD_use_once:
+ dctx->dictUses = ZSTD_dont_use;
+ return dctx->ddict;
+ }
+}
+
size_t ZSTD_decompressDCtx(ZSTD_DCtx* dctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
{
- return ZSTD_decompress_usingDict(dctx, dst, dstCapacity, src, srcSize, NULL, 0);
+ return ZSTD_decompress_usingDDict(dctx, dst, dstCapacity, src, srcSize, ZSTD_getDDict(dctx));
}
@@ -741,7 +826,7 @@
#if defined(ZSTD_HEAPMODE) && (ZSTD_HEAPMODE>=1)
size_t regenSize;
ZSTD_DCtx* const dctx = ZSTD_createDCtx();
- if (dctx==NULL) return ERROR(memory_allocation);
+ RETURN_ERROR_IF(dctx==NULL, memory_allocation);
regenSize = ZSTD_decompressDCtx(dctx, dst, dstCapacity, src, srcSize);
ZSTD_freeDCtx(dctx);
return regenSize;
@@ -791,8 +876,7 @@
{
DEBUGLOG(5, "ZSTD_decompressContinue (srcSize:%u)", (unsigned)srcSize);
/* Sanity check */
- if (srcSize != dctx->expected)
- return ERROR(srcSize_wrong); /* not allowed */
+ RETURN_ERROR_IF(srcSize != dctx->expected, srcSize_wrong, "not allowed");
if (dstCapacity) ZSTD_checkContinuity(dctx, dst);
switch (dctx->stage)
@@ -817,7 +901,7 @@
case ZSTDds_decodeFrameHeader:
assert(src != NULL);
memcpy(dctx->headerBuffer + (dctx->headerSize - srcSize), src, srcSize);
- CHECK_F(ZSTD_decodeFrameHeader(dctx, dctx->headerBuffer, dctx->headerSize));
+ FORWARD_IF_ERROR(ZSTD_decodeFrameHeader(dctx, dctx->headerBuffer, dctx->headerSize));
dctx->expected = ZSTD_blockHeaderSize;
dctx->stage = ZSTDds_decodeBlockHeader;
return 0;
@@ -826,6 +910,7 @@
{ blockProperties_t bp;
size_t const cBlockSize = ZSTD_getcBlockSize(src, ZSTD_blockHeaderSize, &bp);
if (ZSTD_isError(cBlockSize)) return cBlockSize;
+ RETURN_ERROR_IF(cBlockSize > dctx->fParams.blockSizeMax, corruption_detected, "Block Size Exceeds Maximum");
dctx->expected = cBlockSize;
dctx->bType = bp.blockType;
dctx->rleSize = bp.origSize;
@@ -867,19 +952,20 @@
break;
case bt_reserved : /* should never happen */
default:
- return ERROR(corruption_detected);
+ RETURN_ERROR(corruption_detected);
}
if (ZSTD_isError(rSize)) return rSize;
+ RETURN_ERROR_IF(rSize > dctx->fParams.blockSizeMax, corruption_detected, "Decompressed Block Size Exceeds Maximum");
DEBUGLOG(5, "ZSTD_decompressContinue: decoded size from block : %u", (unsigned)rSize);
dctx->decodedSize += rSize;
if (dctx->fParams.checksumFlag) XXH64_update(&dctx->xxhState, dst, rSize);
if (dctx->stage == ZSTDds_decompressLastBlock) { /* end of frame */
DEBUGLOG(4, "ZSTD_decompressContinue: decoded size from frame : %u", (unsigned)dctx->decodedSize);
- if (dctx->fParams.frameContentSize != ZSTD_CONTENTSIZE_UNKNOWN) {
- if (dctx->decodedSize != dctx->fParams.frameContentSize) {
- return ERROR(corruption_detected);
- } }
+ RETURN_ERROR_IF(
+ dctx->fParams.frameContentSize != ZSTD_CONTENTSIZE_UNKNOWN
+ && dctx->decodedSize != dctx->fParams.frameContentSize,
+ corruption_detected);
if (dctx->fParams.checksumFlag) { /* another round for frame checksum */
dctx->expected = 4;
dctx->stage = ZSTDds_checkChecksum;
@@ -900,7 +986,7 @@
{ U32 const h32 = (U32)XXH64_digest(&dctx->xxhState);
U32 const check32 = MEM_readLE32(src);
DEBUGLOG(4, "ZSTD_decompressContinue: checksum : calculated %08X :: %08X read", (unsigned)h32, (unsigned)check32);
- if (check32 != h32) return ERROR(checksum_wrong);
+ RETURN_ERROR_IF(check32 != h32, checksum_wrong);
dctx->expected = 0;
dctx->stage = ZSTDds_getFrameHeaderSize;
return 0;
@@ -921,7 +1007,7 @@
default:
assert(0); /* impossible */
- return ERROR(GENERIC); /* some compiler require default to do something */
+ RETURN_ERROR(GENERIC); /* some compilers require default to do something */
}
}
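
These stage checks bound each step of the buffer-less streaming API. For orientation, a minimal sketch of the loop a caller drives after ZSTD_decompressBegin*() (assumed caller-side code, not from the patch; real callers also size dst via ZSTD_decodingBufferSize_min()):

    #define ZSTD_STATIC_LINKING_ONLY  /* buffer-less API is experimental */
    #include <zstd.h>

    static size_t drain_frame(ZSTD_DCtx* dctx,
                              const unsigned char* ip, const unsigned char* iend,
                              unsigned char* op, unsigned char* oend)
    {
        size_t next = ZSTD_nextSrcSizeToDecompress(dctx);
        while (next != 0) {
            size_t produced;
            if (next > (size_t)(iend - ip)) return (size_t)-1;  /* truncated */
            produced = ZSTD_decompressContinue(dctx, op, (size_t)(oend - op),
                                               ip, next);
            if (ZSTD_isError(produced)) return produced;
            ip += next;
            op += produced;
            next = ZSTD_nextSrcSizeToDecompress(dctx);
        }
        return 0;
    }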
@@ -945,7 +1031,7 @@
const BYTE* dictPtr = (const BYTE*)dict;
const BYTE* const dictEnd = dictPtr + dictSize;
- if (dictSize <= 8) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(dictSize <= 8, dictionary_corrupted);
assert(MEM_readLE32(dict) == ZSTD_MAGIC_DICTIONARY); /* dict must be valid */
dictPtr += 8; /* skip header = magic + dictID */
@@ -964,16 +1050,16 @@
dictPtr, dictEnd - dictPtr,
workspace, workspaceSize);
#endif
- if (HUF_isError(hSize)) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(HUF_isError(hSize), dictionary_corrupted);
dictPtr += hSize;
}
{ short offcodeNCount[MaxOff+1];
unsigned offcodeMaxValue = MaxOff, offcodeLog;
size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr);
- if (FSE_isError(offcodeHeaderSize)) return ERROR(dictionary_corrupted);
- if (offcodeMaxValue > MaxOff) return ERROR(dictionary_corrupted);
- if (offcodeLog > OffFSELog) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(FSE_isError(offcodeHeaderSize), dictionary_corrupted);
+ RETURN_ERROR_IF(offcodeMaxValue > MaxOff, dictionary_corrupted);
+ RETURN_ERROR_IF(offcodeLog > OffFSELog, dictionary_corrupted);
ZSTD_buildFSETable( entropy->OFTable,
offcodeNCount, offcodeMaxValue,
OF_base, OF_bits,
@@ -984,9 +1070,9 @@
{ short matchlengthNCount[MaxML+1];
unsigned matchlengthMaxValue = MaxML, matchlengthLog;
size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd-dictPtr);
- if (FSE_isError(matchlengthHeaderSize)) return ERROR(dictionary_corrupted);
- if (matchlengthMaxValue > MaxML) return ERROR(dictionary_corrupted);
- if (matchlengthLog > MLFSELog) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(FSE_isError(matchlengthHeaderSize), dictionary_corrupted);
+ RETURN_ERROR_IF(matchlengthMaxValue > MaxML, dictionary_corrupted);
+ RETURN_ERROR_IF(matchlengthLog > MLFSELog, dictionary_corrupted);
ZSTD_buildFSETable( entropy->MLTable,
matchlengthNCount, matchlengthMaxValue,
ML_base, ML_bits,
@@ -997,9 +1083,9 @@
{ short litlengthNCount[MaxLL+1];
unsigned litlengthMaxValue = MaxLL, litlengthLog;
size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd-dictPtr);
- if (FSE_isError(litlengthHeaderSize)) return ERROR(dictionary_corrupted);
- if (litlengthMaxValue > MaxLL) return ERROR(dictionary_corrupted);
- if (litlengthLog > LLFSELog) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(FSE_isError(litlengthHeaderSize), dictionary_corrupted);
+ RETURN_ERROR_IF(litlengthMaxValue > MaxLL, dictionary_corrupted);
+ RETURN_ERROR_IF(litlengthLog > LLFSELog, dictionary_corrupted);
ZSTD_buildFSETable( entropy->LLTable,
litlengthNCount, litlengthMaxValue,
LL_base, LL_bits,
@@ -1007,12 +1093,13 @@
dictPtr += litlengthHeaderSize;
}
- if (dictPtr+12 > dictEnd) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(dictPtr+12 > dictEnd, dictionary_corrupted);
{ int i;
size_t const dictContentSize = (size_t)(dictEnd - (dictPtr+12));
for (i=0; i<3; i++) {
U32 const rep = MEM_readLE32(dictPtr); dictPtr += 4;
- if (rep==0 || rep >= dictContentSize) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(rep==0 || rep >= dictContentSize,
+ dictionary_corrupted);
entropy->rep[i] = rep;
} }
@@ -1030,7 +1117,7 @@
/* load entropy tables */
{ size_t const eSize = ZSTD_loadDEntropy(&dctx->entropy, dict, dictSize);
- if (ZSTD_isError(eSize)) return ERROR(dictionary_corrupted);
+ RETURN_ERROR_IF(ZSTD_isError(eSize), dictionary_corrupted);
dict = (const char*)dict + eSize;
dictSize -= eSize;
}
@@ -1064,9 +1151,11 @@
size_t ZSTD_decompressBegin_usingDict(ZSTD_DCtx* dctx, const void* dict, size_t dictSize)
{
- CHECK_F( ZSTD_decompressBegin(dctx) );
+ FORWARD_IF_ERROR( ZSTD_decompressBegin(dctx) );
if (dict && dictSize)
- CHECK_E(ZSTD_decompress_insertDictionary(dctx, dict, dictSize), dictionary_corrupted);
+ RETURN_ERROR_IF(
+ ZSTD_isError(ZSTD_decompress_insertDictionary(dctx, dict, dictSize)),
+ dictionary_corrupted);
return 0;
}
@@ -1085,7 +1174,7 @@
DEBUGLOG(4, "DDict is %s",
dctx->ddictIsCold ? "~cold~" : "hot!");
}
- CHECK_F( ZSTD_decompressBegin(dctx) );
+ FORWARD_IF_ERROR( ZSTD_decompressBegin(dctx) );
if (ddict) { /* NULL ddict is equivalent to no dictionary */
ZSTD_copyDDictParameters(dctx, ddict);
}
@@ -1104,7 +1193,7 @@
}
/*! ZSTD_getDictID_fromFrame() :
- * Provides the dictID required to decompresse frame stored within `src`.
+ * Provides the dictID required to decompress the frame stored within `src`.
* If @return == 0, the dictID could not be decoded.
* This could be for one of the following reasons :
* - The frame does not require a dictionary (most common case).
@@ -1176,15 +1265,14 @@
ZSTD_dictLoadMethod_e dictLoadMethod,
ZSTD_dictContentType_e dictContentType)
{
- if (dctx->streamStage != zdss_init) return ERROR(stage_wrong);
- ZSTD_freeDDict(dctx->ddictLocal);
+ RETURN_ERROR_IF(dctx->streamStage != zdss_init, stage_wrong);
+ ZSTD_clearDict(dctx);
if (dict && dictSize >= 8) {
dctx->ddictLocal = ZSTD_createDDict_advanced(dict, dictSize, dictLoadMethod, dictContentType, dctx->customMem);
- if (dctx->ddictLocal == NULL) return ERROR(memory_allocation);
- } else {
- dctx->ddictLocal = NULL;
+ RETURN_ERROR_IF(dctx->ddictLocal == NULL, memory_allocation);
+ dctx->ddict = dctx->ddictLocal;
+ dctx->dictUses = ZSTD_use_indefinitely;
}
- dctx->ddict = dctx->ddictLocal;
return 0;
}
@@ -1200,7 +1288,9 @@
size_t ZSTD_DCtx_refPrefix_advanced(ZSTD_DCtx* dctx, const void* prefix, size_t prefixSize, ZSTD_dictContentType_e dictContentType)
{
- return ZSTD_DCtx_loadDictionary_advanced(dctx, prefix, prefixSize, ZSTD_dlm_byRef, dictContentType);
+ FORWARD_IF_ERROR(ZSTD_DCtx_loadDictionary_advanced(dctx, prefix, prefixSize, ZSTD_dlm_byRef, dictContentType));
+ dctx->dictUses = ZSTD_use_once;
+ return 0;
}
size_t ZSTD_DCtx_refPrefix(ZSTD_DCtx* dctx, const void* prefix, size_t prefixSize)
@@ -1215,9 +1305,8 @@
size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize)
{
DEBUGLOG(4, "ZSTD_initDStream_usingDict");
- zds->streamStage = zdss_init;
- zds->noForwardProgress = 0;
- CHECK_F( ZSTD_DCtx_loadDictionary(zds, dict, dictSize) );
+ FORWARD_IF_ERROR( ZSTD_DCtx_reset(zds, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_DCtx_loadDictionary(zds, dict, dictSize) );
return ZSTD_FRAMEHEADERSIZE_PREFIX;
}
@@ -1225,7 +1314,7 @@
size_t ZSTD_initDStream(ZSTD_DStream* zds)
{
DEBUGLOG(4, "ZSTD_initDStream");
- return ZSTD_initDStream_usingDict(zds, NULL, 0);
+ return ZSTD_initDStream_usingDDict(zds, NULL);
}
/* ZSTD_initDStream_usingDDict() :
@@ -1233,9 +1322,9 @@
* this function cannot fail */
size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* dctx, const ZSTD_DDict* ddict)
{
- size_t const initResult = ZSTD_initDStream(dctx);
- dctx->ddict = ddict;
- return initResult;
+ FORWARD_IF_ERROR( ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only) );
+ FORWARD_IF_ERROR( ZSTD_DCtx_refDDict(dctx, ddict) );
+ return ZSTD_FRAMEHEADERSIZE_PREFIX;
}
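
The legacy init helpers now delegate to the advanced API, as above. A sketch of the equivalent new-style initialization (the per-call loop over ZSTD_decompressStream() is unchanged):

    #include <zstd.h>

    static size_t stream_init(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict)
    {
        size_t const err = ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only);
        if (ZSTD_isError(err)) return err;
        return ZSTD_DCtx_refDDict(dctx, ddict);  /* NULL simply clears the dict */
    }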
/* ZSTD_resetDStream() :
@@ -1243,19 +1332,19 @@
* this function cannot fail */
size_t ZSTD_resetDStream(ZSTD_DStream* dctx)
{
- DEBUGLOG(4, "ZSTD_resetDStream");
- dctx->streamStage = zdss_loadHeader;
- dctx->lhSize = dctx->inPos = dctx->outStart = dctx->outEnd = 0;
- dctx->legacyVersion = 0;
- dctx->hostageByte = 0;
+ FORWARD_IF_ERROR(ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only));
return ZSTD_FRAMEHEADERSIZE_PREFIX;
}
size_t ZSTD_DCtx_refDDict(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict)
{
- if (dctx->streamStage != zdss_init) return ERROR(stage_wrong);
- dctx->ddict = ddict;
+ RETURN_ERROR_IF(dctx->streamStage != zdss_init, stage_wrong);
+ ZSTD_clearDict(dctx);
+ if (ddict) {
+ dctx->ddict = ddict;
+ dctx->dictUses = ZSTD_use_indefinitely;
+ }
return 0;
}
@@ -1267,9 +1356,9 @@
ZSTD_bounds const bounds = ZSTD_dParam_getBounds(ZSTD_d_windowLogMax);
size_t const min = (size_t)1 << bounds.lowerBound;
size_t const max = (size_t)1 << bounds.upperBound;
- if (dctx->streamStage != zdss_init) return ERROR(stage_wrong);
- if (maxWindowSize < min) return ERROR(parameter_outOfBound);
- if (maxWindowSize > max) return ERROR(parameter_outOfBound);
+ RETURN_ERROR_IF(dctx->streamStage != zdss_init, stage_wrong);
+ RETURN_ERROR_IF(maxWindowSize < min, parameter_outOfBound);
+ RETURN_ERROR_IF(maxWindowSize > max, parameter_outOfBound);
dctx->maxWindowSize = maxWindowSize;
return 0;
}
@@ -1311,15 +1400,15 @@
}
#define CHECK_DBOUNDS(p,v) { \
- if (!ZSTD_dParam_withinBounds(p, v)) \
- return ERROR(parameter_outOfBound); \
+ RETURN_ERROR_IF(!ZSTD_dParam_withinBounds(p, v), parameter_outOfBound); \
}
size_t ZSTD_DCtx_setParameter(ZSTD_DCtx* dctx, ZSTD_dParameter dParam, int value)
{
- if (dctx->streamStage != zdss_init) return ERROR(stage_wrong);
+ RETURN_ERROR_IF(dctx->streamStage != zdss_init, stage_wrong);
switch(dParam) {
case ZSTD_d_windowLogMax:
+ if (value == 0) value = ZSTD_WINDOWLOG_LIMIT_DEFAULT;
CHECK_DBOUNDS(ZSTD_d_windowLogMax, value);
dctx->maxWindowSize = ((size_t)1) << value;
return 0;
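
With the hunk above, passing 0 now selects the built-in default limit instead of tripping the bounds check. A small usage sketch (ZSTD_WINDOWLOG_LIMIT_DEFAULT is 27 in this zstd release):

    #include <zstd.h>

    static void configure_window_limit(ZSTD_DCtx* dctx)
    {
        /* 0 == "use the default limit" after this patch: */
        ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 0);
        /* or opt in to very large windows explicitly: */
        ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 31);
    }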
@@ -1329,19 +1418,20 @@
return 0;
default:;
}
- return ERROR(parameter_unsupported);
+ RETURN_ERROR(parameter_unsupported);
}
size_t ZSTD_DCtx_reset(ZSTD_DCtx* dctx, ZSTD_ResetDirective reset)
{
if ( (reset == ZSTD_reset_session_only)
|| (reset == ZSTD_reset_session_and_parameters) ) {
- (void)ZSTD_initDStream(dctx);
+ dctx->streamStage = zdss_init;
+ dctx->noForwardProgress = 0;
}
if ( (reset == ZSTD_reset_parameters)
|| (reset == ZSTD_reset_session_and_parameters) ) {
- if (dctx->streamStage != zdss_init)
- return ERROR(stage_wrong);
+ RETURN_ERROR_IF(dctx->streamStage != zdss_init, stage_wrong);
+ ZSTD_clearDict(dctx);
dctx->format = ZSTD_f_zstd1;
dctx->maxWindowSize = ZSTD_MAXWINDOWSIZE_DEFAULT;
}
@@ -1360,7 +1450,8 @@
unsigned long long const neededRBSize = windowSize + blockSize + (WILDCOPY_OVERLENGTH * 2);
unsigned long long const neededSize = MIN(frameContentSize, neededRBSize);
size_t const minRBSize = (size_t) neededSize;
- if ((unsigned long long)minRBSize != neededSize) return ERROR(frameParameter_windowTooLarge);
+ RETURN_ERROR_IF((unsigned long long)minRBSize != neededSize,
+ frameParameter_windowTooLarge);
return minRBSize;
}
@@ -1378,9 +1469,9 @@
ZSTD_frameHeader zfh;
size_t const err = ZSTD_getFrameHeader(&zfh, src, srcSize);
if (ZSTD_isError(err)) return err;
- if (err>0) return ERROR(srcSize_wrong);
- if (zfh.windowSize > windowSizeMax)
- return ERROR(frameParameter_windowTooLarge);
+ RETURN_ERROR_IF(err>0, srcSize_wrong);
+ RETURN_ERROR_IF(zfh.windowSize > windowSizeMax,
+ frameParameter_windowTooLarge);
return ZSTD_estimateDStreamSize((size_t)zfh.windowSize);
}
@@ -1406,16 +1497,16 @@
U32 someMoreWork = 1;
DEBUGLOG(5, "ZSTD_decompressStream");
- if (input->pos > input->size) { /* forbidden */
- DEBUGLOG(5, "in: pos: %u vs size: %u",
- (U32)input->pos, (U32)input->size);
- return ERROR(srcSize_wrong);
- }
- if (output->pos > output->size) { /* forbidden */
- DEBUGLOG(5, "out: pos: %u vs size: %u",
- (U32)output->pos, (U32)output->size);
- return ERROR(dstSize_tooSmall);
- }
+ RETURN_ERROR_IF(
+ input->pos > input->size,
+ srcSize_wrong,
+ "forbidden. in: pos: %u vs size: %u",
+ (U32)input->pos, (U32)input->size);
+ RETURN_ERROR_IF(
+ output->pos > output->size,
+ dstSize_tooSmall,
+ "forbidden. out: pos: %u vs size: %u",
+ (U32)output->pos, (U32)output->size);
DEBUGLOG(5, "input size : %u", (U32)(input->size - input->pos));
while (someMoreWork) {
@@ -1423,15 +1514,18 @@
{
case zdss_init :
DEBUGLOG(5, "stage zdss_init => transparent reset ");
- ZSTD_resetDStream(zds); /* transparent reset on starting decoding a new frame */
+ zds->streamStage = zdss_loadHeader;
+ zds->lhSize = zds->inPos = zds->outStart = zds->outEnd = 0;
+ zds->legacyVersion = 0;
+ zds->hostageByte = 0;
/* fall-through */
case zdss_loadHeader :
DEBUGLOG(5, "stage zdss_loadHeader (srcSize : %u)", (U32)(iend - ip));
#if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT>=1)
if (zds->legacyVersion) {
- /* legacy support is incompatible with static dctx */
- if (zds->staticSize) return ERROR(memory_allocation);
+ RETURN_ERROR_IF(zds->staticSize, memory_allocation,
+ "legacy support is incompatible with static dctx");
{ size_t const hint = ZSTD_decompressLegacyStream(zds->legacyContext, zds->legacyVersion, output, input);
if (hint==0) zds->streamStage = zdss_init;
return hint;
@@ -1443,12 +1537,13 @@
#if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT>=1)
U32 const legacyVersion = ZSTD_isLegacy(istart, iend-istart);
if (legacyVersion) {
- const void* const dict = zds->ddict ? ZSTD_DDict_dictContent(zds->ddict) : NULL;
- size_t const dictSize = zds->ddict ? ZSTD_DDict_dictSize(zds->ddict) : 0;
+ ZSTD_DDict const* const ddict = ZSTD_getDDict(zds);
+ const void* const dict = ddict ? ZSTD_DDict_dictContent(ddict) : NULL;
+ size_t const dictSize = ddict ? ZSTD_DDict_dictSize(ddict) : 0;
DEBUGLOG(5, "ZSTD_decompressStream: detected legacy version v0.%u", legacyVersion);
- /* legacy support is incompatible with static dctx */
- if (zds->staticSize) return ERROR(memory_allocation);
- CHECK_F(ZSTD_initLegacyStream(&zds->legacyContext,
+ RETURN_ERROR_IF(zds->staticSize, memory_allocation,
+ "legacy support is incompatible with static dctx");
+ FORWARD_IF_ERROR(ZSTD_initLegacyStream(&zds->legacyContext,
zds->previousLegacyVersion, legacyVersion,
dict, dictSize));
zds->legacyVersion = zds->previousLegacyVersion = legacyVersion;
@@ -1482,7 +1577,7 @@
size_t const cSize = ZSTD_findFrameCompressedSize(istart, iend-istart);
if (cSize <= (size_t)(iend-istart)) {
/* shortcut : using single-pass mode */
- size_t const decompressedSize = ZSTD_decompress_usingDDict(zds, op, oend-op, istart, cSize, zds->ddict);
+ size_t const decompressedSize = ZSTD_decompress_usingDDict(zds, op, oend-op, istart, cSize, ZSTD_getDDict(zds));
if (ZSTD_isError(decompressedSize)) return decompressedSize;
DEBUGLOG(4, "shortcut to single-pass ZSTD_decompress_usingDDict()")
ip = istart + cSize;
@@ -1495,13 +1590,13 @@
/* Consume header (see ZSTDds_decodeFrameHeader) */
DEBUGLOG(4, "Consume header");
- CHECK_F(ZSTD_decompressBegin_usingDDict(zds, zds->ddict));
+ FORWARD_IF_ERROR(ZSTD_decompressBegin_usingDDict(zds, ZSTD_getDDict(zds)));
if ((MEM_readLE32(zds->headerBuffer) & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) { /* skippable frame */
zds->expected = MEM_readLE32(zds->headerBuffer + ZSTD_FRAMEIDSIZE);
zds->stage = ZSTDds_skipFrame;
} else {
- CHECK_F(ZSTD_decodeFrameHeader(zds, zds->headerBuffer, zds->lhSize));
+ FORWARD_IF_ERROR(ZSTD_decodeFrameHeader(zds, zds->headerBuffer, zds->lhSize));
zds->expected = ZSTD_blockHeaderSize;
zds->stage = ZSTDds_decodeBlockHeader;
}
@@ -1511,7 +1606,8 @@
(U32)(zds->fParams.windowSize >>10),
(U32)(zds->maxWindowSize >> 10) );
zds->fParams.windowSize = MAX(zds->fParams.windowSize, 1U << ZSTD_WINDOWLOG_ABSOLUTEMIN);
- if (zds->fParams.windowSize > zds->maxWindowSize) return ERROR(frameParameter_windowTooLarge);
+ RETURN_ERROR_IF(zds->fParams.windowSize > zds->maxWindowSize,
+ frameParameter_windowTooLarge);
/* Adapt buffer sizes to frame header instructions */
{ size_t const neededInBuffSize = MAX(zds->fParams.blockSizeMax, 4 /* frame checksum */);
@@ -1525,14 +1621,15 @@
if (zds->staticSize) { /* static DCtx */
DEBUGLOG(4, "staticSize : %u", (U32)zds->staticSize);
assert(zds->staticSize >= sizeof(ZSTD_DCtx)); /* controlled at init */
- if (bufferSize > zds->staticSize - sizeof(ZSTD_DCtx))
- return ERROR(memory_allocation);
+ RETURN_ERROR_IF(
+ bufferSize > zds->staticSize - sizeof(ZSTD_DCtx),
+ memory_allocation);
} else {
ZSTD_free(zds->inBuff, zds->customMem);
zds->inBuffSize = 0;
zds->outBuffSize = 0;
zds->inBuff = (char*)ZSTD_malloc(bufferSize, zds->customMem);
- if (zds->inBuff == NULL) return ERROR(memory_allocation);
+ RETURN_ERROR_IF(zds->inBuff == NULL, memory_allocation);
}
zds->inBuffSize = neededInBuffSize;
zds->outBuff = zds->inBuff + zds->inBuffSize;
@@ -1574,7 +1671,9 @@
if (isSkipFrame) {
loadedSize = MIN(toLoad, (size_t)(iend-ip));
} else {
- if (toLoad > zds->inBuffSize - zds->inPos) return ERROR(corruption_detected); /* should never happen */
+ RETURN_ERROR_IF(toLoad > zds->inBuffSize - zds->inPos,
+ corruption_detected,
+ "should never happen");
loadedSize = ZSTD_limitCopy(zds->inBuff + zds->inPos, toLoad, ip, iend-ip);
}
ip += loadedSize;
@@ -1615,7 +1714,7 @@
default:
assert(0); /* impossible */
- return ERROR(GENERIC); /* some compiler require default to do something */
+ RETURN_ERROR(GENERIC); /* some compilers require default to do something */
} }
/* result */
@@ -1624,8 +1723,8 @@
if ((ip==istart) && (op==ostart)) { /* no forward progress */
zds->noForwardProgress ++;
if (zds->noForwardProgress >= ZSTD_NO_FORWARD_PROGRESS_MAX) {
- if (op==oend) return ERROR(dstSize_tooSmall);
- if (ip==iend) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(op==oend, dstSize_tooSmall);
+ RETURN_ERROR_IF(ip==iend, srcSize_wrong);
assert(0);
}
} else {
--- a/contrib/python-zstandard/zstd/decompress/zstd_decompress_block.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/decompress/zstd_decompress_block.c Sun Sep 15 20:04:00 2019 -0700
@@ -56,14 +56,15 @@
size_t ZSTD_getcBlockSize(const void* src, size_t srcSize,
blockProperties_t* bpPtr)
{
- if (srcSize < ZSTD_blockHeaderSize) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(srcSize < ZSTD_blockHeaderSize, srcSize_wrong);
+
{ U32 const cBlockHeader = MEM_readLE24(src);
U32 const cSize = cBlockHeader >> 3;
bpPtr->lastBlock = cBlockHeader & 1;
bpPtr->blockType = (blockType_e)((cBlockHeader >> 1) & 3);
bpPtr->origSize = cSize; /* only useful for RLE */
if (bpPtr->blockType == bt_rle) return 1;
- if (bpPtr->blockType == bt_reserved) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(bpPtr->blockType == bt_reserved, corruption_detected);
return cSize;
}
}
@@ -78,7 +79,8 @@
size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
const void* src, size_t srcSize) /* note : srcSize < BLOCKSIZE */
{
- if (srcSize < MIN_CBLOCK_SIZE) return ERROR(corruption_detected);
+ DEBUGLOG(5, "ZSTD_decodeLiteralsBlock");
+ RETURN_ERROR_IF(srcSize < MIN_CBLOCK_SIZE, corruption_detected);
{ const BYTE* const istart = (const BYTE*) src;
symbolEncodingType_e const litEncType = (symbolEncodingType_e)(istart[0] & 3);
@@ -86,11 +88,12 @@
switch(litEncType)
{
case set_repeat:
- if (dctx->litEntropy==0) return ERROR(dictionary_corrupted);
+ DEBUGLOG(5, "set_repeat flag : re-using stats from previous compressed literals block");
+ RETURN_ERROR_IF(dctx->litEntropy==0, dictionary_corrupted);
/* fall-through */
case set_compressed:
- if (srcSize < 5) return ERROR(corruption_detected); /* srcSize >= MIN_CBLOCK_SIZE == 3; here we need up to 5 for case 3 */
+ RETURN_ERROR_IF(srcSize < 5, corruption_detected, "srcSize >= MIN_CBLOCK_SIZE == 3; here we need up to 5 for case 3");
{ size_t lhSize, litSize, litCSize;
U32 singleStream=0;
U32 const lhlCode = (istart[0] >> 2) & 3;
@@ -115,11 +118,11 @@
/* 2 - 2 - 18 - 18 */
lhSize = 5;
litSize = (lhc >> 4) & 0x3FFFF;
- litCSize = (lhc >> 22) + (istart[4] << 10);
+ litCSize = (lhc >> 22) + ((size_t)istart[4] << 10);
break;
}
- if (litSize > ZSTD_BLOCKSIZE_MAX) return ERROR(corruption_detected);
- if (litCSize + lhSize > srcSize) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(litSize > ZSTD_BLOCKSIZE_MAX, corruption_detected);
+ RETURN_ERROR_IF(litCSize + lhSize > srcSize, corruption_detected);
/* prefetch huffman table if cold */
if (dctx->ddictIsCold && (litSize > 768 /* heuristic */)) {
@@ -157,7 +160,7 @@
}
}
- if (HUF_isError(hufSuccess)) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(HUF_isError(hufSuccess), corruption_detected);
dctx->litPtr = dctx->litBuffer;
dctx->litSize = litSize;
@@ -187,7 +190,7 @@
}
if (lhSize+litSize+WILDCOPY_OVERLENGTH > srcSize) { /* risk reading beyond src buffer with wildcopy */
- if (litSize+lhSize > srcSize) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(litSize+lhSize > srcSize, corruption_detected);
memcpy(dctx->litBuffer, istart+lhSize, litSize);
dctx->litPtr = dctx->litBuffer;
dctx->litSize = litSize;
@@ -216,17 +219,17 @@
case 3:
lhSize = 3;
litSize = MEM_readLE24(istart) >> 4;
- if (srcSize<4) return ERROR(corruption_detected); /* srcSize >= MIN_CBLOCK_SIZE == 3; here we need lhSize+1 = 4 */
+ RETURN_ERROR_IF(srcSize<4, corruption_detected, "srcSize >= MIN_CBLOCK_SIZE == 3; here we need lhSize+1 = 4");
break;
}
- if (litSize > ZSTD_BLOCKSIZE_MAX) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(litSize > ZSTD_BLOCKSIZE_MAX, corruption_detected);
memset(dctx->litBuffer, istart[lhSize], litSize + WILDCOPY_OVERLENGTH);
dctx->litPtr = dctx->litBuffer;
dctx->litSize = litSize;
return lhSize+1;
}
default:
- return ERROR(corruption_detected); /* impossible */
+ RETURN_ERROR(corruption_detected, "impossible");
}
}
}
@@ -390,7 +393,8 @@
symbolNext[s] = 1;
} else {
if (normalizedCounter[s] >= largeLimit) DTableH.fastMode=0;
- symbolNext[s] = normalizedCounter[s];
+ assert(normalizedCounter[s]>=0);
+ symbolNext[s] = (U16)normalizedCounter[s];
} } }
memcpy(dt, &DTableH, sizeof(DTableH));
}
@@ -436,8 +440,8 @@
switch(type)
{
case set_rle :
- if (!srcSize) return ERROR(srcSize_wrong);
- if ( (*(const BYTE*)src) > max) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(!srcSize, srcSize_wrong);
+ RETURN_ERROR_IF((*(const BYTE*)src) > max, corruption_detected);
{ U32 const symbol = *(const BYTE*)src;
U32 const baseline = baseValue[symbol];
U32 const nbBits = nbAdditionalBits[symbol];
@@ -449,7 +453,7 @@
*DTablePtr = defaultTable;
return 0;
case set_repeat:
- if (!flagRepeatTable) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(!flagRepeatTable, corruption_detected);
/* prefetch FSE table if used */
if (ddictIsCold && (nbSeq > 24 /* heuristic */)) {
const void* const pStart = *DTablePtr;
@@ -461,15 +465,15 @@
{ unsigned tableLog;
S16 norm[MaxSeq+1];
size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize);
- if (FSE_isError(headerSize)) return ERROR(corruption_detected);
- if (tableLog > maxLog) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(FSE_isError(headerSize), corruption_detected);
+ RETURN_ERROR_IF(tableLog > maxLog, corruption_detected);
ZSTD_buildFSETable(DTableSpace, norm, max, baseValue, nbAdditionalBits, tableLog);
*DTablePtr = DTableSpace;
return headerSize;
}
- default : /* impossible */
+ default :
assert(0);
- return ERROR(GENERIC);
+ RETURN_ERROR(GENERIC, "impossible");
}
}
@@ -483,28 +487,28 @@
DEBUGLOG(5, "ZSTD_decodeSeqHeaders");
/* check */
- if (srcSize < MIN_SEQUENCES_SIZE) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(srcSize < MIN_SEQUENCES_SIZE, srcSize_wrong);
/* SeqHead */
nbSeq = *ip++;
if (!nbSeq) {
*nbSeqPtr=0;
- if (srcSize != 1) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(srcSize != 1, srcSize_wrong);
return 1;
}
if (nbSeq > 0x7F) {
if (nbSeq == 0xFF) {
- if (ip+2 > iend) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(ip+2 > iend, srcSize_wrong);
nbSeq = MEM_readLE16(ip) + LONGNBSEQ, ip+=2;
} else {
- if (ip >= iend) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(ip >= iend, srcSize_wrong);
nbSeq = ((nbSeq-0x80)<<8) + *ip++;
}
}
*nbSeqPtr = nbSeq;
/* FSE table descriptors */
- if (ip+4 > iend) return ERROR(srcSize_wrong); /* minimum possible size */
+ RETURN_ERROR_IF(ip+1 > iend, srcSize_wrong); /* minimum possible size: 1 byte for symbol encoding types */
{ symbolEncodingType_e const LLtype = (symbolEncodingType_e)(*ip >> 6);
symbolEncodingType_e const OFtype = (symbolEncodingType_e)((*ip >> 4) & 3);
symbolEncodingType_e const MLtype = (symbolEncodingType_e)((*ip >> 2) & 3);
@@ -517,7 +521,7 @@
LL_base, LL_bits,
LL_defaultDTable, dctx->fseEntropy,
dctx->ddictIsCold, nbSeq);
- if (ZSTD_isError(llhSize)) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(ZSTD_isError(llhSize), corruption_detected);
ip += llhSize;
}
@@ -527,7 +531,7 @@
OF_base, OF_bits,
OF_defaultDTable, dctx->fseEntropy,
dctx->ddictIsCold, nbSeq);
- if (ZSTD_isError(ofhSize)) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(ZSTD_isError(ofhSize), corruption_detected);
ip += ofhSize;
}
@@ -537,7 +541,7 @@
ML_base, ML_bits,
ML_defaultDTable, dctx->fseEntropy,
dctx->ddictIsCold, nbSeq);
- if (ZSTD_isError(mlhSize)) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(ZSTD_isError(mlhSize), corruption_detected);
ip += mlhSize;
}
}
@@ -590,8 +594,8 @@
const BYTE* match = oLitEnd - sequence.offset;
/* check */
- if (oMatchEnd>oend) return ERROR(dstSize_tooSmall); /* last match must fit within dstBuffer */
- if (iLitEnd > litLimit) return ERROR(corruption_detected); /* try to read beyond literal buffer */
+ RETURN_ERROR_IF(oMatchEnd>oend, dstSize_tooSmall, "last match must fit within dstBuffer");
+ RETURN_ERROR_IF(iLitEnd > litLimit, corruption_detected, "try to read beyond literal buffer");
/* copy literals */
while (op < oLitEnd) *op++ = *(*litPtr)++;
@@ -599,7 +603,7 @@
/* copy Match */
if (sequence.offset > (size_t)(oLitEnd - base)) {
/* offset beyond prefix */
- if (sequence.offset > (size_t)(oLitEnd - vBase)) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - vBase),corruption_detected);
match = dictEnd - (base-match);
if (match + sequence.matchLength <= dictEnd) {
memmove(oLitEnd, match, sequence.matchLength);
@@ -631,22 +635,22 @@
const BYTE* match = oLitEnd - sequence.offset;
/* check */
- if (oMatchEnd>oend) return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */
- if (iLitEnd > litLimit) return ERROR(corruption_detected); /* over-read beyond lit buffer */
+ RETURN_ERROR_IF(oMatchEnd>oend, dstSize_tooSmall, "last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend");
+ RETURN_ERROR_IF(iLitEnd > litLimit, corruption_detected, "over-read beyond lit buffer");
if (oLitEnd>oend_w) return ZSTD_execSequenceLast7(op, oend, sequence, litPtr, litLimit, prefixStart, virtualStart, dictEnd);
/* copy Literals */
- ZSTD_copy8(op, *litPtr);
if (sequence.litLength > 8)
- ZSTD_wildcopy(op+8, (*litPtr)+8, sequence.litLength - 8); /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */
+ ZSTD_wildcopy_16min(op, (*litPtr), sequence.litLength, ZSTD_no_overlap); /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */
+ else
+ ZSTD_copy8(op, *litPtr);
op = oLitEnd;
*litPtr = iLitEnd; /* update for next sequence */
/* copy Match */
if (sequence.offset > (size_t)(oLitEnd - prefixStart)) {
/* offset beyond prefix -> go into extDict */
- if (sequence.offset > (size_t)(oLitEnd - virtualStart))
- return ERROR(corruption_detected);
+ RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - virtualStart), corruption_detected);
match = dictEnd + (match - prefixStart);
if (match + sequence.matchLength <= dictEnd) {
memmove(oLitEnd, match, sequence.matchLength);
@@ -686,13 +690,13 @@
if (oMatchEnd > oend-(16-MINMATCH)) {
if (op < oend_w) {
- ZSTD_wildcopy(op, match, oend_w - op);
+ ZSTD_wildcopy(op, match, oend_w - op, ZSTD_overlap_src_before_dst);
match += oend_w - op;
op = oend_w;
}
while (op < oMatchEnd) *op++ = *match++;
} else {
- ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength-8); /* works even if matchLength < 8 */
+ ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength-8, ZSTD_overlap_src_before_dst); /* works even if matchLength < 8 */
}
return sequenceLength;
}
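Two related changes run through these sequence-execution hunks. First, the literal copy now prefers ZSTD_wildcopy_16min() and falls back to ZSTD_copy8() only for short literals. Second, ZSTD_wildcopy() gained an overlap-mode argument so the non-overlapping path can use wider, vectorizable copies. A simplified sketch of the idea (not the vendored implementation, which uses overlapping 8-byte copies with offset fixups on the overlap path):

    #include <stddef.h>
    #include <string.h>

    typedef enum { ZSTD_no_overlap, ZSTD_overlap_src_before_dst } overlap_sketch_e;

    /* Copies `length` bytes, possibly writing a few bytes past the end;
     * callers guarantee the output margin. Sketch only. */
    static void wildcopy_sketch(void* dst, const void* src, ptrdiff_t length,
                                overlap_sketch_e ovtype)
    {
        unsigned char* op = (unsigned char*)dst;
        const unsigned char* ip = (const unsigned char*)src;
        if (ovtype == ZSTD_overlap_src_before_dst && (op - ip) < 16) {
            /* Close LZ77 overlap: bytes written earlier in this copy are
             * inputs to later iterations, so copy byte by byte. */
            while (length-- > 0) *op++ = *ip++;
        } else {
            /* No close overlap: 16-byte chunks, which compilers vectorize. */
            do {
                memcpy(op, ip, 16);
                op += 16; ip += 16; length -= 16;
            } while (length > 0);
        }
    }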
@@ -712,21 +716,23 @@
const BYTE* match = sequence.match;
/* check */
- if (oMatchEnd > oend) return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */
- if (iLitEnd > litLimit) return ERROR(corruption_detected); /* over-read beyond lit buffer */
+ RETURN_ERROR_IF(oMatchEnd > oend, dstSize_tooSmall, "last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend");
+ RETURN_ERROR_IF(iLitEnd > litLimit, corruption_detected, "over-read beyond lit buffer");
if (oLitEnd > oend_w) return ZSTD_execSequenceLast7(op, oend, sequence, litPtr, litLimit, prefixStart, dictStart, dictEnd);
/* copy Literals */
- ZSTD_copy8(op, *litPtr); /* note : op <= oLitEnd <= oend_w == oend - 8 */
if (sequence.litLength > 8)
- ZSTD_wildcopy(op+8, (*litPtr)+8, sequence.litLength - 8); /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */
+ ZSTD_wildcopy_16min(op, *litPtr, sequence.litLength, ZSTD_no_overlap); /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */
+ else
+ ZSTD_copy8(op, *litPtr); /* note : op <= oLitEnd <= oend_w == oend - 8 */
+
op = oLitEnd;
*litPtr = iLitEnd; /* update for next sequence */
/* copy Match */
if (sequence.offset > (size_t)(oLitEnd - prefixStart)) {
/* offset beyond prefix */
- if (sequence.offset > (size_t)(oLitEnd - dictStart)) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - dictStart), corruption_detected);
if (match + sequence.matchLength <= dictEnd) {
memmove(oLitEnd, match, sequence.matchLength);
return sequenceLength;
@@ -766,13 +772,13 @@
if (oMatchEnd > oend-(16-MINMATCH)) {
if (op < oend_w) {
- ZSTD_wildcopy(op, match, oend_w - op);
+ ZSTD_wildcopy(op, match, oend_w - op, ZSTD_overlap_src_before_dst);
match += oend_w - op;
op = oend_w;
}
while (op < oMatchEnd) *op++ = *match++;
} else {
- ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength-8); /* works even if matchLength < 8 */
+ ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength-8, ZSTD_overlap_src_before_dst); /* works even if matchLength < 8 */
}
return sequenceLength;
}
@@ -801,7 +807,7 @@
/* We need to add at most (ZSTD_WINDOWLOG_MAX_32 - 1) bits to read the maximum
* offset bits. But we can only read at most (STREAM_ACCUMULATOR_MIN_32 - 1)
* bits before reloading. This value is the maximum number of bytes we read
- * after reloading when we are decoding long offets.
+ * after reloading when we are decoding long offsets.
*/
#define LONG_OFFSETS_MAX_EXTRA_BITS_32 \
(ZSTD_WINDOWLOG_MAX_32 > STREAM_ACCUMULATOR_MIN_32 \
@@ -889,6 +895,7 @@
}
FORCE_INLINE_TEMPLATE size_t
+DONT_VECTORIZE
ZSTD_decompressSequences_body( ZSTD_DCtx* dctx,
void* dst, size_t maxDstSize,
const void* seqStart, size_t seqSize, int nbSeq,
@@ -911,11 +918,18 @@
seqState_t seqState;
dctx->fseEntropy = 1;
{ U32 i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
- CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
+ RETURN_ERROR_IF(
+ ERR_isError(BIT_initDStream(&seqState.DStream, ip, iend-ip)),
+ corruption_detected);
ZSTD_initFseState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
ZSTD_initFseState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
+ ZSTD_STATIC_ASSERT(
+ BIT_DStream_unfinished < BIT_DStream_completed &&
+ BIT_DStream_endOfBuffer < BIT_DStream_completed &&
+ BIT_DStream_completed < BIT_DStream_overflow);
+
for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && nbSeq ; ) {
nbSeq--;
{ seq_t const sequence = ZSTD_decodeSequence(&seqState, isLongOffset);
@@ -927,14 +941,15 @@
/* check if reached exact end */
DEBUGLOG(5, "ZSTD_decompressSequences_body: after decode loop, remaining nbSeq : %i", nbSeq);
- if (nbSeq) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(nbSeq, corruption_detected);
+ RETURN_ERROR_IF(BIT_reloadDStream(&seqState.DStream) < BIT_DStream_completed, corruption_detected);
/* save reps for next block */
{ U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); }
}
/* last literal segment */
{ size_t const lastLLSize = litEnd - litPtr;
- if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(lastLLSize > (size_t)(oend-op), dstSize_tooSmall);
memcpy(op, litPtr, lastLLSize);
op += lastLLSize;
}
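Both added checks in this hunk lean on the ordering of the bitstream status enum, which the new ZSTD_STATIC_ASSERT makes explicit. The decode loop runs while BIT_reloadDStream() reports at most BIT_DStream_completed; after the loop, a status strictly below BIT_DStream_completed means unconsumed bits remain, which the second RETURN_ERROR_IF now rejects as corruption. A sketch of the assumed ordering (concrete values are illustrative; only the relative order matters to the assert):

    typedef enum { BIT_DStream_unfinished  = 0,  /* more bits to consume */
                   BIT_DStream_endOfBuffer = 1,  /* hit end, bits remain */
                   BIT_DStream_completed   = 2,  /* fully consumed, bit-exact */
                   BIT_DStream_overflow    = 3   /* read past the buffer */
                 } BIT_DStream_status_sketch;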
@@ -1066,7 +1081,9 @@
seqState.pos = (size_t)(op-prefixStart);
seqState.dictEnd = dictEnd;
assert(iend >= ip);
- CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
+ RETURN_ERROR_IF(
+ ERR_isError(BIT_initDStream(&seqState.DStream, ip, iend-ip)),
+ corruption_detected);
ZSTD_initFseState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
ZSTD_initFseState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
@@ -1076,7 +1093,7 @@
sequences[seqNb] = ZSTD_decodeSequenceLong(&seqState, isLongOffset);
PREFETCH_L1(sequences[seqNb].match); PREFETCH_L1(sequences[seqNb].match + sequences[seqNb].matchLength - 1); /* note : it's safe to invoke PREFETCH() on any memory address, including invalid ones */
}
- if (seqNb<seqAdvance) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(seqNb<seqAdvance, corruption_detected);
/* decode and decompress */
for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && (seqNb<nbSeq) ; seqNb++) {
@@ -1087,7 +1104,7 @@
sequences[seqNb & STORED_SEQS_MASK] = sequence;
op += oneSeqSize;
}
- if (seqNb<nbSeq) return ERROR(corruption_detected);
+ RETURN_ERROR_IF(seqNb<nbSeq, corruption_detected);
/* finish queue */
seqNb -= seqAdvance;
@@ -1103,7 +1120,7 @@
/* last literal segment */
{ size_t const lastLLSize = litEnd - litPtr;
- if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
+ RETURN_ERROR_IF(lastLLSize > (size_t)(oend-op), dstSize_tooSmall);
memcpy(op, litPtr, lastLLSize);
op += lastLLSize;
}
@@ -1127,6 +1144,7 @@
#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
static TARGET_ATTRIBUTE("bmi2") size_t
+DONT_VECTORIZE
ZSTD_decompressSequences_bmi2(ZSTD_DCtx* dctx,
void* dst, size_t maxDstSize,
const void* seqStart, size_t seqSize, int nbSeq,
@@ -1176,7 +1194,7 @@
/* ZSTD_decompressSequencesLong() :
* decompression function triggered when a minimum share of offsets is considered "long",
* aka out of cache.
- * note : "long" definition seems overloaded here, sometimes meaning "wider than bitstream register", and sometimes mearning "farther than memory cache distance".
+ * note : "long" definition seems overloaded here, sometimes meaning "wider than bitstream register", and sometimes meaning "farther than memory cache distance".
* This function will try to mitigate main memory latency through the use of prefetching */
static size_t
ZSTD_decompressSequencesLong(ZSTD_DCtx* dctx,
@@ -1240,7 +1258,7 @@
ZSTD_longOffset_e const isLongOffset = (ZSTD_longOffset_e)(MEM_32bits() && (!frame || (dctx->fParams.windowSize > (1ULL << STREAM_ACCUMULATOR_MIN))));
DEBUGLOG(5, "ZSTD_decompressBlock_internal (size : %u)", (U32)srcSize);
- if (srcSize >= ZSTD_BLOCKSIZE_MAX) return ERROR(srcSize_wrong);
+ RETURN_ERROR_IF(srcSize >= ZSTD_BLOCKSIZE_MAX, srcSize_wrong);
/* Decode literals section */
{ size_t const litCSize = ZSTD_decodeLiteralsBlock(dctx, src, srcSize);
--- a/contrib/python-zstandard/zstd/decompress/zstd_decompress_internal.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/decompress/zstd_decompress_internal.h Sun Sep 15 20:04:00 2019 -0700
@@ -89,6 +89,12 @@
typedef enum { zdss_init=0, zdss_loadHeader,
zdss_read, zdss_load, zdss_flush } ZSTD_dStreamStage;
+typedef enum {
+ ZSTD_use_indefinitely = -1, /* Use the dictionary indefinitely */
+ ZSTD_dont_use = 0, /* Do not use the dictionary (if one exists, free it) */
+ ZSTD_use_once = 1 /* Use the dictionary once, then set to ZSTD_dont_use */
+} ZSTD_dictUses_e;
+
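The new ZSTD_dictUses_e field lets the decompression context remember whether a referenced dictionary applies to every following frame, to exactly one frame, or not at all. A hypothetical illustration of how a decoder can consume the field when a frame starts (the struct and helper below are invented for the sketch; only the enum values come from the patch):

    /* Hypothetical sketch only; field names mirror the vendored struct. */
    typedef struct {
        const void* ddict;          /* stand-in for const ZSTD_DDict* */
        ZSTD_dictUses_e dictUses;
    } dctx_sketch_t;

    static const void* sketch_getDDict(dctx_sketch_t* dctx)
    {
        switch (dctx->dictUses) {
        case ZSTD_use_indefinitely:
            return dctx->ddict;                 /* keep for all future frames */
        case ZSTD_use_once:
            dctx->dictUses = ZSTD_dont_use;     /* consume: next frame won't use it */
            return dctx->ddict;
        case ZSTD_dont_use:
        default:
            return NULL;                        /* decode without a dictionary */
        }
    }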
struct ZSTD_DCtx_s
{
const ZSTD_seqSymbol* LLTptr;
@@ -123,6 +129,7 @@
const ZSTD_DDict* ddict; /* set by ZSTD_initDStream_usingDDict(), or ZSTD_DCtx_refDDict() */
U32 dictID;
int ddictIsCold; /* if == 1 : dictionary is "new" for working context, and presumed "cold" (not in cpu cache) */
+ ZSTD_dictUses_e dictUses;
/* streaming */
ZSTD_dStreamStage streamStage;
--- a/contrib/python-zstandard/zstd/dictBuilder/cover.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/dictBuilder/cover.c Sun Sep 15 20:04:00 2019 -0700
@@ -391,7 +391,7 @@
*
* Score(S) = F(S_1) + F(S_2) + ... + F(S_{k-d+1})
*
- * Once the dmer d is in the dictionay we set F(d) = 0.
+ * Once the dmer d is in the dictionary we set F(d) = 0.
*/
static COVER_segment_t COVER_selectSegment(const COVER_ctx_t *ctx, U32 *freqs,
COVER_map_t *activeDmers, U32 begin,
@@ -435,7 +435,7 @@
U32 *delDmerOcc = COVER_map_at(activeDmers, delDmer);
activeSegment.begin += 1;
*delDmerOcc -= 1;
- /* If this is the last occurence of the dmer, subtract its score */
+ /* If this is the last occurrence of the dmer, subtract its score */
if (*delDmerOcc == 0) {
COVER_map_remove(activeDmers, delDmer);
activeSegment.score -= freqs[delDmer];
@@ -526,10 +526,10 @@
* Prepare a context for dictionary building.
 * The context is only dependent on the parameter `d` and can be used multiple
* times.
- * Returns 1 on success or zero on error.
+ * Returns 0 on success or error code on error.
* The context must be destroyed with `COVER_ctx_destroy()`.
*/
-static int COVER_ctx_init(COVER_ctx_t *ctx, const void *samplesBuffer,
+static size_t COVER_ctx_init(COVER_ctx_t *ctx, const void *samplesBuffer,
const size_t *samplesSizes, unsigned nbSamples,
unsigned d, double splitPoint) {
const BYTE *const samples = (const BYTE *)samplesBuffer;
@@ -544,17 +544,17 @@
totalSamplesSize >= (size_t)COVER_MAX_SAMPLES_SIZE) {
DISPLAYLEVEL(1, "Total samples size is too large (%u MB), maximum size is %u MB\n",
(unsigned)(totalSamplesSize>>20), (COVER_MAX_SAMPLES_SIZE >> 20));
- return 0;
+ return ERROR(srcSize_wrong);
}
/* Check if there are at least 5 training samples */
if (nbTrainSamples < 5) {
DISPLAYLEVEL(1, "Total number of training samples is %u and is invalid.", nbTrainSamples);
- return 0;
+ return ERROR(srcSize_wrong);
}
/* Check if there's testing sample */
if (nbTestSamples < 1) {
DISPLAYLEVEL(1, "Total number of testing samples is %u and is invalid.", nbTestSamples);
- return 0;
+ return ERROR(srcSize_wrong);
}
/* Zero the context */
memset(ctx, 0, sizeof(*ctx));
@@ -577,7 +577,7 @@
if (!ctx->suffix || !ctx->dmerAt || !ctx->offsets) {
DISPLAYLEVEL(1, "Failed to allocate scratch buffers\n");
COVER_ctx_destroy(ctx);
- return 0;
+ return ERROR(memory_allocation);
}
ctx->freqs = NULL;
ctx->d = d;
@@ -624,7 +624,40 @@
(ctx->d <= 8 ? &COVER_cmp8 : &COVER_cmp), &COVER_group);
ctx->freqs = ctx->suffix;
ctx->suffix = NULL;
- return 1;
+ return 0;
+}
+
+void COVER_warnOnSmallCorpus(size_t maxDictSize, size_t nbDmers, int displayLevel)
+{
+ const double ratio = (double)nbDmers / maxDictSize;
+ if (ratio >= 10) {
+ return;
+ }
+ LOCALDISPLAYLEVEL(displayLevel, 1,
+ "WARNING: The maximum dictionary size %u is too large "
+ "compared to the source size %u! "
+ "size(source)/size(dictionary) = %f, but it should be >= "
+ "10! This may lead to a subpar dictionary! We recommend "
+ "training on sources at least 10x, and up to 100x the "
+ "size of the dictionary!\n", (U32)maxDictSize,
+ (U32)nbDmers, ratio);
+}
+
+COVER_epoch_info_t COVER_computeEpochs(U32 maxDictSize,
+ U32 nbDmers, U32 k, U32 passes)
+{
+ const U32 minEpochSize = k * 10;
+ COVER_epoch_info_t epochs;
+ epochs.num = MAX(1, maxDictSize / k / passes);
+ epochs.size = nbDmers / epochs.num;
+ if (epochs.size >= minEpochSize) {
+ assert(epochs.size * epochs.num <= nbDmers);
+ return epochs;
+ }
+ epochs.size = MIN(minEpochSize, nbDmers);
+ epochs.num = nbDmers / epochs.size;
+ assert(epochs.size * epochs.num <= nbDmers);
+ return epochs;
}
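A quick worked example of the new epoch computation, with illustrative numbers:

    /* Illustrative numbers only. */
    COVER_epoch_info_t e = COVER_computeEpochs(112640,   /* 110 KB max dict */
                                               1000000,  /* nbDmers */
                                               1024,     /* k */
                                               4);       /* passes */
    /* e.num  == MAX(1, 112640 / 1024 / 4) == 27
     * e.size == 1000000 / 27 == 37037 dmers per epoch,
     * which is >= minEpochSize == 10 * k == 10240, so the fallback that
     * shrinks epochs.size (and recomputes epochs.num) is not taken. */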
/**
@@ -636,28 +669,34 @@
ZDICT_cover_params_t parameters) {
BYTE *const dict = (BYTE *)dictBuffer;
size_t tail = dictBufferCapacity;
- /* Divide the data up into epochs of equal size.
- * We will select at least one segment from each epoch.
- */
- const unsigned epochs = MAX(1, (U32)(dictBufferCapacity / parameters.k / 4));
- const unsigned epochSize = (U32)(ctx->suffixSize / epochs);
+ /* Divide the data into epochs. We will select one segment from each epoch. */
+ const COVER_epoch_info_t epochs = COVER_computeEpochs(
+ (U32)dictBufferCapacity, (U32)ctx->suffixSize, parameters.k, 4);
+ const size_t maxZeroScoreRun = MAX(10, MIN(100, epochs.num >> 3));
+ size_t zeroScoreRun = 0;
size_t epoch;
DISPLAYLEVEL(2, "Breaking content into %u epochs of size %u\n",
- epochs, epochSize);
+ (U32)epochs.num, (U32)epochs.size);
/* Loop through the epochs until there are no more segments or the dictionary
* is full.
*/
- for (epoch = 0; tail > 0; epoch = (epoch + 1) % epochs) {
- const U32 epochBegin = (U32)(epoch * epochSize);
- const U32 epochEnd = epochBegin + epochSize;
+ for (epoch = 0; tail > 0; epoch = (epoch + 1) % epochs.num) {
+ const U32 epochBegin = (U32)(epoch * epochs.size);
+ const U32 epochEnd = epochBegin + epochs.size;
size_t segmentSize;
/* Select a segment */
COVER_segment_t segment = COVER_selectSegment(
ctx, freqs, activeDmers, epochBegin, epochEnd, parameters);
- /* If the segment covers no dmers, then we are out of content */
+ /* If the segment covers no dmers, then we are out of content.
+ * There may be new content in other epochs, so continue for some time.
+ */
if (segment.score == 0) {
- break;
+ if (++zeroScoreRun >= maxZeroScoreRun) {
+ break;
+ }
+ continue;
}
+ zeroScoreRun = 0;
/* Trim the segment if necessary and if it is too small then we are done */
segmentSize = MIN(segment.end - segment.begin + parameters.d - 1, tail);
if (segmentSize < parameters.d) {
@@ -690,11 +729,11 @@
/* Checks */
if (!COVER_checkParameters(parameters, dictBufferCapacity)) {
DISPLAYLEVEL(1, "Cover parameters incorrect\n");
- return ERROR(GENERIC);
+ return ERROR(parameter_outOfBound);
}
if (nbSamples == 0) {
DISPLAYLEVEL(1, "Cover must have at least one input file\n");
- return ERROR(GENERIC);
+ return ERROR(srcSize_wrong);
}
if (dictBufferCapacity < ZDICT_DICTSIZE_MIN) {
DISPLAYLEVEL(1, "dictBufferCapacity must be at least %u\n",
@@ -702,14 +741,18 @@
return ERROR(dstSize_tooSmall);
}
/* Initialize context and activeDmers */
- if (!COVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples,
- parameters.d, parameters.splitPoint)) {
- return ERROR(GENERIC);
+ {
+ size_t const initVal = COVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples,
+ parameters.d, parameters.splitPoint);
+ if (ZSTD_isError(initVal)) {
+ return initVal;
+ }
}
+ COVER_warnOnSmallCorpus(dictBufferCapacity, ctx.suffixSize, g_displayLevel);
if (!COVER_map_init(&activeDmers, parameters.k - parameters.d + 1)) {
DISPLAYLEVEL(1, "Failed to allocate dmer map: out of memory\n");
COVER_ctx_destroy(&ctx);
- return ERROR(GENERIC);
+ return ERROR(memory_allocation);
}
DISPLAYLEVEL(2, "Building dictionary\n");
@@ -770,7 +813,7 @@
cctx, dst, dstCapacity, samples + offsets[i],
samplesSizes[i], cdict);
if (ZSTD_isError(size)) {
- totalCompressedSize = ERROR(GENERIC);
+ totalCompressedSize = size;
goto _compressCleanup;
}
totalCompressedSize += size;
@@ -846,9 +889,11 @@
* Decrements liveJobs and signals any waiting threads if liveJobs == 0.
* If this dictionary is the best so far save it and its parameters.
*/
-void COVER_best_finish(COVER_best_t *best, size_t compressedSize,
- ZDICT_cover_params_t parameters, void *dict,
- size_t dictSize) {
+void COVER_best_finish(COVER_best_t *best, ZDICT_cover_params_t parameters,
+ COVER_dictSelection_t selection) {
+ void* dict = selection.dictContent;
+ size_t compressedSize = selection.totalCompressedSize;
+ size_t dictSize = selection.dictSize;
if (!best) {
return;
}
@@ -874,10 +919,12 @@
}
}
/* Save the dictionary, parameters, and size */
- memcpy(best->dict, dict, dictSize);
- best->dictSize = dictSize;
- best->parameters = parameters;
- best->compressedSize = compressedSize;
+ if (dict) {
+ memcpy(best->dict, dict, dictSize);
+ best->dictSize = dictSize;
+ best->parameters = parameters;
+ best->compressedSize = compressedSize;
+ }
}
if (liveJobs == 0) {
ZSTD_pthread_cond_broadcast(&best->cond);
@@ -886,6 +933,111 @@
}
}
+COVER_dictSelection_t COVER_dictSelectionError(size_t error) {
+ COVER_dictSelection_t selection = { NULL, 0, error };
+ return selection;
+}
+
+unsigned COVER_dictSelectionIsError(COVER_dictSelection_t selection) {
+ return (ZSTD_isError(selection.totalCompressedSize) || !selection.dictContent);
+}
+
+void COVER_dictSelectionFree(COVER_dictSelection_t selection){
+ free(selection.dictContent);
+}
+
+COVER_dictSelection_t COVER_selectDict(BYTE* customDictContent,
+ size_t dictContentSize, const BYTE* samplesBuffer, const size_t* samplesSizes, unsigned nbFinalizeSamples,
+ size_t nbCheckSamples, size_t nbSamples, ZDICT_cover_params_t params, size_t* offsets, size_t totalCompressedSize) {
+
+ size_t largestDict = 0;
+ size_t largestCompressed = 0;
+ BYTE* customDictContentEnd = customDictContent + dictContentSize;
+
+ BYTE * largestDictbuffer = (BYTE *)malloc(dictContentSize);
+ BYTE * candidateDictBuffer = (BYTE *)malloc(dictContentSize);
+ double regressionTolerance = ((double)params.shrinkDictMaxRegression / 100.0) + 1.00;
+
+ if (!largestDictbuffer || !candidateDictBuffer) {
+ free(largestDictbuffer);
+ free(candidateDictBuffer);
+ return COVER_dictSelectionError(dictContentSize);
+ }
+
+ /* Initial dictionary size and compressed size */
+ memcpy(largestDictbuffer, customDictContent, dictContentSize);
+ dictContentSize = ZDICT_finalizeDictionary(
+ largestDictbuffer, dictContentSize, customDictContent, dictContentSize,
+ samplesBuffer, samplesSizes, nbFinalizeSamples, params.zParams);
+
+ if (ZDICT_isError(dictContentSize)) {
+ free(largestDictbuffer);
+ free(candidateDictBuffer);
+ return COVER_dictSelectionError(dictContentSize);
+ }
+
+ totalCompressedSize = COVER_checkTotalCompressedSize(params, samplesSizes,
+ samplesBuffer, offsets,
+ nbCheckSamples, nbSamples,
+ largestDictbuffer, dictContentSize);
+
+ if (ZSTD_isError(totalCompressedSize)) {
+ free(largestDictbuffer);
+ free(candidateDictBuffer);
+ return COVER_dictSelectionError(totalCompressedSize);
+ }
+
+ if (params.shrinkDict == 0) {
+ COVER_dictSelection_t selection = { largestDictbuffer, dictContentSize, totalCompressedSize };
+ free(candidateDictBuffer);
+ return selection;
+ }
+
+ largestDict = dictContentSize;
+ largestCompressed = totalCompressedSize;
+ dictContentSize = ZDICT_DICTSIZE_MIN;
+
+ /* Largest dict is initially at least ZDICT_DICTSIZE_MIN */
+ while (dictContentSize < largestDict) {
+ memcpy(candidateDictBuffer, largestDictbuffer, largestDict);
+ dictContentSize = ZDICT_finalizeDictionary(
+ candidateDictBuffer, dictContentSize, customDictContentEnd - dictContentSize, dictContentSize,
+ samplesBuffer, samplesSizes, nbFinalizeSamples, params.zParams);
+
+ if (ZDICT_isError(dictContentSize)) {
+ free(largestDictbuffer);
+ free(candidateDictBuffer);
+ return COVER_dictSelectionError(dictContentSize);
+
+ }
+
+ totalCompressedSize = COVER_checkTotalCompressedSize(params, samplesSizes,
+ samplesBuffer, offsets,
+ nbCheckSamples, nbSamples,
+ candidateDictBuffer, dictContentSize);
+
+ if (ZSTD_isError(totalCompressedSize)) {
+ free(largestDictbuffer);
+ free(candidateDictBuffer);
+ return COVER_dictSelectionError(totalCompressedSize);
+ }
+
+ if (totalCompressedSize <= largestCompressed * regressionTolerance) {
+ COVER_dictSelection_t selection = { candidateDictBuffer, dictContentSize, totalCompressedSize };
+ free(largestDictbuffer);
+ return selection;
+ }
+ dictContentSize *= 2;
+ }
+ dictContentSize = largestDict;
+ totalCompressedSize = largestCompressed;
+ {
+ COVER_dictSelection_t selection = { largestDictbuffer, dictContentSize, totalCompressedSize };
+ free(candidateDictBuffer);
+ return selection;
+ }
+}
+
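To make the shrink search above concrete, with illustrative numbers: with params.shrinkDictMaxRegression = 5, regressionTolerance is 1.05. Suppose the full finalized dictionary compresses the check samples to 1,000,000 bytes in total. Candidates are then finalized at ZDICT_DICTSIZE_MIN bytes, doubling each round (256, 512, 1024, ...), and the first candidate whose total compressed size is at most 1,050,000 bytes is returned. If no candidate qualifies before the candidate size reaches the full dictionary size, the full dictionary wins.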
/**
* Parameters for COVER_tryParameters().
*/
@@ -911,6 +1063,7 @@
/* Allocate space for hash table, dict, and freqs */
COVER_map_t activeDmers;
BYTE *const dict = (BYTE * const)malloc(dictBufferCapacity);
+ COVER_dictSelection_t selection = COVER_dictSelectionError(ERROR(GENERIC));
U32 *freqs = (U32 *)malloc(ctx->suffixSize * sizeof(U32));
if (!COVER_map_init(&activeDmers, parameters.k - parameters.d + 1)) {
DISPLAYLEVEL(1, "Failed to allocate dmer map: out of memory\n");
@@ -926,29 +1079,21 @@
{
const size_t tail = COVER_buildDictionary(ctx, freqs, &activeDmers, dict,
dictBufferCapacity, parameters);
- dictBufferCapacity = ZDICT_finalizeDictionary(
- dict, dictBufferCapacity, dict + tail, dictBufferCapacity - tail,
- ctx->samples, ctx->samplesSizes, (unsigned)ctx->nbTrainSamples,
- parameters.zParams);
- if (ZDICT_isError(dictBufferCapacity)) {
- DISPLAYLEVEL(1, "Failed to finalize dictionary\n");
+ selection = COVER_selectDict(dict + tail, dictBufferCapacity - tail,
+ ctx->samples, ctx->samplesSizes, (unsigned)ctx->nbTrainSamples, ctx->nbTrainSamples, ctx->nbSamples, parameters, ctx->offsets,
+ totalCompressedSize);
+
+ if (COVER_dictSelectionIsError(selection)) {
+ DISPLAYLEVEL(1, "Failed to select dictionary\n");
goto _cleanup;
}
}
- /* Check total compressed size */
- totalCompressedSize = COVER_checkTotalCompressedSize(parameters, ctx->samplesSizes,
- ctx->samples, ctx->offsets,
- ctx->nbTrainSamples, ctx->nbSamples,
- dict, dictBufferCapacity);
-
_cleanup:
- COVER_best_finish(data->best, totalCompressedSize, parameters, dict,
- dictBufferCapacity);
+ free(dict);
+ COVER_best_finish(data->best, parameters, selection);
free(data);
COVER_map_destroy(&activeDmers);
- if (dict) {
- free(dict);
- }
+ COVER_dictSelectionFree(selection);
if (freqs) {
free(freqs);
}
@@ -970,6 +1115,7 @@
const unsigned kStepSize = MAX((kMaxK - kMinK) / kSteps, 1);
const unsigned kIterations =
(1 + (kMaxD - kMinD) / 2) * (1 + (kMaxK - kMinK) / kStepSize);
+ const unsigned shrinkDict = 0;
/* Local variables */
const int displayLevel = parameters->zParams.notificationLevel;
unsigned iteration = 1;
@@ -977,19 +1123,20 @@
unsigned k;
COVER_best_t best;
POOL_ctx *pool = NULL;
+ int warned = 0;
/* Checks */
if (splitPoint <= 0 || splitPoint > 1) {
LOCALDISPLAYLEVEL(displayLevel, 1, "Incorrect parameters\n");
- return ERROR(GENERIC);
+ return ERROR(parameter_outOfBound);
}
if (kMinK < kMaxD || kMaxK < kMinK) {
LOCALDISPLAYLEVEL(displayLevel, 1, "Incorrect parameters\n");
- return ERROR(GENERIC);
+ return ERROR(parameter_outOfBound);
}
if (nbSamples == 0) {
DISPLAYLEVEL(1, "Cover must have at least one input file\n");
- return ERROR(GENERIC);
+ return ERROR(srcSize_wrong);
}
if (dictBufferCapacity < ZDICT_DICTSIZE_MIN) {
DISPLAYLEVEL(1, "dictBufferCapacity must be at least %u\n",
@@ -1013,11 +1160,18 @@
/* Initialize the context for this value of d */
COVER_ctx_t ctx;
LOCALDISPLAYLEVEL(displayLevel, 3, "d=%u\n", d);
- if (!COVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples, d, splitPoint)) {
- LOCALDISPLAYLEVEL(displayLevel, 1, "Failed to initialize context\n");
- COVER_best_destroy(&best);
- POOL_free(pool);
- return ERROR(GENERIC);
+ {
+ const size_t initVal = COVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples, d, splitPoint);
+ if (ZSTD_isError(initVal)) {
+ LOCALDISPLAYLEVEL(displayLevel, 1, "Failed to initialize context\n");
+ COVER_best_destroy(&best);
+ POOL_free(pool);
+ return initVal;
+ }
+ }
+ if (!warned) {
+ COVER_warnOnSmallCorpus(dictBufferCapacity, ctx.suffixSize, displayLevel);
+ warned = 1;
}
/* Loop through k reusing the same context */
for (k = kMinK; k <= kMaxK; k += kStepSize) {
@@ -1030,7 +1184,7 @@
COVER_best_destroy(&best);
COVER_ctx_destroy(&ctx);
POOL_free(pool);
- return ERROR(GENERIC);
+ return ERROR(memory_allocation);
}
data->ctx = &ctx;
data->best = &best;
@@ -1040,6 +1194,7 @@
data->parameters.d = d;
data->parameters.splitPoint = splitPoint;
data->parameters.steps = kSteps;
+ data->parameters.shrinkDict = shrinkDict;
data->parameters.zParams.notificationLevel = g_displayLevel;
/* Check the parameters */
if (!COVER_checkParameters(data->parameters, dictBufferCapacity)) {
--- a/contrib/python-zstandard/zstd/dictBuilder/cover.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/dictBuilder/cover.h Sun Sep 15 20:04:00 2019 -0700
@@ -39,6 +39,44 @@
} COVER_segment_t;
/**
+ * Number of epochs and size of each epoch.
+ */
+typedef struct {
+ U32 num;
+ U32 size;
+} COVER_epoch_info_t;
+
+/**
+ * Struct used for the dictionary selection function.
+ */
+typedef struct COVER_dictSelection {
+ BYTE* dictContent;
+ size_t dictSize;
+ size_t totalCompressedSize;
+} COVER_dictSelection_t;
+
+/**
+ * Computes the number of epochs and the size of each epoch.
+ * We will make sure that each epoch gets at least 10 * k bytes.
+ *
+ * The COVER algorithms divide the data up into epochs of equal size and
+ * select one segment from each epoch.
+ *
+ * @param maxDictSize The maximum allowed dictionary size.
+ * @param nbDmers The number of dmers we are training on.
+ * @param k The parameter k (segment size).
+ * @param passes The target number of passes over the dmer corpus.
+ * More passes means a better dictionary.
+ */
+COVER_epoch_info_t COVER_computeEpochs(U32 maxDictSize, U32 nbDmers,
+ U32 k, U32 passes);
+
+/**
+ * Warns the user when their corpus is too small.
+ */
+void COVER_warnOnSmallCorpus(size_t maxDictSize, size_t nbDmers, int displayLevel);
+
+/**
* Checks total compressed size of a dictionary
*/
size_t COVER_checkTotalCompressedSize(const ZDICT_cover_params_t parameters,
@@ -78,6 +116,32 @@
* Decrements liveJobs and signals any waiting threads if liveJobs == 0.
* If this dictionary is the best so far save it and its parameters.
*/
-void COVER_best_finish(COVER_best_t *best, size_t compressedSize,
- ZDICT_cover_params_t parameters, void *dict,
- size_t dictSize);
+void COVER_best_finish(COVER_best_t *best, ZDICT_cover_params_t parameters,
+ COVER_dictSelection_t selection);
+/**
+ * Checks whether the return value of COVER_selectDict()
+ * is an error.
+ */
+unsigned COVER_dictSelectionIsError(COVER_dictSelection_t selection);
+
+/**
+ * Error constructor for COVER_selectDict(): returns a selection whose
+ * totalCompressedSize field holds the given ZSTD error code.
+ */
+COVER_dictSelection_t COVER_dictSelectionError(size_t error);
+
+/**
+ * Always call this after COVER_selectDict() to free the memory
+ * allocated for the newly created dictionary.
+ */
+void COVER_dictSelectionFree(COVER_dictSelection_t selection);
+
+/**
+ * Called to finalize the dictionary and select one based on whether or not
+ * the shrink-dict flag was enabled. If enabled the dictionary used is the
+ * smallest dictionary within a specified regression of the compressed size
+ * from the largest dictionary.
+ */
+COVER_dictSelection_t COVER_selectDict(BYTE* customDictContent,
+ size_t dictContentSize, const BYTE* samplesBuffer, const size_t* samplesSizes, unsigned nbFinalizeSamples,
+ size_t nbCheckSamples, size_t nbSamples, ZDICT_cover_params_t params, size_t* offsets, size_t totalCompressedSize);
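A hedged caller sketch for the selection API declared above, following the pattern COVER_tryParameters() uses in cover.c (argument values are assumed to come from a training context):

    /* Sketch; real arguments come from the training context. */
    COVER_dictSelection_t sel = COVER_selectDict(dict + tail, capacity - tail,
                                                 samples, samplesSizes,
                                                 nbFinalizeSamples, nbCheckSamples,
                                                 nbSamples, params, offsets,
                                                 totalCompressedSize);
    if (COVER_dictSelectionIsError(sel)) {
        /* sel.dictContent is NULL here; report and fall through to cleanup */
    } else {
        /* consume sel.dictContent / sel.dictSize / sel.totalCompressedSize,
         * e.g. hand them to COVER_best_finish() */
    }
    COVER_dictSelectionFree(sel);   /* free(NULL) is a no-op, so this is
                                     * safe on the error path too */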
--- a/contrib/python-zstandard/zstd/dictBuilder/fastcover.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/dictBuilder/fastcover.c Sun Sep 15 20:04:00 2019 -0700
@@ -132,7 +132,7 @@
*
* Score(S) = F(S_1) + F(S_2) + ... + F(S_{k-d+1})
*
- * Once the dmer with hash value d is in the dictionay we set F(d) = 0.
+ * Once the dmer with hash value d is in the dictionary we set F(d) = 0.
*/
static COVER_segment_t FASTCOVER_selectSegment(const FASTCOVER_ctx_t *ctx,
U32 *freqs, U32 begin, U32 end,
@@ -161,7 +161,7 @@
/* Get hash value of current dmer */
const size_t idx = FASTCOVER_hashPtrToIndex(ctx->samples + activeSegment.end, f, d);
- /* Add frequency of this index to score if this is the first occurence of index in active segment */
+ /* Add frequency of this index to score if this is the first occurrence of index in active segment */
if (segmentFreqs[idx] == 0) {
activeSegment.score += freqs[idx];
}
@@ -287,10 +287,10 @@
* Prepare a context for dictionary building.
 * The context is only dependent on the parameter `d` and can be used multiple
* times.
- * Returns 1 on success or zero on error.
+ * Returns 0 on success or error code on error.
* The context must be destroyed with `FASTCOVER_ctx_destroy()`.
*/
-static int
+static size_t
FASTCOVER_ctx_init(FASTCOVER_ctx_t* ctx,
const void* samplesBuffer,
const size_t* samplesSizes, unsigned nbSamples,
@@ -310,19 +310,19 @@
totalSamplesSize >= (size_t)FASTCOVER_MAX_SAMPLES_SIZE) {
DISPLAYLEVEL(1, "Total samples size is too large (%u MB), maximum size is %u MB\n",
(unsigned)(totalSamplesSize >> 20), (FASTCOVER_MAX_SAMPLES_SIZE >> 20));
- return 0;
+ return ERROR(srcSize_wrong);
}
/* Check if there are at least 5 training samples */
if (nbTrainSamples < 5) {
DISPLAYLEVEL(1, "Total number of training samples is %u and is invalid\n", nbTrainSamples);
- return 0;
+ return ERROR(srcSize_wrong);
}
/* Check if there's testing sample */
if (nbTestSamples < 1) {
DISPLAYLEVEL(1, "Total number of testing samples is %u and is invalid.\n", nbTestSamples);
- return 0;
+ return ERROR(srcSize_wrong);
}
/* Zero the context */
@@ -347,7 +347,7 @@
if (ctx->offsets == NULL) {
DISPLAYLEVEL(1, "Failed to allocate scratch buffers \n");
FASTCOVER_ctx_destroy(ctx);
- return 0;
+ return ERROR(memory_allocation);
}
/* Fill offsets from the samplesSizes */
@@ -364,13 +364,13 @@
if (ctx->freqs == NULL) {
DISPLAYLEVEL(1, "Failed to allocate frequency table \n");
FASTCOVER_ctx_destroy(ctx);
- return 0;
+ return ERROR(memory_allocation);
}
DISPLAYLEVEL(2, "Computing frequencies\n");
FASTCOVER_computeFrequency(ctx->freqs, ctx);
- return 1;
+ return 0;
}
@@ -386,29 +386,35 @@
{
BYTE *const dict = (BYTE *)dictBuffer;
size_t tail = dictBufferCapacity;
- /* Divide the data up into epochs of equal size.
- * We will select at least one segment from each epoch.
- */
- const unsigned epochs = MAX(1, (U32)(dictBufferCapacity / parameters.k));
- const unsigned epochSize = (U32)(ctx->nbDmers / epochs);
+ /* Divide the data into epochs. We will select one segment from each epoch. */
+ const COVER_epoch_info_t epochs = COVER_computeEpochs(
+ (U32)dictBufferCapacity, (U32)ctx->nbDmers, parameters.k, 1);
+ const size_t maxZeroScoreRun = 10;
+ size_t zeroScoreRun = 0;
size_t epoch;
DISPLAYLEVEL(2, "Breaking content into %u epochs of size %u\n",
- epochs, epochSize);
+ (U32)epochs.num, (U32)epochs.size);
/* Loop through the epochs until there are no more segments or the dictionary
* is full.
*/
- for (epoch = 0; tail > 0; epoch = (epoch + 1) % epochs) {
- const U32 epochBegin = (U32)(epoch * epochSize);
- const U32 epochEnd = epochBegin + epochSize;
+ for (epoch = 0; tail > 0; epoch = (epoch + 1) % epochs.num) {
+ const U32 epochBegin = (U32)(epoch * epochs.size);
+ const U32 epochEnd = epochBegin + epochs.size;
size_t segmentSize;
/* Select a segment */
COVER_segment_t segment = FASTCOVER_selectSegment(
ctx, freqs, epochBegin, epochEnd, parameters, segmentFreqs);
- /* If the segment covers no dmers, then we are out of content */
+ /* If the segment covers no dmers, then we are out of content.
+ * There may be new content in other epochs, so continue for some time.
+ */
if (segment.score == 0) {
- break;
+ if (++zeroScoreRun >= maxZeroScoreRun) {
+ break;
+ }
+ continue;
}
+ zeroScoreRun = 0;
/* Trim the segment if necessary and if it is too small then we are done */
segmentSize = MIN(segment.end - segment.begin + parameters.d - 1, tail);
@@ -429,7 +435,6 @@
return tail;
}
-
/**
* Parameters for FASTCOVER_tryParameters().
*/
@@ -458,6 +463,7 @@
U16* segmentFreqs = (U16 *)calloc(((U64)1 << ctx->f), sizeof(U16));
/* Allocate space for hash table, dict, and freqs */
BYTE *const dict = (BYTE * const)malloc(dictBufferCapacity);
+ COVER_dictSelection_t selection = COVER_dictSelectionError(ERROR(GENERIC));
U32 *freqs = (U32*) malloc(((U64)1 << ctx->f) * sizeof(U32));
if (!segmentFreqs || !dict || !freqs) {
DISPLAYLEVEL(1, "Failed to allocate buffers: out of memory\n");
@@ -467,27 +473,24 @@
memcpy(freqs, ctx->freqs, ((U64)1 << ctx->f) * sizeof(U32));
/* Build the dictionary */
{ const size_t tail = FASTCOVER_buildDictionary(ctx, freqs, dict, dictBufferCapacity,
- parameters, segmentFreqs);
+ parameters, segmentFreqs);
+
const unsigned nbFinalizeSamples = (unsigned)(ctx->nbTrainSamples * ctx->accelParams.finalize / 100);
- dictBufferCapacity = ZDICT_finalizeDictionary(
- dict, dictBufferCapacity, dict + tail, dictBufferCapacity - tail,
- ctx->samples, ctx->samplesSizes, nbFinalizeSamples, parameters.zParams);
- if (ZDICT_isError(dictBufferCapacity)) {
- DISPLAYLEVEL(1, "Failed to finalize dictionary\n");
+ selection = COVER_selectDict(dict + tail, dictBufferCapacity - tail,
+ ctx->samples, ctx->samplesSizes, nbFinalizeSamples, ctx->nbTrainSamples, ctx->nbSamples, parameters, ctx->offsets,
+ totalCompressedSize);
+
+ if (COVER_dictSelectionIsError(selection)) {
+ DISPLAYLEVEL(1, "Failed to select dictionary\n");
goto _cleanup;
}
}
- /* Check total compressed size */
- totalCompressedSize = COVER_checkTotalCompressedSize(parameters, ctx->samplesSizes,
- ctx->samples, ctx->offsets,
- ctx->nbTrainSamples, ctx->nbSamples,
- dict, dictBufferCapacity);
_cleanup:
- COVER_best_finish(data->best, totalCompressedSize, parameters, dict,
- dictBufferCapacity);
+ free(dict);
+ COVER_best_finish(data->best, parameters, selection);
free(data);
free(segmentFreqs);
- free(dict);
+ COVER_dictSelectionFree(selection);
free(freqs);
}
@@ -502,6 +505,7 @@
coverParams->nbThreads = fastCoverParams.nbThreads;
coverParams->splitPoint = fastCoverParams.splitPoint;
coverParams->zParams = fastCoverParams.zParams;
+ coverParams->shrinkDict = fastCoverParams.shrinkDict;
}
@@ -518,6 +522,7 @@
fastCoverParams->f = f;
fastCoverParams->accel = accel;
fastCoverParams->zParams = coverParams.zParams;
+ fastCoverParams->shrinkDict = coverParams.shrinkDict;
}
@@ -544,11 +549,11 @@
if (!FASTCOVER_checkParameters(coverParams, dictBufferCapacity, parameters.f,
parameters.accel)) {
DISPLAYLEVEL(1, "FASTCOVER parameters incorrect\n");
- return ERROR(GENERIC);
+ return ERROR(parameter_outOfBound);
}
if (nbSamples == 0) {
DISPLAYLEVEL(1, "FASTCOVER must have at least one input file\n");
- return ERROR(GENERIC);
+ return ERROR(srcSize_wrong);
}
if (dictBufferCapacity < ZDICT_DICTSIZE_MIN) {
DISPLAYLEVEL(1, "dictBufferCapacity must be at least %u\n",
@@ -558,12 +563,16 @@
/* Assign corresponding FASTCOVER_accel_t to accelParams*/
accelParams = FASTCOVER_defaultAccelParameters[parameters.accel];
/* Initialize context */
- if (!FASTCOVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples,
+ {
+ size_t const initVal = FASTCOVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples,
coverParams.d, parameters.splitPoint, parameters.f,
- accelParams)) {
- DISPLAYLEVEL(1, "Failed to initialize context\n");
- return ERROR(GENERIC);
+ accelParams);
+ if (ZSTD_isError(initVal)) {
+ DISPLAYLEVEL(1, "Failed to initialize context\n");
+ return initVal;
+ }
}
+ COVER_warnOnSmallCorpus(dictBufferCapacity, ctx.nbDmers, g_displayLevel);
/* Build the dictionary */
DISPLAYLEVEL(2, "Building dictionary\n");
{
@@ -609,6 +618,7 @@
(1 + (kMaxD - kMinD) / 2) * (1 + (kMaxK - kMinK) / kStepSize);
const unsigned f = parameters->f == 0 ? DEFAULT_F : parameters->f;
const unsigned accel = parameters->accel == 0 ? DEFAULT_ACCEL : parameters->accel;
+ const unsigned shrinkDict = 0;
/* Local variables */
const int displayLevel = parameters->zParams.notificationLevel;
unsigned iteration = 1;
@@ -616,22 +626,23 @@
unsigned k;
COVER_best_t best;
POOL_ctx *pool = NULL;
+ int warned = 0;
/* Checks */
if (splitPoint <= 0 || splitPoint > 1) {
LOCALDISPLAYLEVEL(displayLevel, 1, "Incorrect splitPoint\n");
- return ERROR(GENERIC);
+ return ERROR(parameter_outOfBound);
}
if (accel == 0 || accel > FASTCOVER_MAX_ACCEL) {
LOCALDISPLAYLEVEL(displayLevel, 1, "Incorrect accel\n");
- return ERROR(GENERIC);
+ return ERROR(parameter_outOfBound);
}
if (kMinK < kMaxD || kMaxK < kMinK) {
LOCALDISPLAYLEVEL(displayLevel, 1, "Incorrect k\n");
- return ERROR(GENERIC);
+ return ERROR(parameter_outOfBound);
}
if (nbSamples == 0) {
LOCALDISPLAYLEVEL(displayLevel, 1, "FASTCOVER must have at least one input file\n");
- return ERROR(GENERIC);
+ return ERROR(srcSize_wrong);
}
if (dictBufferCapacity < ZDICT_DICTSIZE_MIN) {
LOCALDISPLAYLEVEL(displayLevel, 1, "dictBufferCapacity must be at least %u\n",
@@ -658,11 +669,18 @@
/* Initialize the context for this value of d */
FASTCOVER_ctx_t ctx;
LOCALDISPLAYLEVEL(displayLevel, 3, "d=%u\n", d);
- if (!FASTCOVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples, d, splitPoint, f, accelParams)) {
- LOCALDISPLAYLEVEL(displayLevel, 1, "Failed to initialize context\n");
- COVER_best_destroy(&best);
- POOL_free(pool);
- return ERROR(GENERIC);
+ {
+ size_t const initVal = FASTCOVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples, d, splitPoint, f, accelParams);
+ if (ZSTD_isError(initVal)) {
+ LOCALDISPLAYLEVEL(displayLevel, 1, "Failed to initialize context\n");
+ COVER_best_destroy(&best);
+ POOL_free(pool);
+ return initVal;
+ }
+ }
+ if (!warned) {
+ COVER_warnOnSmallCorpus(dictBufferCapacity, ctx.nbDmers, displayLevel);
+ warned = 1;
}
/* Loop through k reusing the same context */
for (k = kMinK; k <= kMaxK; k += kStepSize) {
@@ -675,7 +693,7 @@
COVER_best_destroy(&best);
FASTCOVER_ctx_destroy(&ctx);
POOL_free(pool);
- return ERROR(GENERIC);
+ return ERROR(memory_allocation);
}
data->ctx = &ctx;
data->best = &best;
@@ -685,6 +703,7 @@
data->parameters.d = d;
data->parameters.splitPoint = splitPoint;
data->parameters.steps = kSteps;
+ data->parameters.shrinkDict = shrinkDict;
data->parameters.zParams.notificationLevel = g_displayLevel;
/* Check the parameters */
if (!FASTCOVER_checkParameters(data->parameters, dictBufferCapacity,
--- a/contrib/python-zstandard/zstd/dictBuilder/zdict.c Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/dictBuilder/zdict.c Sun Sep 15 20:04:00 2019 -0700
@@ -741,7 +741,7 @@
/* analyze, build stats, starting with literals */
{ size_t maxNbBits = HUF_buildCTable (hufTable, countLit, 255, huffLog);
if (HUF_isError(maxNbBits)) {
- eSize = ERROR(GENERIC);
+ eSize = maxNbBits;
DISPLAYLEVEL(1, " HUF_buildCTable error \n");
goto _cleanup;
}
@@ -764,7 +764,7 @@
total=0; for (u=0; u<=offcodeMax; u++) total+=offcodeCount[u];
errorCode = FSE_normalizeCount(offcodeNCount, Offlog, offcodeCount, total, offcodeMax);
if (FSE_isError(errorCode)) {
- eSize = ERROR(GENERIC);
+ eSize = errorCode;
DISPLAYLEVEL(1, "FSE_normalizeCount error with offcodeCount \n");
goto _cleanup;
}
@@ -773,7 +773,7 @@
total=0; for (u=0; u<=MaxML; u++) total+=matchLengthCount[u];
errorCode = FSE_normalizeCount(matchLengthNCount, mlLog, matchLengthCount, total, MaxML);
if (FSE_isError(errorCode)) {
- eSize = ERROR(GENERIC);
+ eSize = errorCode;
DISPLAYLEVEL(1, "FSE_normalizeCount error with matchLengthCount \n");
goto _cleanup;
}
@@ -782,7 +782,7 @@
total=0; for (u=0; u<=MaxLL; u++) total+=litLengthCount[u];
errorCode = FSE_normalizeCount(litLengthNCount, llLog, litLengthCount, total, MaxLL);
if (FSE_isError(errorCode)) {
- eSize = ERROR(GENERIC);
+ eSize = errorCode;
DISPLAYLEVEL(1, "FSE_normalizeCount error with litLengthCount \n");
goto _cleanup;
}
@@ -791,7 +791,7 @@
/* write result to buffer */
{ size_t const hhSize = HUF_writeCTable(dstPtr, maxDstSize, hufTable, 255, huffLog);
if (HUF_isError(hhSize)) {
- eSize = ERROR(GENERIC);
+ eSize = hhSize;
DISPLAYLEVEL(1, "HUF_writeCTable error \n");
goto _cleanup;
}
@@ -802,7 +802,7 @@
{ size_t const ohSize = FSE_writeNCount(dstPtr, maxDstSize, offcodeNCount, OFFCODE_MAX, Offlog);
if (FSE_isError(ohSize)) {
- eSize = ERROR(GENERIC);
+ eSize = ohSize;
DISPLAYLEVEL(1, "FSE_writeNCount error with offcodeNCount \n");
goto _cleanup;
}
@@ -813,7 +813,7 @@
{ size_t const mhSize = FSE_writeNCount(dstPtr, maxDstSize, matchLengthNCount, MaxML, mlLog);
if (FSE_isError(mhSize)) {
- eSize = ERROR(GENERIC);
+ eSize = mhSize;
DISPLAYLEVEL(1, "FSE_writeNCount error with matchLengthNCount \n");
goto _cleanup;
}
@@ -824,7 +824,7 @@
{ size_t const lhSize = FSE_writeNCount(dstPtr, maxDstSize, litLengthNCount, MaxLL, llLog);
if (FSE_isError(lhSize)) {
- eSize = ERROR(GENERIC);
+ eSize = lhSize;
DISPLAYLEVEL(1, "FSE_writeNCount error with litlengthNCount \n");
goto _cleanup;
}
@@ -834,7 +834,7 @@
}
if (maxDstSize<12) {
- eSize = ERROR(GENERIC);
+ eSize = ERROR(dstSize_tooSmall);
DISPLAYLEVEL(1, "not enough space to write RepOffsets \n");
goto _cleanup;
}
--- a/contrib/python-zstandard/zstd/dictBuilder/zdict.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/dictBuilder/zdict.h Sun Sep 15 20:04:00 2019 -0700
@@ -46,7 +46,12 @@
* The resulting dictionary will be saved into `dictBuffer`.
* @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
* or an error code, which can be tested with ZDICT_isError().
- * Note: ZDICT_trainFromBuffer() requires about 9 bytes of memory for each input byte.
+ * Note: Dictionary training will fail if there are not enough samples to construct a
+ * dictionary, or if most of the samples are too small (< 8 bytes being the lower limit).
+ * If dictionary training fails, you should use zstd without a dictionary, as the dictionary
+ * would have been ineffective anyway. If you believe your samples would benefit from a dictionary,
+ * please open an issue with details, and we can look into it.
+ * Note: ZDICT_trainFromBuffer()'s memory usage is about 6 MB.
* Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
* It's possible to select smaller or larger size, just by specifying `dictBufferCapacity`.
 * In general, it's recommended to provide a few thousand samples, though this can vary a lot.
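For reference, a minimal training call matching the notes above (assumes `samples`, `sizes`, and `nbSamples` describe a buffer of concatenated samples; error handling abbreviated):

    #include <stdio.h>
    #include "zdict.h"

    unsigned char dictBuf[112640];    /* ~110 KB target, per the tip above */
    size_t dictSize = ZDICT_trainFromBuffer(dictBuf, sizeof dictBuf,
                                            samples, sizes, nbSamples);
    if (ZDICT_isError(dictSize)) {
        /* As the note says: fall back to dictionary-less compression. */
        fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dictSize));
    }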
@@ -89,6 +94,8 @@
unsigned steps; /* Number of steps : Only used for optimization : 0 means default (40) : Higher means more parameters checked */
unsigned nbThreads; /* Number of threads : constraint: 0 < nbThreads : 1 means single-threaded : Only used for optimization : Ignored if ZSTD_MULTITHREAD is not defined */
double splitPoint; /* Percentage of samples used for training: Only used for optimization : the first nbSamples * splitPoint samples will be used to training, the last nbSamples * (1 - splitPoint) samples will be used for testing, 0 means default (1.0), 1.0 when all samples are used for both training and testing */
+ unsigned shrinkDict; /* Train dictionaries to shrink in size, starting from the minimum size, and select the smallest dictionary that is at most shrinkDictMaxRegression% worse than the largest dictionary. 0 means no shrinking, 1 means shrinking */
+ unsigned shrinkDictMaxRegression; /* Maximum allowed regression: a smaller dictionary may compress at most shrinkDictMaxRegression% worse than the maximum-size dictionary. */
ZDICT_params_t zParams;
} ZDICT_cover_params_t;
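A hedged sketch of enabling the new fields on the direct cover trainer (field values are illustrative; `dictBuf`, `dictCap`, `samples`, `sizes`, and `nbSamples` are assumed). Note that the optimize variants in this release pin shrinkDict to 0 internally (see the `const unsigned shrinkDict = 0;` hunks above), so the flag only takes effect on the non-optimizing entry points:

    #include <string.h>
    #include "zdict.h"

    ZDICT_cover_params_t params;
    memset(&params, 0, sizeof(params));
    params.k = 1024;                     /* segment size (required here) */
    params.d = 8;                        /* dmer size (required here) */
    params.shrinkDict = 1;               /* also try smaller dictionaries */
    params.shrinkDictMaxRegression = 5;  /* accept <= 5% worse compression */
    size_t n = ZDICT_trainFromBuffer_cover(dictBuf, dictCap,
                                           samples, sizes, nbSamples, params);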
@@ -100,6 +107,9 @@
unsigned nbThreads; /* Number of threads : constraint: 0 < nbThreads : 1 means single-threaded : Only used for optimization : Ignored if ZSTD_MULTITHREAD is not defined */
double splitPoint; /* Percentage of samples used for training: Only used for optimization : the first nbSamples * splitPoint samples will be used to training, the last nbSamples * (1 - splitPoint) samples will be used for testing, 0 means default (0.75), 1.0 when all samples are used for both training and testing */
unsigned accel; /* Acceleration level: constraint: 0 < accel <= 10, higher means faster and less accurate, 0 means default(1) */
+ unsigned shrinkDict; /* Train dictionaries to shrink in size, starting from the minimum size, and select the smallest dictionary that is at most shrinkDictMaxRegression% worse than the largest dictionary. 0 means no shrinking, 1 means shrinking */
+ unsigned shrinkDictMaxRegression; /* Maximum allowed regression: a smaller dictionary may compress at most shrinkDictMaxRegression% worse than the maximum-size dictionary. */
+
ZDICT_params_t zParams;
} ZDICT_fastCover_params_t;
@@ -110,6 +120,7 @@
* The resulting dictionary will be saved into `dictBuffer`.
* @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
* or an error code, which can be tested with ZDICT_isError().
+ * See ZDICT_trainFromBuffer() for details on failure modes.
* Note: ZDICT_trainFromBuffer_cover() requires about 9 bytes of memory for each input byte.
* Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
* It's possible to select smaller or larger size, just by specifying `dictBufferCapacity`.
@@ -133,8 +144,9 @@
* If k is non-zero then we don't check multiple values of k, otherwise we check steps values in [50, 2000].
*
* @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
- * or an error code, which can be tested with ZDICT_isError().
- * On success `*parameters` contains the parameters selected.
+ * or an error code, which can be tested with ZDICT_isError().
+ * On success `*parameters` contains the parameters selected.
+ * See ZDICT_trainFromBuffer() for details on failure modes.
* Note: ZDICT_optimizeTrainFromBuffer_cover() requires about 8 bytes of memory for each input byte and additionally another 5 bytes of memory for each byte of memory for each thread.
*/
ZDICTLIB_API size_t ZDICT_optimizeTrainFromBuffer_cover(
@@ -151,7 +163,8 @@
* The resulting dictionary will be saved into `dictBuffer`.
* @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
* or an error code, which can be tested with ZDICT_isError().
- * Note: ZDICT_trainFromBuffer_fastCover() requires about 1 bytes of memory for each input byte and additionally another 6 * 2^f bytes of memory .
+ * See ZDICT_trainFromBuffer() for details on failure modes.
+ * Note: ZDICT_trainFromBuffer_fastCover() requires 6 * 2^f bytes of memory.
* Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
* It's possible to select smaller or larger size, just by specifying `dictBufferCapacity`.
 * In general, it's recommended to provide a few thousand samples, though this can vary a lot.
@@ -175,9 +188,10 @@
* If accel is zero, default value of 1 is used.
*
* @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
- * or an error code, which can be tested with ZDICT_isError().
- * On success `*parameters` contains the parameters selected.
- * Note: ZDICT_optimizeTrainFromBuffer_fastCover() requires about 1 byte of memory for each input byte and additionally another 6 * 2^f bytes of memory for each thread.
+ * or an error code, which can be tested with ZDICT_isError().
+ * On success `*parameters` contains the parameters selected.
+ * See ZDICT_trainFromBuffer() for details on failure modes.
+ * Note: ZDICT_optimizeTrainFromBuffer_fastCover() requires about 6 * 2^f bytes of memory for each thread.
*/
ZDICTLIB_API size_t ZDICT_optimizeTrainFromBuffer_fastCover(void* dictBuffer,
size_t dictBufferCapacity, const void* samplesBuffer,
@@ -195,7 +209,7 @@
* maxDictSize must be >= dictContentSize, and must be >= ZDICT_DICTSIZE_MIN bytes.
*
* @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`),
- * or an error code, which can be tested by ZDICT_isError().
+ * or an error code, which can be tested by ZDICT_isError().
* Note: ZDICT_finalizeDictionary() will push notifications into stderr if instructed to, using notificationLevel>0.
* Note 2: dictBuffer and dictContent can overlap
*/
@@ -219,6 +233,7 @@
* `parameters` is optional and can be provided with values set to 0 to mean "default".
* @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
* or an error code, which can be tested with ZDICT_isError().
+ * See ZDICT_trainFromBuffer() for details on failure modes.
* Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
* It's possible to select smaller or larger size, just by specifying `dictBufferCapacity`.
 * In general, it's recommended to provide a few thousand samples, though this can vary a lot.
--- a/contrib/python-zstandard/zstd/zstd.h Sun Sep 15 00:07:30 2019 -0400
+++ b/contrib/python-zstandard/zstd/zstd.h Sun Sep 15 20:04:00 2019 -0700
@@ -70,8 +70,8 @@
/*------ Version ------*/
#define ZSTD_VERSION_MAJOR 1
-#define ZSTD_VERSION_MINOR 3
-#define ZSTD_VERSION_RELEASE 8
+#define ZSTD_VERSION_MINOR 4
+#define ZSTD_VERSION_RELEASE 3
#define ZSTD_VERSION_NUMBER (ZSTD_VERSION_MAJOR *100*100 + ZSTD_VERSION_MINOR *100 + ZSTD_VERSION_RELEASE)
ZSTDLIB_API unsigned ZSTD_versionNumber(void); /**< to check runtime library version */
@@ -82,13 +82,28 @@
#define ZSTD_VERSION_STRING ZSTD_EXPAND_AND_QUOTE(ZSTD_LIB_VERSION)
ZSTDLIB_API const char* ZSTD_versionString(void); /* requires v1.3.0+ */
-/***************************************
-* Default constant
-***************************************/
+/* *************************************
+ * Default constant
+ ***************************************/
#ifndef ZSTD_CLEVEL_DEFAULT
# define ZSTD_CLEVEL_DEFAULT 3
#endif
+/* *************************************
+ * Constants
+ ***************************************/
+
+/* All magic numbers are supposed to be read/written to/from files/memory using the little-endian convention */
+#define ZSTD_MAGICNUMBER 0xFD2FB528 /* valid since v0.8.0 */
+#define ZSTD_MAGIC_DICTIONARY 0xEC30A437 /* valid since v0.7.0 */
+#define ZSTD_MAGIC_SKIPPABLE_START 0x184D2A50 /* all 16 values, from 0x184D2A50 to 0x184D2A5F, signal the beginning of a skippable frame */
+#define ZSTD_MAGIC_SKIPPABLE_MASK 0xFFFFFFF0
+
+#define ZSTD_BLOCKSIZELOG_MAX 17
+#define ZSTD_BLOCKSIZE_MAX (1<<ZSTD_BLOCKSIZELOG_MAX)
+
+
+
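The skippable-frame constants are meant to be used together; a small sketch of the intended check (assumes a little-endian host for the 4-byte read, per the comment above):

    #include <stdint.h>
    #include <string.h>

    static int is_skippable_frame(const void* src)
    {
        uint32_t magic;
        memcpy(&magic, src, sizeof(magic));   /* little-endian read assumed */
        return (magic & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START;
    }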
/***************************************
* Simple API
***************************************/
@@ -145,12 +160,21 @@
* @return : decompressed size of `src` frame content _if known and not empty_, 0 otherwise. */
ZSTDLIB_API unsigned long long ZSTD_getDecompressedSize(const void* src, size_t srcSize);
+/*! ZSTD_findFrameCompressedSize() :
+ * `src` should point to the start of a ZSTD frame or skippable frame.
+ * `srcSize` must be >= first frame size
+ * @return : the compressed size of the first frame starting at `src`,
+ * suitable to pass as `srcSize` to `ZSTD_decompress` or similar,
+ * or an error code if input is invalid */
+ZSTDLIB_API size_t ZSTD_findFrameCompressedSize(const void* src, size_t srcSize);
+
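A sketch of the typical use of ZSTD_findFrameCompressedSize(): walking a buffer that holds several concatenated frames (`buf` and `bufSize` are illustrative names):

    const unsigned char* p = buf;
    size_t remaining = bufSize;
    while (remaining > 0) {
        size_t const frameSize = ZSTD_findFrameCompressedSize(p, remaining);
        if (ZSTD_isError(frameSize)) break;   /* invalid or truncated input */
        /* ... decompress or skip the frame at [p, p + frameSize) ... */
        p += frameSize;
        remaining -= frameSize;
    }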
/*====== Helper functions ======*/
#define ZSTD_COMPRESSBOUND(srcSize) ((srcSize) + ((srcSize)>>8) + (((srcSize) < (128<<10)) ? (((128<<10) - (srcSize)) >> 11) /* margin, from 64 to 0 */ : 0)) /* this formula ensures that bound(A) + bound(B) <= bound(A+B) as long as A and B >= 128 KB */
ZSTDLIB_API size_t ZSTD_compressBound(size_t srcSize); /*!< maximum compressed size in worst case single-pass scenario */
ZSTDLIB_API unsigned ZSTD_isError(size_t code); /*!< tells if a `size_t` function result is an error code */
ZSTDLIB_API const char* ZSTD_getErrorName(size_t code); /*!< provides readable string from an error code */
+ZSTDLIB_API int ZSTD_minCLevel(void); /*!< minimum negative compression level allowed */
ZSTDLIB_API int ZSTD_maxCLevel(void); /*!< maximum compression level available */
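With ZSTD_minCLevel() now exposed alongside ZSTD_maxCLevel(), clamping a user-supplied level becomes straightforward (sketch; `userLevel` is assumed to come from, e.g., a CLI flag):

    int level = userLevel;
    if (level < ZSTD_minCLevel()) level = ZSTD_minCLevel();
    if (level > ZSTD_maxCLevel()) level = ZSTD_maxCLevel();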
@@ -159,9 +183,14 @@
***************************************/
/*= Compression context
* When compressing many times,
- * it is recommended to allocate a context just once, and re-use it for each successive compression operation.
+ * it is recommended to allocate a context just once,
+ * and re-use it for each successive compression operation.
 * This will make the workload friendlier for the system's memory.
- * Use one context per thread for parallel execution in multi-threaded environments. */
+ * Note : re-using context is just a speed / resource optimization.
+ * It doesn't change the compression ratio, which remains identical.
+ * Note 2 : In multi-threaded environments,
+ * use a separate context per thread for parallel execution.
+ */
typedef struct ZSTD_CCtx_s ZSTD_CCtx;
ZSTDLIB_API ZSTD_CCtx* ZSTD_createCCtx(void);
ZSTDLIB_API size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx);
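A sketch of the reuse pattern the comment recommends (error handling abbreviated; `nbInputs`, `dst`, `dstCap`, `src`, and `srcSize` are illustrative):

    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    size_t i;
    for (i = 0; i < nbInputs; i++) {       /* one context, many compressions */
        size_t const csize = ZSTD_compressCCtx(cctx, dst[i], dstCap[i],
                                               src[i], srcSize[i],
                                               ZSTD_CLEVEL_DEFAULT);
        if (ZSTD_isError(csize)) { /* handle and continue, or abort */ }
    }
    ZSTD_freeCCtx(cctx);                   /* free once, at the end */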
@@ -195,279 +224,6 @@
const void* src, size_t srcSize);
-/**************************
-* Simple dictionary API
-***************************/
-/*! ZSTD_compress_usingDict() :
- * Compression at an explicit compression level using a Dictionary.
- * A dictionary can be any arbitrary data segment (also called a prefix),
- * or a buffer with specified information (see dictBuilder/zdict.h).
- * Note : This function loads the dictionary, resulting in significant startup delay.
- * It's intended for a dictionary used only once.
- * Note 2 : When `dict == NULL || dictSize < 8` no dictionary is used. */
-ZSTDLIB_API size_t ZSTD_compress_usingDict(ZSTD_CCtx* ctx,
- void* dst, size_t dstCapacity,
- const void* src, size_t srcSize,
- const void* dict,size_t dictSize,
- int compressionLevel);
-
-/*! ZSTD_decompress_usingDict() :
- * Decompression using a known Dictionary.
- * Dictionary must be identical to the one used during compression.
- * Note : This function loads the dictionary, resulting in significant startup delay.
- * It's intended for a dictionary used only once.
- * Note : When `dict == NULL || dictSize < 8` no dictionary is used. */
-ZSTDLIB_API size_t ZSTD_decompress_usingDict(ZSTD_DCtx* dctx,
- void* dst, size_t dstCapacity,
- const void* src, size_t srcSize,
- const void* dict,size_t dictSize);
-
-
-/***********************************
- * Bulk processing dictionary API
- **********************************/
-typedef struct ZSTD_CDict_s ZSTD_CDict;
-
-/*! ZSTD_createCDict() :
- * When compressing multiple messages / blocks using the same dictionary, it's recommended to load it only once.
- * ZSTD_createCDict() will create a digested dictionary, ready to start future compression operations without startup cost.
- * ZSTD_CDict can be created once and shared by multiple threads concurrently, since its usage is read-only.
- * `dictBuffer` can be released after ZSTD_CDict creation, because its content is copied within CDict.
- * Consider experimental function `ZSTD_createCDict_byReference()` if you prefer to not duplicate `dictBuffer` content.
- * Note : A ZSTD_CDict can be created from an empty dictBuffer, but it is inefficient when used to compress small data. */
-ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict(const void* dictBuffer, size_t dictSize,
- int compressionLevel);
-
-/*! ZSTD_freeCDict() :
- * Function frees memory allocated by ZSTD_createCDict(). */
-ZSTDLIB_API size_t ZSTD_freeCDict(ZSTD_CDict* CDict);
-
-/*! ZSTD_compress_usingCDict() :
- * Compression using a digested Dictionary.
- * Recommended when same dictionary is used multiple times.
- * Note : compression level is _decided at dictionary creation time_,
- * and frame parameters are hardcoded (dictID=yes, contentSize=yes, checksum=no) */
-ZSTDLIB_API size_t ZSTD_compress_usingCDict(ZSTD_CCtx* cctx,
- void* dst, size_t dstCapacity,
- const void* src, size_t srcSize,
- const ZSTD_CDict* cdict);
-
-
-typedef struct ZSTD_DDict_s ZSTD_DDict;
-
-/*! ZSTD_createDDict() :
- * Create a digested dictionary, ready to start decompression operation without startup delay.
- * dictBuffer can be released after DDict creation, as its content is copied inside DDict. */
-ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict(const void* dictBuffer, size_t dictSize);
-
-/*! ZSTD_freeDDict() :
- * Function frees memory allocated with ZSTD_createDDict() */
-ZSTDLIB_API size_t ZSTD_freeDDict(ZSTD_DDict* ddict);
-
-/*! ZSTD_decompress_usingDDict() :
- * Decompression using a digested Dictionary.
- * Recommended when same dictionary is used multiple times. */
-ZSTDLIB_API size_t ZSTD_decompress_usingDDict(ZSTD_DCtx* dctx,
- void* dst, size_t dstCapacity,
- const void* src, size_t srcSize,
- const ZSTD_DDict* ddict);
-
-
-/****************************
-* Streaming
-****************************/
-
-typedef struct ZSTD_inBuffer_s {
- const void* src; /**< start of input buffer */
- size_t size; /**< size of input buffer */
- size_t pos; /**< position where reading stopped. Will be updated. Necessarily 0 <= pos <= size */
-} ZSTD_inBuffer;
-
-typedef struct ZSTD_outBuffer_s {
- void* dst; /**< start of output buffer */
- size_t size; /**< size of output buffer */
- size_t pos; /**< position where writing stopped. Will be updated. Necessarily 0 <= pos <= size */
-} ZSTD_outBuffer;
-
-
-
-/*-***********************************************************************
-* Streaming compression - HowTo
-*
-* A ZSTD_CStream object is required to track streaming operation.
-* Use ZSTD_createCStream() and ZSTD_freeCStream() to create/release resources.
-* ZSTD_CStream objects can be reused multiple times on consecutive compression operations.
-* It is recommended to re-use ZSTD_CStream since it will play nicer with system's memory, by re-using already allocated memory.
-*
-* For parallel execution, use one separate ZSTD_CStream per thread.
-*
-* note : since v1.3.0, ZSTD_CStream and ZSTD_CCtx are the same thing.
-*
-* Parameters are sticky : when starting a new compression on the same context,
-* it will re-use the same sticky parameters as previous compression session.
-* When in doubt, it's recommended to fully initialize the context before usage.
-* Use ZSTD_initCStream() to set the parameter to a selected compression level.
-* Use advanced API (ZSTD_CCtx_setParameter(), etc.) to set more specific parameters.
-*
-* Use ZSTD_compressStream() as many times as necessary to consume input stream.
-* The function will automatically update both `pos` fields within `input` and `output`.
-* Note that the function may not consume the entire input,
-* for example, because the output buffer is already full,
-* in which case `input.pos < input.size`.
-* The caller must check if input has been entirely consumed.
-* If not, the caller must make some room to receive more compressed data,
-* and then present again remaining input data.
-* @return : a size hint, preferred nb of bytes to use as input for next function call
-* or an error code, which can be tested using ZSTD_isError().
-* Note 1 : it's just a hint, to help latency a little, any value will work fine.
-* Note 2 : size hint is guaranteed to be <= ZSTD_CStreamInSize()
-*
-* At any moment, it's possible to flush whatever data might remain stuck within internal buffer,
-* using ZSTD_flushStream(). `output->pos` will be updated.
-* Note that, if `output->size` is too small, a single invocation of ZSTD_flushStream() might not be enough (return code > 0).
-* In which case, make some room to receive more compressed data, and call again ZSTD_flushStream().
-* @return : 0 if internal buffers are entirely flushed,
-* >0 if some data still present within internal buffer (the value is minimal estimation of remaining size),
-* or an error code, which can be tested using ZSTD_isError().
-*
-* ZSTD_endStream() instructs to finish a frame.
-* It will perform a flush and write frame epilogue.
-* The epilogue is required for decoders to consider a frame completed.
-* flush() operation is the same, and follows same rules as ZSTD_flushStream().
-* @return : 0 if frame fully completed and fully flushed,
-* >0 if some data still present within internal buffer (the value is minimal estimation of remaining size),
-* or an error code, which can be tested using ZSTD_isError().
-*
-* *******************************************************************/
-
-typedef ZSTD_CCtx ZSTD_CStream; /**< CCtx and CStream are now effectively same object (>= v1.3.0) */
- /* Continue to distinguish them for compatibility with older versions <= v1.2.0 */
-/*===== ZSTD_CStream management functions =====*/
-ZSTDLIB_API ZSTD_CStream* ZSTD_createCStream(void);
-ZSTDLIB_API size_t ZSTD_freeCStream(ZSTD_CStream* zcs);
-
-/*===== Streaming compression functions =====*/
-ZSTDLIB_API size_t ZSTD_initCStream(ZSTD_CStream* zcs, int compressionLevel);
-ZSTDLIB_API size_t ZSTD_compressStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
-ZSTDLIB_API size_t ZSTD_flushStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output);
-ZSTDLIB_API size_t ZSTD_endStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output);
-
-ZSTDLIB_API size_t ZSTD_CStreamInSize(void); /**< recommended size for input buffer */
-ZSTDLIB_API size_t ZSTD_CStreamOutSize(void); /**< recommended size for output buffer. Guarantee to successfully flush at least one complete compressed block in all circumstances. */
-
-
-
-/*-***************************************************************************
-* Streaming decompression - HowTo
-*
-* A ZSTD_DStream object is required to track streaming operations.
-* Use ZSTD_createDStream() and ZSTD_freeDStream() to create/release resources.
-* ZSTD_DStream objects can be re-used multiple times.
-*
-* Use ZSTD_initDStream() to start a new decompression operation.
-* @return : recommended first input size
-* Alternatively, use advanced API to set specific properties.
-*
-* Use ZSTD_decompressStream() repetitively to consume your input.
-* The function will update both `pos` fields.
-* If `input.pos < input.size`, some input has not been consumed.
-* It's up to the caller to present again remaining data.
-* The function tries to flush all data decoded immediately, respecting output buffer size.
-* If `output.pos < output.size`, decoder has flushed everything it could.
-* But if `output.pos == output.size`, there might be some data left within internal buffers.,
-* In which case, call ZSTD_decompressStream() again to flush whatever remains in the buffer.
-* Note : with no additional input provided, amount of data flushed is necessarily <= ZSTD_BLOCKSIZE_MAX.
-* @return : 0 when a frame is completely decoded and fully flushed,
-* or an error code, which can be tested using ZSTD_isError(),
-* or any other value > 0, which means there is still some decoding or flushing to do to complete current frame :
-* the return value is a suggested next input size (just a hint for better latency)
-* that will never request more than the remaining frame size.
-* *******************************************************************************/
-
-typedef ZSTD_DCtx ZSTD_DStream; /**< DCtx and DStream are now effectively same object (>= v1.3.0) */
- /* For compatibility with versions <= v1.2.0, prefer differentiating them. */
-/*===== ZSTD_DStream management functions =====*/
-ZSTDLIB_API ZSTD_DStream* ZSTD_createDStream(void);
-ZSTDLIB_API size_t ZSTD_freeDStream(ZSTD_DStream* zds);
-
-/*===== Streaming decompression functions =====*/
-ZSTDLIB_API size_t ZSTD_initDStream(ZSTD_DStream* zds);
-ZSTDLIB_API size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
-
-ZSTDLIB_API size_t ZSTD_DStreamInSize(void); /*!< recommended size for input buffer */
-ZSTDLIB_API size_t ZSTD_DStreamOutSize(void); /*!< recommended size for output buffer. Guarantee to successfully flush at least one complete block in all circumstances. */
-
-#endif /* ZSTD_H_235446 */
-
-
-
-
-/****************************************************************************************
- * ADVANCED AND EXPERIMENTAL FUNCTIONS
- ****************************************************************************************
- * The definitions in the following section are considered experimental.
- * They are provided for advanced scenarios.
- * They should never be used with a dynamic library, as prototypes may change in the future.
- * Use them only in association with static linking.
- * ***************************************************************************************/
-
-#if defined(ZSTD_STATIC_LINKING_ONLY) && !defined(ZSTD_H_ZSTD_STATIC_LINKING_ONLY)
-#define ZSTD_H_ZSTD_STATIC_LINKING_ONLY
-
-
-/****************************************************************************************
- * Candidate API for promotion to stable status
- ****************************************************************************************
- * The following symbols and constants form the "staging area" :
- * they are considered to join "stable API" by v1.4.0.
- * The proposal is written so that it can be made stable "as is",
- * though it's still possible to suggest improvements.
- * Staging is in fact last chance for changes,
- * the API is locked once reaching "stable" status.
- * ***************************************************************************************/
-
-
-/* === Constants === */
-
-/* all magic numbers are supposed read/written to/from files/memory using little-endian convention */
-#define ZSTD_MAGICNUMBER 0xFD2FB528 /* valid since v0.8.0 */
-#define ZSTD_MAGIC_DICTIONARY 0xEC30A437 /* valid since v0.7.0 */
-#define ZSTD_MAGIC_SKIPPABLE_START 0x184D2A50 /* all 16 values, from 0x184D2A50 to 0x184D2A5F, signal the beginning of a skippable frame */
-#define ZSTD_MAGIC_SKIPPABLE_MASK 0xFFFFFFF0
-
-#define ZSTD_BLOCKSIZELOG_MAX 17
-#define ZSTD_BLOCKSIZE_MAX (1<<ZSTD_BLOCKSIZELOG_MAX)
-
-
-/* === query limits === */
-
-ZSTDLIB_API int ZSTD_minCLevel(void); /*!< minimum negative compression level allowed */
-
-
-/* === frame size === */
-
-/*! ZSTD_findFrameCompressedSize() :
- * `src` should point to the start of a ZSTD frame or skippable frame.
- * `srcSize` must be >= first frame size
- * @return : the compressed size of the first frame starting at `src`,
- * suitable to pass as `srcSize` to `ZSTD_decompress` or similar,
- * or an error code if input is invalid */
-ZSTDLIB_API size_t ZSTD_findFrameCompressedSize(const void* src, size_t srcSize);
-
-
-/* === Memory management === */
-
-/*! ZSTD_sizeof_*() :
- * These functions give the _current_ memory usage of selected object.
- * Note that object memory usage can evolve (increase or decrease) over time. */
-ZSTDLIB_API size_t ZSTD_sizeof_CCtx(const ZSTD_CCtx* cctx);
-ZSTDLIB_API size_t ZSTD_sizeof_DCtx(const ZSTD_DCtx* dctx);
-ZSTDLIB_API size_t ZSTD_sizeof_CStream(const ZSTD_CStream* zcs);
-ZSTDLIB_API size_t ZSTD_sizeof_DStream(const ZSTD_DStream* zds);
-ZSTDLIB_API size_t ZSTD_sizeof_CDict(const ZSTD_CDict* cdict);
-ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict);
-
-
/***************************************
* Advanced compression API
***************************************/
@@ -503,7 +259,10 @@
typedef enum {
- /* compression parameters */
+ /* compression parameters
+ * Note: When compressing with a ZSTD_CDict these parameters are superseded
+ * by the parameters used to construct the ZSTD_CDict. See ZSTD_CCtx_refCDict()
+ * for more info (superseded-by-cdict). */
ZSTD_c_compressionLevel=100, /* Update all compression parameters according to pre-defined cLevel table
* Default level is ZSTD_CLEVEL_DEFAULT==3.
* Special: value 0 means default, which is controlled by ZSTD_CLEVEL_DEFAULT.
@@ -625,6 +384,8 @@
* ZSTD_c_format
* ZSTD_c_forceMaxWindow
* ZSTD_c_forceAttachDict
+ * ZSTD_c_literalCompressionMode
+ * ZSTD_c_targetCBlockSize
* Because they are not stable, it's necessary to define ZSTD_STATIC_LINKING_ONLY to access them.
* note : never ever use experimentalParam? names directly;
* also, the enums values themselves are unstable and can still change.
@@ -632,10 +393,11 @@
ZSTD_c_experimentalParam1=500,
ZSTD_c_experimentalParam2=10,
ZSTD_c_experimentalParam3=1000,
- ZSTD_c_experimentalParam4=1001
+ ZSTD_c_experimentalParam4=1001,
+ ZSTD_c_experimentalParam5=1002,
+ ZSTD_c_experimentalParam6=1003,
} ZSTD_cParameter;
-
typedef struct {
size_t error;
int lowerBound;
@@ -677,10 +439,443 @@
* Note 3 : Whenever all input data is provided and consumed in a single round,
* for example with ZSTD_compress2(),
* or invoking immediately ZSTD_compressStream2(,,,ZSTD_e_end),
- * this value is automatically overriden by srcSize instead.
+ * this value is automatically overridden by srcSize instead.
*/
ZSTDLIB_API size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long pledgedSrcSize);
+typedef enum {
+ ZSTD_reset_session_only = 1,
+ ZSTD_reset_parameters = 2,
+ ZSTD_reset_session_and_parameters = 3
+} ZSTD_ResetDirective;
+
+/*! ZSTD_CCtx_reset() :
+ * There are 2 different things that can be reset, independently or jointly :
+ * - The session : will stop compressing current frame, and make CCtx ready to start a new one.
+ * Useful after an error, or to interrupt any ongoing compression.
+ * Any internal data not yet flushed is cancelled.
+ * Compression parameters and dictionary remain unchanged.
+ * They will be used to compress next frame.
+ * Resetting session never fails.
+ * - The parameters : changes all parameters back to "default".
+ * This removes any reference to any dictionary too.
+ * Parameters can only be changed between 2 sessions (i.e. no compression is currently ongoing)
+ * otherwise the reset fails, and function returns an error value (which can be tested using ZSTD_isError())
+ * - Both : similar to resetting the session, followed by resetting parameters.
+ */
+ZSTDLIB_API size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, ZSTD_ResetDirective reset);
+
+/*! ZSTD_compress2() :
+ * Behave the same as ZSTD_compressCCtx(), but compression parameters are set using the advanced API.
+ * ZSTD_compress2() always starts a new frame.
+ * Should cctx hold data from a previously unfinished frame, everything about it is forgotten.
+ * - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
+ * - The function is always blocking, returns when compression is completed.
+ * Hint : compression runs faster if `dstCapacity` >= `ZSTD_compressBound(srcSize)`.
+ * @return : compressed size written into `dst` (<= `dstCapacity`),
+ * or an error code if it fails (which can be tested using ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_compress2( ZSTD_CCtx* cctx,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize);
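As a minimal illustration (editor's sketch, not part of the vendored header), the one-shot flow described above is: reset, push parameters, compress. The helper name compress_once and the chosen parameter values are arbitrary.

    #include <zstd.h>

    static size_t compress_once(void* dst, size_t dstCapacity,
                                const void* src, size_t srcSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        size_t cSize;
        /* Drop any sticky parameters or dictionary from earlier sessions. */
        ZSTD_CCtx_reset(cctx, ZSTD_reset_session_and_parameters);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1);
        cSize = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
        ZSTD_freeCCtx(cctx);
        return cSize;  /* compressed size, or an error code (ZSTD_isError()) */
    }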
+
+
+/***************************************
+* Advanced decompression API
+***************************************/
+
+/* The advanced API pushes parameters one by one into an existing DCtx context.
+ * Parameters are sticky, and remain valid for all following frames
+ * using the same DCtx context.
+ * It's possible to reset parameters to default values using ZSTD_DCtx_reset().
+ * Note : This API is compatible with existing ZSTD_decompressDCtx() and ZSTD_decompressStream().
+ * Therefore, no new decompression function is necessary.
+ */
+
+typedef enum {
+
+ ZSTD_d_windowLogMax=100, /* Select a size limit (in power of 2) beyond which
+ * the streaming API will refuse to allocate memory buffer
+ * in order to protect the host from unreasonable memory requirements.
+ * This parameter is only useful in streaming mode, since no internal buffer is allocated in single-pass mode.
+ * By default, a decompression context accepts window sizes <= (1 << ZSTD_WINDOWLOG_LIMIT_DEFAULT).
+ * Special: value 0 means "use default maximum windowLog". */
+
+ /* note : additional experimental parameters are also available
+ * within the experimental section of the API.
+ * At the time of this writing, they include :
+ * ZSTD_c_format
+ * Because they are not stable, it's necessary to define ZSTD_STATIC_LINKING_ONLY to access them.
+ * note : never ever use experimentalParam? names directly
+ */
+ ZSTD_d_experimentalParam1=1000
+
+} ZSTD_dParameter;
+
+/*! ZSTD_dParam_getBounds() :
+ * All parameters must belong to an interval with lower and upper bounds,
+ * otherwise they will either trigger an error or be automatically clamped.
+ * @return : a structure, ZSTD_bounds, which contains
+ * - an error status field, which must be tested using ZSTD_isError()
+ * - both lower and upper bounds, inclusive
+ */
+ZSTDLIB_API ZSTD_bounds ZSTD_dParam_getBounds(ZSTD_dParameter dParam);
+
+/*! ZSTD_DCtx_setParameter() :
+ * Set one decompression parameter, selected by enum ZSTD_dParameter.
+ * All parameters have valid bounds. Bounds can be queried using ZSTD_dParam_getBounds().
+ * Providing a value beyond bound will either clamp it, or trigger an error (depending on parameter).
+ * Setting a parameter is only possible during frame initialization (before starting decompression).
+ * @return : 0, or an error code (which can be tested using ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_DCtx_setParameter(ZSTD_DCtx* dctx, ZSTD_dParameter param, int value);
+
+/*! ZSTD_DCtx_reset() :
+ * Return a DCtx to clean state.
+ * Session and parameters can be reset jointly or separately.
+ * Parameters can only be reset when no active frame is being decompressed.
+ * @return : 0, or an error code, which can be tested with ZSTD_isError()
+ */
+ZSTDLIB_API size_t ZSTD_DCtx_reset(ZSTD_DCtx* dctx, ZSTD_ResetDirective reset);
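A small sketch of the parameter flow above (editor's illustration; configure_dctx is a hypothetical helper): query the bounds for ZSTD_d_windowLogMax, then cap the window the streaming decoder will accept.

    #include <zstd.h>

    static size_t configure_dctx(ZSTD_DCtx* dctx)
    {
        ZSTD_bounds const b = ZSTD_dParam_getBounds(ZSTD_d_windowLogMax);
        if (ZSTD_isError(b.error)) return b.error;
        /* Refuse frames requiring a window larger than 2^27 bytes; the
         * setting is sticky until ZSTD_DCtx_reset(dctx, ZSTD_reset_parameters). */
        return ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 27);
    }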
+
+
+/****************************
+* Streaming
+****************************/
+
+typedef struct ZSTD_inBuffer_s {
+ const void* src; /**< start of input buffer */
+ size_t size; /**< size of input buffer */
+ size_t pos; /**< position where reading stopped. Will be updated. Necessarily 0 <= pos <= size */
+} ZSTD_inBuffer;
+
+typedef struct ZSTD_outBuffer_s {
+ void* dst; /**< start of output buffer */
+ size_t size; /**< size of output buffer */
+ size_t pos; /**< position where writing stopped. Will be updated. Necessarily 0 <= pos <= size */
+} ZSTD_outBuffer;
+
+
+
+/*-***********************************************************************
+* Streaming compression - HowTo
+*
+* A ZSTD_CStream object is required to track streaming operation.
+* Use ZSTD_createCStream() and ZSTD_freeCStream() to create/release resources.
+* ZSTD_CStream objects can be reused multiple times on consecutive compression operations.
+* It is recommended to re-use ZSTD_CStream since it will play nicer with system's memory, by re-using already allocated memory.
+*
+* For parallel execution, use one separate ZSTD_CStream per thread.
+*
+* note : since v1.3.0, ZSTD_CStream and ZSTD_CCtx are the same thing.
+*
+* Parameters are sticky : when starting a new compression on the same context,
+* it will re-use the same sticky parameters as previous compression session.
+* When in doubt, it's recommended to fully initialize the context before usage.
+* Use ZSTD_CCtx_reset() to reset the context and ZSTD_CCtx_setParameter(),
+* ZSTD_CCtx_setPledgedSrcSize(), or ZSTD_CCtx_loadDictionary() and friends to
+* set more specific parameters, the pledged source size, or load a dictionary.
+*
+* Use ZSTD_compressStream2() with ZSTD_e_continue as many times as necessary to
+* consume input stream. The function will automatically update both `pos`
+* fields within `input` and `output`.
+* Note that the function may not consume the entire input, for example, because
+* the output buffer is already full, in which case `input.pos < input.size`.
+* The caller must check if input has been entirely consumed.
+* If not, the caller must make some room to receive more compressed data,
+* and then present again remaining input data.
+* note: ZSTD_e_continue is guaranteed to make some forward progress when called,
+* but doesn't guarantee maximal forward progress. This is especially relevant
+* when compressing with multiple threads. The call won't block if it can
+* consume some input, but if it can't it will wait for some, but not all,
+* output to be flushed.
+* @return : provides a minimum amount of data remaining to be flushed from internal buffers
+* or an error code, which can be tested using ZSTD_isError().
+*
+* At any moment, it's possible to flush whatever data might remain stuck within internal buffer,
+* using ZSTD_compressStream2() with ZSTD_e_flush. `output->pos` will be updated.
+* Note that, if `output->size` is too small, a single invocation with ZSTD_e_flush might not be enough (return code > 0).
+* In which case, make some room to receive more compressed data, and call again ZSTD_compressStream2() with ZSTD_e_flush.
+* You must continue calling ZSTD_compressStream2() with ZSTD_e_flush until it returns 0, at which point you can change the
+* operation.
+* note: ZSTD_e_flush will flush as much output as possible, meaning when compressing with multiple threads, it will
+* block until the flush is complete or the output buffer is full.
+* @return : 0 if internal buffers are entirely flushed,
+* >0 if some data still present within internal buffer (the value is minimal estimation of remaining size),
+* or an error code, which can be tested using ZSTD_isError().
+*
+* Calling ZSTD_compressStream2() with ZSTD_e_end instructs to finish a frame.
+* It will perform a flush and write frame epilogue.
+* The epilogue is required for decoders to consider a frame completed.
+* flush operation is the same, and follows same rules as calling ZSTD_compressStream2() with ZSTD_e_flush.
+* You must continue calling ZSTD_compressStream2() with ZSTD_e_end until it returns 0, at which point you are free to
+* start a new frame.
+* note: ZSTD_e_end will flush as much output as possible, meaning when compressing with multiple threads, it will
+* block until the flush is complete or the output buffer is full.
+* @return : 0 if frame fully completed and fully flushed,
+* >0 if some data still present within internal buffer (the value is minimal estimation of remaining size),
+* or an error code, which can be tested using ZSTD_isError().
+*
+* *******************************************************************/
+
+typedef ZSTD_CCtx ZSTD_CStream; /**< CCtx and CStream are now effectively same object (>= v1.3.0) */
+ /* Continue to distinguish them for compatibility with older versions <= v1.2.0 */
+/*===== ZSTD_CStream management functions =====*/
+ZSTDLIB_API ZSTD_CStream* ZSTD_createCStream(void);
+ZSTDLIB_API size_t ZSTD_freeCStream(ZSTD_CStream* zcs);
+
+/*===== Streaming compression functions =====*/
+typedef enum {
+ ZSTD_e_continue=0, /* collect more data, encoder decides when to output compressed result, for optimal compression ratio */
+ ZSTD_e_flush=1, /* flush any data provided so far,
+ * it creates (at least) one new block, that can be decoded immediately on reception;
+ * frame will continue: any future data can still reference previously compressed data, improving compression.
+ * note : multithreaded compression will block to flush as much output as possible. */
+ ZSTD_e_end=2 /* flush any remaining data _and_ close current frame.
+ * note that frame is only closed after compressed data is fully flushed (return value == 0).
+ * After that point, any additional data starts a new frame.
+ * note : each frame is independent (does not reference any content from previous frame).
+ *                  note : multithreaded compression will block to flush as much output as possible. */
+} ZSTD_EndDirective;
+
+/*! ZSTD_compressStream2() :
+ * Behaves about the same as ZSTD_compressStream, with additional control on end directive.
+ * - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
+ * - Compression parameters cannot be changed once compression is started (save a list of exceptions in multi-threading mode)
+ * - output->pos must be <= dstCapacity, input->pos must be <= srcSize
+ * - output->pos and input->pos will be updated. They are guaranteed to remain below their respective limit.
+ * - When nbWorkers==0 (default), function is blocking : it completes its job before returning to caller.
+ * - When nbWorkers>=1, function is non-blocking : it just acquires a copy of input, distributes jobs to internal worker threads, flushes whatever is available,
+ *   and then immediately returns, just indicating that there is some data remaining to be flushed.
+ *   The function nonetheless guarantees forward progress : it will return only after it reads or writes at least one byte.
+ * - Exception : if the first call requests a ZSTD_e_end directive and provides enough dstCapacity, the function delegates to ZSTD_compress2() which is always blocking.
+ * - @return provides a minimum amount of data remaining to be flushed from internal buffers
+ * or an error code, which can be tested using ZSTD_isError().
+ * if @return != 0, flush is not fully completed, there is still some data left within internal buffers.
+ * This is useful for ZSTD_e_flush, since in this case more flushes are necessary to empty all buffers.
+ * For ZSTD_e_end, @return == 0 when internal buffers are fully flushed and frame is completed.
+ * - after a ZSTD_e_end directive, if internal buffer is not fully flushed (@return != 0),
+ * only ZSTD_e_end or ZSTD_e_flush operations are allowed.
+ * Before starting a new compression job, or changing compression parameters,
+ * it is required to fully flush internal buffers.
+ */
+ZSTDLIB_API size_t ZSTD_compressStream2( ZSTD_CCtx* cctx,
+ ZSTD_outBuffer* output,
+ ZSTD_inBuffer* input,
+ ZSTD_EndDirective endOp);
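Putting the HowTo above together, a sketch of the full streaming loop (editor's example modeled on the protocol described above; not part of the vendored header): feed chunks with ZSTD_e_continue, then drive ZSTD_e_end until it returns 0.

    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    int main(void)
    {
        size_t const inCap  = ZSTD_CStreamInSize();   /* recommended sizes */
        size_t const outCap = ZSTD_CStreamOutSize();
        void*  const inBuf  = malloc(inCap);
        void*  const outBuf = malloc(outCap);
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);

        for (;;) {
            size_t const readSz = fread(inBuf, 1, inCap, stdin);
            /* A short read means EOF : close the frame with ZSTD_e_end. */
            ZSTD_EndDirective const mode = (readSz < inCap) ? ZSTD_e_end : ZSTD_e_continue;
            ZSTD_inBuffer input = { inBuf, readSz, 0 };
            int finished = 0;
            while (!finished) {
                ZSTD_outBuffer output = { outBuf, outCap, 0 };
                size_t const rem = ZSTD_compressStream2(cctx, &output, &input, mode);
                if (ZSTD_isError(rem)) return 1;
                fwrite(outBuf, 1, output.pos, stdout);
                /* ZSTD_e_end is done when it returns 0;
                 * ZSTD_e_continue when all input has been consumed. */
                finished = (mode == ZSTD_e_end) ? (rem == 0)
                                                : (input.pos == input.size);
            }
            if (mode == ZSTD_e_end) break;
        }
        ZSTD_freeCCtx(cctx);
        free(inBuf); free(outBuf);
        return 0;
    }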
+
+
+/* These buffer sizes are softly recommended.
+ * They are not required : ZSTD_compressStream*() happily accepts any buffer size, for both input and output.
+ * Respecting the recommended size just makes it a bit easier for ZSTD_compressStream*(),
+ * reducing the amount of memory shuffling and buffering, resulting in minor performance savings.
+ *
+ * However, note that these recommendations are from the perspective of a C caller program.
+ * If the streaming interface is invoked from some other language,
+ * especially managed ones such as Java or Go, through a foreign function interface such as JNI or cgo,
+ * a major performance rule is to keep crossings of that interface to an absolute minimum.
+ * It's not rare for more time to be spent crossing the interface than on compression itself.
+ * In such cases, prefer buffers as large as practical,
+ * for both input and output, to reduce the number of round trips.
+ */
+ZSTDLIB_API size_t ZSTD_CStreamInSize(void); /**< recommended size for input buffer */
+ZSTDLIB_API size_t ZSTD_CStreamOutSize(void);    /**< recommended size for output buffer. Guaranteed to successfully flush at least one complete compressed block. */
+
+
+/* *****************************************************************************
+ * This following is a legacy streaming API.
+ * It can be replaced by ZSTD_CCtx_reset() and ZSTD_compressStream2().
+ * It is redundant, but remains fully supported.
+ * Advanced parameters and dictionary compression can only be used through the
+ * new API.
+ ******************************************************************************/
+
+/*!
+ * Equivalent to:
+ *
+ * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
+ * ZSTD_CCtx_refCDict(zcs, NULL); // clear the dictionary (if any)
+ * ZSTD_CCtx_setParameter(zcs, ZSTD_c_compressionLevel, compressionLevel);
+ */
+ZSTDLIB_API size_t ZSTD_initCStream(ZSTD_CStream* zcs, int compressionLevel);
+/*!
+ * Alternative for ZSTD_compressStream2(zcs, output, input, ZSTD_e_continue).
+ * NOTE: The return value is different. ZSTD_compressStream() returns a hint for
+ * the next read size (if non-zero and not an error). ZSTD_compressStream2()
+ * returns the minimum nb of bytes left to flush (if non-zero and not an error).
+ */
+ZSTDLIB_API size_t ZSTD_compressStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
+/*! Equivalent to ZSTD_compressStream2(zcs, output, &emptyInput, ZSTD_e_flush). */
+ZSTDLIB_API size_t ZSTD_flushStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output);
+/*! Equivalent to ZSTD_compressStream2(zcs, output, &emptyInput, ZSTD_e_end). */
+ZSTDLIB_API size_t ZSTD_endStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output);
+
+
+/*-***************************************************************************
+* Streaming decompression - HowTo
+*
+* A ZSTD_DStream object is required to track streaming operations.
+* Use ZSTD_createDStream() and ZSTD_freeDStream() to create/release resources.
+* ZSTD_DStream objects can be re-used multiple times.
+*
+* Use ZSTD_initDStream() to start a new decompression operation.
+* @return : recommended first input size
+* Alternatively, use advanced API to set specific properties.
+*
+* Use ZSTD_decompressStream() repetitively to consume your input.
+* The function will update both `pos` fields.
+* If `input.pos < input.size`, some input has not been consumed.
+* It's up to the caller to present again remaining data.
+* The function tries to flush all data decoded immediately, respecting output buffer size.
+* If `output.pos < output.size`, decoder has flushed everything it could.
+* But if `output.pos == output.size`, there might be some data left within internal buffers.
+* In which case, call ZSTD_decompressStream() again to flush whatever remains in the buffer.
+* Note : with no additional input provided, amount of data flushed is necessarily <= ZSTD_BLOCKSIZE_MAX.
+* @return : 0 when a frame is completely decoded and fully flushed,
+* or an error code, which can be tested using ZSTD_isError(),
+* or any other value > 0, which means there is still some decoding or flushing to do to complete current frame :
+* the return value is a suggested next input size (just a hint for better latency)
+* that will never request more than the remaining frame size.
+* *******************************************************************************/
+
+typedef ZSTD_DCtx ZSTD_DStream; /**< DCtx and DStream are now effectively same object (>= v1.3.0) */
+ /* For compatibility with versions <= v1.2.0, prefer differentiating them. */
+/*===== ZSTD_DStream management functions =====*/
+ZSTDLIB_API ZSTD_DStream* ZSTD_createDStream(void);
+ZSTDLIB_API size_t ZSTD_freeDStream(ZSTD_DStream* zds);
+
+/*===== Streaming decompression functions =====*/
+
+/* This function is redundant with the advanced API and equivalent to:
+ *
+ *     ZSTD_DCtx_reset(zds, ZSTD_reset_session_only);
+ * ZSTD_DCtx_refDDict(zds, NULL);
+ */
+ZSTDLIB_API size_t ZSTD_initDStream(ZSTD_DStream* zds);
+
+ZSTDLIB_API size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
+
+ZSTDLIB_API size_t ZSTD_DStreamInSize(void); /*!< recommended size for input buffer */
+ZSTDLIB_API size_t ZSTD_DStreamOutSize(void);    /*!< recommended size for output buffer. Guaranteed to successfully flush at least one complete block in all circumstances. */
+
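The decompression side of the same loop, sketched under the same caveats (editor's example, not part of the vendored header):

    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    int main(void)
    {
        size_t const inCap  = ZSTD_DStreamInSize();
        size_t const outCap = ZSTD_DStreamOutSize();
        void*  const inBuf  = malloc(inCap);
        void*  const outBuf = malloc(outCap);
        ZSTD_DStream* const zds = ZSTD_createDStream();
        ZSTD_initDStream(zds);

        size_t readSz;
        while ((readSz = fread(inBuf, 1, inCap, stdin)) > 0) {
            ZSTD_inBuffer input = { inBuf, readSz, 0 };
            while (input.pos < input.size) {
                ZSTD_outBuffer output = { outBuf, outCap, 0 };
                size_t const ret = ZSTD_decompressStream(zds, &output, &input);
                if (ZSTD_isError(ret)) return 1;
                fwrite(outBuf, 1, output.pos, stdout);
            }
        }
        ZSTD_freeDStream(zds);
        free(inBuf); free(outBuf);
        return 0;
    }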
+
+/**************************
+* Simple dictionary API
+***************************/
+/*! ZSTD_compress_usingDict() :
+ * Compression at an explicit compression level using a Dictionary.
+ * A dictionary can be any arbitrary data segment (also called a prefix),
+ * or a buffer with specified information (see dictBuilder/zdict.h).
+ * Note : This function loads the dictionary, resulting in significant startup delay.
+ * It's intended for a dictionary used only once.
+ * Note 2 : When `dict == NULL || dictSize < 8` no dictionary is used. */
+ZSTDLIB_API size_t ZSTD_compress_usingDict(ZSTD_CCtx* ctx,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize,
+                              const void* dict, size_t dictSize,
+ int compressionLevel);
+
+/*! ZSTD_decompress_usingDict() :
+ * Decompression using a known Dictionary.
+ * Dictionary must be identical to the one used during compression.
+ * Note : This function loads the dictionary, resulting in significant startup delay.
+ * It's intended for a dictionary used only once.
+ * Note : When `dict == NULL || dictSize < 8` no dictionary is used. */
+ZSTDLIB_API size_t ZSTD_decompress_usingDict(ZSTD_DCtx* dctx,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize,
+                                            const void* dict, size_t dictSize);
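A hedged round-trip sketch for the one-shot dictionary API (editor's illustration; the helper name and level are arbitrary, and `dict` would typically come from dictBuilder/zdict.h training):

    #include <zstd.h>

    static size_t dict_roundtrip(void* scratch, size_t scratchCap,
                                 void* out, size_t outCap,
                                 const void* src, size_t srcSize,
                                 const void* dict, size_t dictSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        ZSTD_DCtx* const dctx = ZSTD_createDCtx();
        size_t const cSize = ZSTD_compress_usingDict(cctx, scratch, scratchCap,
                                                     src, srcSize,
                                                     dict, dictSize, 3);
        size_t const dSize = ZSTD_isError(cSize) ? cSize
            : ZSTD_decompress_usingDict(dctx, out, outCap,
                                        scratch, cSize, dict, dictSize);
        ZSTD_freeCCtx(cctx);
        ZSTD_freeDCtx(dctx);
        return dSize;  /* decompressed size, or an error code */
    }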
+
+
+/***********************************
+ * Bulk processing dictionary API
+ **********************************/
+typedef struct ZSTD_CDict_s ZSTD_CDict;
+
+/*! ZSTD_createCDict() :
+ * When compressing multiple messages / blocks using the same dictionary, it's recommended to load it only once.
+ * ZSTD_createCDict() will create a digested dictionary, ready to start future compression operations without startup cost.
+ * ZSTD_CDict can be created once and shared by multiple threads concurrently, since its usage is read-only.
+ * `dictBuffer` can be released after ZSTD_CDict creation, because its content is copied within CDict.
+ * Consider experimental function `ZSTD_createCDict_byReference()` if you prefer to not duplicate `dictBuffer` content.
+ * Note : A ZSTD_CDict can be created from an empty dictBuffer, but it is inefficient when used to compress small data. */
+ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict(const void* dictBuffer, size_t dictSize,
+ int compressionLevel);
+
+/*! ZSTD_freeCDict() :
+ * Function frees memory allocated by ZSTD_createCDict(). */
+ZSTDLIB_API size_t ZSTD_freeCDict(ZSTD_CDict* CDict);
+
+/*! ZSTD_compress_usingCDict() :
+ * Compression using a digested Dictionary.
+ * Recommended when same dictionary is used multiple times.
+ * Note : compression level is _decided at dictionary creation time_,
+ * and frame parameters are hardcoded (dictID=yes, contentSize=yes, checksum=no) */
+ZSTDLIB_API size_t ZSTD_compress_usingCDict(ZSTD_CCtx* cctx,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize,
+ const ZSTD_CDict* cdict);
+
+
+typedef struct ZSTD_DDict_s ZSTD_DDict;
+
+/*! ZSTD_createDDict() :
+ * Create a digested dictionary, ready to start decompression operation without startup delay.
+ * dictBuffer can be released after DDict creation, as its content is copied inside DDict. */
+ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict(const void* dictBuffer, size_t dictSize);
+
+/*! ZSTD_freeDDict() :
+ * Function frees memory allocated with ZSTD_createDDict() */
+ZSTDLIB_API size_t ZSTD_freeDDict(ZSTD_DDict* ddict);
+
+/*! ZSTD_decompress_usingDDict() :
+ * Decompression using a digested Dictionary.
+ * Recommended when same dictionary is used multiple times. */
+ZSTDLIB_API size_t ZSTD_decompress_usingDDict(ZSTD_DCtx* dctx,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize,
+ const ZSTD_DDict* ddict);
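For the bulk API, the point is to pay the dictionary-digest cost only once. A sketch (editor's illustration; compress_with_cdict is a hypothetical helper):

    #include <zstd.h>

    /* The CDict is created once, e.g. at startup, with ZSTD_createCDict();
     * it is read-only and can be shared across threads. */
    static size_t compress_with_cdict(const ZSTD_CDict* cdict,
                                      void* dst, size_t dstCapacity,
                                      const void* msg, size_t msgSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        size_t const cSize = ZSTD_compress_usingCDict(cctx, dst, dstCapacity,
                                                      msg, msgSize, cdict);
        ZSTD_freeCCtx(cctx);
        return cSize;
    }

In a real service the CCtx would also be reused across messages rather than created per call, as the HowTo sections above recommend.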
+
+
+/********************************
+ * Dictionary helper functions
+ *******************************/
+
+/*! ZSTD_getDictID_fromDict() :
+ * Provides the dictID stored within dictionary.
+ * if @return == 0, the dictionary is not conformant with Zstandard specification.
+ * It can still be loaded, but as a content-only dictionary. */
+ZSTDLIB_API unsigned ZSTD_getDictID_fromDict(const void* dict, size_t dictSize);
+
+/*! ZSTD_getDictID_fromDDict() :
+ * Provides the dictID of the dictionary loaded into `ddict`.
+ * If @return == 0, the dictionary is not conformant to Zstandard specification, or empty.
+ * Non-conformant dictionaries can still be loaded, but as content-only dictionaries. */
+ZSTDLIB_API unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict);
+
+/*! ZSTD_getDictID_fromFrame() :
+ * Provides the dictID required to decompress the frame stored within `src`.
+ * If @return == 0, the dictID could not be decoded.
+ * This could be for one of the following reasons :
+ * - The frame does not require a dictionary to be decoded (most common case).
+ * - The frame was built with dictID intentionally removed. Whatever dictionary is necessary is hidden information.
+ * Note : this use case also happens when using a non-conformant dictionary.
+ * - `srcSize` is too small, and as a result, the frame header could not be decoded (only possible if `srcSize < ZSTD_FRAMEHEADERSIZE_MAX`).
+ * - This is not a Zstandard frame.
+ * When identifying the exact failure cause, it's possible to use ZSTD_getFrameHeader(), which will provide a more precise error code. */
+ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const void* src, size_t srcSize);
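These helpers compose naturally into dictionary routing: read the dictID from the frame header, then pick the matching digested dictionary. A hypothetical sketch (pick_ddict is not part of the header):

    #include <stddef.h>
    #include <zstd.h>

    static const ZSTD_DDict* pick_ddict(const ZSTD_DDict* const* ddicts, size_t n,
                                        const void* frame, size_t frameSize)
    {
        unsigned const id = ZSTD_getDictID_fromFrame(frame, frameSize);
        size_t i;
        if (id == 0) return NULL;  /* no dictID : none needed, or content-only */
        for (i = 0; i < n; i++) {
            if (ZSTD_getDictID_fromDDict(ddicts[i]) == id) return ddicts[i];
        }
        return NULL;
    }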
+
+
+/*******************************************************************************
+ * Advanced dictionary and prefix API
+ *
+ * This API allows dictionaries to be used with ZSTD_compress2(),
+ * ZSTD_compressStream2(), and ZSTD_decompress(). Dictionaries are sticky, and
+ * only reset when the context is reset with ZSTD_reset_parameters or
+ * ZSTD_reset_session_and_parameters. Prefixes are single-use.
+ ******************************************************************************/
+
+
/*! ZSTD_CCtx_loadDictionary() :
* Create an internal CDict from `dict` buffer.
* Decompression will have to use same dictionary.
@@ -703,7 +898,9 @@
/*! ZSTD_CCtx_refCDict() :
* Reference a prepared dictionary, to be used for all next compressed frames.
* Note that compression parameters are enforced from within CDict,
- * and supercede any compression parameter previously set within CCtx.
+ * and supersede any compression parameter previously set within CCtx.
+ * The parameters ignored are labeled as "superseded-by-cdict" in the ZSTD_cParameter enum docs.
+ * The ignored parameters will be used again if the CCtx is returned to no-dictionary mode.
* The dictionary will remain valid for future compressed frames using same CCtx.
* @result : 0, or an error code (which can be tested with ZSTD_isError()).
* Special : Referencing a NULL CDict means "return to no-dictionary mode".
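A short sketch of the sticky-reference behavior described above (editor's example, assuming only the declarations shown here):

    #include <zstd.h>

    static size_t compress_referencing(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict,
                                       void* dst, size_t dstCapacity,
                                       const void* src, size_t srcSize)
    {
        size_t cSize;
        ZSTD_CCtx_refCDict(cctx, cdict);   /* sticky : applies to all next frames */
        cSize = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
        ZSTD_CCtx_refCDict(cctx, NULL);    /* return to no-dictionary mode */
        return cSize;
    }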
@@ -733,136 +930,6 @@
ZSTDLIB_API size_t ZSTD_CCtx_refPrefix(ZSTD_CCtx* cctx,
const void* prefix, size_t prefixSize);
-
-typedef enum {
- ZSTD_reset_session_only = 1,
- ZSTD_reset_parameters = 2,
- ZSTD_reset_session_and_parameters = 3
-} ZSTD_ResetDirective;
-
-/*! ZSTD_CCtx_reset() :
- * There are 2 different things that can be reset, independently or jointly :
- * - The session : will stop compressing current frame, and make CCtx ready to start a new one.
- * Useful after an error, or to interrupt any ongoing compression.
- * Any internal data not yet flushed is cancelled.
- * Compression parameters and dictionary remain unchanged.
- * They will be used to compress next frame.
- * Resetting session never fails.
- * - The parameters : changes all parameters back to "default".
- * This removes any reference to any dictionary too.
- * Parameters can only be changed between 2 sessions (i.e. no compression is currently ongoing)
- * otherwise the reset fails, and function returns an error value (which can be tested using ZSTD_isError())
- * - Both : similar to resetting the session, followed by resetting parameters.
- */
-ZSTDLIB_API size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, ZSTD_ResetDirective reset);
-
-
-
-/*! ZSTD_compress2() :
- * Behave the same as ZSTD_compressCCtx(), but compression parameters are set using the advanced API.
- * ZSTD_compress2() always starts a new frame.
- * Should cctx hold data from a previously unfinished frame, everything about it is forgotten.
- * - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
- * - The function is always blocking, returns when compression is completed.
- * Hint : compression runs faster if `dstCapacity` >= `ZSTD_compressBound(srcSize)`.
- * @return : compressed size written into `dst` (<= `dstCapacity),
- * or an error code if it fails (which can be tested using ZSTD_isError()).
- */
-ZSTDLIB_API size_t ZSTD_compress2( ZSTD_CCtx* cctx,
- void* dst, size_t dstCapacity,
- const void* src, size_t srcSize);
-
-typedef enum {
- ZSTD_e_continue=0, /* collect more data, encoder decides when to output compressed result, for optimal compression ratio */
- ZSTD_e_flush=1, /* flush any data provided so far,
- * it creates (at least) one new block, that can be decoded immediately on reception;
- * frame will continue: any future data can still reference previously compressed data, improving compression. */
- ZSTD_e_end=2 /* flush any remaining data _and_ close current frame.
- * note that frame is only closed after compressed data is fully flushed (return value == 0).
- * After that point, any additional data starts a new frame.
- * note : each frame is independent (does not reference any content from previous frame). */
-} ZSTD_EndDirective;
-
-/*! ZSTD_compressStream2() :
- * Behaves about the same as ZSTD_compressStream, with additional control on end directive.
- * - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
- * - Compression parameters cannot be changed once compression is started (save a list of exceptions in multi-threading mode)
- * - outpot->pos must be <= dstCapacity, input->pos must be <= srcSize
- * - outpot->pos and input->pos will be updated. They are guaranteed to remain below their respective limit.
- * - When nbWorkers==0 (default), function is blocking : it completes its job before returning to caller.
- * - When nbWorkers>=1, function is non-blocking : it just acquires a copy of input, and distributes jobs to internal worker threads, flush whatever is available,
- * and then immediately returns, just indicating that there is some data remaining to be flushed.
- * The function nonetheless guarantees forward progress : it will return only after it reads or write at least 1+ byte.
- * - Exception : if the first call requests a ZSTD_e_end directive and provides enough dstCapacity, the function delegates to ZSTD_compress2() which is always blocking.
- * - @return provides a minimum amount of data remaining to be flushed from internal buffers
- * or an error code, which can be tested using ZSTD_isError().
- * if @return != 0, flush is not fully completed, there is still some data left within internal buffers.
- * This is useful for ZSTD_e_flush, since in this case more flushes are necessary to empty all buffers.
- * For ZSTD_e_end, @return == 0 when internal buffers are fully flushed and frame is completed.
- * - after a ZSTD_e_end directive, if internal buffer is not fully flushed (@return != 0),
- * only ZSTD_e_end or ZSTD_e_flush operations are allowed.
- * Before starting a new compression job, or changing compression parameters,
- * it is required to fully flush internal buffers.
- */
-ZSTDLIB_API size_t ZSTD_compressStream2( ZSTD_CCtx* cctx,
- ZSTD_outBuffer* output,
- ZSTD_inBuffer* input,
- ZSTD_EndDirective endOp);
-
-
-
-/* ============================== */
-/* Advanced decompression API */
-/* ============================== */
-
-/* The advanced API pushes parameters one by one into an existing DCtx context.
- * Parameters are sticky, and remain valid for all following frames
- * using the same DCtx context.
- * It's possible to reset parameters to default values using ZSTD_DCtx_reset().
- * Note : This API is compatible with existing ZSTD_decompressDCtx() and ZSTD_decompressStream().
- * Therefore, no new decompression function is necessary.
- */
-
-
-typedef enum {
-
- ZSTD_d_windowLogMax=100, /* Select a size limit (in power of 2) beyond which
- * the streaming API will refuse to allocate memory buffer
- * in order to protect the host from unreasonable memory requirements.
- * This parameter is only useful in streaming mode, since no internal buffer is allocated in single-pass mode.
- * By default, a decompression context accepts window sizes <= (1 << ZSTD_WINDOWLOG_LIMIT_DEFAULT) */
-
- /* note : additional experimental parameters are also available
- * within the experimental section of the API.
- * At the time of this writing, they include :
- * ZSTD_c_format
- * Because they are not stable, it's necessary to define ZSTD_STATIC_LINKING_ONLY to access them.
- * note : never ever use experimentalParam? names directly
- */
- ZSTD_d_experimentalParam1=1000
-
-} ZSTD_dParameter;
-
-
-/*! ZSTD_dParam_getBounds() :
- * All parameters must belong to an interval with lower and upper bounds,
- * otherwise they will either trigger an error or be automatically clamped.
- * @return : a structure, ZSTD_bounds, which contains
- * - an error status field, which must be tested using ZSTD_isError()
- * - both lower and upper bounds, inclusive
- */
-ZSTDLIB_API ZSTD_bounds ZSTD_dParam_getBounds(ZSTD_dParameter dParam);
-
-/*! ZSTD_DCtx_setParameter() :
- * Set one compression parameter, selected by enum ZSTD_dParameter.
- * All parameters have valid bounds. Bounds can be queried using ZSTD_dParam_getBounds().
- * Providing a value beyond bound will either clamp it, or trigger an error (depending on parameter).
- * Setting a parameter is only possible during frame initialization (before starting decompression).
- * @return : 0, or an error code (which can be tested using ZSTD_isError()).
- */
-ZSTDLIB_API size_t ZSTD_DCtx_setParameter(ZSTD_DCtx* dctx, ZSTD_dParameter param, int value);
-
-
/*! ZSTD_DCtx_loadDictionary() :
* Create an internal DDict from dict buffer,
* to be used to decompress next frames.
@@ -910,15 +977,32 @@
ZSTDLIB_API size_t ZSTD_DCtx_refPrefix(ZSTD_DCtx* dctx,
const void* prefix, size_t prefixSize);
-/*! ZSTD_DCtx_reset() :
- * Return a DCtx to clean state.
- * Session and parameters can be reset jointly or separately.
- * Parameters can only be reset when no active frame is being decompressed.
- * @return : 0, or an error code, which can be tested with ZSTD_isError()
- */
-ZSTDLIB_API size_t ZSTD_DCtx_reset(ZSTD_DCtx* dctx, ZSTD_ResetDirective reset);
+/* === Memory management === */
+
+/*! ZSTD_sizeof_*() :
+ * These functions give the _current_ memory usage of selected object.
+ * Note that object memory usage can evolve (increase or decrease) over time. */
+ZSTDLIB_API size_t ZSTD_sizeof_CCtx(const ZSTD_CCtx* cctx);
+ZSTDLIB_API size_t ZSTD_sizeof_DCtx(const ZSTD_DCtx* dctx);
+ZSTDLIB_API size_t ZSTD_sizeof_CStream(const ZSTD_CStream* zcs);
+ZSTDLIB_API size_t ZSTD_sizeof_DStream(const ZSTD_DStream* zds);
+ZSTDLIB_API size_t ZSTD_sizeof_CDict(const ZSTD_CDict* cdict);
+ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict);
+
+#endif /* ZSTD_H_235446 */
+/* **************************************************************************************
+ * ADVANCED AND EXPERIMENTAL FUNCTIONS
+ ****************************************************************************************
+ * The definitions in the following section are considered experimental.
+ * They are provided for advanced scenarios.
+ * They should never be used with a dynamic library, as prototypes may change in the future.
+ * Use them only in association with static linking.
+ * ***************************************************************************************/
+
+#if defined(ZSTD_STATIC_LINKING_ONLY) && !defined(ZSTD_H_ZSTD_STATIC_LINKING_ONLY)
+#define ZSTD_H_ZSTD_STATIC_LINKING_ONLY
/****************************************************************************************
* experimental API (static linking only)
@@ -962,7 +1046,7 @@
#define ZSTD_WINDOWLOG_LIMIT_DEFAULT 27 /* by default, the streaming decoder will refuse any frame
* requiring larger than (1<<ZSTD_WINDOWLOG_LIMIT_DEFAULT) window size,
* to preserve host's memory from unreasonable requirements.
- * This limit can be overriden using ZSTD_DCtx_setParameter(,ZSTD_d_windowLogMax,).
+ * This limit can be overridden using ZSTD_DCtx_setParameter(,ZSTD_d_windowLogMax,).
* The limit does not apply for one-pass decoders (such as ZSTD_decompress()), since no additional memory is allocated */
@@ -976,6 +1060,10 @@
#define ZSTD_LDM_HASHRATELOG_MIN 0
#define ZSTD_LDM_HASHRATELOG_MAX (ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN)
+/* Advanced parameter bounds */
+#define ZSTD_TARGETCBLOCKSIZE_MIN 64
+#define ZSTD_TARGETCBLOCKSIZE_MAX ZSTD_BLOCKSIZE_MAX
+
/* internal */
#define ZSTD_HASHLOG3_MAX 17
@@ -1064,15 +1152,24 @@
ZSTD_dictForceCopy = 2, /* Always copy the dictionary. */
} ZSTD_dictAttachPref_e;
+typedef enum {
+  ZSTD_lcm_auto = 0,          /**< Automatically determine the literals compression mode based on the compression level.
+                               *   Negative compression levels will emit uncompressed literals, and positive compression
+                               *   levels will have their literals compressed. */
+ ZSTD_lcm_huffman = 1, /**< Always attempt Huffman compression. Uncompressed literals will still be
+ * emitted if Huffman compression is not profitable. */
+ ZSTD_lcm_uncompressed = 2, /**< Always emit uncompressed literals. */
+} ZSTD_literalCompressionMode_e;
+
/***************************************
* Frame size functions
***************************************/
/*! ZSTD_findDecompressedSize() :
- * `src` should point the start of a series of ZSTD encoded and/or skippable frames
+ * `src` should point to the start of a series of ZSTD encoded and/or skippable frames
* `srcSize` must be the _exact_ size of this series
- * (i.e. there should be a frame boundary exactly at `srcSize` bytes after `src`)
+ * (i.e. there should be a frame boundary at `src + srcSize`)
* @return : - decompressed size of all data in all successive frames
* - if the decompressed size cannot be determined: ZSTD_CONTENTSIZE_UNKNOWN
* - if an error occurred: ZSTD_CONTENTSIZE_ERROR
@@ -1092,6 +1189,21 @@
* however it does mean that all frame data must be present and valid. */
ZSTDLIB_API unsigned long long ZSTD_findDecompressedSize(const void* src, size_t srcSize);
+/*! ZSTD_decompressBound() :
+ * `src` should point to the start of a series of ZSTD encoded and/or skippable frames
+ * `srcSize` must be the _exact_ size of this series
+ * (i.e. there should be a frame boundary at `src + srcSize`)
+ * @return : - upper-bound for the decompressed size of all data in all successive frames
+ * - if an error occurred: ZSTD_CONTENTSIZE_ERROR
+ *
+ * note 1 : an error can occur if `src` contains an invalid or incorrectly formatted frame.
+ * note 2 : the upper-bound is exact when the decompressed size field is available in every ZSTD encoded frame of `src`.
+ * in this case, `ZSTD_findDecompressedSize` and `ZSTD_decompressBound` return the same value.
+ * note 3 : when the decompressed size field isn't available, the upper-bound for that frame is calculated by:
+ * upper-bound = # blocks * min(128 KB, Window_Size)
+ */
+ZSTDLIB_API unsigned long long ZSTD_decompressBound(const void* src, size_t srcSize);
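A sketch of the intended use (editor's illustration): size the destination buffer from the bound before one-pass decompression. ZSTD_decompressBound() is experimental, so ZSTD_STATIC_LINKING_ONLY must be defined; alloc_dst_for is a hypothetical helper, and a real caller would also cap the result against available memory.

    #define ZSTD_STATIC_LINKING_ONLY
    #include <stdlib.h>
    #include <zstd.h>

    static void* alloc_dst_for(const void* src, size_t srcSize, size_t* dstCapacity)
    {
        unsigned long long const bound = ZSTD_decompressBound(src, srcSize);
        if (bound == ZSTD_CONTENTSIZE_ERROR) return NULL;  /* invalid frame(s) */
        *dstCapacity = (size_t)bound;
        return malloc(*dstCapacity);
    }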
+
/*! ZSTD_frameHeaderSize() :
* srcSize must be >= ZSTD_FRAMEHEADERSIZE_PREFIX.
* @return : size of the Frame Header,
@@ -1110,7 +1222,7 @@
* It will also consider src size to be arbitrarily "large", which is worst case.
* If srcSize is known to always be small, ZSTD_estimateCCtxSize_usingCParams() can provide a tighter estimation.
* ZSTD_estimateCCtxSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
- * ZSTD_estimateCCtxSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_c_nbWorkers is >= 1.
+ * ZSTD_estimateCCtxSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParams_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_c_nbWorkers is >= 1.
* Note : CCtx size estimation is only correct for single-threaded compression. */
ZSTDLIB_API size_t ZSTD_estimateCCtxSize(int compressionLevel);
ZSTDLIB_API size_t ZSTD_estimateCCtxSize_usingCParams(ZSTD_compressionParameters cParams);
@@ -1122,7 +1234,7 @@
* It will also consider src size to be arbitrarily "large", which is worst case.
* If srcSize is known to always be small, ZSTD_estimateCStreamSize_usingCParams() can provide a tighter estimation.
* ZSTD_estimateCStreamSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
- * ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_c_nbWorkers is >= 1.
+ * ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParams_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_c_nbWorkers is >= 1.
* Note : CStream size estimation is only correct for single-threaded compression.
* ZSTD_DStream memory budget depends on window Size.
* This information can be passed manually, using ZSTD_estimateDStreamSize,
@@ -1226,22 +1338,26 @@
ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict_byReference(const void* dictBuffer, size_t dictSize, int compressionLevel);
/*! ZSTD_getCParams() :
-* @return ZSTD_compressionParameters structure for a selected compression level and estimated srcSize.
-* `estimatedSrcSize` value is optional, select 0 if not known */
+ * @return ZSTD_compressionParameters structure for a selected compression level and estimated srcSize.
+ * `estimatedSrcSize` value is optional, select 0 if not known */
ZSTDLIB_API ZSTD_compressionParameters ZSTD_getCParams(int compressionLevel, unsigned long long estimatedSrcSize, size_t dictSize);
/*! ZSTD_getParams() :
-* same as ZSTD_getCParams(), but @return a full `ZSTD_parameters` object instead of sub-component `ZSTD_compressionParameters`.
-* All fields of `ZSTD_frameParameters` are set to default : contentSize=1, checksum=0, noDictID=0 */
+ * same as ZSTD_getCParams(), but @return a full `ZSTD_parameters` object instead of sub-component `ZSTD_compressionParameters`.
+ * All fields of `ZSTD_frameParameters` are set to default : contentSize=1, checksum=0, noDictID=0 */
ZSTDLIB_API ZSTD_parameters ZSTD_getParams(int compressionLevel, unsigned long long estimatedSrcSize, size_t dictSize);
/*! ZSTD_checkCParams() :
-* Ensure param values remain within authorized range */
+ * Ensure param values remain within authorized range.
+ * @return 0 on success, or an error code (can be checked with ZSTD_isError()) */
ZSTDLIB_API size_t ZSTD_checkCParams(ZSTD_compressionParameters params);
/*! ZSTD_adjustCParams() :
* optimize params for a given `srcSize` and `dictSize`.
- * both values are optional, select `0` if unknown. */
+ * `srcSize` can be unknown, in which case use ZSTD_CONTENTSIZE_UNKNOWN.
+ * `dictSize` must be `0` when there is no dictionary.
+ * cPar can be invalid : all parameters will be clamped within valid range in the @return struct.
+ * This function never fails (wide contract) */
ZSTDLIB_API ZSTD_compressionParameters ZSTD_adjustCParams(ZSTD_compressionParameters cPar, unsigned long long srcSize, size_t dictSize);
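A sketch of these cParams helpers in combination (editor's illustration, arbitrary values):

    #include <zstd.h>

    static ZSTD_compressionParameters params_for_small_input(void)
    {
        /* Derive parameters for level 3 and a ~4 KiB input, no dictionary. */
        ZSTD_compressionParameters cp = ZSTD_getCParams(3, 4096, 0);
        cp = ZSTD_adjustCParams(cp, 4096, 0);   /* clamp to valid ranges */
        /* ZSTD_checkCParams(cp) would now return 0 (success). */
        return cp;
    }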
/*! ZSTD_compress_advanced() :
@@ -1314,6 +1430,17 @@
* See the comments on that enum for an explanation of the feature. */
#define ZSTD_c_forceAttachDict ZSTD_c_experimentalParam4
+/* Controls how the literals are compressed (default is auto).
+ * The value must be of type ZSTD_literalCompressionMode_e.
+ * See the ZSTD_literalCompressionMode_e enum definition for details.
+ */
+#define ZSTD_c_literalCompressionMode ZSTD_c_experimentalParam5
+
+/* Tries to fit compressed block size to be around targetCBlockSize.
+ * No target when targetCBlockSize == 0.
+ * There is no guarantee on compressed block size (default:0) */
+#define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
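A sketch combining the two new experimental parameters (editor's illustration; the values are arbitrary, and both parameters require static linking and may change in future releases):

    #define ZSTD_STATIC_LINKING_ONLY
    #include <zstd.h>

    static void tune_cctx(ZSTD_CCtx* cctx)
    {
        /* Never compress literals; trades ratio for decompression speed. */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_literalCompressionMode,
                               ZSTD_lcm_uncompressed);
        /* Aim for ~1 KiB compressed blocks (best effort, no guarantee). */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_targetCBlockSize, 1024);
    }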
+
/*! ZSTD_CCtx_getParameter() :
* Get the requested compression parameter value, selected by enum ZSTD_cParameter,
* and store it into int* value.
@@ -1325,10 +1452,10 @@
/*! ZSTD_CCtx_params :
* Quick howto :
* - ZSTD_createCCtxParams() : Create a ZSTD_CCtx_params structure
- * - ZSTD_CCtxParam_setParameter() : Push parameters one by one into
- * an existing ZSTD_CCtx_params structure.
- * This is similar to
- * ZSTD_CCtx_setParameter().
+ * - ZSTD_CCtxParams_setParameter() : Push parameters one by one into
+ * an existing ZSTD_CCtx_params structure.
+ * This is similar to
+ * ZSTD_CCtx_setParameter().
* - ZSTD_CCtx_setParametersUsingCCtxParams() : Apply parameters to
* an existing CCtx.
* These parameters will be applied to
@@ -1359,20 +1486,20 @@
*/
ZSTDLIB_API size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* cctxParams, ZSTD_parameters params);
-/*! ZSTD_CCtxParam_setParameter() :
+/*! ZSTD_CCtxParams_setParameter() :
* Similar to ZSTD_CCtx_setParameter.
* Set one compression parameter, selected by enum ZSTD_cParameter.
* Parameters must be applied to a ZSTD_CCtx using ZSTD_CCtx_setParametersUsingCCtxParams().
* @result : 0, or an error code (which can be tested with ZSTD_isError()).
*/
-ZSTDLIB_API size_t ZSTD_CCtxParam_setParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int value);
+ZSTDLIB_API size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int value);
-/*! ZSTD_CCtxParam_getParameter() :
+/*! ZSTD_CCtxParams_getParameter() :
* Similar to ZSTD_CCtx_getParameter.
* Get the requested value of one compression parameter, selected by enum ZSTD_cParameter.
* @result : 0, or an error code (which can be tested with ZSTD_isError()).
*/
-ZSTDLIB_API size_t ZSTD_CCtxParam_getParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int* value);
+ZSTDLIB_API size_t ZSTD_CCtxParams_getParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int* value);
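A sketch of the quick-howto above (editor's illustration): build a parameter set once, then apply it to any CCtx. Note that ZSTD_c_nbWorkers only takes effect in multithread-enabled builds, so the call may report an error otherwise.

    #define ZSTD_STATIC_LINKING_ONLY
    #include <zstd.h>

    static size_t apply_params(ZSTD_CCtx* cctx)
    {
        ZSTD_CCtx_params* const p = ZSTD_createCCtxParams();
        size_t ret;
        ZSTD_CCtxParams_setParameter(p, ZSTD_c_compressionLevel, 6);
        ZSTD_CCtxParams_setParameter(p, ZSTD_c_nbWorkers, 2);
        ret = ZSTD_CCtx_setParametersUsingCCtxParams(cctx, p);
        ZSTD_freeCCtxParams(p);
        return ret;  /* 0, or an error code (ZSTD_isError()) */
    }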
/*! ZSTD_CCtx_setParametersUsingCCtxParams() :
* Apply a set of ZSTD_CCtx_params to the compression context.
@@ -1415,31 +1542,6 @@
* it must remain read accessible throughout the lifetime of DDict */
ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict_byReference(const void* dictBuffer, size_t dictSize);
-
-/*! ZSTD_getDictID_fromDict() :
- * Provides the dictID stored within dictionary.
- * if @return == 0, the dictionary is not conformant with Zstandard specification.
- * It can still be loaded, but as a content-only dictionary. */
-ZSTDLIB_API unsigned ZSTD_getDictID_fromDict(const void* dict, size_t dictSize);
-
-/*! ZSTD_getDictID_fromDDict() :
- * Provides the dictID of the dictionary loaded into `ddict`.
- * If @return == 0, the dictionary is not conformant to Zstandard specification, or empty.
- * Non-conformant dictionaries can still be loaded, but as content-only dictionaries. */
-ZSTDLIB_API unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict);
-
-/*! ZSTD_getDictID_fromFrame() :
- * Provides the dictID required to decompressed the frame stored within `src`.
- * If @return == 0, the dictID could not be decoded.
- * This could for one of the following reasons :
- * - The frame does not require a dictionary to be decoded (most common case).
- * - The frame was built with dictID intentionally removed. Whatever dictionary is necessary is a hidden information.
- * Note : this use case also happens when using a non-conformant dictionary.
- * - `srcSize` is too small, and as a result, the frame header could not be decoded (only possible if `srcSize < ZSTD_FRAMEHEADERSIZE_MAX`).
- * - This is not a Zstandard frame.
- * When identifying the exact failure cause, it's possible to use ZSTD_getFrameHeader(), which will provide a more precise error code. */
-ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const void* src, size_t srcSize);
-
/*! ZSTD_DCtx_loadDictionary_byReference() :
* Same as ZSTD_DCtx_loadDictionary(),
* but references `dict` content instead of copying it into `dctx`.
@@ -1501,14 +1603,68 @@
********************************************************************/
/*===== Advanced Streaming compression functions =====*/
-ZSTDLIB_API size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs, int compressionLevel, unsigned long long pledgedSrcSize); /**< pledgedSrcSize must be correct. If it is not known at init time, use ZSTD_CONTENTSIZE_UNKNOWN. Note that, for compatibility with older programs, "0" also disables frame content size field. It may be enabled in the future. */
-ZSTDLIB_API size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs, const void* dict, size_t dictSize, int compressionLevel); /**< creates of an internal CDict (incompatible with static CCtx), except if dict == NULL or dictSize < 8, in which case no dict is used. Note: dict is loaded with ZSTD_dm_auto (treated as a full zstd dictionary if it begins with ZSTD_MAGIC_DICTIONARY, else as raw content) and ZSTD_dlm_byCopy.*/
+/**! ZSTD_initCStream_srcSize() :
+ * This function is deprecated, and is equivalent to:
+ * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
+ * ZSTD_CCtx_refCDict(zcs, NULL); // clear the dictionary (if any)
+ * ZSTD_CCtx_setParameter(zcs, ZSTD_c_compressionLevel, compressionLevel);
+ * ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize);
+ *
+ * pledgedSrcSize must be correct. If it is not known at init time, use
+ * ZSTD_CONTENTSIZE_UNKNOWN. Note that, for compatibility with older programs,
+ * "0" also disables frame content size field. It may be enabled in the future.
+ */
+ZSTDLIB_API size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs, int compressionLevel, unsigned long long pledgedSrcSize);
+/**! ZSTD_initCStream_usingDict() :
+ * This function is deprecated, and is equivalent to:
+ * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
+ * ZSTD_CCtx_setParameter(zcs, ZSTD_c_compressionLevel, compressionLevel);
+ * ZSTD_CCtx_loadDictionary(zcs, dict, dictSize);
+ *
+ * Creates an internal CDict (incompatible with static CCtx), except if
+ * dict == NULL or dictSize < 8, in which case no dict is used.
+ * Note: dict is loaded with ZSTD_dm_auto (treated as a full zstd dictionary if
+ * it begins with ZSTD_MAGIC_DICTIONARY, else as raw content) and ZSTD_dlm_byCopy.
+ */
+ZSTDLIB_API size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs, const void* dict, size_t dictSize, int compressionLevel);
+/**! ZSTD_initCStream_advanced() :
+ * This function is deprecated, and is approximately equivalent to:
+ * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
+ * ZSTD_CCtx_setZstdParams(zcs, params); // Set the zstd params and leave the rest as-is
+ * ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize);
+ * ZSTD_CCtx_loadDictionary(zcs, dict, dictSize);
+ *
+ * pledgedSrcSize must be correct. If srcSize is not known at init time, use
+ * value ZSTD_CONTENTSIZE_UNKNOWN. dict is loaded with ZSTD_dm_auto and ZSTD_dlm_byCopy.
+ */
ZSTDLIB_API size_t ZSTD_initCStream_advanced(ZSTD_CStream* zcs, const void* dict, size_t dictSize,
- ZSTD_parameters params, unsigned long long pledgedSrcSize); /**< pledgedSrcSize must be correct. If srcSize is not known at init time, use value ZSTD_CONTENTSIZE_UNKNOWN. dict is loaded with ZSTD_dm_auto and ZSTD_dlm_byCopy. */
-ZSTDLIB_API size_t ZSTD_initCStream_usingCDict(ZSTD_CStream* zcs, const ZSTD_CDict* cdict); /**< note : cdict will just be referenced, and must outlive compression session */
-ZSTDLIB_API size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs, const ZSTD_CDict* cdict, ZSTD_frameParameters fParams, unsigned long long pledgedSrcSize); /**< same as ZSTD_initCStream_usingCDict(), with control over frame parameters. pledgedSrcSize must be correct. If srcSize is not known at init time, use value ZSTD_CONTENTSIZE_UNKNOWN. */
+ ZSTD_parameters params, unsigned long long pledgedSrcSize);
+/**! ZSTD_initCStream_usingCDict() :
+ * This function is deprecated, and is equivalent to:
+ * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
+ * ZSTD_CCtx_refCDict(zcs, cdict);
+ *
+ * Note: cdict will just be referenced, and must outlive the compression session.
+ */
+ZSTDLIB_API size_t ZSTD_initCStream_usingCDict(ZSTD_CStream* zcs, const ZSTD_CDict* cdict);
+/**! ZSTD_initCStream_usingCDict_advanced() :
+ * This function is deprecated, and is approximately equivalent to:
+ * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
+ * ZSTD_CCtx_setZstdFrameParams(zcs, fParams); // Set the zstd frame params and leave the rest as-is
+ * ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize);
+ * ZSTD_CCtx_refCDict(zcs, cdict);
+ *
+ * Same as ZSTD_initCStream_usingCDict(), with control over frame parameters.
+ * pledgedSrcSize must be correct. If srcSize is not known at init time, use
+ * value ZSTD_CONTENTSIZE_UNKNOWN.
+ */
+ZSTDLIB_API size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs, const ZSTD_CDict* cdict, ZSTD_frameParameters fParams, unsigned long long pledgedSrcSize);
/*! ZSTD_resetCStream() :
+ * This function is deprecated, and is equivalent to:
+ * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
+ * ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize);
+ *
* start a new frame, using same parameters from previous frame.
* This is typically useful to skip dictionary loading stage, since it will re-use it in-place.
* Note that zcs must be init at least once before using ZSTD_resetCStream().
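
(Editor's illustration, not part of the patch: the replacement sequence the
deprecation comments above spell out, driven through the stable
ZSTD_compressStream2() API; buffer management is left to the caller.)

    #include <zstd.h>

    static size_t compress_one_frame(ZSTD_CCtx* cctx,
                                     ZSTD_outBuffer* out, ZSTD_inBuffer* in,
                                     int level, unsigned long long pledgedSrcSize)
    {
        ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
        ZSTD_CCtx_refCDict(cctx, NULL);  /* clear any previous dictionary */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);
        ZSTD_CCtx_setPledgedSrcSize(cctx, pledgedSrcSize);

        /* ZSTD_e_end flushes and writes the epilogue; 0 means frame done. */
        return ZSTD_compressStream2(cctx, out, in, ZSTD_e_end);
    }
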
@@ -1555,9 +1711,32 @@
/*===== Advanced Streaming decompression functions =====*/
-ZSTDLIB_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize); /**< note: no dictionary will be used if dict == NULL or dictSize < 8 */
-ZSTDLIB_API size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* zds, const ZSTD_DDict* ddict); /**< note : ddict is referenced, it must outlive decompression session */
-ZSTDLIB_API size_t ZSTD_resetDStream(ZSTD_DStream* zds); /**< re-use decompression parameters from previous init; saves dictionary loading */
+/**
+ * This function is deprecated, and is equivalent to:
+ *
+ * ZSTD_DCtx_reset(zds, ZSTD_reset_session_only);
+ * ZSTD_DCtx_loadDictionary(zds, dict, dictSize);
+ *
+ * note: no dictionary will be used if dict == NULL or dictSize < 8
+ */
+ZSTDLIB_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize);
+/**
+ * This function is deprecated, and is equivalent to:
+ *
+ * ZSTD_DCtx_reset(zds, ZSTD_reset_session_only);
+ * ZSTD_DCtx_refDDict(zds, ddict);
+ *
+ * Note: ddict is referenced; it must outlive the decompression session.
+ */
+ZSTDLIB_API size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* zds, const ZSTD_DDict* ddict);
+/**
+ * This function is deprecated, and is equivalent to:
+ *
+ * ZSTD_DCtx_reset(zds, ZSTD_reset_session_only);
+ *
+ * re-use decompression parameters from previous init; saves dictionary loading
+ */
+ZSTDLIB_API size_t ZSTD_resetDStream(ZSTD_DStream* zds);
/*********************************************************************
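
(Editor's illustration, not part of the patch: the matching decompression-side
replacement; ZSTD_DStream is an alias of ZSTD_DCtx in this version, so the new
ZSTD_DCtx_* calls apply directly.)

    #include <zstd.h>

    static size_t decompress_with_ddict(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict,
                                        ZSTD_outBuffer* out, ZSTD_inBuffer* in)
    {
        ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only);
        ZSTD_DCtx_refDDict(dctx, ddict); /* ddict must outlive the session */

        /* Returns 0 once a frame is fully decoded and flushed. */
        return ZSTD_decompressStream(dctx, out, in);
    }
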
@@ -1696,7 +1875,7 @@
unsigned checksumFlag;
} ZSTD_frameHeader;
-/** ZSTD_getFrameHeader() :
+/*! ZSTD_getFrameHeader() :
* decode Frame Header, or requires larger `srcSize`.
* @return : 0, `zfhPtr` is correctly filled,
* >0, `srcSize` is too small, value is wanted `srcSize` amount,
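
(Editor's illustration, not part of the patch: probing a buffer with
ZSTD_getFrameHeader() per the contract above; experimental API, so
ZSTD_STATIC_LINKING_ONLY is assumed.)

    #define ZSTD_STATIC_LINKING_ONLY
    #include <zstd.h>

    /* Returns 0 and fills *dictID on success, 1 if more input is needed,
     * -1 if src does not start a valid zstd frame. */
    static int read_dict_id(const void* src, size_t srcSize, unsigned* dictID)
    {
        ZSTD_frameHeader zfh;
        size_t ret = ZSTD_getFrameHeader(&zfh, src, srcSize);
        if (ret == 0) { *dictID = zfh.dictID; return 0; }
        if (!ZSTD_isError(ret)) return 1; /* need at least `ret` bytes */
        return -1;
    }
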
@@ -1730,7 +1909,7 @@
/*!
Block functions produce and decode raw zstd blocks, without frame metadata.
Frame metadata cost is typically ~18 bytes, which can be non-negligible for very small blocks (< 100 bytes).
- User will have to take in charge required information to regenerate data, such as compressed and content sizes.
+ But users must then track the metadata needed to regenerate the data, such as compressed and content sizes.
A few rules to respect :
- Compressing and decompressing require a context structure
@@ -1741,12 +1920,14 @@
+ copyCCtx() and copyDCtx() can be used too
- Block size is limited, it must be <= ZSTD_getBlockSize() <= ZSTD_BLOCKSIZE_MAX == 128 KB
+ If input is larger than a block size, it's necessary to split input data into multiple blocks
- + For inputs larger than a single block, really consider using regular ZSTD_compress() instead.
- Frame metadata is not that costly, and quickly becomes negligible as source size grows larger.
- - When a block is considered not compressible enough, ZSTD_compressBlock() result will be zero.
- In which case, nothing is produced into `dst` !
- + User must test for such outcome and deal directly with uncompressed data
- + ZSTD_decompressBlock() doesn't accept uncompressed data as input !!!
+ + For inputs larger than a single block, consider using regular ZSTD_compress() instead.
+ Frame metadata is not that costly, and quickly becomes negligible as source size grows larger than a block.
+ - When a block is considered not compressible enough, ZSTD_compressBlock() returns 0 (zero)!
+ ===> In that case, nothing is produced into `dst`!
+ + User __must__ test for this outcome and deal directly with the uncompressed data
+ + A block cannot be declared incompressible if the ZSTD_compressBlock() return value was != 0.
+ Doing so would corrupt the statistics history, leading to potential data corruption.
+ + ZSTD_decompressBlock() _doesn't accept uncompressed data as input_!
+ In case of multiple successive blocks, should some of them be uncompressed,
decoder must be informed of their existence in order to follow proper history.
Use ZSTD_insertBlock() for such a case.
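
(Editor's illustration, not part of the patch: handling the "returns 0 on
incompressible input" rule stated above. `wasRaw` is a hypothetical flag so the
caller can later tell the decoder about raw blocks via ZSTD_insertBlock();
assumes dstCapacity >= srcSize.)

    #define ZSTD_STATIC_LINKING_ONLY
    #include <zstd.h>
    #include <string.h>

    static size_t store_block(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity,
                              const void* src, size_t srcSize, int* wasRaw)
    {
        size_t csize = ZSTD_compressBlock(cctx, dst, dstCapacity, src, srcSize);
        if (ZSTD_isError(csize)) return csize;
        if (csize == 0) {               /* incompressible: nothing in dst */
            memcpy(dst, src, srcSize);  /* carry the block uncompressed */
            *wasRaw = 1;
            return srcSize;
        }
        *wasRaw = 0;                    /* compressed block; use as-is */
        return csize;
    }
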
--- a/tests/test-repo-compengines.t Sun Sep 15 00:07:30 2019 -0400
+++ b/tests/test-repo-compengines.t Sun Sep 15 20:04:00 2019 -0700
@@ -169,7 +169,7 @@
> done
$ $RUNTESTDIR/f -s zstd-*/.hg/store/data/*
- zstd-level-1/.hg/store/data/a.i: size=4097
+ zstd-level-1/.hg/store/data/a.i: size=4114
zstd-level-22/.hg/store/data/a.i: size=4091
zstd-level-default/\.hg/store/data/a\.i: size=(4094|4102) (re)