annotate contrib/python-zstandard/zstd/dictBuilder/zdict.c @ 30434:2e484bdea8c4

zstd: vendor zstd 1.1.1 zstd is a new compression format and it is awesome, yielding higher compression ratios and significantly faster compression and decompression operations compared to zlib (our current compression engine of choice) across the board. We want zstd to be a 1st class citizen in Mercurial and to eventually be the preferred compression format for various operations. This patch starts the formal process of supporting zstd by vendoring a copy of zstd. Why do we need to vendor zstd? Good question. First, zstd is relatively new and not widely available yet. If we didn't vendor zstd or distribute it with Mercurial, most users likely wouldn't have zstd installed or even available to install. What good is a feature if you can't use it? Vendoring and distributing the zstd sources gives us the highest liklihood that zstd will be available to Mercurial installs. Second, the Python bindings to zstd (which will be vendored in a separate changeset) make use of zstd APIs that are only available via static linking. One reason they are only available via static linking is that they are unstable and could change at any time. While it might be possible for the Python bindings to attempt to talk to different versions of the zstd C library, the safest thing to do is link against a specific, known-working version of zstd. This is why the Python zstd bindings themselves vendor zstd and why we must as well. This also explains why the added files are in a "python-zstandard" directory. The added files are from the 1.1.1 release of zstd (Git commit 4c0b44f8ced84c4c8edfa07b564d31e4fa3e8885 from https://github.com/facebook/zstd) and are added without modifications. Not all files from the zstd "distribution" have been added. Notably missing are files to support interacting with "legacy," pre-1.0 versions of zstd. The decision of which files to include is made by the upstream python-zstandard project (which I'm the author of). The files in this commit are a snapshot of the files from the 0.5.0 release of that project, Git commit e637c1b214d5f869cf8116c550dcae23ec13b677 from https://github.com/indygreg/python-zstandard.
author Gregory Szorc <gregory.szorc@gmail.com>
date Thu, 10 Nov 2016 21:45:29 -0800
parents
children b54a2984cdd4
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
30434
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1 /**
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
2 * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
3 * All rights reserved.
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
4 *
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
5 * This source code is licensed under the BSD-style license found in the
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
6 * LICENSE file in the root directory of this source tree. An additional grant
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
7 * of patent rights can be found in the PATENTS file in the same directory.
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
8 */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
9
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
10
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
11 /*-**************************************
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
12 * Tuning parameters
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
13 ****************************************/
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
14 #define ZDICT_MAX_SAMPLES_SIZE (2000U << 20)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
15 #define ZDICT_MIN_SAMPLES_SIZE 512
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
16
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
17
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
18 /*-**************************************
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
19 * Compiler Options
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
20 ****************************************/
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
21 /* Unix Large Files support (>4GB) */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
22 #define _FILE_OFFSET_BITS 64
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
23 #if (defined(__sun__) && (!defined(__LP64__))) /* Sun Solaris 32-bits requires specific definitions */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
24 # define _LARGEFILE_SOURCE
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
25 #elif ! defined(__LP64__) /* No point defining Large file for 64 bit */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
26 # define _LARGEFILE64_SOURCE
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
27 #endif
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
28
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
29
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
30 /*-*************************************
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
31 * Dependencies
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
32 ***************************************/
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
33 #include <stdlib.h> /* malloc, free */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
34 #include <string.h> /* memset */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
35 #include <stdio.h> /* fprintf, fopen, ftello64 */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
36 #include <time.h> /* clock */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
37
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
38 #include "mem.h" /* read */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
39 #include "error_private.h"
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
40 #include "fse.h" /* FSE_normalizeCount, FSE_writeNCount */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
41 #define HUF_STATIC_LINKING_ONLY
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
42 #include "huf.h"
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
43 #include "zstd_internal.h" /* includes zstd.h */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
44 #include "xxhash.h"
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
45 #include "divsufsort.h"
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
46 #ifndef ZDICT_STATIC_LINKING_ONLY
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
47 # define ZDICT_STATIC_LINKING_ONLY
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
48 #endif
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
49 #include "zdict.h"
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
50
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
51
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
52 /*-*************************************
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
53 * Constants
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
54 ***************************************/
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
55 #define KB *(1 <<10)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
56 #define MB *(1 <<20)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
57 #define GB *(1U<<30)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
58
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
59 #define DICTLISTSIZE_DEFAULT 10000
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
60
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
61 #define NOISELENGTH 32
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
62
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
63 #define MINRATIO 4
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
64 static const int g_compressionLevel_default = 5;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
65 static const U32 g_selectivity_default = 9;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
66 static const size_t g_provision_entropySize = 200;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
67 static const size_t g_min_fast_dictContent = 192;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
68
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
69
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
70 /*-*************************************
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
71 * Console display
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
72 ***************************************/
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
73 #define DISPLAY(...) { fprintf(stderr, __VA_ARGS__); fflush( stderr ); }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
74 #define DISPLAYLEVEL(l, ...) if (notificationLevel>=l) { DISPLAY(__VA_ARGS__); } /* 0 : no display; 1: errors; 2: default; 3: details; 4: debug */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
75
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
76 static clock_t ZDICT_clockSpan(clock_t nPrevious) { return clock() - nPrevious; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
77
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
78 static void ZDICT_printHex(const void* ptr, size_t length)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
79 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
80 const BYTE* const b = (const BYTE*)ptr;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
81 size_t u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
82 for (u=0; u<length; u++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
83 BYTE c = b[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
84 if (c<32 || c>126) c = '.'; /* non-printable char */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
85 DISPLAY("%c", c);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
86 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
87 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
88
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
89
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
90 /*-********************************************************
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
91 * Helper functions
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
92 **********************************************************/
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
93 unsigned ZDICT_isError(size_t errorCode) { return ERR_isError(errorCode); }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
94
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
95 const char* ZDICT_getErrorName(size_t errorCode) { return ERR_getErrorName(errorCode); }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
96
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
97 unsigned ZDICT_getDictID(const void* dictBuffer, size_t dictSize)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
98 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
99 if (dictSize < 8) return 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
100 if (MEM_readLE32(dictBuffer) != ZSTD_DICT_MAGIC) return 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
101 return MEM_readLE32((const char*)dictBuffer + 4);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
102 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
103
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
104
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
105 /*-********************************************************
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
106 * Dictionary training functions
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
107 **********************************************************/
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
108 static unsigned ZDICT_NbCommonBytes (register size_t val)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
109 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
110 if (MEM_isLittleEndian()) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
111 if (MEM_64bits()) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
112 # if defined(_MSC_VER) && defined(_WIN64)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
113 unsigned long r = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
114 _BitScanForward64( &r, (U64)val );
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
115 return (unsigned)(r>>3);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
116 # elif defined(__GNUC__) && (__GNUC__ >= 3)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
117 return (__builtin_ctzll((U64)val) >> 3);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
118 # else
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
119 static const int DeBruijnBytePos[64] = { 0, 0, 0, 0, 0, 1, 1, 2, 0, 3, 1, 3, 1, 4, 2, 7, 0, 2, 3, 6, 1, 5, 3, 5, 1, 3, 4, 4, 2, 5, 6, 7, 7, 0, 1, 2, 3, 3, 4, 6, 2, 6, 5, 5, 3, 4, 5, 6, 7, 1, 2, 4, 6, 4, 4, 5, 7, 2, 6, 5, 7, 6, 7, 7 };
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
120 return DeBruijnBytePos[((U64)((val & -(long long)val) * 0x0218A392CDABBD3FULL)) >> 58];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
121 # endif
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
122 } else { /* 32 bits */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
123 # if defined(_MSC_VER)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
124 unsigned long r=0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
125 _BitScanForward( &r, (U32)val );
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
126 return (unsigned)(r>>3);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
127 # elif defined(__GNUC__) && (__GNUC__ >= 3)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
128 return (__builtin_ctz((U32)val) >> 3);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
129 # else
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
130 static const int DeBruijnBytePos[32] = { 0, 0, 3, 0, 3, 1, 3, 0, 3, 2, 2, 1, 3, 2, 0, 1, 3, 3, 1, 2, 2, 2, 2, 0, 3, 1, 2, 0, 1, 0, 1, 1 };
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
131 return DeBruijnBytePos[((U32)((val & -(S32)val) * 0x077CB531U)) >> 27];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
132 # endif
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
133 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
134 } else { /* Big Endian CPU */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
135 if (MEM_64bits()) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
136 # if defined(_MSC_VER) && defined(_WIN64)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
137 unsigned long r = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
138 _BitScanReverse64( &r, val );
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
139 return (unsigned)(r>>3);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
140 # elif defined(__GNUC__) && (__GNUC__ >= 3)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
141 return (__builtin_clzll(val) >> 3);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
142 # else
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
143 unsigned r;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
144 const unsigned n32 = sizeof(size_t)*4; /* calculate this way due to compiler complaining in 32-bits mode */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
145 if (!(val>>n32)) { r=4; } else { r=0; val>>=n32; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
146 if (!(val>>16)) { r+=2; val>>=8; } else { val>>=24; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
147 r += (!val);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
148 return r;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
149 # endif
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
150 } else { /* 32 bits */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
151 # if defined(_MSC_VER)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
152 unsigned long r = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
153 _BitScanReverse( &r, (unsigned long)val );
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
154 return (unsigned)(r>>3);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
155 # elif defined(__GNUC__) && (__GNUC__ >= 3)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
156 return (__builtin_clz((U32)val) >> 3);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
157 # else
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
158 unsigned r;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
159 if (!(val>>16)) { r=2; val>>=8; } else { r=0; val>>=24; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
160 r += (!val);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
161 return r;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
162 # endif
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
163 } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
164 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
165
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
166
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
167 /*! ZDICT_count() :
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
168 Count the nb of common bytes between 2 pointers.
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
169 Note : this function presumes end of buffer followed by noisy guard band.
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
170 */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
171 static size_t ZDICT_count(const void* pIn, const void* pMatch)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
172 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
173 const char* const pStart = (const char*)pIn;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
174 for (;;) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
175 size_t const diff = MEM_readST(pMatch) ^ MEM_readST(pIn);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
176 if (!diff) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
177 pIn = (const char*)pIn+sizeof(size_t);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
178 pMatch = (const char*)pMatch+sizeof(size_t);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
179 continue;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
180 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
181 pIn = (const char*)pIn+ZDICT_NbCommonBytes(diff);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
182 return (size_t)((const char*)pIn - pStart);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
183 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
184 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
185
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
186
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
187 typedef struct {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
188 U32 pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
189 U32 length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
190 U32 savings;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
191 } dictItem;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
192
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
193 static void ZDICT_initDictItem(dictItem* d)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
194 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
195 d->pos = 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
196 d->length = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
197 d->savings = (U32)(-1);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
198 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
199
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
200
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
201 #define LLIMIT 64 /* heuristic determined experimentally */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
202 #define MINMATCHLENGTH 7 /* heuristic determined experimentally */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
203 static dictItem ZDICT_analyzePos(
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
204 BYTE* doneMarks,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
205 const int* suffix, U32 start,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
206 const void* buffer, U32 minRatio, U32 notificationLevel)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
207 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
208 U32 lengthList[LLIMIT] = {0};
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
209 U32 cumulLength[LLIMIT] = {0};
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
210 U32 savings[LLIMIT] = {0};
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
211 const BYTE* b = (const BYTE*)buffer;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
212 size_t length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
213 size_t maxLength = LLIMIT;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
214 size_t pos = suffix[start];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
215 U32 end = start;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
216 dictItem solution;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
217
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
218 /* init */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
219 memset(&solution, 0, sizeof(solution));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
220 doneMarks[pos] = 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
221
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
222 /* trivial repetition cases */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
223 if ( (MEM_read16(b+pos+0) == MEM_read16(b+pos+2))
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
224 ||(MEM_read16(b+pos+1) == MEM_read16(b+pos+3))
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
225 ||(MEM_read16(b+pos+2) == MEM_read16(b+pos+4)) ) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
226 /* skip and mark segment */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
227 U16 u16 = MEM_read16(b+pos+4);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
228 U32 u, e = 6;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
229 while (MEM_read16(b+pos+e) == u16) e+=2 ;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
230 if (b[pos+e] == b[pos+e-1]) e++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
231 for (u=1; u<e; u++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
232 doneMarks[pos+u] = 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
233 return solution;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
234 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
235
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
236 /* look forward */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
237 do {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
238 end++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
239 length = ZDICT_count(b + pos, b + suffix[end]);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
240 } while (length >=MINMATCHLENGTH);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
241
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
242 /* look backward */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
243 do {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
244 length = ZDICT_count(b + pos, b + *(suffix+start-1));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
245 if (length >=MINMATCHLENGTH) start--;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
246 } while(length >= MINMATCHLENGTH);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
247
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
248 /* exit if not found a minimum nb of repetitions */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
249 if (end-start < minRatio) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
250 U32 idx;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
251 for(idx=start; idx<end; idx++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
252 doneMarks[suffix[idx]] = 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
253 return solution;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
254 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
255
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
256 { int i;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
257 U32 searchLength;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
258 U32 refinedStart = start;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
259 U32 refinedEnd = end;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
260
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
261 DISPLAYLEVEL(4, "\n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
262 DISPLAYLEVEL(4, "found %3u matches of length >= %i at pos %7u ", (U32)(end-start), MINMATCHLENGTH, (U32)pos);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
263 DISPLAYLEVEL(4, "\n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
264
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
265 for (searchLength = MINMATCHLENGTH ; ; searchLength++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
266 BYTE currentChar = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
267 U32 currentCount = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
268 U32 currentID = refinedStart;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
269 U32 id;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
270 U32 selectedCount = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
271 U32 selectedID = currentID;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
272 for (id =refinedStart; id < refinedEnd; id++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
273 if (b[ suffix[id] + searchLength] != currentChar) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
274 if (currentCount > selectedCount) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
275 selectedCount = currentCount;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
276 selectedID = currentID;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
277 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
278 currentID = id;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
279 currentChar = b[ suffix[id] + searchLength];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
280 currentCount = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
281 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
282 currentCount ++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
283 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
284 if (currentCount > selectedCount) { /* for last */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
285 selectedCount = currentCount;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
286 selectedID = currentID;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
287 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
288
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
289 if (selectedCount < minRatio)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
290 break;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
291 refinedStart = selectedID;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
292 refinedEnd = refinedStart + selectedCount;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
293 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
294
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
295 /* evaluate gain based on new ref */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
296 start = refinedStart;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
297 pos = suffix[refinedStart];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
298 end = start;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
299 memset(lengthList, 0, sizeof(lengthList));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
300
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
301 /* look forward */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
302 do {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
303 end++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
304 length = ZDICT_count(b + pos, b + suffix[end]);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
305 if (length >= LLIMIT) length = LLIMIT-1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
306 lengthList[length]++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
307 } while (length >=MINMATCHLENGTH);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
308
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
309 /* look backward */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
310 length = MINMATCHLENGTH;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
311 while ((length >= MINMATCHLENGTH) & (start > 0)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
312 length = ZDICT_count(b + pos, b + suffix[start - 1]);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
313 if (length >= LLIMIT) length = LLIMIT - 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
314 lengthList[length]++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
315 if (length >= MINMATCHLENGTH) start--;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
316 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
317
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
318 /* largest useful length */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
319 memset(cumulLength, 0, sizeof(cumulLength));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
320 cumulLength[maxLength-1] = lengthList[maxLength-1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
321 for (i=(int)(maxLength-2); i>=0; i--)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
322 cumulLength[i] = cumulLength[i+1] + lengthList[i];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
323
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
324 for (i=LLIMIT-1; i>=MINMATCHLENGTH; i--) if (cumulLength[i]>=minRatio) break;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
325 maxLength = i;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
326
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
327 /* reduce maxLength in case of final into repetitive data */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
328 { U32 l = (U32)maxLength;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
329 BYTE const c = b[pos + maxLength-1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
330 while (b[pos+l-2]==c) l--;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
331 maxLength = l;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
332 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
333 if (maxLength < MINMATCHLENGTH) return solution; /* skip : no long-enough solution */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
334
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
335 /* calculate savings */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
336 savings[5] = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
337 for (i=MINMATCHLENGTH; i<=(int)maxLength; i++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
338 savings[i] = savings[i-1] + (lengthList[i] * (i-3));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
339
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
340 DISPLAYLEVEL(4, "Selected ref at position %u, of length %u : saves %u (ratio: %.2f) \n",
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
341 (U32)pos, (U32)maxLength, savings[maxLength], (double)savings[maxLength] / maxLength);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
342
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
343 solution.pos = (U32)pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
344 solution.length = (U32)maxLength;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
345 solution.savings = savings[maxLength];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
346
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
347 /* mark positions done */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
348 { U32 id;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
349 for (id=start; id<end; id++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
350 U32 p, pEnd;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
351 U32 const testedPos = suffix[id];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
352 if (testedPos == pos)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
353 length = solution.length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
354 else {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
355 length = ZDICT_count(b+pos, b+testedPos);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
356 if (length > solution.length) length = solution.length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
357 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
358 pEnd = (U32)(testedPos + length);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
359 for (p=testedPos; p<pEnd; p++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
360 doneMarks[p] = 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
361 } } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
362
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
363 return solution;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
364 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
365
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
366
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
367 /*! ZDICT_checkMerge
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
368 check if dictItem can be merged, do it if possible
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
369 @return : id of destination elt, 0 if not merged
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
370 */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
371 static U32 ZDICT_checkMerge(dictItem* table, dictItem elt, U32 eltNbToSkip)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
372 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
373 const U32 tableSize = table->pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
374 const U32 eltEnd = elt.pos + elt.length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
375
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
376 /* tail overlap */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
377 U32 u; for (u=1; u<tableSize; u++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
378 if (u==eltNbToSkip) continue;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
379 if ((table[u].pos > elt.pos) && (table[u].pos <= eltEnd)) { /* overlap, existing > new */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
380 /* append */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
381 U32 addedLength = table[u].pos - elt.pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
382 table[u].length += addedLength;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
383 table[u].pos = elt.pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
384 table[u].savings += elt.savings * addedLength / elt.length; /* rough approx */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
385 table[u].savings += elt.length / 8; /* rough approx bonus */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
386 elt = table[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
387 /* sort : improve rank */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
388 while ((u>1) && (table[u-1].savings < elt.savings))
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
389 table[u] = table[u-1], u--;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
390 table[u] = elt;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
391 return u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
392 } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
393
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
394 /* front overlap */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
395 for (u=1; u<tableSize; u++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
396 if (u==eltNbToSkip) continue;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
397 if ((table[u].pos + table[u].length >= elt.pos) && (table[u].pos < elt.pos)) { /* overlap, existing < new */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
398 /* append */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
399 int addedLength = (int)eltEnd - (table[u].pos + table[u].length);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
400 table[u].savings += elt.length / 8; /* rough approx bonus */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
401 if (addedLength > 0) { /* otherwise, elt fully included into existing */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
402 table[u].length += addedLength;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
403 table[u].savings += elt.savings * addedLength / elt.length; /* rough approx */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
404 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
405 /* sort : improve rank */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
406 elt = table[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
407 while ((u>1) && (table[u-1].savings < elt.savings))
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
408 table[u] = table[u-1], u--;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
409 table[u] = elt;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
410 return u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
411 } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
412
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
413 return 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
414 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
415
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
416
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
417 static void ZDICT_removeDictItem(dictItem* table, U32 id)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
418 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
419 /* convention : first element is nb of elts */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
420 U32 const max = table->pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
421 U32 u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
422 if (!id) return; /* protection, should never happen */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
423 for (u=id; u<max-1; u++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
424 table[u] = table[u+1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
425 table->pos--;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
426 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
427
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
428
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
429 static void ZDICT_insertDictItem(dictItem* table, U32 maxSize, dictItem elt)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
430 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
431 /* merge if possible */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
432 U32 mergeId = ZDICT_checkMerge(table, elt, 0);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
433 if (mergeId) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
434 U32 newMerge = 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
435 while (newMerge) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
436 newMerge = ZDICT_checkMerge(table, table[mergeId], mergeId);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
437 if (newMerge) ZDICT_removeDictItem(table, mergeId);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
438 mergeId = newMerge;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
439 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
440 return;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
441 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
442
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
443 /* insert */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
444 { U32 current;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
445 U32 nextElt = table->pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
446 if (nextElt >= maxSize) nextElt = maxSize-1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
447 current = nextElt-1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
448 while (table[current].savings < elt.savings) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
449 table[current+1] = table[current];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
450 current--;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
451 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
452 table[current+1] = elt;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
453 table->pos = nextElt+1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
454 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
455 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
456
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
457
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
458 static U32 ZDICT_dictSize(const dictItem* dictList)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
459 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
460 U32 u, dictSize = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
461 for (u=1; u<dictList[0].pos; u++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
462 dictSize += dictList[u].length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
463 return dictSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
464 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
465
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
466
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
467 static size_t ZDICT_trainBuffer(dictItem* dictList, U32 dictListSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
468 const void* const buffer, size_t bufferSize, /* buffer must end with noisy guard band */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
469 const size_t* fileSizes, unsigned nbFiles,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
470 U32 minRatio, U32 notificationLevel)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
471 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
472 int* const suffix0 = (int*)malloc((bufferSize+2)*sizeof(*suffix0));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
473 int* const suffix = suffix0+1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
474 U32* reverseSuffix = (U32*)malloc((bufferSize)*sizeof(*reverseSuffix));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
475 BYTE* doneMarks = (BYTE*)malloc((bufferSize+16)*sizeof(*doneMarks)); /* +16 for overflow security */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
476 U32* filePos = (U32*)malloc(nbFiles * sizeof(*filePos));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
477 size_t result = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
478 clock_t displayClock = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
479 clock_t const refreshRate = CLOCKS_PER_SEC * 3 / 10;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
480
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
481 # define DISPLAYUPDATE(l, ...) if (notificationLevel>=l) { \
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
482 if (ZDICT_clockSpan(displayClock) > refreshRate) \
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
483 { displayClock = clock(); DISPLAY(__VA_ARGS__); \
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
484 if (notificationLevel>=4) fflush(stdout); } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
485
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
486 /* init */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
487 DISPLAYLEVEL(2, "\r%70s\r", ""); /* clean display line */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
488 if (!suffix0 || !reverseSuffix || !doneMarks || !filePos) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
489 result = ERROR(memory_allocation);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
490 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
491 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
492 if (minRatio < MINRATIO) minRatio = MINRATIO;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
493 memset(doneMarks, 0, bufferSize+16);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
494
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
495 /* limit sample set size (divsufsort limitation)*/
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
496 if (bufferSize > ZDICT_MAX_SAMPLES_SIZE) DISPLAYLEVEL(3, "sample set too large : reduced to %u MB ...\n", (U32)(ZDICT_MAX_SAMPLES_SIZE>>20));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
497 while (bufferSize > ZDICT_MAX_SAMPLES_SIZE) bufferSize -= fileSizes[--nbFiles];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
498
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
499 /* sort */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
500 DISPLAYLEVEL(2, "sorting %u files of total size %u MB ...\n", nbFiles, (U32)(bufferSize>>20));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
501 { int const divSuftSortResult = divsufsort((const unsigned char*)buffer, suffix, (int)bufferSize, 0);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
502 if (divSuftSortResult != 0) { result = ERROR(GENERIC); goto _cleanup; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
503 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
504 suffix[bufferSize] = (int)bufferSize; /* leads into noise */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
505 suffix0[0] = (int)bufferSize; /* leads into noise */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
506 /* build reverse suffix sort */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
507 { size_t pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
508 for (pos=0; pos < bufferSize; pos++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
509 reverseSuffix[suffix[pos]] = (U32)pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
510 /* note filePos tracks borders between samples.
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
511 It's not used at this stage, but planned to become useful in a later update */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
512 filePos[0] = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
513 for (pos=1; pos<nbFiles; pos++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
514 filePos[pos] = (U32)(filePos[pos-1] + fileSizes[pos-1]);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
515 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
516
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
517 DISPLAYLEVEL(2, "finding patterns ... \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
518 DISPLAYLEVEL(3, "minimum ratio : %u \n", minRatio);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
519
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
520 { U32 cursor; for (cursor=0; cursor < bufferSize; ) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
521 dictItem solution;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
522 if (doneMarks[cursor]) { cursor++; continue; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
523 solution = ZDICT_analyzePos(doneMarks, suffix, reverseSuffix[cursor], buffer, minRatio, notificationLevel);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
524 if (solution.length==0) { cursor++; continue; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
525 ZDICT_insertDictItem(dictList, dictListSize, solution);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
526 cursor += solution.length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
527 DISPLAYUPDATE(2, "\r%4.2f %% \r", (double)cursor / bufferSize * 100);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
528 } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
529
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
530 _cleanup:
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
531 free(suffix0);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
532 free(reverseSuffix);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
533 free(doneMarks);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
534 free(filePos);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
535 return result;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
536 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
537
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
538
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
539 static void ZDICT_fillNoise(void* buffer, size_t length)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
540 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
541 unsigned const prime1 = 2654435761U;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
542 unsigned const prime2 = 2246822519U;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
543 unsigned acc = prime1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
544 size_t p=0;;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
545 for (p=0; p<length; p++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
546 acc *= prime2;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
547 ((unsigned char*)buffer)[p] = (unsigned char)(acc >> 21);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
548 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
549 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
550
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
551
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
552 typedef struct
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
553 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
554 ZSTD_CCtx* ref;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
555 ZSTD_CCtx* zc;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
556 void* workPlace; /* must be ZSTD_BLOCKSIZE_ABSOLUTEMAX allocated */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
557 } EStats_ress_t;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
558
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
559 #define MAXREPOFFSET 1024
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
560
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
561 static void ZDICT_countEStats(EStats_ress_t esr, ZSTD_parameters params,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
562 U32* countLit, U32* offsetcodeCount, U32* matchlengthCount, U32* litlengthCount, U32* repOffsets,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
563 const void* src, size_t srcSize, U32 notificationLevel)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
564 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
565 size_t const blockSizeMax = MIN (ZSTD_BLOCKSIZE_ABSOLUTEMAX, 1 << params.cParams.windowLog);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
566 size_t cSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
567
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
568 if (srcSize > blockSizeMax) srcSize = blockSizeMax; /* protection vs large samples */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
569 { size_t const errorCode = ZSTD_copyCCtx(esr.zc, esr.ref, 0);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
570 if (ZSTD_isError(errorCode)) { DISPLAYLEVEL(1, "warning : ZSTD_copyCCtx failed \n"); return; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
571 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
572 cSize = ZSTD_compressBlock(esr.zc, esr.workPlace, ZSTD_BLOCKSIZE_ABSOLUTEMAX, src, srcSize);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
573 if (ZSTD_isError(cSize)) { DISPLAYLEVEL(1, "warning : could not compress sample size %u \n", (U32)srcSize); return; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
574
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
575 if (cSize) { /* if == 0; block is not compressible */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
576 const seqStore_t* seqStorePtr = ZSTD_getSeqStore(esr.zc);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
577
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
578 /* literals stats */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
579 { const BYTE* bytePtr;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
580 for(bytePtr = seqStorePtr->litStart; bytePtr < seqStorePtr->lit; bytePtr++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
581 countLit[*bytePtr]++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
582 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
583
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
584 /* seqStats */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
585 { U32 const nbSeq = (U32)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
586 ZSTD_seqToCodes(seqStorePtr);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
587
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
588 { const BYTE* codePtr = seqStorePtr->ofCode;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
589 U32 u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
590 for (u=0; u<nbSeq; u++) offsetcodeCount[codePtr[u]]++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
591 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
592
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
593 { const BYTE* codePtr = seqStorePtr->mlCode;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
594 U32 u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
595 for (u=0; u<nbSeq; u++) matchlengthCount[codePtr[u]]++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
596 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
597
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
598 { const BYTE* codePtr = seqStorePtr->llCode;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
599 U32 u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
600 for (u=0; u<nbSeq; u++) litlengthCount[codePtr[u]]++;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
601 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
602
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
603 if (nbSeq >= 2) { /* rep offsets */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
604 const seqDef* const seq = seqStorePtr->sequencesStart;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
605 U32 offset1 = seq[0].offset - 3;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
606 U32 offset2 = seq[1].offset - 3;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
607 if (offset1 >= MAXREPOFFSET) offset1 = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
608 if (offset2 >= MAXREPOFFSET) offset2 = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
609 repOffsets[offset1] += 3;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
610 repOffsets[offset2] += 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
611 } } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
612 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
613
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
614 /*
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
615 static size_t ZDICT_maxSampleSize(const size_t* fileSizes, unsigned nbFiles)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
616 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
617 unsigned u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
618 size_t max=0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
619 for (u=0; u<nbFiles; u++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
620 if (max < fileSizes[u]) max = fileSizes[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
621 return max;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
622 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
623 */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
624
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
625 static size_t ZDICT_totalSampleSize(const size_t* fileSizes, unsigned nbFiles)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
626 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
627 size_t total=0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
628 unsigned u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
629 for (u=0; u<nbFiles; u++) total += fileSizes[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
630 return total;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
631 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
632
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
633 typedef struct { U32 offset; U32 count; } offsetCount_t;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
634
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
635 static void ZDICT_insertSortCount(offsetCount_t table[ZSTD_REP_NUM+1], U32 val, U32 count)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
636 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
637 U32 u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
638 table[ZSTD_REP_NUM].offset = val;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
639 table[ZSTD_REP_NUM].count = count;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
640 for (u=ZSTD_REP_NUM; u>0; u--) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
641 offsetCount_t tmp;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
642 if (table[u-1].count >= table[u].count) break;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
643 tmp = table[u-1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
644 table[u-1] = table[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
645 table[u] = tmp;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
646 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
647 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
648
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
649
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
650 #define OFFCODE_MAX 30 /* only applicable to first block */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
651 static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
652 unsigned compressionLevel,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
653 const void* srcBuffer, const size_t* fileSizes, unsigned nbFiles,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
654 const void* dictBuffer, size_t dictBufferSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
655 unsigned notificationLevel)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
656 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
657 U32 countLit[256];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
658 HUF_CREATE_STATIC_CTABLE(hufTable, 255);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
659 U32 offcodeCount[OFFCODE_MAX+1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
660 short offcodeNCount[OFFCODE_MAX+1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
661 U32 offcodeMax = ZSTD_highbit32((U32)(dictBufferSize + 128 KB));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
662 U32 matchLengthCount[MaxML+1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
663 short matchLengthNCount[MaxML+1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
664 U32 litLengthCount[MaxLL+1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
665 short litLengthNCount[MaxLL+1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
666 U32 repOffset[MAXREPOFFSET];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
667 offsetCount_t bestRepOffset[ZSTD_REP_NUM+1];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
668 EStats_ress_t esr;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
669 ZSTD_parameters params;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
670 U32 u, huffLog = 11, Offlog = OffFSELog, mlLog = MLFSELog, llLog = LLFSELog, total;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
671 size_t pos = 0, errorCode;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
672 size_t eSize = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
673 size_t const totalSrcSize = ZDICT_totalSampleSize(fileSizes, nbFiles);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
674 size_t const averageSampleSize = totalSrcSize / (nbFiles + !nbFiles);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
675 BYTE* dstPtr = (BYTE*)dstBuffer;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
676
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
677 /* init */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
678 esr.ref = ZSTD_createCCtx();
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
679 esr.zc = ZSTD_createCCtx();
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
680 esr.workPlace = malloc(ZSTD_BLOCKSIZE_ABSOLUTEMAX);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
681 if (!esr.ref || !esr.zc || !esr.workPlace) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
682 eSize = ERROR(memory_allocation);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
683 DISPLAYLEVEL(1, "Not enough memory \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
684 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
685 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
686 if (offcodeMax>OFFCODE_MAX) { eSize = ERROR(dictionary_wrong); goto _cleanup; } /* too large dictionary */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
687 for (u=0; u<256; u++) countLit[u]=1; /* any character must be described */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
688 for (u=0; u<=offcodeMax; u++) offcodeCount[u]=1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
689 for (u=0; u<=MaxML; u++) matchLengthCount[u]=1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
690 for (u=0; u<=MaxLL; u++) litLengthCount[u]=1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
691 memset(repOffset, 0, sizeof(repOffset));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
692 repOffset[1] = repOffset[4] = repOffset[8] = 1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
693 memset(bestRepOffset, 0, sizeof(bestRepOffset));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
694 if (compressionLevel==0) compressionLevel=g_compressionLevel_default;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
695 params = ZSTD_getParams(compressionLevel, averageSampleSize, dictBufferSize);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
696 { size_t const beginResult = ZSTD_compressBegin_advanced(esr.ref, dictBuffer, dictBufferSize, params, 0);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
697 if (ZSTD_isError(beginResult)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
698 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
699 DISPLAYLEVEL(1, "error : ZSTD_compressBegin_advanced failed \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
700 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
701 } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
702
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
703 /* collect stats on all files */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
704 for (u=0; u<nbFiles; u++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
705 ZDICT_countEStats(esr, params,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
706 countLit, offcodeCount, matchLengthCount, litLengthCount, repOffset,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
707 (const char*)srcBuffer + pos, fileSizes[u],
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
708 notificationLevel);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
709 pos += fileSizes[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
710 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
711
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
712 /* analyze */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
713 errorCode = HUF_buildCTable (hufTable, countLit, 255, huffLog);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
714 if (HUF_isError(errorCode)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
715 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
716 DISPLAYLEVEL(1, "HUF_buildCTable error \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
717 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
718 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
719 huffLog = (U32)errorCode;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
720
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
721 /* looking for most common first offsets */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
722 { U32 offset;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
723 for (offset=1; offset<MAXREPOFFSET; offset++)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
724 ZDICT_insertSortCount(bestRepOffset, offset, repOffset[offset]);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
725 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
726 /* note : the result of this phase should be used to better appreciate the impact on statistics */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
727
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
728 total=0; for (u=0; u<=offcodeMax; u++) total+=offcodeCount[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
729 errorCode = FSE_normalizeCount(offcodeNCount, Offlog, offcodeCount, total, offcodeMax);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
730 if (FSE_isError(errorCode)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
731 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
732 DISPLAYLEVEL(1, "FSE_normalizeCount error with offcodeCount \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
733 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
734 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
735 Offlog = (U32)errorCode;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
736
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
737 total=0; for (u=0; u<=MaxML; u++) total+=matchLengthCount[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
738 errorCode = FSE_normalizeCount(matchLengthNCount, mlLog, matchLengthCount, total, MaxML);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
739 if (FSE_isError(errorCode)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
740 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
741 DISPLAYLEVEL(1, "FSE_normalizeCount error with matchLengthCount \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
742 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
743 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
744 mlLog = (U32)errorCode;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
745
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
746 total=0; for (u=0; u<=MaxLL; u++) total+=litLengthCount[u];
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
747 errorCode = FSE_normalizeCount(litLengthNCount, llLog, litLengthCount, total, MaxLL);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
748 if (FSE_isError(errorCode)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
749 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
750 DISPLAYLEVEL(1, "FSE_normalizeCount error with litLengthCount \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
751 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
752 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
753 llLog = (U32)errorCode;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
754
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
755 /* write result to buffer */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
756 { size_t const hhSize = HUF_writeCTable(dstPtr, maxDstSize, hufTable, 255, huffLog);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
757 if (HUF_isError(hhSize)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
758 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
759 DISPLAYLEVEL(1, "HUF_writeCTable error \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
760 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
761 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
762 dstPtr += hhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
763 maxDstSize -= hhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
764 eSize += hhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
765 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
766
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
767 { size_t const ohSize = FSE_writeNCount(dstPtr, maxDstSize, offcodeNCount, OFFCODE_MAX, Offlog);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
768 if (FSE_isError(ohSize)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
769 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
770 DISPLAYLEVEL(1, "FSE_writeNCount error with offcodeNCount \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
771 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
772 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
773 dstPtr += ohSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
774 maxDstSize -= ohSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
775 eSize += ohSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
776 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
777
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
778 { size_t const mhSize = FSE_writeNCount(dstPtr, maxDstSize, matchLengthNCount, MaxML, mlLog);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
779 if (FSE_isError(mhSize)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
780 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
781 DISPLAYLEVEL(1, "FSE_writeNCount error with matchLengthNCount \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
782 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
783 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
784 dstPtr += mhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
785 maxDstSize -= mhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
786 eSize += mhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
787 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
788
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
789 { size_t const lhSize = FSE_writeNCount(dstPtr, maxDstSize, litLengthNCount, MaxLL, llLog);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
790 if (FSE_isError(lhSize)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
791 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
792 DISPLAYLEVEL(1, "FSE_writeNCount error with litlengthNCount \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
793 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
794 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
795 dstPtr += lhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
796 maxDstSize -= lhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
797 eSize += lhSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
798 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
799
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
800 if (maxDstSize<12) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
801 eSize = ERROR(GENERIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
802 DISPLAYLEVEL(1, "not enough space to write RepOffsets \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
803 goto _cleanup;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
804 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
805 # if 0
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
806 MEM_writeLE32(dstPtr+0, bestRepOffset[0].offset);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
807 MEM_writeLE32(dstPtr+4, bestRepOffset[1].offset);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
808 MEM_writeLE32(dstPtr+8, bestRepOffset[2].offset);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
809 #else
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
810 /* at this stage, we don't use the result of "most common first offset",
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
811 as the impact of statistics is not properly evaluated */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
812 MEM_writeLE32(dstPtr+0, repStartValue[0]);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
813 MEM_writeLE32(dstPtr+4, repStartValue[1]);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
814 MEM_writeLE32(dstPtr+8, repStartValue[2]);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
815 #endif
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
816 //dstPtr += 12;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
817 eSize += 12;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
818
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
819 _cleanup:
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
820 ZSTD_freeCCtx(esr.ref);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
821 ZSTD_freeCCtx(esr.zc);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
822 free(esr.workPlace);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
823
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
824 return eSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
825 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
826
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
827
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
828 size_t ZDICT_addEntropyTablesFromBuffer_advanced(void* dictBuffer, size_t dictContentSize, size_t dictBufferCapacity,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
829 const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
830 ZDICT_params_t params)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
831 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
832 size_t hSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
833 int const compressionLevel = (params.compressionLevel <= 0) ? g_compressionLevel_default : params.compressionLevel;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
834 U32 const notificationLevel = params.notificationLevel;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
835
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
836 /* dictionary header */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
837 MEM_writeLE32(dictBuffer, ZSTD_DICT_MAGIC);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
838 { U64 const randomID = XXH64((char*)dictBuffer + dictBufferCapacity - dictContentSize, dictContentSize, 0);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
839 U32 const compliantID = (randomID % ((1U<<31)-32768)) + 32768;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
840 U32 const dictID = params.dictID ? params.dictID : compliantID;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
841 MEM_writeLE32((char*)dictBuffer+4, dictID);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
842 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
843 hSize = 8;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
844
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
845 /* entropy tables */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
846 DISPLAYLEVEL(2, "\r%70s\r", ""); /* clean display line */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
847 DISPLAYLEVEL(2, "statistics ... \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
848 { size_t const eSize = ZDICT_analyzeEntropy((char*)dictBuffer+hSize, dictBufferCapacity-hSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
849 compressionLevel,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
850 samplesBuffer, samplesSizes, nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
851 (char*)dictBuffer + dictBufferCapacity - dictContentSize, dictContentSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
852 notificationLevel);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
853 if (ZDICT_isError(eSize)) return eSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
854 hSize += eSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
855 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
856
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
857
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
858 if (hSize + dictContentSize < dictBufferCapacity)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
859 memmove((char*)dictBuffer + hSize, (char*)dictBuffer + dictBufferCapacity - dictContentSize, dictContentSize);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
860 return MIN(dictBufferCapacity, hSize+dictContentSize);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
861 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
862
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
863
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
864 /*! ZDICT_trainFromBuffer_unsafe() :
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
865 * Warning : `samplesBuffer` must be followed by noisy guard band.
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
866 * @return : size of dictionary, or an error code which can be tested with ZDICT_isError()
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
867 */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
868 size_t ZDICT_trainFromBuffer_unsafe(
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
869 void* dictBuffer, size_t maxDictSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
870 const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
871 ZDICT_params_t params)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
872 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
873 U32 const dictListSize = MAX(MAX(DICTLISTSIZE_DEFAULT, nbSamples), (U32)(maxDictSize/16));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
874 dictItem* const dictList = (dictItem*)malloc(dictListSize * sizeof(*dictList));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
875 unsigned const selectivity = params.selectivityLevel == 0 ? g_selectivity_default : params.selectivityLevel;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
876 unsigned const minRep = (selectivity > 30) ? MINRATIO : nbSamples >> selectivity;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
877 size_t const targetDictSize = maxDictSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
878 size_t const samplesBuffSize = ZDICT_totalSampleSize(samplesSizes, nbSamples);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
879 size_t dictSize = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
880 U32 const notificationLevel = params.notificationLevel;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
881
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
882 /* checks */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
883 if (!dictList) return ERROR(memory_allocation);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
884 if (maxDictSize <= g_provision_entropySize + g_min_fast_dictContent) { free(dictList); return ERROR(dstSize_tooSmall); }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
885 if (samplesBuffSize < ZDICT_MIN_SAMPLES_SIZE) { free(dictList); return 0; } /* not enough source to create dictionary */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
886
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
887 /* init */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
888 ZDICT_initDictItem(dictList);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
889
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
890 /* build dictionary */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
891 ZDICT_trainBuffer(dictList, dictListSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
892 samplesBuffer, samplesBuffSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
893 samplesSizes, nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
894 minRep, notificationLevel);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
895
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
896 /* display best matches */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
897 if (params.notificationLevel>= 3) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
898 U32 const nb = MIN(25, dictList[0].pos);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
899 U32 const dictContentSize = ZDICT_dictSize(dictList);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
900 U32 u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
901 DISPLAYLEVEL(3, "\n %u segments found, of total size %u \n", dictList[0].pos, dictContentSize);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
902 DISPLAYLEVEL(3, "list %u best segments \n", nb);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
903 for (u=1; u<=nb; u++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
904 U32 pos = dictList[u].pos;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
905 U32 length = dictList[u].length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
906 U32 printedLength = MIN(40, length);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
907 DISPLAYLEVEL(3, "%3u:%3u bytes at pos %8u, savings %7u bytes |",
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
908 u, length, pos, dictList[u].savings);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
909 ZDICT_printHex((const char*)samplesBuffer+pos, printedLength);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
910 DISPLAYLEVEL(3, "| \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
911 } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
912
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
913
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
914 /* create dictionary */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
915 { U32 dictContentSize = ZDICT_dictSize(dictList);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
916 if (dictContentSize < targetDictSize/3) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
917 DISPLAYLEVEL(2, "! warning : selected content significantly smaller than requested (%u < %u) \n", dictContentSize, (U32)maxDictSize);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
918 if (minRep > MINRATIO) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
919 DISPLAYLEVEL(2, "! consider increasing selectivity to produce larger dictionary (-s%u) \n", selectivity+1);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
920 DISPLAYLEVEL(2, "! note : larger dictionaries are not necessarily better, test its efficiency on samples \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
921 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
922 if (samplesBuffSize < 10 * targetDictSize)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
923 DISPLAYLEVEL(2, "! consider increasing the number of samples (total size : %u MB)\n", (U32)(samplesBuffSize>>20));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
924 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
925
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
926 if ((dictContentSize > targetDictSize*3) && (nbSamples > 2*MINRATIO) && (selectivity>1)) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
927 U32 proposedSelectivity = selectivity-1;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
928 while ((nbSamples >> proposedSelectivity) <= MINRATIO) { proposedSelectivity--; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
929 DISPLAYLEVEL(2, "! note : calculated dictionary significantly larger than requested (%u > %u) \n", dictContentSize, (U32)maxDictSize);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
930 DISPLAYLEVEL(2, "! consider increasing dictionary size, or produce denser dictionary (-s%u) \n", proposedSelectivity);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
931 DISPLAYLEVEL(2, "! always test dictionary efficiency on samples \n");
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
932 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
933
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
934 /* limit dictionary size */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
935 { U32 const max = dictList->pos; /* convention : nb of useful elts within dictList */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
936 U32 currentSize = 0;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
937 U32 n; for (n=1; n<max; n++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
938 currentSize += dictList[n].length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
939 if (currentSize > targetDictSize) { currentSize -= dictList[n].length; break; }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
940 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
941 dictList->pos = n;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
942 dictContentSize = currentSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
943 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
944
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
945 /* build dict content */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
946 { U32 u;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
947 BYTE* ptr = (BYTE*)dictBuffer + maxDictSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
948 for (u=1; u<dictList->pos; u++) {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
949 U32 l = dictList[u].length;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
950 ptr -= l;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
951 if (ptr<(BYTE*)dictBuffer) { free(dictList); return ERROR(GENERIC); } /* should not happen */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
952 memcpy(ptr, (const char*)samplesBuffer+dictList[u].pos, l);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
953 } }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
954
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
955 dictSize = ZDICT_addEntropyTablesFromBuffer_advanced(dictBuffer, dictContentSize, maxDictSize,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
956 samplesBuffer, samplesSizes, nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
957 params);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
958 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
959
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
960 /* clean up */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
961 free(dictList);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
962 return dictSize;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
963 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
964
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
965
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
966 /* issue : samplesBuffer need to be followed by a noisy guard band.
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
967 * work around : duplicate the buffer, and add the noise */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
968 size_t ZDICT_trainFromBuffer_advanced(void* dictBuffer, size_t dictBufferCapacity,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
969 const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
970 ZDICT_params_t params)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
971 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
972 size_t result;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
973 void* newBuff;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
974 size_t const sBuffSize = ZDICT_totalSampleSize(samplesSizes, nbSamples);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
975 if (sBuffSize < ZDICT_MIN_SAMPLES_SIZE) return 0; /* not enough content => no dictionary */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
976
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
977 newBuff = malloc(sBuffSize + NOISELENGTH);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
978 if (!newBuff) return ERROR(memory_allocation);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
979
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
980 memcpy(newBuff, samplesBuffer, sBuffSize);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
981 ZDICT_fillNoise((char*)newBuff + sBuffSize, NOISELENGTH); /* guard band, for end of buffer condition */
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
982
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
983 result = ZDICT_trainFromBuffer_unsafe(
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
984 dictBuffer, dictBufferCapacity,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
985 newBuff, samplesSizes, nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
986 params);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
987 free(newBuff);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
988 return result;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
989 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
990
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
991
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
992 size_t ZDICT_trainFromBuffer(void* dictBuffer, size_t dictBufferCapacity,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
993 const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
994 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
995 ZDICT_params_t params;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
996 memset(&params, 0, sizeof(params));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
997 return ZDICT_trainFromBuffer_advanced(dictBuffer, dictBufferCapacity,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
998 samplesBuffer, samplesSizes, nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
999 params);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1000 }
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1001
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1002 size_t ZDICT_addEntropyTablesFromBuffer(void* dictBuffer, size_t dictContentSize, size_t dictBufferCapacity,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1003 const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples)
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1004 {
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1005 ZDICT_params_t params;
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1006 memset(&params, 0, sizeof(params));
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1007 return ZDICT_addEntropyTablesFromBuffer_advanced(dictBuffer, dictContentSize, dictBufferCapacity,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1008 samplesBuffer, samplesSizes, nbSamples,
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1009 params);
2e484bdea8c4 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1010 }