annotate contrib/fuzz/mpatch.cc @ 44363:f7459da77f23

nodemap: introduce an option to use mmap to read the nodemap mapping

The performance and memory benefit is much greater if we don't have to copy all the data in memory for each piece of information. So we introduce an option (on by default) to read the data using mmap.

This changeset is the last one defining the API for indexes that support nodemap data (they have to be able to use the mmapping).

Below are some benchmarks comparing the best we currently have in 5.3 with the final step of this series (using the persistent nodemap implementation in Rust). The benchmarks run `hg perfindex` with various revsets and the following variants:

Before:
* do not use the persistent nodemap
* use the CPython implementation of the index for nodemap
* use mmapping of the changelog index

After:
* use the MixedIndex Rust code, with the NodeTree object for nodemap access (still in review)
* use the persistent nodemap data from disk
* access the persistent nodemap data through mmap
* use mmapping of the changelog index

The persistent nodemap greatly speeds up most operations on very large repositories. Some of the previously very fast lookups end up a bit slower because the persistent nodemap has to be set up. However, the absolute slowdown is very small and won't matter in the big picture.

Here are some numbers (in seconds) for the reference copy of mozilla-try:

Revset              Before     After      abs-change   speedup
-10000:             0.004622   0.005532    0.000910    ×    0.83
-10:                0.000050   0.000132    0.000082    ×    0.37
tip                 0.000052   0.000085    0.000033    ×    0.61
0 + (-10000:)       0.028222   0.005337   -0.022885    ×    5.29
0                   0.023521   0.000084   -0.023437    ×  280.01
(-10000:) + 0       0.235539   0.005308   -0.230231    ×   44.37
(-10:) + :9         0.232883   0.000180   -0.232703    × 1293.79
(-10000:) + (:99)   0.238735   0.005358   -0.233377    ×   44.55
:99 + (-10000:)     0.317942   0.005593   -0.312349    ×   56.84
:9 + (-10:)         0.313372   0.000179   -0.313193    × 1750.68
:9                  0.316450   0.000143   -0.316307    × 2212.93

On smaller repositories, the cost of nodemap-related operations is not as big, so the win is much more modest. Yet it helps shave a handful of milliseconds here and there.

Here are some numbers (in seconds) for the reference copy of mercurial:

Revset              Before     After      abs-change   speedup
-10:                0.000065   0.000097    0.000032    ×  0.67
tip                 0.000063   0.000078    0.000015    ×  0.80
0                   0.000561   0.000079   -0.000482    ×  7.10
-10000:             0.004609   0.003648   -0.000961    ×  1.26
0 + (-10000:)       0.005023   0.003715   -0.001307    ×  1.35
(-10:) + :9         0.002187   0.000108   -0.002079    × 20.25
(-10000:) + 0       0.006252   0.003716   -0.002536    ×  1.68
(-10000:) + (:99)   0.006367   0.003707   -0.002660    ×  1.71
:9 + (-10:)         0.003846   0.000110   -0.003736    × 34.96
:9                  0.003854   0.000099   -0.003755    × 38.92
:99 + (-10000:)     0.007644   0.003778   -0.003866    ×  2.02

Differential Revision: https://phab.mercurial-scm.org/D7894
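To illustrate why reading through mmap avoids the copy mentioned in the description, here is a minimal C++ sketch of a read-only file mapping on a POSIX system. It is not Mercurial code: the `MappedFile` class and its `byte_at` accessor are hypothetical names chosen for this example, and the real nodemap reader lives in Python and Rust.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of Mercurial): maps a file read-only so
// callers can index into it without first copying it into a heap buffer.
class MappedFile
{
public:
    explicit MappedFile(const std::string &path)
    {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) {
            return; // leave the object empty on failure
        }
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void *p = mmap(nullptr, static_cast<size_t>(st.st_size),
                           PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const unsigned char *>(p);
                size_ = static_cast<size_t>(st.st_size);
            }
        }
        close(fd); // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile()
    {
        if (data_) {
            munmap(const_cast<unsigned char *>(data_), size_);
        }
    }

    MappedFile(const MappedFile &) = delete;
    MappedFile &operator=(const MappedFile &) = delete;

    size_t size() const { return size_; }

    // Pages are faulted in by the kernel on first access; nothing is
    // read eagerly when the object is constructed.
    unsigned char byte_at(size_t i) const
    {
        assert(i < size_);
        return data_[i];
    }

private:
    const unsigned char *data_ = nullptr;
    size_t size_ = 0;
};
```

With a mapping like this, a lookup that touches only a few blocks of a large file pays only for the pages it reads, which is consistent with the small fixed setup cost and large wins reported in the tables above.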
author Pierre-Yves David <pierre-yves.david@octobus.net>
date Tue, 11 Feb 2020 11:18:52 +0100
parents d37658efbec2
children
rev   line source
38246
46dcb9f14900 fuzz: new fuzzer for the mpatch code
Augie Fackler <augie@google.com>
parents:
1 /*
2  * mpatch.cc - fuzzer harness for mpatch.c
3  *
4  * Copyright 2018, Google Inc.
5  *
6  * This software may be used and distributed according to the terms of
7  * the GNU General Public License, incorporated herein by reference.
8  */
9 #include <iostream>
10 #include <memory>
11 #include <stdint.h>
12 #include <stdlib.h>
13 #include <vector>
14
15 #include "fuzzutil.h"
16
43809
51a99e09c54b fuzz: always define LLVMFuzzerInitialize() even if we don't need it
Augie Fackler <augie@google.com>
parents: 38246
17 extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv)
18 {
19     return 0;
20 }
21
38246
22 // To avoid having too many OOMs from the fuzzer infrastructure, we'll
23 // skip patch application if the resulting fulltext would be bigger
24 // than 10MiB.
25 #define MAX_OUTPUT_SIZE 10485760
26
27 extern "C" {
28 #include "bitmanipulation.h"
29 #include "mpatch.h"
30
31 struct mpatchbin {
32     std::unique_ptr<char[]> data;
33     size_t len;
34 };
35
36 static mpatch_flist *getitem(void *vbins, ssize_t pos)
37 {
38     std::vector<mpatchbin> *bins = (std::vector<mpatchbin> *)vbins;
39     const mpatchbin &bin = bins->at(pos + 1);
40     struct mpatch_flist *res;
41     LOG(2) << "mpatch_decode " << bin.len << std::endl;
42     if (mpatch_decode(bin.data.get(), bin.len, &res) < 0)
43         return NULL;
44     return res;
45 }
46
47 // input format:
48 // u8 number of inputs
49 // one u16 for each input, its length
50 // the inputs
51 int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size)
52 {
53     if (!Size) {
54         return 0;
55     }
56     // First byte of data is how many texts we expect, first text
57     // being the base the rest being the deltas.
58     ssize_t numtexts = Data[0];
59     if (numtexts < 2) {
60         // No point if we don't have at least a base text and a delta...
61         return 0;
62     }
63     // Each text will be described by a byte for how long it
64     // should be, so give up if we don't have enough.
65     if ((Size - 1) < (numtexts * 2)) {
66         return 0;
67     }
68     size_t consumed = 1 + (numtexts * 2);
69     LOG(2) << "input contains " << Size << std::endl;
70     LOG(2) << numtexts << " texts, consuming " << consumed << std::endl;
71     std::vector<mpatchbin> bins;
72     bins.reserve(numtexts);
73     for (int i = 0; i < numtexts; ++i) {
74         mpatchbin bin;
75         size_t nthsize = getbeuint16((char *)Data + 1 + (2 * i));
76         LOG(2) << "text " << i << " is " << nthsize << std::endl;
77         char *start = (char *)Data + consumed;
78         consumed += nthsize;
79         if (consumed > Size) {
80             LOG(2) << "ran out of data, consumed " << consumed
81                    << " of " << Size << std::endl;
82             return 0;
83         }
84         bin.len = nthsize;
85         bin.data.reset(new char[nthsize]);
86         memcpy(bin.data.get(), start, nthsize);
87         bins.push_back(std::move(bin));
88     }
89     LOG(2) << "mpatch_flist" << std::endl;
90     struct mpatch_flist *patch =
91         mpatch_fold(&bins, getitem, 0, numtexts - 1);
92     if (!patch) {
93         return 0;
94     }
95     LOG(2) << "mpatch_calcsize" << std::endl;
96     ssize_t outlen = mpatch_calcsize(bins[0].len, patch);
97     LOG(2) << "outlen " << outlen << std::endl;
98     if (outlen < 0 || outlen > MAX_OUTPUT_SIZE) {
99         goto cleanup;
100     }
101     {
102         char *dest = (char *)malloc(outlen);
103         LOG(2) << "expecting " << outlen << " total bytes at "
104                << (void *)dest << std::endl;
105         mpatch_apply(dest, bins[0].data.get(), bins[0].len, patch);
106         free(dest);
107         LOG(1) << "applied a complete patch" << std::endl;
108     }
109 cleanup:
110     mpatch_lfree(patch);
111     return 0;
112 }
113
114 } // extern "C"
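
The input layout parsed by the harness (a u8 count, one big-endian u16 length per text as read by `getbeuint16`, then the text bytes back to back) can be produced with a small packer. The following is a sketch only: `pack_mpatch_input` is a hypothetical helper that does not exist in contrib/fuzz, and it assumes the caller supplies the base text first and the deltas after it.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of contrib/fuzz): packs texts into the
// layout the harness reads back:
//   u8            number of texts
//   u16 x count   big-endian length of each text
//   bytes         the texts themselves, concatenated in order
// Returns an empty vector if the texts cannot be represented.
std::vector<uint8_t> pack_mpatch_input(const std::vector<std::string> &texts)
{
    std::vector<uint8_t> out;
    if (texts.size() > 0xFF) {
        return out; // the count must fit in one byte
    }
    out.push_back(static_cast<uint8_t>(texts.size()));
    for (const std::string &t : texts) {
        if (t.size() > 0xFFFF) {
            return {}; // each length must fit in a u16
        }
        out.push_back(static_cast<uint8_t>(t.size() >> 8));   // high byte
        out.push_back(static_cast<uint8_t>(t.size() & 0xFF)); // low byte
    }
    for (const std::string &t : texts) {
        out.insert(out.end(), t.begin(), t.end());
    }
    return out;
}
```

For example, packing the two texts "ab" and "c" gives the eight bytes 02 00 02 00 01 61 62 63. The harness returns early for fewer than two texts, and every text after the first is passed to `mpatch_decode` as a delta, so a seed corpus built this way would need well-formed deltas in those positions to reach `mpatch_apply`.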