Fri, 26 Apr 2019 00:48:12 +0200 deltas: set estimated compression upper bound to "3x" instead of "10x"
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 26 Apr 2019 00:48:12 +0200] rev 42469
deltas: set estimated compression upper bound to "3x" instead of "10x" In pratice, we very rarely observer compression better than "3x" on manifest deltas. Having a more aggressive estimate significantly helps our pathological use case on a private repository. Here are a comparison of timings using different upper bound. Estimated compression | ø | ×10 | ×5 | ×3 | timing | 14.11 | 2.61 | 1.96 | 1.53 | We also tested the impact of this series on an array of public repositories. This shown no impact in either size nor timing. Full data set below for those interested. Size ---- Regarding size, not significant impact have been noticed on neither public nor private repositories. Here are the number we gathered on public repositories: zlib/upperbound | no | 10x | 5x | 3x mercurial | 5 875 730 | 5 875 730 | 5 875 730 | 5 875 730 pypy | 27 782 913 | 27 782 913 | 27 782 913 | 27 782 913 netbeans | 159 161 207 | 159 161 207 | 159 161 207 | 159 959 879 (+0.5%) mozilla-central | 323 841 642 | 323 841 642 | 323 841 642 | 319 867 519 (-2.5%) mozilla-try | 746 649 123 | 746 649 123 | 746 649 123 | 741 155 568 (-0.7%) private-repo | 1 485 287 294 | 1 485 287 294 | 1 485 287 294 | 1 409 248 382 (-5.1%) zstd/upperbound | no | 10x | 5x | 3x mercurial | 5 895 206 | 5 895 206 | 5 895 206 | 5 895 206 pypy | 28 689 230 | 28 689 230 | 28 689 230 | 28 689 230 netbeans | 157 636 387 | 157 636 387 | 157 636 387 | 159 692 678 (+1.3%) mozilla-central | 317 650 281 | 317 650 281 | 317 650 281 | 319 613 603 (+0.6%) mozilla-try | 737 555 275 | 737 555 275 | 737 555 275 | 738 079 473 (+0.1%) private-repo | 1 352 362 982 | 1 352 362 982 | 1 346 961 880 | 1 361 327 384 (+0.7%) Speed ------ Timing gathered using `hg perfrevlogwrite -m`. Value are in seconds. mercurial zlib | no | 10x | 5x | 3x | total | 65.551783 | 65.388887 | 65.260658 | 65.321199 | max | 0.034544 | 0.034571 | 0.034659 | 0.034521 | 99.99% | 0.034544 | 0.034571 | 0.034659 | 0.034521 | zstd | no | 10x | 5x | 3x | total | 49.118449 | 49.054062 | 48.753588 | 48.740230 | max | 0.009338 | 0.009239 | 0.009202 | 0.009178 | 99.99% | 0.007618 | 0.007639 | 0.007626 | 0.007621 | pypy zlib | no | 10x | 5x | 3x | total | 560.865984 | 558.983817 | 559.083815 | 559.349152 | max | 0.219614 | 0.215922 | 0.218112 | 0.218107 | 99.99% | 0.219614 | 0.215922 | 0.218112 | 0.218107 | zstd | no | 10x | 5x | 3x | total | 349.393280 | 347.395819 | 347.185407 | 345.643985 | max | 0.084143 | 0.083536 | 0.081834 | 0.082178 | 99.99% | 0.039445 | 0.039639 | 0.039612 | 0.039175 | netbeans zlib | no | 10x | 5x | 3x | total | 33103.327727 | 33314.932260 | 33211.745233 | 33345.891778 | max | 2.666852 | 2.672059 | 2.662453 | 2.662936 | 99.99% | 2.058772 | 2.070429 | 2.069569 | 2.064653 | zstd | no | 10x | 5x | 3x | total | 20112.102708 | 20095.879719 | 20083.390300 | 20123.221859 | max | 2.063482 | 2.062851 | 2.065229 | 2.060147 | 99.99% | 1.146647 | 1.143794 | 1.142933 | 1.146529 | mozilla zlib | no | 10x | 5x | 3x | total | 41374.102138 | 41418.816773 | 41381.956370 | 41334.280732 | max | 3.383474 | 3.387400 | 3.405711 | 3.387316 | 99.99% | 1.006755 | 1.005954 | 1.007700 | 1.007373 | zstd | no | 10x | 5x | 3x | total | 24689.691520 | 24643.939662 | 24664.630027 | 24664.512714 | max | 1.460822 | 1.449640 | 1.439747 | 1.465304 | 99.99% | 0.527111 | 0.527377 | 0.527807 | 0.527226 |
Mon, 21 Jan 2019 22:46:31 +0100 deltas: skip if projected compressed size is bigger than previous snapshot
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 21 Jan 2019 22:46:31 +0100] rev 42468
deltas: skip if projected compressed size is bigger than previous snapshot If we have a delta, we check constraints against a lower bound estimate of the resulting compressed delta. We then checks this projected size against the `size(snapshotⁿ) > size(snapshotⁿ⁺¹)` constraint. This allows to exclude potential base candidates before doing any expensive computation. This only apply to the intermediate-snapshot case since this constraint only apply to them. For some pathological cases of a private repository this step provide a further performance boost (timing from `hg perfrevlogwrite`): before: 3.010646 seconds after: 2.609307 seconds
Mon, 21 Jan 2019 22:46:18 +0100 deltas: skip if projected compressed size does not match text size constraint
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 21 Jan 2019 22:46:18 +0100] rev 42467
deltas: skip if projected compressed size does not match text size constraint If we have a delta, we check constraints against a lower bound estimate of the resulting compressed delta. We then checks this projected size against the ½ⁿ size constraints. This allows to exclude potential base candidates before doing any expensive computation. This only apply to the intermediate-snapshot case since this constraint only apply to them. For some pathological cases of a private repository this step provide a further performance boost (timing from `hg perfrevlogwrite`): before: 3.145906 seconds after: 3.010646 seconds
Mon, 21 Jan 2019 22:37:30 +0100 deltas: accept and skip None return for delta info
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 21 Jan 2019 22:37:30 +0100] rev 42466
deltas: accept and skip None return for delta info They are some extra computation that will shortcut the delta compression if the delta seems hopeless, returning None.
Mon, 21 Jan 2019 22:36:16 +0100 delta: move some delta chain related computation earlier in deltainfo
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 21 Jan 2019 22:36:16 +0100] rev 42465
delta: move some delta chain related computation earlier in deltainfo They are some more optimization change that will make use of this in the function. So we retrieve the data earlier.
Thu, 25 Apr 2019 22:50:33 +0200 deltas: skip if projected delta size is bigger than previous snapshot
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Thu, 25 Apr 2019 22:50:33 +0200] rev 42464
deltas: skip if projected delta size is bigger than previous snapshot Before computing any delta, we get a basic estimation of the delta size we can expect and the resulted compressed value. We then checks this projected size against the `size(snapshotⁿ) > size(snapshotⁿ⁺¹)` constraint. This allows to exclude potential base candidates before doing any expensive computation. This only apply to the intermediate-snapshot case since this constraint only apply to them. For some pathological cases of a private repository this step provide a significant performance boost (timing from `hg perfrevlogwrite`): before: 14.115908 seconds after: 3.145906 seconds
(0) -30000 -10000 -3000 -1000 -300 -100 -30 -10 -6 +6 +10 +30 +100 +300 +1000 +3000 tip