Taapas Agrawal <taapas2897@gmail.com> [Fri, 14 Jun 2019 18:25:14 +0530] rev 42470
strip: during merge allow strip only when -f is used
This ensures to abort strip to `hg strip` when we have a merge
in progress and allow it only when a `--force` flag is used.
Differential Revision: https://phab.mercurial-scm.org/D6529
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 26 Apr 2019 00:48:12 +0200] rev 42469
deltas: set estimated compression upper bound to "3x" instead of "10x"
In pratice, we very rarely observer compression better than "3x" on manifest
deltas. Having a more aggressive estimate significantly helps our pathological
use case on a private repository. Here are a comparison of timings using
different upper bound.
Estimated compression | ø | ×10 | ×5 | ×3 |
timing | 14.11 | 2.61 | 1.96 | 1.53 |
We also tested the impact of this series on an array of public repositories.
This shown no impact in either size nor timing.
Full data set below for those interested.
Size
----
Regarding size, not significant impact have been noticed on neither public nor
private repositories. Here are the number we gathered on public repositories:
zlib/upperbound | no | 10x | 5x | 3x
mercurial | 5 875 730 | 5 875 730 | 5 875 730 | 5 875 730
pypy | 27 782 913 | 27 782 913 | 27 782 913 | 27 782 913
netbeans | 159 161 207 | 159 161 207 | 159 161 207 | 159 959 879 (+0.5%)
mozilla-central | 323 841 642 | 323 841 642 | 323 841 642 | 319 867 519 (-2.5%)
mozilla-try | 746 649 123 | 746 649 123 | 746 649 123 | 741 155 568 (-0.7%)
private-repo | 1 485 287 294 | 1 485 287 294 | 1 485 287 294 | 1 409 248 382 (-5.1%)
zstd/upperbound | no | 10x | 5x | 3x
mercurial | 5 895 206 | 5 895 206 | 5 895 206 | 5 895 206
pypy | 28 689 230 | 28 689 230 | 28 689 230 | 28 689 230
netbeans | 157 636 387 | 157 636 387 | 157 636 387 | 159 692 678 (+1.3%)
mozilla-central | 317 650 281 | 317 650 281 | 317 650 281 | 319 613 603 (+0.6%)
mozilla-try | 737 555 275 | 737 555 275 | 737 555 275 | 738 079 473 (+0.1%)
private-repo | 1 352 362 982 | 1 352 362 982 | 1 346 961 880 | 1 361 327 384 (+0.7%)
Speed
------
Timing gathered using `hg perfrevlogwrite -m`. Value are in seconds.
mercurial
zlib | no | 10x | 5x | 3x |
total | 65.551783 | 65.388887 | 65.260658 | 65.321199 |
max | 0.034544 | 0.034571 | 0.034659 | 0.034521 |
99.99% | 0.034544 | 0.034571 | 0.034659 | 0.034521 |
zstd | no | 10x | 5x | 3x |
total | 49.118449 | 49.054062 | 48.753588 | 48.740230 |
max | 0.009338 | 0.009239 | 0.009202 | 0.009178 |
99.99% | 0.007618 | 0.007639 | 0.007626 | 0.007621 |
pypy
zlib | no | 10x | 5x | 3x |
total | 560.865984 | 558.983817 | 559.083815 | 559.349152 |
max | 0.219614 | 0.215922 | 0.218112 | 0.218107 |
99.99% | 0.219614 | 0.215922 | 0.218112 | 0.218107 |
zstd | no | 10x | 5x | 3x |
total | 349.393280 | 347.395819 | 347.185407 | 345.643985 |
max | 0.084143 | 0.083536 | 0.081834 | 0.082178 |
99.99% | 0.039445 | 0.039639 | 0.039612 | 0.039175 |
netbeans
zlib | no | 10x | 5x | 3x |
total | 33103.327727 | 33314.932260 | 33211.745233 | 33345.891778 |
max | 2.666852 | 2.672059 | 2.662453 | 2.662936 |
99.99% | 2.058772 | 2.070429 | 2.069569 | 2.064653 |
zstd | no | 10x | 5x | 3x |
total | 20112.102708 | 20095.879719 | 20083.390300 | 20123.221859 |
max | 2.063482 | 2.062851 | 2.065229 | 2.060147 |
99.99% | 1.146647 | 1.143794 | 1.142933 | 1.146529 |
mozilla
zlib | no | 10x | 5x | 3x |
total | 41374.102138 | 41418.816773 | 41381.956370 | 41334.280732 |
max | 3.383474 | 3.387400 | 3.405711 | 3.387316 |
99.99% | 1.006755 | 1.005954 | 1.007700 | 1.007373 |
zstd | no | 10x | 5x | 3x |
total | 24689.691520 | 24643.939662 | 24664.630027 | 24664.512714 |
max | 1.460822 | 1.449640 | 1.439747 | 1.465304 |
99.99% | 0.527111 | 0.527377 | 0.527807 | 0.527226 |
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 21 Jan 2019 22:46:31 +0100] rev 42468
deltas: skip if projected compressed size is bigger than previous snapshot
If we have a delta, we check constraints against a lower bound estimate of the
resulting compressed delta. We then checks this projected size against the
`size(snapshotⁿ) > size(snapshotⁿ⁺¹)` constraint. This allows to exclude
potential base candidates before doing any expensive computation.
This only apply to the intermediate-snapshot case since this constraint only
apply to them.
For some pathological cases of a private repository this step provide a
further performance boost (timing from `hg perfrevlogwrite`):
before: 3.010646 seconds
after: 2.609307 seconds
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 21 Jan 2019 22:46:18 +0100] rev 42467
deltas: skip if projected compressed size does not match text size constraint
If we have a delta, we check constraints against a lower bound estimate of the
resulting compressed delta. We then checks this projected size against the ½ⁿ
size constraints. This allows to exclude potential base candidates before doing
any expensive computation.
This only apply to the intermediate-snapshot case since this constraint only apply to
them.
For some pathological cases of a private repository this step provide a
further performance boost (timing from `hg perfrevlogwrite`):
before: 3.145906 seconds
after: 3.010646 seconds
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 21 Jan 2019 22:37:30 +0100] rev 42466
deltas: accept and skip None return for delta info
They are some extra computation that will shortcut the delta compression if the
delta seems hopeless, returning None.
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 21 Jan 2019 22:36:16 +0100] rev 42465
delta: move some delta chain related computation earlier in deltainfo
They are some more optimization change that will make use of this in the
function. So we retrieve the data earlier.
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Thu, 25 Apr 2019 22:50:33 +0200] rev 42464
deltas: skip if projected delta size is bigger than previous snapshot
Before computing any delta, we get a basic estimation of the delta size we can
expect and the resulted compressed value. We then checks this projected size
against the `size(snapshotⁿ) > size(snapshotⁿ⁺¹)` constraint. This allows to
exclude potential base candidates before doing any expensive computation.
This only apply to the intermediate-snapshot case since this constraint only
apply to them.
For some pathological cases of a private repository this step provide a
significant performance boost (timing from `hg perfrevlogwrite`):
before: 14.115908 seconds
after: 3.145906 seconds
Valentin Gatien-Baron <vgatien-baron@janestreet.com>, Pierre-Yves David <pierre-yves.david@octobus.net> [Thu, 25 Apr 2019 22:30:14 +0200] rev 42463
deltas: skip if projected delta size does not match text size constraint
Before computing any delta, we get a basic estimation of the delta size we can
expect and the resulted compressed value. We then checks this projected size
against the ½ⁿ size constraints. This allows to exclude potential base
candidates before doing any expensive computation.
This only apply to the intermediate-snapshot case since this constraint only
apply to them.
In practice we only perform this new checks for the manifestlog. Manifest log
combine two property: it is likely to have delta chain issue and its
diffing/compression is fairly predictable.
The initial author of this changeset is Valentin Gatien-Baron providing the
initial idea and initial testing, Pierre-Yves David later consolidated the code
in the right location and run more extensive testing.
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 26 Apr 2019 00:28:22 +0200] rev 42462
revlog: add the option to track the expected compression upper bound
There are various optimization we can do if we can estimate the size of delta
before actually spending CPU compressing them. So we add a attributed dedicated
to tracking that.
We only use it on Manifest because (1) it structure is quite stable across all
Mercurial repository so its compression ratio is fairly universal. This is the
revlog with most extreme delta (cf the sparse-revlog optimization).
This will be put to use in later changesets.
Right now the compression upper bound is set to 10. This is a fairly
conservative value (observed value is more around 3), but I prefer to be safe
while introducing the optimization principles. We can tune the optimization
threshold later.
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 12 Jun 2019 17:30:24 +0100] rev 42461
perf: clarify some of the custom behavior of `perfrevlogwrite`
This reduce the chance of developers being surprised by special cases.
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 12 Jun 2019 16:56:41 +0100] rev 42460
perf: fix perfrevlogwrite --count documentation
The help text was copy pasted from the previous option.
Georges Racinet <georges.racinet@octobus.net> [Fri, 17 May 2019 00:17:43 +0200] rev 42459
rust: switched to 'cargo rustc' in setup.py
This is more flexible in the passing of additional flags, also
what setuptools_rust does, giving less uncertainty about non-Linux
platforms.
Georges Racinet <georges.racinet@octobus.net> [Fri, 14 Jun 2019 11:18:06 +0100] rev 42458
rust-cpython: fix build for MacOSX
MacOSX needs special link flags. Quoting the README of rust-cpython:
create a `.cargo/config` with the following content:
```
[target.x86_64-apple-darwin]
rustflags = [
"-C", "link-arg=-undefined",
"-C", "link-arg=dynamic_lookup",
]
```
This is tested with Python 2.7 (Anaconda install) and Python 3
(Homebrew install)
Georges Racinet <georges.racinet@octobus.net> [Fri, 14 Jun 2019 10:57:07 +0100] rev 42457
rust-cpython: management of shared libray suffix
Before this changeset, the shared library objects suffixes
were both (rustc output and Python input) hardcoded to '.so',
which is wrong for Python3 and non Linux targets.