Boris Feld <boris.feld@octobus.net> [Wed, 28 Nov 2018 05:06:58 +0100] rev 40831
contrib: add a helper script that helps to build interesting repositories
The script is dedicated to building a pair of repositories that should be
interesting to run discovery against one another. It seems a common enough
need to contribute it upstream.
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 03 Dec 2018 19:42:46 +0300] rev 40830
py3: listify filter() to call len() on it
Differential Revision: https://phab.mercurial-scm.org/D5354
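A minimal illustration of the issue, not taken from the patch itself: on Python 3, filter() returns a lazy iterator that has no len(), so the result has to be turned into a list first. The variable names here are made up for the example.

```python
# On Python 3, filter() returns a lazy iterator, which has no len().
values = [0, 1, 2, 3, 4]
lazy = filter(lambda v: v % 2 == 0, values)

# Materialise the result first; list(...) works on both Python 2 and 3.
evens = list(lazy)
count = len(evens)
```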
Yuya Nishihara <yuya@tcha.org> [Sun, 18 Nov 2018 18:35:31 +0900] rev 40829
loggingutil: document openlogfile()
This function will be used later for command-server logging.
Yuya Nishihara <yuya@tcha.org> [Sun, 18 Nov 2018 18:25:37 +0900] rev 40828
loggingutil: extract openlogfile() and proxylogger to new module
This module isn't placed under the "utils" package since it needs "ui" to
process things. It's called "loggingutil", not "logutil" because the word
"log" is too obscure in our codebase.
Yuya Nishihara <yuya@tcha.org> [Sun, 18 Nov 2018 18:21:39 +0900] rev 40827
blackbox: pass in options to _openlogfile() as arguments
This prepares for extracting a utility function from the blackbox module.
Yuya Nishihara <yuya@tcha.org> [Sat, 17 Nov 2018 22:10:27 +0900] rev 40826
blackbox: just try writing to repo.vfs and update lastlogger on success
This is simpler and more robust. Before, an empty ".hg" directory would be
created if it was removed after checking vfs.isdir('.').
Yuya Nishihara <yuya@tcha.org> [Tue, 20 Nov 2018 22:31:12 +0900] rev 40825
vfs: add option to not create parent directories implicitly
In blackbox, we don't want to create a ".hg" directory by mistake. This
provides a race-safe option to achieve that.
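The idea can be sketched as follows. This is a rough illustration, not Mercurial's vfs code: `open_for_append` and its `make_parent_dirs` parameter are hypothetical names. When parent creation is disabled, a missing directory is reported by open() itself, so there is no separate existence check that could race with a concurrent removal.

```python
import errno
import os
import tempfile


def open_for_append(path, make_parent_dirs=True):
    """Open ``path`` for appending (hypothetical helper, not Mercurial's vfs).

    With make_parent_dirs=False nothing is created implicitly: if the parent
    directory is missing, open() fails and the caller decides what to do.
    """
    if make_parent_dirs:
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
    return open(path, "ab")


base = tempfile.mkdtemp()
target = os.path.join(base, "missing-dir", "blackbox.log")

try:
    open_for_append(target, make_parent_dirs=False).close()
    created = True
    missing = False
except OSError as exc:
    created = False
    missing = exc.errno == errno.ENOENT
```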
Boris Feld <boris.feld@octobus.net> [Thu, 15 Nov 2018 02:55:33 +0100] rev 40824
repo: add a `wcachevfs` to access the `.hg/wcache/` directory
This vfs will allow us to migrate various caches to the new `wcache` directory,
helping with cache issues with "share".
Boris Feld <boris.feld@octobus.net> [Thu, 15 Nov 2018 02:46:31 +0100] rev 40823
cache: create `wcache` directory at init time
The cache directory will be needed very quickly, so it seems simpler to create
it early to make sure it has the same owner and permissions as the other
directories in the repository.
Boris Feld <boris.feld@octobus.net> [Thu, 15 Nov 2018 02:38:55 +0100] rev 40822
cache: create `cache` directory at init time
The cache directory will be needed very quickly, so it seems simpler to create
it early to make sure it has the same owner and permissions as the other
directories in the repository.
Boris Feld <boris.feld@octobus.net> [Thu, 15 Nov 2018 17:08:23 +0100] rev 40821
check-exec: write file in 'wcache' instead of 'cache'
Some caches are relevant to or affected by the working copy used. So the
`.hg/cache` directory is not the best place for them because multiple shared
repositories can end up fighting over them.
To address this issue, we introduce a new 'wcache' directory to host this kind
of cache.
The first users are the `checkisexec` type files. These files describe
properties of the working copy and fit the use case well.
Boris Feld <boris.feld@octobus.net> [Fri, 23 Nov 2018 06:09:44 +0100] rev 40820
mmapindex: set default to 1MB
Mmapping the index is more efficient if we only need a small part of it.
The 1MB value has been picked arbitrarily; a lower value might be better.
On a large repository with a 60MB index, we see the following performance
gain:
hg perfindex
before: ! wall 0.032023 comb 0.040000 user 0.000000 sys 0.040000 (best of 100)
after: ! wall 0.000196 comb 0.000000 user 0.000000 sys 0.000000 (best of 1060)
The speed boost benefits all cases, including the one where the full index
needs to be parsed.
hg perfindex --rev 0
before: ! wall 0.040673 comb 0.030000 user 0.000000 sys 0.030000 (best of 100)
after: ! wall 0.010713 comb 0.020000 user 0.010000 sys 0.010000 (best of 212)
This gain is reflected in higher-level operations:
hg perfbookmarks --clear-revlogs
before: ! wall 0.161339 comb 0.160000 user 0.130000 sys 0.030000 (best of 56)
after: ! wall 0.123228 comb 0.120000 user 0.120000 sys 0.000000 (best of 68)
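Why mmap helps when only a small part of a large file is needed can be sketched with Python's standard `mmap` module. This is an illustration only: `read_entry`, `ENTRY_SIZE` and the "map when the file is at least as large as the threshold" rule are assumptions made for the example, not the revlog implementation.

```python
import mmap
import os
import tempfile

ENTRY_SIZE = 64  # assumed fixed record size, for illustration only


def read_entry(path, rev, mmap_threshold=1024 * 1024):
    """Return the raw bytes of record ``rev`` (hypothetical helper)."""
    size = os.path.getsize(path)
    start = rev * ENTRY_SIZE
    with open(path, "rb") as fp:
        if size >= mmap_threshold:
            # Map the file: only the pages actually touched get read in.
            with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as data:
                return data[start:start + ENTRY_SIZE]
        # Small file: reading it in full is cheap.
        data = fp.read()
        return data[start:start + ENTRY_SIZE]


# Build a small fake "index" of 100 records to exercise both code paths.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for rev in range(100):
        tmp.write(bytes([rev]) * ENTRY_SIZE)
    index_path = tmp.name

mapped = read_entry(index_path, 42, mmap_threshold=1024)    # uses mmap
plain = read_entry(index_path, 42, mmap_threshold=1 << 20)  # uses read()
os.unlink(index_path)
```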
Boris Feld <boris.feld@octobus.net> [Fri, 23 Nov 2018 06:07:33 +0100] rev 40819
mmapindex: move the 'mmapindexthreshold' option out of experimental
The option is useful and should be advertised more. We move it out of
experimental as a first step. The `storage` section is selected as this is
related to how the storage is accessed. A new 'performance' section might be
more appropriate.
We move from `mmapindexthreshold` to `mmap-threshold` as non-index items are
also suitable for mmap (eg: the rev-branch-cache).
If relevant, we can introduce sub-option `mmap-threshold.revlog-index` later.
Boris Feld <boris.feld@octobus.net> [Sat, 01 Dec 2018 15:57:27 +0100] rev 40818
perf: add a --rev attribute to perfindex
This allows benchmarking the time necessary to look up revisions other than
the tip.
Boris Feld <boris.feld@octobus.net> [Fri, 23 Nov 2018 06:03:38 +0100] rev 40817
perf: update perfindex to be more realistic
The previous code was creating a revlog manually, we now use the actual
`localrepo` method to create it.
We have to jump through extra hoops to work around the impact of filecache.
Martin von Zweigbergk <martinvonz@google.com> [Sun, 02 Dec 2018 13:09:46 -0800] rev 40816
match: drop unnecessary wrapping of regex in group
It seems the regexes have been wrapped in an unnamed group since
b6c42714d900 (Add locate command., 2005-07-05). In that commit, the
grouping was needed because there was a "head" ('^') added before the
group and a "tail" (os.sep) added after it. It seems the head was
moved inside the group in 1c0c413cccdd (Get add and locate to use new
repo and dirstate walk code., 2005-07-18) and the tail was moved
inside the group in 89985a1b3427 (Clean up walk and changes code to
use normalised names properly., 2005-07-31), so it seems to me that
we've carried around the unnecessary group for 13 years. This patch
removes it.
Differential Revision: https://phab.mercurial-scm.org/D5352
Martin von Zweigbergk <martinvonz@google.com> [Sun, 02 Dec 2018 13:45:20 -0800] rev 40815
match: use _BASE_SIZE instead of magic value 4
Differential Revision: https://phab.mercurial-scm.org/D5351
Martin von Zweigbergk <martinvonz@google.com> [Sun, 02 Dec 2018 13:44:49 -0800] rev 40814
match: make "groupsize" include the trailing "|"
I think this is a little easier to follow and it will simplify later
patches too.
Differential Revision: https://phab.mercurial-scm.org/D5350
Martin von Zweigbergk <martinvonz@google.com> [Sun, 02 Dec 2018 13:09:43 -0800] rev 40813
match: fix an unaligned (but harmless) indent
Differential Revision: https://phab.mercurial-scm.org/D5349
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 17:41:10 +0100] rev 40812
match: raise an Abort error instead of OverflowError
This case of OverflowError (one single pattern being too large) has never been
properly caught in the past.
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 21:02:02 +0100] rev 40811
match: avoid translating glob to matcher multiple times for large sets
For an hgignore with many globs, the resulting regexp might not fit under the
20K length limit. So the patterns need to be broken up into smaller pieces.
Before this change, the logic was restarting the full process from scratch
for each smaller piece, including the translation of globs into regexps,
effectively doing the work over and over.
If the 20K limit is reached, we are likely in a case where there are many such
globs, so translating them is especially expensive and we should be careful
not to do that work more than once.
To work around this, we now translate globs to regexps once and for all. Then,
we assemble the resulting individual regexps into valid blocks.
This yields a very significant performance win for large `.hgignore` files:
Before: ! wall 0.153153 comb 0.150000 user 0.150000 sys 0.000000 (median of 66)
After: ! wall 0.059793 comb 0.060000 user 0.060000 sys 0.000000 (median of 100)
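The "translate once, then assemble into blocks" approach can be sketched as below. This is not the code from match.py: `group_regexps` and `MAX_RE_SIZE` are hypothetical names, and the input is assumed to be a list of regexps that have already been translated from globs.

```python
import re

MAX_RE_SIZE = 20000  # stands in for the "20K" limit mentioned above


def group_regexps(regexps, max_size=MAX_RE_SIZE):
    """Join already-translated regexps into alternations within max_size.

    Hypothetical sketch: each input regexp is used exactly once, so the
    expensive glob translation never has to be repeated.
    """
    groups = []
    current = []
    current_size = 0
    for regexp in regexps:
        piece = len(regexp) + 1  # +1 accounts for the "|" separator
        if piece > max_size:
            # A single pattern that cannot fit in any block is an error.
            raise ValueError("pattern too large: %d characters" % len(regexp))
        if current and current_size + piece > max_size:
            groups.append("|".join(current))
            current = []
            current_size = 0
        current.append(regexp)
        current_size += piece
    if current:
        groups.append("|".join(current))
    return groups


# Ten short patterns and an artificially small limit to force a split.
regexps = [re.escape("file%d.tmp" % i) for i in range(10)]
groups = group_regexps(regexps, max_size=40)
matchers = [re.compile(group) for group in groups]
matched = any(m.match("file7.tmp") for m in matchers)
```

A path is considered matched if any of the compiled blocks matches it, which keeps the result identical to a single large alternation.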
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 17:25:49 +0100] rev 40810
match: extract function that groups regexps
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 17:16:05 +0100] rev 40809
match: test for overflow error in pattern
If a single pattern is too large to handle, we raise an exception. This case is
now doctested.
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 17:20:32 +0100] rev 40808
match: extract a literal constant into a symbolic one
Matt Harbison <matt_harbison@yahoo.com> [Sat, 01 Dec 2018 21:42:48 -0500] rev 40807
tests: apply binary mode to output in seq.py
I noticed this when playing with running tests using WSL, and iterating over the
output yielded '0\r', '1\r',... Most of the other *.py tools do this, and `seq`
on MSYS lacks '\r' in the output, so this is more consistent.
Boris Feld <boris.feld@octobus.net> [Fri, 23 Nov 2018 01:09:37 +0100] rev 40806
perf: add a `--clear-caches` to `perfbranchmapupdate`
This flag will help to measure the time we spend loading the various caches
that support the branchmap update.
Example for a 500 000 revisions repository:
hg perfbranchmapupdate --base 'not tip' --target 'tip'
! wall 0.000860 comb 0.000000 user 0.000000 sys 0.000000 (best of 336)
hg perfbranchmapupdate --base 'not tip' --target 'tip' --clear-caches
! wall 0.029494 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)
Boris Feld <boris.feld@octobus.net> [Wed, 21 Nov 2018 21:11:47 +0000] rev 40805
perf: start from an existing branchmap if possible
If the --base set is a superset of one of the cached branchmaps, we should use
it as a starting point. This greatly helps the overall runtime of
`hg perfbranchmapupdate`.
For example, for a repository with about 500 000 revisions, using this trick
makes the command runtime move from about 200 seconds to about 10 seconds. A
20x gain.
Boris Feld <boris.feld@octobus.net> [Wed, 21 Nov 2018 20:35:22 +0000] rev 40804
perf: rely on repoview for perfbranchmapupdate
Using a 'repoview' matching the base and target subsets makes the benchmark
more realistic. It also unlocks optimizations to make the command
initialization faster.
Boris Feld <boris.feld@octobus.net> [Wed, 21 Nov 2018 22:56:06 +0100] rev 40803
perf: pre-indent some code in `perfbranchmapupdate`
This makes the next patch easier to read.
Boris Feld <boris.feld@octobus.net> [Wed, 21 Nov 2018 12:02:25 +0000] rev 40802
perf: add a `perfbranchmapupdate` command
This command benchmarks the time necessary to update the branchmap between two
sets of revisions. This changeset introduces a first version, doing nothing
fancy regarding caches or other internal details.
Anton Shestakov <av6@dwimlabs.net> [Mon, 05 Nov 2018 13:52:19 +0800] rev 40801
push: config option to control behavior when pushing to a publishing server
Pushing to a publishing server by mistake can lead to a situation that is
difficult to resolve, because evolution doesn't work on public changesets.
This new experimental config tries to help users avoid unintentionally pushing
to publishing remotes (or at least be aware of it).
`hg push --publish` can be used to make push succeed even when auto-publish is
set to 'abort'.
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 30 Nov 2018 17:42:55 +0300] rev 40800
narrowcommands: remove an unrequired `repo.narrowpats` call
We call that a few lines above and do nothing significant in between which can
change the narrowpats. So let's use the values returned by that call.
Differential Revision: https://phab.mercurial-scm.org/D5348
Augie Fackler <augie@google.com> [Thu, 29 Nov 2018 16:44:01 -0500] rev 40799
manifest: reject lines shorter than 42 bytes, not 22
Yuya correctly spotted during the review of f27f8e9ef1e73 that we're
dealing with hexlified hashes here, and so it should be 42 bytes not
22.
Differential Revision: https://phab.mercurial-scm.org/D5347
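The two numbers can be reproduced with a little arithmetic. The sketch below assumes a line of the shape `<path> NUL <node> NEWLINE` with the shortest conceivable (empty) path; that layout is an assumption made for this illustration.

```python
import binascii

node = b"\x12" * 20               # a 20-byte binary SHA-1 node
hexnode = binascii.hexlify(node)  # the same node as 40 hex characters

# Assumed line layout: <path> NUL <node> NEWLINE, with an empty path.
shortest_binary = len(b"\0") + len(node) + len(b"\n")
shortest_hex = len(b"\0") + len(hexnode) + len(b"\n")
```

With a binary node the minimum comes out at 22 bytes, while the hexlified form gives 42.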
Yuya Nishihara <yuya@tcha.org> [Sun, 11 Nov 2018 20:05:38 +0900] rev 40798
blackbox: initialize logger with repo instance
The blackboxlogger is unusable without a repo. Let's simply initialize it
with a repo instance.
Yuya Nishihara <yuya@tcha.org> [Sat, 17 Nov 2018 20:56:25 +0900] rev 40797
blackbox: do not nullify repo to deactivate the logger on failure
The _repo will be a mandatory attribute. Instead, make the logger not
track any events.
Yuya Nishihara <yuya@tcha.org> [Sun, 11 Nov 2018 20:02:34 +0900] rev 40796
blackbox: extract global last logger to proxylogger class
So the blackboxlogger can be instantiated with a repo.
Yuya Nishihara <yuya@tcha.org> [Sun, 11 Nov 2018 19:36:21 +0900] rev 40795
ui: pass in bytes opts dict to logger.log()
This is the convention of the Mercurial API.
Yuya Nishihara <yuya@tcha.org> [Sun, 11 Nov 2018 19:35:33 +0900] rev 40794
ui: pass in formatted message to logger.log()
This makes sure that all logger instances will handle the message arguments
properly.
Yuya Nishihara <yuya@tcha.org> [Sun, 11 Nov 2018 17:34:46 +0900] rev 40793
blackbox: send debug message to logger by core ui
Since the core ui.log() may recurse into ui.log() through ui.debug(), it
must guard against recursion.
The ui extension class can finally be removed.
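A recursion guard of this kind can be sketched with a simple flag. The `Ui` class below is a toy stand-in written for this illustration and does not reproduce Mercurial's ui class; the `_in_log` attribute is a hypothetical name.

```python
class Ui(object):
    """Toy stand-in for a ui object (hypothetical, not Mercurial's class)."""

    def __init__(self):
        self.records = []
        self._in_log = False

    def debug(self, msg):
        # Debug output is itself forwarded to the logger.
        self.log("debug", msg)

    def log(self, event, msg):
        if self._in_log:
            # Already logging: drop the nested call instead of recursing.
            return
        self._in_log = True
        try:
            self.records.append((event, msg))
            # Logging may emit debug output, which calls log() again.
            self.debug("logged %s" % event)
        finally:
            self._in_log = False


ui = Ui()
ui.log("command", "hg status")
```

The `try`/`finally` ensures the flag is reset even if the logger raises, so one failure does not silence all later events.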
Yuya Nishihara <yuya@tcha.org> [Sat, 17 Nov 2018 20:23:50 +0900] rev 40792
blackbox: change the way of deactivating the logger on write error
This prepares for the upcoming code move. The recursion guard will be ported
to the core ui.
Martin von Zweigbergk <martinvonz@google.com> [Wed, 28 Nov 2018 10:12:50 -0800] rev 40791
match: remove obsolete catching of OverflowError
Since 0f6a1bdf89fb (match: handle large regexes, 2007-08-19), we catch
an OverflowError from the regex engine and split up the regex if that
happens. In 59a9dc9562e2 (ignore: split up huge patterns, 2008-02-11),
that was extended to raise an OverflowError in our code even if the
regex engine doesn't raise it. It's unclear if there was a range of
regex sizes where the OverflowError would be raised from the regex
engine but that were still below the limit we added in our
code. Either way, both limitations were probably removed in Python
2.7.4 when the regex code width was extended from 16bit to 32bit (or
Py_UCS4) integer (thanks to Yuya for finding that out).
If at least the first limitation was removed, we no longer should be
using OverflowError for flow control, so this patch changes that.
Differential Revision: https://phab.mercurial-scm.org/D5309
Boris Feld <boris.feld@octobus.net> [Tue, 27 Nov 2018 02:10:14 +0100] rev 40790
sparse: raise a more verbose index error from the C code
If we don't like a value we should print it.
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 05 Oct 2018 23:10:56 +0300] rev 40789
narrow: drop the bundle2 capability since we have server capabilities (BC)
This patch drops the narrow bundle2 capabilities since we introduced narrow
server capabilities, which are nicer and are now used everywhere.
I am not sure what it can affect, so to be on the safe side I marked this as
BC. I also removed the NARROWCAP constant as it conflicts with the constant of
the same name in wireprototypes.py.
Differential Revision: https://phab.mercurial-scm.org/D4892
Boris Feld <boris.feld@octobus.net> [Sun, 02 Jul 2017 04:06:24 +0200] rev 40788
vfs: extract the audit path logic into a submethod
This will make it possible to apply it in more cases.
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 20:01:28 +0100] rev 40787
subrepo-git: use an official origvfs when appropriate
The origvfs has the auditor properly set and can move files without issue.
The current code works without errors only because renames are not audited
yet.
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 19:26:05 +0100] rev 40786
revert: extract origvfs logic in a sub-function
The subrepo's "revert" logic could benefit from it.
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 18:44:07 +0100] rev 40785
vfs: treat 'undo.' file the same as 'journal.' file
They are the same kind of file: they are protected by the store lock, but
live directly inside the '.hg' directory.
No warnings were ever raised about them because `vfs.rename` is not audited,
something we are trying to change.
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 21:00:13 +0100] rev 40784
perf: add a perfignore command
The command is meant to benchmark operations related to hgignore. Right now the
command is benchmarking the loading time of the hgignore rules.
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 26 Nov 2018 15:36:06 +0300] rev 40783
py3: use pycompat.xrange instead of xrange
xrange does not exist on Python 3.
Differential Revision: https://phab.mercurial-scm.org/D5302
Pulkit Goyal <pulkit@yandex-team.ru> [Tue, 27 Nov 2018 16:16:13 +0300] rev 40782
store: write fncache only once if there are both adds and removes
Differential Revision: https://phab.mercurial-scm.org/D5307
Boris Feld <boris.feld@octobus.net> [Tue, 20 Nov 2018 17:44:24 +0000] rev 40781
perf: disable revlogs clearing in `perftags` by default
This aligns things with what `perfbookmarks` does. I decided to disable the
revlogs clearing by default to focus on the core logic by default, ignoring
side effects.
If we prefer to emphasize the side effect, we can instead keep this on in
`perftags` and enable it by default in `perfbookmarks`.
Boris Feld <boris.feld@octobus.net> [Tue, 20 Nov 2018 10:55:20 +0000] rev 40780
perf: add a `clear-revlogs` flag to `perfbookmarks`
This flag (off by default) makes it possible to enable the refresh of the
changelog and revlog. This is useful to check for costly side effects of
bookmark loading.
Usually, these side effects are shared with other logics (eg: tags).
example output in my mercurial repo (with 1 bookmark, so not a great example):
$ hg perfbookmarks
! wall 0.000044
$ hg perfbookmarks --clear-revlogs
! wall 0.001380
Boris Feld <boris.feld@octobus.net> [Tue, 20 Nov 2018 10:38:15 +0000] rev 40779
tags: cache `repo.changelog` access when checking tags nodes
The tags reading process checks if the nodes referenced in tags exist. Caching
the access to `repo.changelog` provides a large speedup for repositories with
many tags.
running `hg perftags` in a large private repository
before: ! wall 0.393464 comb 0.390000 user 0.330000 sys 0.060000 (median of 25)
after: ! wall 0.267711 comb 0.270000 user 0.210000 sys 0.060000 (median of 38)
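The effect of caching the attribute access can be illustrated with a toy model. It assumes that every access to the `changelog` property carries some fixed cost, which is modelled here by a counter; `Repo`, `known_tags_slow` and `known_tags_fast` are hypothetical names, not Mercurial code.

```python
class Repo(object):
    """Toy repository whose ``changelog`` property is costly (hypothetical)."""

    def __init__(self, nodes):
        self._nodes = set(nodes)
        self.lookups = 0

    @property
    def changelog(self):
        # Stands in for the work done on every attribute access.
        self.lookups += 1
        return self._nodes


def known_tags_slow(repo, tags):
    # One property access per tag.
    return {name: node for name, node in tags.items()
            if node in repo.changelog}


def known_tags_fast(repo, tags):
    cl = repo.changelog  # single access, reused for every tag
    return {name: node for name, node in tags.items() if node in cl}


tags = {"v%d" % i: i for i in range(1000)}
slow_repo = Repo(range(500))
fast_repo = Repo(range(500))
slow_result = known_tags_slow(slow_repo, tags)
fast_result = known_tags_fast(fast_repo, tags)
```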
Boris Feld <boris.feld@octobus.net> [Tue, 20 Nov 2018 10:46:20 +0000] rev 40778
perf: add a `clear-revlogs` flag to `perftags`
This flag (on by default) makes it possible to disable the refresh of the
changelog and revlog. This is useful to check for the time spent in the core
tags logic without the associated side effects. Usually, these side effects
are shared with other logics (eg: bookmarks).
Example output in my Mercurial repository
$ hg perftags
! wall 0.017919 comb 0.020000 user 0.020000 sys 0.000000 (best of 141)
$ hg perftags --no-clear-revlogs
! wall 0.012982 comb 0.010000 user 0.010000 sys 0.000000 (best of 207)
Boris Feld <boris.feld@octobus.net> [Sun, 25 Nov 2018 13:37:53 +0100] rev 40777
perf: stop creating new revlog by hand in perftags
It's better to let the repository logic create its own objects. We now just
clear the cache. New objects will be automatically created from there.
Boris Feld <boris.feld@octobus.net> [Mon, 26 Nov 2018 00:23:12 +0100] rev 40776
revlog: update the documentation for `trim_endidx`
The function's role has drifted since the comment was written.
Boris Feld <boris.feld@octobus.net> [Mon, 26 Nov 2018 00:21:09 +0100] rev 40775
revlog: properly detect corrupted revlog in `index_get_length`
Pointed out by Yuya Nishihara.
Boris Feld <boris.feld@octobus.net> [Mon, 26 Nov 2018 00:15:12 +0100] rev 40774
perf: rename `perfhelper-tracecopies` to `perfhelper-pathcopies`
The command it supports is called `perfpathcopies`. It seems better to align the
names.
Boris Feld <boris.feld@octobus.net> [Mon, 26 Nov 2018 00:13:50 +0100] rev 40773
perf: add a docstring to `perfpathcopies`
This will help people to find this command.
Boris Feld <boris.feld@octobus.net> [Mon, 26 Nov 2018 00:08:11 +0100] rev 40772
revlog: update the docstring of `ancestors` to match reality
Code using this method expects the revisions to be (reverse) sorted. As pointed
out by Yuya Nishihara, the docstring should reflect that.
Augie Fackler <augie@google.com> [Mon, 26 Nov 2018 15:53:34 -0500] rev 40771
remotefilelog: fix typo in docstring
Differential Revision: https://phab.mercurial-scm.org/D5306
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 23 Nov 2018 18:58:16 +0300] rev 40770
store: append to fncache if there are only new files to write
Before this patch, if we had to add a new entry to the fncache, we wrote the
whole fncache again, which slows things down for a large fncache with millions
of entries. Adding a new entry is a common operation when pulling new files or
committing a new file.
This patch adds a new fncache.addls set which keeps track of the additions
as they happen. When we write the fncache, we just read the addls set and
append those entries at the end of the fncache.
We make sure that the entries are new by loading the fncache and checking that
the entry does not exist there. In the future, if we can check whether an entry
is new without loading the fncache, that will speed things up further.
Performance numbers for committing a new file:
mercurial repo
before: 0.08784651756286621
after: 0.08474504947662354
mozilla-central
before: 1.83314049243927
after: 1.7054164409637451
netbeans
before: 0.7953150272369385
after: 0.7202838659286499
pypy
before: 0.17805707454681396
after: 0.13431048393249512
In our internal repo, the performance improvement is in seconds.
I have used octobus's ASV perf benchmark thing to get the above numbers. I also
see some minute perf improvements related to creating a new commit without a new
file, but I believe that's just some noise.
Differential Revision: https://phab.mercurial-scm.org/D5301
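The append-only write path can be sketched as follows. Only the `addls` name comes from the message above; the `FnCache` class, its other attributes, and the one-entry-per-line file layout are assumptions made for this illustration and do not reproduce store.py.

```python
import os
import tempfile


class FnCache(object):
    """Toy file-name cache (hypothetical sketch, not Mercurial's store.py)."""

    def __init__(self, path):
        self.path = path
        self.entries = set()
        self.addls = set()      # additions not yet written to disk
        self._removed = False   # True once an on-disk entry was dropped
        if os.path.exists(path):
            with open(path) as fp:
                self.entries = set(fp.read().splitlines())

    def add(self, name):
        # Only track names that are really new.
        if name not in self.entries:
            self.addls.add(name)

    def remove(self, name):
        self.addls.discard(name)
        if name in self.entries:
            self.entries.remove(name)
            self._removed = True

    def write(self):
        if self._removed:
            # Removals invalidate existing lines: rewrite everything once.
            self.entries |= self.addls
            with open(self.path, "w") as fp:
                fp.writelines(n + "\n" for n in sorted(self.entries))
            mode = "rewrite"
        elif self.addls:
            # Only additions: append them, leaving existing lines untouched.
            with open(self.path, "a") as fp:
                fp.writelines(n + "\n" for n in sorted(self.addls))
            self.entries |= self.addls
            mode = "append"
        else:
            mode = "noop"
        self.addls = set()
        self._removed = False
        return mode


workdir = tempfile.mkdtemp()
cache = FnCache(os.path.join(workdir, "fncache"))
cache.add("data/a.i")
cache.add("data/b.i")
first = cache.write()    # only additions are pending

cache.add("data/c.i")
cache.remove("data/a.i")
second = cache.write()   # an addition and a removal are pending

with open(cache.path) as fp:
    final_lines = fp.read().splitlines()
```

The append path costs time proportional to the number of new entries, whereas the rewrite path is proportional to the size of the whole cache.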
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 26 Nov 2018 15:38:35 +0300] rev 40769
py3: fix a couple of division operators to do integer division
Differential Revision: https://phab.mercurial-scm.org/D5305
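A short illustration of the difference, not taken from the patch: on Python 3 the `/` operator always performs true division, so code that needs an integer (for example as a range bound) has to use `//`. The variable names are made up for the example.

```python
size = 10

half_true = size / 2    # true division on Python 3: yields a float
half_floor = size // 2  # floor division: an int on both Python 2 and 3

# An integer result can be used where an int is required, e.g. range().
chunks = list(range(half_floor))
```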
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 26 Nov 2018 15:37:48 +0300] rev 40768
py3: use dict.items() instead of dict.iteritems()
dict.iteritems() does not exist on Python 3.
Differential Revision: https://phab.mercurial-scm.org/D5304