muxator <a.mux@inwind.it> [Tue, 09 Oct 2018 22:16:25 +0200] rev 40105
packaging: blindly factor out trap's cleanup function in builddeb
This commit blindly extracts builddeb's trap routine in a dedicated function.
While doing so, I think two bugs are exposed, which will be addressed in the
next commits:
- single quoting around '$CLEANUP' will always evaluate to the literal
'$CLEANUP' regardless of the variable's value. The "if" will always be true.
- the removal operation will not expand $PWD (and a variable expansion would
need double quotes, anyways.
muxator <a.mux@inwind.it> [Tue, 09 Oct 2018 21:40:49 +0200] rev 40104
packaging: print full path to the packages when builddeb finishes successfully
muxator <a.mux@inwind.it> [Tue, 09 Oct 2018 21:39:39 +0200] rev 40103
packaging: print more specific error messages when builddeb fails
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 09 Oct 2018 12:56:11 -0700] rev 40102
cmdutil: sort unresolved paths
I noticed that `hg status` was printing unresolved paths in a
non-deterministic order. This patch fixes that.
I'm not sure if the sorting should be done in
merge.mergestate.unresolved() instead. Either way fixes the
presentation issue.
Differential Revision: https://phab.mercurial-scm.org/D4929
Yuya Nishihara <yuya@tcha.org> [Tue, 09 Oct 2018 07:46:01 +0900] rev 40101
fuzz: report error if Python code raised exception
I think that's what we wanted to do, given the most of the code block is
surrounded by try-except. 'lazymanifest(mdata)' is moved to the try block
as it can fail.
Yuya Nishihara <yuya@tcha.org> [Tue, 09 Oct 2018 07:42:05 +0900] rev 40100
revlog: explicitly initialize static variables
I know .bss section is zero-filled, but explicit initialization should be
better as we rely on that.
Joerg Sonnenberger <joerg@bec.de> [Mon, 08 Oct 2018 21:53:32 +0200] rev 40099
tests: do not change sys.path, it can break loading cext.parsers
When running this tests with run-tests, the prefix would resolve
mercurial.cext to the source tree and the attempt to load
mercurial.cext.parsers would therefore fail since it doesn't exist in
it. With the regular search path from run-tests, it is picked up from
the temporary prefix correctly.
Differential Revision: https://phab.mercurial-scm.org/D4910
Joerg Sonnenberger <joerg@bec.de> [Mon, 08 Oct 2018 21:51:20 +0200] rev 40098
tests: deal with differences in tic from ncurses and NetBSD
Differential Revision: https://phab.mercurial-scm.org/D4909
Joerg Sonnenberger <joerg@bec.de> [Mon, 08 Oct 2018 20:07:13 +0200] rev 40097
closehead: fix close-head -r listification
Differential Revision: https://phab.mercurial-scm.org/D4908
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Thu, 23 Aug 2018 12:25:54 +0900] rev 40096
import-checker: use testparseutil.embedded() to centralize detection logic
This patch fixes issues of embedded() in import-checker.py below, too.
- overlook (or mis-detect) the end of inline script in doctest style
- overlook inline script in doctest style at the end of file
(and ignore invalid un-closed heredoc at the end of file, too)
- overlook code fragment in styles below
- "python <<EOF" (heredoc should be "cat > file <<EOF" style)
- "cat > foobar.py << ANYLIMIT" (limit mark should be "EOF")
- "cat << EOF > foobar.py" (filename should be placed before limit mark)
- "cat >> foobar.py << EOF" (appending is ignored)
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Thu, 23 Aug 2018 12:25:54 +0900] rev 40095
tests: use NO_CHECK_EOF as heredoc limit mark to omit checking code fragments
This patch uses NO_CHECK_EOF as heredoc limit mark instead of EOF, in
order to avoid checking all python code fragments in
test-contrib-check-code.t, because almost all of them has
un-recommended implementations intentionally.
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Thu, 23 Aug 2018 12:25:54 +0900] rev 40094
contrib: add an utility module to parse test scripts
This patch centralizes the logic to pick up code fragments embedded in
*.t script, in order to:
- apply checking with patterns in check-code.py on such embedded
code fragments
Now, check-code.py completely ignores embedded code
fragments. I'll post another patch series to check them.
- replace similar code path in contrib/import-checker.py
Current import-checker.py has problems below. Fixing each of them
is a little difficult, because parsing logic and pattern strings
are tightly coupled.
- overlook (or mis-detect) the end of inline script in doctest
style
8a8dd6e4a97a fixed a part of this issue, but not enough.
- it overlooks inline script in doctest style at the end of file
(and ignores invalid un-closed heredoc at the end of file, too)
- it overlooks code fragment in styles below
- "python <<EOF" (heredoc should be "cat > file <<EOF" style)
- "cat > foobar.py << ANYLIMIT" (limit mark should be "EOF")
- "cat << EOF > foobar.py" (filename should be placed before limit mark)
- "cat >> foobar.py << EOF" (appending is ignored)
- it is not extensible for other than python code fragments
(e.g. shell script, hgrc file, and so on)
This new module can detect python code fragments in styles below:
- inline script in doctest style (starting by " >>> " line)
- python invocation with heredoc script ("python <<EOF")
- python script in heredoc style (redirected into ".py" file)
As an example of extensibility of new module, this patch also contains
implementation to pick up code fragment below. This will be useful to
add additional restriction for them, for example.
- shell script in heredoc style (redirected into ".sh" file)
- hgrc configuration in heredoc style (redirected into hgrc or $HGRCPATH)
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Thu, 23 Aug 2018 12:24:41 +0900] rev 40093
tests: use environment variable indirectly
Using environment variable directly in heredoc python code will cause
syntax error at checking module importation by import-checker.py
strictly, because "$varname" is invalid in Python syntax. "$varname"
becomes valid after environment variable substitution by shell at
writing text into file.
Current import-checker.py overlooks code fragment changed in this
patch, because of a restriction below for a line starting code
fragment.
- filename must be specified before limit mark
NG: cat <<EOF > FILE.py
OK: cat > FILE.py <<EOF
import-checker.py itself is fixed in subsequent patch.
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Thu, 23 Aug 2018 12:20:41 +0900] rev 40092
tests: import multiple modules separately
Current import-checker.py overlooks code fragment changed in this
patch, because of restrictions below for a line starting code
fragment.
- filename must be specified before limit mark
NG: cat <<EOF > FILE.py
OK: cat > FILE.py <<EOF
- limit mark must not be quoted
NG: cat > FILE.py <<'EOF'
OK: cat > FILE.py <<EOF
import-checker.py itself is fixed in subsequent patch.
Augie Fackler <augie@google.com> [Mon, 08 Oct 2018 11:50:25 -0400] rev 40091
fuzz: allow manifest fuzzer to detect leaks
Huzzah!
Differential Revision: https://phab.mercurial-scm.org/D4907
Augie Fackler <augie@google.com> [Mon, 08 Oct 2018 11:47:25 -0400] rev 40090
fuzzers: init Python in LLVMFuzzerInitialize and intentionally leak it
This sidesteps leaks (or "leaks", I'm not sure) in CPython, and lets
our fuzzer work.
Differential Revision: https://phab.mercurial-scm.org/D4906
Augie Fackler <augie@google.com> [Mon, 08 Oct 2018 11:42:06 -0400] rev 40089
revlog: if the module is initialized more than once, don't leak nullentry
Caught (annoyingly) by the manifest fuzzer.
Differential Revision: https://phab.mercurial-scm.org/D4905
Martin von Zweigbergk <martinvonz@google.com> [Mon, 01 Oct 2018 14:31:15 -0700] rev 40088
narrow: move remaining narrow-limited dirstate walks to core
In most places we now filter at a higher level (the context object),
but there are few places that relied on the dirstate walk to be
filtered by the narrowspec. The important cases are those used by `hg
add` and `hg addremove`. This patch updates them to pass in a matcher
instead of relying on the dirstate to do the filtering. The dirstate
filtering is also dropped in narrowdirstate.py.
Not always filtering in the dirstate should be useful for a future `hg
status --include-outside-narrow` option.
These places now end up doing an unrestricted dirstate walk after this
patch:
* debugfileset
* perfwalk
* sparse (but restricted to sparse config)
* largefiles
I'll let anyone who cares about these cases adapt them to work with
narrow if necessary.
Differential Revision: https://phab.mercurial-scm.org/D4901
Martin von Zweigbergk <martinvonz@google.com> [Mon, 01 Oct 2018 10:11:00 -0700] rev 40087
narrow: allow repo.narrowmatch(match) to include exact matches from "match"
Differential Revision: https://phab.mercurial-scm.org/D4900
Martin von Zweigbergk <martinvonz@google.com> [Fri, 28 Sep 2018 22:35:05 -0700] rev 40086
narrow: filter files by narrowspec in ctx.matches()
This has no effect yet because 1) for committed changes, ctx.matches()
just calls ctx.walk(), which we updated in the previous patch, and 2)
for the working copy, the filtering is also done in the overridden
dirstate.walk() in narrowdirstate.
Differential Revision: https://phab.mercurial-scm.org/D4899
Martin von Zweigbergk <martinvonz@google.com> [Fri, 28 Sep 2018 17:09:15 -0700] rev 40085
narrow: only walk files within narrowspec also for committed revisions
Narrow has been walking only paths matching the narrowspec when
walking the working copy. We have not done the same filtering when
walking committed revisions (e.g. "hg files -r "), which seems a
little odd. Let's make it consistent.
Differential Revision: https://phab.mercurial-scm.org/D4898
Martin von Zweigbergk <martinvonz@google.com> [Thu, 27 Sep 2018 23:01:26 -0700] rev 40084
status: intersect matcher with narrow matcher instead of filtering afterwards
I seem to have done a very naive move of the code from the narrow
extension into core in e411774a2e0f (narrow: move status-filtering to
core and to ctx, 2018-08-02). It seems obvious that a better way is to
intersect the matchers.
Note that this means that when requesting status for the working
directory in a narrow repo, we now pass the narrow matcher (possibly
intersected with a user-provided matcher) into _buildstatus() and then
into dirstate.status() and dirstate.walk(), which will the intersect
it again with the narrow matcher. That's functionally fine, but
wasteful. I hope to later remove the dirstate wrapping that adds the
second layer of matcher intersection.
Differential Revision: https://phab.mercurial-scm.org/D4897
Martin von Zweigbergk <martinvonz@google.com> [Fri, 28 Sep 2018 12:29:21 -0700] rev 40083
localrepo: allow narrowmatch() to accept matcher to intersect with
It's pretty common that we need to intersect a matcher we already have
(usually from the user) with the narrow matcher. Let's make
repo.narrowmatch() take an optional matcher to intersect with.
Differential Revision: https://phab.mercurial-scm.org/D4896
Zharaskhan Aman <aman.zharaskhan@gmail.com> [Fri, 05 Oct 2018 01:55:51 +0300] rev 40082
obsolete: fix ValueError when stored note contains ':' char (issue5783)
The newer version of `amend -n 'Some some'` accepts containing ':' char.
The information contained in this note 'Testing::Obstore' gives ValueError,
because we are trying to store more than 2 values in key and value.
Differential Revision: https://phab.mercurial-scm.org/D4883
Differential Revision: https://phab.mercurial-scm.org/D4882
Martin von Zweigbergk <martinvonz@google.com> [Fri, 05 Oct 2018 16:06:51 -0700] rev 40081
narrow: update TODO.rst now that we share format with sparse
The narrowspec format was unified with the sparse format in
f64ebe7d2259 (narrowspec: use sparse.parseconfig() to parse narrowspec
file (BC), 2018-08-03).
Differential Revision: https://phab.mercurial-scm.org/D4904
Martin von Zweigbergk <martinvonz@google.com> [Fri, 05 Oct 2018 16:04:25 -0700] rev 40080
narrow: update TODO.rst now that we filter status in ctx
The comment referred to was addressed in e411774a2e0f (narrow: move
status-filtering to core and to ctx, 2018-08-02). I also think
84092edd5c88 (narrow: drop unnecessary overrides of patch, 2018-09-28)
suggests that it was the right thing to do.
Differential Revision: https://phab.mercurial-scm.org/D4903
Martin von Zweigbergk <martinvonz@google.com> [Fri, 05 Oct 2018 16:01:21 -0700] rev 40079
narrow: update TODO.rst now that the narrowspec is in .hg/store
We no longer have the unfortunate wrappostshare() and
unsharenarrowspec() since 576eef1ab43d (narrow: move .hg/narrowspec to
.hg/store/narrowspec (BC), 2018-08-02).
Differential Revision: https://phab.mercurial-scm.org/D4902
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 05 Oct 2018 23:28:14 +0300] rev 40078
py3: add 8 new passing tests to whitelist found by buildbot
We are getting close!
Differential Revision: https://phab.mercurial-scm.org/D4893
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 05 Oct 2018 23:31:51 +0300] rev 40077
py3: use '%f' for floats instead of '%s'
I remember Yuya saying we need to use bytestr() or '%r' because '%s' was clever.
Not sure it applies to this or not.
Differential Revision: https://phab.mercurial-scm.org/D4894
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 05 Oct 2018 22:52:24 +0300] rev 40076
narrow: move adding of narrow server capabilities to core
We use the experimental.narrow config option introduced in one of the previous
patch and move the functionality of adding narrow server capabilities to core.
Differential Revision: https://phab.mercurial-scm.org/D4891
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 05 Oct 2018 22:31:12 +0300] rev 40075
wireprotoserver: move narrow capabilities to wireprototypes.py
This is done because wireprotoserver import wireprotov1server, so you cannot
import wireprotoserver in wireprotov1server to use the capabilities constants.
Differential Revision: https://phab.mercurial-scm.org/D4890
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 05 Oct 2018 22:19:19 +0300] rev 40074
narrow: introduce a config option to check if narrow is enabled or not
This patch introduces a new config option experimental.narrow which is set to
False by default and set to True by the narrow extension.
While moving narrow related logic into core, we need to know at places whether
narrow extension is enabled or not. Checking the list of extension enabled is
one solution but once narrow is inbuilt, we will definitely want a config option
to check whether narrow is turned on or not.
So this patch introduces a config option, which will evolve to the main point to
turn narrow capability on and off once all the narrow is in core.
Differential Revision: https://phab.mercurial-scm.org/D4889
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 05 Oct 2018 20:24:07 +0300] rev 40073
narrow: move the code to generate a widening bundle2 to core
This is a part of moving more narrow related bits to core.
Differential Revision: https://phab.mercurial-scm.org/D4888
Pulkit Goyal <pulkit@yandex-team.ru> [Tue, 02 Oct 2018 17:09:56 +0300] rev 40072
narrow: start returning bundle2 from widen_bundle()
Differential Revision: https://phab.mercurial-scm.org/D4838
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 28 Sep 2018 23:42:31 +0300] rev 40071
narrow: the first version of narrow_widen wireprotocol command
This patch introduces a wireprotocol command narrow_widen() which will be used
to widen a narrow copy using `hg tracked` command provided by narrow extension.
The wireprotocol command takes the old and new includes and excludes, common
heads, changegroup version, known revs, and a boolean ellipses and generates a
bundle2 of the required data and send it. The clients receives the bundle2
and applies that.
A bundle2 instead of changegroup because in future we might want to add more
things to send while widening. Thanks for martinvonz for the suggestion.
I am not sure whether we need changegroup version as an argument to the command
as I *think* narrow needs changegroup3 already.
The tests shows that we don't exchange phase data now while widening which is
nice. Also we don't check for pushkeys, rbc-cache, bookmarks etc.
This does not support ellipses cases for now but will be supported in future
patches. Since we send bundle2, it won't be hard to plug the ellipses logic in
here.
The existing code for widening a non-ellipses case is also dropped in this
patch.
Differential Revision: https://phab.mercurial-scm.org/D4813
Yuya Nishihara <yuya@tcha.org> [Fri, 05 Oct 2018 21:43:57 +0900] rev 40070
remotenames: abort if literal revset pattern matches nothing
This is the convention of the other namespace revsets such as tag(). Let's
make the remote variants do the same.
Yuya Nishihara <yuya@tcha.org> [Fri, 05 Oct 2018 21:39:41 +0900] rev 40069
remotenames: remove unneeded sorted() from revset implementation
The order is constrained by the subset.
Yuya Nishihara <yuya@tcha.org> [Fri, 05 Oct 2018 21:36:48 +0900] rev 40068
remotenames: don't call a set of nodes as "revs"
Yuya Nishihara <yuya@tcha.org> [Fri, 05 Oct 2018 21:30:55 +0900] rev 40067
remotenames: use util.always instead of handcrafted lambda
Yuya Nishihara <yuya@tcha.org> [Fri, 05 Oct 2018 21:29:21 +0900] rev 40066
remotenames: inline _parseargs() into _revsetutil()
The _parseargs() function gets quite simple, and the 0/1 loop can be rewritten
as "if".
Martin von Zweigbergk <martinvonz@google.com> [Thu, 04 Oct 2018 16:27:40 -0700] rev 40065
repo: create changectx in a single place in localrepo.__getitem__
Differential Revision: https://phab.mercurial-scm.org/D4885
Martin von Zweigbergk <martinvonz@google.com> [Thu, 04 Oct 2018 16:06:36 -0700] rev 40064
repo: remove the last few "pass" statements in localrepo.__getitem__
In case of IndexError or LookupError, we used "pass" statements and
fell through to the end of localrepo.__getitem__. I find the pass
statements easy to miss. Consistently raising and catching exceptions
seems easier to follow.
Oh -- and I didn't plan this before I wrote the above -- that probably
also lets us reuse the "return context.changectx(self, rev, node)" in
a later patch.
Differential Revision: https://phab.mercurial-scm.org/D4884
Martin von Zweigbergk <martinvonz@google.com> [Thu, 04 Oct 2018 10:38:55 -0700] rev 40063
filectx: correct docstring about "changeid"
The changeid argument must be a revnum (basefile.rev() is defined as
"return self._changeid"), so fix the lie in the docstring. It seems to
have been incorrect for at least 10 years (I didn't check further
back).
Differential Revision: https://phab.mercurial-scm.org/D4881
Martin von Zweigbergk <martinvonz@google.com> [Thu, 04 Oct 2018 10:30:05 -0700] rev 40062
context: drop incorrect and superfluous docstring
It's been incorrect at least since 8b86acc7aa64 (context: drop support
for looking up context by ambiguous changeid (API), 2018-04-28).
Differential Revision: https://phab.mercurial-scm.org/D4880
Augie Fackler <raf@durin42.com> [Thu, 04 Oct 2018 21:35:12 -0400] rev 40061
remotenames: follow-up on D3639 to make revset funcs take only one arg
Per the review discussion on D3639, we want this to just take one
argument. That ended up simplifying the code, so I'm sharing this as a
follow-up to that revision rather than editing in-flight.
Pulkit Goyal <7895pulkit@gmail.com> [Thu, 12 Jul 2018 03:12:09 +0530] rev 40060
remotenames: add names argument to remotenames revset
This patch adds names argument to the revsets provided by the remotenames
extension. The revsets are remotenames(), remotebranches() and
remotebookmarks(). names can be a single names, list of names or can be empty too
which means it's an optional argument.
If names is/are passed, changesets which have those remotenames will be
returned.
If names are not passed, changesets from all the remotenames are shown.
Passing an invalid remotename does not throw error.
The name argument also supports pattern matching.
Tests are added for the argument in tests/test-logexchange.t
Differential Revision: https://phab.mercurial-scm.org/D3639
Boris Feld <boris.feld@octobus.net> [Fri, 07 Sep 2018 11:43:48 -0400] rev 40059
copies: add time information to the debug information
Boris Feld <boris.feld@octobus.net> [Fri, 07 Sep 2018 11:16:06 -0400] rev 40058
copies: add a devel debug mode to trace what copy tracing does
Mercurial can spend a lot of time finding renames between two commits. Having
more information about that process help to understand what makes it slow in
an individual instance. (eg: many files vs 1 file, etc...)
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 17:34:34 -0700] rev 40057
revlog: rewrite censoring logic
I was able to corrupt a revlog relatively easily with the existing
censoring code. The underlying problem is that the existing code
doesn't fully take delta chains into account. When copying revisions
that occur after the censored revision, the delta base can refer
to a censored revision. Then at read time, things blow up due to the
revision data not being a compressed delta.
This commit rewrites the revlog censoring code to take a higher-level
approach. We now create a new revlog instance pointing at temp files.
We iterate through each revision in the source revlog and insert
those revisions into the new revlog, replacing the censored revision's
data along the way.
The new implementation isn't as efficient as the old one. This is
because it will fully engage delta computation on insertion. But I
don't think it matters.
The new implementation is a bit hacky because it attempts to reload
the revlog instance with a new revlog index/data file. This is fragile.
But this is needed because the index (which could be backed by C) would
have a cached copy of the old, possibly changed data and that could
lead to problems accessing index or revision data later.
One benefit of the new approach is that we integrate with the
transaction. The old revlog is backed up and if the transaction is
rolled back, the original revlog is restored.
As part of this, we had to teach the transaction about the store
vfs. I'm not super keen about this. But this was the easiest way
to hook things up to the transaction. We /could/ just ignore the
transaction like we were doing before. But any file mutation should
be governed by transaction semantics, including undo during rollback.
Differential Revision: https://phab.mercurial-scm.org/D4869
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 17:28:54 -0700] rev 40056
revlog: move loading of index data into own method
This will allow us to "reload" a revlog instance from a rewritten
index file, which will be used in a subsequent commit.
Differential Revision: https://phab.mercurial-scm.org/D4868
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 10:57:35 -0700] rev 40055
revlog: clear revision cache on hash verification failure
The revision cache is populated after raw revision fulltext is
retrieved but before hash verification. If hash verification
fails, the revision cache will be populated and subsequent
operations to retrieve the invalid fulltext may return the cached
fulltext instead of raising.
This commit changes hash verification so it will invalidate the
revision cache if the cached node fails hash verification. The
side-effect is that subsequent operations to request the revision
text - even the raw revision text - will always fail.
The new behavior is consistent and is definitely less wrong. There
is an open question of whether revision(raw=True) should validate
hashes. But I'm going to punt on this problem. We can always change
behavior later. And to be honest, I'm not sure we should expose
raw=True on the storage interface at all. Another day...
Differential Revision: https://phab.mercurial-scm.org/D4867
Augie Fackler <augie@google.com> [Thu, 06 Sep 2018 02:36:25 -0400] rev 40054
fuzz: new fuzzer for cext/manifest.c
This is a bit messy, because lazymanifest is tightly coupled to the
cpython API for performance reasons. As a result, we have to build a
whole Python without pymalloc (so ASAN can help us out) and link
against that. Then we have to use an embedded Python interpreter. We
could manually drive the lazymanifest in C from that point, but
experimentally just using PyEval_EvalCode isn't really any slower so
we may as well do that and write the innermost guts of the fuzzer in
Python.
Leak detection is currently disabled for this fuzzer because there are
a few global-lifetime things in our extensions that we more or less
intentionally leak and I didn't want to take the detour to work around
that for now.
This should not be pushed to our repo until
https://github.com/google/oss-fuzz/pull/1853 is merged, as this
depends on having the Python tarball around.
Differential Revision: https://phab.mercurial-scm.org/D4879
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 10:32:21 -0700] rev 40053
revlog: rename _cache to _revisioncache
"cache" is generic and revlog instances have multiple caches. Let's
be descriptive about what this is a cache for.
Differential Revision: https://phab.mercurial-scm.org/D4866
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 10:56:48 -0700] rev 40052
testing: add file storage integration for bad hashes and censoring
In order to implement these tests, we need a backdoor to write data
into storage backends while bypassing normal checks. We invent a
callable to do that.
As part of writing the tests, I found a bug with censorrevision()
pretty quickly! After calling censorrevision(), attempting to
access revision data for an affected node raises a cryptic error
related to malformed compression. This appears to be due to the
revlog not adjusting delta chains as part of censoring.
I also found a bug with regards to hash verification and revision
fulltext caching. Essentially, we cache the fulltext before hash
verification. If we look up the fulltext after a failed hash
verification, we don't get a hash verification exception. Furthermore,
the behavior of revision(raw=True) can be inconsistent depending on
the order of operations.
I'll be fixing both these bugs in subsequent commits.
Differential Revision: https://phab.mercurial-scm.org/D4865
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 10:03:41 -0700] rev 40051
testing: add file storage tests for getstrippoint() and strip()
Differential Revision: https://phab.mercurial-scm.org/D4864
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 10:04:04 -0700] rev 40050
wireprotov2: always advertise raw repo requirements
I'm pretty sure my original thinking behind making it conditional
on stream clone support was that the behavior mirrored wire protocol
version 1.
I don't see a compelling reason for us to not advertise the server's
storage requirements. The proper way to advertise stream clone support
in wireprotov2 would be to not advertise the command(s) required to
perform stream clone or to advertise a separate capability denoting
stream clone support.
Stream clone isn't yet implemented on wireprotov2, so we can cross
this bridge later.
Differential Revision: https://phab.mercurial-scm.org/D4863
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 09:48:22 -0700] rev 40049
tests: don't be as verbose in wireprotov2 tests
I don't think that printing low-level I/O and frames is beneficial to
testing command-level functionality. Protocol-level testing, yes. But
command-level functionality shouldn't care about low-level details in
most cases. This output makes tests more verbose and harder to read.
It also makes them harder to maintain, as you need to glob over various
dynamic width fields.
Let's remove these low-level details from many of the wireprotov2
tests.
Differential Revision: https://phab.mercurial-scm.org/D4861
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 12:57:01 -0700] rev 40048
repository: define and use revision flag constants
Revlogs have a per-revision 2 byte field holding integer flags that
define how revision data should be interpreted. For historical reasons,
these integer values are sent verbatim on the wire protocol as part of
changegroup data.
From a semantic standpoint, the flags that go out over the wire are
different from the flags stored internally by revlogs. Failure to
establish this semantic distinction creates unwanted strong coupling
between revlog's internals and the wire protocol.
This commit establishes new constants on the repository module that
define the revision flags used by the wire protocol (and by some
internal storage APIs, sadly). The changegroups internals documentation
has been updated to document them explicitly. Various references
throughout the repo now use the repository constants instead of the
revlog constants. This is done to make it clear that we're operating
on generic revision data and this isn't tied to revlogs.
Differential Revision: https://phab.mercurial-scm.org/D4860
Boris Feld <boris.feld@octobus.net> [Thu, 04 Oct 2018 01:22:25 +0200] rev 40047
context: reverse conditional branch order in introrev
Positive logic will be simpler to follow. It will help to clarify coming
refactoring.
Boris Feld <boris.feld@octobus.net> [Thu, 04 Oct 2018 08:40:01 +0200] rev 40046
context: drop a redundant fast path in introrev
Now that _adjustlinkrev fast path this case itself, we no longer need an extra
conditional.
A nice side effect is that we are no longer calling `self.rev()`. In case
where `_descendantrev` is set, calling `self.rev` will trigger a potentially
expensive `_adjustlinkrev` call. So blindly calling `self.rev()` to avoid
another `_adjustlinkrev` call can be counterproductive.
Note that `_descendantrev` is currently never taken into account in `introrev`
so far which is wrong. We'll fix that in changeset later in this series.
Boris Feld <boris.feld@octobus.net> [Thu, 04 Oct 2018 08:34:59 +0200] rev 40045
context: fast path linkrev adjustement in trivial case
If the search starts from the linkrev, there is nothing to adjust.
Cédric Krier <ced@b2ck.com> [Thu, 04 Oct 2018 11:28:48 +0200] rev 40044
url: allow to configure timeout on http connection
By default, httplib.HTTPConnection opens connection with no timeout.
If the server is hanging, Mercurial will wait indefinitely. This may be an
issue for automated scripts.
Differential Revision: https://phab.mercurial-scm.org/D4878
Boris Feld <boris.feld@octobus.net> [Wed, 26 Sep 2018 23:50:14 +0200] rev 40043
obsolete: explicitly track folds inside the markers
We now record information to be able to recognize "fold" event from
obsolescence markers. To do so, we track the following pieces of information:
a) a fold ID. Unique to that fold (per successor),
b) the number of predecessors,
c) the index of the predecessor in that fold.
We will now be able to create an algorithm able to find "predecessorssets".
We now store this data in the generic "metadata" field of the markers.
Updating the format to have a more compact storage for this would be useful.
This way of tracking a fold through multiple markers could be applied to split
too. This would have two advantages:
1) We get a simpler format, since number of successors is limited to [0-1].
2) We can better deal with situations where only some of the split successors
are pushed to a remote repository.
We should look into the relevance of such a change before updating the on-disk
format.
note: unlike splits, folds do not have to deal with cases where only some of
the markers have been synchronized. As they all share the same successor
changesets, they are all relevant to the same nodes.
Boris Feld <boris.feld@octobus.net> [Wed, 03 Oct 2018 11:59:47 +0200] rev 40042
cleanupnodes: update comment to drop mention of filtering
Since changeset 1857f50a9643 drop the filtering, we should not longer mention it
in code comment.
spectral <spectral@google.com> [Wed, 26 Sep 2018 18:04:46 -0700] rev 40041
treemanifests: remove _loadalllazy when doing copies
'before' here is https://phab.mercurial-scm.org/D4845 (not the committed/rebased
version)
diff --git:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 1.329 s +- 0.011 s | 1.320 s +- 0.010 s | 99.3%
m-u | | x | 1.316 s +- 0.005 s | 1.334 s +- 0.018 s | 101.4%
m-u | x | | 1.330 s +- 0.021 s | 1.322 s +- 0.005 s | 99.4%
m-u | x | x | 87.2 ms +- 0.7 ms | 86.9 ms +- 1.5 ms | 99.7%
l-d-r | | | 203.3 ms +- 7.8 ms | 199.4 ms +- 1.8 ms | 98.1%
l-d-r | | x | 204.6 ms +- 2.8 ms | 201.7 ms +- 2.1 ms | 98.6%
l-d-r | x | | 90.5 ms +- 11.0 ms | 86.2 ms +- 1.0 ms | 95.2%
l-d-r | x | x | 66.3 ms +- 2.0 ms | 66.4 ms +- 0.9 ms | 100.2%
diff -c . --git:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 239.4 ms +- 2.0 ms | 241.7 ms +- 4.6 ms | 101.0%
m-u | | x | 128.9 ms +- 1.9 ms | 130.9 ms +- 7.7 ms | 101.6%
m-u | x | | 241.1 ms +- 1.6 ms | 240.1 ms +- 1.4 ms | 99.6%
m-u | x | x | 133.4 ms +- 1.5 ms | 133.4 ms +- 1.2 ms | 100.0%
l-d-r | | | 84.3 ms +- 1.5 ms | 83.5 ms +- 1.0 ms | 99.1%
l-d-r | | x | 200.9 ms +- 6.3 ms | 203.0 ms +- 4.4 ms | 101.0%
l-d-r | x | | 108.1 ms +- 1.4 ms | 108.7 ms +- 2.1 ms | 100.6%
l-d-r | x | x | 190.2 ms +- 4.8 ms | 191.6 ms +- 2.0 ms | 100.7%
rebase -r . --keep -d .^^:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 5.655 s +- 0.029 s | 5.640 s +- 0.036 s | 99.7%
m-u | | x | 5.813 s +- 0.038 s | 5.773 s +- 0.028 s | 99.3%
m-u | x | | 5.593 s +- 0.043 s | 5.589 s +- 0.028 s | 99.9%
m-u | x | x | 648.2 ms +- 19.2 ms | 637.3 ms +- 27.7 ms | 98.3%
l-d-r | | | 673.3 ms +- 8.0 ms | 673.2 ms +- 6.8 ms | 100.0%
l-d-r | | x | 6.583 s +- 0.030 s | 5.721 s +- 0.028 s | 86.9% <--
l-d-r | x | | 277.8 ms +- 6.7 ms | 276.0 ms +- 2.7 ms | 99.4%
l-d-r | x | x | 1.692 s +- 0.013 s | 720.9 ms +- 13.3 ms | 42.6% <--
status --change . --copies:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 220.9 ms +- 1.6 ms | 219.9 ms +- 2.2 ms | 99.5%
m-u | | x | 109.2 ms +- 1.0 ms | 109.4 ms +- 0.8 ms | 100.2%
m-u | x | | 222.6 ms +- 1.7 ms | 221.4 ms +- 2.1 ms | 99.5%
m-u | x | x | 113.4 ms +- 0.5 ms | 113.1 ms +- 1.1 ms | 99.7%
l-d-r | | | 82.1 ms +- 1.7 ms | 82.1 ms +- 1.2 ms | 100.0%
l-d-r | | x | 199.8 ms +- 4.0 ms | 200.7 ms +- 3.6 ms | 100.5%
l-d-r | x | | 85.4 ms +- 1.5 ms | 85.2 ms +- 0.3 ms | 99.8%
l-d-r | x | x | 202.6 ms +- 4.4 ms | 208.0 ms +- 4.0 ms | 102.7%
status --copies:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 1.941 s +- 0.014 s | 1.930 s +- 0.009 s | 99.4%
m-u | | x | 1.924 s +- 0.007 s | 1.950 s +- 0.010 s | 101.4%
m-u | x | | 1.959 s +- 0.085 s | 1.926 s +- 0.009 s | 98.3%
m-u | x | x | 96.2 ms +- 1.0 ms | 96.4 ms +- 0.7 ms | 100.2%
l-d-r | | | 604.4 ms +- 10.6 ms | 602.6 ms +- 7.1 ms | 99.7%
l-d-r | | x | 605.7 ms +- 4.1 ms | 607.4 ms +- 6.1 ms | 100.3%
l-d-r | x | | 182.4 ms +- 1.2 ms | 183.4 ms +- 1.2 ms | 100.5%
l-d-r | x | x | 150.8 ms +- 2.0 ms | 150.6 ms +- 1.0 ms | 99.9%
update $rev^; ~/src/hg/hg{hg}/hg update $rev:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 3.185 s +- 0.027 s | 3.181 s +- 0.017 s | 99.9%
m-u | | x | 3.028 s +- 0.021 s | 2.954 s +- 0.010 s | 97.6%
m-u | x | | 3.168 s +- 0.010 s | 3.175 s +- 0.023 s | 100.2%
m-u | x | x | 317.5 ms +- 3.5 ms | 313.2 ms +- 2.9 ms | 98.6%
l-d-r | | | 456.2 ms +- 10.6 ms | 454.4 ms +- 5.8 ms | 99.6%
l-d-r | | x | 9.236 s +- 0.063 s | 757.9 ms +- 9.2 ms | 8.2% <--
l-d-r | x | | 257.6 ms +- 2.3 ms | 261.2 ms +- 1.7 ms | 101.4%
l-d-r | x | x | 1.614 s +- 0.013 s | 478.0 ms +- 14.3 ms | 29.6% <--
Differential Revision: https://phab.mercurial-scm.org/D4875
spectral <spectral@google.com> [Tue, 25 Sep 2018 19:25:41 -0700] rev 40040
treemanifests: store whether a lazydirs entry needs copied after materializing
Due to the way that things like manifestlog.get caches its values, without
making a copy (if necessary) after calling readsubtree(), we might end up
adjusting the state of the same object on different contexts, breaking things
like dirty state tracking (and probably other things).
Differential Revision: https://phab.mercurial-scm.org/D4874
spectral <spectral@google.com> [Tue, 02 Oct 2018 18:55:07 -0700] rev 40039
treemanifests: extract _loaddifflazy from _diff, use in _filesnotin
Differential Revision: https://phab.mercurial-scm.org/D4873
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Wed, 03 Oct 2018 18:07:49 -0400] rev 40038
identify: show remote bookmarks in `hg id url -Tjson -B`
I didn't display bookmarks when `default and not ui.quiet`: it seems
strange for templates to depend on --id or -q, and it would take more
code for `hg id url -T {node}` to not request remote bookmarks.
An alternative I thought of was providing lazy data to the formatter,
`fm.data(bookmarks=lambda: fm.formatlist(getbms(), name='bookmark'))`.
The plainformatter would naturally not compute it, the
templateformatter would compute only what it needs, and the other ones
would compute everything, but that's not supported (or I don't see
how), so I abandoned this idea.
Differential Revision: https://phab.mercurial-scm.org/D4872
Augie Fackler <augie@google.com> [Wed, 03 Oct 2018 16:03:16 -0400] rev 40037
showstack: also handle SIGALRM
This is looking *very* handy when debugging mysterious hangs in a
test: you can wrap a hanging invocation in
`perl -e 'alarm shift @ARGV; exec @ARGV' 1`
for example, a hanging `hg pull` becomes
`perl -e 'alarm shift @ARGV; exec @ARGV' 1 hg pull`
where the `1` is the timeout in seconds before the process will be hit
with SIGALRM. After making that edit to the test file, you can then
use --extra-config-opt on run-tests.py to globaly enable showstack
during the test run, so you'll get full stack traces as you force your
hg to exit.
I wonder (but only a little, not enough to take action just yet) if we
should wire up some scaffolding in run-tests itself to automatically
wrap all commands in alarm(3) somehow to avoid hangs in the future?
Differential Revision: https://phab.mercurial-scm.org/D4870
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 13:54:31 -0700] rev 40036
exchangev2: add progress bar around manifest scanning
This can take a long time on large repositories. Let's add a progress
bar so we don't have long periods where it isn't obvious what is
going on.
Differential Revision: https://phab.mercurial-scm.org/D4859
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 13:17:38 -0700] rev 40035
httppeer: report http statistics
Now that keepalive.py records HTTP request count and the
number of bytes sent and received as part of performing those
requests, we can easily print a report on the activity when
closing a peer instance!
Exact byte counts are globbed in tests because they are influenced
by non-deterministic things, such as hostnames and port numbers.
Plus, the exact byte count isn't too important anyway.
I feel obliged to note that printing the byte count could have
security implications. e.g. if sending a password via HTTP basic
auth, the length of that password will influence the byte count
and the reporting of the byte count could be a side-channel leak
of the password length. I /think/ this is beyond our threshold
for concern. But if we think it poses a problem, we can teach the
byte count logging code to e.g. ignore sensitive HTTP request
headers. We could also consider not reporting the byte count of
request headers altogether. But since the wire protocol uses HTTP
headers for sending command arguments, it is kind of important to
report their size.
Differential Revision: https://phab.mercurial-scm.org/D4858
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 12:30:32 -0700] rev 40034
keepalive: track number of bytes received from an HTTP response
We also bubble the byte count up to the HTTPConnection instance and its
parent opener at read time. Unlike sending, there isn't a clear
"end of response" signal we can intercept to defer updating the
accounting.
Differential Revision: https://phab.mercurial-scm.org/D4857
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 12:02:54 -0700] rev 40033
keepalive: track request count and bytes sent
I want wire protocol interactions to report the number of
requests made and bytes transferred.
This commit teaches the very low-level custom HTTPConnection class
to track the number of bytes sent to the socket. This may vary from
the number of bytes that go on the wire due to e.g. TLS. That's OK.
KeepAliveHandler is taught to track the total number of requests
and total number of bytes sent across all requests.
Differential Revision: https://phab.mercurial-scm.org/D4856
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 12:06:36 -0700] rev 40032
url: have httpsconnection inherit from our custom HTTPConnection
This will ensure that any customizations we perform to HTTPConnection
will be available to httpsconnection.
Differential Revision: https://phab.mercurial-scm.org/D4855
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 09:43:01 -0700] rev 40031
cborutil: change buffering strategy
Profiling revealed that we were spending a lot of time on the
line that was concatenating the old buffer with the incoming data
when attempting to decode long byte strings, such as manifest
revisions.
Essentially, we were feeding N chunks of size len(X) << len(Y) into
decode() and continuously allocating a new, larger buffer to hold
the undecoded input. This created substantial memory churn and
slowed down execution.
Changing the code to aggregate pending chunks in a list until we
have enough data to fully decode the next atom makes things much
more efficient.
I don't have exact data, but I recall the old code spending >1s
on manifest fulltexts from the mozilla-unified repo. The new code
doesn't significantly appear in profile output.
Differential Revision: https://phab.mercurial-scm.org/D4854
Martin von Zweigbergk <martinvonz@google.com> [Wed, 03 Oct 2018 10:27:44 -0700] rev 40030
cleanup: some Yoda conditions, this patch removes
It seems the factor 20 is less than the frequency of " < \d" compared
to " \d > ".
Differential Revision: https://phab.mercurial-scm.org/D4862
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 12:43:54 -0700] rev 40029
streamclone: don't support stream clone unless repo feature present
This change means custom repository types must opt in to enabling
stream clone. This seems reasonable, as stream clones are a very
low-level feature that has historically assumed the use of revlogs
and the layout of .hg/ that they entail.
Differential Revision: https://phab.mercurial-scm.org/D4853
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 12:40:39 -0700] rev 40028
localrepo: add repository feature when repo can be stream cloned
Right now, the wire protocol server assumes all repository objects can
be stream cloned (unless the stream clone feature is disabled via
config option).
But not all storage backends or repository objects may support stream
clone.
This commit defines a repository feature denoting whether stream clone
is supported. The feature is defined for revlog-based repositories,
which should currently be "all repositories."
Differential Revision: https://phab.mercurial-scm.org/D4852
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:08:08 -0700] rev 40027
wireprotov2: client support for following content redirects
And with the server actually sending content redirects, it is finally
time to implement client support for following them!
When a redirect response is seen, we wait until all data for that
request has been received (it should be nearly immediate since no
data is expected to follow the redirect message). Then we use
a URL opener to make a request. We stuff that response into the
client handler and construct a new response object to track it.
When readdata() is called for servicing requests, we attempt to
read data from the first redirected response. During data reading,
data is processed similarly to as if it came from a frame payload.
The existing test for the functionality demonstrates the client
transparently following the redirect and obtaining the command
response data from an alternate URL!
There is still plenty of work to do here, including shoring up
testing. I'm not convinced things will work in the presence of
multiple redirect responses. And we don't yet implement support
for integrity verification or configuring server certificates
to validate the connection. But it's a start. And it should enable
us to start experimenting with "real" caches.
Differential Revision: https://phab.mercurial-scm.org/D4778
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:07:55 -0700] rev 40026
wireprotov2: server support for sending content redirects
A "content redirect" can be sent in place of inline response content.
In terms of code, we model a content redirect as a special type of
response object holding the attributes describing that redirect.
Sending a content redirect thus becomes as simple as the object
emission layer sending an instance of that type. A cacher using
externally-addressable content storage could replace the outgoing
object stream with an object advertising its location.
The bulk of the code in this commit is teaching the output layer
which handles the object stream to recognize alternate location
objects. The rules are that if an alternate location object is
present, it must be the first and only object in the object stream.
Otherwise the server emits an error.
Differential Revision: https://phab.mercurial-scm.org/D4777
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 15:02:19 -0700] rev 40025
wireprotov2: client support for advertising redirect targets
With the server now able to emit a redirect target descriptor, we can
start to teach the client to recognize it.
This commit implements support for filtering the advertised
redirect targets against supported features and for advertising
compatible redirect targets as part of command requests. It also
adds the minimal boilerplate required to fail when a content
redirect is seen.
The server doesn't yet do anything with the advertised redirect
targets. And the client can't yet follow redirects if it did. But
at least we're putting bytes on the wire.
Differential Revision: https://phab.mercurial-scm.org/D4776
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:46:48 -0700] rev 40024
wireprotov2: advertise redirect targets in capabilities
This is pretty straightforward.
Redirect targets will require an extension to support. So we've added
a function that can be wrapped to define redirect targets.
To test this, we teach our simple cache test extension to read
redirect targets from a file. It's a bit hacky. But it gets the
job done.
Differential Revision: https://phab.mercurial-scm.org/D4775
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:02:06 -0700] rev 40023
wireprotov2: define semantics for content redirects
When I implemented the clonebundles feature and deployed it on
hg.mozilla.org using Amazon S3 as a content server, server-side CPU
and bandwidth usage dropped off a cliff and a ton of server scaling
headaches went away pretty much the instant clients with support for
clonebundles were rolled out to Firefox CI.
An obvious takeaway from that experience was that offloading server
load to scalable file servers - potentially backed by a CDN - is a
really good idea. Another takeaway was that Mercurial's wire protocol
wasn't in a good position to support data offload generally.
In wire protocol version 1, there isn't a mechanism in the protocol to
say "grab the data from over here instead." For HTTP, we could teach
the client to follow HTTP redirects. Or we could invent a media type
that encoded redirects inline. But for SSH, we were pretty much out of
luck because that protocol wasn't very flexible.
Wire protocol version 2 offers the opportunity to do something better.
The recent generic server-side content caching layer in the wire
protocol version 2 server demonstrated that it is possible to have
drop-in caching of responses to command requests. This by itself
adds tons of value and already makes the built-in server much more
scalable. But I don't want to stop there.
The existing server-side caching implementation has a big weakness:
it requires the server to send data to the client. This means that
the Mercurial server is potentially sending gigabytes of data to
thousands of clients. This is problematic because compared to scaling
static file servers, scaling dynamic servers is *hard*.
A solution to this is to "offload" serving of content to something
that isn't the Mercurial server. By offloading content serving, you
turn the Mercurial server from a centralized monolithic service to
a distributed mostly-indexing service. Assuming high rates of content
offload, this should drastically reduce the total work performed by
the Mercurial server, both in terms of CPU and data transfer. This
will make Mercurial servers vastly easier to scale.
This commit defines the semantics for "content redirects" in wire
protocol version 2. Essentially:
* Servers advertise the set of locations a response could be served
from.
* When making requests, clients advertise the set of locations they
are willing to fetch content from.
* Servers can then replace the inline response with one that says
"get the response from over here instead."
This feature - when fully implemented - will allow extending the
server-side caching layer to facilitate such things as integrating
your server-side cache with a scalable blob store (such as S3 or
a CDN) and offloading most data transfer to that external service.
This feature could also be leveraged for load balancing. e.g.
requests could come into a central server and then get redirected
to an available mirror depending on server availability or locality.
There's tons of potential :)
Differential Revision: https://phab.mercurial-scm.org/D4774
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:16:56 -0700] rev 40022
wireprotov2: support response caching
One of the things I've learned from managing VCS servers over the
years is that they are hard to scale. It is well known that some
companies have very beefy (read: very expensive) servers to power
their VCS needs. It is also known that specialized servers for
various VCS exist in order to facilitate scaling servers. (Mercurial
is in this boat.)
One of the aspects that make a VCS server hard to scale is the
high CPU load incurred by constant client clone/pull operations.
To alleviate the scaling pain associated with data retrieval
operations, I want to integrate caching into the Mercurial wire
protocol server as robustly as possible such that servers can
aggressively cache responses and defer as much server load as
possible.
This commit represents the initial implementation of a general
caching layer in wire protocol version 2.
We define a new interface and behavior for a wire protocol cacher
in repository.py. (This is probably where a reviewer should look
first to understand what is going on.)
The bulk of the added code is in wireprotov2server.py, where we
define how a command can opt in to being cached and integrate
caching into command dispatching.
From a very high-level:
* A command can declare itself as cacheable by providing a callable
that can be used to derive a cache key.
* At dispatch time, if a command is cacheable, we attempt to
construct a cacher and use it for serving the request and/or
caching the request.
* The dispatch layer handles the bulk of the business logic for
caching, making cachers mostly "dumb content stores."
* The mechanism for invalidating cached entries (one of the harder
parts about caching in general) is by varying the cache key when
state changes. As such, cachers don't need to be concerned with
cache invalidation.
Initially, we've hooked up support for caching "manifestdata" and
"filedata" commands. These are the simplest to cache, as they should
be immutable over time. Caching of commands related to changeset
data is a bit harder (because cache validation is impacted by
changes to bookmarks, phases, etc). This will be implemented later.
(Strictly speaking, censoring a file should invalidate caches. I've
added an inline TODO to track this edge case.)
To prove it works, this commit implements a test-only extension
providing in-memory caching backed by an lrucachedict. A new test
showing this extension behaving properly is added. FWIW, the
cacher is ~50 lines of code, demonstrating the relative ease with
which a cache can be added to a server.
While the test cacher is not suitable for production workloads, just
for kicks I performed a clone of just the changeset and manifest data
for the mozilla-unified repository. With a fully warmed cache (of just
the manifest data since changeset data is not cached), server-side
CPU usage dropped from ~73s to ~28s. That's pretty significant and
demonstrates the potential that response caching has on server
scalability!
Differential Revision: https://phab.mercurial-scm.org/D4773
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:16:27 -0700] rev 40021
wireprotov2: define type to represent pre-encoded object
An upcoming commit will introduce a caching layer to command
serving. This will require the ability to cache pre-encoded data.
This commit introduces a type to represent pre-encoded data and
teaches the output layer to not CBOR encode an instance of that
type.
Differential Revision: https://phab.mercurial-scm.org/D4772
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 15:53:49 -0700] rev 40020
wireprotov2: change name and behavior of readframe()
In the near future, we will want to support performing I/O from
other sources. Let's rename readframe() to readdata() and tweak
its logic to support future growth.
Differential Revision: https://phab.mercurial-scm.org/D4771
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 16:07:59 -0700] rev 40019
url: move _wraphttpresponse() from httpeer
This is a generally useful function. Having it on the url module
will make it more accessible outside of the HTTP peers.
Differential Revision: https://phab.mercurial-scm.org/D4770
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 14:54:15 -0700] rev 40018
debugcommands: print all CBOR objects
application/mercurial-cbor may contain multiple objects. Let's print
all of them.
Differential Revision: https://phab.mercurial-scm.org/D4769
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:48:19 +0900] rev 40017
help: document about "export" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:43:57 +0900] rev 40016
help: document about "config" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:34:18 +0900] rev 40015
help: document about "cat" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:38:49 +0900] rev 40014
help: document about "branches" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:32:18 +0900] rev 40013
help: document about "bookmarks" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:27:45 +0900] rev 40012
help: document about "annotate" template keywords
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:34:53 -0700] rev 40011
storageutil: pass nodes into emitrevisions()
The main emitrevisions() uses nodes. So it makes sense to use
nodes for the helper API.
Not bothering with API since this function was introduced a few
commits ago.
Differential Revision: https://phab.mercurial-scm.org/D4805
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:16:09 -0700] rev 40010
storageutil: make all callables optional
Not all storage backends may implement these callables. That's part
of the reason these methods aren't exposed on the storage interface.
Differential Revision: https://phab.mercurial-scm.org/D4804
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:16:22 -0700] rev 40009
storageutil: extract most of emitrevisions() to standalone function
As part of implementing a storage backend, I found myself copying
most of revlog.emitrevisions(). This code is highly nuanced and it
bothered me greatly to be copying such low-level code.
This commit extracts the bulk of revlog.emitrevisions() into a
new standalone function. In order to make the function generally
usable, all "self" function calls that aren't exposed on the
ifilestorage interface are passed in via callable arguments.
No meaningful behavior should have changed as part of the port.
Upcoming commits will tweak behavior to make the code more
generically usable.
Differential Revision: https://phab.mercurial-scm.org/D4803
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:51:17 -0700] rev 40008
storageutil: invert logic of file data comparison
IMO things make more sense when the function is explicitly a test
for file data equivalence.
Not bothering with API since the function was introduced by the
previous commit.
Differential Revision: https://phab.mercurial-scm.org/D4802
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:47:53 -0700] rev 40007
storageutil: extract filelog.cmp() to a standalone function
As part of implementing an alternate storage backend, I found myself
reimplementing this code.
With a little massaging, we can extract filelog.cmp() to a standalone
function.
As part of this, the call to revlog.cmp() was inlined (it is just a
2-line function).
I also tweaked some variable names to improve readability. I'll
further tweak names in a subsequent commit.
Differential Revision: https://phab.mercurial-scm.org/D4801
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:37:49 -0700] rev 40006
storageutil: extract copy metadata retrieval out of filelog
As part of implementing an alternate storage backend, I found myself
reinventing this wheel.
Let's create a utility function for doing the work.
Differential Revision: https://phab.mercurial-scm.org/D4800
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:29:05 -0700] rev 40005
storageutil: extract functionality for resolving strip revisions
I found myself having to copy this method as part of implementing a new
storage backend. With a little tweaking, we can extract it to a
standalone function so it can be reused easily.
We'll likely want to implement a better method for removing revisions
on the storage interface someday. But until then, let's use what we
have.
Differential Revision: https://phab.mercurial-scm.org/D4799
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:16:44 -0700] rev 40004
storageutil: consistently raise LookupError (API)
The interface docs say this is supposed to raise LookupError on
failure. But for invalid revision number input, it could raise
IndexError because ifileindex.node() is documented to raise
IndexError.
lookup() for files isn't used that much (pretty much just in
basefilectx in core AFAICT). And callers should already be catching
LookupError. So I don't anticipate that much fallout from this
change.
Differential Revision: https://phab.mercurial-scm.org/D4798
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:03:17 -0700] rev 40003
storageutil: implement file identifier resolution method (BC)
revlog.lookup() has a number of advanced features, including partial
node matching.
These advanced features aren't needed for file id lookup because file
identifiers are almost always from internal sources. (An exception to
this is hgweb, which appears to have some code paths that attempt
to resolve a user-supplied string to a node.)
This commit implements a new function for resolving file identifiers
to nodes. It behaves like revlog.lookup() but without the
advanced features.
Tests reveal behavior changes:
* Partial hex nodes are no longer resolved to nodes.
* "-1" now returns nullid instead of raising LookupError.
* "0" on an empty store now raises LookupError instead of returning
nullid.
I'm not sure why "-1" wasn't recognized before. But it seems reasonable
to accept it as a value since integer -1 resolves to nullid.
These changes all seem reasonable to me. And with the exception of
partial hex node matching, we may want to consider changing
revlog.lookup() as well.
Differential Revision: https://phab.mercurial-scm.org/D4797
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:00:20 -0700] rev 40002
testing: add more testing for ifileindex.lookup()
The tests demonstrate some... questionable behavior of revlog.lookup().
Differential Revision: https://phab.mercurial-scm.org/D4796
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 10:20:37 -0700] rev 40001
dagop: extract DAG local heads functionality from revlog
As part of implementing an alternate storage backend, I found myself
having to reimplement this function.
The DAG traversal logic is generic and can be factored out into a
standalone function.
We could probably combine this with headrevs(). But I'll leave that
for another time.
Differential Revision: https://phab.mercurial-scm.org/D4795
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 10:03:32 -0700] rev 40000
dagop: extract descendants() from revlog module
This method needs to be implemented in other storage backends and is
generic if we parameterize the functions for retrieving revision
numbers and parent revision numbers.
Let's extract it to dagop so backends don't need to reinvent the
wheel.
I'm not too worried about performance overhead of the extra function
call because I don't expect anyone to be calling descendants() in a
tight loop.
Differential Revision: https://phab.mercurial-scm.org/D4794
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 09:33:05 -0700] rev 39999
filelog: remove checkhash() (API)
It is unused.
While a caller may want to ask the store to validate the hash of some
provided text, since there are no in-core consumers of this method,
let's drop it for now.
Differential Revision: https://phab.mercurial-scm.org/D4793
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 09:28:38 -0700] rev 39998
filelog: remove revdiff() (API)
This proxy method is no longer used.
While it might be useful to query a storage backend for the delta
between any 2 revisions because the store could have a delta cached
and could compute it more efficiently than the caller calling
revision() twice in order to compute a delta, since nothing in core
is using this API now, I feel comfortable nuking it.
Differential Revision: https://phab.mercurial-scm.org/D4792
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 09:46:50 -0700] rev 39997
localrepo: define storage backend in creation options (API)
We add an experimental config option to define the storage backend
for new repositories. By default, it uses "revlogv1," which maps to
the current and only modern supported repository format.
We add a "backend" creation option to control which backend to
use. It defaults to using the value from the config option.
newreporequirements() will now barf if it sees a "backend" value
that isn't "revlogv1." This forces extensions to monkeypatch the
function to handle requirements derivation for custom backends.
In order for this to "just work," we factored out obtaining the
default creation options into its own function and made callers
of newreporequirements() responsible for passing in valid data.
Without this, direct callers of newreporequirements() wouldn't
get the proper results.
Differential Revision: https://phab.mercurial-scm.org/D4791
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 27 Sep 2018 09:23:17 -0700] rev 39996
wireprotov2: derive "required" from presence of default value
If we define a default value for all optional arguments, we no
longer need to explicitly declare whether the argument is required.
Instead, we can derive it from the presence of a default value
callable.
Differential Revision: https://phab.mercurial-scm.org/D4790
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 09:05:40 -0700] rev 39995
localrepo: capture repo interface factory functions as lambas
Previously, the list would hold a reference to the original function
and attempting to wrap the module function via extension wouldn't
result in the wrapped function being called, completely undermining
the expected behavior.
Differential Revision: https://phab.mercurial-scm.org/D4821
Joerg Sonnenberger <joerg@bec.de> [Mon, 14 May 2018 00:43:07 +0200] rev 39994
extensions: new closehead module for closing arbitrary heads
``hg close-head`` allows closing arbitrary heads. It is equivalent to
checking out the given revisions and commit an empty change with
``hg commit --close-branch``.
Differential Revision: https://phab.mercurial-scm.org/D3557
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 13:12:56 -0700] rev 39993
cext: use modern buffer protocol in mpatch_flist()
Differential Revision: https://phab.mercurial-scm.org/D4841
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 13:13:03 -0700] rev 39992
cext: use modern buffer protocol in patches()
PyObject_AsCharBuffer() is part of the "Old Buffer Protocol," which
has been deprecated for years. Let's port away from it.
PyBuffer_GetBuffer() must be paired with PyBuffer_Release(), hence the
added "goto cleanup" in a failure case.
We don't bump the extension version because the API has not changed.
Differential Revision: https://phab.mercurial-scm.org/D4840
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 01 Oct 2018 14:44:27 -0400] rev 39991
identify: when using -T, avoid unnecessary remote bookmarks query
Differential Revision: https://phab.mercurial-scm.org/D4839
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Mon, 01 Oct 2018 09:58:42 -0400] rev 39990
identify: only query remote bookmarks if needed
Instead of all the time when operating on a remote repo. This
perf regression was introduced in 15a79ac823e8, in 4.3.
This datahint method returns nothing for -Tjson, -Tpickle, -Tdebug
--config ui.formatdebug=true and --config ui.formatjson, so the
bookmarks won't show up. I don't know what these formatters are for.
plainformatter and templateformatter work properly, and the few other
uses of datahint should have the same kind of problem.
There is further weirdness where "--template '{node}'" is not enough
to avoid querying the bookmarks, you also need to pass --id or -q.
Differential Revision: https://phab.mercurial-scm.org/D4819
Pulkit Goyal <pulkit@yandex-team.ru> [Wed, 03 Oct 2018 13:59:19 +0300] rev 39989
py3: whitelist another passing tests caught by buildbot
I this indygreg recent patches made this test pass on Python 3. So thanks to
him!
Differential Revision: https://phab.mercurial-scm.org/D4848
Pulkit Goyal <pulkit@yandex-team.ru> [Wed, 03 Oct 2018 13:55:51 +0300] rev 39988
manifest: remove an unused variable caught by pyflakes
Uses of todel were removed in D4843 (19103e68a698). This patch will make
buildbots green again.
Differential Revision: https://phab.mercurial-scm.org/D4847
Matt Harbison <matt_harbison@yahoo.com> [Tue, 02 Oct 2018 22:40:01 -0400] rev 39987
setup: ignore message about disabling 3rd party extensions because of version
I started getting into a bind recently when switching between py2 and py3
because switching requires a `make clean`, which kills __version__.py. But then
when running `make local`, it picks up the local hg.exe (MSYS seems to prefix
$PATH with '.'), which doesn't know its version. That causes it to emit a
warning about needing at least 4.3 to load evolve, which caused setup.py to fail
saying there is no working hg executable to figure out the version. If we can
ignore general extension import failures, we should be able to ignore this too.
Martin von Zweigbergk <martinvonz@google.com> [Tue, 02 Oct 2018 09:11:18 -0700] rev 39986
narrow: avoid overwriting a variable
I don't like to overwrite variables, especially not with a different
type (because it makes it harder to explain what the variable is for,
and, therefore, to name it well).
Differential Revision: https://phab.mercurial-scm.org/D4846