Gregory Szorc <gregory.szorc@gmail.com> [Thu, 13 Sep 2018 15:52:42 -0700] rev 39785
revlog: add opener option to enable ellipsis flag processor
The ellipsis flag processor can now be registered by specifying
an opener option when constructing a revlog instance. This allows
us to enable ellipsis flags on a per-revlog basis.
Differential Revision: https://phab.mercurial-scm.org/D4647
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 13 Sep 2018 15:48:53 -0700] rev 39784
revlog: store flag processors per revlog
Previously, revlog flag processing would consult a global dict
when processing flags. This was simple. But it had the undesired
side-effect that any extension could load flag processors once
and those flag processors would be available to any revlog that was
subsequently loaded in the process. For example, in hgweb, if the narrow
extension were loaded for repo A but not repo B, repo B would be
able to decode ellipsis flags even though it shouldn't be able to.
Making the flag processors dict per-revlog allows us to have per-revlog
controls over what flag processors are available, thus preserving
desired granular access to flag processors depending on the revlog's
needs.
If a flag processor is globally registered, it is still globally
available. So this commit should not meaningfully change behavior.
Differential Revision: https://phab.mercurial-scm.org/D4646
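The difference between a process-wide registry and per-revlog flag processors can be sketched with a toy model. This is not Mercurial's code: ``GLOBAL_PROCESSORS``, ``ToyRevlog``, ``enable_ellipsis`` and the flag value are hypothetical names chosen for illustration.

```python
# Toy model of per-instance flag processors (all names are hypothetical).
GLOBAL_PROCESSORS = {}  # flag -> processor, shared by the whole process

FLAG_ELLIPSIS = 1 << 14  # illustrative value only


class ToyRevlog(object):
    def __init__(self, enable_ellipsis=False):
        # Copy the global registry so per-instance additions do not
        # leak into other revlogs loaded in the same process.
        self.flagprocessors = dict(GLOBAL_PROCESSORS)
        if enable_ellipsis:
            # Identity processor, standing in for the real one.
            self.flagprocessors[FLAG_ELLIPSIS] = lambda text: text

    def candecode(self, flag):
        return flag in self.flagprocessors


repo_a = ToyRevlog(enable_ellipsis=True)  # e.g. narrow enabled
repo_b = ToyRevlog()                      # narrow not enabled
```

In this model, ``repo_b`` cannot decode the ellipsis flag even though ``repo_a`` in the same process can, while anything placed in the global dict would still reach both.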
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 13:29:22 -0700] rev 39783
revlog: define ellipsis flag processors in core
We will soon be teaching core to honor the ellipsis flag on revlogs.
Moving the definition of the processor functions to core is the first
step in this.
The processor is still not registered unless the narrow extension is
loaded.
Differential Revision: https://phab.mercurial-scm.org/D4645
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 12:44:25 -0700] rev 39782
narrow: remove custom filelog type
This functionality is now handled by core as of the previous commit.
I wanted this to be a standalone commit because the deleted code
makes a reference to remotefilelog's file type missing a node() method
and this may have implications for narrow+remotefilelog usage. The code
in core doesn't perform this check and therefore behavior may be subtly
different and buggy.
But I /think/ the check is merely a performance optimization and
nothing more. So I'm optimistic this will continue to "just work."
Differential Revision: https://phab.mercurial-scm.org/D4644
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 13 Sep 2018 16:02:22 -0700] rev 39781
filelog: custom filelog to be used with narrow repos
Narrow repos may have file revisions whose copy/rename metadata
references files not in the store. This can pose problems when
consumers attempt to access a missing referenced file revision.
The narrow extension hacks around this problem by implementing a
derived filelog type that provides custom implementations of
renamed(), size(), and cmp() which handle renames against files not
in the narrow spec by silently removing the rename metadata.
While silently dropping metadata isn't the most robust solution,
it is the easiest to implement.
This commit ports the custom narrow filelog class to core.
When a narrow repo is constructed, its ifilestorage creation
function will automatically use the new filelog type. This means
the extra logic is 0 cost for non-narrow repos and shouldn't
interfere with their operation.
Differential Revision: https://phab.mercurial-scm.org/D4643
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 18 Sep 2018 15:29:42 -0700] rev 39780
localrepo: iteratively derive local repository type
This commit implements the dynamic local repository type derivation
that was explained in the recent commit
bfeab472e3c0 "localrepo: create new function for instantiating a local
repo object."
Instead of a static localrepository class/type which must be customized
after construction, we now dynamically construct a type by building up
base classes/types to represent specific repository interfaces.
Conceptually, the end state is similar to what was happening when
various extensions would monkeypatch the __class__ of newly-constructed
repo instances. However, the approach is inverted. Instead of making
the instance then customizing it, we do the customization up front
by influencing the behavior of the type then we instantiate that
custom type.
This approach gives us much more flexibility. For example, we can
use completely separate classes for implementing different aspects
of the repository: we could have one class representing
revlog-based file storage and another representing non-revlog based
file storage. We then choose which implementation to use based on
the presence of repo requirements.
A concern with this approach is that it creates a lot more types
and complexity and that complexity adds overhead. Yes, it is true that
this approach will result in more types being created. Yes, this is
more complicated than traditional "instantiate a static type." However,
I believe the alternatives to supporting alternate storage backends
are just as complicated. (Before I arrived at this solution, I had
patches storing factory functions on local repo instances for e.g.
constructing a file storage instance. We ended up having a handful
of these. And this was logically identical to assigning custom
methods. Since we were logically changing the type of the instance,
I figured it would be better to just use specialized types instead
of introducing levels of abstraction at run-time.)
On the performance front, I don't believe that having N base classes
has any significant performance overhead compared to just a single base
class. Intuition says that Python will need to iterate the base classes
to find an attribute. However, CPython caches method lookups: as long as
the __class__ or MRO isn't changing, method attribute lookup should be
constant time after first access. And non-method attributes are stored
in __dict__, of which there is only 1 per object, so the number of
base classes for __dict__ is irrelevant.
Anyway, this commit splits up the monolithic completelocalrepository
interface into sub-interfaces: 1 for file storage and 1 representing
everything else.
We've taught ``makelocalrepository()`` to call a series of factory
functions which will produce types implementing specific interfaces.
It then calls type() to create a new type from the built-up list of
base types.
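The derivation described above can be sketched roughly as follows. This is a minimal illustration, not the code in localrepo: every class and function name below is hypothetical, and the requirement string is only used as an example key.

```python
# Sketch of deriving a repository type from factory functions.
# All class and function names here are hypothetical.

class RevlogFileStorage(object):
    def file(self, path):
        return 'revlog:' + path


class AltFileStorage(object):
    def file(self, path):
        return 'alt:' + path


class MainRepo(object):
    def __init__(self, requirements):
        self.requirements = requirements


def makefilestorage(requirements):
    # Pick the file storage implementation from the requirements.
    if 'revlogv1' in requirements:
        return RevlogFileStorage
    return AltFileStorage


def makemain(requirements):
    return MainRepo


def makerepo(requirements):
    factories = [makemain, makefilestorage]
    bases = tuple(fn(requirements) for fn in factories)
    # type(name, bases, namespace) builds the derived class up front;
    # only then is it instantiated.
    cls = type('derivedrepo', bases, {})
    return cls(requirements)


repo = makerepo({'revlogv1'})
other = makerepo({'altstore'})
```

Here the choice of storage class happens before any instance exists, so nothing needs to reassign ``__class__`` afterwards.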
This commit should be considered a start and not the end state. I
suspect we'll hit a number of problems as we start to implement
alternate storage backends:
* Passing custom arguments to __init__ and setting custom attributes
  on __dict__.
* Customizing the set of interfaces that are needed. e.g. the
  "readonly" intent could translate to not requesting an interface
  providing methods related to writing.
* More ergonomic way for extensions to insert themselves so their
  callbacks aren't unconditionally called.
* Wanting to modify vfs instances, other arguments passed to __init__.
That being said, this code is usable in its current state and I'm
convinced future commits will demonstrate the value in this approach.
Differential Revision: https://phab.mercurial-scm.org/D4642
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 18 Sep 2018 15:15:24 -0700] rev 39779
localrepo: pass root manifest into manifestlog.__init__
Today, localrepository has a method that can be overloaded which
returns an instance of the root manifest storage object. When a
manifestlog is created, it calls this private method and stores
the root manifest object on it.
This "hook" on localrepository isn't part of the documented interface.
It isn't compatible with our desire to make repo storage determined
before the repo object is constructed.
This commit changes manifestlog.__init__ to accept the root
storage object instead of calling into the repo to construct it.
By doing things this way, the repo instance is responsible for
constructing the manifest storage object directly.
This does mean that other derived repo types need to overload
manifestlog(). But they should have been doing this already,
as manifestlog() is typically decorated in a storage-specific way.
e.g. localrepository.manifestlog() is decorated as
@storecache('00manifest.i'). And this assumes that a 00manifest.i
file exists in the store vfs. This condition may not hold for
repository types using non-revlog storage. So it is important
for special repo types to override manifestlog() to remove this
file association.
The code changed in perf is wrong because it isn't compatible with
older Mercurial versions. But I'm pretty sure the code was broken
on older versions before this commit. It only affects `hg perftags`.
I don't care enough to fix that at this time.
.. api::

   ``manifest.manifestlog.__init__()`` now receives the root manifest
   storage instance instead of calling into a private method on
   the repo object to obtain it.
Differential Revision: https://phab.mercurial-scm.org/D4641
Matt Harbison <matt_harbison@yahoo.com> [Fri, 21 Sep 2018 21:44:27 -0400] rev 39778
py3: create built in exceptions with str type messages in win32.py
I hit an IOError in unlink() in test-pathconflicts-basic.t, that then crashed as
it was handled:
    File "mercurial\dispatch.py", line 359, in _runcatch
      return _callcatch(ui, _runcatchfunc)
    File "mercurial\dispatch.py", line 367, in _callcatch
      return scmutil.callcatch(ui, func)
    File "mercurial\scmutil.py", line 252, in callcatch
      ui.error(_("abort: %s\n") % encoding.strtolocal(inst.strerror))
    File "mercurial\encoding.py", line 205, in unitolocal
      return tolocal(u.encode('utf-8'))
  AttributeError: 'bytes' object has no attribute 'encode'
Matt Harbison <matt_harbison@yahoo.com> [Sat, 22 Sep 2018 12:11:48 -0400] rev 39777
tests: stabilize test-shelve.t#phasebased for #no-symlink and #no-execbit
The rev number ended up being 11 instead of 13 on Windows. If I ever get back
to issue2020, this will go away.
Martin von Zweigbergk <martinvonz@google.com> [Thu, 20 Sep 2018 21:35:01 -0700] rev 39776
debugdirstate: deprecate --nodates in favor of --no-dates
We have supported 'no-' prefixes for boolean flags for a few years now,
so I was expecting it to be --no-dates.
I noticed that we have --nodates options for a few more commands
(e.g. `hg diff`), but I'll leave that for another day.
Differential Revision: https://phab.mercurial-scm.org/D4693
Matt Harbison <matt_harbison@yahoo.com> [Fri, 21 Sep 2018 00:37:03 -0400] rev 39775
py3: fix a type error in hghave.has_hardlink
test-hghave.t was failing with:
feature hardlink failed: argument 1: <class 'TypeError'>: wrong type
Martin von Zweigbergk <martinvonz@google.com> [Fri, 21 Sep 2018 09:34:41 -0700] rev 39774
narrow: remove hack to read narrowspec from shared .hg directory
This was another leftover from 576eef1ab43d (narrow: move
.hg/narrowspec to .hg/store/narrowspec (BC), 2018-08-02), in addition
to 623081f2abc2 (narrow: remove hack to write narrowspec to shared .hg
directory, 2018-09-12).
Differential Revision: https://phab.mercurial-scm.org/D4692
Augie Fackler <augie@google.com> [Fri, 21 Sep 2018 11:43:46 -0400] rev 39773
streamclone: reimplement nested context manager
It's gone in Python 3, and you can't *ctxs into a with statement. Sigh.
Differential Revision: https://phab.mercurial-scm.org/D4690
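For reference, ``contextlib.nested`` no longer exists in Python 3. One way to get the same effect on Python 3.3 or later is ``contextlib.ExitStack``, sketched below. This is a swapped-in technique for illustration and not necessarily what the commit does: ``ExitStack`` is not in the Python 2.7 standard library, so code that must run on both versions needs a different approach. The helper names ``nested`` and ``tracked`` are hypothetical.

```python
import contextlib


@contextlib.contextmanager
def nested(*ctxs):
    # Enter every context manager in order; ExitStack exits them in
    # reverse order, even if the body raises.
    with contextlib.ExitStack() as stack:
        yield [stack.enter_context(c) for c in ctxs]


events = []


@contextlib.contextmanager
def tracked(name):
    # Small context manager that records when it is entered and exited.
    events.append('enter ' + name)
    try:
        yield name
    finally:
        events.append('exit ' + name)


with nested(tracked('a'), tracked('b')) as (a, b):
    events.append('body')
```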
Augie Fackler <augie@google.com> [Fri, 21 Sep 2018 11:44:08 -0400] rev 39772
bundle2: grab kwarg using sysstr
# skip-blame just an r prefix on a string
Differential Revision: https://phab.mercurial-scm.org/D4691
Augie Fackler <augie@google.com> [Fri, 21 Sep 2018 11:15:55 -0400] rev 39771
py3: mark another passing test
Differential Revision: https://phab.mercurial-scm.org/D4689
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 12:47:49 +0900] rev 39770
bookmarks: remove --active in favor of --list
It's weird that we have both --active and --inactive options meaning
completely different things. Instead of adding a one-off option, let's
document the way to display the active bookmark by using -l/--list.
No deprecated option is added since --active isn't released yet.
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 12:44:23 +0900] rev 39769
bookmarks: add explicit option to list bookmarks of the given names
This is a generalized form of the --active option.
A redundant sorted() call is removed. There was no point in updating dict
items in lexicographical order.
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 12:34:13 +0900] rev 39768
bookmarks: reject --delete with --inactive which makes no sense
A deleted bookmark is neither active nor inactive.
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 12:32:01 +0900] rev 39767
bookmarks: parse out --inactive to action early
The --inactive option can't be directly mapped to an action or a modifier.
With any names, it means to add/rename to inactive bookmarks. Without names,
it means to deactivate the current bookmark. This patch separates them to
"inactive" flag and "action == 'inactive'".
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 12:25:19 +0900] rev 39766
bookmarks: parse out implicit "add" action early
This prepares for adding -l/--list option, which can be combined with the
positional arguments.
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 12:07:38 +0900] rev 39765
bookmarks: clarify that opts['rename'] points to an old bookmark to be renamed
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 12:04:29 +0900] rev 39764
bookmarks: refactor option checking to pick one from --delete/rename/active
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 11:51:15 +0900] rev 39763
bookmarks: convert opts to bytes dict early
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 11:50:07 +0900] rev 39762
bookmarks: pass in formatter to printbookmarks() instead of opts (API)
This clarifies that user options have to be processed before calling
printbookmarks().
Boris Feld <boris.feld@octobus.net> [Wed, 19 Sep 2018 17:09:01 +0200] rev 39761
strip: ignore orphaned internal changesets while computing safe strip roots
Internal changesets can be safely garbage collected, so we can ignore them
during safestrip.
(Another phase for internal changesets that must be kept in the repository
might be introduced later).
Boris Feld <boris.feld@octobus.net> [Wed, 06 Jun 2018 02:31:46 +0200] rev 39760
shelve: no longer strip internal commit when using internal phase
When the internal phase is used, the internal commits we create during shelve
will be automatically hidden, and we don't need to strip them. Avoiding strips
gives much better performance and is less traumatic for caches.
Test changes are all related to revision numbers increasing more quickly since
we avoid stripping.
At the end of `test-shelve.t` we now need to manually strip the shelve-commit in
addition to the x.shelve file deletion. This emulates a preexisting shelve
after a repository upgrade.
Note:
The hidden internal commits confuse rebase a bit, as shown by a new test
added. This will happen when the user has shelve commits on top of a
changeset to be rebased.
We'll fix this in the next commit. As we still use a backup bundle, rebase
can just strip the internal changesets and be fine.
Martin von Zweigbergk <martinvonz@google.com> [Wed, 19 Sep 2018 12:07:52 -0700] rev 39759
meld: enable auto-merge
This tells meld to resolve trivial conflicts before presenting the
user with the remaining conflicts.
This was attempted 5 years ago, but --auto-merge was so new then that
the patch was rejected out of concern that users still had an older
version of meld installed [1]. Maybe it's safe to assume that they
have a newer version now.
[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2013-April/050084.html
Differential Revision: https://phab.mercurial-scm.org/D4665
Matt Harbison <matt_harbison@yahoo.com> [Thu, 20 Sep 2018 23:45:30 -0400] rev 39758
run-tests: partially backout PYTHON quoting
In 7f8b7a060584, I quoted this to support python being installed to
"Program Files". Even though the string passed to os.popen() is this:
"c:/Python27/python.exe" -c "import mercurial; print (mercurial.__path__[0])"
... cmd.exe is trying to run this:
'c:/Python27/python.exe" -c "import'
This caused test-hghave.t to fail, reporting 'unexpected mercurial lib: ""',
because the failed execution prints nothing to stdout. Py3 fails as though it's
not quoted. For whatever reason, print() shows up in the output when run with
py2, but not py3, so I'm having a hard time debugging this. For now, let's fix
the buildbot.
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 21 Sep 2018 03:16:08 +0530] rev 39757
py3: use '%d' instead of '%s' for integers
Python 3 does not allow '%s' for integers when formatting bytes.
Differential Revision: https://phab.mercurial-scm.org/D4688
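A small illustration of the underlying behaviour on Python 3.5 or later, where bytes support %-formatting. The variable names are arbitrary.

```python
rev = 42

# bytes %-formatting accepts integers with %d.
msg = b'rev %d\n' % rev

# %s on bytes requires a bytes-like object (or one implementing
# __bytes__), so passing an int raises TypeError on Python 3.
try:
    b'rev %s\n' % rev
    failed = False
except TypeError:
    failed = True
```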
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 21 Sep 2018 03:16:38 +0530] rev 39756
py3: use print as a function in tests/test-revert.t
This makes the test work on Python 3.
Differential Revision: https://phab.mercurial-scm.org/D4687
Yuya Nishihara <yuya@tcha.org> [Wed, 19 Sep 2018 23:11:07 +0900] rev 39755
chgserver: restore pager fds attached within runcommand session
While rewriting chg in Rust, I noticed the server leaks the client's pager
fd. This isn't a problem right now since the IPC process terminates earlier
than the pager, but I believe the fds attached within a "runcommand" request
should be released as soon as the session ends.
Yuya Nishihara <yuya@tcha.org> [Wed, 19 Sep 2018 22:57:47 +0900] rev 39754
chgserver: add separate flag to remember if stdio fds are replaced
I want to make it use a separate saved buffer for "attachio" requests within
"runcommand" session. See the next patch for details.
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 21:35:36 +0900] rev 39753
status: remove "morestatus" message from formatter data (BC)
They are just printable messages, not data that should be fed to JSON or
templater.
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 21:28:47 +0900] rev 39752
tests: show that the structure of the more status output looks weird
Each dict should represent data of the same kind.
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 16:35:39 +0900] rev 39751
phabricator: add testedwith boilerplate
Kyle Lippincott <spectral@google.com> [Thu, 20 Sep 2018 12:13:00 -0700] rev 39750
narrow: extract wdir cleanup function to make it extensible
We have an overlay filesystem which shows the entire repository, and unlinking
a file that's in the underlying data store will create "tombstone" entries,
which are going to cause our automatic tracking to re-add these directories. We
need to use a different (non-posix) interface to clean up items in the working
directory that are no longer relevant.
Extracting this to a function lets us use extensions.wrappedfunction and perform
this cleanup work, even if the paths aren't in the dirstate (they may have been
removed in the past and thus entirely "tombstone" entries already, part of
hgignore, exclusively directories (possibly empty), or other edge cases).
Differential Revision: https://phab.mercurial-scm.org/D4681
Augie Fackler <augie@google.com> [Thu, 20 Sep 2018 09:52:59 -0400] rev 39749
changegroup: reintroduce some comments that have gotten lost over the years
I got concerned about the correctness of the pruning logic, but I was
misreading it. I didn't figure that out until I walked all the way
back to 0252abaafb8a from 2011, where I was finally able to see (in
the deleted side of the change!) a complete explanation from
b6d9ea0bc107 in 2005.
Differential Revision: https://phab.mercurial-scm.org/D4686
Augie Fackler <augie@google.com> [Wed, 19 Sep 2018 23:38:30 -0400] rev 39748
changegroup: tease out a temporary prune method for manifests
It's extracted so extensions can filter manifest nodes if needed. This
is an unfortunate hack, but I think I only need it for manifests. The
long-term solution will be to rework the relationship between
changegroups and storage so that this isn't required.
Differential Revision: https://phab.mercurial-scm.org/D4685
Augie Fackler <augie@google.com> [Wed, 19 Sep 2018 23:36:16 -0400] rev 39747
changegroup: remove outdated comment
Differential Revision: https://phab.mercurial-scm.org/D4684
Pulkit Goyal <pulkit@yandex-team.ru> [Thu, 20 Sep 2018 18:36:33 +0300] rev 39746
py3: encode the name to bytes before using in revsetpredicate()
Differential Revision: https://phab.mercurial-scm.org/D4677
Pulkit Goyal <pulkit@yandex-team.ru> [Thu, 20 Sep 2018 18:36:00 +0300] rev 39745
py3: suppress the output on .write() calls in tests/test-hgweb-commands.t
Differential Revision: https://phab.mercurial-scm.org/D4676
Pulkit Goyal <pulkit@yandex-team.ru> [Thu, 20 Sep 2018 18:35:24 +0300] rev 39744
py3: use stringutil.pprint() to print boolean values
Differential Revision: https://phab.mercurial-scm.org/D4675
Pulkit Goyal <pulkit@yandex-team.ru> [Thu, 20 Sep 2018 18:34:38 +0300] rev 39743
py3: add a missing b'' in tests/test-newercgi.t
# skip-blame because just b'' prefixes
Differential Revision: https://phab.mercurial-scm.org/D4674
Pulkit Goyal <pulkit@yandex-team.ru> [Thu, 20 Sep 2018 18:33:53 +0300] rev 39742
py3: use pycompat.maplist instead of map
Differential Revision: https://phab.mercurial-scm.org/D4673
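The reason for this kind of change is that ``map()`` returns a lazy iterator on Python 3 rather than a list. The helper below is a stand-in written for illustration, assuming ``pycompat.maplist`` behaves like ``list(map(...))``.

```python
def maplist(*args):
    # Stand-in for the helper named in the commit: apply map() and
    # materialize the result as a list.
    return list(map(*args))


lazy = map(len, [b'a', b'bc'])       # iterator on Python 3
eager = maplist(len, [b'a', b'bc'])  # real list, supports indexing
```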
Pulkit Goyal <pulkit@yandex-team.ru> [Thu, 20 Sep 2018 17:23:20 +0300] rev 39741
py3: add some b'' prefixes in tests/test-extension.t
# skip-blame because just b'' prefixes
Differential Revision: https://phab.mercurial-scm.org/D4672
Pulkit Goyal <pulkit@yandex-team.ru> [Thu, 20 Sep 2018 17:17:02 +0300] rev 39740
py3: make tests/svn-safe-append.py compatible with python 3
Differential Revision: https://phab.mercurial-scm.org/D4671
Pulkit Goyal <pulkit@yandex-team.ru> [Thu, 20 Sep 2018 17:16:16 +0300] rev 39739
py3: use print as a function in tests/test-subrepo-svn.t
Differential Revision: https://phab.mercurial-scm.org/D4670
Anton Shestakov <av6@dwimlabs.net> [Mon, 17 Sep 2018 17:47:24 +0800] rev 39738
bundle2: make server.bundle2.stream default to True
Support for bundle2 streaming clones has been shipped in Mercurial 4.5
(7eedbd5d4880), but was never activated by default. It's time to have more
people use it. The new format allows streaming clones to transport cache
(hooray for speed) and phaseroots (fixes phase-related issues).
Changes in tests:
bundle2 capabilities now have "stream=v2" (plus a '\n' as a separator) and
therefore take 14 bytes more: "%0Astream%3Dv2". Tip for tests that have data
encoded with CBOR: 0xd3 - 0xc5 = 14.
$USUAL_BUNDLE2_CAPS$ replaces $USUAL_BUNDLE2_CAPS_SERVER$, which is the same
thing, but without "stream=v2".
Since streaming clones now also transfer caches, the reported byte and file
counts are higher (e.g. 816 bytes in 9 files instead of 613 bytes in 4 files,
a bit of --debug and manual math confirms that the caches take these extra 203
bytes in 5 files).
Differential Revision: https://phab.mercurial-scm.org/D4680
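The arithmetic quoted above can be checked quickly. This sketch only verifies the numbers given in the message; using ``urllib.parse.quote`` to reproduce the encoded suffix is an assumption for illustration, not a claim about how the capabilities are encoded internally.

```python
from urllib.parse import quote

# "\n" separator plus the new capability, URL-quoted.
extra = quote('\nstream=v2')

size_delta = len(extra)      # extra bytes in the capabilities blob
cbor_delta = 0xd3 - 0xc5     # difference between the two byte values
byte_delta = 816 - 613       # extra bytes transferred for caches
file_delta = 9 - 4           # extra files transferred for caches
```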
Anton Shestakov <av6@dwimlabs.net> [Mon, 17 Sep 2018 16:52:34 +0800] rev 39737
bundle2: graduate bundle2.stream option from experimental to server section
Differential Revision: https://phab.mercurial-scm.org/D4679
Anton Shestakov <av6@dwimlabs.net> [Thu, 20 Sep 2018 17:02:31 +0800] rev 39736
tests: split capabilities into separate lines while searching for "narrow"
This test is interested only in capabilities that are related to narrow, so
let's omit everything else. Makes it easier to update other capabilities (and
"rev-branch-cache" is one of the usual patterns that are already present in
tests/common-patterns.py anyway).
Differential Revision: https://phab.mercurial-scm.org/D4678
Matt Harbison <matt_harbison@yahoo.com> [Wed, 19 Sep 2018 23:54:16 -0400] rev 39735
py3: resolve Unicode issues around `hg serve` on Windows
Presumably we're going to want to use CreateProcessW(), and possibly get rid of
pycompat.getcwd() here (which maps to the DeprecationWarning causing
os.getcwdb()) to use os.getcwd() directly. But this was a minimal change to
get rid of some stacktraces in test-run-tests.t.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 19 Sep 2018 21:41:58 -0400] rev 39734
run-tests: avoid os.getcwdb() on Windows
Any call to this issues a DeprecationWarning about the Windows bytes API being
deprecated. There are a handful of these calls in core, but test-run-tests.t
was littered with these, as it's printed every time run-tests.py is launched.
I'm not sure what the long term strategy for Unicode on Windows in the test
runner is, but this seems no worse than the current conversion strategy.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 19 Sep 2018 20:45:57 -0400] rev 39733
run-tests: quote PYTHON when spawning a subprocess
Same reason as 5abc47d4ca6b. This covers running *.py tests, as well as inline
python blocks. I didn't hit the path around line 3079, but it seems correct to
quote.
Augie Fackler <augie@google.com> [Mon, 17 Sep 2018 20:43:40 -0400] rev 39732
narrow: add test showing that local-to-local narrow clones don't work
It turns out they've never actually worked: prior to some recent
refactoring they just unintentionally followed the full-clone path,
which we unintentionally relied on in a test at Google.
Differential Revision: https://phab.mercurial-scm.org/D4640
Martin von Zweigbergk <martinvonz@google.com> [Wed, 19 Sep 2018 17:34:36 -0700] rev 39731
fastannotate: process files as they arrive
peer.commandexecutor()'s context manager waits for all responses to
arrive in its __exit__() method. We want to process the results as
they arrive, so we should do that inside the context manager
scope. Note that the futures' result() methods have been replaced to
make sure that the command executor's sendcommands() method is called
when the first future's result is requested, so we don't need to do
that.
A minor side-effect is that we can no longer easily tell when the
server has started sending us responses, so that long statement was
lost.
Differential Revision: https://phab.mercurial-scm.org/D4666
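The same pattern exists in the standard library, which makes for a convenient analogy: ``ThreadPoolExecutor.__exit__`` also waits for all outstanding work. The sketch below uses ``concurrent.futures`` rather than ``peer.commandexecutor()``, and ``fetch`` is a hypothetical placeholder for a remote command.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(path):
    # Placeholder for a remote command returning data for one file.
    return path, len(path)


paths = ['a.txt', 'dir/b.txt', 'c']
processed = []

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(fetch, p) for p in paths]
    # Handle each result inside the with block, as soon as it is
    # ready, instead of waiting for __exit__ to collect everything.
    for fut in as_completed(futures):
        processed.append(fut.result())
```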
Matt Harbison <matt_harbison@yahoo.com> [Tue, 18 Sep 2018 22:14:03 -0400] rev 39730
py3: make osenvironb a proxy for, instead of a copy of os.environ where needed
Without this, TESTDIR and a few other variables weren't defined in the *.t test.
I didn't bother implementing all of the view functions for simplicity. All that
is actually used is __{get,set}item__(), get() and pop(), but the rest seems
easy enough to add for futureproofing.
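A proxy of this kind might look roughly like the sketch below. It is an illustration only: the class name ``environbytes`` is hypothetical, the use of ``os.fsencode``/``os.fsdecode`` is an assumption about the encoding, and ``pop()`` is simplified to always return a default instead of raising.

```python
import os


class environbytes(object):
    """Bytes view over a str-keyed environment (hypothetical name)."""

    def __init__(self, strenv):
        self._strenv = strenv

    def __getitem__(self, key):
        return os.fsencode(self._strenv[os.fsdecode(key)])

    def __setitem__(self, key, value):
        # Writes go through to the underlying mapping, so child
        # processes see them; a copy would silently lose them.
        self._strenv[os.fsdecode(key)] = os.fsdecode(value)

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def pop(self, key, default=None):
        try:
            value = self[key]
        except KeyError:
            return default
        del self._strenv[os.fsdecode(key)]
        return value


env = environbytes(os.environ)
env[b'TOY_TESTDIR'] = b'/tmp/tests'
seen_by_os = os.environ['TOY_TESTDIR']
popped = env.pop(b'TOY_TESTDIR')
```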
Sean Farley <sean@farley.io> [Tue, 22 May 2018 16:16:11 +0200] rev 39729
memctx: simplify _manifest with new revlog nodeids
This was originally written before we had modifiednodeid and
addednodeid, so we had to get the parents of the context, the data from
the function, and then hash that.
This is much simpler now and helps refactor more code later.
Sean Farley <sean@farley.io> [Tue, 22 May 2018 12:35:38 +0200] rev 39728
context: remove unused overlayfilectx (API)
It seems that this was maybe used in an extension but at this point
nothing in lfs, hg-experimental, or any other cursory repo looked at has
a reference to this class; so, for now, let's just remove it.
Sean Farley <sean@farley.io> [Mon, 11 Jun 2018 20:48:47 -0700] rev 39727
context: fix typo in workingcommitctx
This was probably a copy pasta error in 745e3b485632. Refactoring memctx
code exposed this bug.
Sean Farley <sean@farley.io> [Tue, 17 Jul 2018 17:16:22 -0700] rev 39726
filectx: fix return of renamed
How is this not blowing up everywhere?
It seems that filelog.renamed has always returned False (incorrectly a
boolean) instead of the assumed None. Tracing through history, you need
to skip over my move of code in 2013 by annotating from 896193a9cab4^
and you can see the original code is from 2007 (180a3eee4b75) and that
ab9fa7a85dd9 broke this by assuming renamed was a bool (instead of
None).
Refactoring memctx code later exposed this bug.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 19 Sep 2018 00:23:02 -0400] rev 39725
tests: glob over some quoting differences in test-narrow-widen-no-ellipsis.t
Matt Harbison <matt_harbison@yahoo.com> [Tue, 18 Sep 2018 23:56:38 -0400] rev 39724
py3: byteify contrib/check-config.py
The corresponding *.t still fails because of bytes (with a 'b' prefix) vs str
printing, but no longer crashes.
# skip-blame for b'' prefixing
Matt Harbison <matt_harbison@yahoo.com> [Tue, 18 Sep 2018 23:47:21 -0400] rev 39723
tests: quote PYTHON usage
Python3 defaults to installing under "Program Files".
Matt Harbison <matt_harbison@yahoo.com> [Tue, 18 Sep 2018 22:40:03 -0400] rev 39722
py3: add a missing b'' for Windows
I tried ./contrib/byteify-strings.py, but there were way too many changes (and
most looked wrong). This was hit with test-check-interfaces.py.
# skip-blame for b'' prefixes
Yuya Nishihara <yuya@tcha.org> [Mon, 03 Sep 2018 21:01:47 +0900] rev 39721
log: make changesetformatter pass in changectx to formatter
It wasn't necessary before, but user templates may have keywords that aren't
filled in by the changesetformatter.
Yuya Nishihara <yuya@tcha.org> [Mon, 03 Sep 2018 20:56:53 +0900] rev 39720
journal: use changesetformatter to properly nest list of commits in JSON
Before, two separate JSON documents were interleaved.
I chose the field name "changesets" over the option name "commits", since
each entry is called a "changeset" in log templates.
Yuya Nishihara <yuya@tcha.org> [Mon, 03 Sep 2018 07:53:50 +0900] rev 39719
journal: do not pass in repolookuperror string to template (BC)
This doesn't look like data, but a warning message.
Yuya Nishihara <yuya@tcha.org> [Mon, 03 Sep 2018 07:52:24 +0900] rev 39718
journal: inline formatted nodes and date into expression
The variable name "str" was misleading since these values aren't always
strings.
Yuya Nishihara <yuya@tcha.org> [Mon, 03 Sep 2018 07:48:43 +0900] rev 39717
journal: unify template name for "nodes" (BC)
This is a part of the name unification.
https://www.mercurial-scm.org/wiki/GenericTemplatingPlan#Dictionary
.. bc::

   ``{oldhashes}`` and ``{newhashes}`` in journal template are renamed to
   ``{oldnodes}`` and ``{newnodes}`` respectively.
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 15:59:26 -0700] rev 39716
localrepo: extract resolving of opener options to standalone functions
Requirements and config options are converted into a dict which is
available to the store vfs to consult. This is how storage options
are communicated from the repo layer to the storage layer.
Currently, we do that option resolution in a private method on the
repo instance. And there is a single method doing that resolution.
Opener options are logically specific to the storage backend they
apply to. And, opener options may wish to influence how the repo
object/type is constructed. So it makes sense to have more granular
storage option resolution that occurs before the repo object is
instantiated.
This commit extracts the code for resolving opener options into new
module-level functions. These functions are run before the repo
instance is constructed.
As part of the code move, we split the option resolution into
generic and revlog-specific options. After this commit, we no longer
add revlog-specific options to repos that don't have a revlog
requirement.
Some of these opener options and associated config options might make
sense on alternate storage backends. We can always reuse config
options and opener option names for other backends. But we shouldn't
be passing opener options to storage backends that won't recognize
them. I haven't done it here, but after this commit it should be
possible for store backends to validate the set of opener options
it receives.
Because localrepository.openerreqs is no longer used after this commit,
it has been removed.
I'm not super thrilled about the code outside of localrepo that is
adding requirements and updating opener options. We'll probably want
to create a more formal API for that use case that constructs a new
repo instance and poisons the old repo object. But this was a
pre-existing issue and can be dealt with later. I have little doubt
it will cause me troubles as I continue to refactor how repository
objects are instantiated.
.. api::
``localrepository.openerreqs`` has been removed. Override
``localrepo.resolvestorevfsoptions()`` to add custom opener options.
.. api::
``localrepository._applyopenerreqs()`` has been removed. Use
``localrepo.resolvestorevfsoptions()`` to add custom opener options.
Differential Revision: https://phab.mercurial-scm.org/D4576
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 15:17:47 -0700] rev 39715
localrepo: use boolean in opener options
Not sure why we're using an integer for a flag value here. I'm
pretty sure nothing relies on values being 1.
While we're here, convert to a dict comprehension.
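A minimal sketch of the pattern, assuming placeholder requirement names
(the two sets below are made up for illustration). Since True compares
equal to 1 in Python, code that only tested the flag for truthiness or
equality with 1 keeps working.

```python
# Illustrative only: the requirement names below are placeholders.
requirements = {"revlogv1", "store", "fncache"}
openerreqs = {"revlogv1", "generaldelta"}

# Each requirement that is also an opener requirement becomes a True flag.
options = {r: True for r in requirements & openerreqs}
```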
Differential Revision: https://phab.mercurial-scm.org/D4575
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 15:07:27 -0700] rev 39714
localrepo: move store() from store module
I want logic related to requirements handling to be in the localrepo
module so it is all in one place.
I would have loved to inline this logic. Unfortunately, statichttprepo
also calls it. I didn't want to inline it twice. We could potentially
refactor statichttppeer. But meh.
Differential Revision: https://phab.mercurial-scm.org/D4574
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 15:05:51 -0700] rev 39713
localrepo: resolve store and cachevfs in makelocalrepository()
This is mostly a code move and refactor.
One change is that we now explicitly look for requirements indicating
a share is being used rather than blindly try to read from
.hg/sharedpath. Requirements *should* be all that is necessary to
dictate high-level behavior and I'm not sure why the previous code
was doing what it was.
The previous code has been in place since 87d1fd40f57e (authored in
2009). And the commit immediately after that (971e38a9344b) introduced
``hg.share()`` and always wrote the ``shared`` requirement. And as far
as I can tell, every revision of ``hg.share()`` since has written
either the ``shared`` or ``relshared`` requirement. So I'm pretty
sure we don't need to maintain BC by always looking for and honoring
the ``.hg/sharedpath`` file even if a requirement isn't present.
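The requirement-driven lookup could be sketched as follows. The helper
name and the read_sharedpath callable are hypothetical, and resolving a
relative share against the .hg directory is an assumption of this
sketch. posixpath is used so the example behaves the same on every
platform.

```python
import posixpath

# Requirement names as given in the commit message above.
SHARED = "shared"
RELSHARED = "relshared"


def resolve_store_base(hgpath, requirements, read_sharedpath):
    """Return the directory whose store should be used.

    ``read_sharedpath`` is a hypothetical callable returning the contents of
    ``.hg/sharedpath``; it is consulted only when a share requirement exists.
    """
    if SHARED in requirements:
        return read_sharedpath()
    if RELSHARED in requirements:
        # Assumption: relative shares are resolved against the .hg directory.
        return posixpath.normpath(posixpath.join(hgpath, read_sharedpath()))
    # No share requirement: any stray sharedpath file is ignored.
    return hgpath
```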
.. bc::
A repository will no longer use shared storage if it has a
``.hg/sharedpath`` file but no entry in ``.hg/requires`` saying it
is shared.
This change should not have any end-user impact, as all shared
repos should have a ``.hg/requires`` file indicating this.
Differential Revision: https://phab.mercurial-scm.org/D4573
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 13:10:45 -0700] rev 39712
localrepo: document and test bug around opening shared repos
As part of refactoring this code, I realized that we don't
validate the requirements of a shared repository. This commit
documents that next to the requirements validation code and adds a
test demonstrating the buggy behavior.
I'm not sure if I'll fix this. But it is definitely a bug that
users could encounter, as LFS, narrow, and potentially other
extensions dynamically add requirements on first use. One part
of this I'm not sure about is how to handle loading the .hg/hgrc
of the shared repo. We need to do that in order to load extensions.
But we don't want that repo's hgrc to overwrite the current repo's.
Differential Revision: https://phab.mercurial-scm.org/D4572
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 15:03:17 -0700] rev 39711
localrepo: move requirements reasonability testing to own function
Just because we know how to handle each listed requirement doesn't
mean that set of requirements is reasonable.
This commit introduces an extension-wrappable function to validate
that a set of requirements makes sense.
We could combine this with ensurerequirementsrecognized(). But I think
having a line between basic membership testing and compatibility
checking is more powerful as it will help differentiate between
missing support and buggy behavior.
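To make the distinction concrete, a compatibility check might look
roughly like the sketch below. The exception type, the conflict table,
and the requirement names are all hypothetical; the point is only that
this check runs on requirements that are already individually
recognized.

```python
class RequirementError(Exception):
    """Raised when a set of requirements is unusable (hypothetical type)."""


# Hypothetical pairs of requirements that cannot be combined.
CONFLICTS = [("backend-a", "backend-b")]


def ensure_requirements_compatible(requirements):
    """Reject recognized requirements that make no sense together."""
    for first, second in CONFLICTS:
        if first in requirements and second in requirements:
            raise RequirementError(
                "requirements %s and %s cannot be combined" % (first, second)
            )
```

Keeping this as its own module-level function gives an extension a
single place to wrap when it needs to add checks of its own.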
Differential Revision: https://phab.mercurial-scm.org/D4571
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 15:47:24 -0700] rev 39710
statichttprepo: use new functions for requirements validation
The new code in localrepo for requirements gathering and validation
is more robust than scmutil.readrequires(). Let's port statichttprepo
to it.
Since scmutil.readrequires() is no longer used, it has been removed.
It is possible extensions were monkeypatching this to supplement the
set of supported requirements. But the proper way to do that is to
register a function in featuresetupfuncs. I'm comfortable forcing the API break
because featuresetupfuncs is more robust and has been supported for
a while.
.. api::
``scmutil.readrequires()`` has been removed.
Use ``localrepo.featuresetupfuncs`` to register new repository
requirements.
Use ``localrepo.ensurerequirementsrecognized()`` to validate them.
Differential Revision: https://phab.mercurial-scm.org/D4570
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 14:54:17 -0700] rev 39709
localrepo: validate supported requirements in makelocalrepository()
This should be a glorified code move. I did take the opportunity to
refactor things. We now have a separate function for gathering
requirements and one for validating them.
I also made cosmetic changes to the code, such as not using
abbreviations and using a set instead of list to model missing
requirements.
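The gather/validate split can be sketched like this. The function names
and the error message are hypothetical and the real code may differ;
the sketch only shows one function parsing the file contents and
another using set difference to collect missing requirements.

```python
def gather_requirements(text):
    """Parse the contents of a requires file: one requirement per line."""
    return set(text.splitlines())


def ensure_requirements_recognized(requirements, supported):
    """Raise if any requirement is not in the supported set."""
    # A set models "missing" naturally: no duplicates, no ordering concerns.
    missing = requirements - supported
    if missing:
        raise ValueError(
            "repository requires unknown features: %s"
            % ", ".join(sorted(missing))
        )
```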
Differential Revision: https://phab.mercurial-scm.org/D4569
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 14:45:52 -0700] rev 39708
localrepo: read requirements file in makelocalrepository()
Previously, scmutil.readrequires() loaded the requirements file
and validated its content against what was supported.
Requirements translate to repository features and are critical to
our plans to dynamically create local repository types. So, we must
load them in makelocalrepository() before a repository instance is
constructed.
This commit moves the reading of the .hg/requires file to
makelocalrepository(). Because scmutil.readrequires() was performing
I/O and validation, we inlined the validation into
localrepository.__init__ and stopped calling scmutil.readrequires().
I plan to remove scmutil.readrequires() in a future commit (we can't
do it now because statichttprepo uses it).
Differential Revision: https://phab.mercurial-scm.org/D4568
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 12:36:07 -0700] rev 39707
localrepo: check for .hg/ directory in makelocalrepository()
As part of this, we move the check to before .hg/hgrc is loaded,
as it makes sense to check for the directory before attempting to
open a file in it.
Differential Revision: https://phab.mercurial-scm.org/D4567
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 11:44:57 -0700] rev 39706
localrepo: load extensions in makelocalrepository()
Behavior does change subtly.
First, we now load the hgrc before optionally setting up the vfs ward.
That's fine: the vfs ward is for debugging and we know we won't hit it
when reading .hg/hgrc. If the loaded extension were performing repo/vfs
I/O, then we'd be worried. But extensions don't have access to the
repo object that loaded them when they are loaded. Unless they are
doing stack walking as part of module loading (which would be crazy),
they shouldn't have access to the repo that incurred their load.
Second, we now load extensions outside of the try..except IOError
block. Previously, if loading an extension raised IOError, it would
be silently ignored. I'm pretty sure the IOError is there for missing
.hgrc files and should never have been ignored for issues loading
extensions. I don't think this matters in reality because extension
loading traps I/O errors.
Differential Revision: https://phab.mercurial-scm.org/D4566
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 11:34:02 -0700] rev 39705
localrepo: copy ui in makelocalrepository()
We will want to load the .hg/hgrc file from makelocalrepository() so
we can consult its options as part of deriving the repository type.
This means we need to create our ui instance copy in that function.
Differential Revision: https://phab.mercurial-scm.org/D4565
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 11:31:14 -0700] rev 39704
localrepo: move some vfs initialization out of __init__
In order to make repository types more dynamic, we'll need to move the
logic for determining repository behavior out of
localrepository.__init__ so we can influence behavior before the type
is instantiated.
This commit starts that process by moving working directory and .hg/
vfs initialization to our new standalone function for instantiating
local repositories.
Aside from API changes, behavior should be fully backwards compatible.
.. api::
localrepository.__init__ now does less work and accepts new args
Use ``hg.repository()``, ``localrepo.instance()``, or
``localrepo.makelocalrepository()`` to obtain a new local repository
instance instead of calling the ``localrepository`` constructor
directly.
Differential Revision: https://phab.mercurial-scm.org/D4564
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 11:02:16 -0700] rev 39703
localrepo: create new function for instantiating a local repo object
Today, there is a single local repository class - localrepository. Its
__init__ is responsible for loading the .hg/requires file and taking
different actions depending on what is present.
In addition, extensions may define a "reposetup" function that
monkeypatches constructed repository instances, often by implementing
a derived type and changing the __class__ of the repo instance.
Work around alternate storage backends and partial clone has made it
clear to me that shoehorning all this logic into __init__ and operating
on an existing instance is too convoluted. For example, localrepository
assumes revlog storage and swapping in non-revlog storage requires
overriding e.g. file() to return something that isn't a revlog. I've
authored various patches that either:
a) teach various methods (like file()) about different states and
take the appropriate code path at run-time
b) create methods/attributes/callables used for instantiating things
and populate these in __init__
"a" incurs run-time performance penalties and makes code more
complicated since various functions have a bunch of "if storage is X"
branches.
"b" makes localrepository quickly explode in complexity.
My plan for tackling this problem is to make the local repository type
more dynamic. Instead of a static localrepository class/type that
supports all of the local repository configurations (revlogs vs other,
revlogs with ellipsis, revlog v1 versus revlog v2, etc), we'll
dynamically construct a type providing the implementations that are
needed for the repository on disk, derived from the .hg/requires file
and configuration options. The constructed repository type will be
specialized and methods won't need to be taught about different
implementations nor overloaded.
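One way to picture this plan is the sketch below, which picks a storage
mixin from the requirements and builds the repository type with type().
All class names, the factory function, and the requirement string are
hypothetical; this is not the code that landed, only an instance of the
technique.

```python
class BaseRepo:
    def __init__(self, requirements):
        self.requirements = requirements


class RevlogFileStorage:
    """Mixin providing file storage for one hypothetical backend."""

    def file(self, path):
        return ("revlog", path)


class AlternateFileStorage:
    """Mixin for a hypothetical non-revlog backend."""

    def file(self, path):
        return ("alternate", path)


def make_repo(requirements):
    """Build a specialized type from the requirements, then instantiate it."""
    if "revlogv1" in requirements:
        storage = RevlogFileStorage
    else:
        storage = AlternateFileStorage
    # type() creates a new class whose bases supply the chosen implementation,
    # so file() needs no "if storage is X" branches at call time.
    cls = type("derivedrepo", (storage, BaseRepo), {})
    return cls(requirements)
```

The decision is made once, at construction time, instead of on every
call to file().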
We may also leverage this functionality for building types that don't
implement all attributes. For example, the "intents" feature allows
commands to declare that they are read only. By dynamically
constructing a repository type, we could return a repository instance
with no attributes related to mutating the repository. This could
include things like a "changelog" property implementation that doesn't
check whether it needs to invalidate the hidden revisions set on every
access.
This commit establishes a function for building a local repository
instance. Future commits will start moving functionality from
localrepository.__init__ to this function. Then we'll start dynamically
changing the returned type depending on options that are present.
This change may seem radical. But it should be fully compatible with
the reposetup() model - at least for now.
Differential Revision: https://phab.mercurial-scm.org/D4563
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:29:12 -0700] rev 39702
transaction: make entries a private attribute (API)
This attribute is tracking changes to append-only files. It is
an implementation detail and should not be exposed as part of
the public interface.
But code in repair was accessing it, so it seemingly does belong
as part of the public API. But that code in repair is making
assumptions about how storage works and is grossly wrong when
alternate storage backends are in play. We'll need some kind of
"strip" API at the storage layer that knows how to handle things
in a storage-agnostic manner. I don't think accessing a private
attribute on the transaction is any worse than what this code
is already doing. So I'm fine with violating the abstraction for
transactions.
And with this change, all per-instance attributes on transaction
have been made private except for "changes" and "hookargs." Both
are used by multiple consumers and look like they need to be
part of the public interface.
.. api::
Various attributes of ``transaction.transaction`` are now ``_``
prefixed to indicate they shouldn't be used by external
consumers.
Differential Revision: https://phab.mercurial-scm.org/D4634
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:19:55 -0700] rev 39701
transaction: make names a private attribute
This is used to report the transaction name in __repr__. It is
very obviously an implementation detail and doesn't need to be
exposed as part of the public interface. So mark it as private.
Differential Revision: https://phab.mercurial-scm.org/D4633
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:13:38 -0700] rev 39700
transaction: make map a private attribute
This is used to track which files are modified. It is an
implementation detail of current transactions and doesn't need
to be exposed to the public interface. So mark it as private.
Differential Revision: https://phab.mercurial-scm.org/D4632
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:11:25 -0700] rev 39699
transaction: make report a private attribute
This is a callable used for logging. It isn't used outside the
transaction code. It doesn't need to be part of the public interface.
Let's mark it as private.
Differential Revision: https://phab.mercurial-scm.org/D4631
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:08:02 -0700] rev 39698
transaction: make opener a private attribute
The VFS instance is an implementation detail of the transaction
and doesn't belong as part of the public interface. So mark it as
private.
Differential Revision: https://phab.mercurial-scm.org/D4630
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:04:52 -0700] rev 39697
transaction: make after a private attribute
This is another callable that is passed in at __init__ time. It
doesn't need to be part of the public interface.
Differential Revision: https://phab.mercurial-scm.org/D4629
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:02:53 -0700] rev 39696
transaction: make checkambigfiles a private attribute
This holds instance state that is passed in at __init__ time. It
doesn't need to be exposed as part of the public interface.
Differential Revision: https://phab.mercurial-scm.org/D4628
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:01:22 -0700] rev 39695
transaction: make validator a private attribute
This is similar to releasefn. It holds state that doesn't need to be
exposed as part of the public interface.
Differential Revision: https://phab.mercurial-scm.org/D4627
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 16:00:09 -0700] rev 39694
transaction: make releasefn a private attribute
This is a handle on a callable that is called when the journal
is closed. The value is specified at __init__ time. It doesn't
need to be exposed on the public interface. So mark it as private.
Differential Revision: https://phab.mercurial-scm.org/D4626
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 15:57:32 -0700] rev 39693
transaction: make file a private attribute
This holds a file handle for the journal file. This file handle
should not be touched outside the journal class and doesn't
belong on the public interface.
Differential Revision: https://phab.mercurial-scm.org/D4625
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 15:55:57 -0700] rev 39692
transaction: make journal a private attribute
This attribute tracks the name of the journal file. It is an
implementation detail of the current transaction and therefore
shouldn't be exposed as part of the interface. Let's mark it as
private.
Differential Revision: https://phab.mercurial-scm.org/D4624
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 15:52:59 -0700] rev 39691
transaction: make undoname a private attribute
This attribute tracks the file pattern to use for undo files.
It is an implementation detail of the current transaction semantics
and doesn't need to be part of the future transaction interface. So
mark it as private.
Differential Revision: https://phab.mercurial-scm.org/D4623
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 17 Sep 2018 15:51:19 -0700] rev 39690
transaction: make count and usages private attributes
I want to formalize the interface for transactions. As part of
doing that, let's take the opportunity to make some attributes
non-public.
"count" and "usages" track how many times the transaction has
been opened/nested/closed/released. This is internal state and
doesn't need to be part of the public API.
Differential Revision: https://phab.mercurial-scm.org/D4622
Pulkit Goyal <pulkit@yandex-team.ru> [Tue, 18 Sep 2018 13:41:16 +0300] rev 39689
narrow: don't send the changelog information when widening without ellipses
When we widen a non-ellipses narrow copy, the server sends the changelog
information of all the changesets. The code was copied from the ellipses case,
where sending the new changelog data is required.
But in non-ellipses cases, we don't need to send the changelog data as we will
have all the changesets locally.
Before this patch, there was an overhead of ~8-10 mins on each widening call
because all the changelog information was being pulled and applied. After
this patch, we no longer pull the changelog information. So this patch can save
~5 mins on the Mozilla repo on each widening, and more on repos which have more
changesets.
When we apply an empty changelog from a changegroup, there is a devel-warn. This
patch uses something of a hack to silence that devel-warn.
Differential Revision: https://phab.mercurial-scm.org/D4639
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 17 Sep 2018 21:41:34 +0300] rev 39688
changegroup: add functionality to skip adding changelog data to changegroup
In the narrow extension, when we have a non-ellipses narrow working copy and we
extend it, we pull all the changelog data again and the client tries to reapply
all that changelog data.
Downloading the data for millions of changesets is not very expensive, but
applying it on the client side is very expensive and takes ~10 minutes. These
10 minutes are added to every `hg tracked --addinclude <>` call, so extending
a narrow copy becomes very slow.
This patch adds a new changelog argument to cgpacker.generate() fn. If the
changelog argument is set to False, we won't yield the changelog data. We still
have to iterate over the deltas returned by _generatechangelog() because that's
a generator and builds the data for clstate variable which is required for
calculating manifests and filelogs.
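The reason the loop cannot simply be skipped can be shown with a small
generic sketch. The function names and chunk strings here are made up
and this is not the cgpacker code; it only shows an inner generator
being drained for its side effect while its output is suppressed.

```python
def generate_changelog(revs, state):
    """Yield one chunk per revision while recording state as a side effect."""
    for rev in revs:
        state.append(rev)
        yield "chunk-%d" % rev


def generate(revs, changelog=True):
    """Yield changelog chunks only when requested, then later sections."""
    state = []
    for chunk in generate_changelog(revs, state):
        # The loop must run either way so that ``state`` is populated.
        if changelog:
            yield chunk
    # Later sections depend on the state gathered above.
    yield "manifests-for-%d-revs" % len(state)
```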
Differential Revision: https://phab.mercurial-scm.org/D4638
Pulkit Goyal <pulkit@yandex-team.ru> [Tue, 18 Sep 2018 10:46:19 -0700] rev 39687
tests: add debug output in test-narrow-widen-no-ellipsis.t
This will help us in understanding the upcoming patches better.
Differential Revision: https://phab.mercurial-scm.org/D4637
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 17 Sep 2018 18:21:17 +0300] rev 39686
changegroup: improve the devel-warn to specify changelog was empty
Right now, the develwarn says "applied empty changegroup" which is not correct
because we can send a changegroup without changelog with just manifest and
filelogs and it will still say the same.
Let's fix this to say that we are applying an empty changelog from a
changegroup. In future patches I will be adding functionality to send a
changegroup from the server without changelog data.
Differential Revision: https://phab.mercurial-scm.org/D4636
Anton Shestakov <av6@dwimlabs.net> [Mon, 17 Sep 2018 13:21:46 +0800] rev 39685
zsh_completion: add -b/--branch and -B/--bookmark(s) flags properly
_hg_branch_bmark_opts used to add these two flags, but had the same
descriptions for the flags regardless of what command took them and didn't
allow specifying flags more than once (no '*' at the start). Even more
importantly, it assumed that -B was always expecting an argument (i.e.
--bookmark=foo), but in case of incoming and outgoing it's not so (--bookmarks
is self-sufficient).
Differential Revision: https://phab.mercurial-scm.org/D4612
spectral <spectral@google.com> [Fri, 14 Sep 2018 16:29:51 -0700] rev 39684
narrow: when writing treemanifests, skip inspecting directories outside narrow
This provides significant speed benefits when narrow and treemanifests are in
use, see the timing numbers below. Note that like previously, differences of <5%
are considered noise.
The below timing numbers are in the same style as previously (example:
ee7ee0c516ca). 'before' is 9db85644, and does not include that example commit's
improvements.
diff --git:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 1.327 s +- 0.051 s | 1.296 s +- 0.009 s | 97.7%
m-u | | x | 1.310 s +- 0.020 s | 1.295 s +- 0.015 s | 98.9%
m-u | x | | 1.295 s +- 0.018 s | 1.296 s +- 0.007 s | 100.1%
m-u | x | x | 83.5 ms +- 0.8 ms | 84.1 ms +- 0.8 ms | 100.7%
l-d-r | | | 205.1 ms +- 3.5 ms | 205.0 ms +- 3.8 ms | 100.0%
l-d-r | | x | 194.2 ms +- 5.6 ms | 192.3 ms +- 4.3 ms | 99.0%
l-d-r | x | | 99.1 ms +- 2.2 ms | 97.8 ms +- 0.9 ms | 98.7%
l-d-r | x | x | 66.2 ms +- 1.0 ms | 67.2 ms +- 2.7 ms | 101.5%
diff -c . --git:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 233.9 ms +- 1.9 ms | 235.6 ms +- 5.1 ms | 100.7%
m-u | | x | 151.4 ms +- 1.2 ms | 152.2 ms +- 2.0 ms | 100.5%
m-u | x | | 234.8 ms +- 2.7 ms | 235.0 ms +- 2.7 ms | 100.1%
m-u | x | x | 127.8 ms +- 2.1 ms | 126.0 ms +- 1.1 ms | 98.6%
l-d-r | | | 82.5 ms +- 1.6 ms | 82.3 ms +- 2.0 ms | 99.8%
l-d-r | | x | 3.742 s +- 0.017 s | 3.819 s +- 0.208 s | 102.1%
l-d-r | x | | 84.4 ms +- 1.5 ms | 83.2 ms +- 1.0 ms | 98.6%
l-d-r | x | x | 751.2 ms +- 5.0 ms | 755.8 ms +- 12.9 ms | 100.6%
rebase -r . --keep -d .^^:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 5.519 s +- 0.038 s | 5.526 s +- 0.057 s | 100.1%
m-u | | x | 5.588 s +- 0.048 s | 5.607 s +- 0.061 s | 100.3%
m-u | x | | 5.520 s +- 0.044 s | 5.546 s +- 0.059 s | 100.5%
m-u | x | x | 586.6 ms +- 12.8 ms | 554.9 ms +- 21.2 ms | 94.6% <--
l-d-r | | | 629.8 ms +- 5.5 ms | 627.4 ms +- 6.6 ms | 99.6%
l-d-r | | x | 6.165 s +- 0.058 s | 6.255 s +- 0.303 s | 101.5%
l-d-r | x | | 270.2 ms +- 2.3 ms | 271.4 ms +- 2.7 ms | 100.4%
l-d-r | x | x | 4.700 s +- 0.025 s | 1.651 s +- 0.016 s | 35.1% <--
status --change . --copies:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 215.4 ms +- 2.3 ms | 216.5 ms +- 4.2 ms | 100.5%
m-u | | x | 132.9 ms +- 1.2 ms | 132.0 ms +- 1.4 ms | 99.3%
m-u | x | | 217.0 ms +- 1.9 ms | 215.4 ms +- 1.9 ms | 99.3%
m-u | x | x | 108.6 ms +- 1.0 ms | 108.2 ms +- 1.5 ms | 99.6%
l-d-r | | | 80.0 ms +- 1.3 ms | 80.5 ms +- 1.1 ms | 100.6%
l-d-r | | x | 3.916 s +- 0.187 s | 3.966 s +- 0.236 s | 101.3%
l-d-r | x | | 84.4 ms +- 3.1 ms | 83.9 ms +- 1.1 ms | 99.4%
l-d-r | x | x | 758.0 ms +- 8.2 ms | 753.5 ms +- 5.0 ms | 99.4%
status --copies:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 1.905 s +- 0.025 s | 1.910 s +- 0.044 s | 100.3%
m-u | | x | 1.892 s +- 0.009 s | 1.895 s +- 0.012 s | 100.2%
m-u | x | | 1.891 s +- 0.012 s | 1.902 s +- 0.018 s | 100.6%
m-u | x | x | 93.3 ms +- 0.9 ms | 93.4 ms +- 0.8 ms | 100.1%
l-d-r | | | 570.7 ms +- 7.8 ms | 571.9 ms +- 18.5 ms | 100.2%
l-d-r | | x | 561.5 ms +- 5.2 ms | 562.9 ms +- 6.1 ms | 100.2%
l-d-r | x | | 171.7 ms +- 2.6 ms | 171.9 ms +- 1.2 ms | 100.1%
l-d-r | x | x | 142.7 ms +- 2.0 ms | 140.3 ms +- 1.0 ms | 98.3%
update $rev^; ~/src/hg/hg{hg}/hg update $rev:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 3.126 s +- 0.016 s | 3.128 s +- 0.015 s | 100.1%
m-u | | x | 3.014 s +- 0.068 s | 3.008 s +- 0.031 s | 99.8%
m-u | x | | 3.143 s +- 0.037 s | 3.184 s +- 0.086 s | 101.3%
m-u | x | x | 308.0 ms +- 1.8 ms | 308.1 ms +- 5.7 ms | 100.0%
l-d-r | | | 430.8 ms +- 4.5 ms | 436.4 ms +- 8.7 ms | 101.3%
l-d-r | | x | 9.676 s +- 0.127 s | 9.945 s +- 0.272 s | 102.8%
l-d-r | x | | 254.2 ms +- 3.3 ms | 255.7 ms +- 3.1 ms | 100.6%
l-d-r | x | x | 1.571 s +- 0.030 s | 1.555 s +- 0.014 s | 99.0%
Differential Revision: https://phab.mercurial-scm.org/D4606
Augie Fackler <augie@google.com> [Mon, 17 Sep 2018 15:16:20 -0400] rev 39683
tests: fix a couple of drawdag.py references
Differential Revision: https://phab.mercurial-scm.org/D4635
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 14 Sep 2018 23:51:21 +0300] rev 39682
py3: fix kwargs handling in hgext/fastannotate.py
Differential Revision: https://phab.mercurial-scm.org/D4588
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 17 Sep 2018 15:55:18 +0300] rev 39681
narrow: use diffmatcher to send only new filelogs in non-ellipses widening
Before this patch, when we widened a non-ellipses narrow clone, we downloaded
all the filelogs matching the resulting new matcher. This is the same as the
ellipses case but can be improved: because we don't pull new csets in
non-ellipses cases, we can download only the newly added files instead of all
the files which match the new matcher.
So, we only download files which match the new matcher but do not match
the old matcher. There exists a match.differencematcher() which is used here.
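The idea of "matches the new matcher but not the old one" can be
sketched with plain predicates. The two helpers below are hypothetical
and deliberately do not use Mercurial's match module, so nothing here
should be read as the signature of match.differencematcher().

```python
def prefix_matcher(prefixes):
    """Return a predicate matching paths under any of the given prefixes."""
    prefixes = tuple(prefixes)
    return lambda path: path.startswith(prefixes)


def difference(new, old):
    """Match paths accepted by ``new`` but rejected by ``old``."""
    return lambda path: new(path) and not old(path)
```

For example, if the old narrowspec covered src/ and the new one covers
src/ and docs/, the difference selects only paths under docs/.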
This will lead to a significant speedup when extending a non-ellipses
narrow copy on large repos because we will download and process only the newly
required filelogs.
The test changes demonstrate that we are now downloading fewer files.
Thanks to Augie for pointing out that the differencematcher functionality
exists in core.
Differential Revision: https://phab.mercurial-scm.org/D4614
Pulkit Goyal <pulkit@yandex-team.ru> [Mon, 17 Sep 2018 15:27:39 +0300] rev 39680
py3: add missing b'' prefixes in couple of test files
These were missed in the earlier patch and caught by Yuya.
# skip-blame because just b'' prefix
Differential Revision: https://phab.mercurial-scm.org/D4613
Matt Harbison <matt_harbison@yahoo.com> [Sun, 16 Sep 2018 23:13:05 -0400] rev 39679
run-tests: convert the remaining os.system() call to Unicode
I wasn't able to hit this path in 543a788eea2d, but I have now when I
accidentally left off `--local`.
Matt Harbison <matt_harbison@yahoo.com> [Sat, 15 Sep 2018 13:31:41 -0400] rev 39678
py3: partially fix pager spawning on Windows
Previously, spinning up the pager crashed because the command and environment
were in bytes. (See also 543a788eea2d.) Now it aborts with an invalid handle:
$ HGMODULEPOLICY=py py -3 ../hg --traceback --config extensions.evolve=!
Traceback (most recent call last):
File "c:\Users\Matt\projects\hg\mercurial\ui.py", line 967, in _write
self.fout.write(''.join(msgs))
File "c:\Users\Matt\projects\hg\mercurial\windows.py", line 173, in write
self.fp.write(s[start:end])
OSError: [WinError 6] The handle is invalid
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\Matt\projects\hg\mercurial\scmutil.py", line 164, in callcatch
return func()
File "c:\Users\Matt\projects\hg\mercurial\dispatch.py", line 350, in _runcatchfunc
return _dispatch(req)
File "c:\Users\Matt\projects\hg\mercurial\dispatch.py", line 930, in _dispatch
return commands.help_(ui, 'shortlist')
File "c:\Users\Matt\projects\hg\mercurial\commands.py", line 2930, in help_
ui.write(formatted)
File "c:\Users\Matt\projects\hg\mercurial\ui.py", line 948, in write
self._writenobuf(*args, **opts)
File "c:\Users\Matt\projects\hg\mercurial\ui.py", line 960, in _writenobuf
self._write(*msgs, **opts)
File "c:\Users\Matt\projects\hg\mercurial\ui.py", line 969, in _write
raise error.StdioError(err)
mercurial.error.StdioError: [Errno 9] The handle is invalid
abort: The handle is invalid
The interesting bit here is that the abort message is marked with ANSI color,
but the OSError is not.
Yuya Nishihara <yuya@tcha.org> [Sat, 15 Sep 2018 10:35:00 +0900] rev 39677
censor: rename loop variable to silence pyflakes warning
hgext/censor.py:92: list comprehension redefines 'c' from line 88
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 20:58:51 +0530] rev 39676
py3: add b'' prefixes in tests/test-hgweb-no-request-uri.t
# skip-blame because just b'' prefixes.
Differential Revision: https://phab.mercurial-scm.org/D4611
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 20:49:37 +0530] rev 39675
py3: add b'' prefixes in tests/test-hgweb-no-path-info.t
# skip-blame because just b'' prefixes
Differential Revision: https://phab.mercurial-scm.org/D4610
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 20:20:59 +0530] rev 39674
py3: add b'' prefixes in tests/test-hgweb-non-interactive.t
# skip-blame because just b'' prefix
Differential Revision: https://phab.mercurial-scm.org/D4609
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 19:58:01 +0530] rev 39673
py3: use codecs.encode() to encode in rot-13 encoding
The other occurrence will need some more love as description is bytes by default
and we need to decode it and then encode it.
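A minimal sketch of both situations follows. The variable names and the
choice of ASCII for the bytes round trip are assumptions made for
illustration, not what the patch itself does.

```python
import codecs

# In Python 3, rot_13 is a text-to-text codec, so it operates on str.
encoded = codecs.encode("hello", "rot_13")

# Bytes (such as a description) must be decoded before the transform and
# encoded again afterwards; the choice of ASCII here is an assumption.
description = b"hello"
text = description.decode("ascii")
encoded_bytes = codecs.encode(text, "rot_13").encode("ascii")
```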
Differential Revision: https://phab.mercurial-scm.org/D4608
Pulkit Goyal <pulkit@yandex-team.ru> [Sun, 16 Sep 2018 19:18:15 +0530] rev 39672
py3: add two passing tests to whitelist found by buildbot
The buildbot found these two new passing tests on Python 3.
Differential Revision: https://phab.mercurial-scm.org/D4607
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:36:43 -0400] rev 39671
phabricator: mark extension as experimental for now
I don't want us to commit to this having a stable interface just yet.
Differential Revision: https://phab.mercurial-scm.org/D4605
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:16:31 -0400] rev 39670
phabricator: fix templating bug by using hybriddict
Differential Revision: https://phab.mercurial-scm.org/D4604
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 01:13:37 -0400] rev 39669
phabricator: add tests of templatekeyword
Having tests is paying off: I found a bug and now it'll be easy to
fix!
Differential Revision: https://phab.mercurial-scm.org/D4603
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:46:17 -0400] rev 39668
phabricator: move extension from contrib to hgext
It's well-enough tested now and widely enough used I think we should
ship it.
Differential Revision: https://phab.mercurial-scm.org/D4602
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:50:21 -0400] rev 39667
tests: add some basic tests of phabricator interactions
This uses the vcr library to avoid hitting phabricator on every test
execution. In order to generate new recordings (vcr calls them
cassettes) just remove the appropriate json file, and the test will
regenerate it. It's not my favorite way to test things, but it'll let
us have test coverage on the phabricator extension that'll make it
resilient to refactors in core and let us move it to hgext.
In the future, it'd probably be better to have a docker container we
can spin up for creating the vcr recordings, but for now this is
enough better than nothing I'm going to declare victory.
Coverage reports about 73% of the extension is now covered.
Differential Revision: https://phab.mercurial-scm.org/D4601
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:20:03 -0400] rev 39666
phabricator: add support for using the vcr library to mock interactions
I'll use this in an upcoming test. The decorator dancing in this is
more complicated than I'd like, but it beats repeating all this code
everywhere.
Differential Revision: https://phab.mercurial-scm.org/D4600
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:19:09 -0400] rev 39665
keepalive: work around slight deficiency in vcr
VCR's response type doesn't define the will_close attribute. Let's
just have keepalive default to closing the socket if the will_close
attribute is missing.
Differential Revision: https://phab.mercurial-scm.org/D4599
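The fallback described above can be sketched roughly as follows. FakeResponse and should_close are hypothetical names used only for illustration; the actual keepalive code may be structured differently.

```python
class FakeResponse(object):
    """Stand-in for a response type that lacks will_close (hypothetical)."""


def should_close(response):
    # Default to closing the socket when the attribute is missing.
    return getattr(response, 'will_close', True)


print(should_close(FakeResponse()))
```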
Augie Fackler <raf@durin42.com> [Sat, 15 Sep 2018 00:18:16 -0400] rev 39664
hghave: add a checker for the vcr HTTP record/replay library
I'm going to use this to write some tests of the phabricator
extension.
Differential Revision: https://phab.mercurial-scm.org/D4598
Matt Harbison <matt_harbison@yahoo.com> [Sat, 15 Sep 2018 00:04:06 -0400] rev 39663
py3: allow run-tests.py to run on Windows
This is now functional:
HGMODULEPOLICY=py py -3 run-tests.py --local test-help.t --pure --view bcompare
However, on this machine without a C compiler, it tries to load cext anyway, and
blows up. I haven't looked into why, other than to see that it does set the
environment variable. When the test exits though, I see it can't find
killdaemons.py, get-with-headers.py, etc.
I have no idea why these changes are needed, given that it runs on Linux. But
os.system() is insisting that it take a str, and subprocess.Popen() blows up
without str:
Errored test-help.t: Traceback (most recent call last):
  File "run-tests.py", line 810, in run
    self.runTest()
  File "run-tests.py", line 858, in runTest
    ret, out = self._run(env)
  File "run-tests.py", line 1268, in _run
    exitcode, output = self._runcommand(cmd, env)
  File "run-tests.py", line 1141, in _runcommand
    env=env)
  File "C:\Program Files\Python37\lib\subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "C:\Program Files\Python37\lib\subprocess.py", line 1100, in _execute_child
    args = list2cmdline(args)
  File "C:\Program Files\Python37\lib\subprocess.py", line 511, in list2cmdline
    needquote = (" " in arg) or ("\t" in arg) or not arg
TypeError: argument of type 'int' is not iterable
This is exactly how it crashes when trying to spin up a pager too. I left one
instance of os.system() unchanged in _installhg(), because it doesn't get there.
Matt Harbison <matt_harbison@yahoo.com> [Fri, 14 Sep 2018 23:04:18 -0400] rev 39662
py3: ensure run-tests environment is uniformly str
subprocess.Popen() was crashing, and when I printed out `env`, all of the keys
and most of the values were str. Except these.
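One way to make an environment mapping uniformly str before handing it to subprocess.Popen() is sketched below. The helper name strenv and the utf-8 default are assumptions for illustration; the actual change fixes the specific offending entries rather than converting the whole mapping.

```python
def strenv(env, encoding='utf-8'):
    """Return a copy of env with every key and value as str (hypothetical)."""

    def tostr(value):
        # bytes are decoded; anything else (e.g. an int) is stringified.
        if isinstance(value, bytes):
            return value.decode(encoding)
        return str(value)

    return {tostr(k): tostr(v) for k, v in env.items()}


print(strenv({b'HGPORT': 20059, 'LANG': b'C'}))
```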
Matt Harbison <matt_harbison@yahoo.com> [Fri, 14 Sep 2018 22:57:35 -0400] rev 39661
py3: ensure run-tests.osenvironb is actually bytes
Windows doesn't have os.environb, so it was falling back to the Unicode form,
and all of the accesses are trying to use bytes.
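A minimal sketch of a bytes environment that also works where os.environb is unavailable. The names bytesenviron and osenvironb as used here, and the utf-8 encoding, are assumptions; run-tests.py may pick a different encoding.

```python
import os


def bytesenviron(strenviron, encoding='utf-8'):
    """Build a bytes-keyed copy of a str environment (hypothetical helper)."""
    return {k.encode(encoding): v.encode(encoding)
            for k, v in strenviron.items()}


# os.environb does not exist on Windows, so fall back to an encoded copy
# of os.environ instead of the Unicode mapping itself.
osenvironb = getattr(os, 'environb', None)
if osenvironb is None:
    osenvironb = bytesenviron(os.environ)
```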
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 22:07:00 -0400] rev 39660
py3: fix str vs bytes in enough places to run `hg version` on Windows
I don't have Visual Studio 2015 at home, but this now works with a handful of
extensions (blackbox, extdiff, patchbomb, phabricator and rebase, but not
evolve):
$ HGMODULEPOLICY=py py -3 ../hg version
Enabling the evolve extension causes the usual "failed to import ..." line, but
then prints this before the usual version output:
('commit', '[b'debugancestor', b'debugapplystreamclonebundle', ...,
b'verify', b'version']')
... where the elided part seems to be every command and alias known.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 20:54:53 -0400] rev 39659
windows: open registry keys using unicode names
Python3 complained it must be str. While here, use a context manager to close
the key: it wouldn't wrap at 80 characters the old way, and would have had to
move anyway.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 00:39:02 -0400] rev 39658
py3: byteify strings in pycompat
These surfaced when disabling the source transformer to debug the problems in
win32.py. ./contrib/byteify-strings.py found a couple false positives, so I
marked them with r'' explicitly (in case I'm wrong).
# skip-blame since this is just b'' and r'' prefixing
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 30 Aug 2018 14:55:34 -0700] rev 39657
wireprotov2: let clients drive delta behavior
Previously, the "manifestdata" and "filedata" commands assumed the
receiver had all parent revisions for requested nodes. Unless the
revision had no parents, they emitted a delta instead of a fulltext.
This strategy isn't appropriate for shallow clones and for clients
that only want to access fulltext revision data for a single node
without fetching their parent revisions.
This commit adds an "haveparents" argument to the "manifestdata"
and "filedata" commands that controls delta generation behavior.
Unless "haveparents" is set, the server assumes that the client
doesn't have parent revisions unless they were previously sent
as part of the current group of revisions.
This change allows the fulltext revision data of any individual
revision to be obtained. This will facilitate shallow clones
and other data retrieval strategies that don't require all previous
revisions of an entity to be fetched.
Differential Revision: https://phab.mercurial-scm.org/D4492
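The rule described above (delta only against a parent the client is known to have) could be sketched like this. The function choosedeltabase and its arguments are hypothetical; the real server-side delta selection involves more considerations.

```python
def choosedeltabase(parents, haveparents, sent):
    """Return a parent node to delta against, or None for a fulltext.

    parents: tuple of parent nodes (empty for a root revision)
    haveparents: True if the client declared it has parent revisions
    sent: set of nodes already emitted in the current group
    """
    for parent in parents:
        # A parent is usable if the client claims to have it, or if we
        # sent it earlier in this same group of revisions.
        if haveparents or parent in sent:
            return parent
    return None


sent = set()
print(choosedeltabase((b'a',), False, sent))  # no usable parent: fulltext
sent.add(b'a')
print(choosedeltabase((b'a',), False, sent))  # parent was sent: delta
```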
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 04 Sep 2018 10:42:24 -0700] rev 39656
exchangev2: fetch file revisions
Now that the server has an API for fetching file data, we can call into
it to fetch file revisions.
The implementation is relatively straightforward: we examine the
manifests that we fetched and find all new file revisions referenced
by them. We build up a mapping from file path to file nodes to
manifest node. (The mapping to first manifest node allows us to
map back to first changelog node/revision, which is used for the
linkrev.)
Once that map is built up, we iterate over it in a deterministic
manner and fetch and store file data. The code is very similar
to manifest fetching. So similar that we could probably extract the
common bits into a generic function.
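A rough sketch of the mapping and the deterministic iteration described above. The input shape (a list of manifest node plus new file entries, in changelog order) and the function names are assumptions made for illustration.

```python
import collections


def buildfnodes(manifests):
    """Map path -> file node -> first manifest node referencing it.

    manifests: iterable of (manifestnode, {path: filenode}) pairs, in
    changelog order, where each dict holds the file revisions seen in
    that manifest (hypothetical input shape).
    """
    fnodes = collections.defaultdict(dict)
    for mnode, files in manifests:
        for path, fnode in files.items():
            # setdefault keeps the first manifest that introduced the node.
            fnodes[path].setdefault(fnode, mnode)
    return fnodes


def iterfnodes(fnodes):
    """Yield (path, filenode, manifestnode) in a deterministic order."""
    for path in sorted(fnodes):
        for fnode, mnode in sorted(fnodes[path].items()):
            yield path, fnode, mnode


seen = [(b'm1', {b'a': b'f1'}), (b'm2', {b'a': b'f1', b'b': b'f2'})]
for entry in iterfnodes(buildfnodes(seen)):
    print(entry)
```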
With file data retrieval implemented, `hg clone` and `hg pull` are
effectively feature complete, at least as far as the completeness
of data transfer for essential repository data (changesets, manifests,
files, phases, and bookmarks). We're still missing support for
obsolescence markers, the hgtags fnodes cache, and the branchmap
cache. But these are non-essential for the moment (and will be
implemented later).
This is a good point to assess the state of exchangev2 in terms of
performance. I ran a local `hg clone` for the mozilla-unified
repository using both version 1 and version 2 of the wire protocols
and exchange methods. This is effectively comparing the performance
of the wire protocol overhead and "getbundle" versus domain-specific
commands. Wire protocol version 2 doesn't have compression implemented
yet. So I tested version 1 with `server.compressionengines=none` to
remove compression overhead from the equation.
server
before: user 220.420+0.000 sys 14.420+0.000
after: user 321.980+0.000 sys 18.990+0.000
client
before: real 561.650 secs (user 497.670+0.000 sys 28.160+0.000)
after: real 1226.260 secs (user 944.240+0.000 sys 354.150+0.000)
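For reference, the slowdown factors implied by the timings above can be computed as follows. The numbers are copied from the measurements; the dictionary layout and the slowdown helper are just for illustration.

```python
# (before, after) in seconds, copied from the measurements above.
timings = {
    'server user': (220.420, 321.980),
    'server sys': (14.420, 18.990),
    'client real': (561.650, 1226.260),
    'client user': (497.670, 944.240),
    'client sys': (28.160, 354.150),
}


def slowdown(before, after):
    """Return how many times slower 'after' is than 'before'."""
    return after / before


for name, (before, after) in timings.items():
    print('%-12s %.2fx' % (name, slowdown(before, after)))
```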
We have substantial regressions on both client and server. This
is obviously not desirable. I'm aware of some reasons:
* Lack of hgtagsfnodes transfer (contributes significant CPU to
client).
* Lack of branch cache transfer (contributes significant CPU to
client).
* Little to no profiling / optimization performed on wire protocol
version 2 code.
* There appears to be a memory leak on the client and that is likely
causing swapping on my machine.
* Using multiple threads on the client may be counter-productive because
Python.
* We're not compressing on the server.
* We're tracking file nodes on the client via manifest diffing
rather than using linkrev shortcuts on the server.
I'm pretty confident that most of these issues are addressable.
But even if we can't get wire protocol version 2 on performance parity
with "getbundle," I still think it is important to have the set of low
level data-specific retrieval commands that we have implemented so
far. This is because the existence of such commands allows flexibility
in how clients access server data.
Differential Revision: https://phab.mercurial-scm.org/D4491
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:10:17 -0700] rev 39655
wireprotov2: define and implement "filedata" command
Continuing our trend of implementing *data commands for retrieving
information about specific repository data primitives, this commit
implements a command for retrieving data about an individual tracked
file.
The command is very similar to "manifestdata." The only significant
difference is that we have a standalone function for obtaining
storage for a tracked file. This is to provide a monkeypatch point
for extensions to implement path-based access control.
With this API available, wire protocol version 2 now exposes all
data primitives necessary to implement a full clone. Of course,
since "filedata" can only resolve data for a single path at a time,
clients would need to issue N commands to perform a full clone. On
the Firefox repository, this would be ~461k commands. We'll likely
need to implement a file data retrieval command that supports
multiple paths. But that can be implemented later.
Differential Revision: https://phab.mercurial-scm.org/D4490
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:09:57 -0700] rev 39654
exchangev2: fetch manifest revisions
Now that the server has support for retrieving manifest data, we can
implement the client bits to call it.
We teach the changeset fetching code to capture the manifest revisions
that are encountered on incoming changesets. We then feed this into a
new function which filters out known manifests and then batches up
manifest data requests to the server.
This is different from the previous wire protocol in a few notable
ways.
First, the client fetches manifest data separately and explicitly.
Before, we'd ask the server for data pertaining to some changesets
(via a "getbundle" command) and manifests (and files) would be sent
automatically. Providing an API for looking up just manifest data
separately gives clients much more flexibility for manifest management.
For example, a client may choose to only fetch manifest data on demand
instead of prefetching it (i.e. partial clone).
Second, we send N commands to the server for manifest retrieval instead
of 1. This property has a few nice side-effects. One is that the
deterministic nature of the requests lends itself to server-side
caching. For example, say the remote has 50,000 manifests. If the
server is configured to cache responses, each time a new commit
arrives, you will have a cache miss and need to regenerate all outgoing
data. But if you make N requests requesting 10,000 manifests each,
a new commit will still yield cache hits on the initial, unchanged
manifest batches/requests.
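The caching property can be illustrated with a small sketch: splitting an ordered node list into fixed-size batches keeps the early batches identical when a node is appended. The helper name batches is hypothetical and integers stand in for manifest nodes.

```python
def batches(nodes, size):
    """Split an ordered list of nodes into fixed-size batches."""
    for start in range(0, len(nodes), size):
        yield nodes[start:start + size]


old = list(batches(list(range(50000)), 10000))
new = list(batches(list(range(50001)), 10000))

# The first five requests are unchanged after one new manifest arrives,
# so a response cache keyed on the request would still hit for them.
print(new[:5] == old)
```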
A derived benefit from these properties is that resumable clone is
conceptually simpler to implement. When making a monolithic request
for all of the repository data, recovering from an interrupted clone
is hard because the server was in the driver's seat and was maintaining
state about all the data that needed to be transferred. With the client
driving fetching, the client can persist the set of unfetched entities
and retry/resume a fetch if something goes wrong. Or we can fetch all
data N changesets at a time and slowly build up a repository. This
approach is drastically easier to implement when we have server APIs
exposing low-level repository primitives (such as manifests and files).
We don't yet support tree manifests. But it should be possible to
implement that with the existing wire protocol command.
Differential Revision: https://phab.mercurial-scm.org/D4489
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:09:52 -0700] rev 39653
wireprotov2: define and implement "manifestdata" command
The added command can be used for obtaining manifest data.
Given a manifest path and set of manifest nodes, data about
manifests can be retrieved.
Unlike changeset data, we wish to emit deltas to describe
manifest revisions. So the command uses the relatively new
API for building delta requests and emitting them.
The code calls into deltaparent(), which I'm not very keen on.
There's still work to be done in delta generation land so
implementation details of storage (e.g. exactly one delta
is stored/available) don't creep into higher levels. But we
can worry about this later (there is already a TODO on
imanifeststorage tracking this).
On the subject of parent deltas, the server assumes parent revisions
exist on the receiving end. This is obviously wrong for shallow
clone. I've added TODOs to add a mechanism to the command to
allow clients to specify desired behavior. This shouldn't be
too difficult to implement.
Another big change is that the client must explicitly request
manifest nodes to retrieve. This is a major departure from
"getbundle," where the server derives relevant manifests as it
iterates changesets and sends them automatically. As implemented,
the client must transmit each requested node to the server. At
20 bytes per node, we're looking at 2 MB per 100,000 nodes. Plus
wire encoding overhead. This isn't ideal for clients with limited
upload bandwidth. I plan to address this in the future by allowing
alternate mechanisms for defining the revisions to retrieve. One
idea is to define a range of changeset revisions whose manifest
revisions to retrieve (similar to how "changesetdata" works).
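The upload estimate above is simple arithmetic. As a sketch (the helper name is hypothetical, and wire encoding overhead is ignored):

```python
NODE_SIZE = 20  # bytes per binary node, as stated above


def requestpayload(nodecount, nodesize=NODE_SIZE):
    """Raw bytes needed to list nodecount nodes, before wire encoding."""
    return nodecount * nodesize


print(requestpayload(100000))  # 2,000,000 bytes, i.e. about 2 MB
```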
We almost certainly want an API to look up an individual manifest
by node. And that's where I've chosen to start with the implementation.
Again, a theme of this early exchangev2 work is I want to start by
building primitives for accessing raw repository data first and see
how far we can get with those before we need more complexity.
Differential Revision: https://phab.mercurial-scm.org/D4488
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 22 Aug 2018 14:51:11 -0700] rev 39652
wireprotov2: add TODOs around extending changesetdata fields
Extensions will inevitably want to extend the set of changeset
data/fields that can be requested. We'll need to implement support
for extending this in the future. Add some TODOs to track that.
Differential Revision: https://phab.mercurial-scm.org/D4487
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 29 Aug 2018 17:03:19 -0700] rev 39651
exchangev2: fetch and apply bookmarks
This is pretty similar to phases data. We collect bookmarks data
as we process records. Then at the end we make a call to the
bookmarks subsystem to reflect the remote's bookmarks.
Like phases, the code for handling bookmarks is vastly simpler
than the previous wire protocol code because the server always
transfers the full set of bookmarks when bookmarks are requested.
We don't have to keep track of whether we requested bookmarks or
not.
Differential Revision: https://phab.mercurial-scm.org/D4486
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 23 Aug 2018 18:14:19 -0700] rev 39650
wireprotov2: add bookmarks to "changesetdata" command
Like we did for phases, we want to emit bookmarks data attached
to each changeset.
The approach here is very similar to phases: we emit bookmarks
data inline with requested revision data. But we emit
records for nodes that weren't requested as well so consumers have
access to the full set of defined bookmarks.
Differential Revision: https://phab.mercurial-scm.org/D4485
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 10:01:58 -0700] rev 39649
exchangev2: fetch and apply phases data
Now that the server supports emitting phases data, we can request it
and apply it on the client.
Because we may receive phases-only updates from the server, we no
longer conditionally perform the "changesetdata" command depending
on whether there are revisions to fetch. In the previous wire
protocol, this case would result in us falling back to performing
"listkeys" commands to look up phases, bookmarks, etc data. But
since "changesetdata" is smart enough to handle metadata only
fetches, we can keep things consistent.
It's worth noting that because of the unified approach to changeset
data retrieval, phase handling code in wire proto v2 exchange is
drastically simpler. Contrast with all the code in exchange.py
dealing with all the variations for obtaining phases data.
Differential Revision: https://phab.mercurial-scm.org/D4484
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 28 Aug 2018 18:19:23 -0700] rev 39648
wireprotov2: add phases to "changesetdata" command
This commit teaches the "changesetdata" wire protocol command
to emit the phase state for each changeset.
This is a different approach from existing phase transfer in a
few ways. Previously, if there are no new revisions (or we're
not using bundle2), we perform a "listkeys" request to retrieve
phase heads. And when revision data is being transferred
with bundle2, phases data is encoded in a standalone bundle2 part.
In both cases, phases data is logically decoupled from the changeset
data and is encountered/applied after changeset revision data
is received.
The new wire protocol purposefully tries to more tightly associate
changeset metadata (phases, bookmarks, obsolescence markers, etc)
with the changeset revision and index data itself, rather than
have it live as a separate entity that must be fetched and
processed separately. I reckon that one reason we didn't do this
before was it was difficult to add new data types/fields without
breaking existing consumers. By using CBOR maps to transfer
changeset data and putting clients in control of what fields are
requested / present in those maps, we can easily add additional
changeset data while maintaining backwards compatibility. I believe
this to be a superior approach to the problem.
That being said, for performance reasons, we may need to resort
to alternative mechanisms for transferring data like phases. But
for now, I think giving the wire protocol the ability to transfer
changeset metadata next to the changeset itself is a powerful feature
because it is a raw, changeset-centric data API. And if you build
simple APIs for accessing the fundamental units of repository data,
you enable client-side experimentation (partial clone, etc). If it
turns out that we need specialized APIs or mechanisms for transferring
data like phases, we can build in those APIs later. For now, I'd
like to see how far we can get on simple APIs.
It's worth noting that when phase data is being requested, the
server will also emit changeset records for nodes in the bases
specified by the "noderange" argument. This is to ensure that
phase-only updates for nodes the client has are available to the
client, even if no new changesets will be transferred.
Differential Revision: https://phab.mercurial-scm.org/D4483
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 10:01:36 -0700] rev 39647
exchangev2: fetch changeset revisions
All Mercurial repository data is derived from changesets:
you can't do anything unless you have changesets. Therefore,
it makes sense for changesets to be the first piece of data
that we transfer as part of pull.
To do this, we call our new "changesetdata" command, requesting
parents and revision data. This gives us all the data that a
changegroup delta group would give us. We simply normalize
this data into what addgroup() expects and call that API on
the changelog to bulk insert revisions into the changelog.
Code in this commit is heavily borrowed from
changegroup.cg1unpacker.apply().
Differential Revision: https://phab.mercurial-scm.org/D4482
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 10:01:16 -0700] rev 39646
wireprotov2: define and implement "changesetdata" command
This commit introduces the "changesetdata" wire protocol command.
The role of the command is to expose data associated with changelog
revisions, including the raw revision data itself.
This command is the first piece of a new clone/pull strategy that
is built on top of domain-specific commands for data retrieval.
Instead of a monolithic "getbundle" command that transfers all of the
things, we'll be introducing commands for fetching specific pieces
of data.
Since the changeset is the fundamental unit from which we derive
pointers to other data (manifests, file nodes, etc), it makes sense
to start reimplementing pull with this data.
The command accepts as arguments a set of root and head revisions
defining the changesets that should be fetched as well as an explicit
list of nodes. By default, the command returns only the node values:
the client must explicitly request additional fields be added to the
response. Current supported fields are the list of parent nodes and
the revision fulltext.
My plan is to eventually add support for transferring other data
associated with changesets, including phases, bookmarks, obsolescence
markers, etc. Since the response format is CBOR, we'll be able to add
this data into the response object relatively easily (it should be
as simple as adding a key in a map).
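To illustrate why a map-based response is easy to extend, here is a sketch using plain Python dicts in place of CBOR maps. The function changesetrecord and the exact key names are assumptions for illustration and may not match the wire format.

```python
def changesetrecord(node, parents, revision, fields):
    """Build a per-changeset map holding only the requested fields."""
    record = {b'node': node}  # the node is always present
    if b'parents' in fields:
        record[b'parents'] = parents
    if b'revision' in fields:
        record[b'revision'] = revision
    # Supporting a new field later only means adding another key here;
    # clients that never request it see no change.
    return record


print(changesetrecord(b'n1', [b'p1'], b'fulltext', {b'parents'}))
```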
The documentation captures a number of TODO items. Some of these may
require BC breaking changes. That's fine: wire protocol v2 is still
highly experimental.
Differential Revision: https://phab.mercurial-scm.org/D4481
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 09:58:23 -0700] rev 39645
exchangev2: start to implement pull with wire protocol v2
Wire protocol version 2 will take a substantially different
approach to exchange than version 1 (at least as far as pulling
is concerned).
This commit establishes a new exchangev2 module for holding
code related to exchange using wire protocol v2. I could have
added things to the existing exchange module. But it is already
quite big. And doing things inline isn't in question because
the existing code is already littered with conditional code
for various states of support for the existing wire protocol
as it evolved over 10+ years. A new module gives us a chance
to make a clean break.
This approach does mean we'll end up writing some duplicate
code. And there's a significant chance we'll miss functionality
as code is ported. The plan is to eventually add #testcase's
to existing tests so the new wire protocol is tested side-by-side
with the existing one. This will hopefully tease out any
features that weren't ported properly. But before we get there,
we need to build up support for the new exchange methods.
Our journey towards implementing a new exchange begins with pulling.
And pulling begins with discovery.
The discovery code added to exchangev2 is heavily drawn from
the following functions:
* exchange._pulldiscoverychangegroup
* discovery.findcommonincoming
For now, we build on top of existing discovery mechanisms. The
new wire protocol should be capable of doing things more efficiently.
But I'd rather defer on this problem.
To foster the transition, we invent a fake capability on the HTTPv2
peer and have the main pull code in exchange.py call into exchangev2
when the new wire protocol is being used.
Differential Revision: https://phab.mercurial-scm.org/D4480
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 21 Aug 2018 15:33:11 -0700] rev 39644
httppeer: expose capabilities for each command
This will help code using peers to sniff out exactly what servers
support.
Differential Revision: https://phab.mercurial-scm.org/D4436
spectral <spectral@google.com> [Thu, 13 Sep 2018 22:48:27 -0700] rev 39643
narrow: intersect provided matcher with narrowmatcher in `hg diff`
This provides significant speedups when running diff, and no change in behavior
that I'm aware of (or that the tests found). I tested with a repo that I started
using narrow in after it was created and attempted to run `hg diff -c .` and
similar commands in it on a commit that had files not in the narrowspec.
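Conceptually, intersecting two matchers means a path is considered only if both accept it. The sketch below models matchers as plain predicates; Mercurial's matchers are objects with a richer interface, and the names intersectmatchers and prefixmatcher are hypothetical.

```python
def prefixmatcher(prefixes):
    """Match paths under any of the given directory prefixes."""
    prefixes = tuple(prefixes)
    return lambda path: path.startswith(prefixes)


def intersectmatchers(m1, m2):
    """Match only paths accepted by both matchers."""
    return lambda path: m1(path) and m2(path)


narrow = prefixmatcher(['src/'])        # stands in for the narrowspec
user = lambda path: path.endswith('.py')  # stands in for the user's matcher
match = intersectmatchers(user, narrow)

print(match('src/a.py'), match('docs/a.py'))
```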
Timing numbers below, using a similar setup as my previous commits.
before=9db85644, m-u is mozilla-unified at eb39298e432d (flatmanifest) and
0553b7f29eaf (treemanifest). l-d-r is a repo simulating a situation I've
encountered where there's one directory with 30k+ subdirectories. N means
narrow, T means treemanifest. The narrowspec is pretty small when in use, and
importantly the narrowspec is applied *after* doing the initial checkout
(without narrowing), so all of these files exist in the filesystem, which is not
normally the case if someone has been using narrow for the entire life of the
clone.
Anything less than a 5% difference in performance is most likely noise.
diff --git:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 1.292 s +- 0.009 s | 1.295 s +- 0.010 s | 100.2%
m-u | | x | 1.296 s +- 0.042 s | 1.299 s +- 0.026 s | 100.2%
m-u | x | | 1.292 s +- 0.010 s | 1.297 s +- 0.021 s | 100.4%
m-u | x | x | 84.2 ms +- 1.2 ms | 83.6 ms +- 0.2 ms | 99.3%
l-d-r | | | 188.7 ms +- 2.7 ms | 188.8 ms +- 2.0 ms | 100.1%
l-d-r | | x | 189.9 ms +- 1.5 ms | 189.4 ms +- 1.2 ms | 99.7%
l-d-r | x | | 97.1 ms +- 1.0 ms | 87.1 ms +- 1.0 ms | 89.7% <--
l-d-r | x | x | 96.9 ms +- 0.8 ms | 87.2 ms +- 0.7 ms | 90.0% <--
diff -c . --git:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 231.6 ms +- 3.1 ms | 228.9 ms +- 1.6 ms | 98.8%
m-u | | x | 150.5 ms +- 1.7 ms | 150.7 ms +- 1.4 ms | 100.1%
m-u | x | | 233.7 ms +- 2.4 ms | 232.2 ms +- 1.9 ms | 99.4%
m-u | x | x | 126.1 ms +- 1.2 ms | 126.8 ms +- 1.2 ms | 100.6%
l-d-r | | | 82.1 ms +- 2.0 ms | 81.8 ms +- 1.4 ms | 99.6%
l-d-r | | x | 3.732 s +- 0.020 s | 3.746 s +- 0.027 s | 100.4%
l-d-r | x | | 83.1 ms +- 0.8 ms | 107.6 ms +- 2.4 ms | 129.5% <--
l-d-r | x | x | 758.2 ms +- 38.8 ms | 188.5 ms +- 1.8 ms | 24.9% <--
rebase -r . --keep -d .^^:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 5.532 s +- 0.087 s | 5.496 s +- 0.016 s | 99.3%
m-u | | x | 5.554 s +- 0.061 s | 5.532 s +- 0.013 s | 99.6%
m-u | x | | 5.602 s +- 0.134 s | 5.508 s +- 0.035 s | 98.3%
m-u | x | x | 582.2 ms +- 15.2 ms | 572.9 ms +- 12.0 ms | 98.4%
l-d-r | | | 629.5 ms +- 12.3 ms | 622.5 ms +- 7.3 ms | 98.9%
l-d-r | | x | 6.173 s +- 0.062 s | 6.185 s +- 0.076 s | 100.2%
l-d-r | x | | 274.5 ms +- 10.0 ms | 272.1 ms +- 6.2 ms | 99.1%
l-d-r | x | x | 4.835 s +- 0.056 s | 4.826 s +- 0.034 s | 99.8%
status --change . --copies:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 214.4 ms +- 1.4 ms | 212.2 ms +- 1.7 ms | 99.0%
m-u | | x | 130.9 ms +- 1.2 ms | 131.7 ms +- 1.1 ms | 100.6%
m-u | x | | 215.0 ms +- 2.1 ms | 214.9 ms +- 2.7 ms | 100.0%
m-u | x | x | 109.5 ms +- 2.3 ms | 107.8 ms +- 0.9 ms | 98.4%
l-d-r | | | 79.6 ms +- 0.9 ms | 79.8 ms +- 1.6 ms | 100.3%
l-d-r | | x | 3.799 s +- 0.037 s | 3.928 s +- 0.021 s | 103.4% <--?
l-d-r | x | | 82.7 ms +- 0.7 ms | 83.2 ms +- 1.0 ms | 100.6%
l-d-r | x | x | 746.8 ms +- 6.1 ms | 739.0 ms +- 4.2 ms | 99.0%
status --copies:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 1.884 s +- 0.012 s | 1.885 s +- 0.013 s | 100.1%
m-u | | x | 1.897 s +- 0.027 s | 1.909 s +- 0.077 s | 100.6%
m-u | x | | 1.886 s +- 0.021 s | 1.891 s +- 0.030 s | 100.3%
m-u | x | x | 92.0 ms +- 0.7 ms | 92.4 ms +- 0.4 ms | 100.4%
l-d-r | | | 570.3 ms +- 18.7 ms | 552.2 ms +- 4.5 ms | 96.8%
l-d-r | | x | 568.9 ms +- 16.1 ms | 567.2 ms +- 11.9 ms | 99.7%
l-d-r | x | | 171.1 ms +- 2.5 ms | 170.4 ms +- 1.2 ms | 99.6%
l-d-r | x | x | 171.6 ms +- 3.4 ms | 171.5 ms +- 1.7 ms | 99.9%
update $rev^; ~/src/hg/hg{hg}/hg update $rev:
repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u | | | 3.107 s +- 0.017 s | 3.116 s +- 0.012 s | 100.3%
m-u | | x | 2.943 s +- 0.010 s | 2.945 s +- 0.019 s | 100.1%
m-u | x | | 3.116 s +- 0.033 s | 3.118 s +- 0.027 s | 100.1%
m-u | x | x | 318.5 ms +- 2.7 ms | 320.8 ms +- 4.8 ms | 100.7%
l-d-r | | | 428.9 ms +- 4.4 ms | 429.5 ms +- 4.0 ms | 100.1%
l-d-r | | x | 9.593 s +- 0.081 s | 9.869 s +- 0.043 s | 102.9%
l-d-r | x | | 253.2 ms +- 3.6 ms | 254.0 ms +- 2.8 ms | 100.3%
l-d-r | x | x | 1.613 s +- 0.009 s | 1.630 s +- 0.017 s | 101.1%
Differential Revision: https://phab.mercurial-scm.org/D4587
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Sep 2018 12:15:02 +0900] rev 39642
identify: change {parents} to a list of nodes (BC)
This is a part of the name unification. {parents} is a list of nodes in
"hg log -Tjson" output. Since {rev} can be computed from a (repo, node) pair,
we no longer need to store it in order to provide {rev} to user templates.
https://www.mercurial-scm.org/wiki/GenericTemplatingPlan#Dictionary
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Sep 2018 12:09:22 +0900] rev 39641
identify: use fm.hexfunc thoroughly
This fixes the length of {id} in JSON and template outputs.
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Sep 2018 15:52:18 +0900] rev 39640
formatter: replace contexthint() with demand loading of ctx object
And pass in repo instead to resolve ctx from (repo, node) pair.
Yuya Nishihara <yuya@tcha.org> [Thu, 07 Jun 2018 21:48:11 +0900] rev 39639
formatter: populate ctx from repo and node value
This will basically replace the fm.contexthint() API. I originally thought
this would be too complicated, and I wrote 8399438bc7ef "formatter: provide
hint of context keys required by template" because of that. However, I had
to add a similar mechanism for fctx templates, and the overall machinery
became way simpler than my original patch.
The test output slightly changed as {author} is no longer available in
the {manifest} context, which isn't what this test was targeting.
Augie Fackler <augie@google.com> [Fri, 14 Sep 2018 18:18:46 -0400] rev 39638
merge with stable
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:37:20 +0300] rev 39637
py3: call hgweb.hgweb() with bytes values
# skip-blame because just b'' prefixes
I believe this should fix some tests.
Differential Revision: https://phab.mercurial-scm.org/D4594
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:24:05 +0300] rev 39636
py3: use '%d' for integers instead of '%s'
Differential Revision: https://phab.mercurial-scm.org/D4593
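For context, a small demonstration of why '%d' is needed: on Python 3, bytes formatting with '%s' rejects an int, while '%d' works. The message text is made up for the example.

```python
count = 3

# '%d' formats an int into bytes on Python 3.5 and later.
msg = b'%d files updated' % count

# '%s' in a bytes format string requires a bytes-like object.
try:
    b'%s files updated' % count
except TypeError as exc:
    error = exc
else:
    error = None

print(msg, type(error).__name__)
```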
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:17:56 +0300] rev 39635
py3: use "%f" for floats instead of "%s"
Differential Revision: https://phab.mercurial-scm.org/D4592
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:01:52 +0300] rev 39634
py3: suppress the return value from .write() call
Differential Revision: https://phab.mercurial-scm.org/D4591
Pulkit Goyal <pulkit@yandex-team.ru> [Sat, 15 Sep 2018 00:01:20 +0300] rev 39633
py3: add b'' prefixes in tests/test-diff-color.t
# skip-blame because just b'' prefixes
Differential Revision: https://phab.mercurial-scm.org/D4590
Pulkit Goyal <pulkit@yandex-team.ru> [Fri, 14 Sep 2018 23:59:41 +0300] rev 39632
py3: slice through bytes to prevent getting ascii value
I still don't know why python-dev thought it was a nice idea to do this.
Differential Revision: https://phab.mercurial-scm.org/D4589
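A short illustration of the behavior being worked around: on Python 3, indexing bytes yields an int, while slicing yields bytes. The variable names are only for the example.

```python
data = b'abc'

first_int = data[0]      # indexing gives the integer byte value
first_bytes = data[0:1]  # slicing gives a length-1 bytes object

print(first_int, first_bytes)
```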
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Thu, 13 Sep 2018 16:22:53 -0400] rev 39631
censor: use a reasonable amount of memory
Before this change, trying to censor some random revision uses an ever
increasing amount of memory (I stopped at 20GB, but it was by no means
finished), presumably because these contexts have a lot of
information that is kept alive.
After this change, the memory usage plateaus quickly.
Differential Revision: https://phab.mercurial-scm.org/D4582
Yuya Nishihara <yuya@tcha.org> [Fri, 14 Sep 2018 22:25:44 +0900] rev 39630
help: add internals.wireprotocolrpc to the table
Yuya Nishihara <yuya@tcha.org> [Fri, 14 Sep 2018 22:23:02 +0900] rev 39629
setup: exclude vendored futures package on Python 3
The vendored futures package can't be used on Python 3.
Augie Fackler <augie@google.com> [Thu, 13 Sep 2018 11:08:08 -0400] rev 39628
py3: whitelist another passing test
Differential Revision: https://phab.mercurial-scm.org/D4562
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 00:42:25 -0400] rev 39627
py3: prevent the win32 ctype _fields_ from being transformed to bytes
Otherwise, any hg invocation dies with
TypeError: '_fields_' must be a sequence of (name, C type) pairs
# skip-blame just a r prefix
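A hedged illustration of the constraint behind this fix: ctypes expects the
names in _fields_ to be native strings. The Point structure below is
hypothetical and unrelated to the real win32.py declarations; the r prefix
mirrors the commit's approach of keeping the literal a native str.

```python
import ctypes


class Point(ctypes.Structure):
    # Field names are native str; the r'' prefix stands in for the
    # prefix the commit adds so the literal is not turned into bytes.
    _fields_ = [
        (r'x', ctypes.c_int),
        (r'y', ctypes.c_int),
    ]


p = Point(1, 2)
```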
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 17:32:20 -0400] rev 39626
cext: fix warnings when building for py3 on Windows
MSVC++ 14 now has standard int types that don't need to be redefined (I didn't
go back to see when they came along since the build system wants either 2008 or
2015), but doesn't have ssize_t. The FILE pointer in posixfile is only used on
python2.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 12:43:50 -0400] rev 39625
cext: stop preprocessing a partial function call
MSVC++ 14 yelled:
mercurial/cext/revlog.c(1913): fatal error C1057: unexpected end of file in
macro expansion
At this point, the C extensions build (with warnings), and it dies in win32.py
because the `_fields_` strings in the ctypes classes are being converted to
bytes by the source translator.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 13 Sep 2018 12:37:32 -0400] rev 39624
py3: add b'' to some setup.py strings for Windows
These were things found trying to do `make PYTHON="py -3" local`. The following
is dumped out, before dying while compiling the C extensions:
C:\Program Files\Python37\lib\site-packages\setuptools\dist.py:406: UserWarning:
The version specified (b'4.7.1') is an invalid version, this may not work as
expected with newer versions of setuptools, pip, and PyPI. Please see PEP 440
for more details.
"details." % self.metadata.version
running build_py
byte-compiling .\mercurial\thirdparty\concurrent\futures\_base.py to _base.cpython-37.pyc
File "mercurial\thirdparty\concurrent\futures\_base.py", line 416
raise exception_type, self._exception, self._traceback
^
SyntaxError: invalid syntax
# skip-blame since these are just converting to bytes literals
Augie Fackler <augie@google.com> [Thu, 13 Sep 2018 18:09:22 -0400] rev 39623
dagop: fix typo spotted while doing unrelated investigation
Differential Revision: https://phab.mercurial-scm.org/D4584
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 12 Sep 2018 19:00:46 -0700] rev 39622
hg: don't reuse repo instance after unshare()
Unsharing a repository is a pretty invasive procedure and fundamentally
changes the behavior of the repository.
Currently, hg.unshare() calls into localrepository.__init__ to
re-initialize the repository instance. This is a bit hacky. And
future commits that refactor how localrepository instances are
constructed will make this difficult to support.
This commit changes unshare() so it constructs a new repo instance
once the unshare I/O has completed. It then poisons the old repo
instance so any further use will result in error.
Surprisingly, nothing in core appears to access a repo instance
after it has been unshared!
.. api::
``hg.unshare()`` now poisons the repo instance so it can't be used.
It also returns a new repo instance suitable for interacting with
the unshared repository.
Differential Revision: https://phab.mercurial-scm.org/D4557
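One way to picture "poisoning" an instance is to swap its class for one that
raises on every attribute access. This is only a sketch under that assumption;
Repo, PoisonedError, and unshare_sketch are hypothetical names, not the
Mercurial API.

```python
class PoisonedError(Exception):
    """Raised when a poisoned instance is used."""


class Poisoned(object):
    def __getattribute__(self, name):
        raise PoisonedError('instance was unshared; use the new one')


class Repo(object):
    """Hypothetical stand-in for a repository object."""

    def __init__(self, path):
        self.path = path


def unshare_sketch(repo):
    # Build a fresh instance first, then poison the old one.
    newrepo = Repo(repo.path)
    object.__setattr__(repo, '__class__', Poisoned)
    return newrepo
```

Callers would then keep using the returned instance, since any access to the
old one raises.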
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 20:06:39 -0700] rev 39621
unionrepo: dynamically create repository type from base repository
This is basically the same thing we just did for bundlerepo except
for union repositories.
.. api::
``unionrepo.unionrepository()`` is no longer usable on its own.
To instantiate an instance, call ``unionrepo.instance()`` or
``unionrepo.makeunionrepository()``.
Differential Revision: https://phab.mercurial-scm.org/D4556
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 19:50:07 -0700] rev 39620
bundlerepo: dynamically create repository type from base repository
Previously, bundlerepository inherited from localrepo.localrepository.
You simply instantiated a bundlerepository and its __init__ called
localrepo.localrepository.__init__. Things were simple.
Unfortunately, this strategy is limiting because it assumes that
the base repository is a localrepository instance. And it assumes
various properties of localrepository, such as the arguments its
__init__ takes. And it prevents us from changing behavior of
localrepository.__init__ without also having to change derived classes.
Previous and ongoing work to abstract storage revealed these
limitations.
This commit changes the initialization strategy of bundle repositories
to dynamically create a type to represent the repository. Instead of
a static type, we instantiate a new local repo instance via
localrepo.instance(). We then combine its __class__ with
bundlerepository to produce a new type. This ensures that no matter
how localrepo.instance() decides to create a repository object, we
can derive a bundle repo object from it. i.e. localrepo.instance()
could return a type that isn't a localrepository and it would "just
work."
Well, it would "just work" if bundlerepository's custom implementations
only accessed attributes in the documented repository interface. I'm
pretty sure it violates the interface contract in a handful of
places. But we can worry about that another day. This change gets us
closer to doing more clever things around instantiating repository
instances without having to worry about teaching bundlerepository about
them.
.. api::
``bundlerepo.bundlerepository`` is no longer usable on its own.
The class is combined with the class of the base repository it is
associated with at run-time.
New bundlerepository instances can be obtained by calling
``bundlerepo.instance()`` or ``bundlerepo.makebundlerepository()``.
Differential Revision: https://phab.mercurial-scm.org/D4555
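The class-combining idea described above can be sketched in a few lines. This
is an illustration only: BaseRepo, BundleOverlay, and make_bundle_repo are
hypothetical names, and the real code has far more to take care of.

```python
class BaseRepo(object):
    """Hypothetical stand-in for whatever the base factory returns."""

    def __init__(self, path):
        self.path = path

    def kind(self):
        return 'local'


class BundleOverlay(object):
    """Hypothetical overlay holding only the bundle-specific behavior."""

    def kind(self):
        return 'bundle over ' + super().kind()


def make_bundle_repo(baserepo, bundlepath):
    # Derive a new type from the overlay and the *runtime* class of the
    # base instance, so no specific base class is hard-coded.
    cls = type('derivedbundlerepo', (BundleOverlay, baserepo.__class__), {})
    baserepo.__class__ = cls
    baserepo.bundlepath = bundlepath
    return baserepo
```

Because the overlay comes first in the base list, its methods win in the
method resolution order and can still delegate to the base class via super().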
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 19:16:32 -0700] rev 39619
bundlerepo: factor out code for instantiating a bundle repository
This code will soon become a bit more complicated. So extract to its
own function.
And change both instantiators of bundlerepository to use it.
Differential Revision: https://phab.mercurial-scm.org/D4554
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 18:45:05 -0700] rev 39618
bundlerepo: pass create=True
I don't want to know how this came to be. Maybe a holdover from the
days before Python had a bool type?
Differential Revision: https://phab.mercurial-scm.org/D4553
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 18:41:14 -0700] rev 39617
shelve: use bundlerepo.instance() to construct a repo object
The instance() functions are preferred over cls.__init__ for
creating repo instances. It doesn't really matter now. But future
commits will refactor the bundlerepository class in ways that will
cause the old way to break.
Differential Revision: https://phab.mercurial-scm.org/D4552
Yuya Nishihara <yuya@tcha.org> [Sun, 29 Jul 2018 22:04:01 +0900] rev 39616
templatekw: add experimental {status} keyword
This is another example of fctx-based keywords. I think this is somewhat
useful in log templates.
Yuya Nishihara <yuya@tcha.org> [Sun, 29 Jul 2018 21:52:01 +0900] rev 39615
templatekw: add option to include ignored/clean/unknown files in cache
They will be necessary to provide {status} of files.
Yuya Nishihara <yuya@tcha.org> [Sun, 29 Jul 2018 22:07:42 +0900] rev 39614
templatekw: keep status tuple in cache dict and rename cache key accordingly
There's no point in dropping tail elements, which are mostly empty lists.
Yuya Nishihara <yuya@tcha.org> [Sun, 29 Jul 2018 21:39:12 +0900] rev 39613
templatekw: extract function that computes and caches file status
Yuya Nishihara <yuya@tcha.org> [Thu, 13 Sep 2018 22:32:51 +0900] rev 39612
py3: use sysstr() to convert ProgrammingError bytes with no unicode error risk
msg.decode('utf8') may fail if msg isn't an ASCII string, and that's possible
as we sometimes embed a filename in the error message for example.
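A short illustration of the failure mode. Latin-1 is used here as an example
of a decoding that cannot fail for any byte string; that this matches what
sysstr() does internally is an assumption, and the message content is made up.

```python
# A message embedding a Latin-1 encoded filename; not valid UTF-8.
msg = b'cannot open caf\xe9.txt'


def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode('utf8')
    except UnicodeDecodeError:
        return False
    return True


# Latin-1 maps every byte to a code point, so this never raises.
as_text = msg.decode('latin-1')
```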
Boris Feld <boris.feld@octobus.net> [Mon, 10 Sep 2018 08:31:41 +0200] rev 39611
revlog: reuse cached delta for identical base revision (issue5975)
Since 8f83a953dddf, we skip over empty deltas when choosing a delta base. Such
deltas happen when two distinct revisions have the same content.
The remote might be sending a delta against such a revision within the bundle.
In that case, the delta base is no longer considered, but the cached delta
could still be used with the equivalent revision.
Not reusing the delta from the bundle can have a significant performance
impact, so we now make sure to reuse it when possible.
Boris Feld <boris.feld@octobus.net> [Mon, 10 Sep 2018 10:11:21 +0200] rev 39610
snapshot: fix line order when skipping over empty deltas
The code movement in 37957e07138c introduced an error.
Since 8f83a953dddf, we discarded some revisions because they are identical to
their delta base (and use that delta base instead). That logic is good,
however, in 37957e07138c we mixed up the order of two lines, adding the "new"
revision to the set of already tested ones, instead of the discarded one. So in
practice, we were never investigating any revisions in a chain starting with
an empty delta. This created significantly worse delta chains (e.g. Mercurial's
manifest goes from about 60MB up to about 80MB).
Matt Harbison <matt_harbison@yahoo.com> [Wed, 12 Sep 2018 23:10:59 -0400] rev 39609
tests: stabilize change for handling not quoting non-empty-directory
The change originated in cb1329738d64. I suspect the problem is with the
combination of (re) and the '\' to '/' retry on Windows. I've no idea if py3 on
Windows needs the quoting, since it can't even run `hg` with no arguments.
(It's dying somewhere on the ctype declarations when win32.py is imported.)
Augie Fackler <augie@google.com> [Tue, 21 Aug 2018 15:25:46 -0400] rev 39608
hg: wrap the highest layer in the `hg` script possible in trace event
This should help us have a better idea of what "interpreter startup
costs" look like. This does omit the HGUNICODEPEDANTRY block and the
LIBDIR dancing to set up sys.path, but the former is usually off and
the latter is unavoidable and should be very fast. If we get worried
about those cases we can consider open-coding the tracing logic here.
Differential Revision: https://phab.mercurial-scm.org/D4346
Martin von Zweigbergk <martinvonz@google.com> [Wed, 12 Sep 2018 12:01:32 -0700] rev 39607
localrepo: use urllocalpath() for path to create repo too
It looks like this was lost in 7ce9dea3a14a (localrepo: move repo
creation logic out of localrepository.__init__ (API), 2018-09-11). I
don't know when it makes a difference (maybe on Windows, since
urllocalpath() mentions something about drive letters).
Differential Revision: https://phab.mercurial-scm.org/D4550
Martin von Zweigbergk <martinvonz@google.com> [Wed, 12 Sep 2018 08:41:00 -0700] rev 39606
localrepo: move check for existing repo into createrepository()
For symmetry with the check for existence of a repo in
localrepository.__init__, we should check for the non-existence in
createrepository(). We could alternatively move both checks into
instance().
Differential Revision: https://phab.mercurial-scm.org/D4549
Matt Harbison <matt_harbison@yahoo.com> [Wed, 12 Sep 2018 21:32:08 -0400] rev 39605
py3: add b'' to some run-tests.py strings for Windows
Things go seriously off the rails after this, so there may be more that are
missing.
# skip-blame since these are just converting to bytes literals
Augie Fackler <raf@durin42.com> [Wed, 12 Sep 2018 19:14:28 -0400] rev 39604
wireprotov1peer: forward __name__ of wrapped method in batchable decorator
Not required, but clarifies debugging when the going gets really tough.
Differential Revision: https://phab.mercurial-scm.org/D4551
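The general technique, shown on a toy decorator (batchable_sketch and lookup
are hypothetical names, not the wireprotov1peer code): without copying
__name__, every decorated method shows up under the wrapper's name in
tracebacks and debugging output.

```python
def batchable_sketch(f):
    """Toy decorator that forwards the wrapped function's name."""

    def plain(*args, **kwargs):
        return f(*args, **kwargs)

    # Without this line, plain.__name__ would be 'plain' for every method.
    plain.__name__ = f.__name__
    return plain


@batchable_sketch
def lookup(key):
    return key.upper()
```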
Yuya Nishihara <yuya@tcha.org> [Sun, 29 Jul 2018 21:28:51 +0900] rev 39603
templatekw: add {size} keyword as an example of fctx-based keyword
I'll add {status}, and I think some lfs keywords can be migrated to this.
I'm not certain how many fctx-based keywords will be introduced into the
global space, but if there are a couple more, we'll probably need to sort
them out to the "File Keywords" section in the templater help. Until then,
fctx keywords are hidden as experimental.
Yuya Nishihara <yuya@tcha.org> [Sun, 29 Jul 2018 21:25:37 +0900] rev 39602
formatter: populate fctx from ctx and path value
Tests will be added by the next patch.
Yuya Nishihara <yuya@tcha.org> [Thu, 07 Jun 2018 21:36:13 +0900] rev 39601
formatter: factor out function that detects node change and document it
This prepares for demand loading of ctx/fctx objects. With this change,
'revcache' is also recreated if 'node' value changes, which will be needed
to support loading of ctx from (repo, node) pair.
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Sep 2018 15:06:05 +0900] rev 39600
formatter: inline _gettermap and _knownkeys
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Sep 2018 13:21:45 +0900] rev 39599
formatter: fill missing resources by formatter, not by resource mapper
While working on demand loading of ctx/fctx objects, I found it's weird
to support lookup in both directions. For instance, fctx can be loaded
from (ctx, path) pair, but ctx may also be derived from fctx.changectx()
in the original mapping. If the original mapping has had fctx but no ctx,
and if the new mapping provides {path}, we can't be sure if fctx should be
updated by fctx'.changectx()[path] or not.
This patch simply drops the support for the resolution in fctx -> ctx -> repo
direction.
Yuya Nishihara <yuya@tcha.org> [Thu, 07 Jun 2018 23:27:54 +0900] rev 39598
templater: remove unused context argument from most resourcemapper functions
While working on demand loading of ctx/fctx objects, I noticed that it's quite
easy to create infinite recursion by carelessly using the template context in
the resource mapper. Let's make that not happen.
Yuya Nishihara <yuya@tcha.org> [Mon, 10 Sep 2018 20:57:18 +0900] rev 39597
ancestor: remove extra generator from lazyancestors.__iter__()
Martin von Zweigbergk <martinvonz@google.com> [Wed, 12 Sep 2018 11:24:51 -0700] rev 39596
localrepo: fix a mismatched arg name in createrepository() docstring
Differential Revision: https://phab.mercurial-scm.org/D4548
Augie Fackler <augie@google.com> [Wed, 12 Sep 2018 11:37:34 -0400] rev 39595
error: ensure ProgrammingError message is always a str
Since this error is internal-only and a runtime error, let's give it a
treatment that makes it behave identically when repr()d on both Python
2 and Python 3.
Differential Revision: https://phab.mercurial-scm.org/D4545
Augie Fackler <augie@google.com> [Wed, 12 Sep 2018 11:39:48 -0400] rev 39594
py3: whitelist a test caught by the ratchet
Differential Revision: https://phab.mercurial-scm.org/D4547
Augie Fackler <augie@google.com> [Wed, 12 Sep 2018 11:38:46 -0400] rev 39593
tests: handle Python 3 not quoting non-empty-directory error
I assume this happens on Windows too, so I did the same regex on both
versions of the output. The whole message printed by these aborts
comes from Python, so if we want to exert control over the quoting
here it'll be a bit of a pain.
Differential Revision: https://phab.mercurial-scm.org/D4546
Pulkit Goyal <pulkit@yandex-team.ru> [Wed, 12 Sep 2018 17:45:43 +0300] rev 39592
context: don't count deleted files as candidates for path conflicts in IMM
This patch makes sure we don't consider the deleted files in our IMM wctx
as potential conflicts while calculating path conflicts. This fixes the bug
demonstrated in the previous patch.
Differential Revision: https://phab.mercurial-scm.org/D4543
Pulkit Goyal <pulkit@yandex-team.ru> [Wed, 12 Sep 2018 17:22:46 +0300] rev 39591
rebase: add tests showing patch conflict detection needs to be smarter in IMM
This patch adds a test which shows that you can't rebase a cset which removes
a dir and adds a file with the same name as the dir, because false positive
path conflicts are reported.
I fixed the case where there is a file and we add a dir of the same name while
removing the file, but missed testing the current case. The next patch will fix
this.
Differential Revision: https://phab.mercurial-scm.org/D4544
Anton Shestakov <av6@dwimlabs.net> [Mon, 10 Sep 2018 16:47:02 +0800] rev 39590
zsh_completion: add new and remove deprecated flags
Differential Revision: https://phab.mercurial-scm.org/D4519
Anton Shestakov <av6@dwimlabs.net> [Mon, 10 Sep 2018 16:43:49 +0800] rev 39589
zsh_completion: update various arguments, descriptions, metavariables
Addition of "=" means the flag must have an argument after it.
Differential Revision: https://phab.mercurial-scm.org/D4518
Pulkit Goyal <pulkit@yandex-team.ru> [Wed, 05 Sep 2018 01:18:29 +0530] rev 39588
setup: don't support py 3.5.0, 3.5.1, 3.5.2 because of bug in codecs
codecs.escape_encode() raises SystemError if an empty bytestring is passed. We
do that in some places in our code and because of this bug, things break.
Therefore we can't support the mentioned versions. The bug was fixed in 3.5.3
and 3.6.0 beta 2. We can't support 3.6.0 anyway because of a bug in formatting
bytestrings.
Link to the python bug: https://bugs.python.org/issue25270
Differential Revision: https://phab.mercurial-scm.org/D4475
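The version ranges named in this message can be written down as a small
predicate. This is only a sketch; the function name is hypothetical and this
is not necessarily how setup.py expresses the check.

```python
def has_known_bytes_bugs(version):
    """Return True for the Python 3 releases the message rules out.

    ``version`` is a tuple such as ``sys.version_info[:3]``.
    """
    v = tuple(version[:3])
    # 3.5.0 through 3.5.2: codecs.escape_encode() bug, fixed in 3.5.3.
    if (3, 5, 0) <= v < (3, 5, 3):
        return True
    # 3.6.0: separate bug in formatting bytestrings.
    return v == (3, 6, 0)
```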
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 07 Sep 2018 10:18:20 -0700] rev 39587
util: update lrucachedict order during get()
get() should have the same semantics as __getitem__ for item
retrieval.
Differential Revision: https://phab.mercurial-scm.org/D4506
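To see why this matters, here is a toy LRU cache built on OrderedDict (not the
linked-list implementation in util.py; TinyLRU is a hypothetical name). If
get() did not refresh recency, a key read only through get() could be evicted
as if it had never been used.

```python
from collections import OrderedDict


class TinyLRU(object):
    """Toy LRU cache; the oldest entry sits at the front of the dict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._d = OrderedDict()

    def __setitem__(self, key, value):
        if key in self._d:
            self._d.move_to_end(key)
        self._d[key] = value
        if len(self._d) > self.capacity:
            self._d.popitem(last=False)  # evict the oldest entry

    def __getitem__(self, key):
        value = self._d[key]
        self._d.move_to_end(key)  # mark as most recently used
        return value

    def get(self, key, default=None):
        # Delegate to __getitem__ so both paths update recency.
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return key in self._d
```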
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 06 Sep 2018 18:04:27 -0700] rev 39586
util: lower water mark when removing nodes after cost limit reached
See the inline comment for the reasoning here. This is a pretty
common strategy for garbage collectors and other cache-like primitives.
The performance impact is substantial:
$ hg perflrucachedict --size 4 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 100
! inserts w/ cost limit
! wall 1.659181 comb 1.650000 user 1.650000 sys 0.000000 (best of 7)
! wall 1.722122 comb 1.720000 user 1.720000 sys 0.000000 (best of 6)
! mixed w/ cost limit
! wall 1.139955 comb 1.140000 user 1.140000 sys 0.000000 (best of 9)
! wall 1.182513 comb 1.180000 user 1.180000 sys 0.000000 (best of 9)
$ hg perflrucachedict --size 1000 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 10000
! inserts
! wall 0.679546 comb 0.680000 user 0.680000 sys 0.000000 (best of 15)
! sets
! wall 0.825147 comb 0.830000 user 0.830000 sys 0.000000 (best of 13)
! inserts w/ cost limit
! wall 25.105273 comb 25.080000 user 25.080000 sys 0.000000 (best of 3)
! wall 1.724397 comb 1.720000 user 1.720000 sys 0.000000 (best of 6)
! mixed
! wall 0.807096 comb 0.810000 user 0.810000 sys 0.000000 (best of 13)
! mixed w/ cost limit
! wall 12.104470 comb 12.070000 user 12.070000 sys 0.000000 (best of 3)
! wall 1.190563 comb 1.190000 user 1.190000 sys 0.000000 (best of 9)
$ hg perflrucachedict --size 1000 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 10000 --mixedgetfreq 90
! inserts
! wall 0.711177 comb 0.710000 user 0.710000 sys 0.000000 (best of 14)
! sets
! wall 0.846992 comb 0.850000 user 0.850000 sys 0.000000 (best of 12)
! inserts w/ cost limit
! wall 25.963028 comb 25.960000 user 25.960000 sys 0.000000 (best of 3)
! wall 2.184311 comb 2.180000 user 2.180000 sys 0.000000 (best of 5)
! mixed
! wall 0.728256 comb 0.730000 user 0.730000 sys 0.000000 (best of 14)
! mixed w/ cost limit
! wall 3.174256 comb 3.170000 user 3.170000 sys 0.000000 (best of 4)
! wall 0.773186 comb 0.770000 user 0.770000 sys 0.000000 (best of 13)
$ hg perflrucachedict --size 100000 --gets 1000000 --sets 1000000 --mixed 1000000 --mixedgetfreq 90 --costlimit 5000000
! gets
! wall 1.191368 comb 1.190000 user 1.190000 sys 0.000000 (best of 9)
! wall 1.195304 comb 1.190000 user 1.190000 sys 0.000000 (best of 9)
! inserts
! wall 0.950995 comb 0.950000 user 0.950000 sys 0.000000 (best of 11)
! inserts w/ cost limit
! wall 1.589732 comb 1.590000 user 1.590000 sys 0.000000 (best of 7)
! sets
! wall 1.094941 comb 1.100000 user 1.090000 sys 0.010000 (best of 9)
! mixed
! wall 0.936420 comb 0.940000 user 0.930000 sys 0.010000 (best of 10)
! mixed w/ cost limit
! wall 0.882780 comb 0.870000 user 0.870000 sys 0.000000 (best of 11)
This puts us ~2x slower than caches without cost accounting. And for
read-heavy workloads (the prime use cases for caches), performance is
nearly identical.
In the worst case (pure write workloads with cost accounting enabled),
we're looking at ~1.5us per insert on large caches. That seems "fast
enough."
Differential Revision: https://phab.mercurial-scm.org/D4505
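The low water mark idea can be sketched independently of the real data
structure: once the limit is exceeded, evict down to a target below the limit
so that the next few inserts do not each trigger another eviction pass. The
function name and the 0.75 fraction are illustrative assumptions.

```python
from collections import OrderedDict


def evict_to_target(entries, maxcost, lowwater=0.75):
    """Evict oldest entries once total cost exceeds ``maxcost``.

    ``entries`` maps key -> cost, oldest first, and is mutated in place.
    Eviction continues until the total is at or below
    ``maxcost * lowwater``. Returns the evicted keys.
    """
    total = sum(entries.values())
    evicted = []
    if total <= maxcost:
        return evicted
    target = int(maxcost * lowwater)
    while entries and total > target:
        key, cost = entries.popitem(last=False)
        total -= cost
        evicted.append(key)
    return evicted
```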
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 06 Sep 2018 12:40:30 -0700] rev 39585
util: optimize cost auditing on insert
Calling popoldest() on insert with cost auditing enabled introduces
significant overhead.
The primary reason for this overhead is that popoldest() needs to
walk the linked list to find the first non-empty node. When we
call popoldest() within a loop, this can become quadratic. The
performance impact is more pronounced on caches with large capacities.
This commit effectively inlines the popoldest() call into
_enforcecostlimit(). By doing so, we only do the backwards walk
to find the first non-empty node once. However, we may still
perform this work on insert when the cache is near cost capacity.
So this is only a partial performance win.
$ hg perflrucachedict --size 4 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 100
! gets w/ cost limit
! wall 0.598737 comb 0.590000 user 0.590000 sys 0.000000 (best of 17)
! inserts w/ cost limit
! wall 1.694282 comb 1.700000 user 1.700000 sys 0.000000 (best of 6)
! wall 1.659181 comb 1.650000 user 1.650000 sys 0.000000 (best of 7)
! mixed w/ cost limit
! wall 1.157655 comb 1.150000 user 1.150000 sys 0.000000 (best of 9)
! wall 1.139955 comb 1.140000 user 1.140000 sys 0.000000 (best of 9)
$ hg perflrucachedict --size 1000 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 10000
! gets w/ cost limit
! wall 0.598526 comb 0.600000 user 0.600000 sys 0.000000 (best of 17)
! wall 0.601993 comb 0.600000 user 0.600000 sys 0.000000 (best of 17)
! inserts w/ cost limit
! wall 37.838315 comb 37.840000 user 37.840000 sys 0.000000 (best of 3)
! wall 25.105273 comb 25.080000 user 25.080000 sys 0.000000 (best of 3)
! mixed w/ cost limit
! wall 18.060198 comb 18.060000 user 18.060000 sys 0.000000 (best of 3)
! wall 12.104470 comb 12.070000 user 12.070000 sys 0.000000 (best of 3)
$ hg perflrucachedict --size 1000 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 10000 --mixedgetfreq 90
! gets w/ cost limit
! wall 0.600024 comb 0.600000 user 0.600000 sys 0.000000 (best of 17)
! wall 0.614439 comb 0.620000 user 0.620000 sys 0.000000 (best of 17)
! inserts w/ cost limit
! wall 37.154547 comb 37.120000 user 37.120000 sys 0.000000 (best of 3)
! wall 25.963028 comb 25.960000 user 25.960000 sys 0.000000 (best of 3)
! mixed w/ cost limit
! wall 4.381602 comb 4.380000 user 4.370000 sys 0.010000 (best of 3)
! wall 3.174256 comb 3.170000 user 3.170000 sys 0.000000 (best of 4)
Differential Revision: https://phab.mercurial-scm.org/D4504
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 06 Sep 2018 14:04:46 -0700] rev 39584
util: teach lrucachedict to enforce a max total cost
Now that lrucachedict entries can have a numeric cost associated
with them and we can easily pop the oldest item in the cache, it
now becomes relatively trivial to implement support for enforcing
a high water mark on the total cost of items in the cache.
This commit teaches lrucachedict instances to have a max cost
associated with them. When items are inserted, we pop old items
until enough "cost" frees up to make room for the new item.
This feature is close to zero cost when not used (modulo the insertion
regression introduced by the previous commit):
$ ./hg perflrucachedict --size 4 --gets 1000000 --sets 1000000 --mixed 1000000
! gets
! wall 0.607444 comb 0.610000 user 0.610000 sys 0.000000 (best of 17)
! wall 0.601653 comb 0.600000 user 0.600000 sys 0.000000 (best of 17)
! inserts
! wall 0.678261 comb 0.680000 user 0.680000 sys 0.000000 (best of 14)
! wall 0.685042 comb 0.680000 user 0.680000 sys 0.000000 (best of 15)
! sets
! wall 0.808770 comb 0.800000 user 0.800000 sys 0.000000 (best of 13)
! wall 0.834241 comb 0.830000 user 0.830000 sys 0.000000 (best of 12)
! mixed
! wall 0.782441 comb 0.780000 user 0.780000 sys 0.000000 (best of 13)
! wall 0.803804 comb 0.800000 user 0.800000 sys 0.000000 (best of 13)
$ hg perflrucachedict --size 1000 --gets 1000000 --sets 1000000 --mixed 1000000
! init
! wall 0.006952 comb 0.010000 user 0.010000 sys 0.000000 (best of 418)
! gets
! wall 0.613350 comb 0.610000 user 0.610000 sys 0.000000 (best of 17)
! wall 0.617415 comb 0.620000 user 0.620000 sys 0.000000 (best of 17)
! inserts
! wall 0.701270 comb 0.700000 user 0.700000 sys 0.000000 (best of 15)
! wall 0.700516 comb 0.700000 user 0.700000 sys 0.000000 (best of 15)
! sets
! wall 0.825720 comb 0.830000 user 0.830000 sys 0.000000 (best of 13)
! wall 0.837946 comb 0.840000 user 0.830000 sys 0.010000 (best of 12)
! mixed
! wall 0.821644 comb 0.820000 user 0.820000 sys 0.000000 (best of 13)
! wall 0.850559 comb 0.850000 user 0.850000 sys 0.000000 (best of 12)
I reckon the slight slowdown on insert is due to added if checks.
For caches with total cost limiting enabled:
$ hg perflrucachedict --size 4 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 100
! gets w/ cost limit
! wall 0.598737 comb 0.590000 user 0.590000 sys 0.000000 (best of 17)
! inserts w/ cost limit
! wall 1.694282 comb 1.700000 user 1.700000 sys 0.000000 (best of 6)
! mixed w/ cost limit
! wall 1.157655 comb 1.150000 user 1.150000 sys 0.000000 (best of 9)
$ hg perflrucachedict --size 1000 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 10000
! gets w/ cost limit
! wall 0.598526 comb 0.600000 user 0.600000 sys 0.000000 (best of 17)
! inserts w/ cost limit
! wall 37.838315 comb 37.840000 user 37.840000 sys 0.000000 (best of 3)
! mixed w/ cost limit
! wall 18.060198 comb 18.060000 user 18.060000 sys 0.000000 (best of 3)
$ hg perflrucachedict --size 1000 --gets 1000000 --sets 1000000 --mixed 1000000 --costlimit 10000 --mixedgetfreq 90
! gets w/ cost limit
! wall 0.600024 comb 0.600000 user 0.600000 sys 0.000000 (best of 17)
! inserts w/ cost limit
! wall 37.154547 comb 37.120000 user 37.120000 sys 0.000000 (best of 3)
! mixed w/ cost limit
! wall 4.381602 comb 4.380000 user 4.370000 sys 0.010000 (best of 3)
The functions we're benchmarking are slightly different, which could
move numbers by a few milliseconds. But the slowdown on insert is too
great to be explained by that. The slowness is due to insert-heavy
operations needing to call popoldest() repeatedly when the cache is
at capacity. The next commit will address this.
Differential Revision: https://phab.mercurial-scm.org/D4503
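The behavior described in this message (a per-entry cost, a running total, and
popping old items until the new item fits) can be modeled with a toy cache.
This sketch uses OrderedDict rather than the real linked list; CostLRU and its
attribute names are assumptions made for the example.

```python
from collections import OrderedDict


class CostLRU(object):
    """Toy cache bounded by item count and, optionally, total cost."""

    def __init__(self, capacity, maxcost=0):
        self.capacity = capacity
        self.maxcost = maxcost  # 0 disables cost limiting
        self.totalcost = 0
        self._d = OrderedDict()  # key -> (value, cost), oldest first

    def insert(self, key, value, cost=0):
        if key in self._d:
            self.totalcost -= self._d.pop(key)[1]
        self._d[key] = (value, cost)
        self.totalcost += cost
        while len(self._d) > self.capacity:
            self.popoldest()
        if self.maxcost:
            # Keep the newest item even if it alone exceeds the limit.
            while len(self._d) > 1 and self.totalcost > self.maxcost:
                self.popoldest()

    def popoldest(self):
        key, (value, cost) = self._d.popitem(last=False)
        self.totalcost -= cost
        return key, value

    def __contains__(self, key):
        return key in self._d
```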
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 07 Sep 2018 12:14:42 -0700] rev 39583
util: allow lrucachedict to track cost of entries
Currently, lrucachedict allows tracking of arbitrary items with the
only limit being the total number of items in the cache.
Caches can be a lot more useful when they are bound by the size
of the items in them rather than the number of elements in the
cache.
In preparation for teaching lrucachedict to enforce a max size of
cached items, we teach lrucachedict to optionally associate a numeric
cost value with each node.
We purposefully let the caller define their own cost for nodes.
This does introduce some overhead. Most of it comes from __setitem__,
since that function now calls into insert(), thus introducing Python
function call overhead.
$ hg perflrucachedict --size 4 --gets 1000000 --sets 1000000 --mixed 1000000
! gets
! wall 0.599552 comb 0.600000 user 0.600000 sys 0.000000 (best of 17)
! wall 0.614643 comb 0.610000 user 0.610000 sys 0.000000 (best of 17)
! inserts
! <not available>
! wall 0.655817 comb 0.650000 user 0.650000 sys 0.000000 (best of 16)
! sets
! wall 0.540448 comb 0.540000 user 0.540000 sys 0.000000 (best of 18)
! wall 0.805644 comb 0.810000 user 0.810000 sys 0.000000 (best of 13)
! mixed
! wall 0.651556 comb 0.660000 user 0.660000 sys 0.000000 (best of 15)
! wall 0.781357 comb 0.780000 user 0.780000 sys 0.000000 (best of 13)
$ hg perflrucachedict --size 1000 --gets 1000000 --sets 1000000 --mixed 1000000
! gets
! wall 0.621014 comb 0.620000 user 0.620000 sys 0.000000 (best of 16)
! wall 0.615146 comb 0.620000 user 0.620000 sys 0.000000 (best of 17)
! inserts
! <not available>
! wall 0.698115 comb 0.700000 user 0.700000 sys 0.000000 (best of 15)
! sets
! wall 0.560247 comb 0.560000 user 0.560000 sys 0.000000 (best of 18)
! wall 0.832495 comb 0.830000 user 0.830000 sys 0.000000 (best of 12)
! mixed
! wall 0.686172 comb 0.680000 user 0.680000 sys 0.000000 (best of 15)
! wall 0.841359 comb 0.840000 user 0.840000 sys 0.000000 (best of 12)
We're still under 1us per insert, which seems like reasonable
performance for a cache.
If we comment out updating of self.totalcost during insert(),
performance of insert() is identical to __setitem__ before. However,
I don't want to make total cost evaluation lazy because it has
significant performance implications for when we need to evaluate the
total cost at mutation time (it requires a cache traversal, which could
be expensive for large caches).
Differential Revision: https://phab.mercurial-scm.org/D4502
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 23:15:20 -0700] rev 39582
util: add a popoldest() method to lrucachedict
This allows consumers to prune the oldest item from the cache. This
could be useful for e.g. a consumer that wishes for the size of
items tracked by the cache to remain under a high water mark.
Differential Revision: https://phab.mercurial-scm.org/D4501
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 06 Sep 2018 11:40:20 -0700] rev 39581
util: ability to change capacity when copying lrucachedict
This will allow us to easily replace an lrucachedict with one
with a higher or lower capacity as consumers deem necessary.
IMO it is easier to just create a new cache instance than to
muck with the capacity of an existing cache. Mutating an existing
cache's capacity feels more prone to bugs.
Differential Revision: https://phab.mercurial-scm.org/D4500
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 06 Sep 2018 11:37:27 -0700] rev 39580
util: make capacity a public attribute on lrucachedict
So others can query it. Useful for operations that may want to verify
the cache has capacity for N items before it performs an operation that
may cause cache eviction.
Differential Revision: https://phab.mercurial-scm.org/D4499
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 06 Sep 2018 11:33:40 -0700] rev 39579
util: properly copy lrucachedict instances
Previously, copy() only worked if the cache was full. We teach
copy() to only copy defined nodes.
Differential Revision: https://phab.mercurial-scm.org/D4498
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 06 Sep 2018 11:27:25 -0700] rev 39578
tests: rewrite test-lrucachedict.py to use unittest
This makes the code so much easier to test and debug.
Along the way, I discovered a bug in copy(), which I kind of
added test coverage for.
Differential Revision: https://phab.mercurial-scm.org/D4497
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 29 Aug 2018 15:17:11 -0700] rev 39577
wireprotov2peer: stream decoded responses
Previously, wire protocol version 2 would buffer all response data.
Only once all data was received did we CBOR decode it and resolve
the future associated with the command. This was obviously not
desirable. With the large response payloads introduced by future
commits, it caused significant memory bloat and slowed down client
operations because the client had to wait on the server.
This commit refactors the response handling code so that response
data can be streamed.
Command response objects now contain a buffered CBOR decoder. As
new data arrives, it is fed into the decoder. Decoded objects are
made available to the generator as they are decoded.
Because there is a separate thread processing incoming frames and
feeding data into the response object, there is the potential for
race conditions when mutating response objects. So a lock has been
added to guard access to critical state variables.
Because the generator emitting decoded objects needs to wait on
those objects to become available, we've added an Event for the
generator to wait on so it doesn't busy loop. This does mean
there is the potential for deadlocks. And I'm pretty sure they can
occur in some scenarios. We already have a handful of TODOs around
this. But I've added some more. Fixing this will likely require
moving the background thread receiving frames into clienthandler.
We likely would have done this anyway when implementing the client
bits for the SSH transport.
Test output changes because the initial CBOR map holding the overall
response state is now always handled internally by the response
object.
Differential Revision: https://phab.mercurial-scm.org/D4474
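The lock plus Event arrangement described above can be sketched roughly as below. This is not the wireprotov2peer code: `StreamingResponse`, `feed` and `objects` are hypothetical names, and newline-delimited JSON stands in for the buffered CBOR decoder so the example needs only the standard library.

```python
import collections
import json
import threading


class StreamingResponse:
    """Illustrative response object fed by a background receiver thread."""

    def __init__(self):
        self._lock = threading.Lock()     # guards the state below
        self._ready = threading.Event()   # set when there is something to read
        self._buffer = b''
        self._pending = collections.deque()
        self._eos = False

    def feed(self, data, eos=False):
        """Called from the receiving thread as raw data arrives."""
        with self._lock:
            self._buffer += data
            # Keep any trailing partial line buffered for the next call.
            *lines, self._buffer = self._buffer.split(b'\n')
            self._pending.extend(json.loads(line) for line in lines if line)
            self._eos = self._eos or eos
            self._ready.set()

    def objects(self):
        """Yield decoded objects as they become available."""
        while True:
            self._ready.wait()  # block instead of busy looping
            with self._lock:
                batch = list(self._pending)
                self._pending.clear()
                done = self._eos
                self._ready.clear()
            yield from batch
            if done:
                return
```

Note that `objects()` blocks forever if the feeding thread dies before passing `eos=True`, which is the kind of deadlock hazard the commit message warns about.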
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 29 Aug 2018 16:43:17 -0700] rev 39576
wireprotoframing: buffer emitted data to reduce frame count
An upcoming commit introduces a wire protocol command that can emit
hundreds of thousands of small objects. Without a buffering layer,
we would emit a single, small frame for every object. Performance
profiling revealed this to be a source of significant overhead for
both client and server.
This commit introduces a very crude buffering layer so that we emit
fewer, bigger frames in such a scenario. This code will likely get
rewritten in the future to be part of the streams API, as we'll
need a similar strategy for compressing data. I don't want to think
about it too much at the moment though.
server
    before: user  32.500+0.000 sys  1.160+0.000
    after:  user  20.230+0.010 sys  0.180+0.000
client
    before: user 133.400+0.000 sys 93.120+0.000
    after:  user  68.370+0.000 sys 32.950+0.000
This appears to indicate we have significant overhead in the frame
processing code on both client and server. It might be worth profiling
that at some point...
Differential Revision: https://phab.mercurial-scm.org/D4473
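A crude buffering layer of the kind described above might look like the following sketch. The function name and the `flush_threshold` parameter are hypothetical and the real wireprotoframing code differs. The point is only that many small chunks are coalesced into fewer payloads.

```python
def buffered_payloads(chunks, flush_threshold=32768):
    """Coalesce small byte chunks into fewer, larger frame payloads.

    A payload is emitted once at least ``flush_threshold`` bytes are
    pending, so individual payloads may exceed the threshold slightly.
    """
    pending = []
    pending_size = 0
    for chunk in chunks:
        pending.append(chunk)
        pending_size += len(chunk)
        if pending_size >= flush_threshold:
            yield b''.join(pending)
            pending = []
            pending_size = 0
    # Flush whatever is left once the source is exhausted.
    if pending:
        yield b''.join(pending)
```

For example, five 2-byte chunks with a threshold of 4 produce three payloads instead of five.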
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:06:40 -0700] rev 39575
wireprotov2: implement commands as a generator of objects
Previously, wire protocol version 2 inherited version 1's model of
having separate types to represent the results of different wire
protocol commands.
As I implemented more powerful commands in future commits, I found
I was using a common pattern of returning a special type to hold a
generator. This meant the command function required a closure to
do most of the work. That made logic flow more difficult to follow.
I also noticed that many commands were effectively a sequence of
objects to be CBOR encoded.
I think it makes sense to define version 2 commands as generators.
This way, commands can simply emit the data structures they wish to
send to the client. This eliminates the need for a closure in
command functions and removes encoding from the bodies of commands.
As part of this commit, the handling of response objects has been
moved into the serverreactor class. This puts the reactor in the
driver's seat with regard to CBOR encoding and error handling.
Having error handling in the function that emits frames is
particularly important because exceptions in that function can lead
to things getting in a bad state: I'm fairly certain that uncaught
exceptions in the frame generator were causing deadlocks.
I also introduced a dedicated error type for explicit error reporting
in command handlers. This will be used in subsequent commits.
There's still a bit of work to be done here, especially around
formalizing the error handling "protocol." I've added yet another
TODO to track this so we don't forget.
Test output changed because we're using generators and no longer know
we are at the end of the data until we hit the end of the generator.
This means we can't emit the end-of-stream flag until we've exhausted
the generator. Hence the introduction of 0-sized end-of-stream frames.
Differential Revision: https://phab.mercurial-scm.org/D4472
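The split between commands that emit objects and a reactor that encodes them can be sketched as below. All names here (`CommandError`, `emit_frames`, `list_names`) are hypothetical, frames are modelled as simple tuples, and JSON stands in for CBOR. The sketch also shows why a zero-sized end-of-stream frame appears: the end is only known once the generator is exhausted.

```python
import json


class CommandError(Exception):
    """Hypothetical error type a command raises to report a failure."""


def encode(obj):
    # JSON is a stand-in for CBOR so the sketch has no dependencies.
    return json.dumps(obj, sort_keys=True).encode('utf-8')


def emit_frames(objects):
    """Reactor side: encode each emitted object and handle errors."""
    try:
        for obj in objects:
            yield ('data', encode(obj))
    except CommandError as exc:
        yield ('error', encode({'message': str(exc)}))
        return
    # Only now do we know the command is finished, so the end-of-stream
    # marker travels in its own empty frame.
    yield ('eos', b'')


def list_names(names):
    """Command side: just yield data structures, no encoding, no closure."""
    if not names:
        raise CommandError('no names given')
    yield {'count': len(names)}
    for name in sorted(names):
        yield name
```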
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 27 Aug 2018 13:30:44 -0700] rev 39574
internals: extract frame-based protocol docs to own document
wireprotocol.txt is quite long and difficult to digest. The
frame-based protocol is effectively a standalone concept (and could
even be used outside of Mercurial). So this commit extracts its
docs to a standalone file.
The first few paragraphs were rewritten as part of the extraction.
Section headers were adjusted accordingly.
Existing references in wireprotocol.txt were updated to refer to the
new doc / concept, which I've started referring to as `hgrpc`.
I'm on the fence as to whether to move the HTTP and SSH transport
details to the new doc as well. For now, I'm leaving them in
wireprotocol.txt.
Differential Revision: https://phab.mercurial-scm.org/D4443
Yuya Nishihara <yuya@tcha.org> [Wed, 12 Sep 2018 22:19:29 +0900] rev 39573
narrow: remove hack to write narrowspec to shared .hg directory
AFAIK, we no longer need it since the narrowspec file was moved to the
store directory in 576eef1ab43d, "narrow: move .hg/narrowspec to
.hg/store/narrowspec."
Yuya Nishihara <yuya@tcha.org> [Wed, 12 Sep 2018 22:15:43 +0900] rev 39572
narrowspec: remove parseserverpatterns() which isn't used anymore
Follows up 10a8472f6662, "narrow: drop support for remote expansion."
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 17:22:15 -0700] rev 39571
hg: write narrow patterns after repo creation
Now that hg.clone() knows when a narrow clone is requested, it
makes sense to have it update the narrow patterns for the repo
soon after the repo is created, before any exchange occurs.
Previously, the narrow extension was monkeypatching an exchange
function to do this. The old code is redundant and has been
removed.
Differential Revision: https://phab.mercurial-scm.org/D4541
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 16:59:17 -0700] rev 39570
narrow: don't wrap exchange.pull() during clone
The wrapped version was setting up the narrow repo requirement when
a narrow clone was requested.
Previous commits taught hg.clone() and repo creation to add the narrow
requirement when a narrow clone was requested. So this requirement
should already be set up for us and this code is no longer necessary.
Differential Revision: https://phab.mercurial-scm.org/D4540
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 17:21:18 -0700] rev 39569
exchange: support defining narrow file patterns for pull
This commit teaches exchange.pull() about the desire to perform a
narrow file pull. We simply pass include and exclude patterns to
the function. The values are validated and stored on the pulloperation
instance.
hg.clone() has been taught to pass these arguments to exchange.pull().
If the arguments are not passed to exchange.pull(), the active narrow
patterns from the repository will automatically be used. We /could/
always use the narrow patterns from the repo. However, allowing
explicit values to be passed in allows us to perform data fetching
that doesn't necessarily align with the repo configuration. This
provides more flexibility.
Differential Revision: https://phab.mercurial-scm.org/D4539
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 17:20:14 -0700] rev 39568
commands: pass include and exclude options to hg.clone()
These arguments are defined by the narrow extension. Let's teach
core to recognize them so we can delete some code from the narrow
extension and start to exercise the in-core code for performing a
narrow clone.
We have no way of easily testing it, but this change should result in
.hg/requires having the narrow requirement from the time the file
is written rather than added as part of pull. We'll confirm this when
we delete some monkeypatched functions from the narrow extension in
later commits.
Test output changed because hg.clone() is now receiving patterns
and validation of those values is occurring sooner, before the exchange
code runs and prints the message that was deleted.
Differential Revision: https://phab.mercurial-scm.org/D4538
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 14:16:05 -0700] rev 39567
localrepo: add requirement when narrow files creation option present
The previous commit taught hg.clone() to define a creation option
when file include or exclude patterns are passed.
This commit teaches the new repo creation code to convert that creation
option into a repository requirement.
While not yet used by the narrow extension, the eventual side-effect
of this change is that newly-created repositories will have the narrow
requirement from the moment they are created. Currently, the requirement is
added to the repo at exchange.pull() time via a wrapped function in
the narrow extension.
Differential Revision: https://phab.mercurial-scm.org/D4537
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 17:15:35 -0700] rev 39566
hg: recognize include and exclude patterns when cloning
This commit teaches clone() to accept arguments defining file
patterns to clone. This is the first step in teaching core code
about the existence of a narrow clone.
Right now, we only perform validation of the arguments and pass
additional options into createopts to influence repository
creation. Nothing of consequence happens with that creation option
yet, however.
For now, arbitrary restrictions exist, such as not allowing patterns
for shared repos and disabling local copies when patterns are
defined. We can potentially lift these restrictions in the future
once partial clone/storage support is more fleshed out. I figure
it is best to reduce the surface area for bugs for the time being.
It may seem weird to prefix these arguments with "store." However,
clone is effectively pull + update and file patterns could apply to
both the store and the working directory. The prefix is there to
disambiguate in the future when this function may want to use
different sets of patterns for the store and working directory.
Differential Revision: https://phab.mercurial-scm.org/D4536
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 17:11:32 -0700] rev 39565
hg: allow extra arguments to be passed to repo creation (API)
Currently, repository creation is influenced by consulting the
ui instance and turning config options into requirements. This
means that in order to influence repository creation, you need
to define and set a config option and that the option must translate
to a requirement stored in the .hg/requires file.
This commit introduces a new mechanism to influence repository
creation. hg.repository() and hg.peer() have been taught to
receive a new optional argument defining extra options to apply
to repository creation. This value is passed along to the various
instance() functions and can be used to influence repository
creation. This will allow us to pass rich data directly to repository
creation without having to go through the config layer. It also allows
us to be more explicit about the features requested during repository
creation and provides a natural point to detect unhandled options
influencing repository creation. The new code detects when unknown
creation options are present and aborts in that case.
.. api:: options can now be passed to influence repository creation
   The various instance() functions to spawn new peers or repository
   instances now receive a ``createopts`` argument that can be a
   dict defining additional options to influence repository creation.
   localrepo.newreporequirements() also receives this argument.
Differential Revision: https://phab.mercurial-scm.org/D4535
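The flow from creation options to requirements, including the abort on unknown options, can be sketched as follows. This is an assumption-laden illustration and not the localrepo code: the option name `narrowfiles`, the requirement strings, and `CreateOptionError` are all chosen for the example.

```python
class CreateOptionError(Exception):
    """Hypothetical stand-in for the abort raised on unknown options."""


# Assumed set of options this sketch knows how to honor.
KNOWN_CREATE_OPTS = {'narrowfiles'}


def new_repo_requirements(config, createopts=None):
    """Compute requirements for a new repo from config and createopts."""
    createopts = createopts or {}

    # Refuse to create a repository when a caller asks for a feature
    # that nothing here knows how to provide.
    unknown = sorted(set(createopts) - KNOWN_CREATE_OPTS)
    if unknown:
        raise CreateOptionError(
            'unknown repository creation option: %s' % ', '.join(unknown))

    requirements = {'revlogv1'}
    if config.get('usestore', True):
        requirements.add('store')
    # A creation option is translated directly into a requirement,
    # without a round trip through the config layer.
    if createopts.get('narrowfiles'):
        requirements.add('narrow')
    return requirements
```

Passing `createopts={'narrowfiles': True}` yields a requirement set containing the narrow entry from the start, rather than having it added later during pull.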
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 13:46:59 -0700] rev 39564
localrepo: move repo creation logic out of localrepository.__init__ (API)
It has long bothered me that local repository creation is handled as
part of localrepository.__init__. Upcoming changes I want to make
around how repositories are initialized and instantiated will make
the continued existence of repository creation code in
localrepository.__init__ even more awkward.
localrepository instances are almost never constructed directly:
instead, callers are supposed to go through hg.repository() to obtain
a handle on a repository. And hg.repository() calls
localrepo.instance() to return a new repo instance.
This commit teaches localrepo.instance() to handle the create=True
logic. Most of the code for repo construction has been moved to a
standalone function. This allows extensions to monkeypatch the function
to further customize freshly-created repositories.
A few calls to localrepo.localrepository.__init__ that were passing
create=True were converted to call localrepo.instance().
.. api:: local repo creation moved out of constructor
   ``localrepo.localrepository.__init__`` no longer accepts a
   ``create`` argument to create a new repository. New repository
   creation is now performed as part of ``localrepo.instance()``
   and the bulk of the work is performed by
   ``localrepo.createrepository()``.
Differential Revision: https://phab.mercurial-scm.org/D4534
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 07 Sep 2018 15:57:55 -0700] rev 39563
localrepo: pass ui to newreporequirements() (API)
newreporequirements() is called as part of creating a new repository.
It doesn't make much sense for it to receive a repo instance as part
of determining what requirements for new repos should be.
.. api::
   localrepo.newreporequirements() receives a ui instead of a repo
Differential Revision: https://phab.mercurial-scm.org/D4533
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 15:40:33 -0700] rev 39562
narrow: set opts['narrow'] instead of local variable
This will allow the command function in core to infer the presence
of the option without duplicating logic.
Differential Revision: https://phab.mercurial-scm.org/D4532
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 15:53:12 -0700] rev 39561
narrow: drop support for remote expansion (BC)
Previous patches to validate narrow patterns accidentally dropped
support for the include: syntax that allows patterns to be expanded
from a remote.
This feature was never implemented in core and is only implemented on
Google's custom server. Per @martinvonz's review comment in D4522, it
is OK to drop this feature since it isn't used.
The concept of this feature does seem useful. I anticipate it making
a comeback some day in some shape or form. But for now, let's jettison
the dead code.
Differential Revision: https://phab.mercurial-scm.org/D4530
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 07 Sep 2018 18:35:54 -0700] rev 39560
fastannotate: use repo.local()
This is the proper way to check whether we're dealing with a local
repository, since extensions should be coding to an interface and
not testing for exact types.
Differential Revision: https://phab.mercurial-scm.org/D4542
Martin von Zweigbergk <martinvonz@google.com> [Tue, 11 Sep 2018 16:04:55 -0700] rev 39559
tests: drop extra "file:" prefix from paths in narrow test
It looks like these were added by mistake in f4d4bd8c8911 (narrow: add
a --narrowspec flag to clone command, 2018-08-08).
Differential Revision: https://phab.mercurial-scm.org/D4531
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 11:47:10 -0700] rev 39558
narrow: validate spec files are well-formed during clone (BC)
Previously, specfiles would get read then normalized. We want
specfiles to be normalized on read so there is no confusion about
what the format of specfiles should be.
This commit validates the parsed result of --specfile. If entries
aren't prefixed, an error is raised.
Previously, validation would occur at exchange time, which is why we
dropped a line of test output related to server interaction.
Differential Revision: https://phab.mercurial-scm.org/D4526
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 10:59:21 -0700] rev 39557
narrow: validate patterns on incoming bundle2 part
The remote data is untrusted and needs to be validated for
pattern conformance.
Differential Revision: https://phab.mercurial-scm.org/D4525
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 15:28:41 -0700] rev 39556
narrowspec: validate patterns when loading and saving spec file
Patterns should be normalized and validated before being passed into
narrowspec.save(). Let's assert that by checking immediately before
writing the narrow spec file. And let's assert that patterns loaded
from the spec file also conform.
Differential Revision: https://phab.mercurial-scm.org/D4524
Yuya Nishihara <yuya@tcha.org> [Mon, 10 Sep 2018 22:34:19 +0900] rev 39555
ancestor: use heapreplace() in place of heappop/heappush()
This should be slightly faster.
Overall perfancestors result::
                 cpython           nginx             mercurial
  -------------  ----------------  ----------------  ----------------
  b6db2e80a9ce^  0.103461          0.006303          0.035716
  8eb2145ff0fb   0.192307 (x1.86)  0.012115 (x1.92)  0.052135 (x1.46)
  this patch     0.139986 (x1.35)  0.006389 (x1.01)  0.037176 (x1.04)
Yuya Nishihara <yuya@tcha.org> [Tue, 11 Sep 2018 22:36:51 +0900] rev 39554
ancestor: rename local aliases of heapq functions in _lazyancestorsiter()
The original names no longer look pretty. Just call them heap*() instead.
Yuya Nishihara <yuya@tcha.org> [Mon, 10 Sep 2018 21:58:59 +0900] rev 39553
ancestor: optimize _lazyancestorsiter() for contiguous chains
If there's no revision between p1 and current, p1 must be the next revision
to visit. In this case, we can get around the overhead of heappop/push
operations. Note that this is faster than using heapreplace().
'current - p1 == 1' could be generalized as 'all(r not in seen for r in
xrange(p1, current))', but Python is too slow to do such a thing.
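To make the contiguous-chain shortcut concrete, here is a simplified Python 3 sketch of a descending ancestor walk. It is not the `_lazyancestorsiter()` implementation: `iter_ancestors` and its signature are hypothetical, the start revisions are always included, and the non-contiguous branch uses `heapq.heapreplace()` in the spirit of the later change in this series.

```python
import heapq

NULLREV = -1


def iter_ancestors(parentrevs, initrevs, stoprev=0):
    """Yield initrevs and their ancestors, highest revision first.

    ``parentrevs(rev)`` must return a ``(p1, p2)`` tuple where a missing
    parent is NULLREV and parents always have lower numbers than rev.
    """
    seen = {NULLREV}
    seen.update(initrevs)
    # heapq is a min-heap, so store negated revisions to pop the largest.
    visit = [-r for r in set(initrevs)]
    heapq.heapify(visit)

    while visit:
        current = -visit[0]
        if current < stoprev:
            return  # everything left is older; no need to empty the heap
        yield current
        p1, p2 = parentrevs(current)
        if p1 not in seen:
            seen.add(p1)
            if current - p1 == 1:
                # No revision lies between p1 and current, so p1 must be
                # visited next. Overwrite the root and skip heap sifting.
                visit[0] = -p1
            else:
                heapq.heapreplace(visit, -p1)
        else:
            heapq.heappop(visit)
        if p2 not in seen:
            seen.add(p2)
            heapq.heappush(visit, -p2)
```

Overwriting `visit[0]` is safe because every other entry in the heap is an unseen revision below `current`, and `p1 = current - 1` is the largest such revision.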
Yuya Nishihara <yuya@tcha.org> [Mon, 10 Sep 2018 21:54:40 +0900] rev 39552
ancestor: unroll loop of parents in _lazyancestorsiter()
This change itself isn't a major performance win, but it helps optimize
the visit loop for contiguous chains. See the next patch.
Yuya Nishihara <yuya@tcha.org> [Mon, 10 Sep 2018 21:46:19 +0900] rev 39551
ancestor: return early from _lazyancestorsiter() when stoprev is reached
There's no need to empty the heap.
Yuya Nishihara <yuya@tcha.org> [Tue, 11 Sep 2018 22:38:32 +0900] rev 39550
ancestor: remove alias of initrevs from _lazyancestorsiter()
It's just redundant and less comprehensible.
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 10:36:07 -0700] rev 39549
narrow: validate patterns returned by expandnarrow
Remotes could supply malicious or invalid patterns. We should
validate them as soon as possible.
Differential Revision: https://phab.mercurial-scm.org/D4523
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 11 Sep 2018 15:25:35 -0700] rev 39548
narrowspec: limit patterns to path: and rootfilesin: (BC)
Some matcher patterns are computationally expensive and may even
have security issues (e.g. evaluating some file sets). For these
reasons, we want to limit the types of matcher patterns that can
be used in narrow specs and by command line arguments used for
defining narrow specs.
This commit teaches ``narrowspec.parsepatterns()`` to validate the
pattern types against "safe" patterns.
Surprisingly, no existing tests broke. So tests for the feature
have been added.
We also added a function to validate a patterns data structure.
This will be used in future commits.
Differential Revision: https://phab.mercurial-scm.org/D4522
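The prefix restriction can be sketched as a simple allow-list check. Only the two prefixes come from the commit summary. The function name, the error type and the error message are hypothetical, and the real ``narrowspec`` code may normalize patterns before validating them.

```python
# The two pattern kinds named in the commit summary.
VALID_PREFIXES = ('path:', 'rootfilesin:')


class InvalidPatternError(Exception):
    """Hypothetical stand-in for the abort raised on a bad pattern."""


def validate_patterns(patterns):
    """Raise if any pattern does not use an allowed prefix."""
    for pat in patterns:
        # str.startswith() accepts a tuple and matches any of its items.
        if not pat.startswith(VALID_PREFIXES):
            raise InvalidPatternError(
                'invalid prefix on narrow pattern: %s' % pat)
```

Anything that could trigger expensive or unsafe evaluation, such as a file set expression, is rejected before it reaches a matcher.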
Martin von Zweigbergk <martinvonz@google.com> [Tue, 11 Sep 2018 10:54:20 -0700] rev 39547
narrow: mark wire proto capability names experimental and versioned
We already plan to add a "widen" wire protocol command to the "narrow"
capability, so let's version the capabilities as "exp-narrow-1" and
"exp-ellipses-1". When we add the "widen" command, we will then add a
"exp-narrow-2" capability to indicate support for that command.
Differential Revision: https://phab.mercurial-scm.org/D4529
Martin von Zweigbergk <martinvonz@google.com> [Tue, 11 Sep 2018 10:50:46 -0700] rev 39546
narrow: move wire proto capabilities to narrowwirepeer
These are not bundle2 capabilities (they just happened to share the
name "narrow"), so they seem to belong with the wirepeer overrides.
Differential Revision: https://phab.mercurial-scm.org/D4528