Boris Feld <boris.feld@octobus.net> [Wed, 19 Sep 2018 12:19:28 +0200] rev 39888
shelve: return the shelved node as part of bundle application
It makes sense to have the function in charge of unbundling the shelved revision
also return the node of that revision (when the data is in the bundle).
This will help us handle the unnatural state where the unshelved change already
exists in the repository.
Boris Feld <boris.feld@octobus.net> [Thu, 20 Sep 2018 11:18:28 +0200] rev 39887
changelog: keep track of duplicated node in the transaction adding them
The transaction is already tracking the new nodes. We now track the
"duplicates" in the same location.
Boris Feld <boris.feld@octobus.net> [Wed, 19 Sep 2018 21:02:47 +0200] rev 39886
revlog: add a callback "tracking" duplicate node addition
If a changegroup contains nodes already added to the repository, they will be
skipped. Skipping them is the right behavior (we don't need to store things
twice), but it can hide some information from the code doing the unbundle (e.g.
shelve looking for the tip of the bundle).
The first step to improve this situation is to add a low level callback. We do
not need this tracking on all revlogs, so actual tracking will be added in the
next changeset.
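A minimal sketch of the idea, not Mercurial's actual revlog code: the class
ToyStore and the parameters on_add and on_duplicate are hypothetical names
chosen for illustration. The store still skips nodes it already has, but the
caller can now observe the skip.

```python
class ToyStore:
    """Toy node store. This is an illustration, not Mercurial's revlog."""

    def __init__(self):
        self._nodes = {}

    def addgroup(self, revisions, on_add=None, on_duplicate=None):
        """Add (node, data) pairs, reporting new and already-known nodes."""
        for node, data in revisions:
            if node in self._nodes:
                # Already stored: skip it, but tell the caller it was seen.
                if on_duplicate is not None:
                    on_duplicate(node)
                continue
            self._nodes[node] = data
            if on_add is not None:
                on_add(node)


store = ToyStore()
added, duplicates = [], []
store.addgroup([(b"a", b"1"), (b"b", b"2")],
               on_add=added.append, on_duplicate=duplicates.append)
store.addgroup([(b"b", b"2"), (b"c", b"3")],
               on_add=added.append, on_duplicate=duplicates.append)
```

A caller such as shelve could then take the last node reported by either
callback as the tip of the bundle, even when that node was skipped.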
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Wed, 26 Sep 2018 18:30:19 -0400] rev 39885
logtoprocess: define $HG for children processes
So they can compute the hg version for instance.
Differential Revision: https://phab.mercurial-scm.org/D4768
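As a rough illustration of exporting $HG to child processes: the helper name
child_environment is hypothetical and this is not the extension's real code.
It only shows the environment being copied and extended before a child is
spawned.

```python
import os


def child_environment(hg_executable, base_env=None):
    """Return a copy of the environment with HG set for child processes."""
    # Copy so the parent's own environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["HG"] = hg_executable
    return env


env = child_environment("/usr/bin/hg", {"PATH": "/bin"})
```

The resulting dict can be passed as the env argument of subprocess.Popen, so a
child script could run "$HG version".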
Matt Harbison <matt_harbison@yahoo.com> [Wed, 26 Sep 2018 22:21:25 -0400] rev 39884
py3: mask out None type when printing in `debuglocks`
Apparently, %b doesn't allow None.
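A small sketch of the failure mode and one way to mask it. The function
format_holder and the b"(unknown)" placeholder are hypothetical and are not
taken from the debuglocks code.

```python
def format_holder(host, pid):
    """Format a lock holder as bytes, masking None before %-formatting."""
    # On Python 3, bytes %-formatting rejects None: b"%s" % None raises
    # TypeError, so substitute a bytes placeholder first.
    if host is None:
        host = b"(unknown)"
    return b"%s:%d" % (host, pid)


line = format_holder(None, 1234)
```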
Matt Harbison <matt_harbison@yahoo.com> [Wed, 26 Sep 2018 21:25:18 -0400] rev 39883
py3: ensure standard exceptions use `str` type strings in windows.py
See also edaa40dc5fe5.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 26 Sep 2018 20:49:28 -0400] rev 39882
py3: replace a StandardError reference
This doesn't exist on py3, and the standard way of handling this seems to be to
catch both exceptions.
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 15:19:52 -0700] rev 39881
storageutil: extract revision number iteration
This code is a bit quirky (and possibly buggy). It will likely be used
by multiple storage backends. Let's extract it so it is reusable.
Differential Revision: https://phab.mercurial-scm.org/D4757
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 14:54:28 -0700] rev 39880
storageutil: new function for extracting metadata-less content from text
Other storage backends will want to do this.
I'm not concerned about Python function call overhead because I
expect self.revision() to dwarf the function call overhead time,
since self.revision() requires multiple function calls and may
involve decompression in the common case.
Differential Revision: https://phab.mercurial-scm.org/D4756
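A sketch of what extracting metadata-less content can look like, assuming the
metadata header is delimited by a pair of b"\x01\n" markers at the start of
the text. The function name strip_metadata is hypothetical and this is not
the code from the commit.

```python
META_MARKER = b"\x01\n"


def strip_metadata(text):
    """Return text without its leading metadata header, if it has one."""
    if not text.startswith(META_MARKER):
        return text
    # The header ends at the second marker; content follows it.
    end = text.index(META_MARKER, len(META_MARKER))
    return text[end + len(META_MARKER):]


content = strip_metadata(b"\x01\ncopy: old.txt\n\x01\nfile body\n")
```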
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 14:33:45 -0700] rev 39879
storageutil: move _censoredtext() from revlog
This seems like generic functionality we'll want to use from
non-revlog storage backends.
Differential Revision: https://phab.mercurial-scm.org/D4755
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 14:31:31 -0700] rev 39878
storageutil: move metadata parsing and packing from revlog (API)
Parsing and writing of revision text metadata is likely identical
across storage backends. Let's move the code out of revlog so we
don't need to import the revlog module in order to use it.
Differential Revision: https://phab.mercurial-scm.org/D4754
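For readers unfamiliar with the metadata framing, here is a minimal sketch of
packing and parsing. It assumes "key: value" lines enclosed between two
b"\x01\n" markers ahead of the revision text. The names mirror the concept
but are written for illustration and should not be read as the module's exact
code.

```python
META_MARKER = b"\x01\n"


def packmeta(meta, text):
    """Prefix text with a metadata header built from a bytes dict."""
    lines = b"".join(b"%s: %s\n" % (k, meta[k]) for k in sorted(meta))
    return META_MARKER + lines + META_MARKER + text


def parsemeta(text):
    """Return (meta dict, offset of content), or (None, None) if no header."""
    if not text.startswith(META_MARKER):
        return None, None
    end = text.index(META_MARKER, len(META_MARKER))
    meta = {}
    for line in text[len(META_MARKER):end].splitlines():
        key, value = line.split(b": ", 1)
        meta[key] = value
    return meta, end + len(META_MARKER)


packed = packmeta({b"copy": b"old.txt", b"copyrev": b"0" * 40}, b"body")
meta, offset = parsemeta(packed)
```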
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 14:23:54 -0700] rev 39877
storageutil: new module for storage primitives (API)
There will exist common code between storage backends. It would
be nice to have a central place to put that code.
This commit attempts to create that place by creating the
"storageutil" module.
The first thing we move is revlog.hash(), which is the function for
computing the SHA-1 hash of revision fulltext and parents.
Differential Revision: https://phab.mercurial-scm.org/D4753
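As an illustration of hashing a revision's fulltext together with its
parents, the sketch below feeds the two parent nodes in sorted order and then
the text into SHA-1. The function name hashrevision and the NULLID constant
are chosen here for illustration, and the parent ordering is stated as an
assumption rather than quoted from the moved code.

```python
import hashlib

NULLID = b"\0" * 20  # assumed placeholder for a missing parent


def hashrevision(text, p1, p2):
    """Return the SHA-1 digest of the sorted parents followed by the text."""
    # Sorting makes the result independent of parent order.
    first, second = sorted([p1, p2])
    digest = hashlib.sha1(first)
    digest.update(second)
    digest.update(text)
    return digest.digest()


root = hashrevision(b"hello\n", NULLID, NULLID)
child = hashrevision(b"hello world\n", root, NULLID)
```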
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 13:35:50 -0700] rev 39876
filelog: stop proxying deltaparent() (API)
deltaparent() obtains the revision number of the base revision a
delta in storage is stored against. It is highly revlog-centric and
may not apply to other storage backends. As a result, it doesn't
belong on the generic file storage interface.
This method/proxy is no longer used in core. The last consumer was
probably changegroup code and went away with the transition to
emitrevisions().
Differential Revision: https://phab.mercurial-scm.org/D4751
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 12:49:17 -0700] rev 39875
filelog: stop proxying rawsize() (API)
This method is no longer used by external consumers. The API is
quite low-level and is effectively len(revision(raw=True)). I don't
see a compelling reason to keep it around.
Let's drop the API and make the file storage interface simpler.
Differential Revision: https://phab.mercurial-scm.org/D4750
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 12:42:03 -0700] rev 39874
filelog: stop proxying "opener" (API)
The last consumer of it in upgrade code was removed as part of the
previous commit. This attribute is revlog specific (because it
assumes the existence of a vfs for performing I/O on tracked file
data) and therefore isn't appropriate for a generic storage interface.
So nuke it.
Differential Revision: https://phab.mercurial-scm.org/D4749
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 11:16:33 -0700] rev 39873
filelog: stop proxying flags() (API)
Per-revision storage flags are kinda a revlog-centric API. (Except for
the fact that changegroup uses the same integer flags as revlog does
and there's minimal verification that the server's flags map to the
client's storage flags - but that's another problem.)
The last user of flags() was in verify.py and that code was just moved
into revlog.py and is accessed behind the verifyintegrity() file
storage API.
Since there are no more consumers, let's drop the proxy and remove
the method from the file storage interface.
This commit only drops the dedicated API for reading a single
revision's storage flags: we still support reading and writing flags
through the bulk data retrieval and add revision APIs. And since
changegroups encode revlog integer flags over the wire, we'll always
need to support flags at some level. The removal of individual storage
flags may be premature. But since flags() is now unused, I'd like
to see how far we can get without that dedicated API - especially
since it uses revision numbers instead of nodes.
Differential Revision: https://phab.mercurial-scm.org/D4746
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 11:27:47 -0700] rev 39872
revlog: move revision verification out of verify
File revision verification is performing low-level checks of file
storage, namely that flags are appropriate and revision data can
be resolved.
Since these checks are somewhat revlog-specific and may not
be appropriate for alternate storage backends, this commit moves
those checks from verify.py to revlog.py.
Because we're now emitting warnings/errors that apply to specific
revisions, we taught the iverifyproblem interface to expose the
problematic node and to report this node in verify output. This
was necessary to prevent unwanted test changes.
After this change, revlog.verifyintegrity() and file verify code
in verify.py both iterate over revisions and resolve their fulltext.
But they do so in separate loops. (verify.py needs to resolve
fulltexts as part of calling renamed() - at least when using revlogs.)
This should add overhead.
But on the mozilla-unified repo:
$ hg verify
before: time: real 700.640 secs (user 585.520+0.000 sys 23.480+0.000)
after: time: real 682.380 secs (user 570.370+0.000 sys 22.240+0.000)
I'm not sure what's going on. Maybe avoiding the filelog attribute
proxies shaved off enough time to offset the losses? Maybe fulltext
resolution has less overhead than I thought?
I've left a comment indicating the potential for optimization. But
because it doesn't produce a performance regression on a large
repository, I'm not going to worry about it.
Differential Revision: https://phab.mercurial-scm.org/D4745
Martin von Zweigbergk <martinvonz@google.com> [Wed, 26 Sep 2018 12:06:44 -0700] rev 39871
tests: de-flake test-narrow-debugrebuilddirstate.t
If the dirstate gets written much later (usually 1-2 s, depending on
FS) than the working copy file (there's only one), then the `hg
debugdirstate` command will include a timestamp. There's nothing wrong
with that, so we should just allow it.
Differential Revision: https://phab.mercurial-scm.org/D4758
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 12:39:34 -0700] rev 39870
upgrade: use storageinfo() for obtaining storage metadata
Let's switch to our new API for obtaining information about storage.
This eliminates the last consumer of rawsize() and the opener proxy
from the file storage interface!
Differential Revision: https://phab.mercurial-scm.org/D4748
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 11:56:48 -0700] rev 39869
revlog: add method for obtaining storage info (API)
We currently have a handful of methods on the file and manifest
storage interfaces for obtaining metadata about storage. e.g.
files() is used to obtain the files backing storage. rawsize()
is to quickly compute the size of tracked revisions without resolving
their fulltext.
Code in upgrade and stream clone makes heavy use of these methods.
The existing APIs are generic and don't necessarily have the
specialization that we need going forward. For example, files()
doesn't distinguish between exclusive storage and shared storage.
This makes stream clone difficult to implement when e.g. there may
be a single file backing storage for multiple tracked paths. It
also makes reporting difficult, as we don't know how many bytes are
actually used by storage since we can't easily identify shared files.
This commit implements a new method for obtaining storage metadata.
It is designed to accept arguments specifying what metadata to request
and to return a dict with those fields populated. We /could/ make
each of these attributes a separate method. But this is a specialized
API and I'm trying to avoid method bloat on the interfaces. There is
also the possibility that certain callers will want to obtain multiple
fields in different combinations and some backends may have performance
issues obtaining all that data via separate method calls.
Simple storage integration tests have been added. For now, we assume
fields can't be "None" (ignoring the interface documentation). We can
revisit this later.
Differential Revision: https://phab.mercurial-scm.org/D4747
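The "arguments in, dict out" shape described above could look roughly like
this. ToyStorage and the field names exclusivefiles, revisionscount and
trackedsize are assumptions made for the sketch. They are not a statement of
the real interface.

```python
class ToyStorage:
    """Toy storage backend used only to illustrate the calling convention."""

    def __init__(self, revisions, files):
        self._revisions = list(revisions)
        self._files = list(files)

    def storageinfo(self, exclusivefiles=False, revisionscount=False,
                    trackedsize=False):
        """Return a dict holding only the fields the caller asked for."""
        info = {}
        if exclusivefiles:
            info["exclusivefiles"] = list(self._files)
        if revisionscount:
            info["revisionscount"] = len(self._revisions)
        if trackedsize:
            # Sum of fulltext sizes across all tracked revisions.
            info["trackedsize"] = sum(len(r) for r in self._revisions)
        return info


storage = ToyStorage([b"ab", b"cde"], ["data/foo.i"])
info = storage.storageinfo(revisionscount=True, trackedsize=True)
```

A single call lets a backend compute several fields in one pass, which is the
motivation given above for not splitting them into separate methods.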
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 11:27:41 -0700] rev 39868
lfs: drop unused import
A recent change dropped the last user of this module.
Differential Revision: https://phab.mercurial-scm.org/D4744
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 10:08:58 -0700] rev 39867
filelog: drop _generaldelta attribute (API)
With changegroup moving to emitrevisions(), this revlog-specific
attribute is no longer used and can be deleted. Good riddance.
Differential Revision: https://phab.mercurial-scm.org/D4727
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 24 Sep 2018 09:59:19 -0700] rev 39866
revlog: drop emitrevisiondeltas() and associated functionality (API)
emitrevisions() is the future!
Differential Revision: https://phab.mercurial-scm.org/D4726
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 21 Sep 2018 18:47:04 -0700] rev 39865
changegroup: port to emitrevisions() (issue5976)
We now have a unified API for emitting revision data from a storage
backend. It handles sorting nodes and the complicated delta versus
revision decisions for us.
This commit ports changegroup to that API.
There should be no behavior changes for changegroups not using
ellipsis. And lack of test changes seems to confirm that.
There are some changes for ellipsis mode, however.
Before, when sending an ellipsis revision, we would always send a
fulltext revision (as opposed to a delta). There was a TODO tracking
this open item.
One of the things the emitrevisions() API does for us is figure out
whether we can safely emit a delta. So, it is now possible for
ellipsis revisions to be sent as deltas! (It does this by not
assuming parent/ancestor revisions are available and tracking which
revisions have been sent out.)
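A toy model of that decision, written for illustration only: plan_emission
and its assume_have_parents flag are hypothetical and far simpler than the
real API. A revision goes out as a delta only when its base is known to be on
the receiving side, either because it was already sent or because the caller
says parents can be assumed.

```python
def plan_emission(revisions, assume_have_parents=False):
    """Decide, per (node, parent) pair, between a delta and a fulltext."""
    sent = set()
    plan = []
    for node, parent in revisions:
        base_known = parent is not None and (
            assume_have_parents or parent in sent)
        if base_known:
            plan.append((node, "delta", parent))
        else:
            plan.append((node, "fulltext", None))
        # Once emitted, this node can serve as a delta base later on.
        sent.add(node)
    return plan


ellipsis_plan = plan_emission([("b", "a"), ("c", "b")])
normal_plan = plan_emission([("b", "a"), ("c", "b")],
                            assume_have_parents=True)
```

In the first call "b" must be a fulltext because "a" was never sent, while
"c" can be a delta against "b".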
Because we eliminated the list of revision delta request objects,
performance has improved substantially:
$ hg perfchangegroupchangelog
before: ! wall 24.348077 comb 24.330000 user 24.140000 sys 0.190000 (best of 3)
after: ! wall 18.245911 comb 18.240000 user 18.100000 sys 0.140000 (best of 3)
That's a lot of overhead for creating a few hundred thousand Python
objects!
This is still a little slower than 4.7. Probably due to 23d582ca
introducing a type for the revision/delta results. There is
potentially room to optimize. But at some point we need to abstract
storage in order to support alternate storage backends. Unfortunately
that means using a Python data structure to represent results. And
unfortunately there is overhead with every new Python object created.
Differential Revision: https://phab.mercurial-scm.org/D4725