Martin von Zweigbergk <martinvonz@google.com> [Wed, 03 Oct 2018 10:27:44 -0700] rev 40029
cleanup: some Yoda conditions, this patch removes
It seems the factor 20 is less than the frequency of " < \d" compared
to " \d > ".
Differential Revision: https://phab.mercurial-scm.org/D4862
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 12:43:54 -0700] rev 40028
streamclone: don't support stream clone unless repo feature present
This change means custom repository types must opt in to enabling
stream clone. This seems reasonable, as stream clones are a very
low-level feature that has historically assumed the use of revlogs
and the layout of .hg/ that they entail.
Differential Revision: https://phab.mercurial-scm.org/D4853
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 12:40:39 -0700] rev 40027
localrepo: add repository feature when repo can be stream cloned
Right now, the wire protocol server assumes all repository objects can
be stream cloned (unless the stream clone feature is disabled via
config option).
But not all storage backends or repository objects may support stream
clone.
This commit defines a repository feature denoting whether stream clone
is supported. The feature is defined for revlog-based repositories,
which should currently be "all repositories."
Differential Revision: https://phab.mercurial-scm.org/D4852
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:08:08 -0700] rev 40026
wireprotov2: client support for following content redirects
And with the server actually sending content redirects, it is finally
time to implement client support for following them!
When a redirect response is seen, we wait until all data for that
request has been received (it should be nearly immediate since no
data is expected to follow the redirect message). Then we use
a URL opener to make a request. We stuff that response into the
client handler and construct a new response object to track it.
When readdata() is called for servicing requests, we attempt to
read data from the first redirected response. During data reading,
data is processed similarly to as if it came from a frame payload.
The existing test for the functionality demonstrates the client
transparently following the redirect and obtaining the command
response data from an alternate URL!
There is still plenty of work to do here, including shoring up
testing. I'm not convinced things will work in the presence of
multiple redirect responses. And we don't yet implement support
for integrity verification or configuring server certificates
to validate the connection. But it's a start. And it should enable
us to start experimenting with "real" caches.
Differential Revision: https://phab.mercurial-scm.org/D4778
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:07:55 -0700] rev 40025
wireprotov2: server support for sending content redirects
A "content redirect" can be sent in place of inline response content.
In terms of code, we model a content redirect as a special type of
response object holding the attributes describing that redirect.
Sending a content redirect thus becomes as simple as the object
emission layer sending an instance of that type. A cacher using
externally-addressable content storage could replace the outgoing
object stream with an object advertising its location.
The bulk of the code in this commit is teaching the output layer
which handles the object stream to recognize alternate location
objects. The rules are that if an alternate location object is
present, it must be the first and only object in the object stream.
Otherwise the server emits an error.
Differential Revision: https://phab.mercurial-scm.org/D4777
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 15:02:19 -0700] rev 40024
wireprotov2: client support for advertising redirect targets
With the server now able to emit a redirect target descriptor, we can
start to teach the client to recognize it.
This commit implements support for filtering the advertised
redirect targets against supported features and for advertising
compatible redirect targets as part of command requests. It also
adds the minimal boilerplate required to fail when a content
redirect is seen.
The server doesn't yet do anything with the advertised redirect
targets. And the client can't yet follow redirects if it did. But
at least we're putting bytes on the wire.
Differential Revision: https://phab.mercurial-scm.org/D4776
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:46:48 -0700] rev 40023
wireprotov2: advertise redirect targets in capabilities
This is pretty straightforward.
Redirect targets will require an extension to support. So we've added
a function that can be wrapped to define redirect targets.
To test this, we teach our simple cache test extension to read
redirect targets from a file. It's a bit hacky. But it gets the
job done.
Differential Revision: https://phab.mercurial-scm.org/D4775
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:02:06 -0700] rev 40022
wireprotov2: define semantics for content redirects
When I implemented the clonebundles feature and deployed it on
hg.mozilla.org using Amazon S3 as a content server, server-side CPU
and bandwidth usage dropped off a cliff and a ton of server scaling
headaches went away pretty much the instant clients with support for
clonebundles were rolled out to Firefox CI.
An obvious takeaway from that experience was that offloading server
load to scalable file servers - potentially backed by a CDN - is a
really good idea. Another takeaway was that Mercurial's wire protocol
wasn't in a good position to support data offload generally.
In wire protocol version 1, there isn't a mechanism in the protocol to
say "grab the data from over here instead." For HTTP, we could teach
the client to follow HTTP redirects. Or we could invent a media type
that encoded redirects inline. But for SSH, we were pretty much out of
luck because that protocol wasn't very flexible.
Wire protocol version 2 offers the opportunity to do something better.
The recent generic server-side content caching layer in the wire
protocol version 2 server demonstrated that it is possible to have
drop-in caching of responses to command requests. This by itself
adds tons of value and already makes the built-in server much more
scalable. But I don't want to stop there.
The existing server-side caching implementation has a big weakness:
it requires the server to send data to the client. This means that
the Mercurial server is potentially sending gigabytes of data to
thousands of clients. This is problematic because compared to scaling
static file servers, scaling dynamic servers is *hard*.
A solution to this is to "offload" serving of content to something
that isn't the Mercurial server. By offloading content serving, you
turn the Mercurial server from a centralized monolithic service to
a distributed mostly-indexing service. Assuming high rates of content
offload, this should drastically reduce the total work performed by
the Mercurial server, both in terms of CPU and data transfer. This
will make Mercurial servers vastly easier to scale.
This commit defines the semantics for "content redirects" in wire
protocol version 2. Essentially:
* Servers advertise the set of locations a response could be served
from.
* When making requests, clients advertise the set of locations they
are willing to fetch content from.
* Servers can then replace the inline response with one that says
"get the response from over here instead."
This feature - when fully implemented - will allow extending the
server-side caching layer to facilitate such things as integrating
your server-side cache with a scalable blob store (such as S3 or
a CDN) and offloading most data transfer to that external service.
This feature could also be leveraged for load balancing. e.g.
requests could come into a central server and then get redirected
to an available mirror depending on server availability or locality.
There's tons of potential :)
Differential Revision: https://phab.mercurial-scm.org/D4774
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:16:56 -0700] rev 40021
wireprotov2: support response caching
One of the things I've learned from managing VCS servers over the
years is that they are hard to scale. It is well known that some
companies have very beefy (read: very expensive) servers to power
their VCS needs. It is also known that specialized servers for
various VCS exist in order to facilitate scaling servers. (Mercurial
is in this boat.)
One of the aspects that make a VCS server hard to scale is the
high CPU load incurred by constant client clone/pull operations.
To alleviate the scaling pain associated with data retrieval
operations, I want to integrate caching into the Mercurial wire
protocol server as robustly as possible such that servers can
aggressively cache responses and defer as much server load as
possible.
This commit represents the initial implementation of a general
caching layer in wire protocol version 2.
We define a new interface and behavior for a wire protocol cacher
in repository.py. (This is probably where a reviewer should look
first to understand what is going on.)
The bulk of the added code is in wireprotov2server.py, where we
define how a command can opt in to being cached and integrate
caching into command dispatching.
From a very high-level:
* A command can declare itself as cacheable by providing a callable
that can be used to derive a cache key.
* At dispatch time, if a command is cacheable, we attempt to
construct a cacher and use it for serving the request and/or
caching the request.
* The dispatch layer handles the bulk of the business logic for
caching, making cachers mostly "dumb content stores."
* The mechanism for invalidating cached entries (one of the harder
parts about caching in general) is by varying the cache key when
state changes. As such, cachers don't need to be concerned with
cache invalidation.
Initially, we've hooked up support for caching "manifestdata" and
"filedata" commands. These are the simplest to cache, as they should
be immutable over time. Caching of commands related to changeset
data is a bit harder (because cache validation is impacted by
changes to bookmarks, phases, etc). This will be implemented later.
(Strictly speaking, censoring a file should invalidate caches. I've
added an inline TODO to track this edge case.)
To prove it works, this commit implements a test-only extension
providing in-memory caching backed by an lrucachedict. A new test
showing this extension behaving properly is added. FWIW, the
cacher is ~50 lines of code, demonstrating the relative ease with
which a cache can be added to a server.
While the test cacher is not suitable for production workloads, just
for kicks I performed a clone of just the changeset and manifest data
for the mozilla-unified repository. With a fully warmed cache (of just
the manifest data since changeset data is not cached), server-side
CPU usage dropped from ~73s to ~28s. That's pretty significant and
demonstrates the potential that response caching has on server
scalability!
Differential Revision: https://phab.mercurial-scm.org/D4773
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:16:27 -0700] rev 40020
wireprotov2: define type to represent pre-encoded object
An upcoming commit will introduce a caching layer to command
serving. This will require the ability to cache pre-encoded data.
This commit introduces a type to represent pre-encoded data and
teaches the output layer to not CBOR encode an instance of that
type.
Differential Revision: https://phab.mercurial-scm.org/D4772
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 15:53:49 -0700] rev 40019
wireprotov2: change name and behavior of readframe()
In the near future, we will want to support performing I/O from
other sources. Let's rename readframe() to readdata() and tweak
its logic to support future growth.
Differential Revision: https://phab.mercurial-scm.org/D4771
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 16:07:59 -0700] rev 40018
url: move _wraphttpresponse() from httpeer
This is a generally useful function. Having it on the url module
will make it more accessible outside of the HTTP peers.
Differential Revision: https://phab.mercurial-scm.org/D4770
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 14:54:15 -0700] rev 40017
debugcommands: print all CBOR objects
application/mercurial-cbor may contain multiple objects. Let's print
all of them.
Differential Revision: https://phab.mercurial-scm.org/D4769
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:48:19 +0900] rev 40016
help: document about "export" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:43:57 +0900] rev 40015
help: document about "config" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:34:18 +0900] rev 40014
help: document about "cat" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:38:49 +0900] rev 40013
help: document about "branches" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:32:18 +0900] rev 40012
help: document about "bookmarks" template keywords
Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:27:45 +0900] rev 40011
help: document about "annotate" template keywords
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:34:53 -0700] rev 40010
storageutil: pass nodes into emitrevisions()
The main emitrevisions() uses nodes. So it makes sense to use
nodes for the helper API.
Not bothering with API since this function was introduced a few
commits ago.
Differential Revision: https://phab.mercurial-scm.org/D4805
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:16:09 -0700] rev 40009
storageutil: make all callables optional
Not all storage backends may implement these callables. That's part
of the reason these methods aren't exposed on the storage interface.
Differential Revision: https://phab.mercurial-scm.org/D4804
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:16:22 -0700] rev 40008
storageutil: extract most of emitrevisions() to standalone function
As part of implementing a storage backend, I found myself copying
most of revlog.emitrevisions(). This code is highly nuanced and it
bothered me greatly to be copying such low-level code.
This commit extracts the bulk of revlog.emitrevisions() into a
new standalone function. In order to make the function generally
usable, all "self" function calls that aren't exposed on the
ifilestorage interface are passed in via callable arguments.
No meaningful behavior should have changed as part of the port.
Upcoming commits will tweak behavior to make the code more
generically usable.
Differential Revision: https://phab.mercurial-scm.org/D4803
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:51:17 -0700] rev 40007
storageutil: invert logic of file data comparison
IMO things make more sense when the function is explicitly a test
for file data equivalence.
Not bothering with API since the function was introduced by the
previous commit.
Differential Revision: https://phab.mercurial-scm.org/D4802
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:47:53 -0700] rev 40006
storageutil: extract filelog.cmp() to a standalone function
As part of implementing an alternate storage backend, I found myself
reimplementing this code.
With a little massaging, we can extract filelog.cmp() to a standalone
function.
As part of this, the call to revlog.cmp() was inlined (it is just a
2-line function).
I also tweaked some variable names to improve readability. I'll
further tweak names in a subsequent commit.
Differential Revision: https://phab.mercurial-scm.org/D4801
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:37:49 -0700] rev 40005
storageutil: extract copy metadata retrieval out of filelog
As part of implementing an alternate storage backend, I found myself
reinventing this wheel.
Let's create a utility function for doing the work.
Differential Revision: https://phab.mercurial-scm.org/D4800
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:29:05 -0700] rev 40004
storageutil: extract functionality for resolving strip revisions
I found myself having to copy this method as part of implementing a new
storage backend. With a little tweaking, we can extract it to a
standalone function so it can be reused easily.
We'll likely want to implement a better method for removing revisions
on the storage interface someday. But until then, let's use what we
have.
Differential Revision: https://phab.mercurial-scm.org/D4799
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:16:44 -0700] rev 40003
storageutil: consistently raise LookupError (API)
The interface docs say this is supposed to raise LookupError on
failure. But for invalid revision number input, it could raise
IndexError because ifileindex.node() is documented to raise
IndexError.
lookup() for files isn't used that much (pretty much just in
basefilectx in core AFAICT). And callers should already be catching
LookupError. So I don't anticipate that much fallout from this
change.
Differential Revision: https://phab.mercurial-scm.org/D4798
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:03:17 -0700] rev 40002
storageutil: implement file identifier resolution method (BC)
revlog.lookup() has a number of advanced features, including partial
node matching.
These advanced features aren't needed for file id lookup because file
identifiers are almost always from internal sources. (An exception to
this is hgweb, which appears to have some code paths that attempt
to resolve a user-supplied string to a node.)
This commit implements a new function for resolving file identifiers
to nodes. It behaves like revlog.lookup() but without the
advanced features.
Tests reveal behavior changes:
* Partial hex nodes are no longer resolved to nodes.
* "-1" now returns nullid instead of raising LookupError.
* "0" on an empty store now raises LookupError instead of returning
nullid.
I'm not sure why "-1" wasn't recognized before. But it seems reasonable
to accept it as a value since integer -1 resolves to nullid.
These changes all seem reasonable to me. And with the exception of
partial hex node matching, we may want to consider changing
revlog.lookup() as well.
Differential Revision: https://phab.mercurial-scm.org/D4797
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:00:20 -0700] rev 40001
testing: add more testing for ifileindex.lookup()
The tests demonstrate some... questionable behavior of revlog.lookup().
Differential Revision: https://phab.mercurial-scm.org/D4796
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 10:20:37 -0700] rev 40000
dagop: extract DAG local heads functionality from revlog
As part of implementing an alternate storage backend, I found myself
having to reimplement this function.
The DAG traversal logic is generic and can be factored out into a
standalone function.
We could probably combine this with headrevs(). But I'll leave that
for another time.
Differential Revision: https://phab.mercurial-scm.org/D4795