Mercurial: Changelog

Mercurial Mercurial > hg / changelog

Wed, 03 Oct 2018 13:54:31 -0700 exchangev2: add progress bar around manifest scanning

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 13:54:31 -0700] rev 40035

exchangev2: add progress bar around manifest scanning This can take a long time on large repositories. Let's add a progress bar so we don't have long periods where it isn't obvious what is going on. Differential Revision: https://phab.mercurial-scm.org/D4859

Mon, 01 Oct 2018 13:17:38 -0700 httppeer: report http statistics

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 13:17:38 -0700] rev 40034

httppeer: report http statistics Now that keepalive.py records HTTP request count and the number of bytes sent and received as part of performing those requests, we can easily print a report on the activity when closing a peer instance! Exact byte counts are globbed in tests because they are influenced by non-deterministic things, such as hostnames and port numbers. Plus, the exact byte count isn't too important anyway. I feel obliged to note that printing the byte count could have security implications. e.g. if sending a password via HTTP basic auth, the length of that password will influence the byte count and the reporting of the byte count could be a side-channel leak of the password length. I /think/ this is beyond our threshold for concern. But if we think it poses a problem, we can teach the byte count logging code to e.g. ignore sensitive HTTP request headers. We could also consider not reporting the byte count of request headers altogether. But since the wire protocol uses HTTP headers for sending command arguments, it is kind of important to report their size. Differential Revision: https://phab.mercurial-scm.org/D4858

Mon, 01 Oct 2018 12:30:32 -0700 keepalive: track number of bytes received from an HTTP response

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 12:30:32 -0700] rev 40033

keepalive: track number of bytes received from an HTTP response We also bubble the byte count up to the HTTPConnection instance and its parent opener at read time. Unlike sending, there isn't a clear "end of response" signal we can intercept to defer updating the accounting. Differential Revision: https://phab.mercurial-scm.org/D4857

Mon, 01 Oct 2018 12:02:54 -0700 keepalive: track request count and bytes sent

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 12:02:54 -0700] rev 40032

keepalive: track request count and bytes sent I want wire protocol interactions to report the number of requests made and bytes transferred. This commit teaches the very low-level custom HTTPConnection class to track the number of bytes sent to the socket. This may vary from the number of bytes that go on the wire due to e.g. TLS. That's OK. KeepAliveHandler is taught to track the total number of requests and total number of bytes sent across all requests. Differential Revision: https://phab.mercurial-scm.org/D4856

Mon, 01 Oct 2018 12:06:36 -0700 url: have httpsconnection inherit from our custom HTTPConnection

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Mon, 01 Oct 2018 12:06:36 -0700] rev 40031

url: have httpsconnection inherit from our custom HTTPConnection This will ensure that any customizations we perform to HTTPConnection will be available to httpsconnection. Differential Revision: https://phab.mercurial-scm.org/D4855

Wed, 03 Oct 2018 09:43:01 -0700 cborutil: change buffering strategy

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 03 Oct 2018 09:43:01 -0700] rev 40030

cborutil: change buffering strategy Profiling revealed that we were spending a lot of time on the line that was concatenating the old buffer with the incoming data when attempting to decode long byte strings, such as manifest revisions. Essentially, we were feeding N chunks of size len(X) << len(Y) into decode() and continuously allocating a new, larger buffer to hold the undecoded input. This created substantial memory churn and slowed down execution. Changing the code to aggregate pending chunks in a list until we have enough data to fully decode the next atom makes things much more efficient. I don't have exact data, but I recall the old code spending >1s on manifest fulltexts from the mozilla-unified repo. The new code doesn't significantly appear in profile output. Differential Revision: https://phab.mercurial-scm.org/D4854

Wed, 03 Oct 2018 10:27:44 -0700 cleanup: some Yoda conditions, this patch removes

changeset

Martin von Zweigbergk <martinvonz@google.com> [Wed, 03 Oct 2018 10:27:44 -0700] rev 40029

cleanup: some Yoda conditions, this patch removes It seems the factor 20 is less than the frequency of " < \d" compared to " \d > ". Differential Revision: https://phab.mercurial-scm.org/D4862

Tue, 02 Oct 2018 12:43:54 -0700 streamclone: don't support stream clone unless repo feature present

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 12:43:54 -0700] rev 40028

streamclone: don't support stream clone unless repo feature present This change means custom repository types must opt in to enabling stream clone. This seems reasonable, as stream clones are a very low-level feature that has historically assumed the use of revlogs and the layout of .hg/ that they entail. Differential Revision: https://phab.mercurial-scm.org/D4853

Tue, 02 Oct 2018 12:40:39 -0700 localrepo: add repository feature when repo can be stream cloned

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Tue, 02 Oct 2018 12:40:39 -0700] rev 40027

localrepo: add repository feature when repo can be stream cloned Right now, the wire protocol server assumes all repository objects can be stream cloned (unless the stream clone feature is disabled via config option). But not all storage backends or repository objects may support stream clone. This commit defines a repository feature denoting whether stream clone is supported. The feature is defined for revlog-based repositories, which should currently be "all repositories." Differential Revision: https://phab.mercurial-scm.org/D4852

Wed, 26 Sep 2018 18:08:08 -0700 wireprotov2: client support for following content redirects

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:08:08 -0700] rev 40026

wireprotov2: client support for following content redirects And with the server actually sending content redirects, it is finally time to implement client support for following them! When a redirect response is seen, we wait until all data for that request has been received (it should be nearly immediate since no data is expected to follow the redirect message). Then we use a URL opener to make a request. We stuff that response into the client handler and construct a new response object to track it. When readdata() is called for servicing requests, we attempt to read data from the first redirected response. During data reading, data is processed similarly to as if it came from a frame payload. The existing test for the functionality demonstrates the client transparently following the redirect and obtaining the command response data from an alternate URL! There is still plenty of work to do here, including shoring up testing. I'm not convinced things will work in the presence of multiple redirect responses. And we don't yet implement support for integrity verification or configuring server certificates to validate the connection. But it's a start. And it should enable us to start experimenting with "real" caches. Differential Revision: https://phab.mercurial-scm.org/D4778

Wed, 26 Sep 2018 18:07:55 -0700 wireprotov2: server support for sending content redirects

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:07:55 -0700] rev 40025

wireprotov2: server support for sending content redirects A "content redirect" can be sent in place of inline response content. In terms of code, we model a content redirect as a special type of response object holding the attributes describing that redirect. Sending a content redirect thus becomes as simple as the object emission layer sending an instance of that type. A cacher using externally-addressable content storage could replace the outgoing object stream with an object advertising its location. The bulk of the code in this commit is teaching the output layer which handles the object stream to recognize alternate location objects. The rules are that if an alternate location object is present, it must be the first and only object in the object stream. Otherwise the server emits an error. Differential Revision: https://phab.mercurial-scm.org/D4777

Wed, 26 Sep 2018 15:02:19 -0700 wireprotov2: client support for advertising redirect targets

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 15:02:19 -0700] rev 40024

wireprotov2: client support for advertising redirect targets With the server now able to emit a redirect target descriptor, we can start to teach the client to recognize it. This commit implements support for filtering the advertised redirect targets against supported features and for advertising compatible redirect targets as part of command requests. It also adds the minimal boilerplate required to fail when a content redirect is seen. The server doesn't yet do anything with the advertised redirect targets. And the client can't yet follow redirects if it did. But at least we're putting bytes on the wire. Differential Revision: https://phab.mercurial-scm.org/D4776

Wed, 26 Sep 2018 17:46:48 -0700 wireprotov2: advertise redirect targets in capabilities

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:46:48 -0700] rev 40023

wireprotov2: advertise redirect targets in capabilities This is pretty straightforward. Redirect targets will require an extension to support. So we've added a function that can be wrapped to define redirect targets. To test this, we teach our simple cache test extension to read redirect targets from a file. It's a bit hacky. But it gets the job done. Differential Revision: https://phab.mercurial-scm.org/D4775

Wed, 26 Sep 2018 18:02:06 -0700 wireprotov2: define semantics for content redirects

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 18:02:06 -0700] rev 40022

wireprotov2: define semantics for content redirects When I implemented the clonebundles feature and deployed it on hg.mozilla.org using Amazon S3 as a content server, server-side CPU and bandwidth usage dropped off a cliff and a ton of server scaling headaches went away pretty much the instant clients with support for clonebundles were rolled out to Firefox CI. An obvious takeaway from that experience was that offloading server load to scalable file servers - potentially backed by a CDN - is a really good idea. Another takeaway was that Mercurial's wire protocol wasn't in a good position to support data offload generally. In wire protocol version 1, there isn't a mechanism in the protocol to say "grab the data from over here instead." For HTTP, we could teach the client to follow HTTP redirects. Or we could invent a media type that encoded redirects inline. But for SSH, we were pretty much out of luck because that protocol wasn't very flexible. Wire protocol version 2 offers the opportunity to do something better. The recent generic server-side content caching layer in the wire protocol version 2 server demonstrated that it is possible to have drop-in caching of responses to command requests. This by itself adds tons of value and already makes the built-in server much more scalable. But I don't want to stop there. The existing server-side caching implementation has a big weakness: it requires the server to send data to the client. This means that the Mercurial server is potentially sending gigabytes of data to thousands of clients. This is problematic because compared to scaling static file servers, scaling dynamic servers is *hard*. A solution to this is to "offload" serving of content to something that isn't the Mercurial server. By offloading content serving, you turn the Mercurial server from a centralized monolithic service to a distributed mostly-indexing service. Assuming high rates of content offload, this should drastically reduce the total work performed by the Mercurial server, both in terms of CPU and data transfer. This will make Mercurial servers vastly easier to scale. This commit defines the semantics for "content redirects" in wire protocol version 2. Essentially: * Servers advertise the set of locations a response could be served from. * When making requests, clients advertise the set of locations they are willing to fetch content from. * Servers can then replace the inline response with one that says "get the response from over here instead." This feature - when fully implemented - will allow extending the server-side caching layer to facilitate such things as integrating your server-side cache with a scalable blob store (such as S3 or a CDN) and offloading most data transfer to that external service. This feature could also be leveraged for load balancing. e.g. requests could come into a central server and then get redirected to an available mirror depending on server availability or locality. There's tons of potential :) Differential Revision: https://phab.mercurial-scm.org/D4774

Wed, 26 Sep 2018 17:16:56 -0700 wireprotov2: support response caching

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:16:56 -0700] rev 40021

wireprotov2: support response caching One of the things I've learned from managing VCS servers over the years is that they are hard to scale. It is well known that some companies have very beefy (read: very expensive) servers to power their VCS needs. It is also known that specialized servers for various VCS exist in order to facilitate scaling servers. (Mercurial is in this boat.) One of the aspects that make a VCS server hard to scale is the high CPU load incurred by constant client clone/pull operations. To alleviate the scaling pain associated with data retrieval operations, I want to integrate caching into the Mercurial wire protocol server as robustly as possible such that servers can aggressively cache responses and defer as much server load as possible. This commit represents the initial implementation of a general caching layer in wire protocol version 2. We define a new interface and behavior for a wire protocol cacher in repository.py. (This is probably where a reviewer should look first to understand what is going on.) The bulk of the added code is in wireprotov2server.py, where we define how a command can opt in to being cached and integrate caching into command dispatching. From a very high-level: * A command can declare itself as cacheable by providing a callable that can be used to derive a cache key. * At dispatch time, if a command is cacheable, we attempt to construct a cacher and use it for serving the request and/or caching the request. * The dispatch layer handles the bulk of the business logic for caching, making cachers mostly "dumb content stores." * The mechanism for invalidating cached entries (one of the harder parts about caching in general) is by varying the cache key when state changes. As such, cachers don't need to be concerned with cache invalidation. Initially, we've hooked up support for caching "manifestdata" and "filedata" commands. These are the simplest to cache, as they should be immutable over time. Caching of commands related to changeset data is a bit harder (because cache validation is impacted by changes to bookmarks, phases, etc). This will be implemented later. (Strictly speaking, censoring a file should invalidate caches. I've added an inline TODO to track this edge case.) To prove it works, this commit implements a test-only extension providing in-memory caching backed by an lrucachedict. A new test showing this extension behaving properly is added. FWIW, the cacher is ~50 lines of code, demonstrating the relative ease with which a cache can be added to a server. While the test cacher is not suitable for production workloads, just for kicks I performed a clone of just the changeset and manifest data for the mozilla-unified repository. With a fully warmed cache (of just the manifest data since changeset data is not cached), server-side CPU usage dropped from ~73s to ~28s. That's pretty significant and demonstrates the potential that response caching has on server scalability! Differential Revision: https://phab.mercurial-scm.org/D4773

Wed, 26 Sep 2018 17:16:27 -0700 wireprotov2: define type to represent pre-encoded object

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 17:16:27 -0700] rev 40020

wireprotov2: define type to represent pre-encoded object An upcoming commit will introduce a caching layer to command serving. This will require the ability to cache pre-encoded data. This commit introduces a type to represent pre-encoded data and teaches the output layer to not CBOR encode an instance of that type. Differential Revision: https://phab.mercurial-scm.org/D4772

Wed, 26 Sep 2018 15:53:49 -0700 wireprotov2: change name and behavior of readframe()

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 15:53:49 -0700] rev 40019

wireprotov2: change name and behavior of readframe() In the near future, we will want to support performing I/O from other sources. Let's rename readframe() to readdata() and tweak its logic to support future growth. Differential Revision: https://phab.mercurial-scm.org/D4771

Wed, 26 Sep 2018 16:07:59 -0700 url: move _wraphttpresponse() from httpeer

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 16:07:59 -0700] rev 40018

url: move _wraphttpresponse() from httpeer This is a generally useful function. Having it on the url module will make it more accessible outside of the HTTP peers. Differential Revision: https://phab.mercurial-scm.org/D4770

Wed, 26 Sep 2018 14:54:15 -0700 debugcommands: print all CBOR objects

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Wed, 26 Sep 2018 14:54:15 -0700] rev 40017

debugcommands: print all CBOR objects application/mercurial-cbor may contain multiple objects. Let's print all of them. Differential Revision: https://phab.mercurial-scm.org/D4769

Wed, 03 Oct 2018 22:48:19 +0900 help: document about "export" template keywords

changeset

Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:48:19 +0900] rev 40016

help: document about "export" template keywords

Wed, 03 Oct 2018 22:43:57 +0900 help: document about "config" template keywords

changeset

Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:43:57 +0900] rev 40015

help: document about "config" template keywords

Wed, 03 Oct 2018 22:34:18 +0900 help: document about "cat" template keywords

changeset

Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:34:18 +0900] rev 40014

help: document about "cat" template keywords

Wed, 03 Oct 2018 22:38:49 +0900 help: document about "branches" template keywords

changeset

Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:38:49 +0900] rev 40013

help: document about "branches" template keywords

Wed, 03 Oct 2018 22:32:18 +0900 help: document about "bookmarks" template keywords

changeset

Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:32:18 +0900] rev 40012

help: document about "bookmarks" template keywords

Wed, 03 Oct 2018 22:27:45 +0900 help: document about "annotate" template keywords

changeset

Yuya Nishihara <yuya@tcha.org> [Wed, 03 Oct 2018 22:27:45 +0900] rev 40011

help: document about "annotate" template keywords

Fri, 28 Sep 2018 16:34:53 -0700 storageutil: pass nodes into emitrevisions()

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:34:53 -0700] rev 40010

storageutil: pass nodes into emitrevisions() The main emitrevisions() uses nodes. So it makes sense to use nodes for the helper API. Not bothering with API since this function was introduced a few commits ago. Differential Revision: https://phab.mercurial-scm.org/D4805

Fri, 28 Sep 2018 16:16:09 -0700 storageutil: make all callables optional

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:16:09 -0700] rev 40009

storageutil: make all callables optional Not all storage backends may implement these callables. That's part of the reason these methods aren't exposed on the storage interface. Differential Revision: https://phab.mercurial-scm.org/D4804

Fri, 28 Sep 2018 16:16:22 -0700 storageutil: extract most of emitrevisions() to standalone function

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 16:16:22 -0700] rev 40008

storageutil: extract most of emitrevisions() to standalone function As part of implementing a storage backend, I found myself copying most of revlog.emitrevisions(). This code is highly nuanced and it bothered me greatly to be copying such low-level code. This commit extracts the bulk of revlog.emitrevisions() into a new standalone function. In order to make the function generally usable, all "self" function calls that aren't exposed on the ifilestorage interface are passed in via callable arguments. No meaningful behavior should have changed as part of the port. Upcoming commits will tweak behavior to make the code more generically usable. Differential Revision: https://phab.mercurial-scm.org/D4803

Fri, 28 Sep 2018 11:51:17 -0700 storageutil: invert logic of file data comparison

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:51:17 -0700] rev 40007

storageutil: invert logic of file data comparison IMO things make more sense when the function is explicitly a test for file data equivalence. Not bothering with API since the function was introduced by the previous commit. Differential Revision: https://phab.mercurial-scm.org/D4802

Fri, 28 Sep 2018 11:47:53 -0700 storageutil: extract filelog.cmp() to a standalone function

changeset

Gregory Szorc <gregory.szorc@gmail.com> [Fri, 28 Sep 2018 11:47:53 -0700] rev 40006

storageutil: extract filelog.cmp() to a standalone function As part of implementing an alternate storage backend, I found myself reimplementing this code. With a little massaging, we can extract filelog.cmp() to a standalone function. As part of this, the call to revlog.cmp() was inlined (it is just a 2-line function). I also tweaked some variable names to improve readability. I'll further tweak names in a subsequent commit. Differential Revision: https://phab.mercurial-scm.org/D4801

(0) -30000 -10000 -3000 -1000 -300 -100 -50 -30 +30 +50 +100 +300 +1000 +3000 +10000 tip

Mercurial

RSS Atom