comparison hgext/clonebundles.py @ 37498:aacfca6f9767

wireproto: support for pullbundles

Pullbundles are similar to clonebundles, but served as normal inline bundle streams. They are almost transparent to the client -- the only visible effect is that the client might get less changes than what it asked for, i.e. not all requested head revisions are provided.

The client announces support for the necessary retries with the partial-pull capability. After receiving a partial bundle, it updates the set of revisions shared with the server and drops all now-known heads from the request list. It will then rerun getbundle until no changes are received or all remote heads are present.

Extend badserverext to support per-socket limit, i.e. don't assume that the same limits should be applied to all sockets.

Differential Revision: https://phab.mercurial-scm.org/D1856
author Joerg Sonnenberger <joerg@bec.de>
date Thu, 18 Jan 2018 12:54:01 +0100
parents d25802b0eef5
children b4d85bc122bd
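
The client-side retry behaviour described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from this changeset: `pull_with_retries`, `fake_getbundle` and `HISTORY` are hypothetical names, changesets are modelled as plain strings, and the stand-in server simply hands out at most two missing changesets per request to imitate a partial reply.

```python
# Hypothetical linear history held by the stand-in server.
HISTORY = ["c1", "c2", "c3", "c4", "c5", "c6"]


def fake_getbundle(wanted_heads, known):
    """Stand-in for a server that may answer with a partial bundle.

    It ignores wanted_heads and returns at most two changesets the
    client does not have yet.
    """
    missing = [c for c in HISTORY if c not in known]
    return set(missing[:2])


def pull_with_retries(remote_heads, known, getbundle):
    """Rerun getbundle until nothing new arrives or all heads are known.

    Returns the final set of known changesets and the number of
    getbundle calls that were made.
    """
    known = set(known)
    # Only heads the client does not already have need to be requested.
    wanted = [h for h in remote_heads if h not in known]
    rounds = 0
    while wanted:
        received = set(getbundle(wanted, frozenset(known))) - known
        rounds += 1
        if not received:
            # No changes received: stop instead of looping forever.
            break
        # Update the set of revisions shared with the server and drop
        # all now-known heads from the request list.
        known |= received
        wanted = [h for h in wanted if h not in known]
    return known, rounds


if __name__ == "__main__":
    print(pull_with_retries(["c6"], {"c1"}, fake_getbundle))
```

The loop has two exit conditions, matching the two mentioned in the commit message: an empty reply, or an empty request list once every remote head is present locally.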
37497:1541e1a8e87d 37498:aacfca6f9767
4 """advertise pre-generated bundles to seed clones 4 """advertise pre-generated bundles to seed clones
5 5
6 "clonebundles" is a server-side extension used to advertise the existence 6 "clonebundles" is a server-side extension used to advertise the existence
7 of pre-generated, externally hosted bundle files to clients that are 7 of pre-generated, externally hosted bundle files to clients that are
8 cloning so that cloning can be faster, more reliable, and require less 8 cloning so that cloning can be faster, more reliable, and require less
9 resources on the server. 9 resources on the server. "pullbundles" is a related feature for sending
10 pre-generated bundle files to clients as part of pull operations.
10 11
11 Cloning can be a CPU and I/O intensive operation on servers. Traditionally, 12 Cloning can be a CPU and I/O intensive operation on servers. Traditionally,
12 the server, in response to a client's request to clone, dynamically generates 13 the server, in response to a client's request to clone, dynamically generates
13 a bundle containing the entire repository content and sends it to the client. 14 a bundle containing the entire repository content and sends it to the client.
14 There is no caching on the server and the server will have to redundantly 15 There is no caching on the server and the server will have to redundantly
15 generate the same outgoing bundle in response to each clone request. For 16 generate the same outgoing bundle in response to each clone request. For
16 servers with large repositories or with high clone volume, the load from 17 servers with large repositories or with high clone volume, the load from
17 clones can make scaling the server challenging and costly. 18 clones can make scaling the server challenging and costly.
18 19
19 This extension provides server operators the ability to offload potentially 20 This extension provides server operators the ability to offload
20 expensive clone load to an external service. Here's how it works. 21 potentially expensive clone load to an external service. Pre-generated
22 bundles also allow using more CPU intensive compression, reducing the
23 effective bandwidth requirements.
24
25 Here's how clone bundles work:
21 26
22 1. A server operator establishes a mechanism for making bundle files available 27 1. A server operator establishes a mechanism for making bundle files available
23 on a hosting service where Mercurial clients can fetch them. 28 on a hosting service where Mercurial clients can fetch them.
24 2. A manifest file listing available bundle URLs and some optional metadata 29 2. A manifest file listing available bundle URLs and some optional metadata
25 is added to the Mercurial repository on the server. 30 is added to the Mercurial repository on the server.
31 6. The client downloads and applies an available bundle from the 36 6. The client downloads and applies an available bundle from the
32 server-specified URL. 37 server-specified URL.
33 7. The client reconnects to the original server and performs the equivalent 38 7. The client reconnects to the original server and performs the equivalent
34 of :hg:`pull` to retrieve all repository data not in the bundle. (The 39 of :hg:`pull` to retrieve all repository data not in the bundle. (The
35 repository could have been updated between when the bundle was created 40 repository could have been updated between when the bundle was created
36 and when the client started the clone.) 41 and when the client started the clone.) This may use "pullbundles".
37 42
38 Instead of the server generating full repository bundles for every clone 43 Instead of the server generating full repository bundles for every clone
39 request, it generates full bundles once and they are subsequently reused to 44 request, it generates full bundles once and they are subsequently reused to
40 bootstrap new clones. The server may still transfer data at clone time. 45 bootstrap new clones. The server may still transfer data at clone time.
41 However, this is only data that has been added/changed since the bundle was 46 However, this is only data that has been added/changed since the bundle was
42 created. For large, established repositories, this can reduce server load for 47 created. For large, established repositories, this can reduce server load for
43 clones to less than 1% of original. 48 clones to less than 1% of original.
44 49
50 Here's how pullbundles work:
51
52 1. A manifest file listing available bundles and describing the revisions
53 is added to the Mercurial repository on the server.
54 2. A new-enough client informs the server that it supports partial pulls
55 and initiates a pull.
56 3. If the server has pull bundles enabled and sees the client advertising
57 partial pulls, it checks for a matching pull bundle in the manifest.
58 A bundle matches if the format is supported by the client, the client
59 has the required revisions already and needs something from the bundle.
60 4. If there is at least one matching bundle, the server sends it to the client.
61 5. The client applies the bundle and notices that the server reply was
62 incomplete. It initiates another pull.
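
The matching rule in step 3 can be sketched as below. This is a simplified illustration, not the server's implementation: it treats the client's known revisions as a flat set rather than walking the changeset graph, it reads "required revisions" as the bundle's bases, and `PullBundleEntry`, `matches` and `first_match` are hypothetical names. The paths and hash placeholders are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullBundleEntry:
    """One manifest entry, reduced to the fields needed for matching."""
    path: str
    bundlespec: str
    heads: frozenset
    bases: frozenset = frozenset()


def matches(entry, client_formats, client_known):
    """Apply the three conditions from step 3 to a single entry."""
    # The format must be supported by the client.
    if entry.bundlespec not in client_formats:
        return False
    # The client must already have the required revisions.
    if not entry.bases <= client_known:
        return False
    # The client must need something from the bundle.
    return bool(entry.heads - client_known)


def first_match(entries, client_formats, client_known):
    """Return the first matching entry in manifest order, or None."""
    for entry in entries:
        if matches(entry, client_formats, client_known):
            return entry
    return None


if __name__ == "__main__":
    full = PullBundleEntry("pullbundles/full.hg", "gzip-v2",
                           frozenset({"h3"}))
    incr = PullBundleEntry("pullbundles/incr.hg", "zstd-v2",
                           frozenset({"h5"}), frozenset({"h3"}))
    print(first_match([incr, full], {"gzip-v2"}, {"h1"}))
```

Returning `None` corresponds to the case where no pull bundle applies, in which case the server would answer the pull in the usual way.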
63
45 To work, this extension requires the following of server operators: 64 To work, this extension requires the following of server operators:
46 65
47 * Generating bundle files of repository content (typically periodically, 66 * Generating bundle files of repository content (typically periodically,
48 such as once per day). 67 such as once per day).
49 * A file server that clients have network access to and that Python knows 68 * Clone bundles: A file server that clients have network access to and that
50 how to talk to through its normal URL handling facility (typically an 69 Python knows how to talk to through its normal URL handling facility
51 HTTP server). 70 (typically an HTTP/HTTPS server).
52 * A process for keeping the bundles manifest in sync with available bundle 71 * A process for keeping the bundles manifest in sync with available bundle
53 files. 72 files.
54 73
55 Strictly speaking, using a static file hosting server isn't required: a server 74 Strictly speaking, using a static file hosting server isn't required: a server
56 operator could use a dynamic service for retrieving bundle data. However, 75 operator could use a dynamic service for retrieving bundle data. However,
59 78
60 Bundle files can be generated with the :hg:`bundle` command. Typically 79 Bundle files can be generated with the :hg:`bundle` command. Typically
61 :hg:`bundle --all` is used to produce a bundle of the entire repository. 80 :hg:`bundle --all` is used to produce a bundle of the entire repository.
62 81
63 :hg:`debugcreatestreamclonebundle` can be used to produce a special 82 :hg:`debugcreatestreamclonebundle` can be used to produce a special
64 *streaming clone bundle*. These are bundle files that are extremely efficient 83 *streaming clonebundle*. These are bundle files that are extremely efficient
65 to produce and consume (read: fast). However, they are larger than 84 to produce and consume (read: fast). However, they are larger than
66 traditional bundle formats and require that clients support the exact set 85 traditional bundle formats and require that clients support the exact set
67 of repository data store formats in use by the repository that created them. 86 of repository data store formats in use by the repository that created them.
68 Typically, a newer server can serve data that is compatible with older clients. 87 Typically, a newer server can serve data that is compatible with older clients.
69 However, *streaming clone bundles* don't have this guarantee. **Server 88 However, *streaming clone bundles* don't have this guarantee. **Server
71 streaming clone bundles incompatible with older Mercurial versions.** 90 streaming clone bundles incompatible with older Mercurial versions.**
72 91
73 A server operator is responsible for creating a ``.hg/clonebundles.manifest`` 92 A server operator is responsible for creating a ``.hg/clonebundles.manifest``
74 file containing the list of available bundle files suitable for seeding 93 file containing the list of available bundle files suitable for seeding
75 clones. If this file does not exist, the repository will not advertise the 94 clones. If this file does not exist, the repository will not advertise the
76 existence of clone bundles when clients connect. 95 existence of clone bundles when clients connect. For pull bundles,
96 ``.hg/pullbundles.manifest`` is used.
77 97
78 The manifest file contains a newline (\\n) delimited list of entries. 98 The manifest file contains a newline (\\n) delimited list of entries.
79 99
80 Each line in this file defines an available bundle. Lines have the format: 100 Each line in this file defines an available bundle. Lines have the format:
81 101
82 <URL> [<key>=<value>[ <key>=<value>]] 102 <URL> [<key>=<value>[ <key>=<value>]]
83 103
84 That is, a URL followed by an optional, space-delimited list of key=value 104 That is, a URL followed by an optional, space-delimited list of key=value
85 pairs describing additional properties of this bundle. Both keys and values 105 pairs describing additional properties of this bundle. Both keys and values
86 are URI encoded. 106 are URI encoded.
107
108 For pull bundles, the URL is a path under the ``.hg`` directory of the
109 repository.
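
As a rough illustration of the line format described above, a single manifest line could be parsed like this. `parse_manifest_line` is a hypothetical helper written for this example, and the URL and attribute values in the usage line are invented.

```python
from urllib.parse import unquote


def parse_manifest_line(line):
    """Split one manifest line into (url, attributes).

    Returns None for a blank line. Keys and values are URI-decoded,
    since the format stores both in URI-encoded form.
    """
    fields = line.split()
    if not fields:
        return None
    url = fields[0]
    attrs = {}
    for pair in fields[1:]:
        # Split on the first "=" only, so a value may itself contain "=".
        key, _, value = pair.partition("=")
        attrs[unquote(key)] = unquote(value)
    return url, attrs


if __name__ == "__main__":
    line = "https://example.com/full.hg BUNDLESPEC=gzip-v2 datacenter=us%2Deast"
    print(parse_manifest_line(line))
```

For a pull bundle manifest the first field would be a path under ``.hg`` rather than a URL; the parsing of the key=value pairs is the same.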
87 110
88 Keys in UPPERCASE are reserved for use by Mercurial and are defined below. 111 Keys in UPPERCASE are reserved for use by Mercurial and are defined below.
89 All non-uppercase keys can be used by site installations. An example use 112 All non-uppercase keys can be used by site installations. An example use
90 for custom properties is to use the *datacenter* attribute to define which 113 for custom properties is to use the *datacenter* attribute to define which
91 data center a file is hosted in. Clients could then prefer a server in the 114 data center a file is hosted in. Clients could then prefer a server in the
130 If this is defined, it is important to advertise a non-SNI fallback 153 If this is defined, it is important to advertise a non-SNI fallback
131 URL or clients running old Python releases may not be able to clone 154 URL or clients running old Python releases may not be able to clone
132 with the clonebundles facility. 155 with the clonebundles facility.
133 156
134 Value should be "true". 157 Value should be "true".
158
159 heads
160 Used for pull bundles. This contains the ``;`` separated changeset
161 hashes of the heads of the bundle content.
162
163 bases
164 Used for pull bundles. This contains the ``;`` separated changeset
165 hashes of the roots of the bundle content. This can be skipped if
166 the bundle was created without ``--base``.
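
A minimal sketch of decoding the ``heads`` and ``bases`` values, assuming they hold full hexadecimal changeset hashes and have already been URI-decoded into a dictionary of attributes. `parse_node_list` and `pull_bundle_nodes` are hypothetical names, and the hashes in the usage example are placeholders.

```python
def parse_node_list(value):
    """Turn a ';'-separated string of hex hashes into binary node ids.

    bytes.fromhex raises ValueError if an item is not valid hex.
    """
    return [bytes.fromhex(item) for item in value.split(";") if item]


def pull_bundle_nodes(attrs):
    """Return (heads, bases) for one pull bundle manifest entry.

    A missing ``bases`` key yields an empty list, which covers bundles
    created without --base.
    """
    heads = parse_node_list(attrs.get("heads", ""))
    bases = parse_node_list(attrs.get("bases", ""))
    return heads, bases


if __name__ == "__main__":
    attrs = {"heads": "aa" * 20 + ";" + "bb" * 20}
    heads, bases = pull_bundle_nodes(attrs)
    print(len(heads), len(bases))
```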
135 167
136 Manifests can contain multiple entries. Assuming metadata is defined, clients 168 Manifests can contain multiple entries. Assuming metadata is defined, clients
137 will filter entries from the manifest that they don't support. The remaining 169 will filter entries from the manifest that they don't support. The remaining
138 entries are optionally sorted by client preferences 170 entries are optionally sorted by client preferences
139 (``ui.clonebundleprefers`` config option). The client then attempts 171 (``ui.clonebundleprefers`` config option). The client then attempts