Mercurial > hg
view hgext/largefiles/proto.py @ 49658:523cacdfd324
delta-find: set the default candidate chunk size to 10
I ran performance and storage tests on repositories of various sizes and shapes
for the following values of the config : 5, 10, 20, 50, 100, no-chunking
The performance tests do not show any statistical impact on computation
times for large pushes and pulls.
For searching for an individual delta, this can provide a significant
performance improvement with a minor degradation of space-quality on the
result. (see data at the end of the commit).
For overall store size, the change :
- does not have any impact on many small repositories,
- has an observable, but very negligible impact on most larger repositories.
- One private repository we use for testing sees a small increase in size
(1%) in the narrower version.
We will try to get more numbers on a larger version of that repository to
make sure nothing pathological happens.
We pick "10" as the limit as "5" seems a bit more risky.
There are room to improve the current code, by using more aggressive filtering
and better (i.e any) sorting of the candidates. However this is already a large
improvement for pathological cases, with little impact in the common
situations.
The initial motivation for this change is to fix performance of delta
computation for a file where the previous code ended up testing 20 000 possible
candidate-bases in one go, which is… slow. This affected about ½ of the file
revisions leading to atrocious performance, especially during some push/pull
operations.
Details about individual delta finding timing:
----------------------------------------------
The vast majority of benchmark cases are unchanged but the three below. The first
two do not see any impact on the final delta. The last one sees a change in
delta-size that is negligible compared to the full text size.
### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
# benchmark.name = perf-delta-find
# benchmark.variants.rev = manifest-snapshot-many-tries-a (revision 756096)
∞: 5.844783
5: 4.473523 (-23.46%)
10: 4.970053 (-14.97%)
20: 5.770386 (-1.27%)
50 5.821358
100: 5.834887
MANIFESTLOG: rev = 756096: (no-limit)
delta-base = 301840
search-rounds = 6
try-count = 60
delta-type = snapshot
snap-depth = 7
delta-size = 179
MANIFESTLOG: rev=756096: (limit = 10)
delta-base=301840
search-rounds=9
try-count=51
delta-type=snapshot
snap-depth=7
delta-size=179
### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
# benchmark.name = perf-delta-find
# benchmark.variants.rev = manifest-snapshot-many-tries-d (revision 754060)
∞: 5.017663
5: 3.655931 (-27.14%)
10: 4.095436 (-18.38%)
20: 4.828949 (-3.76%)
50 4.987574
100: 4.994889
MANIFESTLOG: rev=754060: (no limit)
delta-base=301840
search-rounds=5
try-count=53
delta-type=snapshot
snap-depth=7
delta-size = 179
MANIFESTLOG: rev=754060: (limite = 10)
delta-base=301840
search-rounds=8
try-count=45
delta-type=snapshot
snap-depth=7
delta-size = 179
### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
# benchmark.name = perf-delta-find
# bin-env-vars.hg.flavor = rust
# benchmark.variants.rev = manifest-snapshot-many-tries-e (revision 693368)
∞: 4.869282
5: 2.039732 (-58.11%)
10: 2.413537 (-50.43%)
20: 4.449639 (-8.62%)
50 4.865863
100: 4.882649
MANIFESTLOG: rev=693368:
delta-base=693336
search-rounds=6
try-count=53
delta-type=snapshot
snap-depth=6
full-test-size=131065
delta-size=199
MANIFESTLOG: rev=693368:
delta-base=278023
search-rounds=5
try-count=21
delta-type=snapshot
snap-depth=4
full-test-size=131065
delta-size=278
Raw data for store size (in bytes) for various chunk size value below:
----------------------------------------------------------------------
440 134 384 5 pypy/.hg/store/
440 134 384 10 pypy/.hg/store/
440 134 384 20 pypy/.hg/store/
440 134 384 50 pypy/.hg/store/
440 134 384 100 pypy/.hg/store/
440 134 384 ... pypy/.hg/store/
666 987 471 5 netbsd-xsrc-2022-11-15/.hg/store/
666 987 471 10 netbsd-xsrc-2022-11-15/.hg/store/
666 987 471 20 netbsd-xsrc-2022-11-15/.hg/store/
666 987 471 50 netbsd-xsrc-2022-11-15/.hg/store/
666 987 471 100 netbsd-xsrc-2022-11-15/.hg/store/
666 987 471 ... netbsd-xsrc-2022-11-15/.hg/store/
852 844 884 5 netbsd-pkgsrc-2022-11-15/.hg/store/
852 844 884 10 netbsd-pkgsrc-2022-11-15/.hg/store/
852 844 884 20 netbsd-pkgsrc-2022-11-15/.hg/store/
852 844 884 50 netbsd-pkgsrc-2022-11-15/.hg/store/
852 844 884 100 netbsd-pkgsrc-2022-11-15/.hg/store/
852 844 884 ... netbsd-pkgsrc-2022-11-15/.hg/store/
1 504 227 981 5 netbeans-2018-08-01-sparse-zstd/.hg/store/
1 504 227 871 10 netbeans-2018-08-01-sparse-zstd/.hg/store/
1 504 227 813 20 netbeans-2018-08-01-sparse-zstd/.hg/store/
1 504 227 813 50 netbeans-2018-08-01-sparse-zstd/.hg/store/
1 504 227 813 100 netbeans-2018-08-01-sparse-zstd/.hg/store/
1 504 227 813 ... netbeans-2018-08-01-sparse-zstd/.hg/store/
3 875 801 068 5 netbsd-src-2022-11-15/.hg/store/
3 875 696 767 10 netbsd-src-2022-11-15/.hg/store/
3 875 696 757 20 netbsd-src-2022-11-15/.hg/store/
3 875 696 653 50 netbsd-src-2022-11-15/.hg/store/
3 875 696 653 100 netbsd-src-2022-11-15/.hg/store/
3 875 696 653 ... netbsd-src-2022-11-15/.hg/store/
4 531 441 314 5 mozilla-central/.hg/store/
4 531 435 157 10 mozilla-central/.hg/store/
4 531 432 045 20 mozilla-central/.hg/store/
4 531 429 119 50 mozilla-central/.hg/store/
4 531 429 119 100 mozilla-central/.hg/store/
4 531 429 119 ... mozilla-central/.hg/store/
4 875 861 390 5 mozilla-unified/.hg/store/
4 875 855 155 10 mozilla-unified/.hg/store/
4 875 852 027 20 mozilla-unified/.hg/store/
4 875 848 851 50 mozilla-unified/.hg/store/
4 875 848 851 100 mozilla-unified/.hg/store/
4 875 848 851 ... mozilla-unified/.hg/store/
11 498 764 601 5 mozilla-try/.hg/store/
11 497 968 858 10 mozilla-try/.hg/store/
11 497 958 730 20 mozilla-try/.hg/store/
11 497 927 156 50 mozilla-try/.hg/store/
11 497 925 963 100 mozilla-try/.hg/store/
11 497 923 428 ... mozilla-try/.hg/store/
10 047 914 031 5 private-repo
9 969 132 101 10 private-repo
9 944 745 015 20 private-repo
9 939 756 703 50 private-repo
9 939 833 016 100 private-repo
9 939 822 035 ... private-repo
author | Pierre-Yves David <pierre-yves.david@octobus.net> |
---|---|
date | Wed, 23 Nov 2022 19:08:27 +0100 |
parents | 6000f5b25c9b |
children | bf92386f76fd |
line wrap: on
line source
# Copyright 2011 Fog Creek Software # # This software may be used and distributed according to the terms of the # GNU General Public License version 2 or any later version. import os from mercurial.i18n import _ from mercurial.pycompat import open from mercurial import ( error, exthelper, httppeer, util, wireprototypes, wireprotov1peer, wireprotov1server, ) from . import lfutil urlerr = util.urlerr urlreq = util.urlreq LARGEFILES_REQUIRED_MSG = ( b'\nThis repository uses the largefiles extension.' b'\n\nPlease enable it in your Mercurial config ' b'file.\n' ) eh = exthelper.exthelper() def putlfile(repo, proto, sha): """Server command for putting a largefile into a repository's local store and into the user cache.""" with proto.mayberedirectstdio() as output: path = lfutil.storepath(repo, sha) util.makedirs(os.path.dirname(path)) tmpfp = util.atomictempfile(path, createmode=repo.store.createmode) try: for p in proto.getpayload(): tmpfp.write(p) tmpfp._fp.seek(0) if sha != lfutil.hexsha1(tmpfp._fp): raise IOError(0, _(b'largefile contents do not match hash')) tmpfp.close() lfutil.linktousercache(repo, sha) except IOError as e: repo.ui.warn( _(b'largefiles: failed to put %s into store: %s\n') % (sha, e.strerror) ) return wireprototypes.pushres( 1, output.getvalue() if output else b'' ) finally: tmpfp.discard() return wireprototypes.pushres(0, output.getvalue() if output else b'') def getlfile(repo, proto, sha): """Server command for retrieving a largefile from the repository-local cache or user cache.""" filename = lfutil.findfile(repo, sha) if not filename: raise error.Abort( _(b'requested largefile %s not present in cache') % sha ) f = open(filename, b'rb') length = os.fstat(f.fileno())[6] # Since we can't set an HTTP content-length header here, and # Mercurial core provides no way to give the length of a streamres # (and reading the entire file into RAM would be ill-advised), we # just send the length on the first line of the response, like the # ssh proto does for string responses. def generator(): yield b'%d\n' % length for chunk in util.filechunkiter(f): yield chunk return wireprototypes.streamreslegacy(gen=generator()) def statlfile(repo, proto, sha): """Server command for checking if a largefile is present - returns '2\n' if the largefile is missing, '0\n' if it seems to be in good condition. The value 1 is reserved for mismatched checksum, but that is too expensive to be verified on every stat and must be caught be running 'hg verify' server side.""" filename = lfutil.findfile(repo, sha) if not filename: return wireprototypes.bytesresponse(b'2\n') return wireprototypes.bytesresponse(b'0\n') def wirereposetup(ui, repo): orig_commandexecutor = repo.commandexecutor class lfileswirerepository(repo.__class__): def commandexecutor(self): executor = orig_commandexecutor() if self.capable(b'largefiles'): orig_callcommand = executor.callcommand class lfscommandexecutor(executor.__class__): def callcommand(self, command, args): if command == b'heads': command = b'lheads' return orig_callcommand(command, args) executor.__class__ = lfscommandexecutor return executor @wireprotov1peer.batchable def lheads(self): return self.heads.batchable(self) def putlfile(self, sha, fd): # unfortunately, httprepository._callpush tries to convert its # input file-like into a bundle before sending it, so we can't use # it ... if issubclass(self.__class__, httppeer.httppeer): res = self._call( b'putlfile', data=fd, sha=sha, headers={'content-type': 'application/mercurial-0.1'}, ) try: d, output = res.split(b'\n', 1) for l in output.splitlines(True): self.ui.warn(_(b'remote: '), l) # assume l ends with \n return int(d) except ValueError: self.ui.warn(_(b'unexpected putlfile response: %r\n') % res) return 1 # ... but we can't use sshrepository._call because the data= # argument won't get sent, and _callpush does exactly what we want # in this case: send the data straight through else: try: ret, output = self._callpush(b"putlfile", fd, sha=sha) if ret == b"": raise error.ResponseError( _(b'putlfile failed:'), output ) return int(ret) except IOError: return 1 except ValueError: raise error.ResponseError( _(b'putlfile failed (unexpected response):'), ret ) def getlfile(self, sha): """returns an iterable with the chunks of the file with sha sha""" stream = self._callstream(b"getlfile", sha=sha) length = stream.readline() try: length = int(length) except ValueError: self._abort( error.ResponseError(_(b"unexpected response:"), length) ) # SSH streams will block if reading more than length for chunk in util.filechunkiter(stream, limit=length): yield chunk # HTTP streams must hit the end to process the last empty # chunk of Chunked-Encoding so the connection can be reused. if issubclass(self.__class__, httppeer.httppeer): chunk = stream.read(1) if chunk: self._abort( error.ResponseError(_(b"unexpected response:"), chunk) ) @wireprotov1peer.batchable def statlfile(self, sha): def decode(d): try: return int(d) except (ValueError, urlerr.httperror): # If the server returns anything but an integer followed by a # newline, newline, it's not speaking our language; if we get # an HTTP error, we can't be sure the largefile is present; # either way, consider it missing. return 2 result = {b'sha': sha} return result, decode repo.__class__ = lfileswirerepository # advertise the largefiles=serve capability @eh.wrapfunction(wireprotov1server, b'_capabilities') def _capabilities(orig, repo, proto): '''announce largefile server capability''' caps = orig(repo, proto) caps.append(b'largefiles=serve') return caps def heads(orig, repo, proto): """Wrap server command - largefile capable clients will know to call lheads instead""" if lfutil.islfilesrepo(repo): return wireprototypes.ooberror(LARGEFILES_REQUIRED_MSG) return orig(repo, proto)