view hgext/largefiles/proto.py @ 49658:523cacdfd324

delta-find: set the default candidate chunk size to 10

I ran performance and storage tests on repositories of various sizes and
shapes for the following values of the config: 5, 10, 20, 50, 100, no-chunking.

The performance tests do not show any statistical impact on computation times
for large pushes and pulls. For searching for an individual delta, this can
provide a significant performance improvement with a minor degradation of
space-quality on the result (see data at the end of the commit).

For overall store size, the change:

- does not have any impact on many small repositories,
- has an observable, but very negligible, impact on most larger repositories,
- one private repository we use for testing sees a small increase in size (1%)
  in the narrower version. We will try to get more numbers on a larger version
  of that repository to make sure nothing pathological happens.

We pick "10" as the limit, as "5" seems a bit more risky. There is room to
improve the current code, by using more aggressive filtering and better (i.e.
any) sorting of the candidates. However, this is already a large improvement
for pathological cases, with little impact in the common situations.

The initial motivation for this change is to fix the performance of delta
computation for a file where the previous code ended up testing 20 000
possible candidate-bases in one go, which is… slow. This affected about ½ of
the file revisions, leading to atrocious performance, especially during some
push/pull operations.

Details about individual delta finding timing:
----------------------------------------------

The vast majority of benchmark cases are unchanged, except for the three
below. The first two do not see any impact on the final delta. The last one
sees a change in delta-size that is negligible compared to the full text size.
### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
  # benchmark.name = perf-delta-find
  # benchmark.variants.rev = manifest-snapshot-many-tries-a (revision 756096)

∞:   5.844783
5:   4.473523 (-23.46%)
10:  4.970053 (-14.97%)
20:  5.770386  (-1.27%)
50:  5.821358
100: 5.834887

MANIFESTLOG: rev=756096: (no limit)   delta-base=301840 search-rounds=6 try-count=60 delta-type=snapshot snap-depth=7 delta-size=179
MANIFESTLOG: rev=756096: (limit = 10) delta-base=301840 search-rounds=9 try-count=51 delta-type=snapshot snap-depth=7 delta-size=179

### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
  # benchmark.name = perf-delta-find
  # benchmark.variants.rev = manifest-snapshot-many-tries-d (revision 754060)

∞:   5.017663
5:   3.655931 (-27.14%)
10:  4.095436 (-18.38%)
20:  4.828949  (-3.76%)
50:  4.987574
100: 4.994889

MANIFESTLOG: rev=754060: (no limit)   delta-base=301840 search-rounds=5 try-count=53 delta-type=snapshot snap-depth=7 delta-size=179
MANIFESTLOG: rev=754060: (limit = 10) delta-base=301840 search-rounds=8 try-count=45 delta-type=snapshot snap-depth=7 delta-size=179

### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
  # benchmark.name = perf-delta-find
  # bin-env-vars.hg.flavor = rust
  # benchmark.variants.rev = manifest-snapshot-many-tries-e (revision 693368)

∞:   4.869282
5:   2.039732 (-58.11%)
10:  2.413537 (-50.43%)
20:  4.449639  (-8.62%)
50:  4.865863
100: 4.882649

MANIFESTLOG: rev=693368: delta-base=693336 search-rounds=6 try-count=53 delta-type=snapshot snap-depth=6 full-test-size=131065 delta-size=199
MANIFESTLOG: rev=693368: delta-base=278023 search-rounds=5 try-count=21 delta-type=snapshot snap-depth=4 full-test-size=131065 delta-size=278

Raw data for store size (in bytes) for various chunk-size values below:
------------------------------------------------------------------------

store size (bytes)   chunk size   repository
       440 134 384        5       pypy/.hg/store/
       440 134 384       10       pypy/.hg/store/
       440 134 384       20       pypy/.hg/store/
       440 134 384       50       pypy/.hg/store/
       440 134 384      100       pypy/.hg/store/
       440 134 384      ...       pypy/.hg/store/
       666 987 471        5       netbsd-xsrc-2022-11-15/.hg/store/
       666 987 471       10       netbsd-xsrc-2022-11-15/.hg/store/
       666 987 471       20       netbsd-xsrc-2022-11-15/.hg/store/
       666 987 471       50       netbsd-xsrc-2022-11-15/.hg/store/
       666 987 471      100       netbsd-xsrc-2022-11-15/.hg/store/
       666 987 471      ...       netbsd-xsrc-2022-11-15/.hg/store/
       852 844 884        5       netbsd-pkgsrc-2022-11-15/.hg/store/
       852 844 884       10       netbsd-pkgsrc-2022-11-15/.hg/store/
       852 844 884       20       netbsd-pkgsrc-2022-11-15/.hg/store/
       852 844 884       50       netbsd-pkgsrc-2022-11-15/.hg/store/
       852 844 884      100       netbsd-pkgsrc-2022-11-15/.hg/store/
       852 844 884      ...       netbsd-pkgsrc-2022-11-15/.hg/store/
     1 504 227 981        5       netbeans-2018-08-01-sparse-zstd/.hg/store/
     1 504 227 871       10       netbeans-2018-08-01-sparse-zstd/.hg/store/
     1 504 227 813       20       netbeans-2018-08-01-sparse-zstd/.hg/store/
     1 504 227 813       50       netbeans-2018-08-01-sparse-zstd/.hg/store/
     1 504 227 813      100       netbeans-2018-08-01-sparse-zstd/.hg/store/
     1 504 227 813      ...       netbeans-2018-08-01-sparse-zstd/.hg/store/
     3 875 801 068        5       netbsd-src-2022-11-15/.hg/store/
     3 875 696 767       10       netbsd-src-2022-11-15/.hg/store/
     3 875 696 757       20       netbsd-src-2022-11-15/.hg/store/
     3 875 696 653       50       netbsd-src-2022-11-15/.hg/store/
     3 875 696 653      100       netbsd-src-2022-11-15/.hg/store/
     3 875 696 653      ...       netbsd-src-2022-11-15/.hg/store/
     4 531 441 314        5       mozilla-central/.hg/store/
     4 531 435 157       10       mozilla-central/.hg/store/
     4 531 432 045       20       mozilla-central/.hg/store/
     4 531 429 119       50       mozilla-central/.hg/store/
     4 531 429 119      100       mozilla-central/.hg/store/
     4 531 429 119      ...       mozilla-central/.hg/store/
     4 875 861 390        5       mozilla-unified/.hg/store/
     4 875 855 155       10       mozilla-unified/.hg/store/
     4 875 852 027       20       mozilla-unified/.hg/store/
     4 875 848 851       50       mozilla-unified/.hg/store/
     4 875 848 851      100       mozilla-unified/.hg/store/
     4 875 848 851      ...       mozilla-unified/.hg/store/
    11 498 764 601        5       mozilla-try/.hg/store/
    11 497 968 858       10       mozilla-try/.hg/store/
    11 497 958 730       20       mozilla-try/.hg/store/
    11 497 927 156       50       mozilla-try/.hg/store/
    11 497 925 963      100       mozilla-try/.hg/store/
    11 497 923 428      ...       mozilla-try/.hg/store/
    10 047 914 031        5       private-repo
     9 969 132 101       10       private-repo
     9 944 745 015       20       private-repo
     9 939 756 703       50       private-repo
     9 939 833 016      100       private-repo
     9 939 822 035      ...       private-repo
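
To illustrate the general idea behind the chunk-size limit described in the
commit message above, here is a minimal, hypothetical sketch: it is not
Mercurial's actual revlog delta-find code, and the names (find_delta_base,
build_delta, is_good_delta) are made up for illustration. It examines
candidate bases in groups of at most chunk_size and stops after the first
group that produced an acceptable delta, instead of testing every candidate
in one go.

    # Minimal sketch only; `build_delta` and `is_good_delta` are
    # caller-supplied stand-ins for revlog internals.
    def find_delta_base(candidates, build_delta, is_good_delta, chunk_size=10):
        """Pick a delta base, testing candidates in chunks of `chunk_size`.

        `build_delta(base)` returns a delta object supporting len(), and
        `is_good_delta(delta)` reports whether that delta is acceptable.
        The search stops at the end of the first chunk that yielded an
        acceptable delta instead of scanning all remaining candidates.
        """
        best_base, best_delta = None, None
        for start in range(0, len(candidates), chunk_size):
            for base in candidates[start:start + chunk_size]:
                delta = build_delta(base)
                if best_delta is None or len(delta) < len(best_delta):
                    best_base, best_delta = base, delta
            if best_delta is not None and is_good_delta(best_delta):
                break  # good enough; skip the remaining chunks
        return best_base

    # Toy usage: "deltas" are strings whose length stands in for delta size.
    candidate_revs = list(range(20000))
    chosen = find_delta_base(
        candidate_revs,
        build_delta=lambda base: 'x' * (base % 7 + 1),
        is_good_delta=lambda delta: len(delta) <= 2,
    )

With a limit of 10, a pathological revision with 20 000 candidate bases is cut
off after the first acceptable group rather than being compared against every
candidate, which is the behaviour the benchmark numbers above measure.
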
author Pierre-Yves David <pierre-yves.david@octobus.net>
date Wed, 23 Nov 2022 19:08:27 +0100
parents 6000f5b25c9b
children bf92386f76fd

# Copyright 2011 Fog Creek Software
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

import os

from mercurial.i18n import _
from mercurial.pycompat import open

from mercurial import (
    error,
    exthelper,
    httppeer,
    util,
    wireprototypes,
    wireprotov1peer,
    wireprotov1server,
)

from . import lfutil

urlerr = util.urlerr
urlreq = util.urlreq

LARGEFILES_REQUIRED_MSG = (
    b'\nThis repository uses the largefiles extension.'
    b'\n\nPlease enable it in your Mercurial config '
    b'file.\n'
)

eh = exthelper.exthelper()


def putlfile(repo, proto, sha):
    """Server command for putting a largefile into a repository's local store
    and into the user cache."""
    with proto.mayberedirectstdio() as output:
        path = lfutil.storepath(repo, sha)
        util.makedirs(os.path.dirname(path))
        tmpfp = util.atomictempfile(path, createmode=repo.store.createmode)

        try:
            for p in proto.getpayload():
                tmpfp.write(p)
            tmpfp._fp.seek(0)
            if sha != lfutil.hexsha1(tmpfp._fp):
                raise IOError(0, _(b'largefile contents do not match hash'))
            tmpfp.close()
            lfutil.linktousercache(repo, sha)
        except IOError as e:
            repo.ui.warn(
                _(b'largefiles: failed to put %s into store: %s\n')
                % (sha, e.strerror)
            )
            return wireprototypes.pushres(
                1, output.getvalue() if output else b''
            )
        finally:
            tmpfp.discard()

    return wireprototypes.pushres(0, output.getvalue() if output else b'')


def getlfile(repo, proto, sha):
    """Server command for retrieving a largefile from the repository-local
    cache or user cache."""
    filename = lfutil.findfile(repo, sha)
    if not filename:
        raise error.Abort(
            _(b'requested largefile %s not present in cache') % sha
        )
    f = open(filename, b'rb')
    length = os.fstat(f.fileno())[6]

    # Since we can't set an HTTP content-length header here, and
    # Mercurial core provides no way to give the length of a streamres
    # (and reading the entire file into RAM would be ill-advised), we
    # just send the length on the first line of the response, like the
    # ssh proto does for string responses.
    def generator():
        yield b'%d\n' % length
        for chunk in util.filechunkiter(f):
            yield chunk

    return wireprototypes.streamreslegacy(gen=generator())


def statlfile(repo, proto, sha):
    """Server command for checking if a largefile is present - returns '2\n' if
    the largefile is missing, '0\n' if it seems to be in good condition.

    The value 1 is reserved for mismatched checksum, but that is too expensive
    to be verified on every stat and must be caught by running 'hg verify'
    server side."""
    filename = lfutil.findfile(repo, sha)
    if not filename:
        return wireprototypes.bytesresponse(b'2\n')
    return wireprototypes.bytesresponse(b'0\n')


def wirereposetup(ui, repo):
    orig_commandexecutor = repo.commandexecutor

    class lfileswirerepository(repo.__class__):
        def commandexecutor(self):
            executor = orig_commandexecutor()
            if self.capable(b'largefiles'):
                orig_callcommand = executor.callcommand

                class lfscommandexecutor(executor.__class__):
                    def callcommand(self, command, args):
                        if command == b'heads':
                            command = b'lheads'
                        return orig_callcommand(command, args)

                executor.__class__ = lfscommandexecutor
            return executor

        @wireprotov1peer.batchable
        def lheads(self):
            return self.heads.batchable(self)

        def putlfile(self, sha, fd):
            # unfortunately, httprepository._callpush tries to convert its
            # input file-like into a bundle before sending it, so we can't use
            # it ...
            if issubclass(self.__class__, httppeer.httppeer):
                res = self._call(
                    b'putlfile',
                    data=fd,
                    sha=sha,
                    headers={'content-type': 'application/mercurial-0.1'},
                )
                try:
                    d, output = res.split(b'\n', 1)
                    for l in output.splitlines(True):
                        self.ui.warn(_(b'remote: '), l)  # assume l ends with \n
                    return int(d)
                except ValueError:
                    self.ui.warn(_(b'unexpected putlfile response: %r\n') % res)
                    return 1
            # ... but we can't use sshrepository._call because the data=
            # argument won't get sent, and _callpush does exactly what we want
            # in this case: send the data straight through
            else:
                try:
                    ret, output = self._callpush(b"putlfile", fd, sha=sha)
                    if ret == b"":
                        raise error.ResponseError(
                            _(b'putlfile failed:'), output
                        )
                    return int(ret)
                except IOError:
                    return 1
                except ValueError:
                    raise error.ResponseError(
                        _(b'putlfile failed (unexpected response):'), ret
                    )

        def getlfile(self, sha):
            """returns an iterable with the chunks of the file with sha sha"""
            stream = self._callstream(b"getlfile", sha=sha)
            length = stream.readline()
            try:
                length = int(length)
            except ValueError:
                self._abort(
                    error.ResponseError(_(b"unexpected response:"), length)
                )

            # SSH streams will block if reading more than length
            for chunk in util.filechunkiter(stream, limit=length):
                yield chunk
            # HTTP streams must hit the end to process the last empty
            # chunk of Chunked-Encoding so the connection can be reused.
            if issubclass(self.__class__, httppeer.httppeer):
                chunk = stream.read(1)
                if chunk:
                    self._abort(
                        error.ResponseError(_(b"unexpected response:"), chunk)
                    )

        @wireprotov1peer.batchable
        def statlfile(self, sha):
            def decode(d):
                try:
                    return int(d)
                except (ValueError, urlerr.httperror):
                    # If the server returns anything but an integer followed by a
                    # newline, it's not speaking our language; if we get
                    # an HTTP error, we can't be sure the largefile is present;
                    # either way, consider it missing.
                    return 2

            result = {b'sha': sha}
            return result, decode

    repo.__class__ = lfileswirerepository


# advertise the largefiles=serve capability
@eh.wrapfunction(wireprotov1server, b'_capabilities')
def _capabilities(orig, repo, proto):
    '''announce largefile server capability'''
    caps = orig(repo, proto)
    caps.append(b'largefiles=serve')
    return caps


def heads(orig, repo, proto):
    """Wrap server command - largefile capable clients will know to call
    lheads instead"""
    if lfutil.islfilesrepo(repo):
        return wireprototypes.ooberror(LARGEFILES_REQUIRED_MSG)

    return orig(repo, proto)