view doc/gendoc.py @ 49658:523cacdfd324

delta-find: set the default candidate chunk size to 10

I ran performance and storage tests on repositories of various sizes and shapes
for the following values of the config: 5, 10, 20, 50, 100, no-chunking.

The performance tests do not show any statistically significant impact on
computation times for large pushes and pulls. For searching for an individual
delta, this can provide a significant performance improvement with a minor
degradation of space-quality on the result (see data at the end of the commit).

For overall store size, the change:
- does not have any impact on many small repositories,
- has an observable, but negligible, impact on most larger repositories,
- increases the size of one private repository we use for testing by a small
  amount (1%) in the narrower version.

We will try to get more numbers on a larger version of that repository to make
sure nothing pathological happens.

We pick "10" as the limit as "5" seems a bit more risky.

There is room to improve the current code, by using more aggressive filtering
and better (i.e. any) sorting of the candidates. However this is already a large
improvement for pathological cases, with little impact in the common situations.

The initial motivation for this change is to fix performance of delta
computation for a file where the previous code ended up testing 20 000 possible
candidate-bases in one go, which is… slow. This affected about ½ of the file
revisions, leading to atrocious performance, especially during some push/pull
operations.

Details about individual delta finding timing:
----------------------------------------------

The vast majority of benchmark cases are unchanged, except the three below. The
first two do not see any impact on the final delta. The last one sees a change
in delta-size that is negligible compared to the full text size.

### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
# benchmark.name = perf-delta-find
# benchmark.variants.rev = manifest-snapshot-many-tries-a (revision 756096)
   ∞: 5.844783
   5: 4.473523 (-23.46%)
  10: 4.970053 (-14.97%)
  20: 5.770386 (-1.27%)
  50: 5.821358
 100: 5.834887

MANIFESTLOG: rev = 756096: (no-limit) delta-base = 301840 search-rounds = 6 try-count = 60 delta-type = snapshot snap-depth = 7 delta-size = 179
MANIFESTLOG: rev=756096: (limit = 10) delta-base=301840 search-rounds=9 try-count=51 delta-type=snapshot snap-depth=7 delta-size=179

### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
# benchmark.name = perf-delta-find
# benchmark.variants.rev = manifest-snapshot-many-tries-d (revision 754060)
   ∞: 5.017663
   5: 3.655931 (-27.14%)
  10: 4.095436 (-18.38%)
  20: 4.828949 (-3.76%)
  50: 4.987574
 100: 4.994889

MANIFESTLOG: rev=754060: (no limit) delta-base=301840 search-rounds=5 try-count=53 delta-type=snapshot snap-depth=7 delta-size = 179
MANIFESTLOG: rev=754060: (limit = 10) delta-base=301840 search-rounds=8 try-count=45 delta-type=snapshot snap-depth=7 delta-size = 179

### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
# benchmark.name = perf-delta-find
# bin-env-vars.hg.flavor = rust
# benchmark.variants.rev = manifest-snapshot-many-tries-e (revision 693368)
   ∞: 4.869282
   5: 2.039732 (-58.11%)
  10: 2.413537 (-50.43%)
  20: 4.449639 (-8.62%)
  50: 4.865863
 100: 4.882649

MANIFESTLOG: rev=693368: delta-base=693336 search-rounds=6 try-count=53 delta-type=snapshot snap-depth=6 full-test-size=131065 delta-size=199
MANIFESTLOG: rev=693368: delta-base=278023 search-rounds=5 try-count=21 delta-type=snapshot snap-depth=4 full-test-size=131065 delta-size=278

Raw data for store size (in bytes) for various chunk size value below:
----------------------------------------------------------------------

  size (bytes)  chunk  store

   440 134 384      5  pypy/.hg/store/
   440 134 384     10  pypy/.hg/store/
   440 134 384     20  pypy/.hg/store/
   440 134 384     50  pypy/.hg/store/
   440 134 384    100  pypy/.hg/store/
   440 134 384    ...  pypy/.hg/store/

   666 987 471      5  netbsd-xsrc-2022-11-15/.hg/store/
   666 987 471     10  netbsd-xsrc-2022-11-15/.hg/store/
   666 987 471     20  netbsd-xsrc-2022-11-15/.hg/store/
   666 987 471     50  netbsd-xsrc-2022-11-15/.hg/store/
   666 987 471    100  netbsd-xsrc-2022-11-15/.hg/store/
   666 987 471    ...  netbsd-xsrc-2022-11-15/.hg/store/

   852 844 884      5  netbsd-pkgsrc-2022-11-15/.hg/store/
   852 844 884     10  netbsd-pkgsrc-2022-11-15/.hg/store/
   852 844 884     20  netbsd-pkgsrc-2022-11-15/.hg/store/
   852 844 884     50  netbsd-pkgsrc-2022-11-15/.hg/store/
   852 844 884    100  netbsd-pkgsrc-2022-11-15/.hg/store/
   852 844 884    ...  netbsd-pkgsrc-2022-11-15/.hg/store/

 1 504 227 981      5  netbeans-2018-08-01-sparse-zstd/.hg/store/
 1 504 227 871     10  netbeans-2018-08-01-sparse-zstd/.hg/store/
 1 504 227 813     20  netbeans-2018-08-01-sparse-zstd/.hg/store/
 1 504 227 813     50  netbeans-2018-08-01-sparse-zstd/.hg/store/
 1 504 227 813    100  netbeans-2018-08-01-sparse-zstd/.hg/store/
 1 504 227 813    ...  netbeans-2018-08-01-sparse-zstd/.hg/store/

 3 875 801 068      5  netbsd-src-2022-11-15/.hg/store/
 3 875 696 767     10  netbsd-src-2022-11-15/.hg/store/
 3 875 696 757     20  netbsd-src-2022-11-15/.hg/store/
 3 875 696 653     50  netbsd-src-2022-11-15/.hg/store/
 3 875 696 653    100  netbsd-src-2022-11-15/.hg/store/
 3 875 696 653    ...  netbsd-src-2022-11-15/.hg/store/

 4 531 441 314      5  mozilla-central/.hg/store/
 4 531 435 157     10  mozilla-central/.hg/store/
 4 531 432 045     20  mozilla-central/.hg/store/
 4 531 429 119     50  mozilla-central/.hg/store/
 4 531 429 119    100  mozilla-central/.hg/store/
 4 531 429 119    ...  mozilla-central/.hg/store/

 4 875 861 390      5  mozilla-unified/.hg/store/
 4 875 855 155     10  mozilla-unified/.hg/store/
 4 875 852 027     20  mozilla-unified/.hg/store/
 4 875 848 851     50  mozilla-unified/.hg/store/
 4 875 848 851    100  mozilla-unified/.hg/store/
 4 875 848 851    ...  mozilla-unified/.hg/store/

11 498 764 601      5  mozilla-try/.hg/store/
11 497 968 858     10  mozilla-try/.hg/store/
11 497 958 730     20  mozilla-try/.hg/store/
11 497 927 156     50  mozilla-try/.hg/store/
11 497 925 963    100  mozilla-try/.hg/store/
11 497 923 428    ...  mozilla-try/.hg/store/

10 047 914 031      5  private-repo
 9 969 132 101     10  private-repo
 9 944 745 015     20  private-repo
 9 939 756 703     50  private-repo
 9 939 833 016    100  private-repo
 9 939 822 035    ...  private-repo

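
As a rough illustration of why a smaller candidate chunk size can shorten the search for a single delta, here is a minimal sketch. It is not Mercurial's delta-finding code: `chunked`, `find_delta_base`, `delta_cost`, `good_enough`, and the "stop once a chunk holds an acceptable delta" rule are hypothetical names and a simplified policy chosen only to show the trade-off between try-count and result quality. The data is a toy.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items.

    `size=None` stands for "no chunking": everything comes in one list.
    """
    it = iter(iterable)
    if size is None:
        yield list(it)
        return
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def find_delta_base(candidates, delta_cost, good_enough, chunk_size=10):
    """Return (best_base, best_cost, try_count).

    Hypothetical policy: evaluate candidates one chunk at a time and stop
    as soon as the best delta seen so far is no larger than `good_enough`.
    """
    best_base, best_cost, tries = None, None, 0
    for chunk in chunked(candidates, chunk_size):
        for base in chunk:
            tries += 1
            cost = delta_cost(base)
            if best_cost is None or cost < best_cost:
                best_base, best_cost = base, cost
        if best_cost is not None and best_cost <= good_enough:
            break
    return best_base, best_cost, tries


# Toy data: many candidate bases, only candidate 7 gives a small delta.
costs = {base: 1000 for base in range(20000)}
costs[7] = 179

print(find_delta_base(range(20000), costs.__getitem__, good_enough=200))
print(
    find_delta_base(
        range(20000), costs.__getitem__, good_enough=200, chunk_size=None
    )
)
```

With the toy data, the chunked search stops after the first chunk, while the unchunked search evaluates every candidate before returning the same base. In a real search an early stop can also miss a slightly better base, which is the space-quality trade-off described above.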
author Pierre-Yves David <pierre-yves.david@octobus.net>
date Wed, 23 Nov 2022 19:08:27 +0100
parents a932cad26d37
children 76387080f238

#!/usr/bin/env python3
"""usage: %s DOC ...

where DOC is the name of a document
"""


import os
import sys
import textwrap

try:
    import msvcrt

    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)
except ImportError:
    pass

# This script is executed during installs and may not have C extensions
# available. Relax C module requirements.
os.environ['HGMODULEPOLICY'] = 'allow'
# import from the live mercurial repo
sys.path.insert(0, os.path.abspath(".."))
from mercurial import demandimport

demandimport.enable()

from mercurial import (
    commands,
    encoding,
    extensions,
    fancyopts,
    help,
    minirst,
    pycompat,
    ui as uimod,
)
from mercurial.i18n import (
    gettext,
    _,
)
from mercurial.utils import stringutil

table = commands.table
globalopts = commands.globalopts
helptable = help.helptable
loaddoc = help.loaddoc


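def get_desc(docstr):
    """Split a docstring into a (short description, long description) pair.

    The short description is the first line. The long description is the
    dedented remainder, or the short description again when the docstring
    has only one line.
    """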
    if not docstr:
        return b"", b""
    # sanitize
    docstr = docstr.strip(b"\n")
    docstr = docstr.rstrip()
    shortdesc = docstr.splitlines()[0].strip()

    i = docstr.find(b"\n")
    if i != -1:
        desc = docstr[i + 2 :]
    else:
        desc = shortdesc

    desc = textwrap.dedent(desc.decode('latin1')).encode('latin1')

    return (shortdesc, desc)


def get_opts(opts):
    """Yield an (option string, description) pair for each option tuple."""
    for opt in opts:
        if len(opt) == 5:
            shortopt, longopt, default, desc, optlabel = opt
        else:
            shortopt, longopt, default, desc = opt
            optlabel = _(b"VALUE")
        allopts = []
        if shortopt:
            allopts.append(b"-%s" % shortopt)
        if longopt:
            allopts.append(b"--%s" % longopt)
        if isinstance(default, list):
            allopts[-1] += b" <%s[+]>" % optlabel
        elif (default is not None) and not isinstance(default, bool):
            allopts[-1] += b" <%s>" % optlabel
        if b'\n' in desc:
            # only remove line breaks and indentation
            desc = b' '.join(l.lstrip() for l in desc.split(b'\n'))
        if isinstance(default, fancyopts.customopt):
            default = default.getdefaultvalue()
        if default:
            default = stringutil.forcebytestr(default)
            desc += _(b" (default: %s)") % default
        yield (b", ".join(allopts), desc)


def get_cmd(cmd, cmdtable):
    """Collect the name, aliases, description, options and synopsis of cmd."""
    d = {}
    attr = cmdtable[cmd]
    cmds = cmd.lstrip(b"^").split(b"|")

    d[b'cmd'] = cmds[0]
    d[b'aliases'] = cmd.split(b"|")[1:]
    d[b'desc'] = get_desc(gettext(pycompat.getdoc(attr[0])))
    d[b'opts'] = list(get_opts(attr[1]))

    s = b'hg ' + cmds[0]
    if len(attr) > 2:
        if not attr[2].startswith(b'hg'):
            s += b' ' + attr[2]
        else:
            s = attr[2]
    d[b'synopsis'] = s.strip()

    return d


def showdoc(ui):
    # print options
    ui.write(minirst.section(_(b"Options")))
    multioccur = False
    for optstr, desc in get_opts(globalopts):
        ui.write(b"%s\n    %s\n\n" % (optstr, desc))
        if optstr.endswith(b"[+]>"):
            multioccur = True
    if multioccur:
        ui.write(_(b"\n[+] marked option can be specified multiple times\n"))
        ui.write(b"\n")

    # print cmds
    ui.write(minirst.section(_(b"Commands")))
    commandprinter(ui, table, minirst.subsection, minirst.subsubsection)

    # print help topics
    # The config help topic is included in the hgrc.5 man page.
    helpprinter(ui, helptable, minirst.section, exclude=[b'config'])

    ui.write(minirst.section(_(b"Extensions")))
    ui.write(
        _(
            b"This section contains help for extensions that are "
            b"distributed together with Mercurial. Help for other "
            b"extensions is available in the help system."
        )
    )
    ui.write(
        (
            b"\n\n"
            b".. contents::\n"
            b"   :class: htmlonly\n"
            b"   :local:\n"
            b"   :depth: 1\n\n"
        )
    )

    for extensionname in sorted(allextensionnames()):
        mod = extensions.load(ui, extensionname, None)
        ui.write(minirst.subsection(extensionname))
        ui.write(b"%s\n\n" % gettext(pycompat.getdoc(mod)))
        cmdtable = getattr(mod, 'cmdtable', None)
        if cmdtable:
            ui.write(minirst.subsubsection(_(b'Commands')))
            commandprinter(
                ui,
                cmdtable,
                minirst.subsubsubsection,
                minirst.subsubsubsubsection,
            )


def showtopic(ui, topic):
    extrahelptable = [
        ([b"common"], b'', loaddoc(b'common'), help.TOPIC_CATEGORY_MISC),
        ([b"hg.1"], b'', loaddoc(b'hg.1'), help.TOPIC_CATEGORY_CONFIG),
        ([b"hg-ssh.8"], b'', loaddoc(b'hg-ssh.8'), help.TOPIC_CATEGORY_CONFIG),
        (
            [b"hgignore.5"],
            b'',
            loaddoc(b'hgignore.5'),
            help.TOPIC_CATEGORY_CONFIG,
        ),
        ([b"hgrc.5"], b'', loaddoc(b'hgrc.5'), help.TOPIC_CATEGORY_CONFIG),
        (
            [b"hgignore.5.gendoc"],
            b'',
            loaddoc(b'hgignore'),
            help.TOPIC_CATEGORY_CONFIG,
        ),
        (
            [b"hgrc.5.gendoc"],
            b'',
            loaddoc(b'config'),
            help.TOPIC_CATEGORY_CONFIG,
        ),
    ]
    helpprinter(ui, helptable + extrahelptable, None, include=[topic])


def helpprinter(ui, helptable, sectionfunc, include=(), exclude=()):
    for h in helptable:
        names, sec, doc = h[0:3]
        if exclude and names[0] in exclude:
            continue
        if include and names[0] not in include:
            continue
        for name in names:
            ui.write(b".. _%s:\n" % name)
        ui.write(b"\n")
        if sectionfunc:
            ui.write(sectionfunc(sec))
        if callable(doc):
            doc = doc(ui)
        ui.write(doc)
        ui.write(b"\n")


def commandprinter(ui, cmdtable, sectionfunc, subsectionfunc):
    """Render reStructuredText describing a list of commands and their
    documentation, grouped by command category.

    Args:
      ui: UI object to write the output to
      cmdtable: a dict that maps a string of the command name plus its aliases
        (separated with pipes) to a 3-tuple of (the command's function, a list
        of its option descriptions, and a string summarizing available
        options). Example, with aliases added for demonstration purposes:

          'phase|alias1|alias2': (
             <function phase at 0x7f0816b05e60>,
             [ ('p', 'public', False, 'set changeset phase to public'),
               ...,
               ('r', 'rev', [], 'target revision', 'REV')],
             '[-p|-d|-s] [-f] [-r] [REV...]'
          )
      sectionfunc: minirst function to format command category headers
      subsectionfunc: minirst function to format command headers
    """
    h = {}
    for c, attr in cmdtable.items():
        f = c.split(b"|")[0]
        f = f.lstrip(b"^")
        h[f] = c
    cmds = h.keys()

    def helpcategory(cmd):
        """Given a canonical command name from `cmds` (above), retrieve its
        help category. If helpcategory is None, default to CATEGORY_NONE.
        """
        fullname = h[cmd]
        details = cmdtable[fullname]
        helpcategory = details[0].helpcategory
        return helpcategory or help.registrar.command.CATEGORY_NONE

    cmdsbycategory = {category: [] for category in help.CATEGORY_ORDER}
    for cmd in cmds:
        # If a command category wasn't registered, the command won't get
        # rendered below, so we raise an AssertionError.
        if helpcategory(cmd) not in cmdsbycategory:
            raise AssertionError(
                "The following command did not register its (category) in "
                "help.CATEGORY_ORDER: %s (%s)" % (cmd, helpcategory(cmd))
            )
        cmdsbycategory[helpcategory(cmd)].append(cmd)

    # Print the help for each command. We present the commands grouped by
    # category, and we use help.CATEGORY_ORDER as a guide for a helpful order
    # in which to present the categories.
    for category in help.CATEGORY_ORDER:
        categorycmds = cmdsbycategory[category]
        if not categorycmds:
            # Skip empty categories
            continue
        # Print a section header for the category.
        # For now, the category header is at the same level as the headers for
        # the commands in the category; this is fixed in the next commit.
        ui.write(sectionfunc(help.CATEGORY_NAMES[category]))
        # Print each command in the category
        for f in sorted(categorycmds):
            if f.startswith(b"debug"):
                continue
            d = get_cmd(h[f], cmdtable)
            ui.write(subsectionfunc(d[b'cmd']))
            # short description
            ui.write(d[b'desc'][0])
            # synopsis
            ui.write(b"::\n\n")
            synopsislines = d[b'synopsis'].splitlines()
            for line in synopsislines:
                # some commands (such as rebase) have a multi-line
                # synopsis
                ui.write(b"   %s\n" % line)
            ui.write(b'\n')
            # description
            ui.write(b"%s\n\n" % d[b'desc'][1])
            # options
            opt_output = list(d[b'opts'])
            if opt_output:
                opts_len = max([len(line[0]) for line in opt_output])
                ui.write(_(b"Options:\n\n"))
                multioccur = False
                for optstr, desc in opt_output:
                    if desc:
                        s = b"%-*s  %s" % (opts_len, optstr, desc)
                    else:
                        s = optstr
                    ui.write(b"%s\n" % s)
                    if optstr.endswith(b"[+]>"):
                        multioccur = True
                if multioccur:
                    ui.write(
                        _(
                            b"\n[+] marked option can be specified"
                            b" multiple times\n"
                        )
                    )
                ui.write(b"\n")
            # aliases
            if d[b'aliases']:
                # Note the empty comment, this is required to separate this
                # (which should be a blockquote) from any preceding things (such
                # as a definition list).
                ui.write(
                    _(b"..\n\n    aliases: %s\n\n") % b" ".join(d[b'aliases'])
                )


def allextensionnames():
    return set(extensions.enabled().keys()) | set(extensions.disabled().keys())


if __name__ == "__main__":
    doc = b'hg.1.gendoc'
    if len(sys.argv) > 1:
        doc = encoding.strtolocal(sys.argv[1])

    ui = uimod.ui.load()
    # Trigger extensions to load. This is disabled by default because it uses
    # the current user's configuration, which is often not what is wanted.
    if encoding.environ.get(b'GENDOC_LOAD_CONFIGURED_EXTENSIONS', b'0') != b'0':
        extensions.loadall(ui)

    if doc == b'hg.1.gendoc':
        showdoc(ui)
    else:
        showtopic(ui, doc)
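
The option formatting done by `get_opts` above can be tried without a Mercurial checkout. The sketch below is a simplified stand-in, not the script's own function: `format_opts` is a hypothetical name, translation via `_()` is dropped, `fancyopts.customopt` defaults and multi-line descriptions are not handled, and defaults are rendered with `str()` rather than `stringutil.forcebytestr`. The first two sample tuples come from the `commandprinter` docstring; the third is made up for illustration.

```python
def format_opts(opts, default_label=b"VALUE"):
    """Yield (option string, description) pairs for 4- or 5-item tuples."""
    for opt in opts:
        if len(opt) == 5:
            shortopt, longopt, default, desc, optlabel = opt
        else:
            shortopt, longopt, default, desc = opt
            optlabel = default_label
        names = []
        if shortopt:
            names.append(b"-%s" % shortopt)
        if longopt:
            names.append(b"--%s" % longopt)
        if isinstance(default, list):
            # List defaults mean the option may be repeated.
            names[-1] += b" <%s[+]>" % optlabel
        elif default is not None and not isinstance(default, bool):
            # Non-boolean options take a value.
            names[-1] += b" <%s>" % optlabel
        if default:
            if isinstance(default, bytes):
                shown = default
            else:
                shown = str(default).encode("ascii")
            desc += b" (default: %s)" % shown
        yield b", ".join(names), desc


sample = [
    (b"p", b"public", False, b"set changeset phase to public"),
    (b"r", b"rev", [], b"target revision", b"REV"),
    (b"", b"limit", 10, b"maximum number of entries"),  # hypothetical option
]

for optstr, desc in format_opts(sample):
    print(optstr.decode(), "|", desc.decode())
```

Boolean flags get no value placeholder, list-valued options get the `[+]` marker that `showdoc` and `commandprinter` look for, and any truthy default is appended to the description.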