mercurial/i18n.py
author Georges Racinet <georges.racinet@octobus.net>
Wed, 20 Feb 2019 09:04:54 +0100
changeset 42761 4d20b1fe8a72
parent 40254 dd83aafdb64a
child 43076 2372284d9457
permissions -rw-r--r--
rust-discovery: using from Python code As previously done in other topics, the Rust version is used if it's been built. The version fully in Rust of the partialdiscovery class has the performance advantage over the Python version (actually using the Rust MissingAncestor) if the undecided set is big enough. Otherwise no sampling occurs, and the discovery is reasonably fast anyway. Note: it's hard to predict the size of the initial undecided set, it can depend on the kind of topological changes between the local and remote graphs. The point of the Rust version is to make the bad cases acceptable. More specifically, the performance advantages are: - faster sampling, especially takefullsample() - much faster addmissings() in almost all cases (see commit message in grandparent of the present changeset) - no conversion cost of the undecided set at the interface between Rust and Python == Measurements with big undecided sets For an extreme example, discovery between mozilla-try and mozilla-unified (over one million undecided revisions, same case as in dbd0fcca6dfc), we get roughly a x2.5/x3 better performance: Growing sample size (5% starting with 200): time goes down from 210 to 72 seconds. Constant sample size of 200: time down from 1853 to 659 seconds. With a sample size computed from number of roots and heads of the undecided set (`respectsize` is `False`), here are perfdiscovery results: Before ! wall 9.358729 comb 9.360000 user 9.310000 sys 0.050000 (median of 50) After ! wall 3.793819 comb 3.790000 user 3.750000 sys 0.040000 (median of 50) In that later case, the sample sizes are routinely in the hundreds of thousands of revisions. While still faster, the Rust iteration in addmissings has less of an advantage than with smaller sample sizes, but one sees addcommons becoming faster, probably a consequence of not having to copy big sets back and forth. This example is not a goal in itself, but it showcases several different areas in which the process can become slow, due to different factors, and how this full Rust version can help. == Measurements with small undecided sets In cases the undecided set is small enough than no sampling occurs, the Rust version has a disadvantage at init if `targetheads` is really big (some time is lost in the translation to Rust data structures), and that is compensated by the faster `addmissings()`. On a private repository with over one million commits, we still get a minor improvement, of 6.8%: Before ! wall 0.593585 comb 0.590000 user 0.550000 sys 0.040000 (median of 50) After ! wall 0.553035 comb 0.550000 user 0.520000 sys 0.030000 (median of 50) What's interesting in that case is the first addinfo() at 180ms for Rust and 233ms for Python+C, mostly due to add_missings and the children cache computation being done in less than 0.2ms on the Rust side vs over 40ms on the Python side. The worst case we have on hand is with mozilla-try, prepared with discovery-helper.sh for 10 heads and depth 10, time goes up 2.2% on the median. In this case `targetheads` is really huge with 165842 server heads. Before ! wall 0.823884 comb 0.810000 user 0.790000 sys 0.020000 (median of 50) After ! wall 0.842607 comb 0.840000 user 0.800000 sys 0.040000 (median of 50) If that would be considered a problem, more adjustments can be made, which are prematurate at this stage: cooking special variants of methods of the inner MissingAncestors object, retrieving local heads directly from Rust to avoid the cost of conversion. Effort would probably be better spent at this point improving the surroundings if needed. Here's another data point with a smaller repository, pypy, where performance is almost identical Before ! wall 0.015121 comb 0.030000 user 0.020000 sys 0.010000 (median of 186) After ! wall 0.015009 comb 0.010000 user 0.010000 sys 0.000000 (median of 184) Differential Revision: https://phab.mercurial-scm.org/D6430
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
8226
8b2cd04a6e97 put license and copyright info into comment blocks
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
     1
# i18n.py - internationalization support for mercurial
8b2cd04a6e97 put license and copyright info into comment blocks
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
     2
#
8b2cd04a6e97 put license and copyright info into comment blocks
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
     3
# Copyright 2005, 2006 Matt Mackall <mpm@selenic.com>
8b2cd04a6e97 put license and copyright info into comment blocks
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
     4
#
8b2cd04a6e97 put license and copyright info into comment blocks
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
     5
# This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 9538
diff changeset
     6
# GNU General Public License version 2 or any later version.
1400
cf9a1233738a i18n first part: make '_' available for files who need it
Benoit Boissinot <benoit.boissinot@ens-lyon.org
parents:
diff changeset
     7
25955
2c07c6884394 i18n: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 23031
diff changeset
     8
from __future__ import absolute_import
2c07c6884394 i18n: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 23031
diff changeset
     9
2c07c6884394 i18n: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 23031
diff changeset
    10
import gettext as gettextmod
2c07c6884394 i18n: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 23031
diff changeset
    11
import locale
2c07c6884394 i18n: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 23031
diff changeset
    12
import os
2c07c6884394 i18n: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 23031
diff changeset
    13
import sys
2c07c6884394 i18n: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 23031
diff changeset
    14
30050
d229be12e256 py3: convert to unicode to pass into encode()
Pulkit Goyal <7895pulkit@gmail.com>
parents: 30035
diff changeset
    15
from . import (
d229be12e256 py3: convert to unicode to pass into encode()
Pulkit Goyal <7895pulkit@gmail.com>
parents: 30035
diff changeset
    16
    encoding,
d229be12e256 py3: convert to unicode to pass into encode()
Pulkit Goyal <7895pulkit@gmail.com>
parents: 30035
diff changeset
    17
    pycompat,
d229be12e256 py3: convert to unicode to pass into encode()
Pulkit Goyal <7895pulkit@gmail.com>
parents: 30035
diff changeset
    18
)
7650
85ae7aaf08e9 i18n: lookup .mo files in private locale/ directory
Martin Geisler <mg@daimi.au.dk>
parents: 3888
diff changeset
    19
85ae7aaf08e9 i18n: lookup .mo files in private locale/ directory
Martin Geisler <mg@daimi.au.dk>
parents: 3888
diff changeset
    20
# modelled after templater.templatepath:
14975
b64538363dbe i18n: use getattr instead of hasattr
Augie Fackler <durin42@gmail.com>
parents: 13849
diff changeset
    21
if getattr(sys, 'frozen', None) is not None:
30672
10b17ed9b591 py3: replace sys.executable with pycompat.sysexecutable
Pulkit Goyal <7895pulkit@gmail.com>
parents: 30644
diff changeset
    22
    module = pycompat.sysexecutable
7650
85ae7aaf08e9 i18n: lookup .mo files in private locale/ directory
Martin Geisler <mg@daimi.au.dk>
parents: 3888
diff changeset
    23
else:
31091
2912b06905dc py3: use pycompat.fsencode() to convert __file__ to bytes
Pulkit Goyal <7895pulkit@gmail.com>
parents: 30672
diff changeset
    24
    module = pycompat.fsencode(__file__)
7650
85ae7aaf08e9 i18n: lookup .mo files in private locale/ directory
Martin Geisler <mg@daimi.au.dk>
parents: 3888
diff changeset
    25
21987
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    26
_languages = None
34645
75979c8d4572 codemod: use pycompat.iswindows
Jun Wu <quark@fb.com>
parents: 31091
diff changeset
    27
if (pycompat.iswindows
30035
02328b5d775d py3: make i18n use encoding.environ
Yuya Nishihara <yuya@tcha.org>
parents: 29415
diff changeset
    28
    and 'LANGUAGE' not in encoding.environ
02328b5d775d py3: make i18n use encoding.environ
Yuya Nishihara <yuya@tcha.org>
parents: 29415
diff changeset
    29
    and 'LC_ALL' not in encoding.environ
02328b5d775d py3: make i18n use encoding.environ
Yuya Nishihara <yuya@tcha.org>
parents: 29415
diff changeset
    30
    and 'LC_MESSAGES' not in encoding.environ
02328b5d775d py3: make i18n use encoding.environ
Yuya Nishihara <yuya@tcha.org>
parents: 29415
diff changeset
    31
    and 'LANG' not in encoding.environ):
21987
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    32
    # Try to detect UI language by "User Interface Language Management" API
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    33
    # if no locale variables are set. Note that locale.getdefaultlocale()
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    34
    # uses GetLocaleInfo(), which may be different from UI language.
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    35
    # (See http://msdn.microsoft.com/en-us/library/dd374098(v=VS.85).aspx )
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    36
    try:
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    37
        import ctypes
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    38
        langid = ctypes.windll.kernel32.GetUserDefaultUILanguage()
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    39
        _languages = [locale.windows_locale[langid]]
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    40
    except (ImportError, AttributeError, KeyError):
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    41
        # ctypes not found or unknown langid
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    42
        pass
4953cd193e84 i18n: detect UI language without POSIX-style locale variable on Windows (BC)
Yuya Nishihara <yuya@tcha.org>
parents: 21746
diff changeset
    43
22638
0d0350cfc7ab i18n: use datapath for i18n like for templates and help
Mads Kiilerich <madski@unity3d.com>
parents: 21987
diff changeset
    44
_ugettext = None
0d0350cfc7ab i18n: use datapath for i18n like for templates and help
Mads Kiilerich <madski@unity3d.com>
parents: 21987
diff changeset
    45
0d0350cfc7ab i18n: use datapath for i18n like for templates and help
Mads Kiilerich <madski@unity3d.com>
parents: 21987
diff changeset
    46
def setdatapath(datapath):
30314
8321b083a83d py3: make util.datapath a bytes variable
Pulkit Goyal <7895pulkit@gmail.com>
parents: 30085
diff changeset
    47
    datapath = pycompat.fsdecode(datapath)
36843
5bc7ff103081 py3: use r'' instead of sysstr('') to get around code transformer
Yuya Nishihara <yuya@tcha.org>
parents: 36717
diff changeset
    48
    localedir = os.path.join(datapath, r'locale')
36717
aeaf9c7f7528 py3: make gettext domain a system string
Yuya Nishihara <yuya@tcha.org>
parents: 34660
diff changeset
    49
    t = gettextmod.translation(r'hg', localedir, _languages, fallback=True)
22638
0d0350cfc7ab i18n: use datapath for i18n like for templates and help
Mads Kiilerich <madski@unity3d.com>
parents: 21987
diff changeset
    50
    global _ugettext
28674
03d1ecbbd81e py3: handle ugettext + unicode in i18n
timeless <timeless@mozdev.org>
parents: 25955
diff changeset
    51
    try:
03d1ecbbd81e py3: handle ugettext + unicode in i18n
timeless <timeless@mozdev.org>
parents: 25955
diff changeset
    52
        _ugettext = t.ugettext
03d1ecbbd81e py3: handle ugettext + unicode in i18n
timeless <timeless@mozdev.org>
parents: 25955
diff changeset
    53
    except AttributeError:
03d1ecbbd81e py3: handle ugettext + unicode in i18n
timeless <timeless@mozdev.org>
parents: 25955
diff changeset
    54
        _ugettext = t.gettext
7651
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    55
34660
d00ec62d156f i18n: cache translated messages per encoding
Yuya Nishihara <yuya@tcha.org>
parents: 34645
diff changeset
    56
_msgcache = {}  # encoding: {message: translation}
23031
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    57
7651
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    58
def gettext(message):
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    59
    """Translate message.
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    60
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    61
    The message is looked up in the catalog to get a Unicode string,
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    62
    which is encoded in the local encoding before being returned.
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    63
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    64
    Important: message is restricted to characters in the encoding
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    65
    given by sys.getdefaultencoding() which is most likely 'ascii'.
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    66
    """
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    67
    # If message is None, t.ugettext will return u'None' as the
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    68
    # translation whereas our callers expect us to return None.
22638
0d0350cfc7ab i18n: use datapath for i18n like for templates and help
Mads Kiilerich <madski@unity3d.com>
parents: 21987
diff changeset
    69
    if message is None or not _ugettext:
7651
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    70
        return message
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    71
34660
d00ec62d156f i18n: cache translated messages per encoding
Yuya Nishihara <yuya@tcha.org>
parents: 34645
diff changeset
    72
    cache = _msgcache.setdefault(encoding.encoding, {})
d00ec62d156f i18n: cache translated messages per encoding
Yuya Nishihara <yuya@tcha.org>
parents: 34645
diff changeset
    73
    if message not in cache:
38321
79dd61a4554f py3: replace `unicode` with pycompat.unicode
Pulkit Goyal <7895pulkit@gmail.com>
parents: 36843
diff changeset
    74
        if type(message) is pycompat.unicode:
23031
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    75
            # goofy unicode docstrings in test
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    76
            paragraphs = message.split(u'\n\n')
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    77
        else:
40254
dd83aafdb64a py3: get around unicode docstrings in test-encoding-textwrap.t and test-help.t
Yuya Nishihara <yuya@tcha.org>
parents: 38321
diff changeset
    78
            # should be ascii, but we have unicode docstrings in test, which
dd83aafdb64a py3: get around unicode docstrings in test-encoding-textwrap.t and test-help.t
Yuya Nishihara <yuya@tcha.org>
parents: 38321
diff changeset
    79
            # are converted to utf-8 bytes on Python 3.
dd83aafdb64a py3: get around unicode docstrings in test-encoding-textwrap.t and test-help.t
Yuya Nishihara <yuya@tcha.org>
parents: 38321
diff changeset
    80
            paragraphs = [p.decode("utf-8") for p in message.split('\n\n')]
23031
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    81
        # Be careful not to translate the empty string -- it holds the
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    82
        # meta data of the .po file.
29415
47fb4beb992b i18n: use unicode literal
Gregory Szorc <gregory.szorc@gmail.com>
parents: 28674
diff changeset
    83
        u = u'\n\n'.join([p and _ugettext(p) or u'' for p in paragraphs])
23031
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    84
        try:
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    85
            # encoding.tolocal cannot be used since it will first try to
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    86
            # decode the Unicode string. Calling u.decode(enc) really
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    87
            # means u.encode(sys.getdefaultencoding()).decode(enc). Since
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    88
            # the Python encoding defaults to 'ascii', this fails if the
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    89
            # translated string use non-ASCII characters.
30050
d229be12e256 py3: convert to unicode to pass into encode()
Pulkit Goyal <7895pulkit@gmail.com>
parents: 30035
diff changeset
    90
            encodingstr = pycompat.sysstr(encoding.encoding)
34660
d00ec62d156f i18n: cache translated messages per encoding
Yuya Nishihara <yuya@tcha.org>
parents: 34645
diff changeset
    91
            cache[message] = u.encode(encodingstr, "replace")
23031
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    92
        except LookupError:
3c0983cc279e i18n: cache the result of every gettext call
Augie Fackler <raf@durin42.com>
parents: 22638
diff changeset
    93
            # An unknown encoding results in a LookupError.
34660
d00ec62d156f i18n: cache translated messages per encoding
Yuya Nishihara <yuya@tcha.org>
parents: 34645
diff changeset
    94
            cache[message] = message
d00ec62d156f i18n: cache translated messages per encoding
Yuya Nishihara <yuya@tcha.org>
parents: 34645
diff changeset
    95
    return cache[message]
7651
5b5036ef847a i18n: encode output in user's local encoding
Martin Geisler <mg@daimi.au.dk>
parents: 7650
diff changeset
    96
13849
9f97de157aad HGPLAIN: allow exceptions to plain mode, like i18n, via HGPLAINEXCEPT
Brodie Rao <brodie@bitheap.org>
parents: 11403
diff changeset
    97
def _plain():
30035
02328b5d775d py3: make i18n use encoding.environ
Yuya Nishihara <yuya@tcha.org>
parents: 29415
diff changeset
    98
    if ('HGPLAIN' not in encoding.environ
02328b5d775d py3: make i18n use encoding.environ
Yuya Nishihara <yuya@tcha.org>
parents: 29415
diff changeset
    99
        and 'HGPLAINEXCEPT' not in encoding.environ):
13849
9f97de157aad HGPLAIN: allow exceptions to plain mode, like i18n, via HGPLAINEXCEPT
Brodie Rao <brodie@bitheap.org>
parents: 11403
diff changeset
   100
        return False
30035
02328b5d775d py3: make i18n use encoding.environ
Yuya Nishihara <yuya@tcha.org>
parents: 29415
diff changeset
   101
    exceptions = encoding.environ.get('HGPLAINEXCEPT', '').strip().split(',')
13849
9f97de157aad HGPLAIN: allow exceptions to plain mode, like i18n, via HGPLAINEXCEPT
Brodie Rao <brodie@bitheap.org>
parents: 11403
diff changeset
   102
    return 'i18n' not in exceptions
9f97de157aad HGPLAIN: allow exceptions to plain mode, like i18n, via HGPLAINEXCEPT
Brodie Rao <brodie@bitheap.org>
parents: 11403
diff changeset
   103
9f97de157aad HGPLAIN: allow exceptions to plain mode, like i18n, via HGPLAINEXCEPT
Brodie Rao <brodie@bitheap.org>
parents: 11403
diff changeset
   104
if _plain():
10455
40dfd46d098f ui: add HGPLAIN environment variable for easier scripting
Brodie Rao <me+hg@dackz.net>
parents: 10263
diff changeset
   105
    _ = lambda message: message
40dfd46d098f ui: add HGPLAIN environment variable for easier scripting
Brodie Rao <me+hg@dackz.net>
parents: 10263
diff changeset
   106
else:
40dfd46d098f ui: add HGPLAIN environment variable for easier scripting
Brodie Rao <me+hg@dackz.net>
parents: 10263
diff changeset
   107
    _ = gettext