mercurial/pure/base85.py
author Gregory Szorc <gregory.szorc@gmail.com>
Sat, 28 Mar 2020 12:18:58 -0700
changeset 44651 00e0c5c06ed5
parent 43667 4cd911040ba5
child 48875 6000f5b25c9b
permissions -rw-r--r--
pycompat: change argv conversion semantics Use of os.fsencode() to convert Python's sys.argv back to bytes was not correct because it isn't the logically inverse operation from what CPython was doing under the hood. This commit changes the logic for doing the str -> bytes conversion. This required a separate implementation for POSIX and Windows. The Windows behavior is arguably not ideal. The previous behavior on Windows was leading to failing tests, such as test-http-branchmap.t, which defines a utf-8 branch name via a command argument. Previously, Mercurial's argument parser looked to be receiving wchar_t bytes in some cases. After this commit, behavior on Windows is compatible with Python 2, where CPython did not implement `int wmain()` and Windows was performing a Unicode to ANSI conversion on the wchar_t native command line. Arguably better behavior on Windows would be for Mercurial to preserve the original Unicode sequence coming from Python and to wrap this in a bytes-like type so we can round trip safely. But, this would be new, backwards incompatible behavior. My goal for this commit was to converge Mercurial behavior on Python 3 on Windows to fix busted tests. And I believe I was successful, as this commit fixes 9 tests on my Windows machine and 14 tests in the AWS CI environment! Differential Revision: https://phab.mercurial-scm.org/D8337
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
     1
# base85.py: pure python base85 codec
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
     2
#
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
     3
# Copyright (C) 2009 Brendan Cully <brendan@kublai.com>
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
     4
#
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 7881
diff changeset
     5
# This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 9029
diff changeset
     6
# GNU General Public License version 2 or any later version.
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
     7
27334
9007f697e8ef base85: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16598
diff changeset
     8
from __future__ import absolute_import
9007f697e8ef base85: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16598
diff changeset
     9
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    10
import struct
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    11
35944
01b4d88ccb24 py3: use pycompat.bytestr to convert _b85chars to bytes
Pulkit Goyal <7895pulkit@gmail.com>
parents: 27334
diff changeset
    12
from .. import pycompat
01b4d88ccb24 py3: use pycompat.bytestr to convert _b85chars to bytes
Pulkit Goyal <7895pulkit@gmail.com>
parents: 27334
diff changeset
    13
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    14
_b85chars = pycompat.bytestr(
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    15
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef"
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    16
    b"ghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~"
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    17
)
7835
2505e9f84153 Optimization of pure.base85.b85encode
Mads Kiilerich <mads@kiilerich.com>
parents: 7701
diff changeset
    18
_b85chars2 = [(a + b) for a in _b85chars for b in _b85chars]
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    19
_b85dec = {}
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    20
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    21
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    22
def _mkb85dec():
8632
9e055cfdd620 replace "i in range(len(xs))" with "i, x in enumerate(xs)"
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
    23
    for i, c in enumerate(_b85chars):
9e055cfdd620 replace "i in range(len(xs))" with "i, x in enumerate(xs)"
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
    24
        _b85dec[c] = i
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    25
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    26
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    27
def b85encode(text, pad=False):
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    28
    """encode text in base85 format"""
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    29
    l = len(text)
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    30
    r = l % 4
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    31
    if r:
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    32
        text += b'\0' * (4 - r)
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    33
    longs = len(text) >> 2
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    34
    words = struct.unpack(b'>%dL' % longs, text)
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    35
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    36
    out = b''.join(
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    37
        _b85chars[(word // 52200625) % 85]
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    38
        + _b85chars2[(word // 7225) % 7225]
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    39
        + _b85chars2[word % 7225]
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    40
        for word in words
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    41
    )
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    42
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    43
    if pad:
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    44
        return out
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    45
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    46
    # Trim padding
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    47
    olen = l % 4
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    48
    if olen:
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    49
        olen += 1
9029
0001e49f1c11 compat: use // for integer division
Alejandro Santos <alejolp@alejolp.com>
parents: 8632
diff changeset
    50
    olen += l // 4 * 5
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    51
    return out[:olen]
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    52
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    53
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    54
def b85decode(text):
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    55
    """decode base85-encoded text"""
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    56
    if not _b85dec:
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    57
        _mkb85dec()
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    58
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    59
    l = len(text)
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    60
    out = []
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    61
    for i in range(0, len(text), 5):
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    62
        chunk = text[i : i + 5]
36191
80301c90a2dc py3: converts bytes to pycompat.bytestr to get bytechrs while enumerating
Pulkit Goyal <7895pulkit@gmail.com>
parents: 35944
diff changeset
    63
        chunk = pycompat.bytestr(chunk)
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    64
        acc = 0
8632
9e055cfdd620 replace "i in range(len(xs))" with "i, x in enumerate(xs)"
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
    65
        for j, c in enumerate(chunk):
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    66
            try:
8632
9e055cfdd620 replace "i in range(len(xs))" with "i, x in enumerate(xs)"
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
    67
                acc = acc * 85 + _b85dec[c]
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    68
            except KeyError:
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    69
                raise ValueError(
43666
4394687b298b pure: use string for exception in the pure version of base85
Pierre-Yves David <pierre-yves.david@octobus.net>
parents: 43077
diff changeset
    70
                    'bad base85 character at position %d' % (i + j)
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    71
                )
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    72
        if acc > 4294967295:
43667
4cd911040ba5 pure: use string for another exception in the pure version of base85
Pierre-Yves David <pierre-yves.david@octobus.net>
parents: 43666
diff changeset
    73
            raise ValueError('Base85 overflow in hunk starting at byte %d' % i)
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    74
        out.append(acc)
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    75
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    76
    # Pad final chunk if necessary
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    77
    cl = l % 5
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    78
    if cl:
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    79
        acc *= 85 ** (5 - cl)
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    80
        if cl > 1:
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    81
            acc += 0xFFFFFF >> (cl - 2) * 8
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    82
        out[-1] = acc
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    83
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    84
    out = struct.pack(b'>%dL' % (len(out)), *out)
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    85
    if cl:
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36191
diff changeset
    86
        out = out[: -(5 - cl)]
7701
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    87
4bdead043d8d Pure python base85 fallback
Brendan Cully <brendan@kublai.com>
parents:
diff changeset
    88
    return out