view mercurial/pure/base85.py @ 51723:9367571fea21

cext: correct the argument handling of `b85encode()` The type stub indicated that this argument is `Optional`, which implies None is allowed. I don't see in the documentation where that's the case for `i`[1], and trying it in `hg debugshell` resulted in the method failing with a TypeError. I guess it was typed as an `int` argument because the `p` format unit wasn't added until Python 3.3[2]. In any event, 2 clients in core (`pvec` and `obsolete`) call this with no argument supplied, and `mdiff` calls it with True. So I guess we've avoided the None arg case, and when no arg is supplied, it defaults to the 0 initialization of the `pad` variable in C. Since the `p` format unit accepts both `int` and None, as well as `bool`, I'm not bothering to bump the module version- this code is more permissive than it was, in addition to being more correct. Interestingly, when I first imported the `cext` and `pure` methods in the same manner as the previous commit, it dropped the `Optional` part of the argument type when generating `util.pyi`. No idea why. [1] https://docs.python.org/3/c-api/arg.html#numbers [2] https://docs.python.org/3/c-api/arg.html#other-objects
author Matt Harbison <matt_harbison@yahoo.com>
date Sat, 20 Jul 2024 01:55:09 -0400
parents 6000f5b25c9b
children f4733654f144
line wrap: on
line source

# base85.py: pure python base85 codec
#
# Copyright (C) 2009 Brendan Cully <brendan@kublai.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.


import struct

from .. import pycompat

_b85chars = pycompat.bytestr(
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef"
    b"ghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~"
)
_b85chars2 = [(a + b) for a in _b85chars for b in _b85chars]
_b85dec = {}


def _mkb85dec():
    for i, c in enumerate(_b85chars):
        _b85dec[c] = i


def b85encode(text: bytes, pad: bool = False) -> bytes:
    """encode text in base85 format"""
    l = len(text)
    r = l % 4
    if r:
        text += b'\0' * (4 - r)
    longs = len(text) >> 2
    words = struct.unpack(b'>%dL' % longs, text)

    out = b''.join(
        _b85chars[(word // 52200625) % 85]
        + _b85chars2[(word // 7225) % 7225]
        + _b85chars2[word % 7225]
        for word in words
    )

    if pad:
        return out

    # Trim padding
    olen = l % 4
    if olen:
        olen += 1
    olen += l // 4 * 5
    return out[:olen]


def b85decode(text: bytes) -> bytes:
    """decode base85-encoded text"""
    if not _b85dec:
        _mkb85dec()

    l = len(text)
    out = []
    for i in range(0, len(text), 5):
        chunk = text[i : i + 5]
        chunk = pycompat.bytestr(chunk)
        acc = 0
        for j, c in enumerate(chunk):
            try:
                acc = acc * 85 + _b85dec[c]
            except KeyError:
                raise ValueError(
                    'bad base85 character at position %d' % (i + j)
                )
        if acc > 4294967295:
            raise ValueError('Base85 overflow in hunk starting at byte %d' % i)
        out.append(acc)

    # Pad final chunk if necessary
    cl = l % 5
    if cl:
        acc *= 85 ** (5 - cl)
        if cl > 1:
            acc += 0xFFFFFF >> (cl - 2) * 8
        out[-1] = acc

    out = struct.pack(b'>%dL' % (len(out)), *out)
    if cl:
        out = out[: -(5 - cl)]

    return out