mercurial/pure/charencode.py
author Matt Harbison <matt_harbison@yahoo.com>
Wed, 18 Sep 2024 17:50:57 -0400
changeset 51870 9d4ad05bc91c
parent 51863 f4733654f144
permissions -rw-r--r--
typing: make `bundlerepository` subclass `localrepository` while type checking Currently, `mercurial/bundlerepo.py` is excluded from pytype, mostly because it complains that various `ui` and `vfs` fields in `localrepository` are missing. (`bundlerepository` dynamically subclasses `localrepository` when it is instantiated, so it works at runtime.) This makes that class hierarchy known to pytype. Having a protocol for `Repository` is probably the right thing to do, but that will be a lot of work and this still reflects the class at runtime. Subclassing also has the benefit of making sure any method overrides have a matching signature, so maybe this is a situation where we do both of these things. (I'm not sure how clear the diagnostics are if a class *almost* implements a protocol, but is missing a method argument or similar.) The subclassing is not done outside of type checking runs to avoid any side effects on already complex code.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
33761
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
     1
# charencode.py - miscellaneous character encoding
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
     2
#
46819
d4ba4d51f85f contributor: change mentions of mpm to olivia
Raphaël Gomès <rgomes@octobus.net>
parents: 45942
diff changeset
     3
#  Copyright 2005-2009 Olivia Mackall <olivia@selenic.com> and others
33761
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
     4
#
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
     5
# This software may be used and distributed according to the terms of the
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
     6
# GNU General Public License version 2 or any later version.
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
     7
51863
f4733654f144 typing: add `from __future__ import annotations` to most files
Matt Harbison <matt_harbison@yahoo.com>
parents: 51722
diff changeset
     8
from __future__ import annotations
33761
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
     9
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    10
import array
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    11
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34218
diff changeset
    12
from .. import pycompat
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34218
diff changeset
    13
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    14
51722
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48907
diff changeset
    15
def isasciistr(s: bytes) -> bool:
33926
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33924
diff changeset
    16
    try:
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33924
diff changeset
    17
        s.decode('ascii')
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33924
diff changeset
    18
        return True
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33924
diff changeset
    19
    except UnicodeDecodeError:
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33924
diff changeset
    20
        return False
f4433f2713d0 encoding: add function to test if a str consists of ASCII characters
Yuya Nishihara <yuya@tcha.org>
parents: 33924
diff changeset
    21
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34218
diff changeset
    22
51722
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48907
diff changeset
    23
def asciilower(s: bytes) -> bytes:
45942
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43506
diff changeset
    24
    """convert a string to lowercase if ASCII
33761
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
    25
45942
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43506
diff changeset
    26
    Raises UnicodeDecodeError if non-ASCII characters are found."""
33761
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
    27
    s.decode('ascii')
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
    28
    return s.lower()
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
    29
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34218
diff changeset
    30
51722
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48907
diff changeset
    31
def asciiupper(s: bytes) -> bytes:
45942
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43506
diff changeset
    32
    """convert a string to uppercase if ASCII
33761
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
    33
45942
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43506
diff changeset
    34
    Raises UnicodeDecodeError if non-ASCII characters are found."""
33761
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
    35
    s.decode('ascii')
f5fc54e7e467 encoding: drop circular import by proxying through '<policy>.charencode'
Yuya Nishihara <yuya@tcha.org>
parents:
diff changeset
    36
    return s.upper()
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    37
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34218
diff changeset
    38
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    39
_jsonmap = []
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    40
_jsonmap.extend(b"\\u%04x" % x for x in range(32))
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    41
_jsonmap.extend(pycompat.bytechr(x) for x in range(32, 127))
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    42
_jsonmap.append(b'\\u007f')
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    43
_jsonmap[0x09] = b'\\t'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    44
_jsonmap[0x0A] = b'\\n'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    45
_jsonmap[0x22] = b'\\"'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    46
_jsonmap[0x5C] = b'\\\\'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    47
_jsonmap[0x08] = b'\\b'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    48
_jsonmap[0x0C] = b'\\f'
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    49
_jsonmap[0x0D] = b'\\r'
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    50
_paranoidjsonmap = _jsonmap[:]
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    51
_paranoidjsonmap[0x3C] = b'\\u003c'  # '<' (e.g. escape "</script>")
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    52
_paranoidjsonmap[0x3E] = b'\\u003e'  # '>'
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    53
_jsonmap.extend(pycompat.bytechr(x) for x in range(128, 256))
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    54
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34218
diff changeset
    55
51722
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48907
diff changeset
    56
def jsonescapeu8fast(u8chars: bytes, paranoid: bool) -> bytes:
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    57
    """Convert a UTF-8 byte string to JSON-escaped form (fast path)
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    58
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    59
    Raises ValueError if non-ASCII characters have to be escaped.
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    60
    """
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    61
    if paranoid:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    62
        jm = _paranoidjsonmap
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    63
    else:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    64
        jm = _jsonmap
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    65
    try:
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    66
        return b''.join(jm[x] for x in bytearray(u8chars))
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    67
    except IndexError:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    68
        raise ValueError
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    69
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34218
diff changeset
    70
48907
b677bccf74b9 charencode: remove Python 2 support code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 48875
diff changeset
    71
_utf8strict = r'surrogatepass'
34218
aa877860d4d7 py3: use 'surrogatepass' error handler to process U+DCxx transparently
Yuya Nishihara <yuya@tcha.org>
parents: 34217
diff changeset
    72
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 34218
diff changeset
    73
51722
43adbe03079b typing: add type hints to the `charencode` module
Matt Harbison <matt_harbison@yahoo.com>
parents: 48907
diff changeset
    74
def jsonescapeu8fallback(u8chars: bytes, paranoid: bool) -> bytes:
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    75
    """Convert a UTF-8 byte string to JSON-escaped form (slow path)
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    76
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    77
    Escapes all non-ASCII characters no matter if paranoid is False.
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    78
    """
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    79
    if paranoid:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    80
        jm = _paranoidjsonmap
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    81
    else:
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    82
        jm = _jsonmap
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    83
    # non-BMP char is represented as UTF-16 surrogate pair
34218
aa877860d4d7 py3: use 'surrogatepass' error handler to process U+DCxx transparently
Yuya Nishihara <yuya@tcha.org>
parents: 34217
diff changeset
    84
    u16b = u8chars.decode('utf-8', _utf8strict).encode('utf-16', _utf8strict)
43506
9f70512ae2cf cleanup: remove pointless r-prefixes on single-quoted strings
Augie Fackler <augie@google.com>
parents: 43077
diff changeset
    85
    u16codes = array.array('H', u16b)
33924
b9101467d88b encoding: extract stub for fast JSON escape
Yuya Nishihara <yuya@tcha.org>
parents: 33761
diff changeset
    86
    u16codes.pop(0)  # drop BOM
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
    87
    return b''.join(jm[x] if x < 128 else b'\\u%04x' % x for x in u16codes)