mercurial/cffi/bdiff.py
author Gregory Szorc <gregory.szorc@gmail.com>
Sat, 14 Apr 2018 16:36:15 -0700
changeset 37711 65a23cc8e75b
parent 33572 857876ebaed4
child 43076 2372284d9457
permissions -rw-r--r--
cborutil: implement support for streaming encoding, bytestring decoding The vendored cbor2 package is... a bit disappointing. On the encoding side, it insists that you pass it something with a write() to send data to. That means if you want to emit data to a generator, you have to construct an e.g. io.BytesIO(), write() to it, then get the data back out. There can be non-trivial overhead involved. The encoder also doesn't support indefinite types - bytestrings, arrays, and maps that don't have a known length. Again, this is really unfortunate because it requires you to buffer the entire source and destination in memory to encode large things. On the decoding side, it supports reading indefinite length types. But it buffers them completely before returning. More sadness. This commit implements "streaming" encoders for various CBOR types. Encoding emits a generator of hunks. So you can efficiently stream encoded data elsewhere. It also implements support for emitting indefinite length bytestrings, arrays, and maps. On the decoding side, we only implement support for decoding an indefinite length bytestring from a file object. It will emit a generator of raw chunks from the source. I didn't want to reinvent so many wheels. But profiling the wire protocol revealed that the overhead of constructing io.BytesIO() instances to temporarily hold results has a non-trivial overhead. We're talking >15% of execution time for operations like "transfer the fulltexts of all files in a revision." So I can justify this effort. Fortunately, CBOR is a relatively straightforward format. And we have a reference implementation in the repo we can test against. Differential Revision: https://phab.mercurial-scm.org/D3303
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
32512
0e8b0b9a7acc cffi: split modules from pure
Yuya Nishihara <yuya@tcha.org>
parents: 32506
diff changeset
     1
# bdiff.py - CFFI implementation of bdiff.c
7703
9044d3567f6d pure Python implementation of bdiff.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
     2
#
32512
0e8b0b9a7acc cffi: split modules from pure
Yuya Nishihara <yuya@tcha.org>
parents: 32506
diff changeset
     3
# Copyright 2016 Maciej Fijalkowski <fijall@gmail.com>
7703
9044d3567f6d pure Python implementation of bdiff.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
     4
#
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 7944
diff changeset
     5
# This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 8225
diff changeset
     6
# GNU General Public License version 2 or any later version.
7703
9044d3567f6d pure Python implementation of bdiff.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
     7
27335
c4e3ff497f89 bdiff: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 15530
diff changeset
     8
from __future__ import absolute_import
c4e3ff497f89 bdiff: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 15530
diff changeset
     9
c4e3ff497f89 bdiff: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 15530
diff changeset
    10
import struct
7944
e9b48afd0e78 pure/bdiff: fix circular import
Matt Mackall <mpm@selenic.com>
parents: 7703
diff changeset
    11
32512
0e8b0b9a7acc cffi: split modules from pure
Yuya Nishihara <yuya@tcha.org>
parents: 32506
diff changeset
    12
from ..pure.bdiff import *
0e8b0b9a7acc cffi: split modules from pure
Yuya Nishihara <yuya@tcha.org>
parents: 32506
diff changeset
    13
from . import _bdiff
7703
9044d3567f6d pure Python implementation of bdiff.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
    14
32512
0e8b0b9a7acc cffi: split modules from pure
Yuya Nishihara <yuya@tcha.org>
parents: 32506
diff changeset
    15
ffi = _bdiff.ffi
0e8b0b9a7acc cffi: split modules from pure
Yuya Nishihara <yuya@tcha.org>
parents: 32506
diff changeset
    16
lib = _bdiff.lib
7703
9044d3567f6d pure Python implementation of bdiff.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
    17
32513
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    18
def blocks(sa, sb):
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    19
    a = ffi.new("struct bdiff_line**")
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    20
    b = ffi.new("struct bdiff_line**")
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    21
    ac = ffi.new("char[]", str(sa))
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    22
    bc = ffi.new("char[]", str(sb))
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    23
    l = ffi.new("struct bdiff_hunk*")
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    24
    try:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    25
        an = lib.bdiff_splitlines(ac, len(sa), a)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    26
        bn = lib.bdiff_splitlines(bc, len(sb), b)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    27
        if not a[0] or not b[0]:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    28
            raise MemoryError
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    29
        count = lib.bdiff_diff(a[0], an, b[0], bn, l)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    30
        if count < 0:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    31
            raise MemoryError
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    32
        rl = [None] * count
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    33
        h = l.next
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    34
        i = 0
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    35
        while h:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    36
            rl[i] = (h.a1, h.a2, h.b1, h.b2)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    37
            h = h.next
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    38
            i += 1
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    39
    finally:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    40
        lib.free(a[0])
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    41
        lib.free(b[0])
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    42
        lib.bdiff_freehunks(l.next)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    43
    return rl
29834
1ea77b75d266 bdiff: implement cffi version of bdiff
Maciej Fijalkowski <fijall@gmail.com>
parents: 29833
diff changeset
    44
32513
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    45
def bdiff(sa, sb):
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    46
    a = ffi.new("struct bdiff_line**")
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    47
    b = ffi.new("struct bdiff_line**")
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    48
    ac = ffi.new("char[]", str(sa))
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    49
    bc = ffi.new("char[]", str(sb))
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    50
    l = ffi.new("struct bdiff_hunk*")
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    51
    try:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    52
        an = lib.bdiff_splitlines(ac, len(sa), a)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    53
        bn = lib.bdiff_splitlines(bc, len(sb), b)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    54
        if not a[0] or not b[0]:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    55
            raise MemoryError
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    56
        count = lib.bdiff_diff(a[0], an, b[0], bn, l)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    57
        if count < 0:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    58
            raise MemoryError
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    59
        rl = []
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    60
        h = l.next
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    61
        la = lb = 0
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    62
        while h:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    63
            if h.a1 != la or h.b1 != lb:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    64
                lgt = (b[0] + h.b1).l - (b[0] + lb).l
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    65
                rl.append(struct.pack(">lll", (a[0] + la).l - a[0].l,
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    66
                                      (a[0] + h.a1).l - a[0].l, lgt))
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    67
                rl.append(str(ffi.buffer((b[0] + lb).l, lgt)))
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    68
            la = h.a2
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    69
            lb = h.b2
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    70
            h = h.next
29834
1ea77b75d266 bdiff: implement cffi version of bdiff
Maciej Fijalkowski <fijall@gmail.com>
parents: 29833
diff changeset
    71
32513
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    72
    finally:
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    73
        lib.free(a[0])
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    74
        lib.free(b[0])
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    75
        lib.bdiff_freehunks(l.next)
25b37900d6e0 cffi: remove superfluous "if True" blocks
Yuya Nishihara <yuya@tcha.org>
parents: 32512
diff changeset
    76
    return "".join(rl)