view tests/test-encoding-func.py @ 38893:23d582caae30

changegroup: capture revision delta in a data structure The current changegroup generation code is tightly coupled to the revlog API. This tight coupling makes it difficult to implement alternate storage backends without requiring a large surface area of the revlog API to be exposed. This is not desirable. In order to support changegroup generation with non-revlog storage, we'll need to abstract the concept of delta generation. This commit is the first step down that road. We introduce a data structure for representing a delta in a changegroup. The API still leaves a lot to be desired. But at least we now have separation between data and actions performed on it. As part of this, we tweak behavior slightly: we no longer concatenate the delta prefix with the metadata header. Instead, we track and emit the prefix as a separate chunk. This shouldn't have any meaningful impact since all the chunks just get sent to the wire, the compressor, etc. Because we're introducing a new object, this does add some overhead to changegroup execution. `hg perfchangegroupchangelog` on my clone of the Mercurial repo (~40,000 visible revisions in the changelog) slows down a bit: ! wall 1.268600 comb 1.270000 user 1.270000 sys 0.000000 (best of 8) ! wall 1.419479 comb 1.410000 user 1.410000 sys 0.000000 (best of 8) With for `hg bundle -t none-v2 -a /dev/null`: before: real 6.610 secs (user 6.460+0.000 sys 0.140+0.000) after: real 7.210 secs (user 7.060+0.000 sys 0.140+0.000) I plan to claw back this regression in future commits. And I may even do away with this data structure once the refactor is complete. For now, it makes things easier to comprehend. Differential Revision: https://phab.mercurial-scm.org/D4075
author Gregory Szorc <gregory.szorc@gmail.com>
date Fri, 03 Aug 2018 10:05:26 -0700
parents 3ea3c96ada54
children 2372284d9457
line wrap: on
line source

from __future__ import absolute_import

import unittest

from mercurial import (
    encoding,
)

class IsasciistrTest(unittest.TestCase):
    asciistrs = [
        b'a',
        b'ab',
        b'abc',
        b'abcd',
        b'abcde',
        b'abcdefghi',
        b'abcd\0fghi',
    ]

    def testascii(self):
        for s in self.asciistrs:
            self.assertTrue(encoding.isasciistr(s))

    def testnonasciichar(self):
        for s in self.asciistrs:
            for i in range(len(s)):
                t = bytearray(s)
                t[i] |= 0x80
                self.assertFalse(encoding.isasciistr(bytes(t)))

class LocalEncodingTest(unittest.TestCase):
    def testasciifastpath(self):
        s = b'\0' * 100
        self.assertTrue(s is encoding.tolocal(s))
        self.assertTrue(s is encoding.fromlocal(s))

class Utf8bEncodingTest(unittest.TestCase):
    def setUp(self):
        self.origencoding = encoding.encoding

    def tearDown(self):
        encoding.encoding = self.origencoding

    def testasciifastpath(self):
        s = b'\0' * 100
        self.assertTrue(s is encoding.toutf8b(s))
        self.assertTrue(s is encoding.fromutf8b(s))

    def testlossylatin(self):
        encoding.encoding = b'ascii'
        s = u'\xc0'.encode('utf-8')
        l = encoding.tolocal(s)
        self.assertEqual(l, b'?')  # lossy
        self.assertEqual(s, encoding.toutf8b(l))  # utf8 sequence preserved

    def testlosslesslatin(self):
        encoding.encoding = b'latin-1'
        s = u'\xc0'.encode('utf-8')
        l = encoding.tolocal(s)
        self.assertEqual(l, b'\xc0')  # lossless
        self.assertEqual(s, encoding.toutf8b(l))  # convert back to utf-8

    def testlossy0xed(self):
        encoding.encoding = b'euc-kr'  # U+Dxxx Hangul
        s = u'\ud1bc\xc0'.encode('utf-8')
        l = encoding.tolocal(s)
        self.assertIn(b'\xed', l)
        self.assertTrue(l.endswith(b'?'))  # lossy
        self.assertEqual(s, encoding.toutf8b(l))  # utf8 sequence preserved

    def testlossless0xed(self):
        encoding.encoding = b'euc-kr'  # U+Dxxx Hangul
        s = u'\ud1bc'.encode('utf-8')
        l = encoding.tolocal(s)
        self.assertEqual(l, b'\xc5\xed')  # lossless
        self.assertEqual(s, encoding.toutf8b(l))  # convert back to utf-8

if __name__ == '__main__':
    import silenttestrunner
    silenttestrunner.main(__name__)