view tests/test-remotefilelog-datapack.py @ 46582:b0a3ca02d17a

copies-rust: implement PartialEqual manually Now that we know that each (dest, rev) pair has at most a unique CopySource, we can simplify comparison a lot. This "simple" step buy a good share of the previous slowdown back in some case: Repo Case Source-Rev Dest-Rev # of revisions old time new time Difference Factor time per rev --------------------------------------------------------------------------------------------------------------------------------------------------------------- mozilla-try x00000_revs_x00000_added_x000_copies 9b2a99adc05e 8e29777b48e6 : 382065 revs, 43.304637 s, 34.443661 s, -8.860976 s, × 0.7954, 90 µs/rev Full benchmark: Repo Case Source-Rev Dest-Rev # of revisions old time new time Difference Factor time per rev --------------------------------------------------------------------------------------------------------------------------------------------------------------- mercurial x_revs_x_added_0_copies ad6b123de1c7 39cfcef4f463 : 1 revs, 0.000043 s, 0.000043 s, +0.000000 s, × 1.0000, 43 µs/rev mercurial x_revs_x_added_x_copies 2b1c78674230 0c1d10351869 : 6 revs, 0.000114 s, 0.000117 s, +0.000003 s, × 1.0263, 19 µs/rev mercurial x000_revs_x000_added_x_copies 81f8ff2a9bf2 dd3267698d84 : 1032 revs, 0.004937 s, 0.004892 s, -0.000045 s, × 0.9909, 4 µs/rev pypy x_revs_x_added_0_copies aed021ee8ae8 099ed31b181b : 9 revs, 0.000339 s, 0.000196 s, -0.000143 s, × 0.5782, 21 µs/rev pypy x_revs_x000_added_0_copies 4aa4e1f8e19a 359343b9ac0e : 1 revs, 0.000049 s, 0.000050 s, +0.000001 s, × 1.0204, 50 µs/rev pypy x_revs_x_added_x_copies ac52eb7bbbb0 72e022663155 : 7 revs, 0.000202 s, 0.000117 s, -0.000085 s, × 0.5792, 16 µs/rev pypy x_revs_x00_added_x_copies c3b14617fbd7 ace7255d9a26 : 1 revs, 0.000409 s, 0.6f1f4a s, -0.000087 s, × 0.7873, 322 µs/rev pypy x_revs_x000_added_x000_copies df6f7a526b60 a83dc6a2d56f : 6 revs, 0.011984 s, 0.011949 s, -0.000035 s, × 0.9971, 1991 µs/rev pypy x000_revs_xx00_added_0_copies 89a76aede314 2f22446ff07e : 4785 revs, 0.050820 s, 0.050802 s, -0.000018 s, × 0.9996, 10 µs/rev pypy x000_revs_x000_added_x_copies 8a3b5bfd266e 2c68e87c3efe : 6780 revs, 0.087953 s, 0.088090 s, +0.000137 s, × 1.0016, 12 µs/rev pypy x000_revs_x000_added_x000_copies 89a76aede314 7b3dda341c84 : 5441 revs, 0.062902 s, 0.062079 s, -0.000823 s, × 0.9869, 11 µs/rev pypy x0000_revs_x_added_0_copies d1defd0dc478 c9cb1334cc78 : 43645 revs, 0.679234 s, 0.635337 s, -0.043897 s, × 0.9354, 14 µs/rev pypy x0000_revs_xx000_added_0_copies bf2c629d0071 4ffed77c095c : 2 revs, 0.013095 s, 0.013262 s, +0.000167 s, × 1.0128, 6631 µs/rev pypy x0000_revs_xx000_added_x000_copies 08ea3258278e d9fa043f30c0 : 11316 revs, 0.120910 s, 0.120085 s, -0.000825 s, × 0.9932, 10 µs/rev netbeans x_revs_x_added_0_copies fb0955ffcbcd a01e9239f9e7 : 2 revs, 0.000087 s, 0.000085 s, -0.000002 s, × 0.9770, 42 µs/rev netbeans x_revs_x000_added_0_copies 6f360122949f 20eb231cc7d0 : 2 revs, 0.000107 s, 0.000110 s, +0.000003 s, × 1.0280, 55 µs/rev netbeans x_revs_x_added_x_copies 1ada3faf6fb6 5a39d12eecf4 : 3 revs, 0.000186 s, 0.000177 s, -0.000009 s, × 0.9516, 59 µs/rev netbeans x_revs_x00_added_x_copies 35be93ba1e2c 9eec5e90c05f : 9 revs, 0.000754 s, 0.000743 s, -0.000011 s, × 0.9854, 82 µs/rev netbeans x000_revs_xx00_added_0_copies eac3045b4fdd 51d4ae7f1290 : 1421 revs, 0.010443 s, 0.010168 s, -0.000275 s, × 0.9737, 7 µs/rev netbeans x000_revs_x000_added_x_copies e2063d266acd 6081d72689dc : 1533 revs, 0.015697 s, 0.015946 s, +0.000249 s, × 1.0159, 10 µs/rev netbeans x000_revs_x000_added_x000_copies ff453e9fee32 411350406ec2 : 5750 revs, 0.063528 s, 0.062712 s, -0.000816 s, × 0.9872, 10 µs/rev netbeans x0000_revs_xx000_added_x000_copies 588c2d1ced70 1aad62e59ddd : 66949 revs, 0.545515 s, 0.523832 s, -0.021683 s, × 0.9603, 7 µs/rev mozilla-central x_revs_x_added_0_copies 3697f962bb7b 7015fcdd43a2 : 2 revs, 0.000089 s, 0.000090 s, +0.000001 s, × 1.0112, 45 µs/rev mozilla-central x_revs_x000_added_0_copies dd390860c6c9 40d0c5bed75d : 8 revs, 0.000265 s, 0.000264 s, -0.000001 s, × 0.9962, 33 µs/rev mozilla-central x_revs_x_added_x_copies 8d198483ae3b 14207ffc2b2f : 9 revs, 0.000381 s, 0.000187 s, -0.000194 s, × 0.4908, 20 µs/rev mozilla-central x_revs_x00_added_x_copies 98cbc58cc6bc 446a150332c3 : 7 revs, 0.000672 s, 0.000665 s, -0.000007 s, × 0.9896, 95 µs/rev mozilla-central x_revs_x000_added_x000_copies 3c684b4b8f68 0a5e72d1b479 : 3 revs, 0.003497 s, 0.003556 s, +0.000059 s, × 1.0169, 1185 µs/rev mozilla-central x_revs_x0000_added_x0000_copies effb563bb7e5 c07a39dc4e80 : 6 revs, 0.073204 s, 0.071345 s, -0.001859 s, × 0.9746, 11890 µs/rev mozilla-central x000_revs_xx00_added_0_copies 6100d773079a 04a55431795e : 1593 revs, 0.006482 s, 0.006551 s, +0.000069 s, × 1.0106, 4 µs/rev mozilla-central x000_revs_x000_added_x_copies 9f17a6fc04f9 2d37b966abed : 41 revs, 0.005066 s, 0.005078 s, +0.000012 s, × 1.0024, 123 µs/rev mozilla-central x000_revs_x000_added_x000_copies 7c97034feb78 4407bd0c6330 : 7839 revs, 0.065707 s, 0.065823 s, +0.000116 s, × 1.0018, 8 µs/rev mozilla-central x0000_revs_xx000_added_0_copies 9eec5917337d 67118cc6dcad : 615 revs, 0.026800 s, 0.027050 s, +0.000250 s, × 1.0093, 43 µs/rev mozilla-central x0000_revs_xx000_added_x000_copies f78c615a656c 96a38b690156 : 30263 revs, 0.203856 s, 0.202443 s, -0.001413 s, × 0.9931, 6 µs/rev mozilla-central x00000_revs_x0000_added_x0000_copies 6832ae71433c 4c222a1d9a00 : 153721 revs, 1.293394 s, 1.261583 s, -0.031811 s, × 0.9754, 8 µs/rev mozilla-central x00000_revs_x00000_added_x000_copies 76caed42cf7c 1daa622bbe42 : 204976 revs, 1.698239 s, 1.643869 s, -0.054370 s, × 0.9680, 8 µs/rev mozilla-try x_revs_x_added_0_copies aaf6dde0deb8 9790f499805a : 2 revs, 0.000875 s, 0.000868 s, -0.000007 s, × 0.9920, 434 µs/rev mozilla-try x_revs_x000_added_0_copies d8d0222927b4 5bb8ce8c7450 : 2 revs, 0.000891 s, 0.000887 s, -0.000004 s, × 0.9955, 443 µs/rev mozilla-try x_revs_x_added_x_copies 092fcca11bdb 936255a0384a : 4 revs, 0.000292 s, 0.000168 s, -0.000124 s, × 0.5753, 42 µs/rev mozilla-try x_revs_x00_added_x_copies b53d2fadbdb5 017afae788ec : 2 revs, 0.003939 s, 0.001160 s, -0.002779 s, × 0.2945, 580 µs/rev mozilla-try x_revs_x000_added_x000_copies 20408ad61ce5 6f0ee96e21ad : 1 revs, 0.033027 s, 0.033016 s, -0.000011 s, × 0.9997, 33016 µs/rev mozilla-try x_revs_x0000_added_x0000_copies effb563bb7e5 c07a39dc4e80 : 6 revs, 0.073703 s, 0.073312 s, -0.39ae31 s, × 0.9947, 12218 µs/rev mozilla-try x000_revs_xx00_added_0_copies 6100d773079a 04a55431795e : 1593 revs, 0.006469 s, 0.006485 s, +0.000016 s, × 1.0025, 4 µs/rev mozilla-try x000_revs_x000_added_x_copies 9f17a6fc04f9 2d37b966abed : 41 revs, 0.005278 s, 0.005494 s, +0.000216 s, × 1.0409, 134 µs/rev mozilla-try x000_revs_x000_added_x000_copies 1346fd0130e4 4c65cbdabc1f : 6657 revs, 0.064995 s, 0.064879 s, -0.000116 s, × 0.9982, 9 µs/rev mozilla-try x0000_revs_x_added_0_copies 63519bfd42ee a36a2a865d92 : 40314 revs, 0.301041 s, 0.301469 s, +0.000428 s, × 1.0014, 7 µs/rev mozilla-try x0000_revs_x_added_x_copies 9fe69ff0762d bcabf2a78927 : 38690 revs, 0.285575 s, 0.297113 s, +0.011538 s, × 1.0404, 7 µs/rev mozilla-try x0000_revs_xx000_added_x_copies 156f6e2674f2 4d0f2c178e66 : 8598 revs, 0.085597 s, 0.085890 s, +0.000293 s, × 1.0034, 9 µs/rev mozilla-try x0000_revs_xx000_added_0_copies 9eec5917337d 67118cc6dcad : 615 revs, 0.027118 s, 0.027718 s, +0.000600 s, × 1.0221, 45 µs/rev mozilla-try x0000_revs_xx000_added_x000_copies 89294cd501d9 7ccb2fc7ccb5 : 97052 revs, 2.119204 s, 2.048949 s, -0.070255 s, × 0.9668, 21 µs/rev mozilla-try x0000_revs_x0000_added_x0000_copies e928c65095ed e951f4ad123a : 52031 revs, 0.701479 s, 0.685924 s, -0.015555 s, × 0.9778, 13 µs/rev mozilla-try x00000_revs_x_added_0_copies 6a320851d377 1ebb79acd503 : 363753 revs, 4.482399 s, 4.482891 s, +0.000492 s, × 1.0001, 12 µs/rev mozilla-try x00000_revs_x00000_added_0_copies dc8a3ca7010e d16fde900c9c : 34414 revs, 0.574082 s, 0.577633 s, +0.003551 s, × 1.0062, 16 µs/rev mozilla-try x00000_revs_x_added_x_copies 5173c4b6f97c 95d83ee7242d : 362229 revs, 4.480366 s, 4.397816 s, -0.082550 s, × 0.9816, 12 µs/rev mozilla-try x00000_revs_x000_added_x_copies 9126823d0e9c ca82787bb23c : 359344 revs, 4.369070 s, 4.370538 s, +0.001468 s, × 1.0003, 12 µs/rev mozilla-try x00000_revs_x0000_added_x0000_copies 8d3fafa80d4b eb884023b810 : 192665 revs, 1.592506 s, 1.570439 s, -0.022067 s, × 0.9861, 8 µs/rev mozilla-try x00000_revs_x00000_added_x0000_copies 1b661134e2ca 1ae03d022d6d : 228985 revs, 87.824489 s, 88.388512 s, +0.564023 s, × 1.0064, 386 µs/rev mozilla-try x00000_revs_x00000_added_x000_copies 9b2a99adc05e 8e29777b48e6 : 382065 revs, 43.304637 s, 34.443661 s, -8.860976 s, × 0.7954, 90 µs/rev private : 459513 revs, 33.853687 s, 27.370148 s, -6.483539 s, × 0.8085, 59 µs/rev Differential Revision: https://phab.mercurial-scm.org/D9653
author Pierre-Yves David <pierre-yves.david@octobus.net>
date Wed, 16 Dec 2020 11:11:05 +0100
parents 89a2afe31e82
children d55b71393907
line wrap: on
line source

#!/usr/bin/env python3
from __future__ import absolute_import, print_function

import hashlib
import os
import random
import shutil
import stat
import struct
import sys
import tempfile
import time
import unittest

import silenttestrunner

# Load the local remotefilelog, not the system one
sys.path[0:0] = [os.path.join(os.path.dirname(__file__), '..')]
from mercurial.node import nullid
from mercurial import policy

if not policy._packageprefs.get(policy.policy, (False, False))[1]:
    if __name__ == '__main__':
        msg = "skipped: pure module not available with module policy:"
        print(msg, policy.policy, file=sys.stderr)
        sys.exit(80)

from mercurial import (
    pycompat,
    ui as uimod,
)
from hgext.remotefilelog import (
    basepack,
    constants,
    datapack,
)


class datapacktestsbase(object):
    def __init__(self, datapackreader, paramsavailable):
        self.datapackreader = datapackreader
        self.paramsavailable = paramsavailable

    def setUp(self):
        self.tempdirs = []

    def tearDown(self):
        for d in self.tempdirs:
            shutil.rmtree(d)

    def makeTempDir(self):
        tempdir = pycompat.bytestr(tempfile.mkdtemp())
        self.tempdirs.append(tempdir)
        return tempdir

    def getHash(self, content):
        return hashlib.sha1(content).digest()

    def getFakeHash(self):
        return b''.join(
            pycompat.bytechr(random.randint(0, 255)) for _ in range(20)
        )

    def createPack(self, revisions=None, packdir=None):
        if revisions is None:
            revisions = [(b"filename", self.getFakeHash(), nullid, b"content")]

        if packdir is None:
            packdir = self.makeTempDir()

        packer = datapack.mutabledatapack(uimod.ui(), packdir, version=2)

        for args in revisions:
            filename, node, base, content = args[0:4]
            # meta is optional
            meta = None
            if len(args) > 4:
                meta = args[4]
            packer.add(filename, node, base, content, metadata=meta)

        path = packer.close()
        return self.datapackreader(path)

    def _testAddSingle(self, content):
        """Test putting a simple blob into a pack and reading it out."""
        filename = b"foo"
        node = self.getHash(content)

        revisions = [(filename, node, nullid, content)]
        pack = self.createPack(revisions)
        if self.paramsavailable:
            self.assertEqual(
                pack.params.fanoutprefix, basepack.SMALLFANOUTPREFIX
            )

        chain = pack.getdeltachain(filename, node)
        self.assertEqual(content, chain[0][4])

    def testAddSingle(self):
        self._testAddSingle(b'')

    def testAddSingleEmpty(self):
        self._testAddSingle(b'abcdef')

    def testAddMultiple(self):
        """Test putting multiple unrelated blobs into a pack and reading them
        out.
        """
        revisions = []
        for i in range(10):
            filename = b"foo%d" % i
            content = b"abcdef%d" % i
            node = self.getHash(content)
            revisions.append((filename, node, self.getFakeHash(), content))

        pack = self.createPack(revisions)

        for filename, node, base, content in revisions:
            entry = pack.getdelta(filename, node)
            self.assertEqual((content, filename, base, {}), entry)

            chain = pack.getdeltachain(filename, node)
            self.assertEqual(content, chain[0][4])

    def testAddDeltas(self):
        """Test putting multiple delta blobs into a pack and read the chain."""
        revisions = []
        filename = b"foo"
        lastnode = nullid
        for i in range(10):
            content = b"abcdef%d" % i
            node = self.getHash(content)
            revisions.append((filename, node, lastnode, content))
            lastnode = node

        pack = self.createPack(revisions)

        entry = pack.getdelta(filename, revisions[0][1])
        realvalue = (revisions[0][3], filename, revisions[0][2], {})
        self.assertEqual(entry, realvalue)

        # Test that the chain for the final entry has all the others
        chain = pack.getdeltachain(filename, node)
        for i in range(10):
            content = b"abcdef%d" % i
            self.assertEqual(content, chain[-i - 1][4])

    def testPackMany(self):
        """Pack many related and unrelated objects."""
        # Build a random pack file
        revisions = []
        blobs = {}
        random.seed(0)
        for i in range(100):
            filename = b"filename-%d" % i
            filerevs = []
            for j in range(random.randint(1, 100)):
                content = b"content-%d" % j
                node = self.getHash(content)
                lastnode = nullid
                if len(filerevs) > 0:
                    lastnode = filerevs[random.randint(0, len(filerevs) - 1)]
                filerevs.append(node)
                blobs[(filename, node, lastnode)] = content
                revisions.append((filename, node, lastnode, content))

        pack = self.createPack(revisions)

        # Verify the pack contents
        for (filename, node, lastnode), content in sorted(blobs.items()):
            chain = pack.getdeltachain(filename, node)
            for entry in chain:
                expectedcontent = blobs[(entry[0], entry[1], entry[3])]
                self.assertEqual(entry[4], expectedcontent)

    def testPackMetadata(self):
        revisions = []
        for i in range(100):
            filename = b'%d.txt' % i
            content = b'put-something-here \n' * i
            node = self.getHash(content)
            meta = {
                constants.METAKEYFLAG: i ** 4,
                constants.METAKEYSIZE: len(content),
                b'Z': b'random_string',
                b'_': b'\0' * i,
            }
            revisions.append((filename, node, nullid, content, meta))
        pack = self.createPack(revisions)
        for name, node, x, content, origmeta in revisions:
            parsedmeta = pack.getmeta(name, node)
            # flag == 0 should be optimized out
            if origmeta[constants.METAKEYFLAG] == 0:
                del origmeta[constants.METAKEYFLAG]
            self.assertEqual(parsedmeta, origmeta)

    def testGetMissing(self):
        """Test the getmissing() api."""
        revisions = []
        filename = b"foo"
        lastnode = nullid
        for i in range(10):
            content = b"abcdef%d" % i
            node = self.getHash(content)
            revisions.append((filename, node, lastnode, content))
            lastnode = node

        pack = self.createPack(revisions)

        missing = pack.getmissing([(b"foo", revisions[0][1])])
        self.assertFalse(missing)

        missing = pack.getmissing(
            [(b"foo", revisions[0][1]), (b"foo", revisions[1][1])]
        )
        self.assertFalse(missing)

        fakenode = self.getFakeHash()
        missing = pack.getmissing(
            [(b"foo", revisions[0][1]), (b"foo", fakenode)]
        )
        self.assertEqual(missing, [(b"foo", fakenode)])

    def testAddThrows(self):
        pack = self.createPack()

        try:
            pack.add(b'filename', nullid, b'contents')
            self.assertTrue(False, "datapack.add should throw")
        except RuntimeError:
            pass

    def testBadVersionThrows(self):
        pack = self.createPack()
        path = pack.path + b'.datapack'
        with open(path, 'rb') as f:
            raw = f.read()
        raw = struct.pack('!B', 255) + raw[1:]
        os.chmod(path, os.stat(path).st_mode | stat.S_IWRITE)
        with open(path, 'wb+') as f:
            f.write(raw)

        try:
            self.datapackreader(pack.path)
            self.assertTrue(False, "bad version number should have thrown")
        except RuntimeError:
            pass

    def testMissingDeltabase(self):
        fakenode = self.getFakeHash()
        revisions = [(b"filename", fakenode, self.getFakeHash(), b"content")]
        pack = self.createPack(revisions)
        chain = pack.getdeltachain(b"filename", fakenode)
        self.assertEqual(len(chain), 1)

    def testLargePack(self):
        """Test creating and reading from a large pack with over X entries.
        This causes it to use a 2^16 fanout table instead."""
        revisions = []
        blobs = {}
        total = basepack.SMALLFANOUTCUTOFF + 1
        for i in pycompat.xrange(total):
            filename = b"filename-%d" % i
            content = filename
            node = self.getHash(content)
            blobs[(filename, node)] = content
            revisions.append((filename, node, nullid, content))

        pack = self.createPack(revisions)
        if self.paramsavailable:
            self.assertEqual(
                pack.params.fanoutprefix, basepack.LARGEFANOUTPREFIX
            )

        for (filename, node), content in blobs.items():
            actualcontent = pack.getdeltachain(filename, node)[0][4]
            self.assertEqual(actualcontent, content)

    def testPacksCache(self):
        """Test that we remember the most recent packs while fetching the delta
        chain."""

        packdir = self.makeTempDir()
        deltachains = []

        numpacks = 10
        revisionsperpack = 100

        for i in range(numpacks):
            chain = []
            revision = (b'%d' % i, self.getFakeHash(), nullid, b"content")

            for _ in range(revisionsperpack):
                chain.append(revision)
                revision = (
                    b'%d' % i,
                    self.getFakeHash(),
                    revision[1],
                    self.getFakeHash(),
                )

            self.createPack(chain, packdir)
            deltachains.append(chain)

        class testdatapackstore(datapack.datapackstore):
            # Ensures that we are not keeping everything in the cache.
            DEFAULTCACHESIZE = numpacks // 2

        store = testdatapackstore(uimod.ui(), packdir)

        random.shuffle(deltachains)
        for randomchain in deltachains:
            revision = random.choice(randomchain)
            chain = store.getdeltachain(revision[0], revision[1])

            mostrecentpack = next(iter(store.packs), None)
            self.assertEqual(
                mostrecentpack.getdeltachain(revision[0], revision[1]), chain
            )

            self.assertEqual(randomchain.index(revision) + 1, len(chain))

    # perf test off by default since it's slow
    def _testIndexPerf(self):
        random.seed(0)
        print("Multi-get perf test")
        packsizes = [
            100,
            10000,
            100000,
            500000,
            1000000,
            3000000,
        ]
        lookupsizes = [
            10,
            100,
            1000,
            10000,
            100000,
            1000000,
        ]
        for packsize in packsizes:
            revisions = []
            for i in pycompat.xrange(packsize):
                filename = b"filename-%d" % i
                content = b"content-%d" % i
                node = self.getHash(content)
                revisions.append((filename, node, nullid, content))

            path = self.createPack(revisions).path

            # Perf of large multi-get
            import gc

            gc.disable()
            pack = self.datapackreader(path)
            for lookupsize in lookupsizes:
                if lookupsize > packsize:
                    continue
                random.shuffle(revisions)
                findnodes = [(rev[0], rev[1]) for rev in revisions]

                start = time.time()
                pack.getmissing(findnodes[:lookupsize])
                elapsed = time.time() - start
                print(
                    "%s pack %d lookups = %0.04f"
                    % (
                        ('%d' % packsize).rjust(7),
                        ('%d' % lookupsize).rjust(7),
                        elapsed,
                    )
                )

            print("")
            gc.enable()

        # The perf test is meant to produce output, so we always fail the test
        # so the user sees the output.
        raise RuntimeError("perf test always fails")


class datapacktests(datapacktestsbase, unittest.TestCase):
    def __init__(self, *args, **kwargs):
        datapacktestsbase.__init__(self, datapack.datapack, True)
        unittest.TestCase.__init__(self, *args, **kwargs)


# TODO:
# datapack store:
# - getmissing
# - GC two packs into one

if __name__ == '__main__':
    if pycompat.iswindows:
        sys.exit(80)  # Skip on Windows
    silenttestrunner.main(__name__)