view tests/list-tree.py @ 50400:95acba2c29f6

encoding: avoid quadratic time complexity when json-encoding non-UTF8 strings Apparently the code uses "+=" with a bytes object, which is linear-time, so the whole encoding is quadratic-time. This patch makes us use a bytearray object, instead, which has a(n amortized-)constant-time append operation. The encoding is still not particularly fast, but at least a 10MB file takes tens of seconds, not many hours to encode.
author Arseniy Alekseyev <aalekseyev@janestreet.com>
date Mon, 06 Mar 2023 11:27:57 +0000
parents 6000f5b25c9b
children
line wrap: on
line source

import argparse
import os

ap = argparse.ArgumentParser()
ap.add_argument('path', nargs='+')
opts = ap.parse_args()


def gather():
    for p in opts.path:
        if not os.path.exists(p):
            return
        if os.path.isdir(p):
            yield p + os.path.sep
            for dirpath, dirs, files in os.walk(p):
                for d in dirs:
                    yield os.path.join(dirpath, d) + os.path.sep
                for f in files:
                    yield os.path.join(dirpath, f)
        else:
            yield p


print('\n'.join(sorted(gather(), key=lambda x: x.replace(os.path.sep, '/'))))