Mercurial > hg
view mercurial/pure/base85.py @ 38732:be4984261611
merge: mark file gets as not thread safe (issue5933)
In default installs, this has the effect of disabling the thread-based
worker on Windows when manifesting files in the working directory. My
measurements have shown that with revlog-based repositories, Mercurial
spends a lot of CPU time in revlog code resolving file data. This ends
up incurring a lot of context switching across threads and slows down
`hg update` operations when going from an empty working directory to
the tip of the repo.
On mozilla-unified (246,351 files) on an i7-6700K (4+4 CPUs):
before: 487s wall
after: 360s wall (equivalent to worker.enabled=false)
cpus=2: 379s wall
Even with only 2 threads, the thread pool is still slower.
The introduction of the thread-based worker (02b36e860e0b) states that
it resulted in a "~50%" speedup for `hg sparse --enable-profile` and
`hg sparse --disable-profile`. This disagrees with my measurement
above. I theorize a few reasons for this:
1) Removal of files from the working directory is I/O - not CPU - bound
and should benefit from a thread pool (unless I/O is insanely fast
and the GIL release is near instantaneous). So tests like `hg sparse
--enable-profile` may exercise deletion throughput and aren't good
benchmarks for worker tasks that are CPU heavy.
2) The patch was authored by someone at Facebook. The results were
likely measured against a repository using remotefilelog. And I
believe that revision retrieval during working directory updates with
remotefilelog will often use a remote store, thus being I/O and not
CPU bound. This probably resulted in an overstated performance gain.
Since there appears to be a need to enable the thread-based worker with
some stores, I've made the flagging of file gets as thread safe
configurable. I've made it experimental because I don't want to formalize
a boolean flag for this option and because this attribute is best
captured against the store implementation. But we don't have a proper
store API for this yet. I'd rather cross this bridge later.
It is possible there are revlog-based repositories that do benefit from
a thread-based worker. I didn't do very comprehensive testing. If there
are, we may want to devise a more proper algorithm for whether to use
the thread-based worker, including possibly config options to limit the
number of threads to use. But until I see evidence that justifies
complexity, simplicity wins.
Differential Revision: https://phab.mercurial-scm.org/D3963
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Wed, 18 Jul 2018 09:49:34 -0700 |
parents | 80301c90a2dc |
children | 2372284d9457 |
line wrap: on
line source
# base85.py: pure python base85 codec # # Copyright (C) 2009 Brendan Cully <brendan@kublai.com> # # This software may be used and distributed according to the terms of the # GNU General Public License version 2 or any later version. from __future__ import absolute_import import struct from .. import pycompat _b85chars = pycompat.bytestr("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef" "ghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~") _b85chars2 = [(a + b) for a in _b85chars for b in _b85chars] _b85dec = {} def _mkb85dec(): for i, c in enumerate(_b85chars): _b85dec[c] = i def b85encode(text, pad=False): """encode text in base85 format""" l = len(text) r = l % 4 if r: text += '\0' * (4 - r) longs = len(text) >> 2 words = struct.unpack('>%dL' % (longs), text) out = ''.join(_b85chars[(word // 52200625) % 85] + _b85chars2[(word // 7225) % 7225] + _b85chars2[word % 7225] for word in words) if pad: return out # Trim padding olen = l % 4 if olen: olen += 1 olen += l // 4 * 5 return out[:olen] def b85decode(text): """decode base85-encoded text""" if not _b85dec: _mkb85dec() l = len(text) out = [] for i in range(0, len(text), 5): chunk = text[i:i + 5] chunk = pycompat.bytestr(chunk) acc = 0 for j, c in enumerate(chunk): try: acc = acc * 85 + _b85dec[c] except KeyError: raise ValueError('bad base85 character at position %d' % (i + j)) if acc > 4294967295: raise ValueError('Base85 overflow in hunk starting at byte %d' % i) out.append(acc) # Pad final chunk if necessary cl = l % 5 if cl: acc *= 85 ** (5 - cl) if cl > 1: acc += 0xffffff >> (cl - 2) * 8 out[-1] = acc out = struct.pack('>%dL' % (len(out)), *out) if cl: out = out[:-(5 - cl)] return out