changeset 28292:3eb7faf6d958
worker: document poor partitioning scheme impact
mpm isn't a fan of the existing or previous partitioning scheme. He
provided a fantastic justification for why on the mailing list.
This patch adds his words to the code so they aren't forgotten.
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Sat, 27 Feb 2016 21:43:17 -0800 |
parents | c7f89ad87bae |
children | a22b6fa5a844 |
files | mercurial/worker.py |
diffstat | 1 files changed, 22 insertions(+), 0 deletions(-) |
```diff
--- a/mercurial/worker.py	Mon Feb 29 17:52:17 2016 -0600
+++ b/mercurial/worker.py	Sat Feb 27 21:43:17 2016 -0800
@@ -157,6 +157,28 @@
     The current strategy takes every Nth element from the input. If
     we ever write workers that need to preserve grouping in input
     we should consider allowing callers to specify a partition strategy.
+
+    mpm is not a fan of this partitioning strategy when files are involved.
+    In his words:
+
+        Single-threaded Mercurial makes a point of creating and visiting
+        files in a fixed order (alphabetical). When creating files in order,
+        a typical filesystem is likely to allocate them on nearby regions on
+        disk. Thus, when revisiting in the same order, locality is maximized
+        and various forms of OS and disk-level caching and read-ahead get a
+        chance to work.
+
+        This effect can be quite significant on spinning disks. I discovered it
+        circa Mercurial v0.4 when revlogs were named by hashes of filenames.
+        Tarring a repo and copying it to another disk effectively randomized
+        the revlog ordering on disk by sorting the revlogs by hash and suddenly
+        performance of my kernel checkout benchmark dropped by ~10x because the
+        "working set" of sectors visited no longer fit in the drive's cache and
+        the workload switched from streaming to random I/O.
+
+        What we should really be doing is have workers read filenames from a
+        ordered queue. This preserves locality and also keeps any worker from
+        getting more than one file out of balance.
     '''
     for i in range(nslices):
         yield lst[i::nslices]
```
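To make the contrast concrete, here is a minimal sketch, not Mercurial code. `partition` reproduces the strided scheme shown in the diff (`lst[i::nslices]`), so each worker's share of the sorted file list is fixed up front. `run_ordered` is a hypothetical helper illustrating the "ordered queue" idea from the quoted text: all names sit in one FIFO queue in sorted order, and a worker takes the next name only after finishing its current one. Mercurial's worker module forks processes; using threads and `queue.Queue` here is an assumption made purely to keep the example small and self-contained.

```python
import queue
import threading


def partition(lst, nslices):
    # Strided scheme from the diff: slice i holds items i, i+nslices, ...
    for i in range(nslices):
        yield lst[i::nslices]


def run_ordered(filenames, nworkers, process):
    """Hypothetical sketch of the "ordered queue" idea, using threads.

    Every filename goes into one FIFO queue in sorted order. Each worker
    takes the next name only after finishing its current one, so names
    leave the queue in sorted order and no worker holds more than one
    file at a time. Returns (dispatch_order, results).
    """
    q = queue.Queue()
    for name in sorted(filenames):
        q.put(name)

    lock = threading.Lock()
    dispatched = []  # order in which names left the queue
    results = {}     # filename -> process(filename)

    def worker():
        while True:
            # Dequeue and log under one lock so the log matches the
            # true dequeue order.
            with lock:
                try:
                    name = q.get_nowait()
                except queue.Empty:
                    return
                dispatched.append(name)
            value = process(name)
            with lock:
                results[name] = value

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dispatched, results


if __name__ == "__main__":
    files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt", "f.txt"]
    # Strided: [['a.txt', 'd.txt'], ['b.txt', 'e.txt'], ['c.txt', 'f.txt']]
    print(list(partition(files, 3)))
    order, sizes = run_ordered(files, 3, len)
    # Dispatch follows sorted order: a.txt, b.txt, ..., f.txt
    print(order)
```

With the strided scheme, workers that run at different speeds drift apart in the sorted list because their slices never change; with the shared queue, the global dispatch order stays sorted and any imbalance is bounded by the one file each worker is currently processing, which are the two properties the quoted text asks for.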