comparison mercurial/store.py @ 41978:a56487081109

store: don't read the whole fncache in memory

In large repositories with a lot of files, the fncache grows to more than
100 MB, and reading that whole thing into memory slows things down. Let's not
read the whole thing into memory.

This patch changes the fncache loading code to read 1 MB at once. Loading
1 MB at once saves ~1 sec on perffncacheload for our internal repository. I
tried various values such as 0.5 MB, 5 MB and 10 MB, but the best results
were produced using 1 MB as the chunksize. On a narrow clone with an fncache
of around 40 MB, this patch saves ~0.04 seconds on average on
perffncacheload.

To test the code, I have coded an extension in test-fncache.t which sets the
chunksize to 1 byte, and the test passes with that.

Differential Revision: https://phab.mercurial-scm.org/D5296
author Pulkit Goyal <pulkit@yandex-team.ru>
date Thu, 22 Nov 2018 15:14:24 +0300
parents d7ef84e595f8
children a920a9e1795a
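The commit message mentions a test extension that shrinks the chunk size to a
single byte so that every chunk boundary in the new loader gets exercised. As
a rough sketch (the file name below is hypothetical, and this is not the exact
content of test-fncache.t), such an extension only needs to override the
module-level constant introduced by this patch:

# smallchunks.py - hypothetical test helper; the real extension is written
# inline in tests/test-fncache.t
from __future__ import absolute_import

from mercurial import store

# Force the fncache loader to call fp.read(1), so every code path around
# chunk boundaries and carried-over partial lines is taken.
store.fncache_chunksize = 1

It could then be enabled for a single run with
--config extensions.smallchunks=/path/to/smallchunks.py, without touching the
repository's hgrc.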
comparison: 41976:51685c6dcca3 vs 41978:a56487081109
@@ -6,10 +6,11 @@
 # GNU General Public License version 2 or any later version.
 
 from __future__ import absolute_import
 
 import errno
+import functools
 import hashlib
 import os
 import stat
 
 from .i18n import _
@@ -21,10 +22,13 @@
     util,
     vfs as vfsmod,
 )
 
 parsers = policy.importmod(r'parsers')
+# how much bytes should be read from fncache in one read
+# It is done to prevent loading large fncache files into memory
+fncache_chunksize = 10 ** 6
 
 def _matchtrackedpath(path, matcher):
     """parses a fncache entry and returns whether the entry is tracking a path
     matched by matcher or not.
 
@@ -461,11 +465,24 @@
             fp = self.vfs('fncache', mode='rb')
         except IOError:
             # skip nonexistent file
             self.entries = set()
             return
-        self.entries = set(decodedir(fp.read()).splitlines())
+
+        self.entries = set()
+        chunk = b''
+        for c in iter(functools.partial(fp.read, fncache_chunksize), b''):
+            chunk += c
+            try:
+                p = chunk.rindex(b'\n')
+                self.entries.update(decodedir(chunk[:p + 1]).splitlines())
+                chunk = chunk[p + 1:]
+            except ValueError:
+                # substring '\n' not found, maybe the entry is bigger than the
+                # chunksize, so let's keep iterating
+                pass
+
         self._checkentries(fp)
         fp.close()
 
     def _checkentries(self, fp):
         """ make sure there is no empty string in entries """