comparison mercurial/dirstate.py @ 25585:868b7ee8b570

dirstate: use a presized dict for the dirstate

This uses a simple heuristic to avoid expensive resizes.

On a real-world repo with around 400,000 files, perfdirstate:

before:
! wall 0.155562 comb 0.160000 user 0.150000 sys 0.010000 (best of 64)
after:
! wall 0.132638 comb 0.130000 user 0.120000 sys 0.010000 (best of 75)

On another real-world repo with around 250,000 files:

before:
! wall 0.098459 comb 0.100000 user 0.090000 sys 0.010000 (best of 100)
after:
! wall 0.089084 comb 0.090000 user 0.080000 sys 0.010000 (best of 100)
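A minimal Python 3 sketch of the size-to-count estimate this patch relies on, assuming only what the patch's own comment states: an average of about 85 bytes per dirstate entry, shrunk to a divisor of 71 so the presized dict comes out roughly 20% larger than the point estimate (85 / 71 is about 1.2). The names `estimate_dirstate_entries`, `new_dirstate_map`, and `BYTES_PER_ENTRY_DIVISOR` are hypothetical, as is passing the parsers module in as an argument. `dict_new_presized` is the C-extension function the patch probes for; pure Python has no public way to presize a dict, so the fallback here is a plain `{}`. Floor division (`//`) stands in for the Python 2 `/` in `len(st) / 71`, which already produced an int there.

```python
# Divisor taken from the patch: roughly 85 bytes per entry was measured,
# and 71 is used instead so the estimate overshoots by about 20%.
BYTES_PER_ENTRY_DIVISOR = 71


def estimate_dirstate_entries(dirstate_size):
    """Estimate the number of dirstate entries from its size in bytes.

    Hypothetical helper mirroring ``len(st) / 71``; ``//`` keeps the
    result an int on Python 3, as ``/`` did on Python 2.
    """
    if dirstate_size < 0:
        raise ValueError("dirstate size cannot be negative")
    return dirstate_size // BYTES_PER_ENTRY_DIVISOR


def new_dirstate_map(parsers_module, st):
    """Return an empty map sized for the raw dirstate bytes ``st``.

    Calls ``parsers_module.dict_new_presized`` only when the module
    provides it (the same feature check the patch makes with
    util.safehasattr); otherwise falls back to an ordinary dict.
    """
    presize = getattr(parsers_module, "dict_new_presized", None)
    if presize is None:
        return {}
    return presize(estimate_dirstate_entries(len(st)))


if __name__ == "__main__":
    # 400,000 files at about 85 bytes each is about 34,000,000 bytes.
    print(estimate_dirstate_entries(34_000_000))  # 478873
```

For 400,000 files at 85 bytes each, the estimate is 478,873 slots, about 1.2 times the true count. Overshooting is the deliberate trade-off: the patch's comment judges a resize to be significantly more expensive than filling in a larger presized dict.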
author Siddharth Agarwal <sid0@fb.com>
date Tue, 16 Jun 2015 00:46:01 -0700
parents 2bbfc2042d93
children e93036747902
--- mercurial/dirstate.py	25584:72b2711f12ea
+++ mercurial/dirstate.py	25585:868b7ee8b570
@@ -335,10 +335,23 @@
             if err.errno != errno.ENOENT:
                 raise
             return
         if not st:
             return
+
+        if util.safehasattr(parsers, 'dict_new_presized'):
+            # Make an estimate of the number of files in the dirstate based on
+            # its size. From a linear regression on a set of real-world repos,
+            # all over 10,000 files, the size of a dirstate entry is 85
+            # bytes. The cost of resizing is significantly higher than the cost
+            # of filling in a larger presized dict, so subtract 20% from the
+            # size.
+            #
+            # This heuristic is imperfect in many ways, so in a future dirstate
+            # format update it makes sense to just record the number of entries
+            # on write.
+            self._map = parsers.dict_new_presized(len(st) / 71)
 
         # Python's garbage collector triggers a GC each time a certain number
         # of container objects (the number being defined by
         # gc.get_threshold()) are allocated. parse_dirstate creates a tuple
         # for each file in the dirstate. The C version then immediately marks