Mercurial > hg-stable
changeset 45269:ad7006830106
dirstate: restore original estimation and update comment
The former comment didn't reflect the content of the dirstate entries:
the two nodes are a fixed header in the file, not part of each entry.
Split the documented entry into the path part and the fixed header. The
heuristic itself is still valid; e.g. the NetBSD src tree has a maximum
path size of 142 and an average of 49, resulting in 66 bytes per entry
on average.
Differential Revision: https://phab.mercurial-scm.org/D8850
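
The arithmetic behind the estimate can be sketched as follows. This is only an illustration of the numbers quoted in the message and in the new code comment, not Mercurial code; the names `ENTRY_PREFIX`, `AVG_PATH_UPPER`, `bytes_per_entry` and `estimate_entries` are hypothetical.

```python
# Hypothetical constants, taken from the numbers in the commit.
ENTRY_PREFIX = 17    # bytes of fixed prefix per dirstate entry
AVG_PATH_UPPER = 54  # upper limit used for the average path length


def bytes_per_entry(avg_path_len, prefix=ENTRY_PREFIX):
    """Average on-disk size of one entry: fixed prefix plus path."""
    return prefix + avg_path_len


def estimate_entries(dirstate_size, avg_path_len=AVG_PATH_UPPER):
    """Estimate the entry count from the dirstate file size in bytes."""
    return dirstate_size // bytes_per_entry(avg_path_len)


# NetBSD src tree: average path length of 49 bytes.
print(bytes_per_entry(49))
# Divisor used by the patched code: 17 + 54.
print(bytes_per_entry(AVG_PATH_UPPER))
```

With the NetBSD average of 49 this gives the 66 bytes per entry mentioned above, and with the 54-byte upper limit it gives the divisor 71 used in the patch.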
author | Joerg Sonnenberger <joerg@bec.de> |
---|---|
date | Thu, 30 Jul 2020 16:13:17 +0200 |
parents | 5780a04a1b46 |
children | d4a28b76fa54 |
files | mercurial/dirstate.py |
diffstat | 1 files changed, 6 insertions(+), 12 deletions(-) |
--- a/mercurial/dirstate.py Fri Jul 31 17:09:31 2020 +0530
+++ b/mercurial/dirstate.py Thu Jul 30 16:13:17 2020 +0200
@@ -1658,18 +1658,12 @@
 
         if util.safehasattr(parsers, b'dict_new_presized'):
             # Make an estimate of the number of files in the dirstate based on
-            # its size. From a linear regression on a set of real-world repos,
-            # all over 10,000 files, the size of a dirstate entry is 2 nodes
-            # plus 45 bytes. The cost of resizing is significantly higher than the cost
-            # of filling in a larger presized dict, so subtract 20% from the
-            # size.
-            #
-            # This heuristic is imperfect in many ways, so in a future dirstate
-            # format update it makes sense to just record the number of entries
-            # on write.
-            self._map = parsers.dict_new_presized(
-                len(st) // ((2 * self._nodelen + 45) * 4 // 5)
-            )
+            # its size. This trades wasting some memory for avoiding costly
+            # resizes. Each entry have a prefix of 17 bytes followed by one or
+            # two path names. Studies on various large-scale real-world repositories
+            # found 54 bytes a reasonable upper limit for the average path names.
+            # Copy entries are ignored for the sake of this estimate.
+            self._map = parsers.dict_new_presized(len(st) // 71)
 
         # Python's garbage collector triggers a GC each time a certain number
         # of container objects (the number being defined by
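
To compare the two divisors in the hunk above, here is a small sketch that evaluates the removed expression and the restored constant side by side. It assumes a node length of 20 bytes (SHA-1); the 32-byte case is purely hypothetical and is included only to show that the removed formula varied with the node length. The helper names `old_divisor` and `presize` are made up for this example.

```python
SHA1_NODELEN = 20  # bytes in a SHA-1 node
NEW_DIVISOR = 71   # constant restored by this changeset (17 + 54)


def old_divisor(nodelen):
    """Divisor from the removed code: (2 nodes + 45 bytes) minus 20%."""
    return (2 * nodelen + 45) * 4 // 5


def presize(dirstate_size, divisor):
    """Number of slots that would be requested from dict_new_presized."""
    return dirstate_size // divisor


size = 1024 * 1024  # a 1 MiB dirstate file, chosen arbitrarily
print(old_divisor(SHA1_NODELEN), presize(size, old_divisor(SHA1_NODELEN)))
print(NEW_DIVISOR, presize(size, NEW_DIVISOR))
# Hypothetical 32-byte nodes: the old divisor changes although, per the
# commit message, the nodes are a file header and not part of each entry.
print(old_divisor(32))
```

The constant 71 does not depend on the node length, which matches the commit message's point that the nodes are not stored per entry.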