comparison mercurial/dirstate.py @ 45243:ad7006830106
dirstate: restore original estimation and update comment
The former comment didn't reflect the content of the dirstate entries:
the two nodes are a fixed header in the file, not per-entry data. Split
the documented entry into the path part and the fixed header. The
heuristic itself is still valid; e.g., the NetBSD src tree has a maximum
path size of 142 and an average of 49, resulting in 66 bytes per entry
on average.
Differential Revision: https://phab.mercurial-scm.org/D8850
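
The arithmetic behind the estimate can be checked with a short sketch. This is an illustration only, not Mercurial code: the constant and function names are hypothetical, and the node length of 20 bytes (SHA-1) used for the previous formula is an assumption. The 17-byte prefix and the 54-byte path bound come from the new comment, the 49-byte average from the commit message.

```python
# Illustrative sketch; all names here are hypothetical, not Mercurial APIs.
ENTRY_PREFIX_BYTES = 17    # fixed per-entry prefix, per the new comment
AVG_PATH_UPPER_BOUND = 54  # upper limit for the average path length
NETBSD_AVG_PATH = 49       # average path length quoted for NetBSD src


def new_divisor():
    """Bytes assumed per entry by the restored estimate (17 + 54)."""
    return ENTRY_PREFIX_BYTES + AVG_PATH_UPPER_BOUND


def old_divisor(nodelen=20):
    """Bytes assumed per entry by the previous formula.

    nodelen=20 is an assumption (SHA-1 node length).
    """
    return (2 * nodelen + 45) * 4 // 5


def estimate_entries(dirstate_size):
    """Estimate the entry count from the dirstate size in bytes.

    Like the code in the diff, this ignores the fixed file header.
    """
    return dirstate_size // new_divisor()


if __name__ == "__main__":
    print("new divisor:", new_divisor())
    print("old divisor (nodelen=20):", old_divisor())
    print("NetBSD bytes per entry:", ENTRY_PREFIX_BYTES + NETBSD_AVG_PATH)
    print("entries for a 7,100,000-byte dirstate:", estimate_entries(7_100_000))
```

Because the divisor is based on an upper limit for the average path length, a tree such as NetBSD src (66 bytes per entry on average) ends up with slightly more entries than estimated. Whether that triggers a resize depends on how `dict_new_presized` sizes the dict, which this sketch does not model.
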
author | Joerg Sonnenberger <joerg@bec.de> |
---|---|
date | Thu, 30 Jul 2020 16:13:17 +0200 |
parents | e0bfde04f957 |
children | 89a2afe31e82 |
comparison
45242:5780a04a1b46 | 45243:ad7006830106 |
---|---|
1656 if not st: | 1656 if not st: |
1657 return | 1657 return |
1658 | 1658 |
1659 if util.safehasattr(parsers, b'dict_new_presized'): | 1659 if util.safehasattr(parsers, b'dict_new_presized'): |
1660 # Make an estimate of the number of files in the dirstate based on | 1660 # Make an estimate of the number of files in the dirstate based on |
1661 # its size. From a linear regression on a set of real-world repos, | 1661 # its size. This trades wasting some memory for avoiding costly |
1662 # all over 10,000 files, the size of a dirstate entry is 2 nodes | 1662 # resizes. Each entry have a prefix of 17 bytes followed by one or |
1663 # plus 45 bytes. The cost of resizing is significantly higher than the cost | 1663 # two path names. Studies on various large-scale real-world repositories |
1664 # of filling in a larger presized dict, so subtract 20% from the | 1664 # found 54 bytes a reasonable upper limit for the average path names. |
1665 # size. | 1665 # Copy entries are ignored for the sake of this estimate. |
1666 # | 1666 self._map = parsers.dict_new_presized(len(st) // 71) |
1667 # This heuristic is imperfect in many ways, so in a future dirstate | |
1668 # format update it makes sense to just record the number of entries | |
1669 # on write. | |
1670 self._map = parsers.dict_new_presized( | |
1671 len(st) // ((2 * self._nodelen + 45) * 4 // 5) | |
1672 ) | |
1673 | 1667 |
1674 # Python's garbage collector triggers a GC each time a certain number | 1668 # Python's garbage collector triggers a GC each time a certain number |
1675 # of container objects (the number being defined by | 1669 # of container objects (the number being defined by |
1676 # gc.get_threshold()) are allocated. parse_dirstate creates a tuple | 1670 # gc.get_threshold()) are allocated. parse_dirstate creates a tuple |
1677 # for each file in the dirstate. The C version then immediately marks | 1671 # for each file in the dirstate. The C version then immediately marks |