view mercurial/tags.py @ 39506:b66ea3fc3a86
sparse-revlog: set max delta chain length to one thousand
The new snapshot system used in the sparse-revlog case has given us only a small
size benefit so far. However, its most important property is that it gracefully
handles a harder limit on delta chain length.
Long delta chains have a very detrimental impact on read (and write) performance
in revlog. Being able to shorten them provides a great boost. However, shortening
delta chains used to result in a significantly lower compression ratio. The
intermediate snapshots effectively suppress most of this effect (even all of it
in some cases).
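The trade-off described above can be illustrated with a toy model. This is only a sketch and not Mercurial's actual delta logic: `chain_lengths` and `full_snapshots` are hypothetical names, every revision is assumed to be stored as a delta against its predecessor, and a full snapshot is assumed whenever the limit would be exceeded (intermediate snapshots are not modelled).

```python
def chain_lengths(num_revs, max_chain_len=None):
    """Return the number of chain entries needed to read each revision.

    Toy assumption: a revision is a delta on top of the previous one,
    unless the chain already reached max_chain_len, in which case a
    full snapshot is stored and the chain restarts.
    """
    lengths = []
    current = 0
    for _ in range(num_revs):
        if current == 0 or (max_chain_len is not None
                            and current >= max_chain_len):
            current = 1   # full snapshot: the chain restarts here
        else:
            current += 1  # one more delta on top of the existing chain
        lengths.append(current)
    return lengths


def full_snapshots(lengths):
    """Count how many revisions were stored as full snapshots."""
    return sum(1 for n in lengths if n == 1)
```

In this model, ten revisions without a limit produce a chain that grows to ten entries with a single full snapshot, while `max_chain_len=4` caps every read at four entries at the cost of three full snapshots. That extra storage is the cost the intermediate snapshots are meant to reduce.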
# Effect on the test repository
The repository we use for tests is not "realistic" but can still show this in
action using an unreasonably low chain limit. Limiting the chain length produces
a sizeable increase that stays under control: +6% for limit=15; +15% for limit=10.
Without the snapshot system the increase is significantly bigger: +45% for
limit=15; +80% for limit=10. Even though it is slightly larger than without a
delta chain limit, the resulting size is still smaller than before we started
doing snapshots.
Here is a table for comparison. (Since the repository is not branchy, the
initial sparse-revlog version does not bring much benefit compared to the
non-sparse one):
chain length limit    | none       | limit=15    | limit=10
----------------------|------------|-------------|------------
without sparse-revlog | 62 818 987 | 112 664 615 | 131 222 574
without snapshot      | 74 365 490 | 108 211 410 | 133 857 764
with snapshot         | 59 230 936 |  63 002 924 |  68 415 329
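The percentages quoted for the test repository can be re-derived from the table. This is a minimal sketch: the `SIZES` dictionary and the `increase` helper are illustrative names, and the numbers are copied from the table (presumably byte counts, although the unit is not stated there).

```python
# Store sizes from the table: (no limit, limit=15, limit=10).
SIZES = {
    "without sparse-revlog": (62818987, 112664615, 131222574),
    "without snapshot": (74365490, 108211410, 133857764),
    "with snapshot": (59230936, 63002924, 68415329),
}


def increase(base, limited):
    """Relative size increase, in percent, caused by a chain limit."""
    return (limited - base) / base * 100.0


def report(sizes=SIZES):
    """Map each row name to its (limit=15, limit=10) increases in percent."""
    return {
        name: (increase(base, lim15), increase(base, lim10))
        for name, (base, lim15, lim10) in sizes.items()
    }


if __name__ == "__main__":
    for name, (lim15, lim10) in report().items():
        print("%-22s limit=15: +%.1f%%  limit=10: +%.1f%%"
              % (name, lim15, lim10))
```

The "with snapshot" and "without snapshot" rows give values close to the rounded +6%/+15% and +45%/+80% figures mentioned in the text.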
# Effect On Real Life Repositories
The series provides significant benefits on all kinds of repositories.
Using `hg debugupgraderepo -o redeltaparent --run`, we recomputed delta chains
for various repositories with different settings:
- delta chain length: unlimited or 1000 limit
- sparse-revlog: enabled or disabled
- this series: applied or not applied
We can observe multiple types of effects:
- On very branchy repositories:
  * The delta chain limit has a low impact on the repo size.
  * Intermediate snapshots greatly reduce manifest size:
    - pypy: -80%
    - netbeans: -95%
  * The delta chain limit is effective, without a size impact:
    - netbeans average: 613 -> 282
    - private #1 average: 1 068 -> 307
- On more linear repositories:
  * Intermediate snapshots limit the impact of the delta chain limit:
    - mozilla:
      without the series: +360%
      with the series: +25%
  * The delta chain limit provides a large improvement:
    - mozilla's average chain length:
      unlimited: 15 338
      limited: 469
  * Despite the chain length limit, the manifest size is reduced:
    - mercurial: -25%
    - mozilla: -30%
It is clear that the use of chains of intermediate snapshots provides large
benefits both in storage size and delta chain quality. We should now switch our
effort toward making sure the write performance is acceptable. Then,
`sparse-revlog` will be a suitable format for all new repositories.
# Raw Statistics
* no-sparse: general delta repository not using sparse-revlog
* no-snapshot: sparse-revlog repository not using this series
* snapshot: sparse-revlog repository using this series
mercurial
Manifest Size:
limit | none | 1000
------------|-------------|------------
no-sparse | 8 021 373 | 8 199 366
no-snapshot | 8 103 561 | 8 259 719
snapshot | 6 137 116 | 6 126 433
Manifest Chain length data
limit || none || 1000 ||
value || average | max || average | max ||
------------||---------|---------||---------|---------||
no-sparse || 307 | 1 456 || 279 | 1 000 ||
no-snapshot || 312 | 1 456 || 283 | 1 000 ||
snapshot || 248 | 1 208 || 241 | 1 000 ||
Full Store Size
limit | none | 1000
------------|-------------|------------
no-sparse | 51 013 198 | 51 201 574
no-snapshot | 50 930 795 | 51 141 006
snapshot | 48 072 037 | 48 093 572
pypy
Manifest Size:
limit | none | 1000
------------|-------------|------------
no-sparse | 193 987 784 | 193 987 784
no-snapshot | 163 171 745 | 163 312 229
snapshot | 34 605 900 | 34 600 750
Manifest Chain length data
limit || none || 1000 ||
value || average | max || average | max ||
------------||---------|---------||---------|---------||
no-sparse || 101 | 692 || 101 | 692 ||
no-snapshot || 151 | 1 307 || 148 | 1 000 ||
snapshot || 128 | 1 309 || 125 | 1 000 ||
Full Store Size
limit | none | 1000
------------|-------------|------------
no-sparse | 495 931 473 | 495 931 473
no-snapshot | 465 441 017 | 465 581 501
snapshot | 355 467 301 | 355 472 451
Mozilla
Manifest Size:
limit | none | 1000
------------|----------------|---------------
no-sparse | 416 757 148 | 1 869 009 668
no-snapshot | 401 592 370 | 1 843 493 795
snapshot | 224 359 521 | 284 615 500
Manifest Chain length data
limit || none || 1000 ||
value || average | max || average | max ||
------------||---------|---------||---------|---------||
no-sparse || 15 333 | 58 980 || 468 | 1 000 ||
no-snapshot || 15 336 | 58 980 || 469 | 1 000 ||
snapshot || 15 338 | 58 983 || 469 | 1 000 ||
Full Store Size
limit | none | 1000
------------|----------------|---------------
no-sparse | 2 712 477 887 | 4 164 995 451
no-snapshot | 2 698 887 835 | 4 141 054 304
snapshot | 2 518 130 385 | 2 578 587 596
Netbeans
Manifest Size:
limit | none | 1000
------------|----------------|---------------
no-sparse | 4 766 794 101 | 4 870 642 687
no-snapshot | 4 334 806 082 | 4 428 681 309
snapshot | 232 659 666 | 240 330 665
Manifest Chain length data
limit || none || 1000 ||
value || average | max || average | max ||
------------||---------|---------||---------|---------||
no-sparse || 597 | 6 802 || 254 | 1 000 ||
no-snapshot || 648 | 6 802 || 305 | 1 000 ||
snapshot || 613 | 6 804 || 282 | 1 000 ||
Full Store Size
limit | none | 1000
------------|----------------|---------------
no-sparse | 5 807 347 998 | 5 911 196 584
no-snapshot | 5 375 398 602 | 5 469 273 829
snapshot | 1 282 519 928 | 1 290 190 927
Private repo #1
Manifest Size:
limit | none | 1000
------------|-----------------|---------------
no-sparse | 41 389 010 840 | 41 398 162 091
no-snapshot | 9 737 319 435 | 10 223 773 150
snapshot | 744 215 807 | 747 961 822
Manifest Chain length data
limit || none || 1000 ||
value || average | max || average | max ||
------------||---------|---------||---------|---------||
no-sparse || 245 | 8 885 || 81 | 1 000 ||
no-snapshot || 1 225 | 8 885 || 336 | 1 000 ||
snapshot || 1 068 | 7 909 || 307 | 1 000 ||
Full Store Size
limit | none | 1000
------------|----------------|---------------
no-sparse | 49 646 065 126 | 49 655 216 377
no-snapshot | 17 924 862 856 | 18 411 316 571
snapshot | 9 009 024 710 | 9 012 770 725
Private repo #2
We currently have less data available for that repository.
* Before is a sparse-revlog repository without this series
* After is a sparse-revlog repository with this series + 1000 chain limit
Manifest Size:
Before: 1 531 485 040 bytes
After: 1 091 422 451 bytes
Manifest Chain:
Before: 2 218 avg; 6 575 Max
After: 442 avg; 1 000 Max
Full Store Size
Before: 15 203 955 615 bytes
After: 8 207 180 693 bytes
author | Boris Feld <boris.feld@octobus.net> |
---|---|
date | Fri, 07 Sep 2018 11:18:45 -0400 |
parents | f0b6fbea00cf |
children | 4c5864dad8b0 |
# tags.py - read tag info from local repository
#
# Copyright 2009 Matt Mackall <mpm@selenic.com>
# Copyright 2009 Greg Ward <greg@gerg.ca>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

# Currently this module only deals with reading and caching tags.
# Eventually, it could take care of updating (adding/removing/moving)
# tags too.

from __future__ import absolute_import

import errno

from .node import (
    bin,
    hex,
    nullid,
    short,
)
from .i18n import _
from . import (
    encoding,
    error,
    match as matchmod,
    scmutil,
    util,
)
from .utils import (
    stringutil,
)

# Tags computation can be expensive and caches exist to make it fast in
# the common case.
#
# The "hgtagsfnodes1" cache file caches the .hgtags filenode values for
# each revision in the repository. The file is effectively an array of
# fixed length records. Read the docs for "hgtagsfnodescache" for technical
# details.
#
# The .hgtags filenode cache grows in proportion to the length of the
# changelog. The file is truncated when the
# changelog is stripped.
#
# The purpose of the filenode cache is to avoid the most expensive part
# of finding global tags, which is looking up the .hgtags filenode in the
# manifest for each head. This can take dozens or over 100ms for
# repositories with very large manifests. Multiplied by dozens or even
# hundreds of heads and there is a significant performance concern.
#
# There also exist a separate cache file for each repository filter.
# These "tags-*" files store information about the history of tags.
#
# The tags cache files consists of a cache validation line followed by
# a history of tags.
#
# The cache validation line has the format:
#
#   <tiprev> <tipnode> [<filteredhash>]
#
# <tiprev> is an integer revision and <tipnode> is a 40 character hex
# node for that changeset. These redundantly identify the repository
# tip from the time the cache was written. In addition, <filteredhash>,
# if present, is a 40 character hex hash of the contents of the filtered
# revisions for this filter. If the set of filtered revs changes, the
# hash will change and invalidate the cache.
#
# The history part of the tags cache consists of lines of the form:
#
#   <node> <tag>
#
# (This format is identical to that of .hgtags files.)
#
# <tag> is the tag name and <node> is the 40 character hex changeset
# the tag is associated with.
#
# Tags are written sorted by tag name.
#
# Tags associated with multiple changesets have an entry for each changeset.
# The most recent changeset (in terms of revlog ordering for the head
# setting it) for each tag is last.

def fnoderevs(ui, repo, revs):
    """return the list of '.hgtags' fnodes used in a set revisions

    This is returned as list of unique fnodes. We use a list instead of a set
    because order matters when it comes to tags."""
    unfi = repo.unfiltered()
    tonode = unfi.changelog.node
    nodes = [tonode(r) for r in revs]
    fnodes = _getfnodes(ui, repo, nodes[::-1]) # reversed help the cache
    fnodes = _filterfnodes(fnodes, nodes)
    return fnodes

def _nulltonone(value):
    """convert nullid to None

    For tag value, nullid means "deleted". This small utility function helps
    translating that to None."""
    if value == nullid:
        return None
    return value

def difftags(ui, repo, oldfnodes, newfnodes):
    """list differences between tags expressed in two set of file-nodes

    The list contains entries in the form: (tagname, oldvalue, new value).
    None is used to expressed missing value:
        ('foo', None, 'abcd') is a new tag,
        ('bar', 'ef01', None) is a deletion,
        ('baz', 'abcd', 'ef01') is a tag movement.
    """
    if oldfnodes == newfnodes:
        return []
    oldtags = _tagsfromfnodes(ui, repo, oldfnodes)
    newtags = _tagsfromfnodes(ui, repo, newfnodes)

    # list of (tag, old, new): None means missing
    entries = []
    for tag, (new, __) in newtags.items():
        new = _nulltonone(new)
        old, __ = oldtags.pop(tag, (None, None))
        old = _nulltonone(old)
        if old != new:
            entries.append((tag, old, new))
    # handle deleted tags
    for tag, (old, __) in oldtags.items():
        old = _nulltonone(old)
        if old is not None:
            entries.append((tag, old, None))
    entries.sort()
    return entries

def writediff(fp, difflist):
    """write tags diff information to a file.

    Data are stored with a line based format:

        <action> <hex-node> <tag-name>\n

    Action are defined as follow:
       -R tag is removed,
       +A tag is added,
       -M tag is moved (old value),
       +M tag is moved (new value),

    Example:

         +A 875517b4806a848f942811a315a5bce30804ae85 t5

    See documentation of difftags output for details about the input.
    """
    add = '+A %s %s\n'
    remove = '-R %s %s\n'
    updateold = '-M %s %s\n'
    updatenew = '+M %s %s\n'
    for tag, old, new in difflist:
        # translate to hex
        if old is not None:
            old = hex(old)
        if new is not None:
            new = hex(new)
        # write to file
        if old is None:
            fp.write(add % (new, tag))
        elif new is None:
            fp.write(remove % (old, tag))
        else:
            fp.write(updateold % (old, tag))
            fp.write(updatenew % (new, tag))

def findglobaltags(ui, repo):
    '''Find global tags in a repo: return a tagsmap

    tagsmap: tag name to (node, hist) 2-tuples.

    The tags cache is read and updated as a side-effect of calling.
    '''
    (heads, tagfnode, valid, cachetags, shouldwrite) = _readtagcache(ui, repo)
    if cachetags is not None:
        assert not shouldwrite
        # XXX is this really 100% correct? are there oddball special
        # cases where a global tag should outrank a local tag but won't,
        # because cachetags does not contain rank info?
        alltags = {}
        _updatetags(cachetags, alltags)
        return alltags

    for head in reversed(heads): # oldest to newest
        assert head in repo.changelog.nodemap, \
               "tag cache returned bogus head %s" % short(head)
    fnodes = _filterfnodes(tagfnode, reversed(heads))
    alltags = _tagsfromfnodes(ui, repo, fnodes)

    # and update the cache (if necessary)
    if shouldwrite:
        _writetagcache(ui, repo, valid, alltags)
    return alltags

def _filterfnodes(tagfnode, nodes):
    """return a list of unique fnodes

    The order of this list matches the order of "nodes". Preserving this order
    is important as reading tags in different order provides different
    results."""
    seen = set() # set of fnode
    fnodes = []
    for no in nodes: # oldest to newest
        fnode = tagfnode.get(no)
        if fnode and fnode not in seen:
            seen.add(fnode)
            fnodes.append(fnode)
    return fnodes

def _tagsfromfnodes(ui, repo, fnodes):
    """return a tagsmap from a list of file-node

    tagsmap: tag name to (node, hist) 2-tuples.

    The order of the list matters."""
    alltags = {}
    fctx = None
    for fnode in fnodes:
        if fctx is None:
            fctx = repo.filectx('.hgtags', fileid=fnode)
        else:
            fctx = fctx.filectx(fnode)
        filetags = _readtags(ui, repo, fctx.data().splitlines(), fctx)
        _updatetags(filetags, alltags)
    return alltags

def readlocaltags(ui, repo, alltags, tagtypes):
    '''Read local tags in repo. Update alltags and tagtypes.'''
    try:
        data = repo.vfs.read("localtags")
    except IOError as inst:
        if inst.errno != errno.ENOENT:
            raise
        return

    # localtags is in the local encoding; re-encode to UTF-8 on
    # input for consistency with the rest of this module.
    filetags = _readtags(
        ui, repo, data.splitlines(), "localtags",
        recode=encoding.fromlocal)

    # remove tags pointing to invalid nodes
    cl = repo.changelog
    for t in list(filetags):
        try:
            cl.rev(filetags[t][0])
        except (LookupError, ValueError):
            del filetags[t]

    _updatetags(filetags, alltags, 'local', tagtypes)

def _readtaghist(ui, repo, lines, fn, recode=None, calcnodelines=False):
    '''Read tag definitions from a file (or any source of lines).

    This function returns two sortdicts with similar information:

    - the first dict, bintaghist, contains the tag information as expected by
      the _readtags function, i.e. a mapping from tag name to (node, hist):
        - node is the node id from the last line read for that name,
        - hist is the list of node ids previously associated with it (in file
          order). All node ids are binary, not hex.

    - the second dict, hextaglines, is a mapping from tag name to a list of
      [hexnode, line number] pairs, ordered from the oldest to the newest node.

    When calcnodelines is False the hextaglines dict is not calculated (an
    empty dict is returned). This is done to improve this function's
    performance in cases where the line numbers are not needed.
    '''

    bintaghist = util.sortdict()
    hextaglines = util.sortdict()
    count = 0

    def dbg(msg):
        ui.debug("%s, line %d: %s\n" % (fn, count, msg))

    for nline, line in enumerate(lines):
        count += 1
        if not line:
            continue
        try:
            (nodehex, name) = line.split(" ", 1)
        except ValueError:
            dbg("cannot parse entry")
            continue
        name = name.strip()
        if recode:
            name = recode(name)
        try:
            nodebin = bin(nodehex)
        except TypeError:
            dbg("node '%s' is not well formed" % nodehex)
            continue

        # update filetags
        if calcnodelines:
            # map tag name to a list of line numbers
            if name not in hextaglines:
                hextaglines[name] = []
            hextaglines[name].append([nodehex, nline])
            continue
        # map tag name to (node, hist)
        if name not in bintaghist:
            bintaghist[name] = []
        bintaghist[name].append(nodebin)
    return bintaghist, hextaglines

def _readtags(ui, repo, lines, fn, recode=None, calcnodelines=False):
    '''Read tag definitions from a file (or any source of lines).

    Returns a mapping from tag name to (node, hist).

    "node" is the node id from the last line read for that name. "hist"
    is the list of node ids previously associated with it (in file order).
    All node ids are binary, not hex.
    '''
    filetags, nodelines = _readtaghist(ui, repo, lines, fn, recode=recode,
                                       calcnodelines=calcnodelines)
    # util.sortdict().__setitem__ is much slower at replacing then inserting
    # new entries. The difference can matter if there are thousands of tags.
    # Create a new sortdict to avoid the performance penalty.
    newtags = util.sortdict()
    for tag, taghist in filetags.items():
        newtags[tag] = (taghist[-1], taghist[:-1])
    return newtags

def _updatetags(filetags, alltags, tagtype=None, tagtypes=None):
    """Incorporate the tag info read from one file into dictionnaries

    The first one, 'alltags', is a "tagmaps" (see 'findglobaltags' for details).

    The second one, 'tagtypes', is optional and will be updated to track the
    "tagtype" of entries in the tagmaps. When set, the 'tagtype' argument also
    needs to be set."""
    if tagtype is None:
        assert tagtypes is None

    for name, nodehist in filetags.iteritems():
        if name not in alltags:
            alltags[name] = nodehist
            if tagtype is not None:
                tagtypes[name] = tagtype
            continue

        # we prefer alltags[name] if:
        #  it supersedes us OR
        #  mutual supersedes and it has a higher rank
        # otherwise we win because we're tip-most
        anode, ahist = nodehist
        bnode, bhist = alltags[name]
        if (bnode != anode and anode in bhist and
            (bnode not in ahist or len(bhist) > len(ahist))):
            anode = bnode
        elif tagtype is not None:
            tagtypes[name] = tagtype
        ahist.extend([n for n in bhist if n not in ahist])
        alltags[name] = anode, ahist

def _filename(repo):
    """name of a tagcache file for a given repo or repoview"""
    filename = 'tags2'
    if repo.filtername:
        filename = '%s-%s' % (filename, repo.filtername)
    return filename

def _readtagcache(ui, repo):
    '''Read the tag cache.

    Returns a tuple (heads, fnodes, validinfo, cachetags, shouldwrite).

    If the cache is completely up-to-date, "cachetags" is a dict of the
    form returned by _readtags() and "heads", "fnodes", and "validinfo" are
    None and "shouldwrite" is False.

    If the cache is not up to date, "cachetags" is None. "heads" is a list
    of all heads currently in the repository, ordered from tip to oldest.
    "validinfo" is a tuple describing cache validation info. This is used
    when writing the tags cache. "fnodes" is a mapping from head to .hgtags
    filenode. "shouldwrite" is True.

    If the cache is not up to date, the caller is responsible for reading tag
    info from each returned head. (See findglobaltags().)
    '''
    try:
        cachefile = repo.cachevfs(_filename(repo), 'r')
        # force reading the file for static-http
        cachelines = iter(cachefile)
    except IOError:
        cachefile = None

    cacherev = None
    cachenode = None
    cachehash = None
    if cachefile:
        try:
            validline = next(cachelines)
            validline = validline.split()
            cacherev = int(validline[0])
            cachenode = bin(validline[1])
            if len(validline) > 2:
                cachehash = bin(validline[2])
        except Exception:
            # corruption of the cache, just recompute it.
            pass

    tipnode = repo.changelog.tip()
    tiprev = len(repo.changelog) - 1

    # Case 1 (common): tip is the same, so nothing has changed.
    # (Unchanged tip trivially means no changesets have been added.
    # But, thanks to localrepository.destroyed(), it also means none
    # have been destroyed by strip or rollback.)
    if (cacherev == tiprev
            and cachenode == tipnode
            and cachehash == scmutil.filteredhash(repo, tiprev)):
        tags = _readtags(ui, repo, cachelines, cachefile.name)
        cachefile.close()
        return (None, None, None, tags, False)
    if cachefile:
        cachefile.close() # ignore rest of file

    valid = (tiprev, tipnode, scmutil.filteredhash(repo, tiprev))

    repoheads = repo.heads()
    # Case 2 (uncommon): empty repo; get out quickly and don't bother
    # writing an empty cache.
    if repoheads == [nullid]:
        return ([], {}, valid, {}, False)

    # Case 3 (uncommon): cache file missing or empty.

    # Case 4 (uncommon): tip rev decreased. This should only happen
    # when we're called from localrepository.destroyed(). Refresh the
    # cache so future invocations will not see disappeared heads in the
    # cache.

    # Case 5 (common): tip has changed, so we've added/replaced heads.

    # As it happens, the code to handle cases 3, 4, 5 is the same.

    # N.B. in case 4 (nodes destroyed), "new head" really means "newly
    # exposed".
    if not len(repo.file('.hgtags')):
        # No tags have ever been committed, so we can avoid a
        # potentially expensive search.
        return ([], {}, valid, None, True)

    # Now we have to lookup the .hgtags filenode for every new head.
    # This is the most expensive part of finding tags, so performance
    # depends primarily on the size of newheads. Worst case: no cache
    # file, so newheads == repoheads.
    cachefnode = _getfnodes(ui, repo, repoheads)

    # Caller has to iterate over all heads, but can use the filenodes in
    # cachefnode to get to each .hgtags revision quickly.
    return (repoheads, cachefnode, valid, None, True)

def _getfnodes(ui, repo, nodes):
    """return .hgtags fnodes for a list of changeset nodes

    Return value is a {node: fnode} mapping. There will be no entry for nodes
    without a '.hgtags' file.
    """
    starttime = util.timer()
    fnodescache = hgtagsfnodescache(repo.unfiltered())
    cachefnode = {}
    for node in reversed(nodes):
        fnode = fnodescache.getfnode(node)
        if fnode != nullid:
            cachefnode[node] = fnode

    fnodescache.write()

    duration = util.timer() - starttime
    ui.log('tagscache',
           '%d/%d cache hits/lookups in %0.4f '
           'seconds\n',
           fnodescache.hitcount, fnodescache.lookupcount, duration)
    return cachefnode

def _writetagcache(ui, repo, valid, cachetags):
    filename = _filename(repo)
    try:
        cachefile = repo.cachevfs(filename, 'w', atomictemp=True)
    except (OSError, IOError):
        return

    ui.log('tagscache', 'writing .hg/cache/%s with %d tags\n',
           filename, len(cachetags))

    if valid[2]:
        cachefile.write('%d %s %s\n' % (valid[0], hex(valid[1]), hex(valid[2])))
    else:
        cachefile.write('%d %s\n' % (valid[0], hex(valid[1])))

    # Tag names in the cache are in UTF-8 -- which is the whole reason
    # we keep them in UTF-8 throughout this module. If we converted
    # them local encoding on input, we would lose info writing them to
    # the cache.
    for (name, (node, hist)) in sorted(cachetags.iteritems()):
        for n in hist:
            cachefile.write("%s %s\n" % (hex(n), name))
        cachefile.write("%s %s\n" % (hex(node), name))

    try:
        cachefile.close()
    except (OSError, IOError):
        pass

def tag(repo, names, node, message, local, user, date, editor=False):
    '''tag a revision with one or more symbolic names.

    names is a list of strings or, when adding a single tag, names may be a
    string.

    if local is True, the tags are stored in a per-repository file.
    otherwise, they are stored in the .hgtags file, and a new
    changeset is committed with the change.

    keyword arguments:

    local: whether to store tags in non-version-controlled file
    (default False)

    message: commit message to use if committing

    user: name of user to use if committing

    date: date tuple to use if committing'''

    if not local:
        m = matchmod.exact(repo.root, '', ['.hgtags'])
        if any(repo.status(match=m, unknown=True, ignored=True)):
            raise error.Abort(_('working copy of .hgtags is changed'),
                              hint=_('please commit .hgtags manually'))

    with repo.wlock():
        repo.tags() # instantiate the cache
        _tag(repo, names, node, message, local, user, date,
             editor=editor)

def _tag(repo, names, node, message, local, user, date, extra=None,
         editor=False):
    if isinstance(names, str):
        names = (names,)

    branches = repo.branchmap()
    for name in names:
        repo.hook('pretag', throw=True, node=hex(node), tag=name,
                  local=local)
        if name in branches:
            repo.ui.warn(_("warning: tag %s conflicts with existing"
                           " branch name\n") % name)

    def writetags(fp, names, munge, prevtags):
        fp.seek(0, 2)
        if prevtags and not prevtags.endswith('\n'):
            fp.write('\n')
        for name in names:
            if munge:
                m = munge(name)
            else:
                m = name

            if (repo._tagscache.tagtypes and
                name in repo._tagscache.tagtypes):
                old = repo.tags().get(name, nullid)
                fp.write('%s %s\n' % (hex(old), m))
            fp.write('%s %s\n' % (hex(node), m))
        fp.close()

    prevtags = ''
    if local:
        try:
            fp = repo.vfs('localtags', 'r+')
        except IOError:
            fp = repo.vfs('localtags', 'a')
        else:
            prevtags = fp.read()

        # local tags are stored in the current charset
        writetags(fp, names, None, prevtags)
        for name in names:
            repo.hook('tag', node=hex(node), tag=name, local=local)
        return

    try:
        fp = repo.wvfs('.hgtags', 'rb+')
    except IOError as e:
        if e.errno != errno.ENOENT:
            raise
        fp = repo.wvfs('.hgtags', 'ab')
    else:
        prevtags = fp.read()

    # committed tags are stored in UTF-8
    writetags(fp, names, encoding.fromlocal, prevtags)

    fp.close()

    repo.invalidatecaches()

    if '.hgtags' not in repo.dirstate:
        repo[None].add(['.hgtags'])

    m = matchmod.exact(repo.root, '', ['.hgtags'])
    tagnode = repo.commit(message, user, date, extra=extra, match=m,
                          editor=editor)

    for name in names:
        repo.hook('tag', node=hex(node), tag=name, local=local)

    return tagnode

_fnodescachefile = 'hgtagsfnodes1'
_fnodesrecsize = 4 + 20 # changeset fragment + filenode
_fnodesmissingrec = '\xff' * 24

class hgtagsfnodescache(object):
    """Persistent cache mapping revisions to .hgtags filenodes.

    The cache is an array of records. Each item in the array corresponds to
    a changelog revision. Values in the array contain the first 4 bytes of
    the node hash and the 20 bytes .hgtags filenode for that revision.

    The first 4 bytes are present as a form of verification. Repository
    stripping and rewriting may change the node at a numeric revision in the
    changelog. The changeset fragment serves as a verifier to detect
    rewriting. This logic is shared with the rev branch cache (see
    branchmap.py).

    The instance holds in memory the full cache content but entries are
    only parsed on read.

    Instances behave like lists. ``c[i]`` works where i is a rev or
    changeset node. Missing indexes are populated automatically on access.
    """
    def __init__(self, repo):
        assert repo.filtername is None

        self._repo = repo

        # Only for reporting purposes.
        self.lookupcount = 0
        self.hitcount = 0

        try:
            data = repo.cachevfs.read(_fnodescachefile)
        except (OSError, IOError):
            data = ""
        self._raw = bytearray(data)

        # The end state of self._raw is an array that is of the exact length
        # required to hold a record for every revision in the repository.
        # We truncate or extend the array as necessary. self._dirtyoffset is
        # defined to be the start offset at which we need to write the output
        # file. This offset is also adjusted when new entries are calculated
        # for array members.
        cllen = len(repo.changelog)
        wantedlen = cllen * _fnodesrecsize
        rawlen = len(self._raw)

        self._dirtyoffset = None

        if rawlen < wantedlen:
            self._dirtyoffset = rawlen
            self._raw.extend('\xff' * (wantedlen - rawlen))
        elif rawlen > wantedlen:
            # There's no easy way to truncate array instances. This seems
            # slightly less evil than copying a potentially large array slice.
            for i in range(rawlen - wantedlen):
                self._raw.pop()
            self._dirtyoffset = len(self._raw)

    def getfnode(self, node, computemissing=True):
        """Obtain the filenode of the .hgtags file at a specified revision.

        If the value is in the cache, the entry will be validated and returned.
        Otherwise, the filenode will be computed and returned unless
        "computemissing" is False, in which case None will be returned without
        any potentially expensive computation being performed.

        If an .hgtags does not exist at the specified revision, nullid is
        returned.
        """
        ctx = self._repo[node]
        rev = ctx.rev()

        self.lookupcount += 1

        offset = rev * _fnodesrecsize
        record = '%s' % self._raw[offset:offset + _fnodesrecsize]
        properprefix = node[0:4]

        # Validate and return existing entry.
        if record != _fnodesmissingrec:
            fileprefix = record[0:4]

            if fileprefix == properprefix:
                self.hitcount += 1
                return record[4:]

            # Fall through.

        # If we get here, the entry is either missing or invalid.

        if not computemissing:
            return None

        # Populate missing entry.
        try:
            fnode = ctx.filenode('.hgtags')
        except error.LookupError:
            # No .hgtags file on this revision.
            fnode = nullid

        self._writeentry(offset, properprefix, fnode)
        return fnode

    def setfnode(self, node, fnode):
        """Set the .hgtags filenode for a given changeset."""
        assert len(fnode) == 20
        ctx = self._repo[node]

        # Do a lookup first to avoid writing if nothing has changed.
        if self.getfnode(ctx.node(), computemissing=False) == fnode:
            return

        self._writeentry(ctx.rev() * _fnodesrecsize, node[0:4], fnode)

    def _writeentry(self, offset, prefix, fnode):
        # Slices on array instances only accept other array.
        entry = bytearray(prefix + fnode)
        self._raw[offset:offset + _fnodesrecsize] = entry
        # self._dirtyoffset could be None.
        self._dirtyoffset = min(self._dirtyoffset or 0, offset or 0)

    def write(self):
        """Perform all necessary writes to cache file.

        This may no-op if no writes are needed or if a write lock could
        not be obtained.
        """
        if self._dirtyoffset is None:
            return

        data = self._raw[self._dirtyoffset:]
        if not data:
            return

        repo = self._repo

        try:
            lock = repo.wlock(wait=False)
        except error.LockError:
            repo.ui.log('tagscache', 'not writing .hg/cache/%s because '
                        'lock cannot be acquired\n' % (_fnodescachefile))
            return

        try:
            f = repo.cachevfs.open(_fnodescachefile, 'ab')
            try:
                # if the file has been truncated
                actualoffset = f.tell()
                if actualoffset < self._dirtyoffset:
                    self._dirtyoffset = actualoffset
                    data = self._raw[self._dirtyoffset:]
                f.seek(self._dirtyoffset)
                f.truncate()
                repo.ui.log('tagscache',
                            'writing %d bytes to cache/%s\n' % (
                            len(data), _fnodescachefile))
                f.write(data)
                self._dirtyoffset = None
            finally:
                f.close()
        except (IOError, OSError) as inst:
            repo.ui.log('tagscache',
                        "couldn't write cache/%s: %s\n" % (
                        _fnodescachefile, stringutil.forcebytestr(inst)))
        finally:
            lock.release()
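The record layout described in the `hgtagsfnodescache` docstring (one 24-byte record per changelog revision: a 4-byte changeset node prefix followed by the 20-byte `.hgtags` filenode, with `\xff` bytes marking a missing entry) can be sketched on its own. The snippet below is a standalone Python 3 illustration and not part of Mercurial: `lookup` and `store` are hypothetical helper names, and nodes are assumed to be 20-byte `bytes` values.

```python
RECORD_SIZE = 4 + 20               # changeset node prefix + .hgtags filenode
MISSING = b"\xff" * RECORD_SIZE    # marker for "no entry computed yet"


def lookup(raw, rev, node):
    """Return the cached filenode for rev, or None if missing or stale."""
    offset = rev * RECORD_SIZE
    record = bytes(raw[offset:offset + RECORD_SIZE])
    if len(record) < RECORD_SIZE or record == MISSING:
        return None
    if record[:4] != node[:4]:
        # The changeset at this revision number changed (for example
        # after a strip), so the cached filenode cannot be trusted.
        return None
    return record[4:]


def store(raw, rev, node, fnode):
    """Write the record for rev, padding the array with missing markers."""
    assert len(fnode) == 20
    end = (rev + 1) * RECORD_SIZE
    if len(raw) < end:
        raw.extend(b"\xff" * (end - len(raw)))
    raw[end - RECORD_SIZE:end] = node[:4] + fnode
```

Because the records have a fixed size, the entry for revision `rev` always starts at byte `rev * 24`, which is why the cache can be read lazily and validated one record at a time.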