view tests/test-largefiles-cache.t @ 23785:cb99bacb9b4e
branchcache: introduce revbranchcache for caching of revision branch names
It is expensive to retrieve the branch name of a revision: very expensive when
creating a changectx and calling .branch() every time, and slightly less so when
using changelog.branchinfo().
Now, to speed things up, provide a way to cache the results on disk in an
efficient format. Each branchname is assigned a number, and for each revision
we store the number of the corresponding branch name. The branch names are
stored in a dedicated file which is strictly append only.
Branch names are usually reused across several revisions, and the total list of
branch names will thus be so small that it is feasible to read the whole set of
names before using the cache. It does, however, mean that it might be more
efficient to use the changelog for retrieving the branch info for a single
revision.
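The numbering scheme described above can be sketched roughly as follows. This is a minimal illustration, not Mercurial's implementation: the class name `BranchNameTable`, the NUL separator, the UTF-8 encoding, and the file name `branch-names` are all assumptions made for the example. The commit message only says that the names live in a dedicated append-only file and that the whole set is read before the cache is used.

```python
import os
import tempfile

SEPARATOR = b"\0"  # assumed delimiter; the commit message does not specify one


class BranchNameTable:
    """Hypothetical append-only table mapping branch names to numbers."""

    def __init__(self, path):
        self.path = path
        self.names = []   # number -> name
        self.index = {}   # name -> number
        # Read the whole (small) set of names up front.
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
            if data:
                for raw in data.split(SEPARATOR):
                    self._remember(raw.decode("utf-8"))

    def _remember(self, name):
        self.index[name] = len(self.names)
        self.names.append(name)

    def number_for(self, name):
        """Return the number for a name, appending it to the file if new."""
        if name not in self.index:
            with open(self.path, "ab") as f:  # append only, never rewrite
                if self.names:
                    f.write(SEPARATOR)
                f.write(name.encode("utf-8"))
            self._remember(name)
        return self.index[name]


# Small demonstration in a throwaway directory.
workdir = tempfile.mkdtemp()
names_path = os.path.join(workdir, "branch-names")  # hypothetical file name
table = BranchNameTable(names_path)
first = table.number_for("default")
second = table.number_for("stable")
again = table.number_for("default")   # reused name, no new file write
reloaded = BranchNameTable(names_path).names
```

Because names are only ever appended, a number handed out once keeps pointing at the same name, which is what lets the per-revision entries store just the number.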
The revision entries are stored in another file. This file is usually append
only, but if the repository has been modified, the file will be truncated and
the relevant parts rewritten on demand.
The entries for each revision are 8 bytes each, and the whole revision file
will thus be 1/8 the size of 00changelog.i (whose index entries are 64 bytes
each).
Each revision entry contains the first 4 bytes of the corresponding node hash.
This is used as a checksum that is always verified before the entry is used.
That check is relatively expensive but it makes sure history modification is
detected and handled correctly. It will also detect and handle most revision
file corruptions.
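A fixed-size entry with a hash-prefix check could look something like the sketch below. It assumes a layout of the 4-byte node prefix followed by a big-endian 32-bit branch number; the commit message states only the 8-byte entry size and the 4-byte prefix, so the field order, the byte order, and the helper names `write_entry` and `read_entry` are illustrative assumptions. The node values are made-up 20-byte strings.

```python
import struct

# Assumed layout: 4-byte node-hash prefix + big-endian 32-bit branch number.
RECORD = struct.Struct(">4sI")


def write_entry(buf, rev, node, branch_number):
    """Store the entry for revision `rev`, growing the buffer if needed."""
    end = (rev + 1) * RECORD.size
    if len(buf) < end:
        buf.extend(b"\0" * (end - len(buf)))
    RECORD.pack_into(buf, rev * RECORD.size, node[:4], branch_number)


def read_entry(buf, rev, node):
    """Return the cached branch number, or None if missing or stale."""
    start = rev * RECORD.size
    if start + RECORD.size > len(buf):
        return None  # revision not cached yet
    prefix, branch_number = RECORD.unpack_from(buf, start)
    if prefix != node[:4]:
        return None  # history changed or file corrupt: fall back to changelog
    return branch_number


# Hypothetical 20-byte nodes standing in for real changeset hashes.
node_a = b"\x11" * 20
node_b = b"\x22" * 20  # pretend revision 0 was rewritten

cache = bytearray()
write_entry(cache, 0, node_a, 3)
hit = read_entry(cache, 0, node_a)     # prefix matches
stale = read_entry(cache, 0, node_b)   # prefix mismatch after rewrite
absent = read_entry(cache, 5, node_a)  # beyond the end of the cache
```

A caller that gets `None` back would look the branch up in the changelog and rewrite the entry, which matches the "rewritten on demand" behaviour described for the revision file.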
This is just a cache. A new format can always be introduced if other
requirements or ideas make that seem like a good idea. Rebuilding the cache is
not really more expensive than it was to run for example 'hg log -b branchname'
before this cache was introduced.
This new method is still unused but promises to make some operations several
times faster once it is actually used.
Abandoning Python 2.4 would make it possible to implement this more efficiently
by using struct classes and pack_into. The Python code could probably also be
micro-optimized, or it could be implemented very efficiently in C, where it
would be easy to control the data access.
author | Mads Kiilerich <madski@unity3d.com> |
---|---|
date | Thu, 08 Jan 2015 00:01:03 +0100 |
parents | 70afc58c32d3 |
children | d8e0c591781c |
Create user cache directory

  $ USERCACHE=`pwd`/cache; export USERCACHE
  $ cat <<EOF >> ${HGRCPATH}
  > [extensions]
  > hgext.largefiles=
  > [largefiles]
  > usercache=${USERCACHE}
  > EOF
  $ mkdir -p ${USERCACHE}

Create source repo, and commit adding largefile.

  $ hg init src
  $ cd src
  $ echo large > large
  $ hg add --large large
  $ hg commit -m 'add largefile'
  $ hg rm large
  $ hg commit -m 'branchhead without largefile'
  $ hg up -qr 0
  $ cd ..

Discard all cached largefiles in USERCACHE

  $ rm -rf ${USERCACHE}

Create mirror repo, and pull from source without largefile:
"pull" is used instead of "clone" for suppression of (1) updating to
tip (= caching largefile from source repo), and (2) recording source
repo as "default" path in .hg/hgrc.

  $ hg init mirror
  $ cd mirror
  $ hg pull ../src
  pulling from ../src
  requesting all changes
  adding changesets
  adding manifests
  adding file changes
  added 2 changesets with 1 changes to 1 files
  (run 'hg update' to get a working copy)

Update working directory to "tip", which requires largefile("large"),
but there is no cache file for it. So, hg must treat it as
"missing"(!) file.

  $ hg update -r0
  getting changed largefiles
  large: largefile 7f7097b041ccf68cc5561e9600da4655d21c6d18 not available from file:/*/$TESTTMP/mirror (glob)
  0 largefiles updated, 0 removed
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ hg status
  ! large

Update working directory to null: this cleanup .hg/largefiles/dirstate

  $ hg update null
  getting changed largefiles
  0 largefiles updated, 0 removed
  0 files updated, 0 files merged, 1 files removed, 0 files unresolved

Update working directory to tip, again.

  $ hg update -r0
  getting changed largefiles
  large: largefile 7f7097b041ccf68cc5561e9600da4655d21c6d18 not available from file:/*/$TESTTMP/mirror (glob)
  0 largefiles updated, 0 removed
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ hg status
  ! large
  $ cd ..

Verify that largefiles from pulled branchheads are fetched, also to an empty repo

  $ hg init mirror2
  $ hg -R mirror2 pull src -r0
  pulling from src
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 1 changes to 1 files
  (run 'hg update' to get a working copy)

#if unix-permissions

Portable way to print file permissions:

  $ cat > ls-l.py <<EOF
  > #!/usr/bin/env python
  > import sys, os
  > path = sys.argv[1]
  > print '%03o' % (os.lstat(path).st_mode & 0777)
  > EOF
  $ chmod +x ls-l.py

Test that files in .hg/largefiles inherit mode from .hg/store, not
from file in working copy:

  $ cd src
  $ chmod 750 .hg/store
  $ chmod 660 large
  $ echo change >> large
  $ hg commit -m change
  created new head
  $ ../ls-l.py .hg/largefiles/e151b474069de4ca6898f67ce2f2a7263adf8fea
  640

Test permission of with files in .hg/largefiles created by update:

  $ cd ../mirror
  $ rm -r "$USERCACHE" .hg/largefiles # avoid links
  $ chmod 750 .hg/store
  $ hg pull ../src --update -q
  $ ../ls-l.py .hg/largefiles/e151b474069de4ca6898f67ce2f2a7263adf8fea
  640

Test permission of files created by push:

  $ hg serve -R ../src -d -p $HGPORT --pid-file hg.pid \
  >     --config "web.allow_push=*" --config web.push_ssl=no
  $ cat hg.pid >> $DAEMON_PIDS
  $ echo change >> large
  $ hg commit -m change
  $ rm -r "$USERCACHE"
  $ hg push -q http://localhost:$HGPORT/
  $ ../ls-l.py ../src/.hg/largefiles/b734e14a0971e370408ab9bce8d56d8485e368a9
  640
  $ cd ..

#endif

Test issue 4053 (remove --after on a deleted, uncommitted file shouldn't say
it is missing, but a remove on a nonexistant unknown file still should. Same
for a forget.)

  $ cd src
  $ touch x
  $ hg add x
  $ mv x y
  $ hg remove -A x y ENOENT
  ENOENT: * (glob)
  not removing y: file is untracked
  [1]
  $ hg add y
  $ mv y z
  $ hg forget y z ENOENT
  ENOENT: * (glob)
  not removing z: file is already untracked
  [1]