Thu, 08 Jan 2015 00:01:03 +0100 branchmap: use revbranchcache when updating branch map
Mads Kiilerich <madski@unity3d.com> [Thu, 08 Jan 2015 00:01:03 +0100] rev 23786
branchmap: use revbranchcache when updating branch map The revbranchcache is read on demand before it will be used for updating the branch map. It is written back when the branchmap is written and it will thus use the same locking as branchmap. The revbranchcache instance is short-lived; it is only stored in the branchmap from .update() is invoked and until .write() is invoked. Branchmap already assume that the repo is locked in that case. The use of revbranchcache for branch map updates will make sure that the revbranchcache "always" is kept up-to-date. The perfbranchmap benchmark is somewhat bogus, especially when we can see that the caching makes a significant difference between the realistic case of a first run and the rare case of rerunning it with a full cache. Here are some 'base' numbers on mozilla-central: Before: ! wall 6.912745 comb 6.910000 user 6.840000 sys 0.070000 (best of 3) After - initial, cache is empty: ! wall 7.792569 comb 7.790000 user 7.720000 sys 0.070000 (best of 3) After - cache is full: ! wall 0.879688 comb 0.880000 user 0.870000 sys 0.010000 (best of 4) The overhead when running with empty cache comes from checking, missing and updating it every time. Most of the performance improvement comes from not having to extract the branch info from the changelog. The last doubling of performance comes from no longer having to convert all branch names to local encoding but reuse the few already converted branch names. On the hg repo: Before: ! wall 0.715703 comb 0.710000 user 0.710000 sys 0.000000 (best of 14) After: ! wall 0.105489 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)
Thu, 08 Jan 2015 00:01:03 +0100 branchcache: introduce revbranchcache for caching of revision branch names
Mads Kiilerich <madski@unity3d.com> [Thu, 08 Jan 2015 00:01:03 +0100] rev 23785
branchcache: introduce revbranchcache for caching of revision branch names It is expensive to retrieve the branch name of a revision. Very expensive when creating a changectx and calling .branch() every time - slightly less when using changelog.branchinfo(). Now, to speed things up, provide a way to cache the results on disk in an efficient format. Each branchname is assigned a number, and for each revision we store the number of the corresponding branch name. The branch names are stored in a dedicated file which is strictly append only. Branch names are usually reused across several revisions, and the total list of branch names will thus be so small that it is feasible to read the whole set of names before using the cache. It will however do that it might be more efficient to use the changelog for retrieving the branch info for a single revision. The revision entries are stored in another file. This file is usually append only, but if the repository has been modified, the file will be truncated and the relevant parts rewritten on demand. The entries for each revision are 8 bytes each, and the whole revision file will thus be 1/8 of 00changelog.i. Each revision entry contains the first 4 bytes of the corresponding node hash. This is used as a check sum that always is verified before the entry is used. That check is relatively expensive but it makes sure history modification is detected and handled correctly. It will also detect and handle most revision file corruptions. This is just a cache. A new format can always be introduced if other requirements or ideas make that seem like a good idea. Rebuilding the cache is not really more expensive than it was to run for example 'hg log -b branchname' before this cache was introduced. This new method is still unused but promise to make some operations several times faster once it actually is used. Abandoning Python 2.4 would make it possible to implement this more efficiently by using struct classes and pack_into. The Python code could probably also be micro optimized or it could be implemented very efficiently in C where it would be easy to control the data access.
(0) -10000 -3000 -1000 -300 -100 -30 -10 -2 +2 +10 +30 +100 +300 +1000 +3000 +10000 tip