revlog: faster hash computation when one of the parent node is null
Because we often compute sha1(nullid), it's interesting to copy a precomputed
hash of nullid instead of computing everytime the same hash. Similarly, when
one of the parents is null, we can avoid a < comparison (sort).
Overall, this change adds a string equality comparison on each hash() call,
but when p2 is null, we drop one string < comparison, and copy a hash instead
of computing it. Since it is common to have revisions with only one parent,
this change makes hash() 25% faster when cloning a big repository.
--- a/mercurial/revlog.py Mon Mar 23 15:36:30 2009 +0100
+++ b/mercurial/revlog.py Mon Mar 23 15:32:29 2009 +0100
@@ -42,6 +42,8 @@
def offset_type(offset, type):
return long(long(offset) << 16 | type)
+nullhash = _sha(nullid)
+
def hash(text, p1, p2):
"""generate a hash from the given text and its parent hashes
@@ -49,10 +51,17 @@
in a manner that makes it easy to distinguish nodes with the same
content in the revision graph.
"""
- l = [p1, p2]
- l.sort()
- s = _sha(l[0])
- s.update(l[1])
+ # As of now, if one of the parent node is null, p2 is null
+ if p2 == nullid:
+ # deep copy of a hash is faster than creating one
+ s = nullhash.copy()
+ s.update(p1)
+ else:
+ # none of the parent nodes are nullid
+ l = [p1, p2]
+ l.sort()
+ s = _sha(l[0])
+ s.update(l[1])
s.update(text)
return s.digest()