changelog: keep track of file end in appender (issue5444)
Previously, changelog.appender.end() computed the end of the file by joining
all of the currently buffered data and taking its length. This is an O(n)
operation in the amount of buffered data.

e240e914d226 introduced a seek call before every revlog write, so we now hit
this O(n) behavior once per write. Over n writes, that makes changelog writes
during a pull O(n^2). Instead, track the end offset in a counter (_end) that is
updated on every write, so end() becomes O(1).

In our large repo, this caused pulling 100k commits to go from 17s to 130s. With
this fix, it's back to 17s.
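
To illustrate the difference described above, here is a minimal standalone
sketch, not the actual Mercurial code. The class names JoinEndBuffer and
TrackedEndBuffer and the simulate() helper are hypothetical, and the sketch
uses Python 3 bytes rather than the str handling in changelog.py. It only
shows that a running counter reports the same end offset as re-joining the
buffered chunks, without the per-call O(n) cost.

```python
class JoinEndBuffer:
    """Hypothetical stand-in for the old behavior: end() re-joins all chunks."""

    def __init__(self, size):
        self.size = size  # bytes assumed to be on disk already
        self.data = []    # chunks buffered but not yet flushed

    def write(self, s):
        self.data.append(bytes(s))

    def end(self):
        # Cost grows with the total amount of buffered data on every call.
        return self.size + len(b"".join(self.data))


class TrackedEndBuffer:
    """Hypothetical stand-in for the new behavior: end() reads a counter."""

    def __init__(self, size):
        self.size = size
        self.data = []
        self._end = size  # running end offset, starts at the on-disk size

    def write(self, s):
        self.data.append(bytes(s))
        self._end += len(s)  # constant-time bookkeeping per write

    def end(self):
        return self._end


def simulate(buffer_cls, chunks, size=0):
    """Query end() before each write (mimicking a seek-to-end) and once after."""
    buf = buffer_cls(size)
    ends = []
    for chunk in chunks:
        ends.append(buf.end())
        buf.write(chunk)
    ends.append(buf.end())
    return ends
```

Both classes report identical offsets for the same sequence of writes; only
the cost of each end() call differs.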
--- a/mercurial/changelog.py Thu Dec 15 11:14:00 2016 -0500
+++ b/mercurial/changelog.py Thu Dec 15 11:00:18 2016 -0800
@@ -79,9 +79,10 @@
         self.fp = fp
         self.offset = fp.tell()
         self.size = vfs.fstat(fp).st_size
+        self._end = self.size
 
     def end(self):
-        return self.size + len("".join(self.data))
+        return self._end
     def tell(self):
         return self.offset
     def flush(self):
@@ -121,6 +122,7 @@
     def write(self, s):
         self.data.append(str(s))
         self.offset += len(s)
+        self._end += len(s)
 
 def _divertopener(opener, target):
     """build an opener that writes in 'target.a' instead of 'target'"""