changelog: keep track of file end in appender (issue5444)

Previously, changelog.appender.end() computed the end of the file by joining
all of the currently appended data and taking its length, which is an O(n)
operation. e240e914d226 introduced a seek call before every revlog write, so
we now hit this O(n) behavior n times, which makes changelog writes during a
pull O(n^2).

The appender now keeps a running end offset in self._end: it is initialized
to the on-disk file size and advanced in write(), so end() is O(1).

In our large repo, this caused pulling 100k commits to go from 17s to 130s. With
this fix, it's back to 17s.
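As an illustration of the change described above, here is a minimal standalone sketch, not the actual Mercurial code. The class names JoinEndAppender and CountedEndAppender are hypothetical, and the on-disk size is passed in directly instead of being read through a vfs. It shows only that a counter updated in write() yields the same end offset as re-joining the buffer on every call.

```python
class JoinEndAppender:
    """Hypothetical stand-in for the old behavior: end() re-joins the buffer."""

    def __init__(self, size):
        self.size = size   # bytes assumed to be already on disk
        self.data = []     # chunks buffered but not yet written out

    def write(self, s):
        self.data.append(s)

    def end(self):
        # Cost grows with the total amount of buffered data on every call.
        return self.size + len(b"".join(self.data))


class CountedEndAppender:
    """Hypothetical stand-in for the new behavior: end() returns a counter."""

    def __init__(self, size):
        self.size = size
        self.data = []
        self._end = size   # running end offset, starts at the on-disk size

    def write(self, s):
        self.data.append(s)
        self._end += len(s)   # keep the end offset current on each write

    def end(self):
        # Constant time, regardless of how much data is buffered.
        return self._end


old = JoinEndAppender(size=10)
new = CountedEndAppender(size=10)
for chunk in (b"abc", b"", b"defgh"):
    old.write(chunk)
    new.write(chunk)
    # Both strategies must agree after every write.
    assert old.end() == new.end()
```

If end() is called once per write, the first class does work proportional to the buffered data each time, which is where the quadratic total comes from; the second does a single addition per write.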
--- a/mercurial/changelog.py Thu Dec 15 11:14:00 2016 -0500
+++ b/mercurial/changelog.py Thu Dec 15 11:00:18 2016 -0800
@@ -79,9 +79,10 @@
         self.fp = fp
         self.offset = fp.tell()
         self.size = vfs.fstat(fp).st_size
+        self._end = self.size

     def end(self):
-        return self.size + len("".join(self.data))
+        return self._end
     def tell(self):
         return self.offset
     def flush(self):
@@ -121,6 +122,7 @@
     def write(self, s):
         self.data.append(str(s))
         self.offset += len(s)
+        self._end += len(s)

 def _divertopener(opener, target):
     """build an opener that writes in 'target.a' instead of 'target'"""