tags: create new sortdict for performance reasons
author Gregory Szorc <gregory.szorc@gmail.com>
Thu, 12 Nov 2015 13:16:04 -0800
branch stable
changeset 26945 8a256cee72c8
parent 26933 a7eecd021782
child 26946 3309714ded26
child 26959 ed5f20f9c22e
tags: create new sortdict for performance reasons

sortdict internally maintains a list of keys in insertion order. When a key is replaced via __setitem__, we .remove() it from this list. This involves a linear scan and array adjustment, which is expensive.

The tags reading code was calling into sortdict.__setitem__ for each tag in a read .hgtags revision. For repositories with thousands of tags or thousands of .hgtags revisions, the overhead from list.remove() is noticeable.

This patch creates a new sortdict() so __setitem__ calls don't incur a list.remove().

This doesn't appear to have any performance impact on my Firefox repository, but that's only because tags reading doesn't show up in profiles to begin with. I'm still waiting to hear from a user with over 10,000 tags and hundreds of heads on the impact of this patch.
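To make the cost described above concrete, here is a minimal sketch in Python 3. SortDict is a hypothetical stand-in written only from the description in this commit message; it is not the real util.sortdict, and the removals counter, make_tags, and the two collapse_* helpers are illustrative names that do not exist in Mercurial. It counts how many list.remove() calls each approach triggers.

```python
class SortDict(dict):
    """Hypothetical, simplified stand-in for util.sortdict (not the real class)."""

    def __init__(self):
        super().__init__()
        self._order = []   # keys in insertion order
        self.removals = 0  # illustration only: counts list.remove() calls

    def __setitem__(self, key, value):
        if key in self:
            # Replacing an existing key: linear scan plus array shift.
            self._order.remove(key)
            self.removals += 1
        self._order.append(key)
        super().__setitem__(key, value)

    def items(self):
        # Return a snapshot list so callers may assign while looping.
        return [(k, self[k]) for k in self._order]


def make_tags(n):
    """Build n tags, each with a two-entry history (made-up node names)."""
    tags = SortDict()
    for i in range(n):
        tags["tag%d" % i] = ["node%d-a" % i, "node%d-b" % i]
    return tags


def collapse_in_place(filetags):
    """Old approach: every assignment replaces a key already present."""
    for tag, taghist in filetags.items():
        filetags[tag] = (taghist[-1], taghist[:-1])
    return filetags


def collapse_into_new(filetags):
    """Patched approach: every assignment inserts into a fresh dict."""
    newtags = SortDict()
    for tag, taghist in filetags.items():
        newtags[tag] = (taghist[-1], taghist[:-1])
    return newtags
```

In this sketch, collapse_in_place performs one list.remove() per tag, so the total work grows quadratically with the number of tags, while collapse_into_new performs none and yields the same entries in the same order.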
mercurial/tags.py
--- a/mercurial/tags.py	Fri Nov 13 02:36:30 2015 +0900
+++ b/mercurial/tags.py	Thu Nov 12 13:16:04 2015 -0800
@@ -221,9 +221,13 @@
     '''
     filetags, nodelines = _readtaghist(ui, repo, lines, fn, recode=recode,
                                        calcnodelines=calcnodelines)
+    # util.sortdict().__setitem__ is much slower at replacing than inserting
+    # new entries. The difference can matter if there are thousands of tags.
+    # Create a new sortdict to avoid the performance penalty.
+    newtags = util.sortdict()
     for tag, taghist in filetags.items():
-        filetags[tag] = (taghist[-1], taghist[:-1])
-    return filetags
+        newtags[tag] = (taghist[-1], taghist[:-1])
+    return newtags
 
 def _updatetags(filetags, tagtype, alltags, tagtypes):
     '''Incorporate the tag info read from one file into the two