changegroup: port to emitrevisions() (issue5976)

We now have a unified API for emitting revision data from a storage
backend. It handles sorting nodes and the complicated delta versus
fulltext decisions for us.

This commit ports changegroup to that API.

There should be no behavior changes for changegroups not using
ellipsis, and the lack of test changes seems to confirm that.

There are some changes for ellipsis mode, however.

Before, when sending an ellipsis revision, we would always send a
fulltext revision (as opposed to a delta). There was a TODO tracking
this open item.

One of the things the emitrevisions() API does for us is figure out
whether we can safely emit a delta. So, it is now possible for
ellipsis revisions to be sent as deltas! (It does this by not
assuming parent/ancestor revisions are available and tracking which
revisions have been sent out.)
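
To illustrate the idea, here is a toy model. It is not Mercurial's emitrevisions() API: the names ToyRevision, ToyDelta, prefix_delta, toy_emit and toy_apply are hypothetical. In this sketch, deltas are computed only against the first parent using a trivial shared-prefix scheme, and receiver_has stands in for whatever the sender may assume the receiver already stores. Passing an empty set models the ellipsis case described above, where no parent or ancestor is assumed to be available. A revision is then sent as a delta only when its base went out earlier in the same stream.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple

NULLREV = -1  # "no parent" marker (hypothetical, mirrors revlog's nullrev)


@dataclass(frozen=True)
class ToyRevision:
    """Hypothetical stored revision: number, first parent, full text."""
    rev: int
    p1: int
    text: bytes


@dataclass(frozen=True)
class ToyDelta:
    """One emitted item: a fulltext when base is None, otherwise
    the first prefixlen bytes of base's text followed by data."""
    rev: int
    base: Optional[int]
    prefixlen: int
    data: bytes


def prefix_delta(base: bytes, text: bytes) -> Tuple[int, bytes]:
    """Trivial delta: length of the shared prefix plus the remaining tail."""
    n = 0
    limit = min(len(base), len(text))
    while n < limit and base[n] == text[n]:
        n += 1
    return n, text[n:]


def toy_emit(store: Dict[int, ToyRevision], revs: Iterable[int],
             receiver_has: Set[int]) -> Iterator[ToyDelta]:
    """Yield revs in ascending order (parents before children in this toy
    store), deltaing against p1 only if the receiver is known to have it."""
    sent: Set[int] = set()
    for rev in sorted(revs):
        r = store[rev]
        if r.p1 != NULLREV and (r.p1 in sent or r.p1 in receiver_has):
            n, tail = prefix_delta(store[r.p1].text, r.text)
            yield ToyDelta(rev, r.p1, n, tail)
        else:
            # base not known to be on the receiver: send a fulltext
            yield ToyDelta(rev, None, 0, r.text)
        sent.add(rev)  # track what has been sent out


def toy_apply(stream: Iterable[ToyDelta],
              local: Dict[int, bytes]) -> Dict[int, bytes]:
    """Receiver side: rebuild full texts from the stream."""
    texts = dict(local)
    for d in stream:
        if d.base is None:
            texts[d.rev] = d.data
        else:
            texts[d.rev] = texts[d.base][:d.prefixlen] + d.data
    return texts
```

Consider a linear history 0-1-2-3 with an empty receiver_has. Sending revisions 1, 2 and 3 yields a fulltext for 1 and deltas for 2 and 3, rather than a fulltext for every revision.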

Because we eliminated the list of revision delta request objects,
performance has improved substantially (about 25% less wall time):

$ hg perfchangegroupchangelog
before: ! wall 24.348077 comb 24.330000 user 24.140000 sys 0.190000 (best of 3)
after: ! wall 18.245911 comb 18.240000 user 18.100000 sys 0.140000 (best of 3)

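A few lines of arithmetic confirm the size of the improvement. The helper below is hypothetical and not part of Mercurial's perf extension. Its parse_perf function pulls the named fields out of each `! wall ...` line. On these numbers, wall time drops from about 24.35 s to 18.25 s, which is roughly a 25% reduction or a 1.33x speedup.

```python
import re

BEFORE = "! wall 24.348077 comb 24.330000 user 24.140000 sys 0.190000 (best of 3)"
AFTER = "! wall 18.245911 comb 18.240000 user 18.100000 sys 0.140000 (best of 3)"


def parse_perf(line: str) -> dict:
    """Map each timing name (wall, comb, user, sys) to seconds."""
    return {name: float(value)
            for name, value in re.findall(r"(wall|comb|user|sys) ([0-9.]+)", line)}


before, after = parse_perf(BEFORE), parse_perf(AFTER)
for key in ("wall", "user"):
    saved = before[key] - after[key]
    print(f"{key}: {saved:.2f}s saved, "
          f"{saved / before[key]:.1%} less time, "
          f"{before[key] / after[key]:.2f}x faster")
```
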
That's a lot of overhead for creating a few hundred thousand Python
objects!

This is still a little slower than 4.7, probably due to 23d582ca
introducing a type for the revision/delta results. There is
potentially room to optimize. But at some point we need to abstract
storage in order to support alternate storage backends. Unfortunately,
that means using a Python data structure to represent results, and
every new Python object created carries overhead.

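The per-object cost mentioned above is easy to observe in CPython. The sketch below is a generic illustration and not the type introduced in 23d582ca. PlainDelta, SlottedDelta and bytes_for are hypothetical, and bytes_for only approximates allocation using tracemalloc. Declaring `__slots__` removes the per-instance `__dict__`, which is one common way to trim the overhead of many small result objects. The exact byte counts depend on the interpreter version.

```python
import tracemalloc


class PlainDelta:
    """Ordinary class: each instance can carry a per-instance __dict__."""
    def __init__(self, node, basenode, delta):
        self.node = node
        self.basenode = basenode
        self.delta = delta


class SlottedDelta:
    """__slots__ class: fixed attribute storage, no per-instance __dict__."""
    __slots__ = ("node", "basenode", "delta")

    def __init__(self, node, basenode, delta):
        self.node = node
        self.basenode = basenode
        self.delta = delta


def bytes_for(cls, count=100_000):
    """Approximate bytes held by `count` live instances of cls (plus the
    list and int objects that reference them), measured with tracemalloc."""
    tracemalloc.start()
    try:
        objs = [cls(i, i - 1, b"") for i in range(count)]
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    assert len(objs) == count
    return current


if __name__ == "__main__":
    for cls in (PlainDelta, SlottedDelta):
        print(cls.__name__, bytes_for(cls), "bytes for 100000 instances")
```
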
Differential Revision: https://phab.mercurial-scm.org/D4725
#!/usr/bin/env python
#
# posplit - split messages in paragraphs on .po/.pot files
#
# license: MIT/X11/Expat
#
from __future__ import absolute_import, print_function
import polib
import re
import sys


def addentry(po, entry, cache):
    """Append entry to po, or merge it into an earlier entry with the
    same msgid (combining occurrences and comments)."""
    e = cache.get(entry.msgid)
    if e:
        e.occurrences.extend(entry.occurrences)

        # merge comments from entry
        for comment in entry.comment.split('\n'):
            if comment and comment not in e.comment:
                if not e.comment:
                    e.comment = comment
                else:
                    e.comment += '\n' + comment
    else:
        po.append(entry)
        cache[entry.msgid] = entry
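

def mkentry(orig, delta, msgid, msgstr):
    """Build a new entry from orig (via polib's POEntry.merge), using
    msgid/msgstr where given and shifting occurrence lines by delta."""
    entry = polib.POEntry()
    entry.merge(orig)
    entry.msgid = msgid or orig.msgid
    entry.msgstr = msgstr or orig.msgstr
    entry.occurrences = [(p, int(l) + delta) for (p, l) in orig.occurrences]
    return entry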


if __name__ == "__main__":
    po = polib.pofile(sys.argv[1])

    cache = {}
    entries = po[:]
    po[:] = []
    findd = re.compile(r' *\.\. (\w+)::')  # for finding directives
    for entry in entries:
        msgids = entry.msgid.split(u'\n\n')
        if entry.msgstr:
            msgstrs = entry.msgstr.split(u'\n\n')
        else:
            msgstrs = [u''] * len(msgids)

        if len(msgids) != len(msgstrs):
            # places the whole existing translation as a fuzzy
            # translation for each paragraph, to give the
            # translator a chance to recover part of the old
            # translation - erasing extra paragraphs is
            # probably better than retranslating all from start
            if 'fuzzy' not in entry.flags:
                entry.flags.append('fuzzy')
            msgstrs = [entry.msgstr] * len(msgids)

        # line offset of the current paragraph from the original
        # entry's source line
        delta = 0
        for msgid, msgstr in zip(msgids, msgstrs):
            if msgid and msgid != '::':
                newentry = mkentry(entry, delta, msgid, msgstr)
                mdirective = findd.match(msgid)
                if mdirective:
                    if not msgid[mdirective.end():].rstrip():
                        # only directive, nothing to translate here
                        delta += 2
                        continue
                    directive = mdirective.group(1)
                    if directive in ('container', 'include'):
                        if msgid.rstrip('\n').count('\n') == 0:
                            # only rst syntax, nothing to translate
                            delta += 2
                            continue
                        else:
                            # lines following directly, unexpected
                            print('Warning: text follows line with directive'
                                  ' %s' % directive)
                    comment = 'do not translate: .. %s::' % directive
                    if not newentry.comment:
                        newentry.comment = comment
                    elif comment not in newentry.comment:
                        newentry.comment += '\n' + comment
                addentry(po, newentry, cache)
            # the next paragraph starts after this paragraph's lines
            # plus the blank separator line
            delta += 2 + msgid.count('\n')
    po.save()
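
The line bookkeeping in the main loop can be tried out without polib. The helper below, paragraph_offsets, is a hypothetical and simplified restatement of that loop. It splits a msgid on blank lines and skips empty paragraphs, bare `::` and directive-only paragraphs, as the script does. It then pairs each remaining paragraph with the source line it starts on. It leaves out the container/include special case, the fuzzy flag and comment merging.

```python
import re

# Same directive pattern as the script's `findd`.
DIRECTIVE = re.compile(r' *\.\. (\w+)::')


def paragraph_offsets(msgid, lineno):
    """Return (paragraph, start_line) pairs for the translatable
    paragraphs of msgid, given that msgid starts at source line lineno."""
    result = []
    delta = 0
    for para in msgid.split('\n\n'):
        if para and para != '::':
            m = DIRECTIVE.match(para)
            if m and not para[m.end():].rstrip():
                # directive only, nothing to translate (script adds 2 here)
                delta += 2
                continue
            result.append((para, lineno + delta))
        # a paragraph with k newlines spans k + 1 lines, plus one blank line
        delta += 2 + para.count('\n')
    return result


print(paragraph_offsets("Usage:\n\n.. note::\n\n   Be careful.\n   Really.", 10))
```

In the example, "Usage:" sits on line 10 and the skipped `.. note::` on line 12. The indented note text therefore starts on line 14.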