Mercurial > hg
annotate i18n/posplit @ 43271:99394e6c5d12
rust-dirstate-status: add first Rust implementation of `dirstate.status`
Note: This patch also added the rayon crate as a Cargo dependency. It will
help us immensely in making Rust code parallel and easy to maintain. It is
a stable, well-known, and supported crate maintained by people on the Rust
team.
The current `dirstate.status` method has grown over the years through bug
reports and new features to the point where it got too big and too complex.
This series does not yet improve the logic, but adds a Rust fast-path to speed
up certain cases.
Tested on mozilla-try-2019-02-18 with zstd compression:
- `hg diff` on an empty working copy:
- c: 1.64(+-)0.04s
- rust+c before this change: 2.84(+-)0.1s
- rust+c: 849(+-)40ms
- `hg commit` when creating a file:
- c: 5.960s
- rust+c before this change: 5.828s
- rust+c: 4.668s
- `hg commit` when updating a file:
- c: 4.866s
- rust+c before this change: 4.371s
- rust+c: 3.855s
- `hg status -mard`
- c: 1.82(+-)0.04s
- rust+c before this change: 2.64(+-)0.1s
- rust+c: 896(+-)30ms
The numbers are clear: the current Rust `dirstatemap` implementation is super
slow, its performance needs to be addressed.
This will be done in a future series, immediately after this one, with the goal
of getting Rust to be at least to the speed of the Python + C implementation
in all cases before the 5.2 freeze. At worse, we gate dirstatemap to only be used
in those cases.
Cases where the fast-path is not executed:
- for commands that need ignore support (`status`, for example)
- if subrepos are found (should not be hard to add, but winter is coming)
- any other matcher than an `alwaysmatcher`, like patterns, etc.
- with extensions like `sparse` and `fsmonitor`
The next step after this is to rethink the logic to be closer to
Jane Street's Valentin Gatien-Baron's Rust fast-path which does a lot less
work when possible.
Differential Revision: https://phab.mercurial-scm.org/D7058
author | Raphaël Gomès <rgomes@octobus.net> |
---|---|
date | Fri, 11 Oct 2019 13:39:57 +0200 |
parents | aaad36b88298 |
children | 47ef023d0165 |
rev | line source |
---|---|
11389
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
1 #!/usr/bin/env python |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
2 # |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
3 # posplit - split messages in paragraphs on .po/.pot files |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
4 # |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
5 # license: MIT/X11/Expat |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
6 # |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
7 |
29153
90d84e1e427a
py3: make i18n/posplit use print_function
Pulkit Goyal <7895pulkit@gmail.com>
parents:
29152
diff
changeset
|
8 from __future__ import absolute_import, print_function |
29152
c5057b7780dc
py3: make i18n/posplit use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents:
28074
diff
changeset
|
9 |
c5057b7780dc
py3: make i18n/posplit use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents:
28074
diff
changeset
|
10 import polib |
20359
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
11 import re |
11389
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
12 import sys |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
13 |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
14 def addentry(po, entry, cache): |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
15 e = cache.get(entry.msgid) |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
16 if e: |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
17 e.occurrences.extend(entry.occurrences) |
39269
d0e8933d6dad
i18n: merge i18n comments of translatable texts correctly
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
29153
diff
changeset
|
18 |
d0e8933d6dad
i18n: merge i18n comments of translatable texts correctly
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
29153
diff
changeset
|
19 # merge comments from entry |
d0e8933d6dad
i18n: merge i18n comments of translatable texts correctly
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
29153
diff
changeset
|
20 for comment in entry.comment.split('\n'): |
d0e8933d6dad
i18n: merge i18n comments of translatable texts correctly
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
29153
diff
changeset
|
21 if comment and comment not in e.comment: |
d0e8933d6dad
i18n: merge i18n comments of translatable texts correctly
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
29153
diff
changeset
|
22 if not e.comment: |
d0e8933d6dad
i18n: merge i18n comments of translatable texts correctly
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
29153
diff
changeset
|
23 e.comment = comment |
d0e8933d6dad
i18n: merge i18n comments of translatable texts correctly
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
29153
diff
changeset
|
24 else: |
d0e8933d6dad
i18n: merge i18n comments of translatable texts correctly
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
29153
diff
changeset
|
25 e.comment += '\n' + comment |
11389
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
26 else: |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
27 po.append(entry) |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
28 cache[entry.msgid] = entry |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
29 |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
30 def mkentry(orig, delta, msgid, msgstr): |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
31 entry = polib.POEntry() |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
32 entry.merge(orig) |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
33 entry.msgid = msgid or orig.msgid |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
34 entry.msgstr = msgstr or orig.msgstr |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
35 entry.occurrences = [(p, int(l) + delta) for (p, l) in orig.occurrences] |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
36 return entry |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
37 |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
38 if __name__ == "__main__": |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
39 po = polib.pofile(sys.argv[1]) |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
40 |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
41 cache = {} |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
42 entries = po[:] |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
43 po[:] = [] |
20359
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
44 findd = re.compile(r' *\.\. (\w+)::') # for finding directives |
11389
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
45 for entry in entries: |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
46 msgids = entry.msgid.split(u'\n\n') |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
47 if entry.msgstr: |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
48 msgstrs = entry.msgstr.split(u'\n\n') |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
49 else: |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
50 msgstrs = [u''] * len(msgids) |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
51 |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
52 if len(msgids) != len(msgstrs): |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
53 # places the whole existing translation as a fuzzy |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
54 # translation for each paragraph, to give the |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
55 # translator a chance to recover part of the old |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
56 # translation - erasing extra paragraphs is |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
57 # probably better than retranslating all from start |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
58 if 'fuzzy' not in entry.flags: |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
59 entry.flags.append('fuzzy') |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
60 msgstrs = [entry.msgstr] * len(msgids) |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
61 |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
62 delta = 0 |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
63 for msgid, msgstr in zip(msgids, msgstrs): |
20361
3fe079d3a2b4
i18n: posplit removes the entry "::" from the pot file
Simon Heimberg <simohe@besonet.ch>
parents:
20359
diff
changeset
|
64 if msgid and msgid != '::': |
11389
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
65 newentry = mkentry(entry, delta, msgid, msgstr) |
20359
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
66 mdirective = findd.match(msgid) |
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
67 if mdirective: |
20362
1bce1078501d
i18n: leave out entries which contain only a rst directive
Simon Heimberg <simohe@besonet.ch>
parents:
20361
diff
changeset
|
68 if not msgid[mdirective.end():].rstrip(): |
1bce1078501d
i18n: leave out entries which contain only a rst directive
Simon Heimberg <simohe@besonet.ch>
parents:
20361
diff
changeset
|
69 # only directive, nothing to translate here |
28074
a1924bc6e267
i18n: calculate correct line number in source of messages to be translated
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
20363
diff
changeset
|
70 delta += 2 |
20362
1bce1078501d
i18n: leave out entries which contain only a rst directive
Simon Heimberg <simohe@besonet.ch>
parents:
20361
diff
changeset
|
71 continue |
20359
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
72 directive = mdirective.group(1) |
20363
e3ee7ec85a15
i18n: leave out entries which contain only rst syntax
Simon Heimberg <simohe@besonet.ch>
parents:
20362
diff
changeset
|
73 if directive in ('container', 'include'): |
e3ee7ec85a15
i18n: leave out entries which contain only rst syntax
Simon Heimberg <simohe@besonet.ch>
parents:
20362
diff
changeset
|
74 if msgid.rstrip('\n').count('\n') == 0: |
e3ee7ec85a15
i18n: leave out entries which contain only rst syntax
Simon Heimberg <simohe@besonet.ch>
parents:
20362
diff
changeset
|
75 # only rst syntax, nothing to translate |
28074
a1924bc6e267
i18n: calculate correct line number in source of messages to be translated
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents:
20363
diff
changeset
|
76 delta += 2 |
20363
e3ee7ec85a15
i18n: leave out entries which contain only rst syntax
Simon Heimberg <simohe@besonet.ch>
parents:
20362
diff
changeset
|
77 continue |
e3ee7ec85a15
i18n: leave out entries which contain only rst syntax
Simon Heimberg <simohe@besonet.ch>
parents:
20362
diff
changeset
|
78 else: |
e3ee7ec85a15
i18n: leave out entries which contain only rst syntax
Simon Heimberg <simohe@besonet.ch>
parents:
20362
diff
changeset
|
79 # lines following directly, unexpected |
41759
aaad36b88298
cleanup: use () to wrap long lines instead of \
Augie Fackler <augie@google.com>
parents:
39269
diff
changeset
|
80 print('Warning: text follows line with directive' |
29153
90d84e1e427a
py3: make i18n/posplit use print_function
Pulkit Goyal <7895pulkit@gmail.com>
parents:
29152
diff
changeset
|
81 ' %s' % directive) |
20359
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
82 comment = 'do not translate: .. %s::' % directive |
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
83 if not newentry.comment: |
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
84 newentry.comment = comment |
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
85 elif comment not in newentry.comment: |
ff6ab0b2ebf7
i18n: posplit writes a warning for translators before rst directives
Simon Heimberg <simohe@besonet.ch>
parents:
11389
diff
changeset
|
86 newentry.comment += '\n' + comment |
11389
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
87 addentry(po, newentry, cache) |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
88 delta += 2 + msgid.count('\n') |
4fd49329a1b5
i18n: script for splitting large messages on .po/.pot files
Wagner Bruna <wbruna@yahoo.com>
parents:
diff
changeset
|
89 po.save() |