changeset 6206:2ec9c87e8574

topic: allow unicode symbols in names as long as they are alphanumeric

I decided to relax this logic just a little to allow Unicode "word characters" (i.e. everything that a Unicode regex \w matches). This is still stricter than what core allows for branches and bookmarks: core only forbids certain byte values (like the null byte; see the scmutil.checknewlabel function). This extra check for topic names could be dropped altogether in favor of relying solely on checknewlabel(), but I'm not sure there isn't some corner case that topics can't handle. This needs more investigation (and tests).
author Anton Shestakov <av6@dwimlabs.net>
date Sat, 19 Mar 2022 19:13:00 +0300
parents 9d81041f735f
children aa9b0d8f268e
files CHANGELOG hgext3rd/topic/__init__.py tests/test-topic.t
diffstat 3 files changed, 36 insertions(+), 4 deletions(-)
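The rule described above amounts to two steps: decode the topic name from the local encoding, then require the whole decoded string to match `[-_.\w]+` with Unicode semantics. The following is a minimal standalone sketch of that rule. It assumes Python 3 and takes the encoding as an explicit argument, whereas the patch uses Mercurial's `encoding.unifromlocal()`. `is_valid_topic` and `_TOPIC_RE` are hypothetical names and are not part of the extension.

```python
import re

# Hypothetical stand-in for the topic-name check in hgext3rd/topic/__init__.py.
# re.UNICODE mirrors the patch (it is already the default for str patterns).
_TOPIC_RE = re.compile(r'[-_.\w]+', re.UNICODE)


def is_valid_topic(name: bytes, local_encoding: str = 'utf-8') -> bool:
    """Return True if `name` decodes cleanly and contains only word chars, '-', '_', '.'."""
    try:
        # Assumption: a plain bytes.decode() stands in for encoding.unifromlocal().
        uname = name.decode(local_encoding)
    except UnicodeDecodeError:
        # Like the patch, names that cannot be decoded are rejected for now.
        return False
    # Empty names are rejected; fullmatch requires the whole string to match.
    return bool(uname) and _TOPIC_RE.fullmatch(uname) is not None


if __name__ == '__main__':
    print(is_valid_topic('æ'.encode('utf-8')))             # True: 'æ' is a letter
    print(is_valid_topic('©'.encode('utf-8')))             # False: '©' is a symbol
    print(is_valid_topic('æ'.encode('utf-8'), 'latin-1'))  # False: decodes to 'Ã¦'
    print(is_valid_topic(b'my-topic_1.0'))                 # True
```

This sketch uses `re.fullmatch` in place of the patch's `re.match` followed by the `group(0)` comparison. For this greedy single-character-class pattern the two are equivalent. The latin-1 call reproduces the test case: the UTF-8 bytes of `æ` decode to `'Ã¦'`, and `'¦'` (U+00A6) is not a word character.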
--- a/CHANGELOG	Sun Mar 13 19:42:10 2022 +0300
+++ b/CHANGELOG	Sat Mar 19 19:13:00 2022 +0300
@@ -1,6 +1,13 @@
 Changelog
 =========
 
+10.6.0 - in progress
+--------------------
+
+topic (0.25.0)
+
+  * topic: allow Unicode word characters in topic names
+
 10.5.0 -- 2022-02-23
 --------------------
 
--- a/hgext3rd/topic/__init__.py	Sun Mar 13 19:42:10 2022 +0300
+++ b/hgext3rd/topic/__init__.py	Sat Mar 19 19:13:00 2022 +0300
@@ -169,6 +169,7 @@
     cmdutil,
     commands,
     context,
+    encoding,
     error,
     exchange,
     extensions,
@@ -827,10 +828,16 @@
         # Have some restrictions on the topic name just like bookmark name
         scmutil.checknewlabel(repo, topic, b'topic')
 
-        rmatch = re.match(br'[-_.\w]+', topic)
-        if not rmatch or rmatch.group(0) != topic:
-            helptxt = _(b"topic names can only consist of alphanumeric, '-'"
-                        b" '_' and '.' characters")
+        helptxt = _(b"topic names can only consist of alphanumeric, '-'"
+                    b" '_' and '.' characters")
+        try:
+            utopic = encoding.unifromlocal(topic)
+        except error.Abort:
+            # Maybe we should allow these topic names as well, as long as they
+            # don't break any other rules
+            utopic = ''
+        rmatch = re.match(r'[-_.\w]+', utopic, re.UNICODE)
+        if not utopic or not rmatch or rmatch.group(0) != utopic:
             raise error.Abort(_(b"invalid topic name: '%s'") % topic, hint=helptxt)
 
     if list:
--- a/tests/test-topic.t	Sun Mar 13 19:42:10 2022 +0300
+++ b/tests/test-topic.t	Sat Mar 19 19:13:00 2022 +0300
@@ -317,6 +317,24 @@
   $ hg topic
    * topicflag (0 changesets)
 
+Non-ascii topic name
+
+  $ hg --encoding utf-8 topic æ
+  $ hg topics
+   * \xc3\xa6 (0 changesets) (esc)
+  $ hg --encoding latin1 topics
+   * \xc3\xa6 (0 changesets) (esc)
+
+  $ hg --encoding utf-8 topic ©
+  abort: invalid topic name: '\xc2\xa9' (esc)
+  (topic names can only consist of alphanumeric, '-' '_' and '.' characters)
+  [255]
+
+  $ hg --encoding latin1 topic æ
+  abort: invalid topic name: '\xc3\xa6' (esc)
+  (topic names can only consist of alphanumeric, '-' '_' and '.' characters)
+  [255]
+
 Make a topic
 
   $ hg topic narf