changeset 43394:d359dfc15aca stable

fsmonitor: handle unicode keys in tuples In Python 3, keys in the bset tuple are typically str, not bytes. PyBytes_AsString() would return NULL. But we weren't checking the return value and this would lead to a segfault. This commit makes the code type and Python version aware. The Python version specific code is to allow us to utilize a modern API for converting str -> char* without having to allocate an extra PyObject. FWIW I wanted to assume that keys were always str. However, there appear to be some bytes keys in some cases. I haven't debugged this further. Differential Revision: https://phab.mercurial-scm.org/D7210
author Gregory Szorc <gregory.szorc@gmail.com>
date Sat, 02 Nov 2019 14:17:48 -0700
parents bdebc7b54dca
children 2b8be670dcb6
files hgext/fsmonitor/pywatchman/bser.c
diffstat 1 files changed, 16 insertions(+), 1 deletions(-) [+]
line wrap: on
line diff
--- a/hgext/fsmonitor/pywatchman/bser.c	Sat Nov 02 13:39:23 2019 -0700
+++ b/hgext/fsmonitor/pywatchman/bser.c	Sat Nov 02 14:17:48 2019 -0700
@@ -175,7 +175,22 @@
     const char* item_name = NULL;
     PyObject* key = PyTuple_GET_ITEM(obj->keys, i);
 
-    item_name = PyBytes_AsString(key);
+    if (PyUnicode_Check(key)) {
+#if PY_MAJOR_VERSION >= 3
+      item_name = PyUnicode_AsUTF8(key);
+#else
+      PyObject* utf = PyUnicode_AsEncodedString(key, "utf-8", "ignore");
+      if (utf == NULL) {
+        goto bail;
+      }
+      item_name = PyBytes_AsString(utf);
+#endif
+    } else {
+      item_name = PyBytes_AsString(key);
+    }
+    if (item_name == NULL) {
+      goto bail;
+    }
     if (!strcmp(item_name, namestr)) {
       ret = PySequence_GetItem(obj->values, i);
       goto bail;