changeset 43394:d359dfc15aca stable
fsmonitor: handle unicode keys in tuples
In Python 3, keys in the bser tuple are typically str, not
bytes. PyBytes_AsString() returns NULL for a str key, but we
weren't checking the return value, so the NULL pointer reached
strcmp() and caused a segfault.
This commit makes the code type and Python version aware. The
Python version-specific code lets us use a modern API,
PyUnicode_AsUTF8(), on Python 3 to convert str -> char* without
having to allocate an extra PyObject.
FWIW I wanted to assume that keys were always str. However,
there appear to be some bytes keys in some cases. I haven't
debugged this further.
Differential Revision: https://phab.mercurial-scm.org/D7210
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Sat, 02 Nov 2019 14:17:48 -0700 |
parents | bdebc7b54dca |
children | 2b8be670dcb6 |
files | hgext/fsmonitor/pywatchman/bser.c |
diffstat | 1 files changed, 16 insertions(+), 1 deletions(-) |
--- a/hgext/fsmonitor/pywatchman/bser.c Sat Nov 02 13:39:23 2019 -0700
+++ b/hgext/fsmonitor/pywatchman/bser.c Sat Nov 02 14:17:48 2019 -0700
@@ -175,7 +175,22 @@
     const char* item_name = NULL;
     PyObject* key = PyTuple_GET_ITEM(obj->keys, i);
 
-    item_name = PyBytes_AsString(key);
+    if (PyUnicode_Check(key)) {
+#if PY_MAJOR_VERSION >= 3
+      item_name = PyUnicode_AsUTF8(key);
+#else
+      PyObject* utf = PyUnicode_AsEncodedString(key, "utf-8", "ignore");
+      if (utf == NULL) {
+        goto bail;
+      }
+      item_name = PyBytes_AsString(utf);
+#endif
+    } else {
+      item_name = PyBytes_AsString(key);
+    }
+    if (item_name == NULL) {
+      goto bail;
+    }
     if (!strcmp(item_name, namestr)) {
       ret = PySequence_GetItem(obj->values, i);
       goto bail;
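The shape of the fix (dispatch on the key type, then check the converted pointer before handing it to strcmp()) can be modelled in plain C without the Python headers. The following is only an illustrative sketch, not the code from bser.c: `demo_key`, `demo_bytes_as_string()`, `demo_text_as_utf8()` and `demo_find_key()` are hypothetical stand-ins invented here, and the two converters merely imitate the "return NULL on the wrong type" behaviour of PyBytes_AsString() and PyUnicode_AsUTF8().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Python key object: bytes, str, or neither. */
typedef enum { KEY_BYTES, KEY_TEXT, KEY_OTHER } key_kind;

typedef struct {
  key_kind kind;
  const char* data; /* NUL-terminated payload; unused for KEY_OTHER */
} demo_key;

/* Imitates PyBytes_AsString(): NULL unless the key is bytes. */
static const char* demo_bytes_as_string(const demo_key* key) {
  return key->kind == KEY_BYTES ? key->data : NULL;
}

/* Imitates PyUnicode_AsUTF8(): NULL unless the key is text. */
static const char* demo_text_as_utf8(const demo_key* key) {
  return key->kind == KEY_TEXT ? key->data : NULL;
}

/*
 * Returns the index of the key equal to `name`, -1 if no key matches,
 * or -2 if a key could not be converted to a C string.
 */
static int demo_find_key(const demo_key* keys, size_t n, const char* name) {
  assert(name != NULL);
  for (size_t i = 0; i < n; i++) {
    const char* item_name;
    if (keys[i].kind == KEY_TEXT) {
      item_name = demo_text_as_utf8(&keys[i]);
    } else {
      item_name = demo_bytes_as_string(&keys[i]);
    }
    /* The check the original code lacked: never pass NULL to strcmp(). */
    if (item_name == NULL) {
      return -2;
    }
    if (strcmp(item_name, name) == 0) {
      return (int)i;
    }
  }
  return -1;
}
```

In this model, a key array mixing `KEY_TEXT` and `KEY_BYTES` entries is searched correctly, and a `KEY_OTHER` entry produces the error code instead of a crash. Before the change, the equivalent of the text case fell through to the bytes converter, got NULL back, and passed it straight to strcmp().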