Gentoo Archives: gentoo-portage-dev

From: "Anthony G. Basile" <basile@××××××××××××××.edu>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
Date: Mon, 23 May 2016 12:08:28
Message-Id: a7b70e82-c2c6-d479-84fa-1781ec51006c@opensource.dyc.edu
In Reply to: Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale by "Michał Górny"
1 On 5/23/16 2:44 AM, Michał Górny wrote:
2 > On Sun, 22 May 2016 13:04:40 -0400
3 > "Anthony G. Basile" <basile@××××××××××××××.edu> wrote:
4 >
5 >> From: "Anthony G. Basile" <blueness@g.o>
6 >>
7 >> The current method to check for a sane system locale is to use python's
8 >> ctypes.util.find_library() to construct a full library path to the
9 >> system libc.so and pass that path to ctypes.CDLL() so we can call
10 >> toupper() and tolower() directly. However, this gets bogged down in
11 >> implementation details and fails with musl.
12 >>
13 >> We work around this design flaw in ctypes with a small python module
14 >> written in C which provides thin wrappers to toupper() and tolower(),
15 >> and only fall back on the current ctypes-based check when this module
16 >> is not available.
17 >>
18 >> This has been tested on glibc, uClibc and musl systems.
19 >>
20 >> X-Gentoo-bug: 571444
21 >> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
22 >>
23 >> Signed-off-by: Anthony G. Basile <blueness@g.o>
24 >> ---
25 >> pym/portage/util/locale.py | 32 ++++++++++-----
26 >> setup.py | 6 ++-
27 >> src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++
28 >> 3 files changed, 121 insertions(+), 11 deletions(-)
29 >> create mode 100644 src/portage_c_convert_case.c
30 >>
31 >> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
32 >> index 2a15ea1..85ddd2b 100644
33 >> --- a/pym/portage/util/locale.py
34 >> +++ b/pym/portage/util/locale.py
35 >> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
36 >> import locale
37 >> import logging
38 >> import os
39 >> +import sys
40 >> import textwrap
41 >> import traceback
42 >>
43 >> @@ -34,18 +35,26 @@ def _check_locale(silent):
44 >> """
45 >> The inner locale check function.
46 >> """
47 >> -
48 >> - libc_fn = find_library("c")
49 >> - if libc_fn is None:
50 >> - return None
51 >> - libc = LoadLibrary(libc_fn)
52 >> - if libc is None:
53 >> - return None
54 >> + try:
55 >> + from portage_c_convert_case import _c_toupper, _c_tolower
56 >> + libc_tolower = _c_tolower
57 >> + libc_toupper = _c_toupper
58 >
59 > Now I'm being picky... but if you named the functions toupper()
60 > and tolower(), you could actually import the whole module as 'libc'
61 > and have less code!
62
63 I see what you're saying, and its tempting because its elegant, but I'm
64 afraid of a clash of names. I've got a bad feeling this will get us
65 into trouble later.
66
67 Let me play with this and see what happens.
68
69 >
70 > Also it would be nice to actually make the module more generic. There
71 > are more places where we use CDLL, and all of them could eventually be
72 > supported by the module (unshare() would be much better done in C, for
73 > example).
74
75 Yeah I get your point here. Let me convince myself first.
76
77 >
78 >> + except ImportError:
79 >> + writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
80 >> + level=logging.WARNING, noiselevel=-1)
81 >
82 > Do we really want to warn verbosely about this? I think it'd be
83 > a pretty common case for people running the git checkout.
84
85 This should stay. Its good to know that the module is not being
86 imported and silently falling back on the ctypes stuff.
87
88 1) its only going to happen in the rare occasion that you're using
89 something like a turkish locale and can't import the module.
90
91 2) people who do a git checkout should add
92 PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
93 I can add something to testpath. Users will have to be instructed to
94 run `./setup build` and then the script shoudl read something like this
95
96 unamem=$(uname -m)
97
98 pythonversion=$(python --version 2>&1 | cut -c8-)
99 pythonversion=${pythonversion%\.*}
100
101 portagedir=$(dirname ${BASH_SOURCE[0]})
102
103 export PATH="${portagedir}/bin:${PATH}"
104
105 export
106 PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}"
107
108 export PYTHONWARNINGS=d,i::ImportWarning
109
110
111 BTW, the original code must have a bug in it. It reads
112
113 export PYTHONPATH=PYTHONPATH="$(dirname
114 $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"
115
116 The double PYTHONPATH=PYTHONPATH= can't be right.
117
118 >
119 >> + libc_fn = find_library("c")
120 >> + if libc_fn is None:
121 >> + return None
122 >> + libc = LoadLibrary(libc_fn)
123 >> + if libc is None:
124 >> + return None
125 >> + libc_tolower = libc.tolower
126 >> + libc_toupper = libc.toupper
127 >>
128 >> lc = list(range(ord('a'), ord('z')+1))
129 >> uc = list(range(ord('A'), ord('Z')+1))
130 >> - rlc = [libc.tolower(c) for c in uc]
131 >> - ruc = [libc.toupper(c) for c in lc]
132 >> + rlc = [libc_tolower(c) for c in uc]
133 >> + ruc = [libc_toupper(c) for c in lc]
134 >>
135 >> if lc != rlc or uc != ruc:
136 >> if silent:
137 >> @@ -62,7 +71,10 @@ def _check_locale(silent):
138 >> "as LC_CTYPE in make.conf.")
139 >> msg = [l for l in textwrap.wrap(msg, 70)]
140 >> msg.append("")
141 >> - chars = lambda l: ''.join(chr(x) for x in l)
142 >> + if sys.version_info.major >= 3:
143 >
144 > Portage uses hexversion for comparisons. Please be consistent.
145
146 I see that. Yuck, why use sys.hexversion? version_info is easier to
147 read. Anyhow, I'll change it because consistency is important.
148
149 >
150 >> + chars = lambda l: ''.join(chr(x) for x in l)
151 >> + else:
152 >> + chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l)
153 >
154 > This looks like an unrelated change. Was the original code buggy? If
155 > this is the case, then please fix it in a separate commit.
156
157 Yes the original code is buggy. Either via the module or without, if x
158 is outside the ascii range, chr(x) fails to convert it properly and
159 throws an error. This code was not tested:
160
161 Traceback (most recent call last):
162 File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
163 124, in check_locale
164 ret = _check_locale(silent)
165 File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
166 81, in _check_locale
167 " %s -> %s" % (chars(lc), chars(ruc)),
168 File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
169 74, in <lambda>
170 chars = lambda l: ''.join(chr(x) for x in l)
171 UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0:
172 ordinal not in range(128)
173
174 I'll separate out that commit.
175
176 >
177 >> if uc != ruc:
178 >> msg.extend([
179 >> " %s -> %s" % (chars(lc), chars(ruc)),
180 >> diff --git a/setup.py b/setup.py
181 >> index 25429bc..8b6b408 100755
182 >> --- a/setup.py
183 >> +++ b/setup.py
184 >> @@ -47,7 +47,11 @@ x_scripts = {
185 >> # Dictionary custom modules written in C/C++ here. The structure is
186 >> # key = module name
187 >> # value = list of C/C++ source code, path relative to top source directory
188 >> -x_c_helpers = {}
189 >> +x_c_helpers = {
190 >> + 'portage_c_convert_case' : [
191 >> + 'src/portage_c_convert_case.c',
192 >> + ],
193 >> +}
194 >>
195 >> class x_build(build):
196 >> """ Build command with extra build_man call. """
197 >> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
198 >> new file mode 100644
199 >> index 0000000..f60b0c2
200 >> --- /dev/null
201 >> +++ b/src/portage_c_convert_case.c
202 >> @@ -0,0 +1,94 @@
203 >> +/* Copyright 2005-2016 Gentoo Foundation
204 >> + * Distributed under the terms of the GNU General Public License v2
205 >> + */
206 >> +
207 >> +#include <Python.h>
208 >> +#include <ctype.h>
209 >> +
210 >> +static PyObject * portage_c_tolower(PyObject *, PyObject *);
211 >> +static PyObject * portage_c_toupper(PyObject *, PyObject *);
212 >> +
213 >> +static PyMethodDef ConvertCaseMethods[] = {
214 >> + {"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."},
215 >> + {"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."},
216 >> + {NULL, NULL, 0, NULL}
217 >
218 > You should include stdlib.h or stdio.h for NULL.
219
220 yep.
221
222 >
223 >> +};
224 >> +
225 >> +#if PY_MAJOR_VERSION >= 3
226 >> +static struct PyModuleDef moduledef = {
227 >> + PyModuleDef_HEAD_INIT,
228 >> + "portage_c_convert_case", /* m_name */
229 >> + "Module for converting case using the system locale", /* m_doc */
230 >> + -1, /* m_size */
231 >> + ConvertCaseMethods, /* m_methods */
232 >> + NULL, /* m_reload */
233 >> + NULL, /* m_traverse */
234 >> + NULL, /* m_clear */
235 >> + NULL, /* m_free */
236 >> +};
237 >> +#endif
238 >> +
239 >> +static PyObject *ConvertCaseError;
240 >> +
241 >> +PyMODINIT_FUNC
242 >> +#if PY_MAJOR_VERSION >= 3
243 >> +PyInit_portage_c_convert_case(void)
244 >> +#else
245 >> +initportage_c_convert_case(void)
246 >> +#endif
247 >> +{
248 >> + PyObject *m;
249 >> +
250 >> +#if PY_MAJOR_VERSION >= 3
251 >> + m = PyModule_Create(&moduledef);
252 >> +#else
253 >> + m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
254 >> +#endif
255 >> +
256 >> + if (m == NULL)
257 >> +#if PY_MAJOR_VERSION >= 3
258 >> + return NULL;
259 >> +#else
260 >> + return;
261 >> +#endif
262 >> +
263 >> + ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
264 >> + Py_INCREF(ConvertCaseError);
265 >> + PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
266 >> +
267 >> +#if PY_MAJOR_VERSION >= 3
268 >> + return m;
269 >> +#else
270 >> + return;
271 >> +#endif
272 >> +}
273 >
274 > To be honest, I think we'd be happier having two big #ifdefs for init
275 > funcs rather than one function with a lot of #ifdefs, and the common
276 > ConvertCaseError part in a separate static function.
277
278 I can move things around and reduce the #ifdefs, especially if we drop
279 the custom exception.
280
281 >
282 >> +
283 >> +
284 >> +static PyObject *
285 >> +portage_c_tolower(PyObject *self, PyObject *args)
286 >> +{
287 >> + int c;
288 >> +
289 >> + if (!PyArg_ParseTuple(args, "i", &c))
290 >> + {
291 >> + PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed");
292 >
293 > From PyArg_ParseTuple() [1]:
294 >
295 > | on failure, it returns false and raises the appropriate exception.
296 >
297 > So I don't think you need or should use a custom exception here.
298 >
299 > [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple
300
301 I am aware, but then you loose the specificity of the error, whether it
302 occurred in _c_tolower or _c_toupper. Its small, so I'll remove it
303 because it also cleans up the #ifdef spaghetti, but if we were to expand
304 this module for other libc functions then we should consider re-adding it.
305
306 >
307 >> + return NULL;
308 >> + }
309 >> +
310 >> + return Py_BuildValue("i", tolower(c));
311 >> +}
312 >> +
313 >> +
314 >> +static PyObject *
315 >> +portage_c_toupper(PyObject *self, PyObject *args)
316 >> +{
317 >> + int c;
318 >> +
319 >> + if (!PyArg_ParseTuple(args, "i", &c))
320 >> + {
321 >> + PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed");
322 >> + return NULL;
323 >> + }
324 >> +
325 >> + return Py_BuildValue("i", toupper(c));
326 >> +}
327 >
328 > Thanks a lot for your effort. This is really getting in shape.
329
330 I'm doing this because check_locale() broke portage on musl. I'm not a
331 frequent contributor to portage so I'd ask the portage team to free free
332 style it up for me. vapier does that to my code all the time :)
333
334
335 --
336 Anthony G. Basile, Ph. D.
337 Chair of Information Technology
338 D'Youville College
339 Buffalo, NY 14201
340 (716) 829-8197

Replies