Gentoo Archives: gentoo-portage-dev

From: "Michał Górny" <mgorny@g.o>
To: "Anthony G. Basile" <basile@××××××××××××××.edu>
Cc: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale
Date: Mon, 23 May 2016 14:26:16
Message-Id: 20160523162557.4c6087ad.mgorny@gentoo.org
In Reply to: Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale by "Anthony G. Basile"
1 On Mon, 23 May 2016 08:08:18 -0400
2 "Anthony G. Basile" <basile@××××××××××××××.edu> wrote:
3
4 > On 5/23/16 2:44 AM, Michał Górny wrote:
5 > > On Sun, 22 May 2016 13:04:40 -0400
6 > > "Anthony G. Basile" <basile@××××××××××××××.edu> wrote:
7 > >
8 > >> From: "Anthony G. Basile" <blueness@g.o>
9 > >>
10 > >> The current method to check for a sane system locale is to use python's
11 > >> ctypes.util.find_library() to construct a full library path to the
12 > >> system libc.so and pass that path to ctypes.CDLL() so we can call
13 > >> toupper() and tolower() directly. However, this gets bogged down in
14 > >> implementation details and fails with musl.
15 > >>
16 > >> We work around this design flaw in ctypes with a small python module
17 > >> written in C which provides thin wrappers to toupper() and tolower(),
18 > >> and only fall back on the current ctypes-based check when this module
19 > >> is not available.
20 > >>
21 > >> This has been tested on glibc, uClibc and musl systems.
22 > >>
23 > >> X-Gentoo-bug: 571444
24 > >> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444
25 > >>
26 > >> Signed-off-by: Anthony G. Basile <blueness@g.o>
27 > >> ---
28 > >> pym/portage/util/locale.py | 32 ++++++++++-----
29 > >> setup.py | 6 ++-
30 > >> src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++
31 > >> 3 files changed, 121 insertions(+), 11 deletions(-)
32 > >> create mode 100644 src/portage_c_convert_case.c
33 > >>
34 > >> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py
35 > >> index 2a15ea1..85ddd2b 100644
36 > >> --- a/pym/portage/util/locale.py
37 > >> +++ b/pym/portage/util/locale.py
38 > >> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals
39 > >> import locale
40 > >> import logging
41 > >> import os
42 > >> +import sys
43 > >> import textwrap
44 > >> import traceback
45 > >>
46 > >> @@ -34,18 +35,26 @@ def _check_locale(silent):
47 > >> """
48 > >> The inner locale check function.
49 > >> """
50 > >> -
51 > >> - libc_fn = find_library("c")
52 > >> - if libc_fn is None:
53 > >> - return None
54 > >> - libc = LoadLibrary(libc_fn)
55 > >> - if libc is None:
56 > >> - return None
57 > >> + try:
58 > >> + from portage_c_convert_case import _c_toupper, _c_tolower
59 > >> + libc_tolower = _c_tolower
60 > >> + libc_toupper = _c_toupper
61 > >
62 > > Now I'm being picky... but if you named the functions toupper()
63 > > and tolower(), you could actually import the whole module as 'libc'
64 > > and have less code!
65 >
66 > I see what you're saying, and its tempting because its elegant, but I'm
67 > afraid of a clash of names. I've got a bad feeling this will get us
68 > into trouble later.
69 >
70 > Let me play with this and see what happens.
71
72 I don't think this will be problematic since things like this happen
73 in Python all the time ;-). And after all, C function names can be
74 different than Python function names.
75
76 > > Also it would be nice to actually make the module more generic. There
77 > > are more places where we use CDLL, and all of them could eventually be
78 > > supported by the module (unshare() would be much better done in C, for
79 > > example).
80 >
81 > Yeah I get your point here. Let me convince myself first.
82
83 I've got a killer argument: right now we hardcode constants from Linux
84 headers in the Python code!
85
86 Not that I'm asking you to actually add code for that as well. Just
87 rename the module to something more generic like portage.util.libc ;-).
88
89 > >> + except ImportError:
90 > >> + writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n",
91 > >> + level=logging.WARNING, noiselevel=-1)
92 > >
93 > > Do we really want to warn verbosely about this? I think it'd be
94 > > a pretty common case for people running the git checkout.
95 >
96 > This should stay. Its good to know that the module is not being
97 > imported and silently falling back on the ctypes stuff.
98 >
99 > 1) its only going to happen in the rare occasion that you're using
100 > something like a turkish locale and can't import the module.
101
102 Wrong. This happens before the check is done, so it will be output
103 every time Portage is started, also with good locale.
104
105 > 2) people who do a git checkout should add
106 > PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module.
107 > I can add something to testpath. Users will have to be instructed to
108 > run `./setup build` and then the script shoudl read something like this
109 >
110 > unamem=$(uname -m)
111 >
112 > pythonversion=$(python --version 2>&1 | cut -c8-)
113 > pythonversion=${pythonversion%\.*}
114 >
115 > portagedir=$(dirname ${BASH_SOURCE[0]})
116 >
117 > export PATH="${portagedir}/bin:${PATH}"
118 >
119 > export
120 > PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}"
121 >
122 > export PYTHONWARNINGS=d,i::ImportWarning
123 >
124 >
125 > BTW, the original code must have a bug in it. It reads
126 >
127 > export PYTHONPATH=PYTHONPATH="$(dirname
128 > $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}"
129 >
130 > The double PYTHONPATH=PYTHONPATH= can't be right.
131
132 You are probably right. However:
133
134 1. Since bin/ scripts are setting PYTHONPATH appropriately, you should
135 probably update all of them as well to work out-of-the-box.
136
137 2. Don't do bash hackery and guessing. Take correct paths directly out
138 of Python modules (distutils should provide necessary means to get
139 the defaults).
140
141 3. Remember that something.cfg can be used to override those paths.
142 Though if you use distutils well enough, it should be able to figure
143 that out for oyu.
144
145 > >> + libc_fn = find_library("c")
146 > >> + if libc_fn is None:
147 > >> + return None
148 > >> + libc = LoadLibrary(libc_fn)
149 > >> + if libc is None:
150 > >> + return None
151 > >> + libc_tolower = libc.tolower
152 > >> + libc_toupper = libc.toupper
153 > >>
154 > >> lc = list(range(ord('a'), ord('z')+1))
155 > >> uc = list(range(ord('A'), ord('Z')+1))
156 > >> - rlc = [libc.tolower(c) for c in uc]
157 > >> - ruc = [libc.toupper(c) for c in lc]
158 > >> + rlc = [libc_tolower(c) for c in uc]
159 > >> + ruc = [libc_toupper(c) for c in lc]
160 > >>
161 > >> if lc != rlc or uc != ruc:
162 > >> if silent:
163 > >> @@ -62,7 +71,10 @@ def _check_locale(silent):
164 > >> "as LC_CTYPE in make.conf.")
165 > >> msg = [l for l in textwrap.wrap(msg, 70)]
166 > >> msg.append("")
167 > >> - chars = lambda l: ''.join(chr(x) for x in l)
168 > >> + if sys.version_info.major >= 3:
169 > >
170 > > Portage uses hexversion for comparisons. Please be consistent.
171 >
172 > I see that. Yuck, why use sys.hexversion? version_info is easier to
173 > read. Anyhow, I'll change it because consistency is important.
174
175 1. I think old versions of Python did not support named properties in
176 sys.version_info, back when Portage was written.
177
178 2. hexversion makes it easy to quickly check for specific versions like
179 3.2 (yep, Portage does that in some places because Python upstream
180 can't keep API sane).
181
182 > >> + chars = lambda l: ''.join(chr(x) for x in l)
183 > >> + else:
184 > >> + chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l)
185 > >
186 > > This looks like an unrelated change. Was the original code buggy? If
187 > > this is the case, then please fix it in a separate commit.
188 >
189 > Yes the original code is buggy. Either via the module or without, if x
190 > is outside the ascii range, chr(x) fails to convert it properly and
191 > throws an error. This code was not tested:
192 >
193 > Traceback (most recent call last):
194 > File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
195 > 124, in check_locale
196 > ret = _check_locale(silent)
197 > File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
198 > 81, in _check_locale
199 > " %s -> %s" % (chars(lc), chars(ruc)),
200 > File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line
201 > 74, in <lambda>
202 > chars = lambda l: ''.join(chr(x) for x in l)
203 > UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0:
204 > ordinal not in range(128)
205 >
206 > I'll separate out that commit.
207
208 Thanks. Also please try if you can get a single code path for py2/py3.
209 You may want to take a look at portage.util for decoding/encoding
210 wrappers.
211
212 > >> if uc != ruc:
213 > >> msg.extend([
214 > >> " %s -> %s" % (chars(lc), chars(ruc)),
215 > >> diff --git a/setup.py b/setup.py
216 > >> index 25429bc..8b6b408 100755
217 > >> --- a/setup.py
218 > >> +++ b/setup.py
219 > >> @@ -47,7 +47,11 @@ x_scripts = {
220 > >> # Dictionary custom modules written in C/C++ here. The structure is
221 > >> # key = module name
222 > >> # value = list of C/C++ source code, path relative to top source directory
223 > >> -x_c_helpers = {}
224 > >> +x_c_helpers = {
225 > >> + 'portage_c_convert_case' : [
226 > >> + 'src/portage_c_convert_case.c',
227 > >> + ],
228 > >> +}
229 > >>
230 > >> class x_build(build):
231 > >> """ Build command with extra build_man call. """
232 > >> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c
233 > >> new file mode 100644
234 > >> index 0000000..f60b0c2
235 > >> --- /dev/null
236 > >> +++ b/src/portage_c_convert_case.c
237 > >> @@ -0,0 +1,94 @@
238 > >> +/* Copyright 2005-2016 Gentoo Foundation
239 > >> + * Distributed under the terms of the GNU General Public License v2
240 > >> + */
241 > >> +
242 > >> +#include <Python.h>
243 > >> +#include <ctype.h>
244 > >> +
245 > >> +static PyObject * portage_c_tolower(PyObject *, PyObject *);
246 > >> +static PyObject * portage_c_toupper(PyObject *, PyObject *);
247 > >> +
248 > >> +static PyMethodDef ConvertCaseMethods[] = {
249 > >> + {"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."},
250 > >> + {"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."},
251 > >> + {NULL, NULL, 0, NULL}
252 > >
253 > > You should include stdlib.h or stdio.h for NULL.
254 >
255 > yep.
256 >
257 > >
258 > >> +};
259 > >> +
260 > >> +#if PY_MAJOR_VERSION >= 3
261 > >> +static struct PyModuleDef moduledef = {
262 > >> + PyModuleDef_HEAD_INIT,
263 > >> + "portage_c_convert_case", /* m_name */
264 > >> + "Module for converting case using the system locale", /* m_doc */
265 > >> + -1, /* m_size */
266 > >> + ConvertCaseMethods, /* m_methods */
267 > >> + NULL, /* m_reload */
268 > >> + NULL, /* m_traverse */
269 > >> + NULL, /* m_clear */
270 > >> + NULL, /* m_free */
271 > >> +};
272 > >> +#endif
273 > >> +
274 > >> +static PyObject *ConvertCaseError;
275 > >> +
276 > >> +PyMODINIT_FUNC
277 > >> +#if PY_MAJOR_VERSION >= 3
278 > >> +PyInit_portage_c_convert_case(void)
279 > >> +#else
280 > >> +initportage_c_convert_case(void)
281 > >> +#endif
282 > >> +{
283 > >> + PyObject *m;
284 > >> +
285 > >> +#if PY_MAJOR_VERSION >= 3
286 > >> + m = PyModule_Create(&moduledef);
287 > >> +#else
288 > >> + m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods);
289 > >> +#endif
290 > >> +
291 > >> + if (m == NULL)
292 > >> +#if PY_MAJOR_VERSION >= 3
293 > >> + return NULL;
294 > >> +#else
295 > >> + return;
296 > >> +#endif
297 > >> +
298 > >> + ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL);
299 > >> + Py_INCREF(ConvertCaseError);
300 > >> + PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError);
301 > >> +
302 > >> +#if PY_MAJOR_VERSION >= 3
303 > >> + return m;
304 > >> +#else
305 > >> + return;
306 > >> +#endif
307 > >> +}
308 > >
309 > > To be honest, I think we'd be happier having two big #ifdefs for init
310 > > funcs rather than one function with a lot of #ifdefs, and the common
311 > > ConvertCaseError part in a separate static function.
312 >
313 > I can move things around and reduce the #ifdefs, especially if we drop
314 > the custom exception.
315
316 Thanks.
317
318 > >> +
319 > >> +
320 > >> +static PyObject *
321 > >> +portage_c_tolower(PyObject *self, PyObject *args)
322 > >> +{
323 > >> + int c;
324 > >> +
325 > >> + if (!PyArg_ParseTuple(args, "i", &c))
326 > >> + {
327 > >> + PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed");
328 > >
329 > > From PyArg_ParseTuple() [1]:
330 > >
331 > > | on failure, it returns false and raises the appropriate exception.
332 > >
333 > > So I don't think you need or should use a custom exception here.
334 > >
335 > > [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple
336 >
337 > I am aware, but then you loose the specificity of the error, whether it
338 > occurred in _c_tolower or _c_toupper. Its small, so I'll remove it
339 > because it also cleans up the #ifdef spaghetti, but if we were to expand
340 > this module for other libc functions then we should consider re-adding it.
341
342 Are you really sure about that? I'd rather see what kind of wrong
343 argument has been passed rather than 'PyArg_ParseTuple failed' which
344 tells me nothing.
345
346 > >> + return NULL;
347 > >> + }
348 > >> +
349 > >> + return Py_BuildValue("i", tolower(c));
350 > >> +}
351 > >> +
352 > >> +
353 > >> +static PyObject *
354 > >> +portage_c_toupper(PyObject *self, PyObject *args)
355 > >> +{
356 > >> + int c;
357 > >> +
358 > >> + if (!PyArg_ParseTuple(args, "i", &c))
359 > >> + {
360 > >> + PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed");
361 > >> + return NULL;
362 > >> + }
363 > >> +
364 > >> + return Py_BuildValue("i", toupper(c));
365 > >> +}
366 > >
367 > > Thanks a lot for your effort. This is really getting in shape.
368 >
369 > I'm doing this because check_locale() broke portage on musl. I'm not a
370 > frequent contributor to portage so I'd ask the portage team to free free
371 > style it up for me. vapier does that to my code all the time :)
372
373 --
374 Best regards,
375 Michał Górny
376 <http://dev.gentoo.org/~mgorny/>

Replies

Subject Author
Re: [gentoo-portage-dev] [PATCH 2/2] pym/portage/util/locale.py: add a C module to help check locale "Anthony G. Basile" <basile@××××××××××××××.edu>