1 |
On 5/23/16 2:44 AM, Michał Górny wrote: |
2 |
> On Sun, 22 May 2016 13:04:40 -0400 |
3 |
> "Anthony G. Basile" <basile@××××××××××××××.edu> wrote: |
4 |
> |
5 |
>> From: "Anthony G. Basile" <blueness@g.o> |
6 |
>> |
7 |
>> The current method to check for a sane system locale is to use python's |
8 |
>> ctypes.util.find_library() to construct a full library path to the |
9 |
>> system libc.so and pass that path to ctypes.CDLL() so we can call |
10 |
>> toupper() and tolower() directly. However, this gets bogged down in |
11 |
>> implementation details and fails with musl. |
12 |
>> |
13 |
>> We work around this design flaw in ctypes with a small python module |
14 |
>> written in C which provides thin wrappers to toupper() and tolower(), |
15 |
>> and only fall back on the current ctypes-based check when this module |
16 |
>> is not available. |
17 |
>> |
18 |
>> This has been tested on glibc, uClibc and musl systems. |
19 |
>> |
20 |
>> X-Gentoo-bug: 571444 |
21 |
>> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444 |
22 |
>> |
23 |
>> Signed-off-by: Anthony G. Basile <blueness@g.o> |
24 |
>> --- |
25 |
>> pym/portage/util/locale.py | 32 ++++++++++----- |
26 |
>> setup.py | 6 ++- |
27 |
>> src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++ |
28 |
>> 3 files changed, 121 insertions(+), 11 deletions(-) |
29 |
>> create mode 100644 src/portage_c_convert_case.c |
30 |
>> |
31 |
>> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py |
32 |
>> index 2a15ea1..85ddd2b 100644 |
33 |
>> --- a/pym/portage/util/locale.py |
34 |
>> +++ b/pym/portage/util/locale.py |
35 |
>> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals |
36 |
>> import locale |
37 |
>> import logging |
38 |
>> import os |
39 |
>> +import sys |
40 |
>> import textwrap |
41 |
>> import traceback |
42 |
>> |
43 |
>> @@ -34,18 +35,26 @@ def _check_locale(silent): |
44 |
>> """ |
45 |
>> The inner locale check function. |
46 |
>> """ |
47 |
>> - |
48 |
>> - libc_fn = find_library("c") |
49 |
>> - if libc_fn is None: |
50 |
>> - return None |
51 |
>> - libc = LoadLibrary(libc_fn) |
52 |
>> - if libc is None: |
53 |
>> - return None |
54 |
>> + try: |
55 |
>> + from portage_c_convert_case import _c_toupper, _c_tolower |
56 |
>> + libc_tolower = _c_tolower |
57 |
>> + libc_toupper = _c_toupper |
58 |
> |
59 |
> Now I'm being picky... but if you named the functions toupper() |
60 |
> and tolower(), you could actually import the whole module as 'libc' |
61 |
> and have less code! |
62 |
|
63 |
I see what you're saying, and its tempting because its elegant, but I'm |
64 |
afraid of a clash of names. I've got a bad feeling this will get us |
65 |
into trouble later. |
66 |
|
67 |
Let me play with this and see what happens. |
68 |
|
69 |
> |
70 |
> Also it would be nice to actually make the module more generic. There |
71 |
> are more places where we use CDLL, and all of them could eventually be |
72 |
> supported by the module (unshare() would be much better done in C, for |
73 |
> example). |
74 |
|
75 |
Yeah I get your point here. Let me convince myself first. |
76 |
|
77 |
> |
78 |
>> + except ImportError: |
79 |
>> + writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n", |
80 |
>> + level=logging.WARNING, noiselevel=-1) |
81 |
> |
82 |
> Do we really want to warn verbosely about this? I think it'd be |
83 |
> a pretty common case for people running the git checkout. |
84 |
|
85 |
This should stay. Its good to know that the module is not being |
86 |
imported and silently falling back on the ctypes stuff. |
87 |
|
88 |
1) its only going to happen in the rare occasion that you're using |
89 |
something like a turkish locale and can't import the module. |
90 |
|
91 |
2) people who do a git checkout should add |
92 |
PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module. |
93 |
I can add something to testpath. Users will have to be instructed to |
94 |
run `./setup build` and then the script shoudl read something like this |
95 |
|
96 |
unamem=$(uname -m) |
97 |
|
98 |
pythonversion=$(python --version 2>&1 | cut -c8-) |
99 |
pythonversion=${pythonversion%\.*} |
100 |
|
101 |
portagedir=$(dirname ${BASH_SOURCE[0]}) |
102 |
|
103 |
export PATH="${portagedir}/bin:${PATH}" |
104 |
|
105 |
export |
106 |
PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}" |
107 |
|
108 |
export PYTHONWARNINGS=d,i::ImportWarning |
109 |
|
110 |
|
111 |
BTW, the original code must have a bug in it. It reads |
112 |
|
113 |
export PYTHONPATH=PYTHONPATH="$(dirname |
114 |
$BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}" |
115 |
|
116 |
The double PYTHONPATH=PYTHONPATH= can't be right. |
117 |
|
118 |
> |
119 |
>> + libc_fn = find_library("c") |
120 |
>> + if libc_fn is None: |
121 |
>> + return None |
122 |
>> + libc = LoadLibrary(libc_fn) |
123 |
>> + if libc is None: |
124 |
>> + return None |
125 |
>> + libc_tolower = libc.tolower |
126 |
>> + libc_toupper = libc.toupper |
127 |
>> |
128 |
>> lc = list(range(ord('a'), ord('z')+1)) |
129 |
>> uc = list(range(ord('A'), ord('Z')+1)) |
130 |
>> - rlc = [libc.tolower(c) for c in uc] |
131 |
>> - ruc = [libc.toupper(c) for c in lc] |
132 |
>> + rlc = [libc_tolower(c) for c in uc] |
133 |
>> + ruc = [libc_toupper(c) for c in lc] |
134 |
>> |
135 |
>> if lc != rlc or uc != ruc: |
136 |
>> if silent: |
137 |
>> @@ -62,7 +71,10 @@ def _check_locale(silent): |
138 |
>> "as LC_CTYPE in make.conf.") |
139 |
>> msg = [l for l in textwrap.wrap(msg, 70)] |
140 |
>> msg.append("") |
141 |
>> - chars = lambda l: ''.join(chr(x) for x in l) |
142 |
>> + if sys.version_info.major >= 3: |
143 |
> |
144 |
> Portage uses hexversion for comparisons. Please be consistent. |
145 |
|
146 |
I see that. Yuck, why use sys.hexversion? version_info is easier to |
147 |
read. Anyhow, I'll change it because consistency is important. |
148 |
|
149 |
> |
150 |
>> + chars = lambda l: ''.join(chr(x) for x in l) |
151 |
>> + else: |
152 |
>> + chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l) |
153 |
> |
154 |
> This looks like an unrelated change. Was the original code buggy? If |
155 |
> this is the case, then please fix it in a separate commit. |
156 |
|
157 |
Yes the original code is buggy. Either via the module or without, if x |
158 |
is outside the ascii range, chr(x) fails to convert it properly and |
159 |
throws an error. This code was not tested: |
160 |
|
161 |
Traceback (most recent call last): |
162 |
File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line |
163 |
124, in check_locale |
164 |
ret = _check_locale(silent) |
165 |
File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line |
166 |
81, in _check_locale |
167 |
" %s -> %s" % (chars(lc), chars(ruc)), |
168 |
File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line |
169 |
74, in <lambda> |
170 |
chars = lambda l: ''.join(chr(x) for x in l) |
171 |
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0: |
172 |
ordinal not in range(128) |
173 |
|
174 |
I'll separate out that commit. |
175 |
|
176 |
> |
177 |
>> if uc != ruc: |
178 |
>> msg.extend([ |
179 |
>> " %s -> %s" % (chars(lc), chars(ruc)), |
180 |
>> diff --git a/setup.py b/setup.py |
181 |
>> index 25429bc..8b6b408 100755 |
182 |
>> --- a/setup.py |
183 |
>> +++ b/setup.py |
184 |
>> @@ -47,7 +47,11 @@ x_scripts = { |
185 |
>> # Dictionary custom modules written in C/C++ here. The structure is |
186 |
>> # key = module name |
187 |
>> # value = list of C/C++ source code, path relative to top source directory |
188 |
>> -x_c_helpers = {} |
189 |
>> +x_c_helpers = { |
190 |
>> + 'portage_c_convert_case' : [ |
191 |
>> + 'src/portage_c_convert_case.c', |
192 |
>> + ], |
193 |
>> +} |
194 |
>> |
195 |
>> class x_build(build): |
196 |
>> """ Build command with extra build_man call. """ |
197 |
>> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c |
198 |
>> new file mode 100644 |
199 |
>> index 0000000..f60b0c2 |
200 |
>> --- /dev/null |
201 |
>> +++ b/src/portage_c_convert_case.c |
202 |
>> @@ -0,0 +1,94 @@ |
203 |
>> +/* Copyright 2005-2016 Gentoo Foundation |
204 |
>> + * Distributed under the terms of the GNU General Public License v2 |
205 |
>> + */ |
206 |
>> + |
207 |
>> +#include <Python.h> |
208 |
>> +#include <ctype.h> |
209 |
>> + |
210 |
>> +static PyObject * portage_c_tolower(PyObject *, PyObject *); |
211 |
>> +static PyObject * portage_c_toupper(PyObject *, PyObject *); |
212 |
>> + |
213 |
>> +static PyMethodDef ConvertCaseMethods[] = { |
214 |
>> + {"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."}, |
215 |
>> + {"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."}, |
216 |
>> + {NULL, NULL, 0, NULL} |
217 |
> |
218 |
> You should include stdlib.h or stdio.h for NULL. |
219 |
|
220 |
yep. |
221 |
|
222 |
> |
223 |
>> +}; |
224 |
>> + |
225 |
>> +#if PY_MAJOR_VERSION >= 3 |
226 |
>> +static struct PyModuleDef moduledef = { |
227 |
>> + PyModuleDef_HEAD_INIT, |
228 |
>> + "portage_c_convert_case", /* m_name */ |
229 |
>> + "Module for converting case using the system locale", /* m_doc */ |
230 |
>> + -1, /* m_size */ |
231 |
>> + ConvertCaseMethods, /* m_methods */ |
232 |
>> + NULL, /* m_reload */ |
233 |
>> + NULL, /* m_traverse */ |
234 |
>> + NULL, /* m_clear */ |
235 |
>> + NULL, /* m_free */ |
236 |
>> +}; |
237 |
>> +#endif |
238 |
>> + |
239 |
>> +static PyObject *ConvertCaseError; |
240 |
>> + |
241 |
>> +PyMODINIT_FUNC |
242 |
>> +#if PY_MAJOR_VERSION >= 3 |
243 |
>> +PyInit_portage_c_convert_case(void) |
244 |
>> +#else |
245 |
>> +initportage_c_convert_case(void) |
246 |
>> +#endif |
247 |
>> +{ |
248 |
>> + PyObject *m; |
249 |
>> + |
250 |
>> +#if PY_MAJOR_VERSION >= 3 |
251 |
>> + m = PyModule_Create(&moduledef); |
252 |
>> +#else |
253 |
>> + m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods); |
254 |
>> +#endif |
255 |
>> + |
256 |
>> + if (m == NULL) |
257 |
>> +#if PY_MAJOR_VERSION >= 3 |
258 |
>> + return NULL; |
259 |
>> +#else |
260 |
>> + return; |
261 |
>> +#endif |
262 |
>> + |
263 |
>> + ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL); |
264 |
>> + Py_INCREF(ConvertCaseError); |
265 |
>> + PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError); |
266 |
>> + |
267 |
>> +#if PY_MAJOR_VERSION >= 3 |
268 |
>> + return m; |
269 |
>> +#else |
270 |
>> + return; |
271 |
>> +#endif |
272 |
>> +} |
273 |
> |
274 |
> To be honest, I think we'd be happier having two big #ifdefs for init |
275 |
> funcs rather than one function with a lot of #ifdefs, and the common |
276 |
> ConvertCaseError part in a separate static function. |
277 |
|
278 |
I can move things around and reduce the #ifdefs, especially if we drop |
279 |
the custom exception. |
280 |
|
281 |
> |
282 |
>> + |
283 |
>> + |
284 |
>> +static PyObject * |
285 |
>> +portage_c_tolower(PyObject *self, PyObject *args) |
286 |
>> +{ |
287 |
>> + int c; |
288 |
>> + |
289 |
>> + if (!PyArg_ParseTuple(args, "i", &c)) |
290 |
>> + { |
291 |
>> + PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed"); |
292 |
> |
293 |
> From PyArg_ParseTuple() [1]: |
294 |
> |
295 |
> | on failure, it returns false and raises the appropriate exception. |
296 |
> |
297 |
> So I don't think you need or should use a custom exception here. |
298 |
> |
299 |
> [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple |
300 |
|
301 |
I am aware, but then you loose the specificity of the error, whether it |
302 |
occurred in _c_tolower or _c_toupper. Its small, so I'll remove it |
303 |
because it also cleans up the #ifdef spaghetti, but if we were to expand |
304 |
this module for other libc functions then we should consider re-adding it. |
305 |
|
306 |
> |
307 |
>> + return NULL; |
308 |
>> + } |
309 |
>> + |
310 |
>> + return Py_BuildValue("i", tolower(c)); |
311 |
>> +} |
312 |
>> + |
313 |
>> + |
314 |
>> +static PyObject * |
315 |
>> +portage_c_toupper(PyObject *self, PyObject *args) |
316 |
>> +{ |
317 |
>> + int c; |
318 |
>> + |
319 |
>> + if (!PyArg_ParseTuple(args, "i", &c)) |
320 |
>> + { |
321 |
>> + PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed"); |
322 |
>> + return NULL; |
323 |
>> + } |
324 |
>> + |
325 |
>> + return Py_BuildValue("i", toupper(c)); |
326 |
>> +} |
327 |
> |
328 |
> Thanks a lot for your effort. This is really getting in shape. |
329 |
|
330 |
I'm doing this because check_locale() broke portage on musl. I'm not a |
331 |
frequent contributor to portage so I'd ask the portage team to free free |
332 |
style it up for me. vapier does that to my code all the time :) |
333 |
|
334 |
|
335 |
-- |
336 |
Anthony G. Basile, Ph. D. |
337 |
Chair of Information Technology |
338 |
D'Youville College |
339 |
Buffalo, NY 14201 |
340 |
(716) 829-8197 |