1 |
On Mon, 23 May 2016 08:08:18 -0400 |
2 |
"Anthony G. Basile" <basile@××××××××××××××.edu> wrote: |
3 |
|
4 |
> On 5/23/16 2:44 AM, Michał Górny wrote: |
5 |
> > On Sun, 22 May 2016 13:04:40 -0400 |
6 |
> > "Anthony G. Basile" <basile@××××××××××××××.edu> wrote: |
7 |
> > |
8 |
> >> From: "Anthony G. Basile" <blueness@g.o> |
9 |
> >> |
10 |
> >> The current method to check for a sane system locale is to use python's |
11 |
> >> ctypes.util.find_library() to construct a full library path to the |
12 |
> >> system libc.so and pass that path to ctypes.CDLL() so we can call |
13 |
> >> toupper() and tolower() directly. However, this gets bogged down in |
14 |
> >> implementation details and fails with musl. |
15 |
> >> |
16 |
> >> We work around this design flaw in ctypes with a small python module |
17 |
> >> written in C which provides thin wrappers to toupper() and tolower(), |
18 |
> >> and only fall back on the current ctypes-based check when this module |
19 |
> >> is not available. |
20 |
> >> |
21 |
> >> This has been tested on glibc, uClibc and musl systems. |
22 |
> >> |
23 |
> >> X-Gentoo-bug: 571444 |
24 |
> >> X-Gentoo-bug-url: https://bugs.gentoo.org/show_bug.cgi?id=571444 |
25 |
> >> |
26 |
> >> Signed-off-by: Anthony G. Basile <blueness@g.o> |
27 |
> >> --- |
28 |
> >> pym/portage/util/locale.py | 32 ++++++++++----- |
29 |
> >> setup.py | 6 ++- |
30 |
> >> src/portage_c_convert_case.c | 94 ++++++++++++++++++++++++++++++++++++++++++++ |
31 |
> >> 3 files changed, 121 insertions(+), 11 deletions(-) |
32 |
> >> create mode 100644 src/portage_c_convert_case.c |
33 |
> >> |
34 |
> >> diff --git a/pym/portage/util/locale.py b/pym/portage/util/locale.py |
35 |
> >> index 2a15ea1..85ddd2b 100644 |
36 |
> >> --- a/pym/portage/util/locale.py |
37 |
> >> +++ b/pym/portage/util/locale.py |
38 |
> >> @@ -11,6 +11,7 @@ from __future__ import absolute_import, unicode_literals |
39 |
> >> import locale |
40 |
> >> import logging |
41 |
> >> import os |
42 |
> >> +import sys |
43 |
> >> import textwrap |
44 |
> >> import traceback |
45 |
> >> |
46 |
> >> @@ -34,18 +35,26 @@ def _check_locale(silent): |
47 |
> >> """ |
48 |
> >> The inner locale check function. |
49 |
> >> """ |
50 |
> >> - |
51 |
> >> - libc_fn = find_library("c") |
52 |
> >> - if libc_fn is None: |
53 |
> >> - return None |
54 |
> >> - libc = LoadLibrary(libc_fn) |
55 |
> >> - if libc is None: |
56 |
> >> - return None |
57 |
> >> + try: |
58 |
> >> + from portage_c_convert_case import _c_toupper, _c_tolower |
59 |
> >> + libc_tolower = _c_tolower |
60 |
> >> + libc_toupper = _c_toupper |
61 |
> > |
62 |
> > Now I'm being picky... but if you named the functions toupper() |
63 |
> > and tolower(), you could actually import the whole module as 'libc' |
64 |
> > and have less code! |
65 |
> |
66 |
> I see what you're saying, and its tempting because its elegant, but I'm |
67 |
> afraid of a clash of names. I've got a bad feeling this will get us |
68 |
> into trouble later. |
69 |
> |
70 |
> Let me play with this and see what happens. |
71 |
|
72 |
I don't think this will be problematic since things like this happen |
73 |
in Python all the time ;-). And after all, C function names can be |
74 |
different than Python function names. |
75 |
|
76 |
> > Also it would be nice to actually make the module more generic. There |
77 |
> > are more places where we use CDLL, and all of them could eventually be |
78 |
> > supported by the module (unshare() would be much better done in C, for |
79 |
> > example). |
80 |
> |
81 |
> Yeah I get your point here. Let me convince myself first. |
82 |
|
83 |
I've got a killer argument: right now we hardcode constants from Linux |
84 |
headers in the Python code! |
85 |
|
86 |
Not that I'm asking you to actually add code for that as well. Just |
87 |
rename the module to something more generic like portage.util.libc ;-). |
88 |
|
89 |
> >> + except ImportError: |
90 |
> >> + writemsg_level("!!! Unable to import portage_c_convert_case\n!!!\n", |
91 |
> >> + level=logging.WARNING, noiselevel=-1) |
92 |
> > |
93 |
> > Do we really want to warn verbosely about this? I think it'd be |
94 |
> > a pretty common case for people running the git checkout. |
95 |
> |
96 |
> This should stay. Its good to know that the module is not being |
97 |
> imported and silently falling back on the ctypes stuff. |
98 |
> |
99 |
> 1) its only going to happen in the rare occasion that you're using |
100 |
> something like a turkish locale and can't import the module. |
101 |
|
102 |
Wrong. This happens before the check is done, so it will be output |
103 |
every time Portage is started, also with good locale. |
104 |
|
105 |
> 2) people who do a git checkout should add |
106 |
> PYTHONPATH=build/lib.linux-x86_64-3.4 to their env to test the module. |
107 |
> I can add something to testpath. Users will have to be instructed to |
108 |
> run `./setup build` and then the script shoudl read something like this |
109 |
> |
110 |
> unamem=$(uname -m) |
111 |
> |
112 |
> pythonversion=$(python --version 2>&1 | cut -c8-) |
113 |
> pythonversion=${pythonversion%\.*} |
114 |
> |
115 |
> portagedir=$(dirname ${BASH_SOURCE[0]}) |
116 |
> |
117 |
> export PATH="${portagedir}/bin:${PATH}" |
118 |
> |
119 |
> export |
120 |
> PYTHONPATH="${portagedir}/build/lib.linux-${unamem}-${pythonversion}:${portagedir}/pym:${PYTHONPATH:+:}${PYTHONPATH}" |
121 |
> |
122 |
> export PYTHONWARNINGS=d,i::ImportWarning |
123 |
> |
124 |
> |
125 |
> BTW, the original code must have a bug in it. It reads |
126 |
> |
127 |
> export PYTHONPATH=PYTHONPATH="$(dirname |
128 |
> $BASH_SOURCE[0])/pym:${PYTHONPATH:+:}${PYTHONPATH}" |
129 |
> |
130 |
> The double PYTHONPATH=PYTHONPATH= can't be right. |
131 |
|
132 |
You are probably right. However: |
133 |
|
134 |
1. Since bin/ scripts are setting PYTHONPATH appropriately, you should |
135 |
probably update all of them as well to work out-of-the-box. |
136 |
|
137 |
2. Don't do bash hackery and guessing. Take correct paths directly out |
138 |
of Python modules (distutils should provide necessary means to get |
139 |
the defaults). |
140 |
|
141 |
3. Remember that something.cfg can be used to override those paths. |
142 |
Though if you use distutils well enough, it should be able to figure |
143 |
that out for oyu. |
144 |
|
145 |
> >> + libc_fn = find_library("c") |
146 |
> >> + if libc_fn is None: |
147 |
> >> + return None |
148 |
> >> + libc = LoadLibrary(libc_fn) |
149 |
> >> + if libc is None: |
150 |
> >> + return None |
151 |
> >> + libc_tolower = libc.tolower |
152 |
> >> + libc_toupper = libc.toupper |
153 |
> >> |
154 |
> >> lc = list(range(ord('a'), ord('z')+1)) |
155 |
> >> uc = list(range(ord('A'), ord('Z')+1)) |
156 |
> >> - rlc = [libc.tolower(c) for c in uc] |
157 |
> >> - ruc = [libc.toupper(c) for c in lc] |
158 |
> >> + rlc = [libc_tolower(c) for c in uc] |
159 |
> >> + ruc = [libc_toupper(c) for c in lc] |
160 |
> >> |
161 |
> >> if lc != rlc or uc != ruc: |
162 |
> >> if silent: |
163 |
> >> @@ -62,7 +71,10 @@ def _check_locale(silent): |
164 |
> >> "as LC_CTYPE in make.conf.") |
165 |
> >> msg = [l for l in textwrap.wrap(msg, 70)] |
166 |
> >> msg.append("") |
167 |
> >> - chars = lambda l: ''.join(chr(x) for x in l) |
168 |
> >> + if sys.version_info.major >= 3: |
169 |
> > |
170 |
> > Portage uses hexversion for comparisons. Please be consistent. |
171 |
> |
172 |
> I see that. Yuck, why use sys.hexversion? version_info is easier to |
173 |
> read. Anyhow, I'll change it because consistency is important. |
174 |
|
175 |
1. I think old versions of Python did not support named properties in |
176 |
sys.version_info, back when Portage was written. |
177 |
|
178 |
2. hexversion makes it easy to quickly check for specific versions like |
179 |
3.2 (yep, Portage does that in some places because Python upstream |
180 |
can't keep API sane). |
181 |
|
182 |
> >> + chars = lambda l: ''.join(chr(x) for x in l) |
183 |
> >> + else: |
184 |
> >> + chars = lambda l: ''.join(chr(x).decode('utf-8', 'replace') for x in l) |
185 |
> > |
186 |
> > This looks like an unrelated change. Was the original code buggy? If |
187 |
> > this is the case, then please fix it in a separate commit. |
188 |
> |
189 |
> Yes the original code is buggy. Either via the module or without, if x |
190 |
> is outside the ascii range, chr(x) fails to convert it properly and |
191 |
> throws an error. This code was not tested: |
192 |
> |
193 |
> Traceback (most recent call last): |
194 |
> File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line |
195 |
> 124, in check_locale |
196 |
> ret = _check_locale(silent) |
197 |
> File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line |
198 |
> 81, in _check_locale |
199 |
> " %s -> %s" % (chars(lc), chars(ruc)), |
200 |
> File "/usr/lib64/python2.7/site-packages/portage/util/locale.py", line |
201 |
> 74, in <lambda> |
202 |
> chars = lambda l: ''.join(chr(x) for x in l) |
203 |
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xdd in position 0: |
204 |
> ordinal not in range(128) |
205 |
> |
206 |
> I'll separate out that commit. |
207 |
|
208 |
Thanks. Also please try if you can get a single code path for py2/py3. |
209 |
You may want to take a look at portage.util for decoding/encoding |
210 |
wrappers. |
211 |
|
212 |
> >> if uc != ruc: |
213 |
> >> msg.extend([ |
214 |
> >> " %s -> %s" % (chars(lc), chars(ruc)), |
215 |
> >> diff --git a/setup.py b/setup.py |
216 |
> >> index 25429bc..8b6b408 100755 |
217 |
> >> --- a/setup.py |
218 |
> >> +++ b/setup.py |
219 |
> >> @@ -47,7 +47,11 @@ x_scripts = { |
220 |
> >> # Dictionary custom modules written in C/C++ here. The structure is |
221 |
> >> # key = module name |
222 |
> >> # value = list of C/C++ source code, path relative to top source directory |
223 |
> >> -x_c_helpers = {} |
224 |
> >> +x_c_helpers = { |
225 |
> >> + 'portage_c_convert_case' : [ |
226 |
> >> + 'src/portage_c_convert_case.c', |
227 |
> >> + ], |
228 |
> >> +} |
229 |
> >> |
230 |
> >> class x_build(build): |
231 |
> >> """ Build command with extra build_man call. """ |
232 |
> >> diff --git a/src/portage_c_convert_case.c b/src/portage_c_convert_case.c |
233 |
> >> new file mode 100644 |
234 |
> >> index 0000000..f60b0c2 |
235 |
> >> --- /dev/null |
236 |
> >> +++ b/src/portage_c_convert_case.c |
237 |
> >> @@ -0,0 +1,94 @@ |
238 |
> >> +/* Copyright 2005-2016 Gentoo Foundation |
239 |
> >> + * Distributed under the terms of the GNU General Public License v2 |
240 |
> >> + */ |
241 |
> >> + |
242 |
> >> +#include <Python.h> |
243 |
> >> +#include <ctype.h> |
244 |
> >> + |
245 |
> >> +static PyObject * portage_c_tolower(PyObject *, PyObject *); |
246 |
> >> +static PyObject * portage_c_toupper(PyObject *, PyObject *); |
247 |
> >> + |
248 |
> >> +static PyMethodDef ConvertCaseMethods[] = { |
249 |
> >> + {"_c_tolower", portage_c_tolower, METH_VARARGS, "Convert to lower case using system locale."}, |
250 |
> >> + {"_c_toupper", portage_c_toupper, METH_VARARGS, "Convert to upper case using system locale."}, |
251 |
> >> + {NULL, NULL, 0, NULL} |
252 |
> > |
253 |
> > You should include stdlib.h or stdio.h for NULL. |
254 |
> |
255 |
> yep. |
256 |
> |
257 |
> > |
258 |
> >> +}; |
259 |
> >> + |
260 |
> >> +#if PY_MAJOR_VERSION >= 3 |
261 |
> >> +static struct PyModuleDef moduledef = { |
262 |
> >> + PyModuleDef_HEAD_INIT, |
263 |
> >> + "portage_c_convert_case", /* m_name */ |
264 |
> >> + "Module for converting case using the system locale", /* m_doc */ |
265 |
> >> + -1, /* m_size */ |
266 |
> >> + ConvertCaseMethods, /* m_methods */ |
267 |
> >> + NULL, /* m_reload */ |
268 |
> >> + NULL, /* m_traverse */ |
269 |
> >> + NULL, /* m_clear */ |
270 |
> >> + NULL, /* m_free */ |
271 |
> >> +}; |
272 |
> >> +#endif |
273 |
> >> + |
274 |
> >> +static PyObject *ConvertCaseError; |
275 |
> >> + |
276 |
> >> +PyMODINIT_FUNC |
277 |
> >> +#if PY_MAJOR_VERSION >= 3 |
278 |
> >> +PyInit_portage_c_convert_case(void) |
279 |
> >> +#else |
280 |
> >> +initportage_c_convert_case(void) |
281 |
> >> +#endif |
282 |
> >> +{ |
283 |
> >> + PyObject *m; |
284 |
> >> + |
285 |
> >> +#if PY_MAJOR_VERSION >= 3 |
286 |
> >> + m = PyModule_Create(&moduledef); |
287 |
> >> +#else |
288 |
> >> + m = Py_InitModule("portage_c_convert_case", ConvertCaseMethods); |
289 |
> >> +#endif |
290 |
> >> + |
291 |
> >> + if (m == NULL) |
292 |
> >> +#if PY_MAJOR_VERSION >= 3 |
293 |
> >> + return NULL; |
294 |
> >> +#else |
295 |
> >> + return; |
296 |
> >> +#endif |
297 |
> >> + |
298 |
> >> + ConvertCaseError = PyErr_NewException("portage_c_convert_case.ConvertCaseError", NULL, NULL); |
299 |
> >> + Py_INCREF(ConvertCaseError); |
300 |
> >> + PyModule_AddObject(m, "ConvertCaseError", ConvertCaseError); |
301 |
> >> + |
302 |
> >> +#if PY_MAJOR_VERSION >= 3 |
303 |
> >> + return m; |
304 |
> >> +#else |
305 |
> >> + return; |
306 |
> >> +#endif |
307 |
> >> +} |
308 |
> > |
309 |
> > To be honest, I think we'd be happier having two big #ifdefs for init |
310 |
> > funcs rather than one function with a lot of #ifdefs, and the common |
311 |
> > ConvertCaseError part in a separate static function. |
312 |
> |
313 |
> I can move things around and reduce the #ifdefs, especially if we drop |
314 |
> the custom exception. |
315 |
|
316 |
Thanks. |
317 |
|
318 |
> >> + |
319 |
> >> + |
320 |
> >> +static PyObject * |
321 |
> >> +portage_c_tolower(PyObject *self, PyObject *args) |
322 |
> >> +{ |
323 |
> >> + int c; |
324 |
> >> + |
325 |
> >> + if (!PyArg_ParseTuple(args, "i", &c)) |
326 |
> >> + { |
327 |
> >> + PyErr_SetString(ConvertCaseError, "_c_tolower: PyArg_ParseTuple failed"); |
328 |
> > |
329 |
> > From PyArg_ParseTuple() [1]: |
330 |
> > |
331 |
> > | on failure, it returns false and raises the appropriate exception. |
332 |
> > |
333 |
> > So I don't think you need or should use a custom exception here. |
334 |
> > |
335 |
> > [1]:https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple |
336 |
> |
337 |
> I am aware, but then you loose the specificity of the error, whether it |
338 |
> occurred in _c_tolower or _c_toupper. Its small, so I'll remove it |
339 |
> because it also cleans up the #ifdef spaghetti, but if we were to expand |
340 |
> this module for other libc functions then we should consider re-adding it. |
341 |
|
342 |
Are you really sure about that? I'd rather see what kind of wrong |
343 |
argument has been passed rather than 'PyArg_ParseTuple failed' which |
344 |
tells me nothing. |
345 |
|
346 |
> >> + return NULL; |
347 |
> >> + } |
348 |
> >> + |
349 |
> >> + return Py_BuildValue("i", tolower(c)); |
350 |
> >> +} |
351 |
> >> + |
352 |
> >> + |
353 |
> >> +static PyObject * |
354 |
> >> +portage_c_toupper(PyObject *self, PyObject *args) |
355 |
> >> +{ |
356 |
> >> + int c; |
357 |
> >> + |
358 |
> >> + if (!PyArg_ParseTuple(args, "i", &c)) |
359 |
> >> + { |
360 |
> >> + PyErr_SetString(ConvertCaseError, "_c_toupper: PyArg_ParseTuple failed"); |
361 |
> >> + return NULL; |
362 |
> >> + } |
363 |
> >> + |
364 |
> >> + return Py_BuildValue("i", toupper(c)); |
365 |
> >> +} |
366 |
> > |
367 |
> > Thanks a lot for your effort. This is really getting in shape. |
368 |
> |
369 |
> I'm doing this because check_locale() broke portage on musl. I'm not a |
370 |
> frequent contributor to portage so I'd ask the portage team to free free |
371 |
> style it up for me. vapier does that to my code all the time :) |
372 |
|
373 |
-- |
374 |
Best regards, |
375 |
Michał Górny |
376 |
<http://dev.gentoo.org/~mgorny/> |