On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> --- python.eclass
> +++ python.eclass
> @@ -355,6 +355,8 @@
> # Check if phase is pkg_setup().
> [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>
> + local locale
> +
> if [[ "$#" -ne 0 ]]; then
> die "${FUNCNAME}() does not accept arguments"
> fi
> @@ -407,6 +409,16 @@
> unset -f python_pkg_setup_check_USE_flags
> fi
>
> + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
|
You're using python to get the exported env. Don't. Use bash (you're
invoking python from freaking bash after all)...
|
> + if [[ "${locale}" != *.UTF-8 ]]; then
> + eerror
> + eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> + eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> + eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> + eerror
|
For cases such as this, use ewarn, not eerror. It's not an actual
error; it's a potential source of problems people may see.
|
The more I look into this issue, the more I'm convinced it's not user
settings that are the problem; the problem is in the code, not in the
user env. You've stated in a couple of places that "C/Posix locales
are not supported", which frankly is very whacked. That's not really a
proclamation you can make on your own for python, and you're actually
ignoring that this problem would just as easily rear its head with a
latin-1 encoded file.
|
Take a look at bug 302425; the traceback in that is a classic example
of where they *should* be using bytes mode (they don't need to
interpret the data, just write the script across, thus bytes).
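
To make the bytes-mode point concrete, here is a minimal sketch of
writing a script across without ever decoding it. The name copy_script
is made up for illustration; it is not taken from the code in that bug.
The only claim is that "rb"/"wb" never touch an encoding, so the user's
locale can't matter.

```python
def copy_script(src, dst):
    """Copy a script verbatim: bytes in, bytes out, nothing decoded."""
    with open(src, "rb") as infile, open(dst, "wb") as outfile:
        outfile.write(infile.read())
```

A latin-1 encoded script goes through this unchanged under any locale,
since no decode step exists to fail.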
|
bug 328047 is induced by a patch we add (it's not in upstream python).
The code in question is also invoking fricking ldd a few steps prior,
which is questionable in multiple ways; either way, the relevant chunk
is:
+ os.system("ldd %s > %s" % (do_readline, tmpfile))
+ fp = open(tmpfile)
+ for ln in fp:
|
So... roughly, it invokes os.system, which will pass the environment
straight through to ldd, meaning the locale gets passed down.
|
Then it opens the file. Note it specifies *NO ENCODING*, nor is there
actually an enforced locale best I can tell, thus ascii being the
default. The screwup here is in our patches: said patches should be
forcing posix locale for the ldd call (resulting in ascii). If you
think through this bug, we've seen this multiple times in grep/sed
calls; this is literally no different.
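
Roughly what "forcing posix locale for the call" could look like, as a
sketch only. The helper name run_in_c_locale is hypothetical, and it
uses subprocess.run from current Python 3 rather than the os.system
call in our patch; treat it as an illustration of the idea, not as the
fix for the patch itself.

```python
import os
import subprocess


def run_in_c_locale(argv):
    """Run argv with LC_ALL=C and return its stdout decoded as ASCII."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # overrides LANG and every other LC_* for the child
    raw = subprocess.run(
        argv, env=env, stdout=subprocess.PIPE, check=True
    ).stdout
    # Decode explicitly instead of inheriting whatever the caller's locale is.
    return raw.decode("ascii", errors="replace")
```

Something like run_in_c_locale(["ldd", path_to_module]) would then hand
back text whose encoding is known, with path_to_module standing in for
whatever do_readline points at.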
|
bug 287439 is a screw up in the program's source... should've been
using bytes (not arguable). Matter of fact, while generally I think
Tarek knows what the hell he's doing, the skip they added to the
tests ignored an actual valid bug in setuptools/distribute: shebangs
from the standpoint of the kernel need to be consistent. Thus reading
the shebang line itself should be done in bytes, then converted to
ascii and interpreted. They tried opening the file (in whole) in text
mode, meaning they tried enforcing ascii across the whole buffer, not
just the first line. Program bug.
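
A sketch of what I mean by reading the shebang in bytes and only then
converting that one line. read_shebang is a made-up name, not the
setuptools/distribute function, and returning None for "no shebang" is
my own choice here.

```python
def read_shebang(path):
    """Return the interpreter line of a script as ASCII text, or None."""
    with open(path, "rb") as script:
        first_line = script.readline()  # bytes; nothing decoded yet
    if not first_line.startswith(b"#!"):
        return None
    # Only this one line is held to ASCII; the rest of the file is
    # never decoded, so its encoding is irrelevant here.
    return first_line[2:].strip().decode("ascii")
```

A latin-1 body after the first line no longer matters, since it is
never decoded.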
|
I got these bugs by searching for 'ALL python locale' and picking out
the ones that were actually locale related. At this point I've looked
into the source of 3 bugs, meaning literally 3 bugs checked into, 3
instances where the code was wrong.
|
I'll leave it as an exercise for others to keep digging, but the point
here is that the programs themselves screw up their locale handling;
trying to force all systems to use a utf-8 locale for the env is just
a hack instead of fixing the actual issue. A pretty bad hack, I might
add, considering I've spent all of 30 minutes digging into this and
rooting out the actual flaws in the src.
|
For shits and giggles, let's add one more bug in, one that has the
potential of rearing its head in random consuming pkgs: bug 322425
(docutils's build_html being flawed). Their encoding handling is
intrinsically flawed. The encoding of a file they're
installing/parsing should be determined by the file itself, not by
attempting to arbitrarily force it to whatever locale the user happens
to be running (which is exactly the first thing buildhtml.py attempts,
literally `locale.setlocale(locale.LC_ALL, '')` at line 20). The
issue is not people using ascii locales, the issue is that these tools
do not handle encoding correctly.
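
For illustration, one way "determined by the file itself" could look.
Everything here is an assumption on my part: the helper name read_text,
the "coding: name" marker in the first two lines (borrowed from the
Emacs/PEP 263 style), and the utf-8 fallback. It is not what
buildhtml.py does; the point is only that the user's locale is never
consulted.

```python
import codecs
import re

# Hypothetical convention: a "coding: name" marker somewhere in the
# first two lines of the file.
_CODING = re.compile(rb"coding[:=]\s*([-\w.]+)")


def read_text(path, fallback="utf-8"):
    """Decode a file using what the file declares, never the locale."""
    with open(path, "rb") as source:
        data = source.read()
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    for line in data.splitlines()[:2]:
        match = _CODING.search(line)
        if match:
            return data.decode(match.group(1).decode("ascii"))
    return data.decode(fallback)
```

With this, a latin-1 file that says so decodes the same way under a
POSIX locale as under a utf-8 one.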
|
Recall, one of the purposes of py3k going bytes vs text (aka unicode)
was to make clear that textual data's encoding needs to be known. None
of this code is actually forcing/handling the encoding for the data it
deals in, meaning these are literal bugs, exposed purely due to py3k
actually enforcing encoding in normal file opens.
|
So... this is a big -1 on adding such a warning (especially
considering it doesn't actually resolve the raw issues, it just
sidesteps a couple of cases).
|
Fix the actual problem instead...
|
Finally, cc'ing QA since this is a class of bugs they should be aware
of with py3k. This is a bit of a sign that a lot of source isn't
really py3k ready yet either imo, but so it goes...
|
~harring