Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: Arfrever Frehtes Taifersar Arahesis <Arfrever@g.o>
Cc: gentoo-dev@l.g.o, qa@g.o
Subject: Re: [gentoo-dev] Locale check in python_pkg_setup()
Date: Fri, 30 Jul 2010 02:38:50
Message-Id: 20100730023622.GB15031@hrair
In Reply to: [gentoo-dev] Locale check in python_pkg_setup() by Arfrever Frehtes Taifersar Arahesis
1 On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
2 > --- python.eclass
3 > +++ python.eclass
4 > @@ -355,6 +355,8 @@
5 > # Check if phase is pkg_setup().
6 > [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
7 >
8 > + local locale
9 > +
10 > if [[ "$#" -ne 0 ]]; then
11 > die "${FUNCNAME}() does not accept arguments"
12 > fi
13 > @@ -407,6 +409,16 @@
14 > unset -f python_pkg_setup_check_USE_flags
15 > fi
16 >
17 > + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
18
19 You're using python to get the exported env. Don't. Use bash (you're
20 invoking python from freaking bash after all)...
21
22 > + if [[ "${locale}" != *.UTF-8 ]]; then
23 > + eerror
24 > + eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
25 > + eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
26 > + eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
27 > + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
28 > + eerror
29
30 For cases such as this, ewarn, not eerror. It's not an actual error,
31 it's a potential source of problems people may see.
32
33 The more I look into this issue, the more I'm convinced it's not user
34 settings that are the problem- the problem is in the code, not in the
35 user env. You've stated in a couple of places that "C/Posix locales are
36 not supported", which frankly is very whacked- that's not really a
37 proclamation you can make on your own for python, and you're actually
38 ignoring that this problem would just as easily rear its head with a
39 latin-1 encoded file.
40
41
42 Take a look at bug 302425; the traceback in that is a classic example of
43 where they *should* be using bytes mode (they don't need to interpret
44 the data, just write the script across, thus bytes).
45
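A minimal sketch of the bytes-mode approach described above, assuming the tool only needs to write a script across unchanged. The helper name `copy_script` is hypothetical and is not taken from the code in bug 302425.

```python
def copy_script(src, dst):
    """Copy a script byte for byte.

    Nothing is decoded, so the result does not depend on the locale
    or on whatever encoding the script happens to use.
    """
    with open(src, "rb") as infile, open(dst, "wb") as outfile:
        outfile.write(infile.read())
```

Because both files are opened in binary mode, a latin-1 or UTF-8 encoded script passes through untouched even under LC_ALL=C.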
46 bug 328047 is induced by a patch we add (it's not in upstream python).
47 The code in question also is invoking fricking ldd a few steps prior
48 which is questionable in multiple ways: either way, relevant chunk is
49 + os.system("ldd %s > %s" % (do_readline, tmpfile))
50 + fp = open(tmpfile)
51 + for ln in fp:
52
53 So... roughly, it invokes os.system, which will pass the environment
54 straight through to it, meaning locale gets passed down.
55
56 Then it opens the file. Note it specifies *NO ENCODING*, nor is there
57 actually an enforced locale, best I can tell, thus ascii being the
58 default. The screwup here is in our patches- said patches should be
59 forcing posix locale for the ldd call (resulting in ascii). If you
60 think through this bug, we've seen this multiple times in grep/sed
61 calls- this is literally no different.
62
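A rough sketch of forcing the posix locale for an external call such as ldd, so that its output is predictable ascii. It swaps in `subprocess.check_output` for the patch's `os.system` plus temp file; the helper name `run_in_posix_locale` is hypothetical.

```python
import os
import subprocess


def run_in_posix_locale(argv):
    """Run argv with LC_ALL=C and return its stdout as text."""
    env = dict(os.environ)
    # LC_ALL takes precedence over LANG and the other LC_* variables.
    env["LC_ALL"] = "C"
    output = subprocess.check_output(argv, env=env)
    # Tool messages are ascii under the C locale; stray non-ascii bytes
    # (e.g. in path names) are replaced rather than raising.
    return output.decode("ascii", "replace")
```

Usage would look like `run_in_posix_locale(["ldd", path])`, where `path` is whatever file the caller wants inspected.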
63 bug 287439 is a screw up in the program's source... should've been
64 using bytes (non arguable). Matter of fact, while generally I think
65 Tarek knows what the hell he's doing, the skip they added to the
66 tests ignored an actual valid bug in setuptools/distribute- shebangs
67 from the standpoint of the kernel need to be consistent. Thus reading
68 the shebang line itself should be done in bytes, then converted to
69 ascii and interpreted- they tried opening the file (in whole) in
70 bytes, meaning they tried enforcing ascii across the whole buffer-
71 not just the first line. Program bug.
72
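A small sketch of the shebang handling described above: read only the first line in bytes, then decode that line alone as ascii. The function name `read_shebang` is hypothetical and is not the setuptools/distribute API.

```python
def read_shebang(stream):
    """Return the interpreter line from a binary stream, or None.

    Only the first line is decoded; the rest of the file may be in
    any encoding and is never touched.
    """
    first_line = stream.readline()
    if not first_line.startswith(b"#!"):
        return None
    # A non-ascii shebang raises UnicodeDecodeError here, which is
    # the one place where enforcing ascii is justified.
    return first_line[2:].strip().decode("ascii")
```

The stream is expected to be opened in binary mode, e.g. `open(path, "rb")`.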
73 These bugs I got via searching for 'ALL python locale', and
74 identifying the ones that were actually locale related. I've at this
75 point looked into the source of 3 bugs- meaning literally, 3 bugs
76 checked into, 3 instances where the code was wrong.
77
78 I'll leave it as an exercise for others to keep digging, but the point
79 here is that the programs themselves screw up their locale handling-
80 trying to force all systems to use a utf-8 locale for the env is just
81 a hack instead of fixing the actual issue. A pretty bad hack
82 considering I've spent all of 30 minutes digging into this and rooting
83 out the actual flaws in the src I might add.
84
85 For shits and giggles, let's add one more bug in- one that has the
86 potential of rearing its head in random consuming pkgs, bug 322425
87 (docutils's build_html being flawed), their encoding handling is
88 intrinsically flawed. The encoding of a file they're
89 installing/parsing should be determined by the file itself- not
90 attempting to arbitrarily force it to whatever locale the user happens
91 to be running (which is exactly the first thing buildhtml.py attempts,
92 literally `locale.setlocale(locale.LC_ALL, '')` at line 20). The
93 issue is not people using ascii locales, the issue is that these tools
94 do not handle encoding correctly.
95
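One way to let the file itself determine its encoding, sketched under the assumption that documents may carry an Emacs/PEP 263 style `coding:` declaration in their first two lines and are otherwise UTF-8. The names `declared_encoding` and `decode_document` are hypothetical and this is not how buildhtml.py is written.

```python
import codecs
import re

# Matches declarations such as "-*- coding: latin-1 -*-" or "coding=utf-8".
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def declared_encoding(data, default="utf-8"):
    """Return the encoding named in the first two lines, else default."""
    for line in data.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            name = match.group(1).decode("ascii")
            codecs.lookup(name)  # raises LookupError for unknown names
            return name
    return default


def decode_document(data, default="utf-8"):
    """Decode raw bytes using the file's own declaration, not the locale."""
    return data.decode(declared_encoding(data, default))
```

The user's locale never enters the picture here: the same bytes decode the same way under C, latin-1, or UTF-8 environments.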
96 Recall, one of the purposes of py3k going bytes vs text (aka unicode)
97 was to make clear that textual data's encoding need be known. All of
98 this code isn't actually forcing/handling the encoding for the data
99 they deal in- meaning these are literal bugs, exposed purely due to
100 py3k actually enforcing encoding in normal file opens.
101
102 So... this is a big -1 on adding such a warning (especially
103 considering it doesn't actually resolve the raw issues, it just
104 sidesteps a couple of cases).
105
106 Fix the actual problem instead...
107
108 Finally, cc'ing QA since this is a class of bugs they should be aware
109 of with py3k. This is a bit of a sign that a lot of source isn't
110 really py3k ready yet either imo, but so it goes...
111
112 ~harring