Gentoo Archives: gentoo-dev

From:	Brian Harring <ferringb@×××××.com>
To:	Arfrever Frehtes Taifersar Arahesis <Arfrever@g.o>
Cc:	gentoo-dev@l.g.o, qa@g.o
Subject:	Re: [gentoo-dev] Locale check in python_pkg_setup()
Date:	Fri, 30 Jul 2010 02:38:50
Message-Id:	`20100730023622.GB15031@hrair`
In Reply to:	[gentoo-dev] Locale check in python_pkg_setup() by Arfrever Frehtes Taifersar Arahesis

1	On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
2	> --- python.eclass
3	> +++ python.eclass
4	> @@ -355,6 +355,8 @@
5	> # Check if phase is pkg_setup().
6	> [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
7	>
8	> + local locale
9	> +
10	> if [[ "$#" -ne 0 ]]; then
11	> die "${FUNCNAME}() does not accept arguments"
12	> fi
13	> @@ -407,6 +409,16 @@
14	> unset -f python_pkg_setup_check_USE_flags
15	> fi
16	>
17	> + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
18
19	You're using python to get the exported env. Don't. Use bash (you're
20	invoking python from freaking bash after all)...
21
22	> + if [[ "${locale}" != *.UTF-8 ]]; then
23	> + eerror
24	> + eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
25	> + eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
26	> + eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
27	> + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
28	> + eerror
29
30	For cases such as this, ewarn, not eerror. It's not an actual error,
31	it's a potential source of problems people may see.
32
33	The more I look into this issue, the more I'm convinced it's not user
34	settings that are the problem- the problem is in the code, not in the
35	user env. You've stated in a couple of places that "C/Posix locales are
36	not supported", which frankly is very whacked- that's not really a
37	proclamation you can make on your own for python, and you're actually
38	ignoring that this problem would just as easily rear its head with a
39	latin-1 encoded file.
40
41
42	Take a look at 302425; the traceback in that is a classic example of
43	where they should be using bytes mode (they don't need to interpret
44	the data, just write the script across, thus bytes).
45
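As a rough illustration of the bytes-mode point above (this is not the code from bug 302425; `copy_script_bytes` is a hypothetical helper name), copying a script without interpreting its contents could look something like this. Since nothing is ever decoded, the active locale cannot cause a UnicodeDecodeError:

```python
def copy_script_bytes(src, dst):
    """Copy a script verbatim; the contents are never decoded."""
    # "rb"/"wb" make Python hand the raw bytes straight through, so the
    # file's encoding (and the user's locale) never enters the picture.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())
```
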
46	Bug 328047 is induced by a patch we add (it's not in upstream python).
47	The code in question is also invoking fricking ldd a few steps prior,
48	which is questionable in multiple ways; either way, the relevant chunk is
49	+ os.system("ldd %s > %s" % (do_readline, tmpfile))
50	+ fp = open(tmpfile)
51	+ for ln in fp:
52
53	So... roughly, it invokes os.system, which will pass the environment
54	straight through to it, meaning locale gets passed down.
55
56	Then it opens the file. Note it specifies NO ENCODING, nor is there
57	actually an enforced locale best I can tell, thus ascii being the
58	default. The screwup here is in our patches- said patches should be
59	forcing posix locale for the ldd call (resulting in ascii). If you
60	think through this bug, we've seen this multiple times in grep/sed
61	calls- this is literally no different.
62
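A minimal sketch of what forcing the posix locale for an external call might look like, assuming it is acceptable to swap the os.system/tmpfile approach for subprocess (that swap is an assumption of this sketch, not something the patch does, and `run_in_posix_locale` is a hypothetical name). The child gets LC_ALL=C, and its output is decoded with an explicitly stated encoding rather than whatever the default happens to be:

```python
import os
import subprocess


def run_in_posix_locale(cmd):
    """Run cmd (a list of arguments) with LC_ALL=C and return its stdout as text."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"        # overrides LANG and every other LC_* variable
    env.pop("LANGUAGE", None)  # GNU gettext would otherwise still translate messages
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    # The encoding is stated explicitly; tool output in the C locale is
    # expected to be ascii, and anything else raises instead of guessing.
    return result.stdout.decode("ascii")
```

For the case discussed here, the call would be along the lines of `run_in_posix_locale(["ldd", do_readline])`.
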
63	bug 287439 is a screw up in the program's source... should've been
64	using bytes (non arguable). Matter of fact, while generally I think
65	Tarek knows what the hell he's doing, the skip they added to the
66	tests ignored an actual valid bug in setuptools/distribute- shebangs
67	from the standpoint of the kernel need to be consistent. Thus reading
68	the shebang line itself should be done in bytes, then converted to
69	ascii and interpreted- they tried opening the file (in whole) in
70	bytes, meaning they tried enforcing ascii across the whole buffer-
71	not just the first line. Program bug.
72
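Roughly, the shebang handling described above could be sketched as follows. This is an illustration only, not the setuptools/distribute code, and `read_shebang` is a hypothetical name. Only the first line is read and decoded; the rest of the file is never touched, so its encoding is irrelevant:

```python
def read_shebang(path):
    """Return the interpreter line of a script as text, or None if there is none."""
    with open(path, "rb") as f:
        first = f.readline()  # bytes; nothing past the first line is read
    if not first.startswith(b"#!"):
        return None
    # Only the shebang itself is decoded. A non-ascii shebang raises
    # UnicodeDecodeError here, which is a problem with that one line,
    # not with the remainder of the file.
    return first[2:].strip().decode("ascii")
```
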
73	These bugs I got via searching for 'ALL python locale', and
74	identifying the ones that were actually locale related. I've at this
75	point looked into the source of 3 bugs- meaning literally, 3 bugs
76	checked into, 3 instances where the code was wrong.
77
78	I'll leave it as an exercise for others to keep digging, but the point
79	here is that the programs themselves screw up their locale handling-
80	trying to force all systems to use a utf-8 locale for the env is just
81	a hack instead of fixing the actual issue. A pretty bad hack,
82	considering I've spent all of 30 minutes digging into this and rooting
83	out the actual flaws in the src, I might add.
84
85	For shits and giggles, let's add one more bug in- one that has the
86	potential of rearing its head in random consuming pkgs, bug 322425
87	(docutils's build_html being flawed). Their encoding handling is
88	intrinsically flawed. The encoding of a file they're
89	installing/parsing should be determined by the file itself- not by
90	attempting to arbitrarily force it to whatever locale the user happens
91	to be running (which is exactly the first thing buildhtml.py attempts,
92	literally `locale.setlocale(locale.LC_ALL, '')` at line 20). The
93	issue is not people using ascii locales, the issue is that these tools
94	do not handle encoding correctly.
95
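One way to read "determined by the file itself" is sketched below. This is not how docutils works; `decode_by_declaration` and its `declared` parameter are made up for illustration, and the caller is assumed to know the project's documented encoding. A BOM, when present, is treated as the file's own statement; otherwise the declared encoding is used. The user's locale is never consulted:

```python
import codecs


def decode_by_declaration(data, declared="utf-8"):
    """Decode file contents (bytes) using the file's own encoding information."""
    # A UTF-8 BOM is the file announcing its encoding; honor it and strip it.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # Otherwise fall back to what the caller says the file is encoded in,
    # never to locale.getpreferredencoding() or similar.
    return data.decode(declared)
```
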
96	Recall, one of the purposes of py3k going bytes vs text (aka unicode)
97	was to make clear that textual data's encoding need be known. None of
98	this code is actually forcing/handling the encoding for the data
99	it deals in- meaning these are literal bugs, exposed purely due to
100	py3k actually enforcing encoding in normal file opens.
101
102	So... this is a big -1 on adding such a warning (especially
103	considering it doesn't actually resolve the raw issues, it just
104	sidesteps a couple of cases).
105
106	Fix the actual problem instead...
107
108	Finally, cc'ing QA since this is a class of bugs they should be aware
109	of with py3k. This is a bit of a sign that a lot of source isn't
110	really py3k ready yet either imo, but so it goes...
111
112	~harring

Replies

Subject	Author
Re: [gentoo-dev] Locale check in python_pkg_setup()	Arfrever Frehtes Taifersar Arahesis <Arfrever@g.o>