Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: Arfrever Frehtes Taifersar Arahesis <Arfrever@g.o>
Cc: gentoo-dev@l.g.o, qa@g.o
Subject: Re: [gentoo-dev] Locale check in python_pkg_setup()
Date: Fri, 30 Jul 2010 02:38:50
Message-Id: 20100730023622.GB15031@hrair
In Reply to: [gentoo-dev] Locale check in python_pkg_setup() by Arfrever Frehtes Taifersar Arahesis
On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> --- python.eclass
> +++ python.eclass
> @@ -355,6 +355,8 @@
>   # Check if phase is pkg_setup().
>   [[ "${EBUILD_PHASE}" != "setup" ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>
> + local locale
> +
>   if [[ "$#" -ne 0 ]]; then
>     die "${FUNCNAME}() does not accept arguments"
>   fi
> @@ -407,6 +409,16 @@
>     unset -f python_pkg_setup_check_USE_flags
>   fi
>
> + locale="$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')"
You're using python to get the exported env. Don't. Use bash (you're invoking python from freaking bash after all)...
> + if [[ "${locale}" != *.UTF-8 ]]; then
> +   eerror
> +   eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> +   eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> +   eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> +   eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> +   eerror
For cases such as this, ewarn, not eerror. It's not an actual error, it's a potential source of problems people may see.

The more I look into this issue, the more I'm convinced it's not user settings that are the problem: the problem is in the code, not in the user env.

You've stated in a couple of places that "C/Posix locales are not supported", which frankly is very whacked. That's not really a proclamation you can make on your own for python, and you're actually ignoring that this problem would just as easily rear its head with a latin-1 encoded file.

Take a look at bug 302425; the traceback in that is a classic example of where they *should* be using bytes mode (they don't need to interpret the data, just write the script across, thus bytes).

bug 328047 is induced by a patch we add (it's not in upstream python). The code in question also is invoking fricking ldd a few steps prior, which is questionable in multiple ways. Either way, the relevant chunk is

+ os.system("ldd %s > %s" % (do_readline, tmpfile))
+ fp = open(tmpfile)
+ for ln in fp:

So... roughly, it invokes os.system, which will pass the environment straight through to it, meaning the locale gets passed down. Then it opens the file. Note it specifies *NO ENCODING*, nor is there actually an enforced locale best I can tell, thus ascii being the default. The screwup here is in our patches: said patches should be forcing the posix locale for the ldd call (resulting in ascii). If you think through this bug, we've seen this multiple times in grep/sed calls; this is literally no different.

bug 287439 is a screw up in the program's source... should've been using bytes (non arguable). Matter of fact, while generally I think Tarek knows what the hell he's doing, the skip they added to the tests ignored an actual valid bug in setuptools/distribute: shebangs from the standpoint of the kernel need to be consistent.
Thus reading the shebang line itself should be done in bytes, then converted to ascii and interpreted. They tried opening the file (in whole) in bytes, meaning they tried enforcing ascii across the whole buffer, not just the first line. Program bug.

These bugs I got via searching for 'ALL python locale', and identifying the ones that were actually locale related. I've at this point looked into the source of 3 bugs, meaning literally, 3 bugs checked into, 3 instances where the code was wrong. I'll leave it as an exercise for others to keep digging, but the point here is that the programs themselves screw up their locale handling. Trying to force all systems to use a utf-8 locale for the env is just a hack instead of fixing the actual issue. A pretty bad hack, I might add, considering I've spent all of 30 minutes digging into this and rooting out the actual flaws in the src.

For shits and giggles, let's add one more bug in, one that has the potential of rearing its head in random consuming pkgs: bug 322425 (docutils's build_html being flawed). Their encoding handling is intrinsically flawed. The encoding of a file they're installing/parsing should be determined by the file itself, not by attempting to arbitrarily force it to whatever locale the user happens to be running (which is exactly the first thing buildhtml.py attempts, literally `locale.setlocale(locale.LC_ALL, '')` at line 20).

The issue is not people using ascii locales, the issue is that these tools do not handle encoding correctly. Recall, one of the purposes of py3k going bytes vs text (aka unicode) was to make clear that textual data's encoding needs to be known. All of this code isn't actually forcing/handling the encoding for the data it deals in, meaning these are literal bugs, exposed purely due to py3k actually enforcing encoding in normal file opens.

So... this is a big -1 on adding such a warning (especially considering it doesn't actually resolve the raw issues, it just sidesteps a couple of cases).
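
To make the shebang point concrete, here is a rough sketch of the bytes-first approach. It is not the setuptools/distribute code; `read_shebang` and `copy_script` are made-up names for illustration, and treating the shebang line as ASCII is an assumption of the sketch.

```python
def read_shebang(path):
    """Return the interpreter part of a script's shebang as text, or None.

    Hypothetical helper: only the first line is ever decoded, so the
    encoding of the rest of the file (and the user's locale) is irrelevant.
    """
    with open(path, "rb") as fp:  # bytes mode, no implicit decoding
        first_line = fp.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Assumption: the shebang line itself is plain ASCII.
    return first_line[2:].strip().decode("ascii")


def copy_script(src, dst):
    """Hypothetical helper: write a script across verbatim, never decoding it."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())
```

With this shape, a script body containing latin-1 or UTF-8 bytes never reaches a decoder, so the result does not depend on whether the build runs under a C, latin-1 or UTF-8 locale.
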
Fix the actual problem instead...

Finally, cc'ing QA since this is a class of bugs they should be aware of with py3k. This is a bit of a sign that a lot of source isn't really py3k ready yet either imo, but so it goes...

~harring
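
For reference, the "force the posix locale for the ldd call" fix suggested for bug 328047 could look roughly like the sketch below. This is not the patch Gentoo carries; `run_in_posix_locale` is a hypothetical name, and the sketch assumes a Python with `subprocess.run` (3.5 or later).

```python
import os
import subprocess


def run_in_posix_locale(argv):
    """Run a command with LC_ALL=C and return its raw stdout as bytes.

    Hypothetical helper: the child gets a copy of the environment with
    the locale pinned, so its messages do not depend on the user's settings.
    """
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # LC_ALL takes precedence over LANG and the other LC_* variables
    result = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    # Returned as bytes; the caller decides how (and whether) to decode.
    return result.stdout
```

A caller would use something like `run_in_posix_locale(["ldd", path])` and split the returned bytes into lines. Compared with the `os.system` chunk quoted earlier, this also avoids the shell and the temporary file, and nothing is decoded with a locale-dependent default.
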

Replies

Subject Author
Re: [gentoo-dev] Locale check in python_pkg_setup() Arfrever Frehtes Taifersar Arahesis <Arfrever@g.o>