Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Locale check in python_pkg_setup()
Date: Fri, 30 Jul 2010 18:47:44
Message-Id: 20100730184518.GA32513@hrair
In Reply to: Re: [gentoo-dev] Locale check in python_pkg_setup() by "Paweł Hajdan
1 On Fri, Jul 30, 2010 at 09:49:21AM -0700, "Paweł Hajdan, Jr." wrote:
2 > On 7/29/10 8:48 PM, Brian Harring wrote:
3 > > It's basically annoying people into changing to partially
4 > > sidestep a couple of bugs, instead of fixing the issue- and that's the
5 > > wrong course of action.
6 >
7 > I think that with python earlier than python-3 unicode handling is quite
8 > complicated, and I'm not surprised there are problems with that.
9
10 Encoding handling wasn't that bad under py2k. Py3k just enforces the
11 bytes/text boundaries, meaning you can't just skid by.
12
13 > Arfrever, does python-3 have the same problem with non-UTF8 locales?
14
15 ascii is a subset of utf-8 and ascii is a subset of latin-1; latin-1
16 and utf-8 aren't compatible in encoded form however.
17
18 What this means is that the same set of bugs I ran down still will go
19 boom if you have a utf-8 locale and the code in question was dealing
20 w/ a latin-1 encoded file.
21
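To make the subset relationship above concrete, here is a minimal Python 3 sketch. The sample strings and the helper name `decodes_as_utf8` are just illustrations, not anything from the bugs in question. It shows that pure-ascii text has the same bytes in all three encodings, while a single non-ascii character encoded as latin-1 is not valid utf-8.

```python
# Sample text with one non-ascii character (e-acute, U+00E9).
text = "caf\u00e9"

latin1_bytes = text.encode("latin-1")  # one byte for the accented char
utf8_bytes = text.encode("utf-8")      # two bytes for the accented char

# Pure-ascii text encodes to identical bytes in all three encodings.
ascii_same = (
    "cafe".encode("ascii")
    == "cafe".encode("latin-1")
    == "cafe".encode("utf-8")
)


def decodes_as_utf8(data):
    """Return True if the byte string is valid utf-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Going the other way never raises, it just silently yields mojibake,
# since every byte value is a valid latin-1 character.
mojibake = utf8_bytes.decode("latin-1")
```

Here `decodes_as_utf8(latin1_bytes)` is False regardless of the locale in use, which is the 'boom' case: a utf-8 locale does nothing for code that is handed a latin-1 encoded file.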
22
23 > Another thing we can consider is making UTF8 the default setup in
24 > Gentoo. I think most people (including me) don't care whether it's C or
25 > UTF8 as long as it works.
26
27 "as long as it works" in this case means "fix the code" as I've laid
28 out. Forcing locales to sidestep it leaves the latin-1/utf8
29 incompatibility to go 'boom'.
30
31 Basically, forcing utf8 doesn't "make it work". It reduces the cases
32 where breakage will show up while leaving those issues still there.
33 Frankly this is worse: those screwups can't get fixed until they break
34 (for better or worse, and preferably breaking in a testcase). We've got 4
35 bugs, and only one of them is a semi-complex fix (docutils needs to
36 require that the html it's fed is utf8 compatible; valid enough req
37 anyways since html shouldn't be latin-1, it should be ascii or utf8).
38
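As a rough illustration of what "require utf8 compatible input" could look like, and of breaking in a testcase rather than in the field, here is a hedged Python 3 sketch. The helpers `read_utf8_strict` and `write_temp` are hypothetical names made up for this example; this is not docutils' API or the actual fix, only the general shape of validating at the boundary.

```python
import os
import tempfile


def read_utf8_strict(path):
    """Hypothetical helper: read a file and insist it is valid utf-8."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        # Decode explicitly instead of relying on the locale's encoding.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("%s is not utf-8 compatible: %s" % (path, exc))


def write_temp(data):
    """Hypothetical test helper: write bytes to a temp file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

A testcase would then feed it one utf-8 file and one latin-1 file and expect the second to raise `ValueError`, whatever locale the test runs under.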
39 So.. get fixing, instead of dodging the work imo. ;)
40
41 ~brian