Gentoo Archives: gentoo-user

From: Kerin Millar <kerframil@×××××××××××.uk>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8
Date: Tue, 06 Aug 2013 14:53:23
Message-Id: 52010DD0.70709@fastmail.co.uk
In Reply to: Re: [gentoo-user] export LC_CTYPE=en_US.UTF-8 by Bruce Hill
1 On 06/08/2013 15:26, Bruce Hill wrote:
2 > On Tue, Aug 06, 2013 at 02:40:04PM +0100, Kerin Millar wrote:
3 >>
4 >> Apparently, "utf8" is the canonical representation in glibc (which
5 >> provides the locale tool):
6 >>
7 >> http://lists.debian.org/debian-glibc/2004/12/msg00028.html
8 >>
9 >> That eselect enumerates the locale twice when the alternate form is
10 >> specified in /etc/env.d/02locale could be considered as a minor bug.
11 >>
12 >> --Kerin
13 >
14 > RFC 3629 does not mention utf8, but I did see this notation in Wikipedia, and
15 > yes, I understand that's not official:
16 >
17 > Other descriptions that omit the hyphen or replace it with a space, such as
18 > "utf8" or "UTF 8", are not accepted as correct by the governing standards.[14]
19 > Despite this, most agents such as browsers can understand them, and so
20 > standards intended to describe existing practice (such as HTML5) may
21 > effectively require their recognition.
22 >
23 > [14] http://www.ietf.org/rfc/rfc3629.txt
24
25 Internally, glibc may use whatever representation it pleases.
26
27 > I was only mildly curious seeing utf8 show up, because on numerous occasions
28 > in #gentoo on FreeNode there have been different reports of incorrect
29 > characters displayed with utf8, then fixed with UTF-8. Having read RFC 3629, I
30 > just made it a habit to always use the standard (UTF-8).
31
32 That is probably due to buggy applications. According to a glibc maintainer,
33 they should call the nl_langinfo() function to determine the encoding, but some
34 try to parse the locale name itself. Both of these commands produce the same output:
35
36 # LC_ALL=en_US.UTF-8 locale -k LC_CTYPE | grep charmap
37 # LC_ALL=en_US.utf8 locale -k LC_CTYPE | grep charmap
38
39 Ergo, applications that use the correct interface will be informed that
40 the character encoding is "UTF-8", irrespective of the format of the
41 locale name.
42
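To make the difference between the two approaches concrete, here is a rough sketch in Python, whose locale module wraps setlocale() and nl_langinfo() on Unix-like systems. The helper names name_suffix() and charmap() are made up for illustration, and the sketch assumes that the locale being queried may not be installed, in which case charmap() returns None rather than guessing.

```python
import locale


def name_suffix(locale_name):
    """The fragile approach: read the encoding out of the locale name."""
    _, _, suffix = locale_name.partition(".")
    return suffix


def charmap(locale_name):
    """The robust approach: activate the locale and ask nl_langinfo().

    Returns None if the locale cannot be selected on this system.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "en_US.utf8"):
        print(name, "suffix:", name_suffix(name), "charmap:", charmap(name))
```

The suffix differs with the spelling of the locale name ("UTF-8" versus "utf8"), which is where an application that compares it against a fixed string can go wrong. The value reported through nl_langinfo() does not depend on how the name was spelled, provided the locale is available.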
43 Given the above, sticking to the "<lang>_<territory>.UTF-8" format seems
44 wise.
45
46 >
47 > Having read the remainder of the Debian ML thread you referenced, I have a
48 > headache. Debian did that to me when I used it for ~3 months in 2003. :-)
49 >
50 > Cheers,
51 > Bruce
52 >