On 06/08/2013 15:26, Bruce Hill wrote:
> On Tue, Aug 06, 2013 at 02:40:04PM +0100, Kerin Millar wrote:
>>
>> Apparently, "utf8" is the canonical representation in glibc (which
>> provides the locale tool):
>>
>> http://lists.debian.org/debian-glibc/2004/12/msg00028.html
>>
>> That eselect enumerates the locale twice when the alternate form is
>> specified in /etc/env.d/02locale could be considered as a minor bug.
>>
>> --Kerin
>
> RFC 3629 does not mention utf8, but I did see this notation in Wikipedia, and
> yes, I understand that's not official:
>
> Other descriptions that omit the hyphen or replace it with a space, such as
> "utf8" or "UTF 8", are not accepted as correct by the governing standards.[14]
> Despite this, most agents such as browsers can understand them, and so
> standards intended to describe existing practice (such as HTML5) may
> effectively require their recognition.
>
> [14] http://www.ietf.org/rfc/rfc3629.txt

Internally, glibc may use whatever representation it pleases.

> I was only mildly curious seeing utf8 show up, because on numerous occasions
> in #gentoo on FreeNode there have been different reports of incorrect
> characters displayed with utf8, then fixed with UTF-8. Having read RFC 3629, I
> just made it a habit to always use the standard (UTF-8).

Probably due to buggy applications. According to a glibc maintainer,
applications should be using the nl_langinfo() function, but some try to
parse the locale name itself. Both of these commands produce the same
output:

# LC_ALL=en_US.UTF-8 locale -k LC_CTYPE | grep charmap
# LC_ALL=en_US.utf8 locale -k LC_CTYPE | grep charmap

Ergo, applications that use the correct interface will be informed that
the character encoding is "UTF-8", irrespective of the format of the
locale name.
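To illustrate the difference, here is a minimal Python sketch (Python's
locale module wraps nl_langinfo()). The helper names codeset_from_name and
codeset_from_langinfo are made up for this example: the first mimics the
fragile approach of reading the locale name, the second asks the C library.
It assumes a Unix-like system, and the locales named at the bottom must have
been generated for the second helper to return anything.

```python
import locale


def codeset_from_name(locale_name):
    """Fragile approach: guess the encoding from the text after the dot."""
    _, _, suffix = locale_name.partition(".")
    # Drop an optional modifier such as "@euro".
    return suffix.partition("@")[0]


def codeset_from_langinfo(locale_name):
    """Ask the C library which character encoding the locale uses.

    Returns None if the locale cannot be selected on this system.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore what was there


if __name__ == "__main__":
    for name in ("en_US.UTF-8", "en_US.utf8"):
        print(name,
              "| from name:", codeset_from_name(name),
              "| from nl_langinfo:", codeset_from_langinfo(name))
```

The name-based helper yields "UTF-8" for one spelling and "utf8" for the
other, so an application that compares that string against "UTF-8" would
misbehave with en_US.utf8. The nl_langinfo() route should not depend on the
spelling.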

Given the above, sticking to the "<lang>_<territory>.UTF-8" format seems
wise.

>
> Having read the remainder of the Debian ML thread you referenced, I have a
> headache. Debian did that to me when I used it for ~3 months in 2003. :-)
>
> Cheers,
> Bruce
>