Gentoo Archives: gentoo-dev

From: Kerin Millar <kerframil@×××××.com>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: LANG=en_GB.UTF-8 by default
Date: Sun, 19 Feb 2012 19:15:16
Message-Id: jhrhm9$icn$1@dough.gmane.org
In Reply to: Re: [gentoo-dev] Re: LANG=en_GB.UTF-8 by default by James Cloos
On 19/02/2012 01:00, James Cloos wrote:
>>>>>> "KM" == Kerin Millar <kerframil@×××××.com> writes:
>
> KM> Arch also used to define LC_COLLATE="C" by default, probably to
> KM> mitigate unpredictable behaviour in some applications, but have
> KM> since dropped this additional variable so they must have deemed it
> KM> no longer necessary.
>
> Without LC_COLLATE="C" things like [a-z]* get a false-positive match
> on files like Makefile.

Indeed, character classes are a potential minefield. Incidentally, I
just tested Ubuntu and Arch with only LANG set to a UTF-8 locale:

$ echo Makefile | sed -re 's/[a-z]//g' # collation rules ignored
M

$ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored
akefile

In neither case are the collation rules being obeyed. In Gentoo, however:

$ echo Makefile | sed -re 's/[a-z]//g' # collation rules obeyed, "M" is deleted too

$ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored
akefile

Obeying the collation rules is ostensibly the correct thing to do but,
until everyone starts using named character classes (which will never
happen), it's not safe. The thing that worries me here is the
inconsistency in Gentoo: sed obeys the collation rules while grep does
not. LC_COLLATE="C" is sufficient to work around the issue but the above
makes me wonder why we still need it when Ubuntu and Arch evidently do
not.
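For illustration, here is a minimal sketch of the two ways to avoid the problem discussed above: pinning the locale to C for a single command, and using a named character class in place of the range. The function names strip_range_c and strip_named are hypothetical, and the sketch uses LC_ALL=C rather than LC_COLLATE="C" only because LC_ALL takes precedence over the other locale variables if it happens to be set in the environment.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of any package.

# Delete lowercase letters using the range [a-z], with the locale pinned
# to C for this one sed invocation so the range covers only ASCII a..z.
strip_range_c() {
    printf '%s\n' "$1" | LC_ALL=C sed -E 's/[a-z]//g'
}

# Delete lowercase letters using the POSIX named class [[:lower:]],
# which does not depend on the collation order of the current locale.
strip_named() {
    printf '%s\n' "$1" | sed -E 's/[[:lower:]]//g'
}

strip_range_c Makefile   # prints "M"
strip_named Makefile     # prints "M"
```

Setting the variable on the command line of a single invocation leaves the rest of the session's locale untouched, which may be preferable to exporting LC_COLLATE="C" globally.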

--Kerin