Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o, Ulrich Mueller <ulm@g.o>
Subject: Re: [gentoo-dev] RFC: BCP 47 for L10N? (was: News item: LINGUAS USE_EXPAND renamed to L10N)
Date: Fri, 10 Jun 2016 10:27:03
Message-Id: 8F7A467A-C42B-401D-9857-2B79850BFD95@gentoo.org
In Reply to: [gentoo-dev] RFC: BCP 47 for L10N? (was: News item: LINGUAS USE_EXPAND renamed to L10N) by Ulrich Mueller
1 On 10 June 2016 11:29:41 CEST, Ulrich Mueller <ulm@g.o> wrote:
2 >>>>>> On Tue, 7 Jun 2016, Chí-Thanh Christopher Nguyễn wrote:
3 >
4 >>> 4. According to Gettext documentation, "'@VARIANT' can denote any
5 >>> kind of characteristics that is not already implied by the language
6 >>> LL and the country CC." (So IIUC the BCP-47 variant "valencia"
7 >>> would become "@valencia".)
8 >
9 >> This I think is wrong and collides with POSIX.
10 >> POSIX modifiers are not allowed for LANG or LC_ALL in
11 >> POSIX.1-2008[1]. Section 8.2 says you can have at most one modifier
12 >> field to "select a specific instance of localization data within a
13 >> single category", which I don't think applies because it is its own
14 >> locale, not an instance of an existing one. Furthermore (but that
15 >> doesn't apply in our use case), POSIX spec lists the example
16 >> LC_COLLATE=De_DE@dict
17 >> So what if you want Catalan Valencian with dictionary order? Or if
18 >> someone hypothetically came up with a different script?
19 >
20 >>> I haven't found any mention or usage of ISO 3166-2 region
21 >>> subdivisions in the context of locale. Can you provide any
22 >>> references for this?
23 >
24 >> As I wrote before, it is not used. But I think it is the only
25 >> spec-compliant way to marry POSIX locales with Catalan Valencian.
26 >> BCP-47 does it in a more natural way.
27 >
28 >So, trying to summarise: We cannot follow strict POSIX syntax, so our
29 >two choices are either to stick to Gettext LL_CC@VARIANT syntax or
30 >to change to BCP 47.
31 >
32 >Using BCP 47 would have some advantages:
33 >- It is a well defined standard [1] and tools for validation of
34 > language tags exist, e.g. [2].
35 >- The L10N USE_EXPAND could follow usual USE flag syntax, as BCP 47
36 > tags contain neither underscores (which are supposed to be reserved
37 > as USE_EXPAND separators) nor @ signs (which PMS explicitly
38 > mentions as an exception for LINGUAS).
39 >- Gettext's @VARIANT is ill-defined and conflates different
40 > characteristics like script and variant. There is no further
41 > subdivision within @VARIANT, which leads to locale names like
42 > sr@ijekavianlatin. Also different upstreams use different
43 > conventions, like @latin and @Latn for the latin script.
44 >- For the vast majority of languages, identifiers are either identical
45 > ("de" -> "de") or they can be converted by simple shell substitution
46 > ("pt-BR" -> "pt_BR").
47 >- IIUC, L10N is primarily intended to control things like additional
48 > language bundles of packages. Some upstreams like libreoffice
49 > already use BCP 47 for these.
50 >
51 >On the other hand, there will be some cost:
52 >- If BCP 47 tags containing a script or a variant should be used to
53 > generate LINGUAS, they will require explicit mapping. (OTOH, such
54 > mapping will also be needed if we stick to Gettext syntax but unify
55 > variants like "sr@latin" and "sr@Latn".)
56 >- Different syntax for LINGUAS and L10N might be confusing to users,
57 > so additional documentation will be needed.
58 >
59 >Comments?
60
61 I'd say BCP 47. The gettext tags aren't fully defined anyway, so sooner or later we'd have to pick one upstream's convention and map the others to it.
62
63 Also, if it makes mapping L10N to LINGUAS harder, it will discourage people from abusing the latter.
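
To make the mapping cost concrete, here is a rough sketch of what an L10N to LINGUAS conversion might look like. It is only an illustration, not anything that exists in an eclass: the function name l10n_to_linguas and the EXPLICIT_MAP table are hypothetical, and the two table entries are examples taken from the names mentioned in this thread. Plain language tags pass through unchanged, language-region tags get the hyphen replaced by an underscore, and anything carrying a script or variant subtag must be listed explicitly.

```python
# Hypothetical table for tags that carry a script or variant subtag.
# The entries are illustrative; gettext names differ between upstreams.
EXPLICIT_MAP = {
    "sr-Latn": "sr@latin",
    "ca-valencia": "ca@valencia",
}


def l10n_to_linguas(tag: str) -> str:
    """Map a BCP 47 tag (L10N style) to a gettext-style locale name."""
    # Script and variant tags cannot be derived mechanically.
    if tag in EXPLICIT_MAP:
        return EXPLICIT_MAP[tag]

    parts = tag.split("-")

    # Language only: identical in both syntaxes ("de" -> "de").
    if len(parts) == 1:
        return tag

    # Language plus two-letter region: simple substitution
    # ("pt-BR" -> "pt_BR").
    if len(parts) == 2:
        region = parts[1]
        if len(region) == 2 and region.isascii() and region.isalpha():
            return f"{parts[0]}_{region.upper()}"

    # Everything else needs an entry in EXPLICIT_MAP.
    raise ValueError(f"no explicit mapping for {tag!r}")


if __name__ == "__main__":
    for tag in ("de", "pt-BR", "sr-Latn", "ca-valencia"):
        print(tag, "->", l10n_to_linguas(tag))
```

Raising an error for unlisted script or variant tags, rather than guessing, keeps the choice between upstream conventions explicit in one table.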
64
65 >
66 >Ulrich
67 >
68 >[1] https://tools.ietf.org/html/bcp47
69 >[2] http://schneegans.de/lv/
70
71
72 --
73 Best regards,
74 Michał Górny (by phone)

Replies

Subject Author
Re: [gentoo-dev] RFC: BCP 47 for L10N? "Chí-Thanh Christopher Nguyễn" <chithanh@g.o>