>>>>> On Tue, 7 Jun 2016, Chí-Thanh Christopher Nguyễn wrote:

>> 4. According to Gettext documentation, "'@VARIANT' can denote any
>> kind of characteristics that is not already implied by the language
>> LL and the country CC." (So IIUC the BCP-47 variant "valencia"
>> would become "@valencia".)

> This I think is wrong and collides with POSIX.
> POSIX modifiers are not allowed for LANG or LC_ALL in
> POSIX.1-2008[1] Section 8.2 says you can have at most one modifier
> field to "select a specific instance of localization data within a
> single category", which I don't think applies because it is its own
> locale, not an instance of an existing one. Furthermore (but that
> doesn't apply in our use case), POSIX spec lists the example
> LC_COLLATE=De_DE@dict
> So what if you want Catalan Valencian with dictionary order? Or if
> someone hypothetically came up with a different script?

>> I haven't found any mention or usage of ISO 3166-2 region
>> subdivisions in the context of locale. Can you provide any
>> references for this?

> As I wrote before, it is not used. But I think it is the only
> spec-compliant way to marry POSIX locales with Catalan Valencian.
> BCP-47 does it in a more natural way.

So, trying to summarise: We cannot follow strict POSIX syntax, so our
two choices are either to stick to Gettext LL_CC@VARIANT syntax or
to change to BCP 47.

Using BCP 47 would have some advantages:
- It is a well-defined standard [1] and tools for validation of
  language tags exist, e.g. [2].
- The L10N USE_EXPAND could follow usual USE flag syntax, as BCP 47
  tags contain neither underscores (which are supposed to be reserved
  as USE_EXPAND separators) nor @ signs (which PMS explicitly
  mentions as an exception for LINGUAS).
- Gettext's @VARIANT is ill-defined and conflates different
  characteristics like script and variant. There is no further
  subdivision within @VARIANT, which leads to locale names like
  sr@ijekavianlatin. Also, different upstreams use different
  conventions, like @latin and @Latn for the Latin script.
- For the vast majority of languages, identifiers are either identical
  ("de" -> "de") or they can be converted by simple shell substitution
  ("pt-BR" -> "pt_BR").
- IIUC, L10N is primarily intended to control things like additional
  language bundles of packages. Some upstreams like LibreOffice
  already use BCP 47 for these.
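
To illustrate the "simple substitution" case from the list above, here is a
minimal sketch in Python (an ebuild or eclass would of course do the same in
shell). The function name and the regular expression are my own assumptions
for illustration and are not taken from PMS, Portage or any eclass. The
pattern deliberately accepts only a bare language code, optionally followed
by a two-letter region, and rejects everything else.

```python
import re

# Hypothetical helper for illustration only; not part of Portage or any eclass.
# Accepts "ll" or "lll", optionally followed by "-CC" (two-letter region).
_SIMPLE_TAG = re.compile(r"[a-z]{2,3}(-[A-Z]{2})?")


def simple_l10n_to_linguas(tag: str) -> str:
    """Convert a language or language-region tag by plain substitution."""
    if not _SIMPLE_TAG.fullmatch(tag):
        # Tags with a script or variant subtag cannot be handled this way.
        raise ValueError(f"{tag!r} is not a simple language[-region] tag")
    return tag.replace("-", "_")


for tag in ("de", "pt-BR"):
    print(tag, "->", simple_l10n_to_linguas(tag))
```

Tags such as "sr-Latn" or "ca-valencia" raise ValueError here, which is
intentional: a blind substitution would yield names that no Gettext-style
locale uses.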

On the other hand, there will be some cost:
- If BCP 47 tags containing a script or a variant should be used to
  generate LINGUAS, they will require explicit mapping. (OTOH, such
  mapping will also be needed if we stick to Gettext syntax but unify
  variants like "sr@latin" and "sr@Latn".)
- Different syntax for LINGUAS and L10N might be confusing to users,
  so additional documentation will be needed.
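
The explicit mapping mentioned in the first point could look roughly like the
following Python sketch. The two table entries are illustrative assumptions
based on the names discussed in this thread; an actual table would have to be
agreed on and maintained somewhere. The function and variable names are
hypothetical as well. Tags found in the table are mapped explicitly, plain
language[-region] tags fall back to substitution, and anything else is
reported as unmapped instead of being guessed.

```python
import re

# Illustrative entries only (assumption); a real table would need review.
EXPLICIT_MAP = {
    "ca-valencia": "ca@valencia",
    "sr-Latn": "sr@latin",
}

# "ll" or "lll", optionally followed by "-CC" (two-letter region).
_SIMPLE_TAG = re.compile(r"[a-z]{2,3}(-[A-Z]{2})?")


def l10n_to_linguas(tag: str) -> str:
    """Map a BCP 47 tag to a Gettext-style name (hypothetical helper)."""
    if tag in EXPLICIT_MAP:
        # Script and variant subtags need a hand-written entry.
        return EXPLICIT_MAP[tag]
    if _SIMPLE_TAG.fullmatch(tag):
        # Simple case: only the separator differs.
        return tag.replace("-", "_")
    raise KeyError(f"no LINGUAS mapping known for {tag!r}")


for tag in ("de", "pt-BR", "sr-Latn", "ca-valencia"):
    print(tag, "->", l10n_to_linguas(tag))
```

Raising an error for unknown tags keeps a missing table entry visible, so it
cannot silently produce a LINGUAS value that no package recognises.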

Comments?

Ulrich

[1] https://tools.ietf.org/html/bcp47
[2] http://schneegans.de/lv/