On 10 June 2016 11:29:41 CEST, Ulrich Mueller <ulm@g.o> wrote:
>>>>>> On Tue, 7 Jun 2016, Chí-Thanh Christopher Nguyễn wrote:
>
>>> 4. According to Gettext documentation, "'@VARIANT' can denote any
>>> kind of characteristics that is not already implied by the language
>>> LL and the country CC." (So IIUC the BCP-47 variant "valencia"
>>> would become "@valencia".)
>
>> This I think is wrong and collides with POSIX.
>> POSIX modifiers are not allowed for LANG or LC_ALL in
>> POSIX.1-2008[1] Section 8.2 says you can have at most one modifier
>> field to "select a specific instance of localization data within a
>> single category", which I don't think applies because it is its own
>> locale, not an instance of an existing one. Furthermore (but that
>> doesn't apply in our use case), POSIX spec lists the example
>> LC_COLLATE=De_DE@dict
>> So what if you want Catalan Valencian with dictionary order? Or if
>> someone hypothetically came up with a different script?
>
>>> I haven't found any mention or usage of ISO 3166-2 region
>>> subdivisions in the context of locale. Can you provide any
>>> references for this?
>
>> As I wrote before, it is not used. But I think it is the only
>> spec-compliant way to marry POSIX locales with Catalan Valencian.
>> BCP-47 does it in a more natural way.
>
>So, trying to summarise: We cannot follow strict POSIX syntax, so our
>two choices are either to stick to Gettext LL_CC@VARIANT syntax or
>to change to BCP 47.
>
>Using BCP 47 would have some advantages:
>- It is a well defined standard [1] and tools for validation of
>  language tags exist, e.g. [2].
>- The L10N USE_EXPAND could follow usual USE flag syntax, as BCP 47
>  tags contain neither underscores (which are supposed to be reserved
>  as USE_EXPAND separators) nor @ signs (which PMS explicitly
>  mentions as an exception for LINGUAS).
>- Gettext's @VARIANT is ill-defined and conflates different
>  characteristics like script and variant. There is no further
>  subdivision within @VARIANT, which leads to locale names like
>  sr@ijekavianlatin. Also different upstreams use different
>  conventions, like @latin and @Latn for the latin script.
>- For the vast majority of languages, identifiers are either identical
>  ("de" -> "de") or they can be converted by simple shell substitution
>  ("pt-BR" -> "pt_BR").
>- IIUC, L10N is primarily intended to control things like additional
>  language bundles of packages. Some upstreams like libreoffice
>  already use BCP 47 for these.
>
>On the other hand, there will be some cost:
>- If BCP 47 tags containing a script or a variant should be used to
>  generate LINGUAS, they will require explicit mapping. (OTOH, such
>  mapping will also be needed if we stick to Gettext syntax but unify
>  variants like "sr@latin" and "sr@Latn".)
>- Different syntax for LINGUAS and L10N might be confusing to users,
>  so additional documentation will be needed.
>
>Comments?
|
I'd say BCP-47. The gettext tags aren't fully defined anyway, so sooner or later we'd have to pick one upstream's convention and map the others to it.

Also, if it makes mapping L10N to LINGUAS harder, that will discourage people from abusing the latter.
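
To make the mapping cost concrete, here is a minimal sketch in Python (the language Portage itself is written in). It is only an illustration under stated assumptions: the function name l10n_to_linguas and the two table entries are hypothetical, not an agreed-upon Gentoo mapping, and tags are expected in conventional BCP 47 casing.

```python
# Hypothetical explicit table for tags that carry a script or variant
# subtag. The entries are illustrative only, not an agreed Gentoo mapping.
EXPLICIT_MAP = {
    "sr-Latn": "sr@latin",
    "ca-valencia": "ca@valencia",
}


def l10n_to_linguas(tag: str) -> str:
    """Map a BCP 47 tag to a Gettext-style locale name (sketch)."""
    # Tags with a script or variant need an explicit entry.
    if tag in EXPLICIT_MAP:
        return EXPLICIT_MAP[tag]

    parts = tag.split("-")

    # Bare language: "de" -> "de".
    if len(parts) == 1:
        return tag

    # Language plus two-letter region: "pt-BR" -> "pt_BR".
    if len(parts) == 2 and len(parts[1]) == 2 and parts[1].isalpha() and parts[1].isupper():
        return parts[0] + "_" + parts[1]

    # Anything else is refused rather than guessed.
    raise ValueError("no LINGUAS mapping known for " + repr(tag))
```

With this, "de" stays "de", "pt-BR" becomes "pt_BR", and any tag carrying a script or variant has to be listed by hand or it is rejected.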
|
>
>Ulrich
>
>[1] https://tools.ietf.org/html/bcp47
>[2] http://schneegans.de/lv/
|
|
-- 
Best regards,
Michał Górny (by phone)