On Thu, Jun 01, 2006 at 02:41:27PM +0000, Wiktor Wandachowicz wrote:
> Respectful Gentoo developers,
>
> I would like to ask what you think about UTF-8 encoded manual pages.
> I mean files like ls.1.gz, which are used by the honorable "man" program.
> Recently I attacked the problem a little, and before submitting any
> patches/proposals to Gentoo bugzilla I'd like to know your opinions first.
>
> Disclaimer: for daily use I have LANG="pl_PL.UTF-8" and LC_ALL="pl_PL.UTF-8",
> but the original issue is of a more universal nature.
>
> Back on subject. ISO-8859-* 8-bit encodings are fine and most localized
> manuals use them. However, there are some cases where UTF-8 manuals are
> installed as well. Namely, the newest portage uses "linguas_pl" like this:
>
> $ emerge -pv portage
> [ebuild R ] sys-apps/portage-2.1_rc3-r3 USE="-build -doc" LINGUAS="pl"
>
> As a result, translated manual pages are added to the system. The problem
> is that they use UTF-8 encoding. Having both man-pages-pl and this version
> of portage installed gives unexpected results: "man ls" prints all
> the letters with correct encoding, but "man emerge" does not. On the other
> hand, if "man" is configured to display UTF-8 encoded manuals correctly,
> all the other manuals print funny characters instead of the desired output.
>
> I wrote a simple script [1] which checks all installed Polish manuals by
> using the "file" program. For the "pl" locale it currently produces about 70kB
> of text, and for the default locale about 458kB. After grepping for all
> occurrences of "UTF" I found that only the newest portage's manuals
> are in UTF-8 ("pl"), plus: flow.1, gnome-keyring-manager.1, ImageMagick.1,
> Encode::Unicode::UTF7.3pm (but I think those are false positives anyway).
>
> While it's easy to contact the Polish translators of portage's manuals so
> they can correct them, the problem will have to be solved sooner or later.
> UTF-8 encoded manuals will probably appear more and more often, and some
> general resolution should be made.
>
> After some discussion on the Polish forum [2] I learned about groff's
> deficiencies in UTF-8 handling. However, a wrapper exists [3] that helps
> somewhat in that matter. But it also requires that all manuals be unified
> wrt. encoding: *all* ISO-8859-* or *all* UTF-8, no compromise.
> So I don't know what course to take.
>
> Summing up:
> * UTF-8 manuals: good or bad?

Bad if they're the only option. It means manpages will no longer be
available for non-UTF-8 users. Also, forcing everything in
/usr/share/man/pl to be UTF-8 will require users to emerge -e world.

> * how to handle mixed encodings of manuals?

The same way it's done now: install latin2 pl manpages in
/usr/share/man/pl
and utf8 pl manpages in
/usr/share/man/pl.UTF-8
If anything installs utf8 manpages in /usr/share/man/pl, fix the ebuild.
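
To find packages that need such a fix, something along the lines of the
Python sketch below might help. It is only an illustration, not an existing
tool: the function names are made up, it assumes pages are either plain or
gzip-compressed, and it relies on the heuristic that latin2 text with Polish
letters almost never happens to be valid UTF-8.

```python
import gzip
from pathlib import Path
from typing import List


def looks_like_utf8(data: bytes) -> bool:
    """Return True if data contains non-ASCII bytes and decodes as UTF-8."""
    if data.isascii():
        return False  # pure ASCII is valid in both encodings; nothing to flag
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # most likely an 8-bit encoding such as ISO-8859-2
    return True


def read_page(path: Path) -> bytes:
    """Read a manual page, transparently handling .gz compression."""
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as fh:
            return fh.read()
    return path.read_bytes()


def misplaced_utf8_pages(man_dir: str) -> List[Path]:
    """List pages below man_dir whose contents look like UTF-8 (heuristic)."""
    root = Path(man_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and looks_like_utf8(read_page(p))
    )
```

Calling misplaced_utf8_pages("/usr/share/man/pl") would then list the pages
that probably belong in /usr/share/man/pl.UTF-8 instead.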

> * should man and/or groff handle UTF-8 better?

Yes, but it's not required to get this problem sorted out.

> * should an eclass function be created to aid in correcting the encoding
> of manual pages while installing them?

Maybe, but it's not required to get this problem sorted out.
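
If someone did want such a helper, its core would be little more than a
re-encoding step. A real eclass function would be written in bash (most
likely around iconv); the Python sketch below only illustrates the logic.
The function names are hypothetical, and it assumes gzip-compressed pages
and the utf8-to-latin2 direction by default.

```python
import gzip
from pathlib import Path


def recode(data: bytes, src: str = "utf-8", dst: str = "iso-8859-2") -> bytes:
    """Re-encode page bytes; raises UnicodeError if a character does not fit."""
    return data.decode(src).encode(dst)


def recode_gz_page(src_path: Path, dst_path: Path,
                   src: str = "utf-8", dst: str = "iso-8859-2") -> None:
    """Read a gzipped page encoded as src, write it gzipped encoded as dst."""
    with gzip.open(src_path, "rb") as fh:
        data = fh.read()
    # Create the target section directory (e.g. .../pl/man1) if needed.
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(dst_path, "wb") as fh:
        fh.write(recode(data, src, dst))
```

Failing loudly on characters that have no latin2 equivalent seems safer
than silently replacing them, since a mangled page is harder to notice
than a failed install.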
--
gentoo-dev@g.o mailing list