Dear Gentoo developers,

I would like to ask what you think about UTF-8 encoded manual pages, that
is, files such as ls.1.gz that are read by the venerable "man" program.
I recently looked into the problem, and before submitting any
patches/proposals to Gentoo Bugzilla I would like to hear your opinions.

Disclaimer: for daily use I have LANG="pl_PL.UTF-8" and LC_ALL="pl_PL.UTF-8",
but the underlying issue is more general.

Back to the subject. The 8-bit ISO-8859-* encodings work fine, and most
localized manuals use them. However, in some cases UTF-8 manuals are
installed as well. For example, the newest portage makes use of
"linguas_pl", as shown here:

$ emerge -pv portage
[ebuild R ] sys-apps/portage-2.1_rc3-r3 USE="-build -doc" LINGUAS="pl"

As a result, translated manual pages are added to the system. The problem
is that they are encoded in UTF-8. Having both man-pages-pl and this
version of portage installed gives unexpected results: "man ls" prints all
letters correctly, but "man emerge" does not. Conversely, if "man" is
configured to display UTF-8 encoded manuals correctly, all the other
manuals show garbled characters instead of the expected output.
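
The mismatch can be reproduced outside of "man". Below is a minimal Python
sketch; it assumes the legacy Polish pages are ISO-8859-2 (the text above
only says ISO-8859-*) and uses a made-up sample string, not text from any
real manual.

```python
# Sample Polish text; purely illustrative, not taken from a real manual.
text = "zażółć gęślą jaźń"

utf8_bytes = text.encode("utf-8")          # encoding of the new portage pages
latin2_bytes = text.encode("iso-8859-2")   # ASSUMED encoding of man-pages-pl

# A viewer that assumes ISO-8859-2 misreads UTF-8 input: each non-ASCII
# letter is stored as two bytes, so it shows up as two unrelated characters.
misread = utf8_bytes.decode("iso-8859-2")
print(repr(misread))

# A viewer that assumes UTF-8 cannot decode the ISO-8859-2 bytes; here the
# undecodable bytes are replaced with U+FFFD instead of raising an error.
unreadable = latin2_bytes.decode("utf-8", errors="replace")
print(repr(unreadable))
```

Either way, a single assumed input encoding can only suit one of the two
sets of pages.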

I wrote a simple script [1] that checks all installed Polish manuals using
the "file" program. For the "pl" locale it currently produces about 70 kB
of text, and for the default locale about 458 kB. After grepping the
output for all occurrences of "UTF", I found that only the newest
portage's manuals are in UTF-8 ("pl"), plus flow.1,
gnome-keyring-manager.1, ImageMagick.1 and Encode::Unicode::UTF7.3pm
(though I think those are false positives anyway).
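
The checkman script [1] is not reproduced here. As a rough illustration of
the same kind of check, the Python sketch below classifies a
gzip-compressed manual page by trying to decode it as UTF-8 rather than by
calling "file". The function names and the /usr/share/man/pl path are
assumptions made for this example.

```python
import gzip
from pathlib import Path


def classify_bytes(data: bytes) -> str:
    """Return 'ascii', 'utf-8' or '8-bit' for raw page contents."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Non-ASCII bytes that are not valid UTF-8: most likely a legacy
        # 8-bit encoding such as ISO-8859-2 (this is a heuristic).
        return "8-bit"
    return "utf-8"


def classify_page(path: Path) -> str:
    """Classify one manual page, decompressing it first if it ends in .gz."""
    raw = path.read_bytes()
    if path.suffix == ".gz":
        raw = gzip.decompress(raw)
    return classify_bytes(raw)


if __name__ == "__main__":
    # ASSUMED location of the Polish pages; adjust to the local layout.
    for page in sorted(Path("/usr/share/man/pl").glob("man*/*.gz")):
        print(f"{page}: {classify_page(page)}")
```

Because this inspects the page contents instead of matching the string
"UTF", a page is reported as UTF-8 only if it really contains non-ASCII
bytes that decode as UTF-8.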

While it is easy to contact the Polish translators of the portage manuals
and ask them to correct the encoding, the underlying problem will have to
be solved sooner or later. UTF-8 encoded manuals will probably become more
common, so a general solution is needed.

After some discussion on the Polish forum [2] I learned about groff's
deficiencies in UTF-8 handling. A wrapper exists [3] that helps somewhat,
but it also requires all manuals to use one encoding: *all* ISO-8859-* or
*all* UTF-8, no compromise.
So I don't know which course to take.

To sum up:
* UTF-8 manuals: good or bad?
* How should mixed encodings of manuals be handled?
* Should man and/or groff handle UTF-8 better?
* Should an eclass function be created to aid in correcting the encoding
  of manual pages while installing them?
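
To make the last question more concrete: a real eclass helper would be
written in bash, but the conversion step itself can be sketched in Python.
The function below is hypothetical (it is not an existing eclass or portage
API), and the ISO-8859-2 source encoding is an assumption that in practice
would have to be derived from the page's locale.

```python
import gzip
from pathlib import Path


def recode_to_utf8(path: Path, source: str = "iso-8859-2") -> bool:
    """Re-encode one gzipped manual page to UTF-8 in place.

    Returns True if the file was rewritten, False if it was left alone.
    """
    raw = gzip.decompress(path.read_bytes())
    try:
        # Already valid UTF-8 (this includes pure ASCII): nothing to do.
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # ASSUMPTION: anything that is not UTF-8 is in the `source` encoding.
    text = raw.decode(source)
    path.write_bytes(gzip.compress(text.encode("utf-8")))
    return True
```

Unifying in the opposite direction (UTF-8 to ISO-8859-*) would use
text.encode("iso-8859-2") instead, which raises an error for any character
that the 8-bit charset cannot represent.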

Any constructive comments are more than welcome!

Best regards,
Wiktor Wandachowicz
(SirYes)

[1] http://ics.p.lodz.pl/~wiktorw/gentoo/checkman
[2] http://forums.gentoo.org/viewtopic-p-3352287.html
[3] http://hoth.amu.edu.pl/~d_szeluga/groff-utf8.tar.bz2

--
gentoo-dev@g.o mailing list