Gentoo Archives: gentoo-dev

From: "Harald van Dijk" <truedfx@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] UTF-8 encoding and file format of manuals
Date: Fri, 02 Jun 2006 03:14:23
Message-Id: 20060602031021.GA12310@gentoo.org
In Reply to: [gentoo-dev] UTF-8 encoding and file format of manuals by Wiktor Wandachowicz
On Thu, Jun 01, 2006 at 02:41:27PM +0000, Wiktor Wandachowicz wrote:
> Respectful Gentoo developers,
>
> I would like to ask what you think about UTF-8 encoded manual pages.
> I mean, the files like ls.1.gz, which are used by the honorable "man" program.
> Recently I attacked the problem a little and before submitting any
> patches/proposals to Gentoo bugzilla I'd like to know your opinions first.
>
> Disclaimer: for daily use I have LANG="pl_PL.UTF-8" and LC_ALL="pl_PL.UTF-8",
> but the original issue is of a more universal nature.
>
> Back on subject. ISO-8859-* 8-bit encodings are fine and most localized
> manuals use them. However, there are some examples where UTF-8 manuals are
> installed as well. Namely, the newest portage uses "linguas_pl" this way:
>
> $ emerge -pv portage
> [ebuild R ] sys-apps/portage-2.1_rc3-r3 USE="-build -doc" LINGUAS="pl"
>
> In effect, translated manual pages are added to the system. The problem
> is that they use UTF-8 encoding. Having both man-pages-pl and this version
> of portage installed gives unexpected results. This way "man ls" prints all
> the letters with correct encoding, but "man emerge" does not. On the other
> hand, if "man" is configured to display UTF-8 encoded manuals correctly,
> all the other manuals print funny characters instead of the desired output.
>
> I wrote a simple script [1] which checks all installed Polish manuals by
> using the "file" program. For the "pl" locale it currently produces about 70kB
> of text, and for the default locale it's about 458kB. After grepping for all
> occurrences of "UTF" I've found out that only the newest portage's manuals
> are in UTF-8 ("pl"), plus: flow.1, gnome-keyring-manager.1, ImageMagick.1,
> Encode::Unicode::UTF7.3pm (but I think they are false positives, anyway).
>
> While it's easy to contact the Polish translators of the portage manuals so
> they could correct them, the problem will have to be solved sooner or later.
> UTF-8 encoded manuals will probably occur with higher frequency, and some
> general resolution should be made.
>
> After some discussion on the Polish forum [2] I've learnt about groff
> deficiencies with UTF-8 handling. However, a wrapper exists [3] that helps
> somewhat in that matter. But it also requires that all manuals be unified
> wrt. encoding: *all* ISO-8859-* or *all* UTF-8, no compromise.
> So I don't know what course to take.
>
> Summing up:
> * UTF-8 manuals: good or bad?

Bad if they're the only option. It means manpages will no longer be
available for non-UTF-8 users. Also, forcing everything in
/usr/share/man/pl to be UTF-8 will require users to emerge -e world.

> * how to handle mixed encodings of manuals?

The same way it's done now: install latin2 pl manpages in
/usr/share/man/pl
and utf8 pl manpages in
/usr/share/man/pl.UTF-8
If anything installs utf8 manpages in /usr/share/man/pl, fix the ebuild.
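
To illustrate the directory split, here is a rough Python sketch (not taken
from any ebuild or eclass) that picks the directory for a page from its raw
bytes. The names is_utf8 and target_dir are made up for this example, the
section subdirectory (man1 and so on) is left out, and anything that is not
valid UTF-8 is simply assumed to be latin2. A latin2 page could in principle
happen to be valid UTF-8 by coincidence, so treat it as a heuristic only.

```python
from pathlib import PurePosixPath


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def target_dir(data: bytes, lang: str = "pl",
               man_root: str = "/usr/share/man") -> str:
    """Pick the man directory for a page based on its encoding.

    Hypothetical helper: pure ASCII pages are valid in both encodings,
    so they stay in the plain language directory.
    """
    root = PurePosixPath(man_root)
    if data.isascii() or not is_utf8(data):
        # ASCII, or assumed legacy 8-bit (latin2 for pl)
        return str(root / lang)
    # Non-ASCII and valid UTF-8
    return str(root / f"{lang}.UTF-8")
```

For example, target_dir("żółć".encode("utf-8")) yields the pl.UTF-8
directory, while the same word encoded as ISO-8859-2 yields plain pl.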

> * should man and/or groff handle UTF-8 better?

Yes, but it's not required to get this problem sorted out.

> * should an eclass function be created to aid in correcting the encoding
> of manual pages while installing them?

Maybe, but it's not required to get this problem sorted out.
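
If such a helper were written, its core would be a plain re-encoding step.
A minimal sketch in Python, assuming the source and target encodings are
known up front. A real eclass function would be bash and would more likely
call an external converter, and recode_manpage is a hypothetical name:

```python
def recode_manpage(data: bytes, source: str = "utf-8",
                   target: str = "iso-8859-2") -> bytes:
    """Re-encode a man page from one character encoding to another.

    Raises UnicodeDecodeError if the input is not valid in `source`,
    and UnicodeEncodeError if a character has no representation in
    `target`, so a bad page is never silently corrupted.
    """
    return data.decode(source).encode(target)
```

Failing loudly on unrepresentable characters seems the safer default here,
since a silently mangled page is harder to notice than a failed install.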
--
gentoo-dev@g.o mailing list