Gentoo Archives: gentoo-user

From:	Andreas Claesson <andreas.claesson@×××××.com>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] Re: recompiling vim linked to libncursesw
Date:	Wed, 27 Jul 2005 12:45:32
Message-Id:	`72bdee2f050727053914a749c7@mail.gmail.com`
In Reply to:	[gentoo-user] Re: recompiling vim linked to libncursesw by Moshe Kaminsky

1	On 7/27/05, Moshe Kaminsky <kaminsky@××××××××××××.il> wrote:
2	> Hi,
3	> * Fernando Canizo <conan@××××××××××.ar> [27/07/05 14:14]:
4	<snip>
5	> > I investigate what was in the archives, so i saved a copy (using 'C'
6	> > command from mutt) of the first message (the one i receive from me)
7	> > and file says: 'UTF-8 Unicode mail text', check what's inside with
8	> > hexedit and see that LATIN SMALL LETTER A WITH ACUTE is encoded with
9	> > this hex: C3 A1 (which is not 00 E1 from unicode chart from
10	> > http://www.unicode.org/charts/)
11	>
12	> I think this is just the way these characters are represented in utf-8.
13
14	Yes, it is.
15
16	00E1 hex is '00000000 11100001' in binary.
17
18	When encoded as UTF-8, this value is stored in two bytes.
19
20	The last byte will begin with '10' followed by the last 6 bits of data.
21
22	'10 100001' binary or 'A1' in hex.
23
24	The first byte will begin with '110' to indicate that it is a two-byte
25	character, followed by the remaining significant bits.
26
27	'110 00011' binary or 'C3' hex.
28
29	This is correct.
30
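The bit layout described above can be checked with a small Python sketch. The function name `encode_two_byte` is only an illustration, and it handles just the two-byte range (U+0080 to U+07FF) discussed here, not UTF-8 in general.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    # First byte: '110' marker followed by the top 5 bits.
    first = 0b11000000 | (code_point >> 6)
    # Last byte: '10' marker followed by the low 6 bits.
    last = 0b10000000 | (code_point & 0b00111111)
    return bytes([first, last])


encoded = encode_two_byte(0x00E1)
print(encoded.hex(" "))                      # c3 a1
print(encoded == "\u00e1".encode("utf-8"))   # True
```

The second print compares the hand-built bytes with Python's own UTF-8 encoder, so the result does not depend on the sketch alone.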
31	The problem seems to be that mutt(?) takes this UTF-8 encoded data
32	and encodes it as UTF-8 again, as if the data were two 8-bit characters.
33
34	'C3' then becomes 'C3 83' and 'A1' becomes 'C2 A1'
35
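The double encoding can be reproduced in Python as a rough sketch. It assumes the already-encoded bytes are misread as Latin-1 (one character per byte), which is a guess at the mechanism and not something confirmed about mutt; the variable names are illustrative.

```python
# Correct UTF-8 for LATIN SMALL LETTER A WITH ACUTE (U+00E1).
original = "\u00e1".encode("utf-8")

# Assumption: each byte is wrongly treated as one 8-bit (Latin-1) character,
# giving the two characters U+00C3 and U+00A1.
misread = original.decode("latin-1")

# Encoding those two characters as UTF-8 again doubles the byte count.
double = misread.encode("utf-8")

print(original.hex(" "))  # c3 a1
print(double.hex(" "))    # c3 83 c2 a1

# Under the same assumption, the damage can be undone by reversing the steps.
repaired = double.decode("utf-8").encode("latin-1")
print(repaired == original)  # True
```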
36
37	/Andreas
38
39	--
40	gentoo-user@g.o mailing list

Subject	Author
Re: [gentoo-user] Re: recompiling vim linked to libncursesw	Fernando Canizo <conan@××××××××××.ar>