Gentoo Archives: gentoo-user

From: daid kahl <daidxor@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] [OT] Need advice from people who use non-ascii all day long
Date: Sun, 06 Dec 2009 01:59:18
Message-Id: 3ac129340912051758p101d0576q11d26187d0616b82@mail.gmail.com
In Reply to: [gentoo-user] [OT] Need advice from people who use non-ascii all day long by felix@crowfix.com
1 > I have a project which requires normalizing names, and by that, I mean
2 > converting to lower case etc, whatever eliminates redundancies.  I
3 > know Unicode has a different "normalize" meaning, but for my purposes,
4 > that has already been done.  Maybe I should call it standardization or
5 > make up a new cromulent word.
6 >
7 > By which I really mean I am confused by a lot of advice I have gotten
8 > from USAians who get by with the good old 7 bit ASCII character set on
9 > a daily basis, whether it be written in Unicode or not.
10
11 > Yes, I am something of an ignorant American.  I know some Japanese,
12 > French, and Spanish, but not the details of everyday usage.  I'd like
13 > to learn.
14
15 Your project sounds interesting, but I have little to contribute on
16 the technical side.
17
18 I'm curious about your handling of Japanese, just because I'm living
19 outside Tokyo these days. My grasp on Japanese is basically rubbish,
20 but I can at least claim to know a thing or two.
21
22 To keep this in line with your stated application, I actually wonder
23 how you handle "Tokyo." For pronunciation purposes, if you put it in
24 hiragana and literally romanized it, you'd probably get Toukyou. In
25 Japanese a double vowel just extends the sound and isn't a diphthong
26 (and usually o is extended by u and only rarely another o). For a lot
27 of cases on the double 'oo' they'll Romanize the second 'o' as an 'h',
28 since otherwise someone will pronounce it like (a) "fool." So, take
29 a family name Ohshiro. Probably it should be Romanized "Oohshiro,"
30 but then people would say something like seeing fireworks.
31
32 Tokyo is Romanized this way, according to one culture book I read,
33 because everyone knows both the o's are extended! I'm sure all these
34 people also know that "kyo" is a single syllable, too! So it's not
35 "To-key-oh" it's just "To-kyo" where both syllables are extended from
36 the double oo.
37
38 Osaka is also an extended O at the beginning as I recall, and Kyoto is
39 the same case as Tokyo (incidentally, the romanized syllables of those
40 two cities are just reversed in order, and the kanji share 京: 東京 and 京都).
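
To make the "literal" romanization concrete, here is a minimal sketch in Python. The function name literal_romaji and the tiny lookup tables are made up for illustration and cover only the kana needed for the three city names above; a real converter would need the full kana table.

```python
# Illustrative tables only: just the kana needed for Tokyo, Kyoto and Osaka.
# Two-character combinations (kana + small "yo") are looked up first.
DIGRAPHS = {"きょ": "kyo"}
SINGLES = {"と": "to", "う": "u", "お": "o", "さ": "sa", "か": "ka", "き": "ki"}


def literal_romaji(kana: str) -> str:
    """Romanize hiragana character by character, keeping long vowels spelled out."""
    out = []
    i = 0
    while i < len(kana):
        pair = kana[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Raises KeyError for kana outside this small demo table.
            out.append(SINGLES[kana[i]])
            i += 1
    return "".join(out)


print(literal_romaji("とうきょう"))  # Tokyo
print(literal_romaji("きょうと"))    # Kyoto
print(literal_romaji("おおさか"))    # Osaka
```

The output keeps every extended vowel, which is why the literal forms look different from the spellings everyone is used to.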
41
42 Again to speak to the original application, I don't know who types
43 Tookyoo or Tohkyoh or Toukyou. Probably no one because it's generally
44 Romanized as we all know it. But for typing purposes, Japanese type
45 the pronunciation of words via hiragana and then a little list pops up
46 and they select the word they want. So in this sense, they are typing
47 "Toukyou" into the keyboard...it's just in hiragana.
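
If the goal is for all of those spellings to compare equal, one possible approach is a lossy matching key. The sketch below is only an assumption about what such a step could look like, not a recommendation from anyone who does this for a living: the name romaji_key is hypothetical, it only handles the long "o" discussed here, and it will also collapse names where "ou" is really two separate vowels (Inoue, for example).

```python
import re
import unicodedata

# Map macron vowels (as in "Tōkyō") to their plain letters.
_MACRONS = str.maketrans({"ō": "o", "ū": "u", "Ō": "O", "Ū": "U"})

# An "o" followed by one or more lengthening marks: another "o", a "u",
# or an "h" that does not start the next syllable (so "Ohara" is untouched).
_LONG_O = re.compile(r"o(?:o|u|h(?![aiueoy]))+")


def romaji_key(name: str) -> str:
    """Return a lossy comparison key; not meant for display."""
    s = unicodedata.normalize("NFKC", name)  # also folds full-width letters
    s = s.translate(_MACRONS).casefold()
    return _LONG_O.sub("o", s)


for spelling in ["Tokyo", "Tookyoo", "Tohkyoh", "Toukyou", "Tōkyō"]:
    print(spelling, "->", romaji_key(spelling))
```

The key should be stored next to the original spelling rather than in place of it, since the original cannot be recovered from the key.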
48
49 If you had any questions about Japanese things, I could ask a
50 colleague. They are all happy to answer questions.
51
52 Regards,
53 daid
