> I have a project which requires normalizing names, and by that, I mean
> converting to lower case etc, whatever eliminates redundancies. I
> know Unicode has a different "normalize" meaning, but for my purposes,
> that has already been done. Maybe I should call it standardization or
> make up a new cromulent word.
>
> By which I really mean I am confused by a lot of advice I have gotten
> from USAians who get by with the good old 7 bit ASCII character set on
> a daily basis, whether it be written in Unicode or not.
|
> Yes, I am something of an ignorant American. I know some Japanese,
> French, and Spanish, but not the details of everyday usage. I'd like
> to learn.
|
Your project sounds interesting, but I have little to contribute on
the technical side.
|
I'm curious about your handling of Japanese, just because I'm living
outside Tokyo these days. My grasp of Japanese is basically rubbish,
but I can at least claim to know a thing or two.
|
To keep this in line with your stated application, I actually wonder
how you handle "Tokyo." For pronunciation purposes, if you put it in
hiragana and romanized it literally, you'd probably get Toukyou. In
Japanese a doubled vowel just extends the sound and isn't a diphthong
(and usually o is extended by u and only rarely by another o). In a lot
of the double 'oo' cases they'll Romanize the second 'o' as an 'h',
since otherwise someone will pronounce it like "fool." So, take
a family name like Ohshiro. Literally it would be Romanized "Ooshiro,"
but then people would say it like they were watching fireworks.
|
Tokyo is Romanized this way, according to one culture book I read,
because everyone knows both the o's are extended! I'm sure all these
people also know that "kyo" is a single syllable, too! So it's not
"To-key-oh"; it's just "To-kyo," where both syllables are extended from
the double oo.
|
Osaka also has an extended O at the beginning, as I recall, and Kyoto is
a similar case to Tokyo, though only its "kyo" is extended (incidentally,
the two names share the character 京, "capital": Tokyo is 東京 and Kyoto
is 京都, so they are not quite the same characters in reverse order).
|
Again to speak to the original application, I don't know who types
Tookyoo or Tohkyoh or Toukyou. Probably no one, because it's generally
Romanized as we all know it. But for typing purposes, Japanese speakers
type the pronunciation of a word in hiragana, then a little list pops up
and they select the word they want. So in this sense, they are typing
"Toukyou" into the keyboard... it's just in hiragana.
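
In case it helps with the matching side, here is a rough Python sketch of
one way to fold these spellings together. It is only my guess at an
approach, not a standard one: the function name long_vowel_key is made up,
and the rules cover only the long o/u spellings mentioned above (ou, oo,
oh, uu, and a macron).

```python
import re
import unicodedata


def long_vowel_key(name: str) -> str:
    """Return a lossy comparison key that ignores how long o/u are spelled.

    Hypothetical helper for illustration; not a complete romanization scheme.
    """
    # Decompose so a macron (as in "Tōkyō") becomes a combining mark, then drop marks.
    decomposed = unicodedata.normalize("NFD", name)
    key = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = key.casefold()
    # Treat "oh" as a long o only when the h does not begin a new syllable
    # (so "Ohshiro" folds, but the h in "Ohara" is kept).
    key = re.sub(r"oh(?![aiueoy])", "o", key)
    # "ou" and "oo" both spell a long o; "uu" spells a long u.
    key = re.sub(r"o[ou]+", "o", key)
    key = re.sub(r"u{2,}", "u", key)
    return key


for spelling in ("Tokyo", "Toukyou", "Tookyoo", "Tohkyoh"):
    print(spelling, "->", long_vowel_key(spelling))  # each maps to "tokyo"
```

Note that the key is deliberately lossy: it is meant for comparing, not for
display. It will also merge names that really are different (Ono and Oono
both become "ono") and it mangles names where o and u belong to separate
syllables (Inoue becomes "inoe"), so a real application would probably want
an exceptions list or a comparison on the kana instead.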
|
If you have any questions about Japanese things, I could ask a
colleague. They are all happy to answer questions.
|
Regards,
daid