On 07/04/2010 17:21, luis jure wrote:
>
> hello list.
>
> i have a bunch of files with accented characters in their names, both
> upper- and lower case. i want to rename them using the non-accented
> equivalent. i thought that would be easy to do using something like tr.
> big mistake. confronted with accented characters, tr outputs garbage.
>
> searching the web, i found this: "Although the tr command respects C
> locale environment variables, don't expect it to do anything sensible
> with UTF-8 documents, such as being able to replace lower-case accented
> characters with appropriate upper-case characters. The tr command works
> best with ASCII and the other standard C locales."
>
> i'm using es_UY.UTF8 and i can't make tr do anything useful.
|
It can be done with Perl. For example:

$ echo "El castellano es la lengua española oficial del Estado. Las demás lenguas españolas serán también oficiales en las respectivas Comunidades Autónomas" \
    | perl -M'encoding utf8' -MUnicode::Normalize -pe '$_=NFKD($_);s/\pM//og'
|
The following output should be seen:

El castellano es la lengua espanola oficial del Estado. Las demas
lenguas espanolas seran tambien oficiales en las respectivas Comunidades
Autonomas
|
Cheers,

--Kerin