On 2020-12-29, Walter Dnes <waltdnes@××××××××.org> wrote:
> On Tue, Dec 29, 2020 at 05:11:36PM +0200, Andreas K. Huettel wrote
>> Hi Walter,
>>
>> > "-pch -roaming -sendmail -spell -tcpd -udev -udisks -unicode -upower
>> > -xinerama"
>>
>> mostly out of curiosity, why do you want to disable unicode support
>> here?
>>
>> This feels odd to me since utf8 has effectively become the standard
>> encoding over the past years.
>
> I don't know if this has improved over the years, but my initial
> experience with unicode was rather negative. The fact that text
> files were twice as large wasn't a major problem in itself. The
> real showstopper was that importing text files into spreadsheets
> and text-editors and word processors failed miserably.
|
You must be talking about some sort of weird "wide" encoding (is there
such a thing as UTF-16?). I've never seen a file like that. Everybody
and everything uses UTF-8 these days and has for years. UTF-8 is a
superset of ASCII, and doesn't increase the size of the file unless
non-ASCII characters are used. Converting an ASCII file to UTF-8
encoding is a no-op. An ASCII file _is_ UTF-8.
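
A quick way to check the claim that ASCII text is already valid UTF-8
is to encode the same string both ways and compare the bytes. This is
only a minimal sketch in Python; the variable names and the sample
strings are mine, chosen for illustration.

```python
text = "1234"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure-ASCII text the two encodings produce identical bytes.
print(ascii_bytes == utf8_bytes)  # True
print(len(utf8_bytes))            # 4

# Only non-ASCII characters take more than one byte in UTF-8.
accented = "na\u00efve"  # "naive" with a diaeresis on the i (U+00EF)
print(len(accented), len(accented.encode("utf-8")))  # 5 6
```

So the file only grows for the characters that ASCII could not have
represented in the first place.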
|
> I looked at a unicode text file with a binary viewer. It turns out
> that a simple text string like "1234" was actually...
>
> "1" binary-zero "2" binary-zero "3" binary-zero "4" binary zero, etc.
>
> This padding explains why the file was twice as large, and also why
> "a simple textfile import" failed miserably.
|
I've never seen a file like that. All the Unicode I run into is UTF-8,
and a UTF-8 file with the string "1234" is the same exact 4 bytes as
an ASCII file with the string "1234".
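
For comparison, the byte pattern described in the quoted text is what
little-endian UTF-16 would produce for ASCII-range characters. That
the file in question was UTF-16 is an assumption on my part, not
something the description confirms. A small Python sketch:

```python
text = "1234"

utf8 = text.encode("utf-8")
# "utf-16-le" is little-endian UTF-16 without a byte-order mark.
utf16 = text.encode("utf-16-le")

print(utf8)   # b'1234'
print(utf16)  # b'1\x002\x003\x004\x00'
print(len(utf8), len(utf16))  # 4 8
```

A tool that expects one byte per character would see a zero byte
after every digit in the second form, which could explain a failed
import. Files written with a byte-order mark would also start with
two extra bytes.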
|
> On top of that Cyrillic letters like "m", "i", "c", and "o" are
> considered different from their English equivalents. Security experts
> showed proof-of-concept attacks where clicking on "microsoft.com" can
> take you to a hostile domain (cue the jokes). I don't speak or read
> or write any languages which have thousands of unique characters.
> Seeing Chinese spam "as it was intended to be seen" is not a priority
> for me.
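
The lookalike issue is easy to see at the code-point level. In this
rough Python sketch the spoofed string is a made-up example of mine:
it swaps Latin i, c and o for Cyrillic U+0456, U+0441 and U+043E, and
is not taken from any real attack.

```python
import unicodedata

latin = "microsoft.com"
# Hypothetical lookalike: Cyrillic U+0456, U+0441, U+043E stand in for i, c, o.
lookalike = "m\u0456\u0441r\u043es\u043eft.\u0441\u043em"

# The strings may render alike, but they are different code points.
print(latin == lookalike)  # False

# List each position where the two strings differ.
for a, b in zip(latin, lookalike):
    if a != b:
        print(f"U+{ord(a):04X} vs U+{ord(b):04X} ({unicodedata.name(b)})")
```

The two strings compare unequal even though they can look the same
on screen, which is what makes the spoofing possible.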