Gentoo Archives: gentoo-user

From: Michael <confabulate@××××××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] locating non utf-8 characters
Date: Tue, 03 Nov 2020 10:04:49
Message-Id: 12637716.uLZWGnKmhe@lenovo.localdomain
In Reply to: [gentoo-user] locating non utf-8 characters by thelma@sys-concept.com
On Tuesday, 3 November 2020 05:46:41 GMT thelma@×××××××××××.com wrote:
> I'm using sql-ledger and while making backup it uses stardard gzip program:
> $gzip = "gzip -S .gz";
>
> The backup works with some dataset but one data set us giving me an
> error while trying to perform backup:
>
> Wide character in print at SL/AM.pm line 2044.
> Content-Type: application/file; Content-Disposition: attachment;
> filename=dataset_3-3.2.6-20201101.sql.gz
> ãù&ü_1604265628.dataset_3-3.2.6-20201101.sqlÏ\Yo;ñ~œØ –
>
> Since sql-ledger file are standard utf-8 files, I was thinking using:
> grep -axv '.*' file
>
> would find all not utf-8 characters. And it did. I use "nano" to remove
> them but I'm still getting the same error while performing backup.
>
> Any ideas?

I have not used sql-ledger, but I have come across the following two symptoms,
which may be relevant to your problem.

1. A SQL database which was created with an MS Windows application was using
UTF-16 instead of UTF-8. This added a UTF-16 null character at the start of
the SQL dump, which messed up the output. The offending character showed up
as a block when I inspected the dump with 'less' on Linux, with its default
UTF-8 character encoding, and could be deleted with a text editor. I don't
think this relates to your problem, but I am mentioning it for completeness.

2. The word "print" in the error message is a hint worth following up. Perl,
which sql-ledger uses, distinguishes bytes from characters and can be set to
use UTF-8 encoding. "Wide character in print" is the warning Perl gives when
a string containing characters above 0xFF is printed to a filehandle that has
no encoding layer set. Perl's implicit conversion does not get things right
every time, and when it concatenates byte strings with character strings it
can mistranslate them. You could fix this by setting both the input *and* the
output encoding to UTF-8. A good explanation of the problem, with suitable
solutions, is given here:

https://www.ahinea.com/en/tech/perl-unicode-struggle.html
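
If you want to double-check the dump itself before digging into the Perl side,
something along these lines might help. It is only a rough sketch in Python
(chosen for brevity, it has nothing to do with sql-ledger), and the function
names detect_bom and invalid_utf8_spans are made up for illustration. It
reports a leading byte order mark, which would point to a UTF-16 or UTF-32
dump as in point 1, and lists the byte offsets of anything that is not valid
UTF-8.

```python
import codecs


def detect_bom(data: bytes):
    """Return the name of a leading byte order mark, or None if there is none."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE mark
    # begins with the same two bytes as the UTF-16 LE mark.
    boms = [
        (codecs.BOM_UTF32_LE, "UTF-32-LE"),
        (codecs.BOM_UTF32_BE, "UTF-32-BE"),
        (codecs.BOM_UTF8, "UTF-8"),
        (codecs.BOM_UTF16_LE, "UTF-16-LE"),
        (codecs.BOM_UTF16_BE, "UTF-16-BE"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None


def invalid_utf8_spans(data: bytes):
    """Return (offset, offending_bytes) for each span that is not valid UTF-8."""
    spans = []
    pos = 0
    while pos < len(data):
        try:
            # Decoding succeeds only if everything from pos onwards is valid.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            spans.append((start, data[start:end]))
            pos = end  # resume scanning just after the bad bytes
    return spans


# Hypothetical usage on a dump file (the path is an assumption):
#   with open("dataset.sql", "rb") as fh:
#       data = fh.read()
#   print(detect_bom(data))
#   for offset, raw in invalid_utf8_spans(data):
#       print(offset, raw)
```

The file has to be opened in binary mode ("rb"), otherwise Python decodes it
for you and the raw bytes are lost. Note that a clean result here only says
the dump is valid UTF-8; it does not rule out the Perl-side encoding issue
described in point 2.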
