Gentoo Archives: gentoo-dev

From: Ulrich Mueller <ulm@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v4]
Date: Wed, 22 Nov 2017 20:42:08
Message-Id: 23061.57620.868522.993612@a1i15.kph.uni-mainz.de
In Reply to: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v4] by "Michał Górny"
>>>>> On Wed, 22 Nov 2017, Michał Górny wrote:

> Path and filename encoding
> --------------------------

> The path fields in the Manifest file must consist of characters
> corresponding to valid UTF-8 code points excluding the NULL character
> (``U+0000``), the backwards slash (``\``) and characters classified
> as whitespace in the current version of the Unicode standard
> [#UNICODE]_.

As I said before, all C0 and C1 control characters and DEL should be
excluded as well, i.e. 0x00 to 0x1f, 0x7f, and 0x80 to 0x9f. Allowing
such characters in what is basically a text file is only asking for
trouble.
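
To make the proposed exclusion concrete, here is a minimal Python
sketch of such a check. The function names are hypothetical and not
taken from the GLEP or from any package manager; the ranges are the
ones listed above.

```python
def is_control(ch):
    """Return True for C0 controls, DEL, and C1 controls."""
    cp = ord(ch)
    # C0: 0x00-0x1f, DEL: 0x7f, C1: 0x80-0x9f
    return cp <= 0x1F or cp == 0x7F or 0x80 <= cp <= 0x9F


def has_control_chars(path):
    """Return True if the path contains any literal control character."""
    return any(is_control(ch) for ch in path)
```

A validator along these lines would reject a path field containing,
say, a literal tab or escape character before the file is ever opened
in an editor.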

> Any of the excluded characters that are present in path must be encoded
> using one of the following escape sequences:

> - characters in the ``U+0000`` to ``U+007F`` range can be encoded
>   as ``\xHH`` where ``HH`` specifies the zero-padded, hexadecimal
>   character code,

> - characters in the ``U+0000`` to ``U+FFFF`` range can be encoded
>   as ``\uHHHH`` where ``HHHH`` specifies the zero-padded, hexadecimal
>   character code,

> - characters in the UCS-4 range can be encoded as ``\UHHHHHHHH``
>   where ``HHHHHHHH`` specifies the zero-padded, hexadecimal character
>   code.

> It is invalid for backwards slash to be used in any other context,
> and a backwards slash present in filename must be encoded. Backwards
> slash used as path component separator should be replaced by forward
> slash instead.
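
For reference, the escape rules quoted above could be sketched in
Python roughly as follows. This is an illustration under stated
assumptions, not the behaviour of any existing implementation: the
function names are hypothetical, ``str.isspace()`` only approximates
the Unicode whitespace classification, and upper-case hex digits are
an arbitrary choice since the draft does not specify the case.

```python
import re

# One of \xHH, \uHHHH or \UHHHHHHHH, as listed in the quoted draft.
_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))"
)


def must_escape(ch):
    # NULL, backslash, and whitespace (approximated by str.isspace()).
    return ch == "\x00" or ch == "\\" or ch.isspace()


def escape_char(ch):
    """Encode one character using the shortest applicable form."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "\\x%02X" % cp
    if cp <= 0xFFFF:
        return "\\u%04X" % cp
    return "\\U%08X" % cp


def encode_path(path):
    return "".join(escape_char(c) if must_escape(c) else c for c in path)


def decode_path(field):
    def repl(match):
        cp = int(match.group(1) or match.group(2) or match.group(3), 16)
        # The draft limits \xHH to the U+0000..U+007F range.
        if match.group(1) and cp > 0x7F:
            raise ValueError("\\xHH escape out of range")
        return chr(cp)

    # A backslash outside a valid escape sequence is invalid.
    if "\\" in _ESCAPE.sub("", field):
        raise ValueError("stray backslash in path field")
    return _ESCAPE.sub(repl, field)
```

With this sketch, ``encode_path("a b")`` yields ``a\x20b``, and
``decode_path`` reverses it while rejecting any backslash that is not
part of one of the three escape forms.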

This entire section about the escape mechanism should be clearly
labelled as being purely optional, as it is not relevant for Gentoo
(and would break backwards compatibility with existing package
manager implementations). Maybe add a reference to GLEP 31 too?

> The encoding can be used for other characters as well. In particular,
> escaping control characters is recommended to ensure that the file
> works correctly in text editors.

See above, this should not be "recommended", but literal control chars
should be strictly forbidden.

Ulrich