>>>>> On Wed, 22 Nov 2017, Michał Górny wrote:

> Path and filename encoding
> --------------------------

> The path fields in the Manifest file must consist of characters
> corresponding to valid UTF-8 code points excluding the NULL character
> (``U+0000``), the backwards slash (``\``) and characters classified
> as whitespace in the current version of the Unicode standard
> [#UNICODE]_.

As I said before, all C0 and C1 control characters and DEL should be
excluded as well, i.e. 0x00 to 0x1f, 0x7f, and 0x80 to 0x9f. Allowing
such characters in what is basically a text file is only asking for
trouble.
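
To make the ranges concrete, here is a minimal Python sketch of such a
check. The function name is made up for illustration; neither the GLEP
draft nor any package manager defines it.

```python
def has_control_chars(path: str) -> bool:
    """Return True if path contains a C0 control, DEL, or a C1 control.

    Hypothetical helper, not taken from any existing implementation.
    """
    # DEL (0x7f) sits directly below the C1 block (0x80 to 0x9f), so the
    # two collapse into one contiguous range.
    return any(cp <= 0x1F or 0x7F <= cp <= 0x9F for cp in map(ord, path))
```

With this, a path containing a literal tab or U+0085 is flagged, while
a path with ordinary non-ASCII letters such as "Michał/notes.txt" passes.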
16 |
|
17 |
> Any of the excluded characters that are present in path must be encoded |
18 |
> using one of the following escape sequences: |
19 |
|
20 |
> - characters in the ``U+0000`` to ``U+007F`` range can be encoded |
21 |
> as ``\xHH`` where ``HH`` specifies the zero-padded, hexadecimal |
22 |
> character code, |
23 |
|
24 |
> - characters in the ``U+0000`` to ``U+FFFF`` range can be encoded |
25 |
> as ``\uHHHH`` where ``HHHH`` specifies the zero-padded, hexadecimal |
26 |
> character code, |
27 |
|
28 |
> - characters in the UCS-4 range can be encoded as ``\UHHHHHHHH`` |
29 |
> where ``HHHHHHHH`` specifies the zero-padded, hexadecimal character |
30 |
> code. |
31 |
|
32 |
> It is invalid for backwards slash to be used in any other context, |
33 |
> and a backwards slash present in filename must be encoded. Backwards |
34 |
> slash used as path component separator should be replaced by forward |
35 |
> slash instead. |
36 |
|
37 |
This entire section about the escape mechanism should be clearly |
38 |
labelled as being purely optional, as it is not relevant for Gentoo |
39 |
(and would break backwards compatibility with existing package |
40 |
manager implementations). Maybe add a reference to GLEP 31 too? |
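
For illustration only, the quoted escape rules could be sketched in
Python roughly as follows. The function names are hypothetical, and
str.isspace() is only an approximation of "classified as whitespace in
the Unicode standard" (it also matches a few control characters), so
take this as a sketch rather than a reference implementation.

```python
import re

# One of the three escape forms from the quoted draft: \xHH, \uHHHH, \UHHHHHHHH.
_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))"
)


def encode_path(path: str) -> str:
    """Escape NUL, backslash and whitespace, using the shortest form."""
    out = []
    for ch in path:
        cp = ord(ch)
        # Assumption: str.isspace() stands in for the Unicode whitespace class.
        if ch == "\\" or cp == 0 or ch.isspace():
            if cp <= 0x7F:
                out.append(f"\\x{cp:02X}")
            elif cp <= 0xFFFF:
                out.append(f"\\u{cp:04X}")
            else:
                out.append(f"\\U{cp:08X}")
        else:
            out.append(ch)
    return "".join(out)


def _unescape(match: re.Match) -> str:
    cp = int(match.group(1) or match.group(2) or match.group(3), 16)
    # The draft limits \xHH to the U+0000..U+007F range.
    if match.group(1) and cp > 0x7F:
        raise ValueError("\\xHH escape outside U+0000..U+007F")
    return chr(cp)  # raises ValueError above U+10FFFF


def decode_path(field: str) -> str:
    """Decode a path field; a backslash outside a valid escape is an error."""
    if "\\" in _ESCAPE.sub("", field):
        raise ValueError("stray backslash in path field")
    return _ESCAPE.sub(_unescape, field)
```

For example, encode_path("my file.txt") gives my\x20file.txt, and
decode_path() rejects a field with a stray backslash such as a\qb.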
41 |
|
42 |
> The encoding can be used for other characters as well. In particular, |
43 |
> escaping control characters is recommended to ensure that the file |
44 |
> works correctly in text editors. |
45 |
|
46 |
See above, this should not be "recommended", but literal control chars |
47 |
should be strictly forbidden. |
48 |
|
49 |
Ulrich |