>>>>> On Mon, 20 Nov 2017, Ulrich Mueller wrote:

>>>>> On Mon, 20 Nov 2017, Michał Górny wrote:
>> All paths specified in the Manifest file must consist of characters
>> corresponding to valid UTF-8 code points excluding the NULL character
>> (``U+0000``) and characters classified as whitespace in the current
>> version of the Unicode standard [#UNICODE]_. It is an error to use
>> Manifest files in directories containing files whose names contain
>> the disallowed characters.

> See above. I believe that NUL and ASCII whitespace (i.e. characters
> 09 0a 0b 0c 0d 20) should be excluded, but excluding byte sequences
> like "e1 9a 80" (which is the UTF-8 encoding for U+1680 "OGHAM SPACE
> MARK") doesn't make sense.

Thinking about it, this still looks too complicated. So, exclude only
SPACE (0x20), which is used as the separator between fields. (NUL can
be excluded too, but it won't occur anyway.)

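To illustrate the simpler rule, here is a minimal Python sketch. It
assumes that a Manifest line is split into fields on SPACE only and that
paths are compared as raw bytes; the names FORBIDDEN and
path_is_acceptable are made up for this mail and are not taken from
Portage or any other tool.

```python
# Sketch only: assumes fields in a Manifest line are separated by
# SPACE (0x20), so a path must not contain that byte. NUL is rejected
# too, although it should never occur in a filename anyway.
FORBIDDEN = {0x00, 0x20}  # NUL and SPACE


def path_is_acceptable(path: bytes) -> bool:
    """Return True if the raw path contains no forbidden byte."""
    return not any(b in FORBIDDEN for b in path)


# U+1680 OGHAM SPACE MARK encodes as e1 9a 80 in UTF-8. None of these
# bytes is 0x20, so the name cannot be confused with a field separator.
ogham = "a\u1680b".encode("utf-8")
print(path_is_acceptable(b"files/foo-1.0.patch"))
print(path_is_acceptable(b"files/my patch"))
print(path_is_acceptable(ogham))
```

With this check, no knowledge of Unicode character classes is needed;
a byte-wise scan of the path is enough.
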
In fact, all Manifest files in the tree are ASCII only.
So alternatively, filenames could be restricted to printable ASCII.
This is also what GLEP 31 [1] says:

| Suitable Characters for File and Directory Names
|
| Characters outside the ASCII 0..127 range cannot safely be used for
| file or directory names. (Of course, not all characters inside the
| ASCII 0..127 range can be used safely either.)

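The stricter alternative could be sketched as follows, again in Python.
This assumes that "printable ASCII" means the range 0x21..0x7e, i.e.
with SPACE left out because it is the field separator. The function name
is hypothetical, and the check is looser than GLEP 31, which notes that
not every ASCII character is safe either.

```python
def is_printable_ascii_name(name: str) -> bool:
    """Return True if name is non-empty and uses only 0x21..0x7e.

    Assumption for this sketch: SPACE (0x20) is excluded because it
    separates fields; control characters and DEL are excluded as
    non-printable.
    """
    return bool(name) and all(0x21 <= ord(ch) <= 0x7E for ch in name)


print(is_printable_ascii_name("foo-1.0.ebuild"))
print(is_printable_ascii_name("Michał"))
print(is_printable_ascii_name("a b"))
```
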
Ulrich


[1] Character Sets for Portage Tree Items
https://www.gentoo.org/glep/glep-0031.html