Gentoo Archives: gentoo-commits

From: "Michał Górny" <mgorny@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] data/glep:glep-manifest commit in: /
Date: Thu, 23 Nov 2017 18:45:55
Message-Id: 1511462259.ed111f85c3e7ab98678ee0379589281a2c92380c.mgorny@gentoo
commit: ed111f85c3e7ab98678ee0379589281a2c92380c
Author: Michał Górny <mgorny <AT> gentoo <DOT> org>
AuthorDate: Thu Nov 23 18:37:39 2017 +0000
Commit: Michał Górny <mgorny <AT> gentoo <DOT> org>
CommitDate: Thu Nov 23 18:37:39 2017 +0000
URL: https://gitweb.gentoo.org/data/glep.git/commit/?id=ed111f85

glep-0074: Always exclude control characters

 glep-0074.rst | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)
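
The first hunk of this diff widens the NULL-only exclusion to the whole class of Unicode control characters, next to the backwards slash and whitespace. A minimal Python sketch of the resulting rule follows. It makes several assumptions. "Control characters" is taken to mean Unicode general category Cc, and `str.isspace()` stands in for the Unicode whitespace property (close, but not guaranteed identical). The escape forms `\xHH`, `\uHHHH` and `\UHHHHHHHH`, chosen by code point range, are also assumed, because the list of escape sequences lies outside the hunks shown here. `escape_path()` and `unescape_path()` are hypothetical helper names.

```python
import re
import unicodedata

# Sketch of the glep-0074 path-character rule after this commit.
# ASSUMPTIONS (not shown in this diff): escape forms \xHH, \uHHHH and
# \UHHHHHHHH picked by code point range; input paths already use "/".


def needs_escape(ch: str) -> bool:
    """True if ch may not appear literally in a Manifest path field."""
    if ch == "\\":
        return True  # backwards slash is reserved for escape sequences
    if unicodedata.category(ch) == "Cc":
        return True  # Unicode control characters, including U+0000
    # str.isspace() approximates the Unicode White_Space property.
    return ch.isspace()


def escape_char(ch: str) -> str:
    """Encode one character using the (assumed) escape forms."""
    cp = ord(ch)
    if cp <= 0x7F:
        return f"\\x{cp:02X}"
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    return f"\\U{cp:08X}"


def escape_path(path: str) -> str:
    """Hypothetical helper: escape only the excluded characters."""
    return "".join(escape_char(c) if needs_escape(c) else c for c in path)


_ESCAPE_RE = re.compile(
    r"\\x([0-9A-Fa-f]{2})|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})"
)


def unescape_path(field: str) -> str:
    """Hypothetical inverse of escape_path()."""
    def decode(m):
        # Exactly one of the three groups matched; decode its hex digits.
        digits = next(g for g in m.groups() if g is not None)
        return chr(int(digits, 16))
    return _ESCAPE_RE.sub(decode, field)


if __name__ == "__main__":
    path = "foo bar\\baz\x00\u00a0.txt"
    field = escape_path(path)
    print(field)                         # foo\x20bar\x5Cbaz\x00\u00A0.txt
    print(field.split() == [field])      # True: no whitespace remains
    print(unescape_path(field) == path)  # True
```

The sketch escapes only the excluded characters. The second hunk notes that the encoding may also be applied to other characters, such as non-printable ones, and a variant of `needs_escape()` could opt into that.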

diff --git a/glep-0074.rst b/glep-0074.rst
index 8687969..6db6caa 100644
--- a/glep-0074.rst
+++ b/glep-0074.rst
@@ -138,10 +138,9 @@ Path and filename encoding
 --------------------------

 The path fields in the Manifest file must consist of characters
-corresponding to valid UTF-8 code points excluding the NULL character
-(``U+0000``), the backwards slash (``\``) and characters classified
-as whitespace in the current version of the Unicode standard
-[#UNICODE]_.
+corresponding to valid UTF-8 code points excluding the backwards slash
+(``\``) and characters classified as control characters and whitespace
+in the current version of the Unicode standard [#UNICODE]_.

 Any of the excluded characters that are present in path must be encoded
 using one of the following escape sequences:
@@ -164,8 +163,7 @@ slash used as path component separator should be replaced by forward
 slash instead.

 The encoding can be used for other characters as well. In particular,
-escaping control characters is recommended to ensure that the file
-works correctly in text editors.
+escaping non-printable characters might be desirable.


 File verification
@@ -593,16 +591,18 @@ This specification aims to avoid arbitrary restrictions. For this
 reason, filename characters are only restricted by excluding three
 technically problematic groups:

-1. The NULL character (``U+0000``) is normally used to indicate the end
-   of a null-terminated string. Its use could therefore break programs
-   written using C. Furthermore, it is not allowed in any known
-   filesystem.
-
-2. The backwards slash character (``\``) is used as path separator
+1. The backwards slash character (``\``) is used as path separator
    on Windows systems, so it's extremely unlikely to be used in real
    filenames. For this reason it is used to implement character
    encoding with minimal risk of breaking backwards compatibility.

+2. The control characters can trigger special behavior in various
+   programs and confuse them from recognizing text files. In particular,
+   the NULL character (``U+0000``) is normally used to indicate the end
+   of a null-terminated string. Its use could therefore break
+   implementations written in the C language. Other control characters
+   could trigger various formatting routines, garbling text output.
+
 3. Whitespace characters are used to separate Manifest fields
    and entries. While technically it would be enough to restrict space
    (``U+0020``) character that is normally used as the separator
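
The last hunk argues that a literal NULL character could break C implementations, because C marks the end of a string with it. Python's standard `ctypes` module makes this visible without writing C. `ctypes.c_char_p` reads its value as a NUL-terminated `char*`, so anything after the first U+0000 is dropped. This minimal sketch rests on some assumptions: the path is hypothetical, the `\x00` escape form is assumed as in the path-encoding rule, and `seen_by_c()` is a hypothetical helper.

```python
import ctypes

# Hypothetical path containing an embedded NULL character (U+0000).
raw_path = "dir/foo\x00bar.txt"
# The same path with U+0000 written as the (assumed) \x00 escape sequence.
escaped_path = raw_path.replace("\x00", "\\x00")


def seen_by_c(path: str) -> bytes:
    """Return the bytes a C routine would see through a char* to path.

    ctypes.c_char_p exposes its value as a NUL-terminated C string, so
    everything after the first U+0000 is silently dropped.
    """
    return ctypes.c_char_p(path.encode("utf-8")).value


print(seen_by_c(raw_path))      # b'dir/foo'
print(seen_by_c(escaped_path))  # b'dir/foo\\x00bar.txt'
```

Silent truncation is only one failure mode. Python's own `open()` refuses such a path outright with `ValueError: embedded null byte`. Once the character is escaped, the path passes through either kind of API unchanged.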