commit: ed111f85c3e7ab98678ee0379589281a2c92380c
Author: Michał Górny <mgorny <AT> gentoo <DOT> org>
AuthorDate: Thu Nov 23 18:37:39 2017 +0000
Commit: Michał Górny <mgorny <AT> gentoo <DOT> org>
CommitDate: Thu Nov 23 18:37:39 2017 +0000
URL: https://gitweb.gentoo.org/data/glep.git/commit/?id=ed111f85

glep-0074: Always exclude control characters

glep-0074.rst | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/glep-0074.rst b/glep-0074.rst
index 8687969..6db6caa 100644
--- a/glep-0074.rst
+++ b/glep-0074.rst
@@ -138,10 +138,9 @@ Path and filename encoding
--------------------------

The path fields in the Manifest file must consist of characters
-corresponding to valid UTF-8 code points excluding the NULL character
-(``U+0000``), the backwards slash (``\``) and characters classified
-as whitespace in the current version of the Unicode standard
-[#UNICODE]_.
+corresponding to valid UTF-8 code points excluding the backwards slash
+(``\``) and characters classified as control characters and whitespace
+in the current version of the Unicode standard [#UNICODE]_.

Any of the excluded characters that are present in path must be encoded
using one of the following escape sequences:
@@ -164,8 +163,7 @@ slash used as path component separator should be replaced by forward
slash instead.

The encoding can be used for other characters as well. In particular,
-escaping control characters is recommended to ensure that the file
-works correctly in text editors.
+escaping non-printable characters might be desirable.


File verification
@@ -593,16 +591,18 @@ This specification aims to avoid arbitrary restrictions. For this
reason, filename characters are only restricted by excluding three
technically problematic groups:

-1. The NULL character (``U+0000``) is normally used to indicate the end
- of a null-terminated string. Its use could therefore break programs
- written using C. Furthermore, it is not allowed in any known
- filesystem.
-
-2. The backwards slash character (``\``) is used as path separator
+1. The backwards slash character (``\``) is used as path separator
on Windows systems, so it's extremely unlikely to be used in real
filenames. For this reason it is used to implement character
encoding with minimal risk of breaking backwards compatibility.

+2. The control characters can trigger special behavior in various
+ programs and confuse them from recognizing text files. In particular,
+ the NULL character (``U+0000``) is normally used to indicate the end
+ of a null-terminated string. Its use could therefore break
+ implementations written in the C language. Other control characters
+ could trigger various formatting routines, garbling text output.
+
3. Whitespace characters are used to separate Manifest fields
and entries. While technically it would be enough to restrict space
(``U+0020``) character that is normally used as the separator