commit: 9d819c9a981416936dcda2f55e54ea70e494e59e
Author: Michał Górny <mgorny <AT> gentoo <DOT> org>
AuthorDate: Mon Nov 20 18:40:41 2017 +0000
Commit: Michał Górny <mgorny <AT> gentoo <DOT> org>
CommitDate: Mon Nov 20 18:40:41 2017 +0000
URL: https://gitweb.gentoo.org/data/glep.git/commit/?id=9d819c9a

glep-0074: Disallow filenames containing whitespace

 glep-0074.rst | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/glep-0074.rst b/glep-0074.rst
index f96a58e..46ad9fe 100644
--- a/glep-0074.rst
+++ b/glep-0074.rst
@@ -132,6 +132,13 @@ are not otherwise ignored reside on a different filesystem, or symbolic
 links point to targets on a different filesystem, they must
 be explicitly excluded via ``IGNORE``.

+All paths specified in the Manifest file must consist of characters
+corresponding to valid UTF-8 code points excluding the NULL character
+(``U+0000``) and characters classified as whitespace in the current
+version of the Unicode standard [#UNICODE]_. It is an error to use
+Manifest files in directories containing files whose names contain
+the disallowed characters.
+

 File verification
 -----------------
@@ -542,6 +549,45 @@ In particular, tools might then claim that a file does not exist when
 it clearly does because it was skipped due to filesystem boundaries.


+Filename character set restriction
+----------------------------------
+
+The valid set of filename characters for the Gentoo repository
+is restricted by the devmanual 'File Naming Rules' section
+[#FILE-NAMING-RULES]_, and enforced via a git hook. The valid distfile
+names are not restricted explicitly -- however, the PMS dependency
+specification syntax [#PMS-FETCH]_ implicitly makes it impossible to use
+filenames containing whitespace.
+
+This specification aims to avoid arbitrary restrictions. For this
+reason, the filename characters are only restricted by excluding two
+technically problematic groups:
+
+1. The NULL character (``U+0000``) is normally used to indicate the end
+   of a null-terminated string. Its use could therefore break programs
+   written using C. Furthermore, it is not allowed in any known
+   filesystem.
+
+2. The whitespace characters are used to separate Manifest fields. While
+   technically it would be enough to restrict the space (``U+0020``)
+   character that is normally used as the separator, all whitespace
+   characters are forbidden to avoid confusion and implementation
+   errors.
+
+While the specification could be extended to allow such filenames
+by using some form of escaping, there is currently no apparent need
+for such a feature.
+
+Historically, Portage attempted to overcome the whitespace limitation
+by locating the size field and treating everything before it as
+the filename. This was fragile and, even when it worked, it solved
+the problem only partially.
+
+Since the same restrictions apply to ``IGNORE`` rules, a file whose name
+contains whitespace can currently be neither listed nor ignored.
+Therefore, the presence of such files is forbidden entirely.
+
+
 File verification model
 -----------------------

@@ -880,10 +926,16 @@ References
 .. [#GLEP61] GLEP 61: Manifest2 compression
    (https://www.gentoo.org/glep/glep-0061.html)

+.. [#UNICODE] The Unicode standard
+   (https://unicode.org/versions/latest/)
+
 .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
    Format - SRC_URI
    (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)

+.. [#FILE-NAMING-RULES] Ebuild File Format -- Gentoo Development Guide
+   (https://devmanual.gentoo.org/ebuild-writing/file-format/#file-naming-rules)
+
 .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
    (https://www.ietf.org/rfc/rfc1321.txt)
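
As an illustration of the path restriction this commit adds to glep-0074.rst, a check along the following lines might be used. This is only a minimal sketch: the names ``is_valid_manifest_path`` and ``WHITE_SPACE`` are hypothetical and come neither from the GLEP nor from Portage. It also assumes that "classified as whitespace" refers to the Unicode ``White_Space`` property, and it hard-codes those code points instead of tracking the current version of the standard.

```python
# Code points carrying the Unicode White_Space property.
# Assumption: this is what the GLEP text means by "whitespace"; the table is
# hard-coded for the sketch and would need updating if the standard changes.
WHITE_SPACE = frozenset(
    [chr(c) for c in range(0x0009, 0x000E)]      # TAB, LF, VT, FF, CR
    + [chr(c) for c in range(0x2000, 0x200B)]    # EN QUAD .. HAIR SPACE
    + [chr(c) for c in (0x0020, 0x0085, 0x00A0, 0x1680,
                        0x2028, 0x2029, 0x202F, 0x205F, 0x3000)]
)


def is_valid_manifest_path(raw: bytes) -> bool:
    """Return True if the raw path bytes satisfy the restriction (hypothetical helper)."""
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Reject the NULL character and any whitespace character.
    return not any(ch == "\x00" or ch in WHITE_SPACE for ch in text)
```

Under these assumptions, a name such as ``b"foo bar.patch"`` is rejected because of the space, while non-ASCII names that are valid UTF-8 and contain no whitespace are accepted.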