Gentoo Archives: gentoo-dev

From: Ulrich Mueller <ulm@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v2]
Date: Mon, 20 Nov 2017 21:37:34
Message-Id: 23059.19206.553582.655241@a1i15.kph.uni-mainz.de
In Reply to: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v2] by "Michał Górny"
>>>>> On Mon, 20 Nov 2017, Michał Górny wrote:

> New changes:

> 9d819c9 glep-0074: Disallow filenames containing whitespace
> 4124b2f glep-0074: Explicitly specify UTF-8 encoding
> 7f9bd9f glep-0074: Include suggestions from Daniel Campbell

Here are a few comments (quoting below only the parts of the text
referenced by them):

> The Manifest files use UTF-8 encoding.

I don't understand the purpose of that requirement. The only place
where bytes outside of the ASCII range can occur are names of
distfiles, and these should simply be passed transparently. Otherwise,
you would have to reject any sequence of non-ASCII bytes that doesn't
form a valid UTF-8 sequence, which looks like an arbitrary restriction
to me.

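To make the concern concrete, here is a minimal Python sketch. The
helper name is_valid_utf8 and the sample distfile names are made up for
illustration and appear nowhere in the GLEP. It shows how a strict
UTF-8 requirement would reject a name that is a perfectly legal byte
string on most filesystems:

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical distfile names, given as raw bytes.
utf8_name = "f\u00fcr.tar.gz".encode("utf-8")      # b'f\xc3\xbcr.tar.gz'
latin1_name = "f\u00fcr.tar.gz".encode("latin-1")  # b'f\xfcr.tar.gz'

print(is_valid_utf8(utf8_name))    # accepted under a UTF-8 requirement
print(is_valid_utf8(latin1_name))  # rejected, although the filename is legal
```

Passing the bytes through untouched would need no such check at all.
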
> It is an error for a single file to be matched by multiple entries
> of different semantics, file size or checksum values. It is an error
> to specify another entry for a file matching ``IGNORE``, or one of its
> subdirectories.

What about regular files in a directory (or subdirectory) matched by
IGNORE? Looks like this case is not covered (?).

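For illustration, this is the behaviour I would expect, sketched in
Python. The function name is_ignored is hypothetical, and the rule that
everything below an ignored directory counts as covered is my
assumption of the intent, not something the quoted text states:

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(path: str, ignore_entries: Iterable[str]) -> bool:
    """Return True if path equals an IGNORE entry or lies anywhere below it.

    Assumption: an IGNORE entry naming a directory covers regular files
    and subdirectories at any depth beneath it.
    """
    candidate = PurePosixPath(path)
    for entry in ignore_entries:
        ignored = PurePosixPath(entry)
        if candidate == ignored or ignored in candidate.parents:
            return True
    return False


print(is_ignored("distfiles/foo.tar.gz", ["distfiles"]))
print(is_ignored("distfiles/sub/foo.tar.gz", ["distfiles"]))
print(is_ignored("distfiles2/foo.tar.gz", ["distfiles"]))
```

Under this reading the first two paths are covered and the third is
not, because matching is done on whole path components rather than on
string prefixes.
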
> All paths specified in the Manifest file must consist of characters
> corresponding to valid UTF-8 code points excluding the NULL character
> (``U+0000``) and characters classified as whitespace in the current
> version of the Unicode standard [#UNICODE]_. It is an error to use
> Manifest files in directories containing files whose names contain
> the disallowed characters.

See above. I believe that NUL and ASCII whitespace (i.e. characters 09
0a 0b 0c 0d 20) should be excluded, but excluding byte sequences like
"e1 9a 80" (which is the UTF-8 encoding for U+1680 "OGHAM SPACE MARK")
doesn't make sense.

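A byte-level check along these lines would be enough. This is only a
sketch; the names FORBIDDEN_BYTES and path_is_acceptable are
hypothetical:

```python
# NUL plus the six ASCII whitespace bytes listed above.
FORBIDDEN_BYTES = frozenset(b"\x00\x09\x0a\x0b\x0c\x0d\x20")


def path_is_acceptable(path: bytes) -> bool:
    """Return True if the path contains no NUL and no ASCII whitespace byte."""
    return not any(byte in FORBIDDEN_BYTES for byte in path)


ogham = "\u1680".encode("utf-8")  # b'\xe1\x9a\x80'

print(path_is_acceptable(b"foo" + ogham + b"bar"))  # passes: no forbidden byte
print(path_is_acceptable(b"foo bar"))               # fails: contains 0x20
print(path_is_acceptable(b"foo\x00bar"))            # fails: contains NUL
```

No Unicode tables are needed, and the result does not depend on which
version of the Unicode standard happens to be current.
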
> During the verification process, the client should compare the timestamp
> against the update time obtained from a local clock or a trusted time
> source. If the comparison result indicates that the Manifest at the time
> of receiving was already significantly outdated, the client should
> either fail the verification or require manual confirmation from user.

s/from user./from the user./

> ``TIMESTAMP <iso8601>``
> Specifies a timestamp of when the Manifest file was last updated.
> The timestamp must be a valid second-precision ISO8601 extended format

s/ISO8601/ISO 8601/

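As an aside, a second-precision extended-format timestamp is easy to
parse strictly. A minimal Python sketch follows; the function name
parse_timestamp is hypothetical, and the assumption that the value
always ends in "Z" (UTC) is taken from the example further down rather
than from the quoted sentence:

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2017-10-30T10:11:12Z.

    Assumption: extended format, second precision, literal trailing "Z".
    Anything else raises ValueError.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(parse_timestamp("2017-10-30T10:11:12Z").isoformat())
```

The basic format without separators (20171030T101112Z) is rejected by
this pattern.
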
> ``IGNORE <path>``
> Ignores a subdirectory or file from Manifest checks. If the specified
> path is present, it and its contents are omitted from the Manifest
> verification (always pass). *Path* must be a plain file or directory
> path without a trailing slash, and must not contain wildcards.

What does that mean? Wildcards are not special (so "foo*" will match
literally), or wildcard characters like "*" are not allowed at all?

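The readings give different results, which is why the wording matters.
A small Python sketch shows the difference; the sample names and the
helper contains_wildcard are made up, and treating "*", "?" and "[" as
the wildcard characters is my assumption:

```python
import fnmatch

entry = "foo*"
names = ["foo*", "foobar"]

# Reading 1: wildcards are not special, the entry matches literally.
literal_matches = [name for name in names if name == entry]

# What the text presumably wants to rule out: shell-style expansion.
glob_matches = [name for name in names if fnmatch.fnmatchcase(name, entry)]


# Reading 2: wildcard characters are not allowed in the path at all.
def contains_wildcard(path: str) -> bool:
    return any(char in path for char in "*?[")


print(literal_matches)
print(glob_matches)
print(contains_wildcard(entry))
```

Under the first reading "IGNORE foo*" is valid and ignores only a file
literally named "foo*"; under the second the entry itself is an error.
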
> ``AUX <filename> <size> <checksums>...``
> Equivalent to the ``DATA`` type, except that the filename is relative
> to ``files/`` subdirectory.

s/to/to the/

> 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
> files according to `file verification`_ section, and include their

s/according to/according to the/

> 6. Verify the entries in *covered* set for incompatible duplicates

s/in *covered* set/in the *covered* set/

> 7. Verify all the files in the union of the *present* and *covered*
> sets, according to `file verification`_ section.

s/to/to the/

> a. If a ``IGNORE`` entry in the ``Manifest`` file covers
> the *original* directory (or one of the parent directories), stop.

s/a ``IGNORE`` entry/an ``IGNORE`` entry/

> An example top-level Manifest file for the Gentoo repository would have
> the following content::

> TIMESTAMP 2017-10-30T10:11:12Z
> IGNORE distfiles
> IGNORE local
> IGNORE lost+found
> IGNORE packages
> MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
> ...
> MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
> ...

> An example modern Manifest (disregarding backwards compatibility)
> for a package directory would have the following content::

> DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
> DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
> DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
> DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
> DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
> DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
> DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..

Update hashes to BLAKE2B SHA512?

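For reference, both digests are available in Python's hashlib, so
regenerating the examples would be cheap. A rough sketch follows. The
function name data_entry is hypothetical, the field order mirrors the
quoted example with BLAKE2B substituted for SHA256 (my assumption), and
the sample content is invented:

```python
import hashlib


def data_entry(filename: str, content: bytes) -> str:
    """Format a DATA line with BLAKE2B and SHA512 checksums.

    Assumption: "DATA <filename> <size> BLAKE2B <hex> SHA512 <hex>",
    with full-length hexadecimal digests.
    """
    blake2b = hashlib.blake2b(content).hexdigest()
    sha512 = hashlib.sha512(content).hexdigest()
    return f"DATA {filename} {len(content)} BLAKE2B {blake2b} SHA512 {sha512}"


sample = b"<pkgmetadata/>\n"  # invented file content
print(data_entry("metadata.xml", sample))
```
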
> This specification aims to avoid arbitrary restrictions. For this
> reason, the filename characters are only restricted by excluding two

s/the filename characters/filename characters/

> technically problematic groups:

> 1. The NULL character (``U+0000``) is normally used to indicate the end
> of a null-terminated string. Its use could therefore break programs
> written using C. Furthermore, it is not allowed in any known
> filesystem.

> 2. The whitespace characters are used to separate Manifest fields. While

s/The whitespace characters/Whitespace characters/

> 2. being able to run update automatically generated files locally
> without causing unnecessary verification failures.

Strike the word "run"?

> Strictly speaking, this information is already provided by the various
> ``metadata/timestamp*`` files that are already present. However,

Twice "already" in this sentence.

> The OpenPGP cleartext signature covers the contents of the Manifest,
> and is therefore compressed along with them. The possibility of using
> detached signature has been considered but it was rejected as

s/detached signature/a detached signature/

> The existence of additional entries for uncompressed Manifest checksums
> was debated. However, plain entries for the uncompressed file would
> be confusing if only the compressed file existed, and conflicting
> if both uncompressed and compressed variants existed. Furthermore,
> it has been pointed out that ``DIST`` entries do not have uncompressed
> variant either.

s/uncompressed variant/an uncompressed variant/

> .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
> at the time of writing are duplicate, representing a 2 MiB
> out of 25 MiB of DIST entries altogether.

s/a 2 MiB/2 MiB/

> Copyright
> =========

There should be two blank lines before this section heading (as
required by GLEP 2).

Ulrich
