Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v2]
Date: Tue, 21 Nov 2017 17:14:27
Message-Id: 1511284454.1074.11.camel@gentoo.org
In Reply to: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v2] by Ulrich Mueller
1 W dniu pon, 20.11.2017 o godzinie 22∶37 +0100, użytkownik Ulrich Mueller
2 napisał:
3 > > > > > > On Mon, 20 Nov 2017, Michał Górny wrote:
4 > > New changes:
5 > > 9d819c9 glep-0074: Disallow filenames containing whitespace
6 > > 4124b2f glep-0074: Explicitly specify UTF-8 encoding
7 > > 7f9bd9f glep-0074: Include suggestions from Daniel Campbell
8 >
9 > Here are a few comments (quoting below only the parts of the text
10 > referenced by them):
11 >
12 > > The Manifest files use UTF-8 encoding.
13 >
14 > I don't understand the purpose of that requirement. The only place
15 > where bytes outside of the ASCII range can occur are names of
16 > distfiles, and these should simply be passed transparently. Otherwise,
17 > you would have to reject any sequence of non-ASCII bytes that doesn't
18 > form a valid UTF-8 sequence, which looks like an arbitrary restriction
19 > to me.
20
21 Let me reply in parts.
22
23 Why not plain ASCII? Because the spec tries to avoid entirely arbitrary
24 restrictions, and forcing everyone to use just ASCII entirely counts
25 as such.
26
27 Why not plain bytestring? Mostly because it's really PITA to work
28 on them in Python. Besides, you can't allow arbitrary bytestring since
29 you still need to apply restrictions making it safe to parse in text
30 context, i.e. forbid 0x20, 0x0A, possibly more. Which makes
31 the definition kinda silly in the end. Not to mention transferring files
32 over systems which can recode filenames but will not recode Manifest
33 contents.
34
35 Why UTF-8 then? Because it's quite reliable and widely established.
36 It works for most of the people out of the box. Those who use other
37 encodings can usually transcode reliably. It's what we're using
38 in ebuilds and everywhere else wrt GLEP 31, so I don't think we should
39 make Manifests any different.
40
41 > > It is an error for a single file to be matched by multiple entries
42 > > of different semantics, file size or checksum values. It is an error
43 > > to specify another entry for a file matching ``IGNORE``, or one of its
44 > > subdirectories.
45 >
46 > What about regular files in a directory (or subdirectory) matched by
47 > IGNORE? Looks like this case is not covered (?).
48
49 Ignored regular files must not have any other (e.g. DATA) entries.
50 Otherwise the expected behavior is unclear -- are we supposed to verify
51 the file or ignore it?
52
53 > > All paths specified in the Manifest file must consist of characters
54 > > corresponding to valid UTF-8 code points excluding the NULL character
55 > > (``U+0000``) and characters classified as whitespace in the current
56 > > version of the Unicode standard [#UNICODE]_. It is an error to use
57 > > Manifest files in directories containing files whose names contain
58 > > the disallowed characters.
59 >
60 > See above. I believe that NUL and ASCII whitespace (i.e. characters 09
61 > 0a 0b 0c 0d 20) should be excluded, but excluding byte sequences like
62 > "e1 9a 80" (which is the UTF-8 encoding for U+1680 "OGHAM SPACE MARK")
63 > doesn't make sense.
64
65 The restriction is meant to be intentionally wider to prevent problems
66 with implementations which e.g. use Python's str.split() or '\S' regular
67 expression character (Portage). When working in Unicode-compliant mode,
68 those can match additional whitespace characters, and I'm rejecting them
69 to be on the safe side.
70
71 > > During the verification process, the client should compare the timestamp
72 > > against the update time obtained from a local clock or a trusted time
73 > > source. If the comparison result indicates that the Manifest at the time
74 > > of receiving was already significantly outdated, the client should
75 > > either fail the verification or require manual confirmation from user.
76 >
77 > s/from user./from the user./
78 >
79 > > ``TIMESTAMP <iso8601>``
80 > > Specifies a timestamp of when the Manifest file was last updated.
81 > > The timestamp must be a valid second-precision ISO8601 extended format
82 >
83 > s/ISO8601/ISO 8601/
84
85 Both done.
86
87 >
88 > > ``IGNORE <path>``
89 > > Ignores a subdirectory or file from Manifest checks. If the specified
90 > > path is present, it and its contents are omitted from the Manifest
91 > > verification (always pass). *Path* must be a plain file or directory
92 > > path without a trailing slash, and must not contain wildcards.
93 >
94 > What does that mean? Wildcards are not special (so "foo*" will match
95 > literally), or wildcard characters like "*" are not allowed at all?
96
97 Not special. Will reword to:
98
99 | Wildcards are not supported and wildcard characters are interpreted
100 | literally.
101
102 >
103 > > ``AUX <filename> <size> <checksums>...``
104 > > Equivalent to the ``DATA`` type, except that the filename is relative
105 > > to ``files/`` subdirectory.
106 >
107 > s/to/to the/
108 >
109 > > 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
110 > > files according to `file verification`_ section, and include their
111 >
112 > s/according to/according to the/
113 >
114 > > 6. Verify the entries in *covered* set for incompatible duplicates
115 >
116 > s/in *covered* set/in the *covered* set/
117 >
118 > > 7. Verify all the files in the union of the *present* and *covered*
119 > > sets, according to `file verification`_ section.
120 >
121 > s/to/to the/
122 >
123 > > a. If a ``IGNORE`` entry in the ``Manifest`` file covers
124 > > the *original* directory (or one of the parent directories), stop.
125 >
126 > s/a ``IGNORE`` entry/an ``IGNORE`` entry/
127
128 All done.
129
130 >
131 > > An example top-level Manifest file for the Gentoo repository would have
132 > > the following content::
133 > > TIMESTAMP 2017-10-30T10:11:12Z
134 > > IGNORE distfiles
135 > > IGNORE local
136 > > IGNORE lost+found
137 > > IGNORE packages
138 > > MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
139 > > ...
140 > > MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
141 > > ...
142 > > An example modern Manifest (disregarding backwards compatibility)
143 > > for a package directory would have the following content::
144 > > DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
145 > > DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
146 > > DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
147 > > DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
148 > > DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
149 > > DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
150 > > DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
151 >
152 > Update hashes to BLAKE2B SHA512?
153
154 I don't really want to go through the hoop of updating the first two
155 bytes of each hash, and I don't think it'd be nice to replace the key
156 while keeping incorrect value. Even that it does not serve any purpose.
157
158 >
159 > > This specification aims to avoid arbitrary restrictions. For this
160 > > reason, the filename characters are only restricted by excluding two
161 >
162 > s/the filename characters/filename characters/
163 >
164 > > technically problematic groups:
165 > > 1. The NULL character (``U+0000``) is normally used to indicate the end
166 > > of a null-terminated string. Its use could therefore break programs
167 > > written using C. Furthermore, it is not allowed in any known
168 > > filesystem.
169 > > 2. The whitespace characters are used to separate Manifest fields. While
170 >
171 > s/The whitespace characters/Whitespace characters/
172 >
173 > > 2. being able to run update automatically generated files locally
174 > > without causing unnecessary verification failures.
175 >
176 > Strike the word "run"?
177 >
178 > > Strictly speaking, this information is already provided by the various
179 > > ``metadata/timestamp*`` files that are already present. However,
180 >
181 > Twice "already" in this sentence.
182 >
183 > > The OpenPGP cleartext signature covers the contents of the Manifest,
184 > > and is therefore compressed along with them. The possibility of using
185 > > detached signature has been considered but it was rejected as
186 >
187 > s/detached signature/a detached signature/
188 >
189 > > The existence of additional entries for uncompressed Manifest checksums
190 > > was debated. However, plain entries for the uncompressed file would
191 > > be confusing if only the compressed file existed, and conflicting
192 > > if both uncompressed and compressed variants existed. Furthermore,
193 > > it has been pointed out that ``DIST`` entries do not have uncompressed
194 > > variant either.
195 >
196 > s/uncompressed variant/an uncompressed variant/
197 >
198 > > .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
199 > > at the time of writing are duplicate, representing a 2 MiB
200 > > out of 25 MiB of DIST entries altogether.
201 >
202 > s/a 2 MiB/2 MiB/
203 >
204 > > Copyright
205 > > =========
206 >
207 > There should be two blank lines before this section heading (as
208 > required by GLEP 2).
209 >
210 > Ulrich
211
212 All fixed.
213
214 --
215 Best regards,
216 Michał Górny

Replies

Subject Author
Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v2] Ulrich Mueller <ulm@g.o>