1 |
W dniu pon, 20.11.2017 o godzinie 22∶37 +0100, użytkownik Ulrich Mueller |
2 |
napisał: |
3 |
> > > > > > On Mon, 20 Nov 2017, Michał Górny wrote: |
4 |
> > New changes: |
5 |
> > 9d819c9 glep-0074: Disallow filenames containing whitespace |
6 |
> > 4124b2f glep-0074: Explicitly specify UTF-8 encoding |
7 |
> > 7f9bd9f glep-0074: Include suggestions from Daniel Campbell |
8 |
> |
9 |
> Here are a few comments (quoting below only the parts of the text |
10 |
> referenced by them): |
11 |
> |
12 |
> > The Manifest files use UTF-8 encoding. |
13 |
> |
14 |
> I don't understand the purpose of that requirement. The only place |
15 |
> where bytes outside of the ASCII range can occur are names of |
16 |
> distfiles, and these should simply be passed transparently. Otherwise, |
17 |
> you would have to reject any sequence of non-ASCII bytes that doesn't |
18 |
> form a valid UTF-8 sequence, which looks like an arbitrary restriction |
19 |
> to me. |
20 |
|
21 |
Let me reply in parts. |
22 |
|
23 |
Why not plain ASCII? Because the spec tries to avoid entirely arbitrary |
24 |
restrictions, and forcing everyone to use just ASCII entirely counts |
25 |
as such. |
26 |
|
27 |
Why not plain bytestring? Mostly because it's really PITA to work |
28 |
on them in Python. Besides, you can't allow arbitrary bytestring since |
29 |
you still need to apply restrictions making it safe to parse in text |
30 |
context, i.e. forbid 0x20, 0x0A, possibly more. Which makes |
31 |
the definition kinda silly in the end. Not to mention transferring files |
32 |
over systems which can recode filenames but will not recode Manifest |
33 |
contents. |
34 |
|
35 |
Why UTF-8 then? Because it's quite reliable and widely established. |
36 |
It works for most of the people out of the box. Those who use other |
37 |
encodings can usually transcode reliably. It's what we're using |
38 |
in ebuilds and everywhere else wrt GLEP 31, so I don't think we should |
39 |
make Manifests any different. |
40 |
|
41 |
> > It is an error for a single file to be matched by multiple entries |
42 |
> > of different semantics, file size or checksum values. It is an error |
43 |
> > to specify another entry for a file matching ``IGNORE``, or one of its |
44 |
> > subdirectories. |
45 |
> |
46 |
> What about regular files in a directory (or subdirectory) matched by |
47 |
> IGNORE? Looks like this case is not covered (?). |
48 |
|
49 |
Ignored regular files must not have any other (e.g. DATA) entries. |
50 |
Otherwise the expected behavior is unclear -- are we supposed to verify |
51 |
the file or ignore it? |
52 |
|
53 |
> > All paths specified in the Manifest file must consist of characters |
54 |
> > corresponding to valid UTF-8 code points excluding the NULL character |
55 |
> > (``U+0000``) and characters classified as whitespace in the current |
56 |
> > version of the Unicode standard [#UNICODE]_. It is an error to use |
57 |
> > Manifest files in directories containing files whose names contain |
58 |
> > the disallowed characters. |
59 |
> |
60 |
> See above. I believe that NUL and ASCII whitespace (i.e. characters 09 |
61 |
> 0a 0b 0c 0d 20) should be excluded, but excluding byte sequences like |
62 |
> "e1 9a 80" (which is the UTF-8 encoding for U+1680 "OGHAM SPACE MARK") |
63 |
> doesn't make sense. |
64 |
|
65 |
The restriction is meant to be intentionally wider to prevent problems |
66 |
with implementations which e.g. use Python's str.split() or '\S' regular |
67 |
expression character (Portage). When working in Unicode-compliant mode, |
68 |
those can match additional whitespace characters, and I'm rejecting them |
69 |
to be on the safe side. |
70 |
|
71 |
> > During the verification process, the client should compare the timestamp |
72 |
> > against the update time obtained from a local clock or a trusted time |
73 |
> > source. If the comparison result indicates that the Manifest at the time |
74 |
> > of receiving was already significantly outdated, the client should |
75 |
> > either fail the verification or require manual confirmation from user. |
76 |
> |
77 |
> s/from user./from the user./ |
78 |
> |
79 |
> > ``TIMESTAMP <iso8601>`` |
80 |
> > Specifies a timestamp of when the Manifest file was last updated. |
81 |
> > The timestamp must be a valid second-precision ISO8601 extended format |
82 |
> |
83 |
> s/ISO8601/ISO 8601/ |
84 |
|
85 |
Both done. |
86 |
|
87 |
> |
88 |
> > ``IGNORE <path>`` |
89 |
> > Ignores a subdirectory or file from Manifest checks. If the specified |
90 |
> > path is present, it and its contents are omitted from the Manifest |
91 |
> > verification (always pass). *Path* must be a plain file or directory |
92 |
> > path without a trailing slash, and must not contain wildcards. |
93 |
> |
94 |
> What does that mean? Wildcards are not special (so "foo*" will match |
95 |
> literally), or wildcard characters like "*" are not allowed at all? |
96 |
|
97 |
Not special. Will reword to: |
98 |
|
99 |
| Wildcards are not supported and wildcard characters are interpreted |
100 |
| literally. |
101 |
|
102 |
> |
103 |
> > ``AUX <filename> <size> <checksums>...`` |
104 |
> > Equivalent to the ``DATA`` type, except that the filename is relative |
105 |
> > to ``files/`` subdirectory. |
106 |
> |
107 |
> s/to/to the/ |
108 |
> |
109 |
> > 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest |
110 |
> > files according to `file verification`_ section, and include their |
111 |
> |
112 |
> s/according to/according to the/ |
113 |
> |
114 |
> > 6. Verify the entries in *covered* set for incompatible duplicates |
115 |
> |
116 |
> s/in *covered* set/in the *covered* set/ |
117 |
> |
118 |
> > 7. Verify all the files in the union of the *present* and *covered* |
119 |
> > sets, according to `file verification`_ section. |
120 |
> |
121 |
> s/to/to the/ |
122 |
> |
123 |
> > a. If a ``IGNORE`` entry in the ``Manifest`` file covers |
124 |
> > the *original* directory (or one of the parent directories), stop. |
125 |
> |
126 |
> s/a ``IGNORE`` entry/an ``IGNORE`` entry/ |
127 |
|
128 |
All done. |
129 |
|
130 |
> |
131 |
> > An example top-level Manifest file for the Gentoo repository would have |
132 |
> > the following content:: |
133 |
> > TIMESTAMP 2017-10-30T10:11:12Z |
134 |
> > IGNORE distfiles |
135 |
> > IGNORE local |
136 |
> > IGNORE lost+found |
137 |
> > IGNORE packages |
138 |
> > MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb.. |
139 |
> > ... |
140 |
> > MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915.. |
141 |
> > ... |
142 |
> > An example modern Manifest (disregarding backwards compatibility) |
143 |
> > for a package directory would have the following content:: |
144 |
> > DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d.. |
145 |
> > DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749.. |
146 |
> > DATA metadata.xml 664 SHA256 97c6.. SHA512 1175.. |
147 |
> > DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468.. |
148 |
> > DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919.. |
149 |
> > DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33.. |
150 |
> > DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d.. |
151 |
> |
152 |
> Update hashes to BLAKE2B SHA512? |
153 |
|
154 |
I don't really want to go through the hoop of updating the first two |
155 |
bytes of each hash, and I don't think it'd be nice to replace the key |
156 |
while keeping incorrect value. Even that it does not serve any purpose. |
157 |
|
158 |
> |
159 |
> > This specification aims to avoid arbitrary restrictions. For this |
160 |
> > reason, the filename characters are only restricted by excluding two |
161 |
> |
162 |
> s/the filename characters/filename characters/ |
163 |
> |
164 |
> > technically problematic groups: |
165 |
> > 1. The NULL character (``U+0000``) is normally used to indicate the end |
166 |
> > of a null-terminated string. Its use could therefore break programs |
167 |
> > written using C. Furthermore, it is not allowed in any known |
168 |
> > filesystem. |
169 |
> > 2. The whitespace characters are used to separate Manifest fields. While |
170 |
> |
171 |
> s/The whitespace characters/Whitespace characters/ |
172 |
> |
173 |
> > 2. being able to run update automatically generated files locally |
174 |
> > without causing unnecessary verification failures. |
175 |
> |
176 |
> Strike the word "run"? |
177 |
> |
178 |
> > Strictly speaking, this information is already provided by the various |
179 |
> > ``metadata/timestamp*`` files that are already present. However, |
180 |
> |
181 |
> Twice "already" in this sentence. |
182 |
> |
183 |
> > The OpenPGP cleartext signature covers the contents of the Manifest, |
184 |
> > and is therefore compressed along with them. The possibility of using |
185 |
> > detached signature has been considered but it was rejected as |
186 |
> |
187 |
> s/detached signature/a detached signature/ |
188 |
> |
189 |
> > The existence of additional entries for uncompressed Manifest checksums |
190 |
> > was debated. However, plain entries for the uncompressed file would |
191 |
> > be confusing if only the compressed file existed, and conflicting |
192 |
> > if both uncompressed and compressed variants existed. Furthermore, |
193 |
> > it has been pointed out that ``DIST`` entries do not have uncompressed |
194 |
> > variant either. |
195 |
> |
196 |
> s/uncompressed variant/an uncompressed variant/ |
197 |
> |
198 |
> > .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries |
199 |
> > at the time of writing are duplicate, representing a 2 MiB |
200 |
> > out of 25 MiB of DIST entries altogether. |
201 |
> |
202 |
> s/a 2 MiB/2 MiB/ |
203 |
> |
204 |
> > Copyright |
205 |
> > ========= |
206 |
> |
207 |
> There should be two blank lines before this section heading (as |
208 |
> required by GLEP 2). |
209 |
> |
210 |
> Ulrich |
211 |
|
212 |
All fixed. |
213 |
|
214 |
-- |
215 |
Best regards, |
216 |
Michał Górny |