commit: a8ec3d4f1350cdffd3f5d4f7f3fbd8e3c5c7ac40
Author: Michał Górny <mgorny <AT> gentoo <DOT> org>
AuthorDate: Sun Oct 22 13:19:20 2017 +0000
Commit: Michał Górny <mgorny <AT> gentoo <DOT> org>
CommitDate: Mon Nov 13 16:33:01 2017 +0000
URL: https://gitweb.gentoo.org/data/glep.git/commit/?id=a8ec3d4f

glep-0074: Full-tree verification using Manifest files

glep-0074.rst | 749 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 749 insertions(+)

diff --git a/glep-0074.rst b/glep-0074.rst
new file mode 100644
index 0000000..e9f8bad
--- /dev/null
+++ b/glep-0074.rst
@@ -0,0 +1,749 @@
+---
+GLEP: 74
+Title: Full-tree verification using Manifest files
+Author: Michał Górny <mgorny@g.o>,
+        Robin Hugh Johnson <robbat2@g.o>,
+        Ulrich Müller <ulm@g.o>
+Type: Standards Track
+Status: Draft
+Version: 1
+Created: 2017-10-21
+Last-Modified: 2017-10-26
+Post-History: 2017-10-26
+Content-Type: text/x-rst
+Requires: 59, 61
+Replaces: 44, 58, 60
+---
+
+Abstract
+========
+
+This GLEP extends the Manifest file format to cover full-tree file
+integrity and authenticity checks. The format aims to be future-proof
+and efficient, and to provide means of backwards compatibility.
+
+
+Motivation
+==========
+
+The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
+means of verifying the integrity of distfiles and package files
+in Gentoo. Combined with OpenPGP signatures, they provide means to
+ensure the authenticity of the covered files. However, as noted
+in GLEP 57 [#GLEP57]_, they lack the ability to provide full-tree
+authenticity verification as they do not cover any files outside
+the package directory. In particular, this leaves multiple ways
+for a third party to inject malicious code into the ebuild environment.
+
+Historically, the topic of providing authenticity coverage for the whole
+repository has been mentioned multiple times. The most noteworthy efforts
+are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
+They were accepted by the Council in 2010 but have never been
+implemented. When potential implementation work started in 2017, a new
+discussion about the specification arose. It prompted the creation
+of a competing GLEP that would provide a redesigned alternative to
+the old GLEPs.
+
+This specification is designed with the following goals in mind:
+
+1. It should provide means to ensure the authenticity of the complete
+   repository, including preventing the injection of additional files.
+
+2. As in the original Manifest2, the files should be split into two
+   groups: files whose authenticity is critical, and those whose
+   mismatch may be accepted in non-strict mode. The same classification
+   should apply both to files listed in Manifests, and to stray files
+   present only in the repository.
+
+3. The format should be universal enough to work both for the Gentoo
+   repository and third-party repositories of different characteristics.
+
+4. The Manifest files should be verifiable stand-alone, that is, without
+   knowing any details about the underlying repository format.
+
+
+Specification
+=============
+
+Manifest file format
+--------------------
+
+This specification reuses and extends the Manifest file format defined
+in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
+repurposed as a generic *tag* that can also indicate additional
+(non-checksum) metadata. Accordingly, those tags can be followed by
+other space-separated values.
+
+Unless specified otherwise, the paths used in the Manifest files
+are relative to the directory containing the Manifest file. The paths
+must not reference the parent directory (``..``).
+
+
+Manifest file locations and nesting
+-----------------------------------
+
+The ``Manifest`` file located in the root directory of the repository
+is called the top-level Manifest, and it is used to perform the full-tree
+verification. In order to verify its authenticity, it must be signed
+with OpenPGP using the armored cleartext format.
+
+The top-level Manifest may reference sub-Manifests contained
+in subdirectories of the repository. The sub-Manifests are traditionally
+named ``Manifest``; however, the implementation must support arbitrary
+names, including the possibility of multiple (split) Manifests
+for a single directory. A sub-Manifest can only cover the files inside
+the directory tree where it resides.
+
+A sub-Manifest can also be signed using the OpenPGP armored cleartext
+format. However, its signature verification can be omitted if it is
+covered by a signed top-level Manifest.
+
+The Manifest files can also specify ``IGNORE`` entries to skip Manifest
+verification of subdirectories and/or files. Files and directories
+starting with a dot are always implicitly ignored. All files that
+are not ignored must be covered by at least one of the Manifests.
+
+A single file may be matched by multiple identical or equivalent
+Manifest entries if and only if the entries have the same semantics,
+specify the same size, and the checksums common to both entries match.
+It is an error for a single file to be matched by multiple entries
+of different semantics, file size or checksum values. It is an error
+to specify another entry for a path matching an ``IGNORE`` entry,
+or for any path inside an ignored subdirectory.
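The multiple-entry rule can be checked pairwise. The Python sketch below is illustrative only: the ``Entry`` record and ``entries_compatible`` function are hypothetical, ``EBUILD`` and ``AUX`` are assumed to share the semantics of ``DATA`` (following the deprecated tag definitions in this GLEP), and the ``AUX`` path prefix and ``IGNORE`` conflicts are left out.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumption: deprecated EBUILD/AUX entries have the same semantics as DATA.
SEMANTICS = {"DATA": "DATA", "EBUILD": "DATA", "AUX": "DATA"}


@dataclass
class Entry:
    """Hypothetical in-memory form of a checksummed Manifest entry."""
    tag: str
    size: int
    checksums: Dict[str, str] = field(default_factory=dict)


def entries_compatible(a: Entry, b: Entry) -> bool:
    """Return True if two entries for the same file may coexist."""
    # Same semantics: compare tags after folding the deprecated aliases.
    if SEMANTICS.get(a.tag, a.tag) != SEMANTICS.get(b.tag, b.tag):
        return False
    # Same size.
    if a.size != b.size:
        return False
    # Only the checksums present in both entries are compared.
    common = a.checksums.keys() & b.checksums.keys()
    return all(a.checksums[name] == b.checksums[name] for name in common)
```

For example, ``DATA foo 3 SHA256 aa`` and ``EBUILD foo 3 SHA256 aa SHA512 bb`` are compatible under this sketch, while listing the same file as both ``DATA`` and ``MISC`` is rejected.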
+
+The file entries (except for ``IGNORE``) can be specified for regular
+files only. Symbolic links are followed when opening files. It is
+an error to specify an entry for a different file type.
+
+All the files covered by a Manifest tree must reside on the same
+filesystem. It is an error to specify entries applying to files
+on another filesystem. If subdirectories of the Manifest tree reside
+on a different filesystem, they must be explicitly excluded
+via ``IGNORE``.
+
+
+File verification
+-----------------
+
+When verifying a file against the Manifest, the following rules are
+used:
+
+- if a file listed in the Manifest is not present, then the verification
+  for the file fails,
+
+- if a file listed in the Manifest is present but has a different size
+  or one of the checksums does not match, the verification fails,
+
+- if a file is present but not listed in the Manifest, the verification
+  fails,
+
+- otherwise, the verification succeeds.
+
+Unless specified otherwise, the package manager must not allow using
+any files for which the verification failed. The package manager may
+reject any package or even the whole repository if it may refer to files
+for which the verification failed.
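The rules above translate into a short per-file check. This Python sketch is illustrative only: ``verify_file`` is a hypothetical helper, a path not listed in any Manifest is assumed to be passed with ``size=None``, and Manifest checksum names are assumed to map to ``hashlib`` names by lowercasing (true for ``SHA256``, ``SHA512`` or ``BLAKE2B``, but not for every reserved name).

```python
import hashlib
import os
from typing import Dict, Optional


def verify_file(path: str, size: Optional[int], checksums: Dict[str, str]) -> bool:
    """Apply the file verification rules to a single path.

    size is None when the path is not listed in any Manifest.
    """
    exists = os.path.isfile(path)          # follows symbolic links
    if size is None:
        # Present but not listed: fail. Neither present nor listed: nothing to check.
        return not exists
    if not exists:
        return False                       # listed but missing
    if os.path.getsize(path) != size:
        return False                       # size mismatch
    # Assumption: lowercased Manifest names are valid hashlib names.
    hashers = {name: hashlib.new(name.lower()) for name in checksums}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return all(hashers[name].hexdigest() == value.lower()
               for name, value in checksums.items())
```

The file is read once in fixed-size chunks, feeding every requested hash in parallel, so large files need not fit in memory.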
+
+
+New Manifest tags
+-----------------
+
+The Manifest files can specify the following tags:
+
+``TIMESTAMP <iso8601>``
+  Specifies a timestamp of when the Manifest file was last updated.
+  The timestamp must be a valid second-precision ISO8601 extended format
+  combined date and time in the UTC timezone, i.e. using the following
+  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
+  in the top-level Manifest file. The package manager can use it
+  to detect an outdated repository checkout.
+
+``MANIFEST <path> <size> <checksums>…``
+  Specifies a sub-Manifest. The sub-Manifest must be verified like
+  a regular file. If the verification succeeds, the entries from
+  the sub-Manifest are included for verification as described
+  in `Manifest file locations and nesting`_.
+
+``IGNORE <path>``
+  Excludes a subdirectory or file from Manifest checks. If the specified
+  path is present, it and its contents are omitted from the Manifest
+  verification (they always pass).
+
+``DATA <path> <size> <checksums>…``
+  Specifies a file subject to obligatory Manifest verification.
+  The file is required to pass verification. Used for all files directly
+  affecting package manager operation (ebuilds, eclasses, profiles).
+
+``MISC <path> <size> <checksums>…``
+  Specifies a file subject to non-obligatory Manifest verification.
+  The package manager may ignore a verification failure if operating
+  in non-strict mode. Used for files that do not affect the installed
+  packages (``metadata.xml``, ``use.desc``).
+
+``OPTIONAL <path>``
+  Specifies a file that would be subject to non-obligatory Manifest
+  verification if it existed. The package manager may ignore a stray file
+  matching this entry if operating in non-strict mode. Used for paths
+  that would match ``MISC`` if they existed.
+
+``DIST <filename> <size> <checksums>…``
+  Specifies a distfile entry used to verify files fetched as part
+  of ``SRC_URI``. The filename must match the filename used to store
+  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
+  manager must reject the fetched file if it fails verification.
+  ``DIST`` entries apply to all packages below the Manifest file
+  specifying them.
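A reader for individual Manifest lines follows from the tag list above. The Python sketch below assumes, as in GLEP 44, that checksums are written as alternating name/value pairs and that fields are separated by whitespace (so paths containing spaces are not handled). The ``ManifestEntry`` type and ``parse_line`` function are hypothetical, and rejecting unknown tags is a choice of this sketch, not a rule of the specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

CHECKSUMMED = {"MANIFEST", "DATA", "MISC", "DIST", "EBUILD", "AUX"}
PATH_ONLY = {"IGNORE", "OPTIONAL"}


@dataclass
class ManifestEntry:
    tag: str
    path: Optional[str] = None
    size: Optional[int] = None
    checksums: Dict[str, str] = field(default_factory=dict)
    timestamp: Optional[datetime] = None


def parse_line(line: str) -> ManifestEntry:
    """Parse one Manifest line; raise ValueError if it is malformed."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    tag, args = fields[0], fields[1:]
    if tag == "TIMESTAMP":
        if len(args) != 1:
            raise ValueError("TIMESTAMP takes exactly one value")
        ts = datetime.strptime(args[0], "%Y-%m-%dT%H:%M:%SZ")
        return ManifestEntry(tag, timestamp=ts.replace(tzinfo=timezone.utc))
    if tag not in CHECKSUMMED and tag not in PATH_ONLY:
        raise ValueError(f"unknown tag: {tag}")   # sketch policy
    if not args:
        raise ValueError(f"{tag} requires a path")
    path = args[0]
    if ".." in path.split("/"):
        raise ValueError("paths must not reference the parent directory")
    if tag in PATH_ONLY:
        if len(args) != 1:
            raise ValueError(f"{tag} takes only a path")
        return ManifestEntry(tag, path=path)
    # <path> <size> followed by zero or more <name> <value> pairs.
    if len(args) < 2 or len(args) % 2 != 0:
        raise ValueError(f"{tag} needs a size and name/value checksum pairs")
    pairs = args[2:]
    checksums = dict(zip(pairs[0::2], pairs[1::2]))
    return ManifestEntry(tag, path=path, size=int(args[1]), checksums=checksums)
```

For instance, ``parse_line("DATA foo-1.ebuild 1234 SHA256 ab SHA512 cd")`` yields a ``DATA`` entry of size 1234 with two checksums.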
+
+
+Deprecated Manifest tags
+------------------------
+
+For backwards compatibility, the following tags are additionally
+allowed at the package directory level:
+
+``EBUILD <filename> <size> <checksums>…``
+  Equivalent to the ``DATA`` type.
+
+``AUX <filename> <size> <checksums>…``
+  Equivalent to the ``DATA`` type, except that the filename is relative
+  to the ``files/`` subdirectory.
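A verifier may find it convenient to fold the deprecated tags into ``DATA`` right after parsing a package-level Manifest, so that the remaining code only handles the new tags. A minimal Python sketch, where ``normalise_entry`` is a hypothetical helper:

```python
import posixpath
from typing import Tuple


def normalise_entry(tag: str, filename: str) -> Tuple[str, str]:
    """Map a deprecated tag onto DATA, fixing up the path for AUX."""
    if tag == "EBUILD":
        return "DATA", filename
    if tag == "AUX":
        # AUX filenames are relative to the files/ subdirectory.
        return "DATA", posixpath.join("files", filename)
    return tag, filename
```

``posixpath`` is used so that the resulting path keeps the ``/`` separator used in Manifests regardless of the host platform.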
+
+
+Algorithm for full-tree verification
+------------------------------------
+
+In order to perform full-tree verification, the following algorithm
+can be used:
+
+1. Collect all files present in the repository into the *present* set.
+
+2. Start at the top-level Manifest file. Verify its OpenPGP signature.
+   Optionally verify the ``TIMESTAMP`` entry if present. Remove
+   the top-level Manifest from the *present* set.
+
+3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
+   files according to the `file verification`_ section, and include their
+   entries in the current Manifest entry list (using paths relative
+   to the directories containing the Manifests).
+
+4. Process all ``IGNORE`` entries. Remove any paths matching them
+   from the *present* set.
+
+5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
+   ``EBUILD`` and ``AUX`` entries into the *covered* set.
+
+6. Verify all the files in the union of the *present* and *covered*
+   sets, according to the `file verification`_ section.
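The steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions: the signature and ``TIMESTAMP`` checks (step 2) and sub-Manifest loading (step 3) happen elsewhere; ``entries`` maps repository-relative paths to already merged entries, with ``AUX``/``EBUILD`` folded into ``DATA`` and ``MANIFEST`` entries included so that sub-Manifest files count as listed; the per-file content check is passed in as ``check``; and filesystem boundaries are not checked. All names are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class Entry:
    """Hypothetical merged entry; tag is DATA, MISC, OPTIONAL or MANIFEST."""
    tag: str
    size: int = 0
    checksums: Dict[str, str] = field(default_factory=dict)


def collect_present(root: str) -> Set[str]:
    """Step 1: regular files below root, skipping dotfiles and dot-directories."""
    present = set()
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        # Prune dot-directories in place so that os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not name.startswith(".") and os.path.isfile(full):
                present.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return present


def is_ignored(path: str, ignores: Iterable[str]) -> bool:
    """True if path is an IGNORE path or lies below an ignored directory."""
    return any(path == ign or path.startswith(ign.rstrip("/") + "/")
               for ign in ignores)


def verify_tree(present: Set[str], entries: Dict[str, Entry],
                ignores: Iterable[str],
                check: Callable[[str, Entry], bool]) -> Dict[str, str]:
    """Steps 4 to 6: return {path: reason} for every failing path."""
    ignores = list(ignores)
    present = {p for p in present if not is_ignored(p, ignores)}  # step 4
    covered = set(entries)                                        # step 5
    failures = {}
    for path in sorted(present | covered):                        # step 6
        entry = entries.get(path)
        if entry is None:
            failures[path] = "stray file"
        elif entry.tag == "OPTIONAL":
            if path in present:        # ignorable in non-strict mode
                failures[path] = "stray optional file"
        elif path not in present:
            failures[path] = "missing"
        elif not check(path, entry):
            failures[path] = "size or checksum mismatch"
    return failures
```

A caller would pass ``collect_present(root) - {"Manifest"}`` as the *present* set, mirroring the removal of the top-level Manifest in step 2.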
+
+
+Algorithm for finding parent Manifests
+--------------------------------------
+
+In order to find the top-level Manifest from the current directory,
+the following algorithm can be used:
+
+1. Store the current directory as *original* and the device ID
+   of the containing filesystem (``st_dev``) as *startdev*.
+
+2. If the device ID of the containing filesystem (``st_dev``)
+   of the current directory is different from *startdev*, stop.
+
+3. If the current directory contains a ``Manifest`` file:
+
+   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
+      the *original* directory (or one of its parent directories), stop.
+
+   b. Otherwise, store the current directory as *last_found*.
+
+4. If the current directory is the root system directory (``/``), stop.
+
+5. Otherwise, enter the parent directory and jump to step 2.
+
+Once the algorithm stops, *last_found* will contain the relevant
+top-level Manifest. If *last_found* is null, then the directory tree
+does not contain any valid top-level Manifest candidates and one should
+be created in the *original* directory.
+
+Once the top-level Manifest is found, its ``MANIFEST`` entries should
+be used to find any sub-Manifests below the top-level Manifest,
+up to and including the *original* directory. Note that those
+sub-Manifests can use filenames other than ``Manifest``.
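The upward search can be sketched in Python as below. It is a simplified illustration that assumes an uncompressed top-level ``Manifest``, reads only its ``IGNORE`` lines, and returns the directory stored as *last_found* (or ``None``); ``read_ignores`` and ``find_top_level_manifest_dir`` are hypothetical names.

```python
import os
from typing import List, Optional


def read_ignores(manifest_path: str) -> List[str]:
    """Return the paths of all IGNORE entries in an uncompressed Manifest."""
    ignores = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2 and fields[0] == "IGNORE":
                ignores.append(fields[1].rstrip("/"))
    return ignores


def find_top_level_manifest_dir(start: str) -> Optional[str]:
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev                   # step 1
    current, last_found = original, None
    while os.stat(current).st_dev == startdev:            # step 2
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):                      # step 3
            rel = os.path.relpath(original, current).replace(os.sep, "/")
            parts = [] if rel == "." else rel.split("/")
            # original and its parents below current, relative to current
            covered = {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}
            if covered & set(read_ignores(manifest)):     # step 3a
                break
            last_found = current                          # step 3b
        parent = os.path.dirname(current)
        if parent == current:                             # step 4: reached /
            break
        current = parent                                  # step 5
    return last_found
```

Only paths between the current directory and *original* are compared, since ``IGNORE`` paths are relative and cannot reference ``..``.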
+
+
+Checksum algorithms
+-------------------
+
+This section is informational only. Specifying the exact set
+of supported algorithms is outside the scope of this specification.
+
+The algorithm names reserved at the time of writing are:
+
+- ``MD5`` [#MD5]_,
+- ``RMD160`` (RIPEMD-160) [#RIPEMD160]_,
+- ``SHA1`` [#SHS]_,
+- ``SHA256`` and ``SHA512`` (SHA-2 family of hashes) [#SHS]_,
+- ``WHIRLPOOL`` [#WHIRLPOOL]_,
+- ``BLAKE2B`` and ``BLAKE2S`` (BLAKE2 family of hashes) [#BLAKE2]_,
+- ``SHA3_256`` and ``SHA3_512`` (SHA-3 family of hashes) [#SHA3]_,
+- ``STREEBOG256`` and ``STREEBOG512`` (Streebog family of hashes)
+  [#STREEBOG]_.
+
+The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
+It is recommended that any new hashes be named after the Python
+``hashlib`` module algorithm names, transformed into uppercase.
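Under that naming recommendation, most reserved names map back to ``hashlib`` names by lowercasing. The Python sketch below illustrates the idea; the ``RMD160`` to ``ripemd160`` exception and the helper names are assumptions of this example, and whether ``ripemd160``, ``whirlpool`` or any Streebog variant is available depends on the OpenSSL build behind ``hashlib``.

```python
import hashlib
from typing import Dict, Iterable

# Assumed exceptions where the Manifest name is not the uppercased hashlib name.
SPECIAL_NAMES = {"RMD160": "ripemd160"}


def hashlib_name(manifest_name: str) -> str:
    return SPECIAL_NAMES.get(manifest_name, manifest_name.lower())


def available(names: Iterable[str]) -> Dict[str, bool]:
    """Report which Manifest hash names this Python installation can compute."""
    return {n: hashlib_name(n) in hashlib.algorithms_available for n in names}


def compute_checksums(data: bytes, names: Iterable[str]) -> Dict[str, str]:
    """Hex digests of data, keyed by Manifest hash name."""
    result = {}
    for name in names:
        h = hashlib.new(hashlib_name(name))
        h.update(data)
        result[name] = h.hexdigest()
    return result
```

Checking ``available()`` first lets a tool report unsupported algorithms up front instead of failing in ``hashlib.new()`` halfway through a tree.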
+
+
+Manifest compression
+--------------------
+
+The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
+This section merely addresses interoperability issues between Manifest
+compression and this specification.
+
+The compressed Manifest files are required to carry a suffix indicating
+their compression algorithm. This suffix should be used to recognize
+the compression and decompress Manifests transparently. The exact list
+of algorithms and their corresponding suffixes is outside the scope
+of this specification.
+
+Whenever this specification refers to the top-level Manifest file,
+the implementation should account for compressed variants of this file
+with appropriate suffixes (e.g. ``Manifest.gz``).
+
+Whenever this specification refers to sub-Manifests, they can use any
+names but, when compressed, are also required to use the matching
+compression suffix. The ``MANIFEST`` entries are required to specify
+the full name including the compression suffix, and the verification
+is performed on the compressed file.
+
+The specification permits uncompressed Manifests to exist alongside
+their compressed counterparts, and multiple compressed formats
+to coexist. If that is the case, the files must have the same
+uncompressed content and the implementation is free to choose any
+of the files sharing the same base name.
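Transparent decompression by suffix could look like the Python sketch below. The suffix table is an assumption made for illustration only (the authoritative list is left to GLEP 61 and the implementations), and ``open_manifest`` and ``find_top_level_manifest`` are hypothetical helpers; the latter simply returns the first existing variant, which the rule on coexisting files permits.

```python
import bz2
import gzip
import lzma
import os
from typing import IO, Optional

# Hypothetical suffix table; not defined by this specification.
DECOMPRESSORS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open, ".lzma": lzma.open}


def open_manifest(path: str) -> IO[str]:
    """Open a Manifest as text, decompressing based on its suffix."""
    for suffix, opener in DECOMPRESSORS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def find_top_level_manifest(root: str) -> Optional[str]:
    """Return the plain or first compressed top-level Manifest variant present."""
    for name in ["Manifest"] + ["Manifest" + s for s in DECOMPRESSORS]:
        candidate = os.path.join(root, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Since the checksums in a ``MANIFEST`` entry apply to the compressed file, a verifier would hash the raw bytes on disk first and only then use a helper like ``open_manifest`` to read the entries.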
+
+
+Rationale
+=========
+
+Stand-alone format
+------------------
+
+The first question that needed to be asked before proceeding with
+the design was whether the Manifest file format was supposed to be
+stand-alone, or tightly bound to the repository format.
+
+The stand-alone format has been selected because of its three
+advantages:
+
+1. It is more future-proof. If an incompatible change to the repository
+   format is introduced, only developers need to upgrade the tools
+   they use to generate the Manifests. The tools used to verify
+   the updated Manifests will continue to work.
+
+2. It is more flexible and universal. With a dedicated tool,
+   the Manifest files can be used to sign and verify arbitrary file
+   sets.
+
+3. It keeps the verification tool simpler. In particular, we can easily
+   write an independent verification tool that could work on any
+   distribution without needing to depend on a package manager
+   implementation or rewrite parts of it.
+
+Designing a stand-alone format requires that the Manifest carries enough
+information to perform the verification following all the rules specific
+to the Gentoo repository.
+
+
+Tree design
+-----------
+
+The second important point of the design was determining whether
+the Manifest files should be structured hierarchically or independently.
+Both options have their advantages.
+
+In the hierarchical model, each sub-Manifest file is covered by a higher
+level Manifest. As a result, only the top-level Manifest has to be
+OpenPGP-signed, and subsequent Manifests only need to be verified by
+the checksums stored in the parent Manifest. This has the following
+implications:
+
+- Verifying any set of files in the repository requires using checksums
+  from the most relevant Manifests and the parent Manifests.
+
+- The OpenPGP signature of the top-level Manifest needs to be verified
+  only once per process.
+
+- Altering any set of files requires updating the relevant Manifests,
+  and their parent Manifests up to the top-level Manifest, and signing
+  the last one.
+
+- As a result, the top-level Manifest changes on every commit,
+  and various middle-level Manifests change (and need to be transferred)
+  frequently.
+
+In the independent model, each sub-Manifest file is independent
+of the parent Manifests. As a result, each of them needs to be signed
+and verified independently. However, the parent Manifests still need
+to list sub-Manifests (albeit without verification data) in order
+to detect removal or replacement of subdirectories. This has
+the following implications:
+
+- Verifying any set of files in the repository requires using checksums
+  and verifying signatures of the most relevant Manifest files.
+
+- Altering any set of files requires updating the relevant Manifests
+  and signing them again.
+
+- Parent Manifests are updated only when Manifests are added or removed
+  from subdirectories. As a result, they change infrequently.
+
+While both models have their advantages, the hierarchical model was
+selected because it reduces the number of OpenPGP operations, which
+are comparatively costly, to the minimum.
+
+
+Tree layout restrictions
+------------------------
+
+The algorithm is meant to work primarily with ebuild repositories, which
+normally contain only files and directories. Directories provide
+no useful metadata for verification, and specifying special entries
+for additional file types is purposeless. Therefore, the specification
+is restricted to dealing with regular files.
+
+The Gentoo repository does not use symbolic links. Some other Gentoo
+repositories do, however. To provide a simple solution for dealing with
+symlinks without having to implement special handling for them,
+the common behavior of implicitly resolving them is used.
+Therefore, symbolic links to files are treated as if they were regular
+files, and symbolic links to directories are followed as if they were
+regular directories.
+
+Dotfiles are implicitly ignored as that is a common notion used
+in software written for POSIX systems. All other filenames require
+explicit ``IGNORE`` lines.
+
+The algorithm is restricted to work on a single filesystem. This is
+mostly relevant when scanning for the top-level Manifest; we do not want
+to cross filesystem boundaries then. However, to ensure consistent
+bidirectional behavior, we also need to ban crossing them when operating
+down the tree.
+
+The directories and files on different filesystems need to be ignored
+explicitly, as implicitly skipping them would cause confusion.
+In particular, tools might then claim that a file does not exist when
+it clearly does, because it was skipped due to filesystem boundaries.
+
+
+File verification model
+-----------------------
+
+The verification model aims to provide full coverage against different
+forms of attack. In particular, three different kinds of manipulation
+are considered:
+
+1. Alteration of the file content.
+
+2. Removal of a file.
+
+3. Addition of a new file.
+
+In order to protect against all three, the system requires that all
+files in the repository are listed in Manifests and verified against
+them.
+
+As a special case, ``IGNORE`` entries are allowed to account for
+directories that are not part of the repository but were traditionally
+placed inside it. Those directories were ``distfiles``, ``local`` and
+``packages``. They can also be used to ignore VCS directories such
+as ``CVS``.
+
+
+Non-obligatory Manifest verification
+------------------------------------
+
+While this specification recommends that all tools use strict verification
+by default, it allows declaring some files as non-obligatory, like
+the original Manifest2 format did. This can be used for files that do
+not affect the normal package manager operation.
+
+It aims to account for two use cases:
+
+1. Stripping down files that are not strictly required to install
+   packages from repository checkouts.
+
+2. Accounting for automatically generated files that might be updated
+   by standard tooling.
+
+The traditional ``MISC`` type is amended with a complementary
+``OPTIONAL`` tag to account for files that are not provided
+in the specific repository. It aims to ensure that the same path would
+be non-fatal when provided by the repository but fatal when created
+by the user tooling.
+
+
+Timestamp field
+---------------
+
+The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
+to include a generation timestamp in the Manifest. A similar feature
+was originally proposed in GLEP 58 [#GLEP58]_.
+
+The timestamp can be used to detect delay or replay attacks against
+Gentoo mirrors.
+
+Strictly speaking, this is already provided by the various
+``metadata/timestamp.*`` files supplied by Gentoo, which are also
+covered by the Manifest. However, including the value in the Manifest
+itself has little cost and provides the ability to perform
+the verification stand-alone.
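As an illustration of the delay and replay checks, the Python sketch below compares a ``TIMESTAMP`` value against the current time and against the last value a client has accepted. The maximum age and the idea of persisting the last accepted timestamp are hypothetical client policy, not part of this specification, and the function names are likewise hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def is_delayed(value: str, max_age: timedelta, now: Optional[datetime] = None) -> bool:
    """True if the Manifest is older than max_age (possible delay attack)."""
    now = now or datetime.now(timezone.utc)
    return now - parse_timestamp(value) > max_age


def is_replayed(value: str, last_seen: str) -> bool:
    """True if the Manifest is older than one already accepted (possible replay)."""
    return parse_timestamp(value) < parse_timestamp(last_seen)
```

Both values are parsed into timezone-aware UTC datetimes before comparing, rather than relying on the string form sorting correctly.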
+
+
+New vs deprecated tags
+----------------------
+
+Out of the four types defined by Manifest2, two are reused and two are
+marked deprecated.
+
+The ``DIST`` and ``MISC`` tags are reused since they map relatively
+clearly onto the new concept.
+
+The ``EBUILD`` tag could potentially be reused for generic file
+verification data. However, it would be confusing if all the different
+data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
+type was introduced as a replacement.
+
+The ``AUX`` tag is deprecated as it is redundant with ``DATA``, and has
+the limiting property of an implicit ``files/`` path prefix.
+
+
+Finding top-level Manifest
+--------------------------
+
+The development of a reference implementation for this GLEP has raised
+the following problem: how to find all the relevant Manifests when
+the Manifest tool is run inside a subdirectory of the repository?
+
+One of the options would be to provide bi-directional linking
+of Manifests via a ``PARENT`` tag. However, that would not solve
+the problem when a new Manifest file is being created.
+
+Instead, an algorithm for iterating over parent directories is proposed.
+Since there is no obligatory explicit indicator for the top-level
+Manifest, the algorithm assumes that the top-level Manifest
+is the highest ``Manifest`` in the directory hierarchy that can cover
+the current directory. This generally makes sense since the Manifest
+files are required to provide coverage for all subdirectories, so all
+Manifests starting from that one need to be updated.
+
+If independent Manifest trees are nested in the directory structure,
+then an ``IGNORE`` entry needs to be used to separate them.
+
+Since sub-Manifests can use any filenames, the Manifest finding
+algorithm must not short-cut the procedure by storing all ``Manifest``
+files along the parent directories. Instead, it needs to retrace
+the relevant sub-Manifest files along ``MANIFEST`` entries
+in the top-level Manifest.
+
+
+Injecting ChangeLogs into the checkout
+--------------------------------------
+
+One of the problems considered in the new Manifest format was that
+of injecting historical and autogenerated ChangeLogs into the repository.
+Normally we do not include those files, to reduce the checkout size.
+However, some users have shown interest in them and Infra is working
+on providing them via an additional rsync module.
+
+If such files were injected into the repository, they would cause strict
+verification failures of Manifests. To account for this, Infra could
+provide either ``OPTIONAL`` entries in the Manifest files to allow them
+in non-strict verification mode, or ``IGNORE`` entries to allow them
+in the strict mode.
+
+
+Splitting distfile checksums from file checksums
+------------------------------------------------
+
+Another problem with the current Manifest format is that the checksums
+for fetched files are combined with checksums for local files
+in a single file inside the package directory. It has been specifically
+pointed out that:
+
+- since distfiles are sometimes reused across different packages,
+  the repeating checksums are redundant,
+
+- mirror admins were interested in the possibility of verifying all
+  the distfiles with a single tool.
+
+This specification does not provide a clean solution to the first problem.
+It technically permits moving ``DIST`` entries to higher-level Manifests,
+but the usefulness of such a solution is doubtful.
+
+However, for the second problem we will probably deliver a dedicated
+tool working with this Manifest format.
+
+
+Hash algorithms
+---------------
+
+While maintaining a consistent supported hash set is important
+for interoperability, it is not a good fit for the generic layout of this
+GLEP. Furthermore, it would require updating the GLEP in the future
+every time the used algorithms change.
+
+Instead, the specification focuses on listing the currently used
+algorithm names for interoperability, and sets a recommendation
+for consistent naming of algorithms in the future. The Python
+``hashlib`` module is used as a reference since it is used
+as the provider of hash functions for most Python software,
+including Portage and PkgCore.
+
+The basic rules for changing hash algorithms are defined in GLEP 59
+[#GLEP59]_. The implementations can focus only on those algorithms
+that are actually used or planned to be used. It may be feasible
+to devise a new GLEP that specifies the currently used hashes (or update
+GLEP 59 accordingly).
+
+
+Manifest compression
+--------------------
+
+The support for Manifest compression is introduced with minimal changes
+to the file format. The ``MANIFEST`` entries are required to provide
+the real (compressed) file path for compatibility with other file
+entries and to avoid confusion.
+
+The existence of additional entries for uncompressed Manifest checksums
+was debated. However, plain entries for the uncompressed file would
+be confusing if only the compressed file existed, and conflicting if both
+uncompressed and compressed variants existed. Furthermore, it has been
+pointed out that ``DIST`` entries do not have uncompressed variants
+either.
+
+
+Performance considerations
+--------------------------
+
+Performing a full-tree verification on every sync raises some
+performance concerns for end-user systems. The initial testing has shown
+that a cold-cache verification on a btrfs file system can take around
+4 minutes, with the process being mostly I/O bound. On the other hand,
+it can be expected that the verification will be performed directly
+after syncing, taking advantage of a warm filesystem cache.
+
+To improve speed on I/O- and/or CPU-constrained systems even further,
+the algorithms can be easily extended to perform incremental
+verification. Given that rsync does not preserve mtimes by default,
+the tool can take advantage of mtime and Manifest comparisons to recheck
+only the parts of the repository that have changed.
+
+Furthermore, the package manager implementations can restrict checking
+to only the parts of the repository that are actually being used.
+
662 |
+ |
663 |
+Backwards Compatibility |
664 |
+======================= |
665 |
+ |
666 |
+This GLEP provides optional means of preserving backwards compatibility. |
667 |
+To preserve the backwards compatibility, the following needs to be |
668 |
+ensured: |
+
+- all files within the package directory must be covered by
+  the ``Manifest`` file inside that package directory,
+
+- all distfiles used by the package must be covered by the ``Manifest``
+  file inside the package directory,
+
+- all files inside the ``files/`` subdirectory of a package directory
+  need to use the deprecated ``AUX`` tag (rather than ``DATA``),
+
+- all ``.ebuild`` files inside the package directory need to use
+  the deprecated ``EBUILD`` tag (rather than ``DATA``),
+
+- the Manifest files inside the package directory can be signed
+  to provide authenticity verification.
+
+Once backwards compatibility is no longer a concern, the above
+no longer needs to hold and the deprecated tags can be removed.
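+
+The tag rules from the list above can be sketched in Python as
+follows. The helper ``package_entry_tag`` is hypothetical and only
+covers files inside the package directory. Distfiles and their ``DIST``
+entries are out of scope. It records ``AUX`` paths relative to
+the ``files/`` subdirectory, as in GLEP 44 Manifests. Treating every
+other file as ``DATA`` is an assumption of this sketch.

```python
def package_entry_tag(relpath, compat=True):
    """Choose the Manifest tag and recorded path for a package-directory file.

    Hypothetical helper: relpath is relative to the package directory and
    uses '/' separators. With compat=True, the deprecated AUX and EBUILD
    tags are used so that package managers predating this GLEP can still
    verify the package; otherwise every file is recorded as DATA.
    """
    if compat:
        if relpath.startswith("files/"):
            # AUX paths are recorded relative to the files/ subdirectory.
            return "AUX", relpath[len("files/"):]
        if "/" not in relpath and relpath.endswith(".ebuild"):
            # Only top-level .ebuild files get the EBUILD tag.
            return "EBUILD", relpath
    return "DATA", relpath
```

+Once backwards compatibility is dropped, calling it with
+``compat=False`` records every file as ``DATA`` under its full
+package-relative path.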
+
+
+Reference Implementation
+========================
+
+The reference implementation for this GLEP is being developed
+as the gemato project [#GEMATO]_.
+
+
+Credits
+=======
+
+Thanks to all the people whose contributions were invaluable
+to the creation of this GLEP. This includes but is not limited to:
+
+- Robin Hugh Johnson,
+- Ulrich Müller.
+
+Additionally, thanks to Robin Hugh Johnson for the original
+MetaManifest GLEP series, which served both as inspiration and as
+a source of many concepts used in this GLEP. Recursively, also thanks
+to all the people who contributed to the original GLEPs.
709 |
+ |
710 |
+ |
711 |
+References |
712 |
+========== |
713 |
+ |
714 |
+.. [#GLEP44] GLEP 44: Manifest2 format |
715 |
+ (https://www.gentoo.org/glep/glep-0044.html) |
716 |
+ |
717 |
+.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software |
718 |
+ - Overview |
719 |
+ (https://www.gentoo.org/glep/glep-0057.html) |
720 |
+ |
721 |
+.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software |
722 |
+ - Infrastructure to User distribution - MetaManifest |
723 |
+ (https://www.gentoo.org/glep/glep-0058.html) |
724 |
+ |
725 |
+.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications |
726 |
+ (https://www.gentoo.org/glep/glep-0059.html) |
727 |
+ |
728 |
+.. [#GLEP60] GLEP 60: Manifest2 filetypes |
729 |
+ (https://www.gentoo.org/glep/glep-0060.html) |
730 |
+ |
731 |
+.. [#GLEP61] GLEP 61: Manifest2 compression |
732 |
+ (https://www.gentoo.org/glep/glep-0061.html) |
733 |
+ |
734 |
+.. [#PMS-FETCH] Package Manager Specification: Dependency Specification |
735 |
+ Format - SRC_URI |
736 |
+ (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10) |
737 |
+ |
738 |
+.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm |
739 |
+ (https://www.ietf.org/rfc/rfc1321.txt) |
740 |
+ |
741 |
+.. [#RIPEMD160] The hash function RIPEMD-160 |
742 |
+ (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html) |
743 |
+ |
744 |
+.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS) |
745 |
+ (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf) |
746 |
+ |
747 |
+.. [#WHIRLPOOL] The WHIRLPOOL Hash Function |
748 |
+ (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html) |
749 |
+ |
750 |
+.. [#BLAKE2] BLAKE2 — fast secure hashing |
751 |
+ (https://blake2.net/) |
752 |
+ |
753 |
+.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash |
754 |
+ and Extendable-Output Functions |
755 |
+ (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf) |
756 |
+ |
757 |
+.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function |
758 |
+ (https://www.streebog.net/) |
759 |
+ |
760 |
+.. [#GEMATO] gemato: Gentoo Manifest Tool |
761 |
+ (https://github.com/mgorny/gemato/) |
+
+
+Copyright
+=========
+This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
+Unported License. To view a copy of this license, visit
+http://creativecommons.org/licenses/by-sa/3.0/.