Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev <gentoo-dev@l.g.o>
Subject: [gentoo-dev] [RFC] GLEP 74 post-Council review update
Date: Thu, 16 Nov 2017 10:20:08
Message-Id: 1510827594.10459.2.camel@gentoo.org
1 Hi, everyone.
2
3 Here's the updated version of GLEP 74 taking into consideration
4 the points made during the Council pre-review.
5
6 ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
7 HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
8
9 Changes:
10
11 09ed01f glep-0074: Explain combining multiple Manifest trees
12 9de0840 glep-0074: Clarify timestamp handling of sub-Manifests
13 516c2ec glep-0074: Forbid compressing top-level Manifest
14 b01783e glep-0074: Clarify sub-Manifest signing paragraph
15
16
17 ---
18 GLEP: 74
19 Title: Full-tree verification using Manifest files
20 Author: Michał Górny <mgorny@g.o>,
21 Robin Hugh Johnson <robbat2@g.o>,
22 Ulrich Müller <ulm@g.o>
23 Type: Standards Track
24 Status: Draft
25 Version: 1
26 Created: 2017-10-21
27 Last-Modified: 2017-11-16
28 Post-History: 2017-10-26, 2017-11-16
29 Content-Type: text/x-rst
30 Requires: 59, 61
31 Replaces: 44, 58, 60
32 ---
33
34 Abstract
35 ========
36
37 This GLEP extends the Manifest file format to cover full-tree file
38 integrity and authenticity checks. The format aims to be future-proof,
39 efficient and provide means of backwards compatibility.
40
41
42 Motivation
43 ==========
44
45 The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
46 means of verifying the integrity of distfiles and package files
47 in Gentoo. Combined with OpenPGP signatures, they provide means to
48 ensure the authenticity of the covered files. However, as noted
49 in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
50 authenticity verification as they do not cover any files outside
51 the package directory. In particular, they provide multiple ways
52 for a third party to inject malicious code into the ebuild environment.
53
54 Historically, the topic of providing authenticity coverage for the whole
55 repository has been mentioned multiple times. The most noteworthy efforts
56 are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
57 They were accepted by the Council in 2010 but have never been
58 implemented. When potential implementation work started in 2017, a new
59 discussion about the specification arose. It prompted the creation
60 of a competing GLEP that would provide a redesigned alternative to
61 the old GLEPs.
62
63 This specification is designed with the following goals in mind:
64
65 1. It should provide means to ensure the authenticity of the complete
66 repository, including preventing the injection of additional files.
67
68 2. The format should be universal enough to work both for the Gentoo
69 repository and third-party repositories of different characteristics.
70
71 3. The Manifest files should be verifiable stand-alone, that is without
72 knowing any details about the underlying repository format.
73
74
75 Specification
76 =============
77
78 Manifest file format
79 --------------------
80
81 This specification reuses and extends the Manifest file format defined
82 in GLEP 44 [#GLEP44]_. For the purpose of it, the *file type* field is
83 repurposed as a generic *tag* that could also indicate additional
84 (non-checksum) metadata. Appropriately, those tags can be followed by
85 other space-separated values.
86
87 Unless specified otherwise, the paths used in the Manifest files
88 are relative to the directory containing the Manifest file. The paths
89 must not reference the parent directory (``..``).
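
As a non-normative illustration, the path rule above could be checked as follows. The function name ``is_valid_manifest_path`` is hypothetical, and rejecting absolute and empty paths is an assumption of this sketch (the text only states that paths are relative and must not reference ``..``).

```python
from pathlib import PurePosixPath


def is_valid_manifest_path(path: str) -> bool:
    """Return True if `path` is relative and never references '..'."""
    parsed = PurePosixPath(path)
    if not path or parsed.is_absolute():
        # Assumption: empty and absolute paths are rejected as well.
        return False
    # PurePosixPath does not collapse '..', so it stays visible in .parts.
    return ".." not in parsed.parts
```

For example, ``files/gcc.patch`` is accepted, while ``../eclass/foo.eclass`` and ``files/../Manifest`` are rejected.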
90
91
92 Manifest file locations and nesting
93 -----------------------------------
94
95 The ``Manifest`` file located in the root directory of the repository
96 is called top-level Manifest, and it is used to perform the full-tree
97 verification. In order to verify the authenticity, it must be signed
98 using OpenPGP, using the armored cleartext format.
99
100 The top-level Manifest may reference sub-Manifests contained
101 in subdirectories of the repository. The sub-Manifests are traditionally
102 named ``Manifest``; however, the implementation must support arbitrary
103 names, including the possibility of multiple (split) Manifests
104 for a single directory. The sub-Manifest can only cover the files inside
105 the directory tree where it resides.
106
107 The sub-Manifest can also be signed using OpenPGP armored cleartext
108 format. However, the signature verification can be omitted since it
109 already is covered by the signed top-level Manifest.
110
111
112 Directory tree coverage
113 -----------------------
114
115 The specification provides three ways of skipping Manifest verification
116 of specific files and directories (recursively):
117
118 1. explicit ``IGNORE`` entries in Manifest files,
119
120 2. injected ignore paths via package manager configuration,
121
122 3. using names starting with a dot (``.``) which are always skipped.
123
124 All files that are not ignored must be covered by at least one
125 of the Manifests.
126
127 A single file may be matched by multiple identical or equivalent
128 Manifest entries, if and only if the entries have the same semantics,
129 specify the same size and the checksums common to both entries match.
130 It is an error for a single file to be matched by multiple entries
131 of different semantics, file size or checksum values. It is an error
132 to specify another entry for a file matching ``IGNORE``, or one of its
133 subdirectories.
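
The duplicate-entry rule can be sketched as a small predicate. This is only an illustration: the ``Entry`` type and ``entries_compatible`` name are hypothetical, and treating the deprecated ``EBUILD``, ``MISC`` and ``AUX`` tags as having the same semantics as ``DATA`` is an assumption based on their description as equivalents. Both entries are assumed to have already been resolved to the same file.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    tag: str                   # e.g. "DATA" or "MANIFEST"
    size: int                  # file size in bytes
    checksums: Dict[str, str]  # algorithm name -> hex digest


# Assumption: the deprecated tags share the semantics of DATA.
DATA_LIKE = {"DATA", "EBUILD", "MISC", "AUX"}


def semantics(tag: str) -> str:
    """Collapse DATA-equivalent tags into one semantic class."""
    return "DATA" if tag in DATA_LIKE else tag


def entries_compatible(a: Entry, b: Entry) -> bool:
    """True if two entries for the same file may coexist."""
    if semantics(a.tag) != semantics(b.tag):
        return False
    if a.size != b.size:
        return False
    # Only the checksums present in both entries need to agree.
    common = a.checksums.keys() & b.checksums.keys()
    return all(a.checksums[name] == b.checksums[name] for name in common)
```

Note that two entries with no checksum algorithm in common are treated as compatible here, since the rule only constrains the checksums common to both.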
134
135 The file entries (except for ``IGNORE``) can be specified for regular
136 files only. Symbolic links are followed when opening files
137 and traversing directories. It is an error to specify an entry for
138 a different file type. If the tree contains files of other types
139 that are not otherwise ignored, they need to be covered by an explicit
140 ``IGNORE``.
141
142 All the local (non-``DIST``) files covered by a Manifest tree must
143 reside on the same filesystem. It is an error to specify entries
144 applying to files on another filesystem. If files or directories that
145 are not otherwise ignored reside on a different filesystem, or symbolic
146 links point to targets on a different filesystem, they must
147 be explicitly excluded via ``IGNORE``.
148
149
150 File verification
151 -----------------
152
153 When verifying a file against the Manifest, the following rules are
154 used:
155
156 1. If the file is covered directly or indirectly by an entry
157 of the ``IGNORE`` type, the verification always succeeds.
158
159 2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
160 ``MISC``, ``EBUILD`` or ``AUX`` type:
161
162 a. if the file is not present, then the verification fails,
163
164 b. if the file is present but has a different size or one
165 of the checksums does not match, the verification fails,
166
167 c. otherwise, the verification succeeds.
168
169 3. If the file is present but not listed in Manifest, the verification
170 fails.
171
172 Unless specified otherwise, the package manager must not allow using
173 any files for which the verification failed. The package manager may
174 reject any package or even the whole repository if it may refer to files
175 for which the verification failed.
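
A minimal sketch of the three verification rules follows. It is not the reference implementation: the ``verify_file`` signature is hypothetical, only two hash names are mapped to keep the example short, and an unknown algorithm name simply raises ``KeyError``.

```python
import hashlib
import os
from typing import Dict, Optional

# Assumed mapping from Manifest hash names to hashlib names (partial).
HASHLIB_NAMES = {"SHA256": "sha256", "SHA512": "sha512"}


def verify_file(path: str, ignored: bool, size: Optional[int],
                checksums: Optional[Dict[str, str]]) -> bool:
    """Apply the rules above; size/checksums of None mean 'no entry'."""
    if ignored:
        return True  # rule 1: ignored paths always pass
    if size is None or checksums is None:
        # rule 3: a present but unlisted file fails
        return not os.path.exists(path)
    if not os.path.isfile(path):  # follows symbolic links
        return False  # rule 2a: listed but missing
    if os.path.getsize(path) != size:
        return False  # rule 2b: size mismatch
    hashers = {name: hashlib.new(HASHLIB_NAMES[name]) for name in checksums}
    with open(path, "rb") as stream:
        # Read once and feed every requested hash.
        for chunk in iter(lambda: stream.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    # rule 2b/2c: every listed checksum has to match
    return all(hashers[name].hexdigest() == checksums[name].lower()
               for name in checksums)
```

Comparing the size before hashing lets the check reject a truncated or padded file without reading it.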
176
177
178 Timestamp verification
179 ----------------------
180
181 The top-level Manifest file can contain a ``TIMESTAMP`` entry to account
182 for attacks against tree update distribution. If such an entry
183 is present, it should be updated every time at least one
184 of the Manifests changes. Every unique timestamp value must correspond
185 to a single tree state.
186
187 During the verification process, the client should compare the timestamp
188 against the update time obtained from a local clock or a trusted time
189 source. If the comparison result indicates that the Manifest at the time
190 of receiving was already significantly outdated, the client should
191 either fail the verification or require manual confirmation from the user.
192
193 Furthermore, the Manifest provider may employ additional methods
194 of distributing the timestamps of recently generated Manifests
195 using a secure channel from a trusted source for exact comparison.
196 The exact details of such a solution are outside the scope of this
197 specification.
198
199 ``TIMESTAMP`` entries may also be present in sub-Manifests. Those
200 timestamps must not be newer than the timestamp of the top-level
201 Manifest (if present). This specification does not define any specific
202 use for them.
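
The timestamp comparison might look like the sketch below. The function names and the seven-day ``max_age`` default are assumptions made for illustration; the text above deliberately leaves the threshold for "significantly outdated" to the client.

```python
from datetime import datetime, timedelta, timezone

# Format string given for the TIMESTAMP tag.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a TIMESTAMP value into an aware UTC datetime."""
    parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def is_outdated(value: str, now: datetime,
                max_age: timedelta = timedelta(days=7)) -> bool:
    """True if the Manifest was already older than max_age at `now`.

    The 7-day default is an arbitrary assumption of this sketch.
    """
    return now - parse_timestamp(value) > max_age


def sub_timestamp_allowed(sub_value: str, top_value: str) -> bool:
    """Sub-Manifest timestamps must not be newer than the top-level one."""
    return parse_timestamp(sub_value) <= parse_timestamp(top_value)
```

A client would pass the update time from its local clock or trusted time source as ``now``, for instance ``datetime.now(timezone.utc)``.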
203
204
205 Modern Manifest tags
206 --------------------
207
208 The Manifest files can specify the following tags:
209
210 ``TIMESTAMP <iso8601>``
211 Specifies a timestamp of when the Manifest file was last updated.
212 The timestamp must be a valid second-precision ISO8601 extended format
213 combined date and time in UTC timezone, i.e. using the following
214 ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optional.
215 The package manager can use it to detect an outdated repository
216 checkout as described in `Timestamp verification`_.
217
218 ``MANIFEST <path> <size> <checksums>...``
219 Specifies a sub-Manifest. The sub-Manifest must be verified like
220 a regular file. If the verification succeeds, the entries from
221 the sub-Manifest are included for verification as described
222 in `Manifest file locations and nesting`_.
223
224 ``IGNORE <path>``
225 Ignores a subdirectory or file from Manifest checks. If the specified
226 path is present, it and its contents are omitted from the Manifest
227 verification (always pass). *Path* must be a plain file or directory
228 path without a trailing slash, and must not contain wildcards.
229
230 ``DATA <path> <size> <checksums>...``
231 Specifies a regular file subject to Manifest verification. The file
232 is required to pass verification. Used for all files that do not match
233 any other type.
234
235 ``DIST <filename> <size> <checksums>...``
236 Specifies a distfile entry used to verify files fetched as part
237 of ``SRC_URI``. The filename must match the filename used to store
238 the fetched file as specified in the PMS [#PMS-FETCH]_. The package
239 manager must reject the fetched file if it fails verification.
240 ``DIST`` entries apply to all packages below the Manifest file
241 specifying them.
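
To make the tag syntax concrete, here is a small line parser written as a sketch. The ``ManifestEntry`` type and ``parse_line`` function are hypothetical names, the deprecated tags from the next section are accepted as well, and rejecting unknown tags with ``ValueError`` is an assumption rather than required behavior. OpenPGP armor lines are not handled.

```python
from typing import Dict, NamedTuple, Optional


class ManifestEntry(NamedTuple):
    tag: str
    path: Optional[str]        # None for TIMESTAMP
    size: Optional[int]        # None for TIMESTAMP and IGNORE
    checksums: Dict[str, str]  # algorithm name -> hex digest
    value: Optional[str]       # raw timestamp string for TIMESTAMP

# Tags that carry "<path> <size> <checksums>..." fields.
FILE_TAGS = {"MANIFEST", "DATA", "DIST", "EBUILD", "MISC", "AUX"}


def parse_line(line: str) -> Optional[ManifestEntry]:
    """Parse one Manifest line; blank lines yield None."""
    fields = line.split()
    if not fields:
        return None
    tag, args = fields[0], fields[1:]
    if tag == "TIMESTAMP" and len(args) == 1:
        return ManifestEntry(tag, None, None, {}, args[0])
    if tag == "IGNORE" and len(args) == 1:
        return ManifestEntry(tag, args[0], None, {}, None)
    if tag in FILE_TAGS and len(args) >= 2 and len(args) % 2 == 0:
        # Checksums come as alternating "<name> <value>" pairs.
        pairs = args[2:]
        checksums = dict(zip(pairs[0::2], pairs[1::2]))
        return ManifestEntry(tag, args[0], int(args[1]), checksums, None)
    # Assumption: anything else is treated as a malformed line.
    raise ValueError(f"malformed Manifest line: {line!r}")
```

For example, ``parse_line("IGNORE distfiles")`` returns an entry whose ``path`` is ``distfiles`` and whose ``checksums`` dictionary is empty.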
242
243
244 Deprecated Manifest tags
245 ------------------------
246
247 For backwards compatibility, the following tags are additionally
248 allowed at the package directory level:
249
250 ``EBUILD <filename> <size> <checksums>...``
251 Equivalent to the ``DATA`` type.
252
253 ``MISC <path> <size> <checksums>...``
254 Equivalent to the ``DATA`` type. Historically indicated that
255 the package manager may ignore a verification failure if operating
256 in non-strict mode. However, that behavior is deprecated.
257
258 ``AUX <filename> <size> <checksums>...``
259 Equivalent to the ``DATA`` type, except that the filename is relative
260 to the ``files/`` subdirectory.
261
262
263 Algorithm for full-tree verification
264 ------------------------------------
265
266 In order to perform full-tree verification, the following algorithm
267 can be used:
268
269 1. Collect all files present in the repository into the *present* set.
270
271 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
272 Optionally verify the ``TIMESTAMP`` entry if present as specified
273 in `Timestamp verification`_. Remove the top-level Manifest
274 from the *present* set.
275
276 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
277 files according to `file verification`_ section, and include their
278 entries in the current Manifest entry list (using paths relative
279 to directories containing the Manifests).
280
281 4. Process all ``IGNORE`` entries. Remove any paths matching them
282 from the *present* set.
283
284 5. Collect all files covered by ``DATA``, ``MISC``, ``EBUILD``
285 and ``AUX`` entries into the *covered* set.
286
287 6. Verify the entries in the *covered* set for incompatible duplicates
288 and collisions with ignored files as explained in `Directory tree
289 coverage`_.
290
291 7. Verify all the files in the union of the *present* and *covered*
292 sets, according to `file verification`_ section.
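
The set bookkeeping of steps 1 and 4 to 7 can be sketched on plain path sets, leaving signature checks, sub-Manifest loading and checksum comparison aside. All names here are hypothetical. The caller is assumed to have removed the top-level Manifest from ``present`` already and to pass every path that has a checksum entry as ``covered``.

```python
from typing import Dict, List, Set


def is_hidden(path: str) -> bool:
    """True if any path component starts with a dot."""
    return any(part.startswith(".") for part in path.split("/"))


def under_ignore(path: str, ignores: Set[str]) -> bool:
    """True if path equals an IGNORE path or lies below one."""
    return any(path == ig or path.startswith(ig + "/") for ig in ignores)


def check_tree(present: Set[str], ignores: Set[str],
               covered: Set[str]) -> Dict[str, List[str]]:
    """Classify paths that would make full-tree verification fail."""
    # Step 4: drop ignored (and implicitly skipped dot-prefixed) paths.
    remaining = {p for p in present
                 if not is_hidden(p) and not under_ignore(p, ignores)}
    # Step 6 (partly): entries must not collide with IGNORE paths.
    collisions = {p for p in covered if under_ignore(p, ignores)}
    return {
        "collisions": sorted(collisions),
        # Step 7: present but not listed fails verification.
        "unlisted": sorted(remaining - covered),
        # Step 7: listed but not present fails verification.
        "missing": sorted(covered - present - collisions),
    }
```

Files that are both present and covered still need the size and checksum comparison from the file verification rules; this sketch only reports the structural failures.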
293
294
295 Algorithm for finding parent Manifests
296 --------------------------------------
297
298 In order to find the top-level Manifest from the current directory
299 the following algorithm can be used:
300
301 1. Store the current directory as *original* and the device ID
302 of the containing filesystem (``st_dev``) as *startdev*.
303
304 2. If the device ID of the containing filesystem (``st_dev``)
305 of the current directory is different from *startdev*, stop.
306
307 3. If the current directory contains a ``Manifest`` file:
308
309 a. If an ``IGNORE`` entry in the ``Manifest`` file covers
310 the *original* directory (or one of the parent directories), stop.
311
312 b. Otherwise, store the current directory as *last_found*.
313
314 4. If the current directory is the root system directory (``/``), stop.
315
316 5. Otherwise, enter the parent directory and jump to step 2.
317
318 Once the algorithm stops, *last_found* will contain the relevant
319 top-level Manifest. If *last_found* is null, then the directory tree
320 does not contain any valid top-level Manifest candidates and one should
321 be created in the *original* directory.
322
323 Once the top-level Manifest is found, its ``MANIFEST`` entries should
324 be used to find any sub-Manifests below the top-level Manifest,
325 up to and including the *original* directory. Note that those
326 sub-Manifests can use different filenames than ``Manifest``.
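
The upward search can be sketched as below, assuming a POSIX filesystem. The function names are hypothetical, ``IGNORE`` lines are extracted with a naive whitespace split, and the follow-up step of retracing sub-Manifests through ``MANIFEST`` entries is not shown.

```python
import os
from typing import List, Optional


def _ignore_paths(manifest_path: str) -> List[str]:
    """Collect IGNORE paths from a Manifest file (naive parsing)."""
    ignores = []
    with open(manifest_path, encoding="utf-8") as stream:
        for line in stream:
            fields = line.split()
            if len(fields) == 2 and fields[0] == "IGNORE":
                ignores.append(fields[1])
    return ignores


def find_top_level_manifest_dir(start: str) -> Optional[str]:
    """Return the directory of the top-level Manifest, or None."""
    original = os.path.realpath(start)
    startdev = os.stat(original).st_dev  # step 1
    current = original
    last_found = None
    while True:
        if os.stat(current).st_dev != startdev:
            break  # step 2: crossed a filesystem boundary
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):  # step 3
            rel = os.path.relpath(original, current)
            if any(rel == ig or rel.startswith(ig + "/")
                   for ig in _ignore_paths(manifest)):
                break  # step 3a: this Manifest ignores our directory
            last_found = current  # step 3b
        parent = os.path.dirname(current)
        if parent == current:
            break  # step 4: reached the root directory
        current = parent  # step 5
    return last_found
```

A ``None`` result corresponds to the case where no valid top-level Manifest candidate exists and one should be created in the starting directory.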
327
328
329 Checksum algorithms
330 -------------------
331
332 This section is informational only. Specifying the exact set
333 of supported algorithms is outside the scope of this specification.
334
335 The algorithm names reserved at the time of writing are:
336
337 - ``MD5`` [#MD5]_,
338 - ``RMD160`` -- RIPEMD-160 [#RIPEMD160]_,
339 - ``SHA1`` [#SHS]_,
340 - ``SHA256`` and ``SHA512`` -- SHA-2 family of hashes [#SHS]_,
341 - ``WHIRLPOOL`` [#WHIRLPOOL]_,
342 - ``BLAKE2B`` and ``BLAKE2S`` -- BLAKE2 family of hashes [#BLAKE2]_,
343 - ``SHA3_256`` and ``SHA3_512`` -- SHA-3 family of hashes [#SHA3]_,
344 - ``STREEBOG256`` and ``STREEBOG512`` -- Streebog family of hashes
345 [#STREEBOG]_.
346
347 The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
348 It is recommended that any new hashes are named after the Python
349 ``hashlib`` module algorithm names, transformed into uppercase.
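
The naming recommendation amounts to a simple case conversion, sketched here. The helper names are hypothetical. ``RMD160`` is handled as a legacy exception because it does not follow the uppercase rule; its ``hashlib`` spelling and its availability depend on the underlying OpenSSL build, so treat that entry as an assumption.

```python
import hashlib

# Reserved names that predate the recommendation (assumed hashlib names).
LEGACY_NAMES = {"RMD160": "ripemd160"}


def manifest_name_for(hashlib_name: str) -> str:
    """Recommended Manifest name for a new hashlib algorithm."""
    return hashlib_name.upper()


def hashlib_name_for(manifest_name: str) -> str:
    """Map a Manifest hash name back to a hashlib algorithm name."""
    return LEGACY_NAMES.get(manifest_name, manifest_name.lower())


def digest(manifest_name: str, data: bytes) -> str:
    """Hex digest of `data` using the named Manifest algorithm."""
    return hashlib.new(hashlib_name_for(manifest_name), data).hexdigest()
```

Whether a given algorithm can actually be instantiated depends on the local Python build; ``hashlib.algorithms_available`` lists the names supported at run time.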
350
351
352 Manifest compression
353 --------------------
354
355 The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
356 This section merely addresses interoperability issues between Manifest
357 compression and this specification.
358
359 The compressed Manifest files are required to be suffixed for their
360 compression algorithm. This suffix should be used to recognize
361 the compression and decompress Manifests transparently. The exact list
362 of algorithms and their corresponding suffixes are outside the scope
363 of this specification.
364
365 The top-level Manifest file must not be compressed. Since the OpenPGP
366 signature covers the uncompressed text and is compressed itself,
367 the data would have to be decompressed without any prior verification.
368 This could expose users e.g. to zip bombs or exploits on decompressor
369 vulnerabilities.
370
371 Whenever this specification refers to sub-Manifests, they can use any
372 names but are also required to use a specific compression suffix.
373 The ``MANIFEST`` entries are required to specify the full name including
374 compression suffix, and the verification is performed on the compressed
375 file.
376
377 The specification permits uncompressed Manifests to exist alongside
378 their compressed counterparts, and multiple compressed formats
379 to coexist. If that is the case, the files must have the same
380 uncompressed content and the implementation is free to choose either
381 of the files using the same base name.
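
Transparent decompression by suffix could be sketched as follows. The suffix table is an assumption, since the list of algorithms and suffixes is left to GLEP 61, and ``open_manifest`` is a hypothetical helper name.

```python
import bz2
import gzip
import lzma

# Assumed suffix table; the actual list is outside this specification.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path: str):
    """Open a sub-Manifest as text, decompressing based on its suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    # No known suffix: treat the file as an uncompressed Manifest.
    return open(path, "rt", encoding="utf-8")
```

Since the ``MANIFEST`` entry checksums apply to the compressed file, a careful caller would verify those bytes first and only then pass the path to a helper like this one.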
382
383
384 Combining multiple Manifest trees (informational)
385 -------------------------------------------------
386
387 This specification permits nesting multiple hierarchical Manifest trees.
388 In this layout, the specific directories of the Manifest tree can
389 be verified both as a part of another top-level Manifest,
390 and as an independent Manifest tree (when obtained without the parent
391 directory).
392
393 For this to work, the sub-Manifest file in the directory must also
394 satisfy the requirements for the top-level Manifest file. That is:
395
396 - it must be named ``Manifest`` and not compressed,
397
398 - it must cover all the files in this directory and its subdirectories
399 (i.e. no files from the directory tree can be covered by parent
400 Manifest),
401
402 - if authenticity verification is desired, it must be OpenPGP-signed.
403
404 It should be noted that if such a directory is a subdirectory of a valid
405 Manifest tree, the sub-Manifest needs to be valid according
406 to the top-level Manifest and the OpenPGP signature is disregarded
407 as detailed in `Manifest file locations and nesting`_. The top-level
408 behavior is exhibited only when the directory is obtained without parent
409 directories.
410
411
412 An example Manifest file (informational)
413 ----------------------------------------
414
415 An example top-level Manifest file for the Gentoo repository would have
416 the following content::
417
418 TIMESTAMP 2017-10-30T10:11:12Z
419 IGNORE distfiles
420 IGNORE local
421 IGNORE lost+found
422 IGNORE packages
423 MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
424 ...
425 MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
426 ...
427
428 An example modern Manifest (disregarding backwards compatibility)
429 for a package directory would have the following content::
430
431 DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
432 DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
433 DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
434 DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
435 DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
436 DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
437 DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
438
439
440 Rationale
441 =========
442
443 Stand-alone format
444 ------------------
445
446 The first question that needed to be asked before proceeding with
447 the design was whether the Manifest file format was supposed to be
448 stand-alone, or tightly bound to the repository format.
449
450 The stand-alone format has been selected because of its three
451 advantages:
452
453 1. It is more future-proof. If an incompatible change to the repository
454 format is introduced, only developers need to upgrade the tools
455 they use to generate the Manifests. The tools used to verify
456 the updated Manifests will continue to work.
457
458 2. It is more flexible and universal. With a dedicated tool,
459 the Manifest files can be used to sign and verify arbitrary file
460 sets.
461
462 3. It keeps the verification tool simpler. In particular, we can easily
463 write an independent verification tool that could work on any
464 distribution without needing to depend on a package manager
465 implementation or rewrite parts of it.
466
467 Designing a stand-alone format requires that the Manifest carries enough
468 information to perform the verification following all the rules specific
469 to the Gentoo repository.
470
471
472 Tree design
473 -----------
474
475 The second important point of the design was determining whether
476 the Manifest files should be structured hierarchically, or independent.
477 Both options have their advantages.
478
479 In the hierarchical model, each sub-Manifest file is covered by a higher
480 level Manifest. As a result, only the top-level Manifest has to be
481 OpenPGP-signed, and subsequent Manifests need to be only verified by
482 checksum stored in the parent Manifest. This has the following
483 implications:
484
485 - Verifying any set of files in the repository requires using checksums
486 from the most relevant Manifests and the parent Manifests.
487
488 - The OpenPGP signature of the top-level Manifest needs to be verified
489 only once per process.
490
491 - Altering any set of files requires updating the relevant Manifests,
492 and their parent Manifests up to the top-level Manifest, and signing
493 the last one.
494
495 - As a result, the top-level Manifest changes on every commit,
496 and various middle-level Manifests change (and need to be transferred)
497 frequently.
498
499 In the independent model, each sub-Manifest file is independent
500 of the parent Manifests. As a result, each of them needs to be signed
501 and verified independently. However, the parent Manifests still need
502 to list sub-Manifests (albeit without verification data) in order
503 to detect removal or replacement of subdirectories. This has
504 the following implications:
505
506 - Verifying any set of files in the repository requires using checksums
507 and verifying signatures of the most relevant Manifest files.
508
509 - Altering any set of files requires updating the relevant Manifests
510 and signing them again.
511
512 - Parent Manifests are updated only when Manifests are added or removed
513 from subdirectories. As a result, they change infrequently.
514
515 While both models have their advantages, the hierarchical model was
516 selected because it reduces the number of comparatively costly
517 OpenPGP operations to the minimum.
518
519
520 Tree layout restrictions
521 ------------------------
522
523 The algorithm is meant to work primarily with ebuild repositories which
524 normally contain only files and directories. Directories provide
525 no useful metadata for verification, and specifying special entries
526 for additional file types is purposeless. Therefore, the specification
527 is restricted to dealing with regular files.
528
529 The Gentoo repository does not use symbolic links. Some Gentoo
530 repositories do, however. To provide a simple solution for dealing with
531 symlinks without having to take care to implement special handling for
532 them, the common behavior of implicitly resolving them is used.
533 Therefore, symbolic links to files are stored as if they were regular
534 files, and symbolic links to directories are followed as if they were
535 regular directories.
536
537 Dotfiles are implicitly ignored as that is a common notion used
538 in software written for POSIX systems. All other filenames require
539 explicit ``IGNORE`` lines.
540
541 An ability to inject additional ignore entries is provided to account
542 for site configuration affecting the repository tree -- placing
543 additional files in it, skipping some of the categories from syncing.
544 This configuration can extend beyond the limits of this GLEP,
545 e.g. by allowing wildcards or regular expressions.
546
547 The algorithm is restricted to work on a single filesystem. This is
548 mostly relevant when scanning for top-level Manifest -- we do not want
549 to cross filesystem boundaries then. However, to ensure consistent
550 bidirectional behavior we need to also ban them when operating down
551 the tree.
552
553 The directories and files on different filesystems need to be ignored
554 explicitly as implicitly skipping them would cause confusion.
555 In particular, tools might then claim that a file does not exist when
556 it clearly does because it was skipped due to filesystem boundaries.
557
558
559 File verification model
560 -----------------------
561
562 The verification model aims to provide full coverage against different
563 forms of attack. In particular, three different kinds of manipulation
564 are considered:
565
566 1. Alteration of the file content.
567
568 2. Removal of a file.
569
570 3. Addition of a new file.
571
572 In order to protect against all three, the system requires that all
573 files in the repository are listed in Manifests and verified against
574 them.
575
576 As a special case, ignores are allowed to account for directories
577 that are not part of the repository but were traditionally placed inside
578 it. Those directories were ``distfiles``, ``local`` and ``packages``. It
579 could be also used to ignore VCS directories such as ``CVS``.
580
581
582 Non-strict Manifest verification
583 --------------------------------
584
585 Originally the Manifest2 format provided a special ``MISC`` tag that
586 was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
587 indicated that the Manifest verification failures could be ignored for
588 those files unless the package manager was working in strict mode.
589
590 The first versions of this specification continued the use of this tag.
591 However, after a long debate it was decided to deprecate it along with
592 the non-strict behavior, and require all files to strictly match.
593
594 Two arguments were mentioned for the usefulness of a ``MISC`` type:
595
596 1. being able to reduce the checkout size by stripping unnecessary
597 files out, and
598
599 2. being able to update automatically generated files locally
600 without causing unnecessary verification failures.
601
602 However, the usefulness of ``MISC`` in both cases is doubtful.
603
604 The cases for stripping unnecessary files mostly focused around space
605 savings. For this purpose, stripping ``metadata.xml`` and similar files
606 has little value. It is much more common for users to strip whole
607 packages or categories. The ``MISC`` type is not suitable for that,
608 and so a dedicated package manager mechanism needs to be developed
609 instead. The same mechanism can also handle files that historically used
610 the ``MISC`` type. As an example, the package manager may choose
611 to generate both the rsync exclusion list and Manifest ignore list
612 using a single source list.
613
614 The cases for autogenerated files involve such cache files
615 as ``use.local.desc``. However, we can not include ``md5-cache`` there
616 due to security concerns which results in inconsistent cache handling.
617 Furthermore, the tools were historically modified to provide stable
618 output which means that their content can not change without
619 a non-``MISC`` content being changed first. This practically defeats
620 the purpose of using ``MISC``.
621
622 Finally, the non-strict mode could be used as means to an attack.
623 The allowance of missing or modified documentation file could be used
624 to spread misinformation, resulting in bad decisions made by the user.
625 A modified file could also be used e.g. to exploit vulnerabilities
626 of an XML parser.
627
628
629 Timestamp field
630 ---------------
631
632 The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
633 to include a generation timestamp in the Manifest. A similar feature
634 was originally proposed in GLEP 58 [#GLEP58]_.
635
636 A malicious third-party may use the principles of exclusion or replay
637 [#C08]_ to deny an update to clients, while at the same time recording
638 the identity of clients to attack. The timestamp field can be used to
639 detect that.
640
641 In order to provide a more complete protection, the Gentoo
642 Infrastructure should provide an ability to obtain the timestamps
643 of all Manifests from a recent timeframe over a secure channel
644 from a trusted source for comparison.
645
646 Strictly speaking, this information is already provided by the various
647 ``metadata/timestamp*`` files that are already present. However,
648 including the value in the Manifest itself has little cost
649 and provides the ability to perform the verification stand-alone.
650
651 Furthermore, some of the timestamp files are added very late
652 in the distribution process, past the Manifest generation phase. Those
653 files will most likely receive ``IGNORE`` entries and therefore
654 will not be suitable for safe use.
655
656 The specification permits additional timestamps in sub-Manifest files
657 for local use. A generic testing tool should ignore them.
658
659
660 New vs deprecated tags
661 ----------------------
662
663 Out of the four types defined by Manifest2, only one is reused
664 and the remaining three are replaced by a single, universal ``DATA``
665 type.
666
667 The ``DIST`` tag is reused since the specification does not change
668 anything with regard to distfile handling.
669
670 The ``EBUILD`` tag could potentially be reused for generic file
671 verification data. However, it would be confusing if all the different
672 data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
673 type was introduced as a replacement.
674
675 The ``MISC`` tag and the relevant non-strict mode have been removed
676 as being of little value, as detailed in the `Non-strict Manifest
677 verification`_ section.
678
679 The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
680 the limiting property of implicit ``files/`` path prefix.
681
682
683 Finding top-level Manifest
684 --------------------------
685
686 The development of a reference implementation for this GLEP has brought
687 the following problem: how to find all the relevant Manifests when
688 the Manifest tool is run inside a subdirectory of the repository?
689
690 One of the options would be to provide a bi-directional linking
691 of Manifests via a ``PARENT`` tag. However, that would not solve
692 the problem when a new Manifest file is being created.
693
694 Instead, an algorithm for iterating over parent directories is proposed.
695 Since there is no obligatory explicit indicator for the top-level
696 Manifest, the algorithm assumes that the top-level Manifest
697 is the highest ``Manifest`` in the directory hierarchy that can cover
698 the current directory. This generally makes sense since the Manifest
699 files are required to provide coverage for all subdirectories, so all
700 Manifests starting from that one need to be updated.
701
702 If independent Manifest trees are nested in the directory structure,
703 then an ``IGNORE`` entry needs to be used to separate them.
704
705 Since sub-Manifests can use any filenames, the Manifest finding
706 algorithm must not short-cut the procedure by storing all ``Manifest``
707 files along the parent directories. Instead, it needs to retrace
708 the relevant sub-Manifest files along ``MANIFEST`` entries
709 in the top-level Manifest.
710
711
712 Injecting ChangeLogs into the checkout
713 --------------------------------------
714
715 One of the problems considered in the new Manifest format was that
716 of injecting historical and autogenerated ChangeLogs into the repository.
717 Normally we are not including those files to reduce the checkout size.
718 However, some users have shown interest in them and Infra is working
719 on providing them via an additional rsync module.
720
721 If such files were injected into the repository, they would cause
722 verification failures of Manifests. To account for this, Infra could
723 provide ``IGNORE`` entries to allow them to exist.
724
725
726 Splitting distfile checksums from file checksums
727 ------------------------------------------------
728
729 Another problem with the current Manifest format is that the checksums
730 for fetched files are combined with checksums for local files
731 in a single file inside the package directory. It has been specifically
732 pointed out that:
733
734 - since distfiles are sometimes reused across different packages,
735 the repeating checksums are redundant [#DIST]_.
736
737 - mirror admins were interested in the possibility of verifying all
738 the distfiles with a single tool.
739
740 This specification does not provide a clean solution to this problem.
741 It technically permits moving ``DIST`` entries to higher-level Manifests
742 but the usefulness of such a solution is doubtful.
743
744 However, for the second problem we will probably deliver a dedicated
745 tool working with this Manifest format.
746
747
748 Hash algorithms
749 ---------------
750
751 While maintaining a consistent supported hash set is important
752 for interoperability, it is not a good fit for the generic layout of this
753 GLEP. Furthermore, it would require updating the GLEP in the future
754 every time the used algorithms change.
755
756 Instead, the specification focuses on listing the currently used
757 algorithm names for interoperability, and sets a recommendation
758 for consistent naming of algorithms in the future. The Python
759 ``hashlib`` module is used as a reference since it is used
760 as the provider of hash functions for most of the Python software,
761 including Portage and PkgCore.
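
As an illustration, the following Python sketch derives Manifest hash names
from ``hashlib`` algorithm names and formats a single entry. It assumes that
the naming recommendation amounts to upper-casing the ``hashlib`` name and that
an entry consists of a tag, a path, a size and name/value pairs. The helper
names are hypothetical, and legacy names that do not follow this rule would
need an explicit mapping.

```python
import hashlib


def manifest_hash_name(hashlib_name):
    """Assumed convention: Manifest name = hashlib name in uppercase."""
    return hashlib_name.upper()


def format_entry(tag, path, data, algorithms=("blake2b", "sha512")):
    """Build one Manifest-style line for in-memory file contents."""
    fields = [tag, path, str(len(data))]
    for algo in algorithms:
        digest = hashlib.new(algo, data).hexdigest()
        fields += [manifest_hash_name(algo), digest]
    return " ".join(fields)


# Hypothetical file name and contents, for demonstration only.
print(format_entry("DATA", "eclass/example.eclass", b"# example\n"))
```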
762
763 The basic rules for changing hash algorithms are defined in GLEP 59
764 [#GLEP59]_. The implementations can focus only on those algorithms
765 that are actually used or planned on being used. It may be feasible
766 to devise a new GLEP that specifies the currently used hashes (or update
767 GLEP 59 accordingly).
768
769
770 Manifest compression
771 --------------------
772
773 The support for Manifest compression is introduced with minimal changes
774 to the file format. The ``MANIFEST`` entries are required to provide
775 the real (compressed) file path for compatibility with other file
776 entries and to avoid confusion.
777
778 The compression of the top-level Manifest file has been prohibited
779 as the specification currently does not provide any means of verifying
780 the file prior to decompression. Allowing it would make it possible for
781 a malicious third party to provide a compressed Manifest that exploits
782 decompressor vulnerabilities or acts as a zip bomb, and the tooling
783 would have to unpack it before being able to verify the contents.
784
785 The OpenPGP cleartext signature covers the contents of the Manifest,
786 and is therefore compressed along with them. The possibility of using
787 a detached signature has been considered but it was rejected as
788 adding unnecessary complexity for minor gain.
789
790 Technically, a result similar to compressing the top-level Manifest
791 could be achieved by moving all the data into a compressed sub-Manifest
792 in the top directory (e.g. ``Manifest.sub.gz``), and including a ``MANIFEST``
793 entry for this file in a signed, uncompressed top-level Manifest.
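
The layout described in the previous paragraph can be sketched in Python as
follows. The sketch assumes gzip compression and a single ``SHA512`` checksum.
The helper names and the example ``DATA`` entry are hypothetical, and OpenPGP
signing of the top-level Manifest is not shown.

```python
import gzip
import hashlib


def compress_sub_manifest(entries_text):
    """Return the gzip-compressed bytes of the sub-Manifest contents."""
    return gzip.compress(entries_text.encode("utf-8"))


def manifest_entry_for(path, compressed):
    """MANIFEST entry describing the compressed file as stored on disk."""
    digest = hashlib.sha512(compressed).hexdigest()
    return "MANIFEST {} {} SHA512 {}".format(path, len(compressed), digest)


# One hypothetical entry that is moved into the sub-Manifest.
content = b"gentoo\n"
sub_text = "DATA profiles/repo_name {} SHA512 {}\n".format(
    len(content), hashlib.sha512(content).hexdigest())

blob = compress_sub_manifest(sub_text)      # bytes of Manifest.sub.gz
top_level_text = manifest_entry_for("Manifest.sub.gz", blob) + "\n"
print(top_level_text, end="")
```

The size and checksum in the ``MANIFEST`` entry describe the compressed file,
so a verifier can check ``Manifest.sub.gz`` against the signed top-level
Manifest before decompressing it.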
794
795 The existence of additional entries for uncompressed Manifest checksums
796 was debated. However, plain entries for the uncompressed file would
797 be confusing if only the compressed file existed, and conflicting if both
798 uncompressed and compressed variants existed. Furthermore, it has been
799 pointed out that ``DIST`` entries do not have an uncompressed variant
800 either.
801
802
803 Performance considerations
804 --------------------------
805
806 Performing a full-tree verification on every sync raises some
807 performance concerns for end-user systems. The initial testing has shown
808 that a cold-cache verification on a btrfs file system can take around
809 4 minutes, with the process being mostly I/O bound. On the other hand,
810 it can be expected that the verification will be performed directly
811 after syncing, taking advantage of warm filesystem cache.
812
813 To improve speed on I/O and/or CPU-constrained systems even further,
814 the algorithms can be easily extended to perform incremental
815 verification. Given that rsync does not preserve mtimes by default,
816 the tool can take advantage of mtime and Manifest comparisons to recheck
817 only the parts of the repository that have changed.
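
One possible shape of such an incremental check, sketched in Python. It
assumes a cache that maps each path to the mtime and checksum recorded after
the last successful verification. The cache structure and function names are
hypothetical, a single SHA512 checksum stands in for the full hash set, and
missing or stray files are left out for brevity.

```python
import hashlib
import os


def files_to_recheck(root, expected, cache):
    """Return relative paths that need a full rehash.

    expected: {relative path: sha512 hex digest} from the current Manifests
    cache:    {relative path: (mtime_ns, digest)} from the previous run
    """
    stale = []
    for rel, digest in expected.items():
        st = os.stat(os.path.join(root, rel))
        # Recheck if the file's mtime or its Manifest checksum changed.
        if cache.get(rel) != (st.st_mtime_ns, digest):
            stale.append(rel)
    return stale


def verify_incrementally(root, expected, cache):
    """Rehash only stale files, update the cache, return failing paths."""
    failures = []
    for rel in files_to_recheck(root, expected, cache):
        path = os.path.join(root, rel)
        with open(path, "rb") as f:
            actual = hashlib.sha512(f.read()).hexdigest()
        if actual == expected[rel]:
            cache[rel] = (os.stat(path).st_mtime_ns, actual)
        else:
            cache.pop(rel, None)
            failures.append(rel)
    return failures
```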
818
819 Furthermore, the package manager implementations can restrict checking
820 to only the parts of the repository that are actually being used.
821
822
823 Backwards Compatibility
824 =======================
825
826 This GLEP provides optional means of preserving backwards compatibility.
827 To preserve backwards compatibility, the following needs to hold
828 for the ``Manifest`` file in every package directory:
829
830 - all files must be covered by the single ``Manifest`` file,
831
832 - all distfiles used by the package must be included,
833
834 - all files inside the ``files/`` subdirectory need to use
835 the ``AUX`` tag (rather than ``DATA``),
836
837 - all ``.ebuild`` files need to use the ``EBUILD`` tag,
838
839 - the ``metadata.xml`` and ``ChangeLog`` files need to use
840 the ``MISC`` tag,
841
842 - the Manifest can be signed to provide authenticity verification,
843
844 - an uncompressed Manifest must always exist, and a compressed Manifest
845 of identical content may be present.
846
847 Once backwards compatibility is no longer a concern, the above
848 no longer needs to hold and the deprecated tags can be removed.
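
The tag rules listed in this section could be sketched in Python as follows.
The function name is hypothetical. The sketch additionally assumes that
``AUX`` paths are written relative to ``files/``, as in the older Manifest
format, and that any remaining file falls back to ``DATA``.

```python
def compat_entry_key(rel_path):
    """Return (tag, path) for a file path relative to the package directory."""
    if rel_path.startswith("files/"):
        # Assumption: the 'files/' prefix is stripped for AUX entries.
        return ("AUX", rel_path[len("files/"):])
    if "/" not in rel_path and rel_path.endswith(".ebuild"):
        return ("EBUILD", rel_path)
    if rel_path in ("metadata.xml", "ChangeLog"):
        return ("MISC", rel_path)
    return ("DATA", rel_path)


# Hypothetical package contents.
for name in ("foo-1.0.ebuild", "files/fix-build.patch", "metadata.xml"):
    print(*compat_entry_key(name))
```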
849
850
851 Reference Implementation
852 ========================
853
854 The reference implementation for this GLEP is being developed
855 as the gemato project [#GEMATO]_.
856
857
858 Credits
859 =======
860
861 Thanks to all the people whose contributions were invaluable
862 to the creation of this GLEP. This includes but is not limited to:
863
864 - Robin Hugh Johnson,
865 - Ulrich Müller.
866
867 Additionally, thanks to Robin Hugh Johnson for the original
868 MetaManifest GLEP series which served both as inspiration and source
869 of many concepts used in this GLEP. Recursively, also thanks to all
870 the people who contributed to the original GLEPs.
871
872
873 References
874 ==========
875
876 .. [#GLEP44] GLEP 44: Manifest2 format
877 (https://www.gentoo.org/glep/glep-0044.html)
878
879 .. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
880 - Overview
881 (https://www.gentoo.org/glep/glep-0057.html)
882
883 .. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
884 - Infrastructure to User distribution - MetaManifest
885 (https://www.gentoo.org/glep/glep-0058.html)
886
887 .. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
888 (https://www.gentoo.org/glep/glep-0059.html)
889
890 .. [#GLEP60] GLEP 60: Manifest2 filetypes
891 (https://www.gentoo.org/glep/glep-0060.html)
892
893 .. [#GLEP61] GLEP 61: Manifest2 compression
894 (https://www.gentoo.org/glep/glep-0061.html)
895
896 .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
897 Format - SRC_URI
898 (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)
899
900 .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
901 (https://www.ietf.org/rfc/rfc1321.txt)
902
903 .. [#RIPEMD160] The hash function RIPEMD-160
904 (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)
905
906 .. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
907 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)
908
909 .. [#WHIRLPOOL] The WHIRLPOOL Hash Function
910 (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)
911
912 .. [#BLAKE2] BLAKE2 -- fast secure hashing
913 (https://blake2.net/)
914
915 .. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
916 and Extendable-Output Functions
917 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)
918
919 .. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
920 (https://www.streebog.net/)
921
922 .. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
923 (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)
924
925 .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
926 at the time of writing were duplicates, representing 2 MiB
927 out of 25 MiB of DIST entries altogether.
928
929 .. [#GEMATO] gemato: Gentoo Manifest Tool
930 (https://github.com/mgorny/gemato/)
931
932 Copyright
933 =========
934 This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
935 Unported License. To view a copy of this license, visit
936 http://creativecommons.org/licenses/by-sa/3.0/.
937
938
939 --
940 Best regards,
941 Michał Górny
