Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v3]
Date: Tue, 21 Nov 2017 17:26:32
Message-Id: 1511285178.1074.13.camel@gentoo.org
In Reply to: [gentoo-dev] [RFC] GLEP 74 post-Council review update by "Michał Górny"
1 On Thu, 16 Nov 2017 at 11:19 +0100, Michał Górny
2 wrote:
3 > Hi, everyone.
4 >
5 > Here's the updated version of GLEP 74 taking into consideration
6 > the points made during the Council pre-review.
7 >
8 > ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
9 > HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
10 >
11 > Changes:
12 >
13
14 5ba0654 glep-0074: Specify slash as path separator, disallow backwards
15 slash
16 d3b65ba glep-0074: Mention that newline needs to be restricted too in
17 rationale
18 54cc3ef glep-0074: Apply suggestions from Ulrich Müller
19
20
21 ---
22 GLEP: 74
23 Title: Full-tree verification using Manifest files
24 Author: Michał Górny <mgorny@g.o>,
25 Robin Hugh Johnson <robbat2@g.o>,
26 Ulrich Müller <ulm@g.o>
27 Type: Standards Track
28 Status: Draft
29 Version: 1
30 Created: 2017-10-21
31 Last-Modified: 2017-11-16
32 Post-History: 2017-10-26, 2017-11-16
33 Content-Type: text/x-rst
34 Requires: 59, 61
35 Replaces: 44, 58, 60
36 ---
37
38 Abstract
39 ========
40
41 This GLEP extends the Manifest file format to cover full-tree file
42 integrity and authenticity checks. The format aims to be future-proof,
43 efficient and to provide means of backwards compatibility.
44
45
46 Motivation
47 ==========
48
49 The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
50 means of verifying the integrity of distfiles and package files
51 in Gentoo. Combined with OpenPGP signatures, they provide means to
52 ensure the authenticity of the covered files. However, as noted
53 in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
54 authenticity verification as they do not cover any files outside
55 the package directory. In particular, they provide multiple ways
56 for a third party to inject malicious code into the ebuild environment.
57
58 Historically, the topic of providing authenticity coverage for the whole
59 repository has been mentioned multiple times. The most noteworthy efforts
60 are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
61 They were accepted by the Council in 2010 but have never been
62 implemented. When potential implementation work started in 2017, a new
63 discussion about the specification arose. It prompted the creation
64 of a competing GLEP that would provide a redesigned alternative to
65 the old GLEPs.
66
67 This specification is designed with the following goals in mind:
68
69 1. It should provide means to ensure the authenticity of the complete
70 repository, including preventing the injection of additional files.
71
72 2. The format should be universal enough to work both for the Gentoo
73 repository and third-party repositories of different characteristics.
74
75 3. The Manifest files should be verifiable stand-alone, that is without
76 knowing any details about the underlying repository format.
77
78
79 Specification
80 =============
81
82 Manifest file format
83 --------------------
84
85 This specification reuses and extends the Manifest file format defined
86 in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
87 repurposed as a generic *tag* that could also indicate additional
88 (non-checksum) metadata. Appropriately, those tags can be followed by
89 other space-separated values.
90
91 Unless specified otherwise, the paths used in the Manifest files
92 are relative to the directory containing the Manifest file. The paths
93 must not reference the parent directory (``..``).
94
95 The Manifest files use UTF-8 encoding.
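
The entry layout described above can be illustrated with a minimal parsing sketch. It is not part of the specification: the ``Entry`` class and ``parse_line`` function are hypothetical names, rejecting unknown tags is an assumption, and the OpenPGP cleartext armor is assumed to have been verified and stripped before parsing.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Tags followed by "<path> <size> <checksums>..." in this specification.
CHECKSUM_TAGS = {"MANIFEST", "DATA", "DIST", "EBUILD", "MISC", "AUX"}


@dataclass
class Entry:
    tag: str
    path: Optional[str] = None       # unset for TIMESTAMP
    size: Optional[int] = None       # unset for IGNORE and TIMESTAMP
    checksums: Dict[str, str] = field(default_factory=dict)
    value: Optional[str] = None      # raw value of a TIMESTAMP entry


def parse_line(line: str) -> Optional[Entry]:
    """Parse one Manifest line; return None for a blank line."""
    fields = line.split()
    if not fields:
        return None
    tag, args = fields[0], fields[1:]
    if tag == "TIMESTAMP":
        if len(args) != 1:
            raise ValueError("TIMESTAMP takes exactly one value")
        return Entry(tag, value=args[0])
    if tag not in CHECKSUM_TAGS and tag != "IGNORE":
        # Assumption: unknown tags are rejected rather than skipped.
        raise ValueError("unknown tag: " + tag)
    if not args or ".." in args[0].split("/"):
        raise ValueError("missing path or parent directory reference")
    if tag == "IGNORE":
        if len(args) != 1:
            raise ValueError("IGNORE takes exactly one path")
        return Entry(tag, path=args[0])
    # <path> <size> followed by (algorithm, value) pairs.
    if len(args) < 2 or len(args) % 2 != 0:
        raise ValueError("malformed checksum entry")
    pairs = args[2:]
    checksums = dict(zip(pairs[0::2], pairs[1::2]))
    return Entry(tag, args[0], int(args[1]), checksums)
```

For instance, ``parse_line("IGNORE distfiles")`` yields an entry whose ``path`` is ``distfiles`` and whose checksum mapping is empty.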
96
97
98 Manifest file locations and nesting
99 -----------------------------------
100
101 The ``Manifest`` file located in the root directory of the repository
102 is called top-level Manifest, and it is used to perform the full-tree
103 verification. In order to verify the authenticity, it must be signed
104 using OpenPGP, using the armored cleartext format.
105
106 The top-level Manifest may reference sub-Manifests contained
107 in subdirectories of the repository. The sub-Manifests are traditionally
108 named ``Manifest``; however, the implementation must support arbitrary
109 names, including the possibility of multiple (split) Manifests
110 for a single directory. The sub-Manifest can only cover the files inside
111 the directory tree where it resides.
112
113 The sub-Manifest can also be signed using OpenPGP armored cleartext
114 format. However, the signature verification can be omitted since it
115 already is covered by the signed top-level Manifest.
116
117
118 Directory tree coverage
119 -----------------------
120
121 The specification provides three ways of skipping Manifest verification
122 of specific files and directories (recursively):
123
124 1. explicit ``IGNORE`` entries in Manifest files,
125
126 2. injected ignore paths via package manager configuration,
127
128 3. using names starting with a dot (``.``) which are always skipped.
129
130 All files that are not ignored must be covered by at least one
131 of the Manifests.
132
133 A single file may be matched by multiple identical or equivalent
134 Manifest entries, if and only if the entries have the same semantics,
135 specify the same size and the checksums common to both entries match.
136 It is an error for a single file to be matched by multiple entries
137 of different semantics, file size or checksum values. It is an error
138 to specify another entry for a file matching ``IGNORE``, or one of its
139 subdirectories.
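
As a rough illustration of the duplicate-entry rule, the sketch below compares two entries that match the same file. The tuple layout and the function names are hypothetical, and treating ``EBUILD``, ``MISC`` and ``AUX`` as having the same semantics as ``DATA`` is an interpretation based on the deprecated tags being described as equivalent to ``DATA``.

```python
from typing import Dict, Tuple

# Hypothetical in-memory form of an entry: (tag, size, checksums).
EntryInfo = Tuple[str, int, Dict[str, str]]

# Tags the specification describes as equivalent to DATA.
DATA_EQUIVALENT = {"DATA", "EBUILD", "MISC", "AUX"}


def semantics(tag: str) -> str:
    """Collapse DATA-equivalent tags into a single semantic class."""
    return "DATA" if tag in DATA_EQUIVALENT else tag


def compatible(a: EntryInfo, b: EntryInfo) -> bool:
    """True if two entries for the same file may coexist."""
    tag_a, size_a, sums_a = a
    tag_b, size_b, sums_b = b
    if semantics(tag_a) != semantics(tag_b) or size_a != size_b:
        return False
    # Only the checksums present in both entries need to agree.
    common = sums_a.keys() & sums_b.keys()
    return all(sums_a[name] == sums_b[name] for name in common)
```

Two entries that share no checksum algorithm at all are therefore accepted as long as the tag semantics and the size agree.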
140
141 The file entries (except for ``IGNORE``) can be specified for regular
142 files only. Symbolic links are followed when opening files
143 and traversing directories. It is an error to specify an entry for
144 a different file type. If the tree contains files of other types
145 that are not otherwise ignored, they need to be covered by an explicit
146 ``IGNORE``.
147
148 All the local (non-``DIST``) files covered by a Manifest tree must
149 reside on the same filesystem. It is an error to specify entries
150 applying to files on another filesystem. If files or directories that
151 are not otherwise ignored reside on a different filesystem, or symbolic
152 links point to targets on a different filesystem, they must
153 be explicitly excluded via ``IGNORE``.
154
155 All paths specified in the Manifest file must consist of characters
156 corresponding to valid UTF-8 code points excluding the NULL character
157 (``U+0000``), the backwards slash (``\``) and characters classified
158 as whitespace in the current version of the Unicode standard
159 [#UNICODE]_. It is an error to use Manifest files in directories
160 containing files whose names contain the disallowed characters.
161 The forward slash (``/``) must be used as path separator.
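
A path check along these lines could look as follows. This is a sketch only: ``is_valid_manifest_path`` is a hypothetical name, and Python's ``str.isspace()`` is used as an approximation of the Unicode whitespace classification (it is slightly broader, since it also matches a few control characters).

```python
def is_valid_manifest_path(path: str) -> bool:
    """Check a Manifest path against the character restrictions."""
    for ch in path:
        # NULL, backslash and whitespace are disallowed.
        # Assumption: str.isspace() approximates "Unicode whitespace".
        if ch == "\0" or ch == "\\" or ch.isspace():
            return False
    # Paths must not reference the parent directory.
    return ".." not in path.split("/")
```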
162
163
164 File verification
165 -----------------
166
167 When verifying a file against the Manifest, the following rules are
168 used:
169
170 1. If the file is covered directly or indirectly by an entry
171 of the ``IGNORE`` type, the verification always succeeds.
172
173 2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
174 ``MISC``, ``EBUILD`` or ``AUX`` type:
175
176 a. if the file is not present, then the verification fails,
177
178 b. if the file is present but has a different size or one
179 of the checksums does not match, the verification fails,
180
181 c. otherwise, the verification succeeds.
182
183 3. If the file is present but not listed in Manifest, the verification
184 fails.
185
186 Unless specified otherwise, the package manager must not allow using
187 any files for which the verification failed. The package manager may
188 reject any package or even the whole repository if it may refer to files
189 for which the verification failed.
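
Rule 2 above can be sketched as a single function. The name ``verify_file`` is hypothetical, and mapping Manifest algorithm names to ``hashlib`` names by lower-casing them is an assumption that does not hold for every reserved name (for example ``RMD160``). The ``IGNORE`` rule and the rejection of unlisted files are left to the caller.

```python
import hashlib
import os
from typing import Dict


def verify_file(path: str, size: int, checksums: Dict[str, str]) -> bool:
    """Return True if the file exists and matches size and checksums."""
    # os.path.isfile() follows symbolic links, as the specification does.
    if not os.path.isfile(path):
        return False                      # rule 2a: file is missing
    if os.path.getsize(path) != size:
        return False                      # rule 2b: size differs
    # Assumption: lower-cased Manifest names are valid hashlib names.
    # hashlib.new() raises ValueError for unsupported algorithms.
    hashers = {name: hashlib.new(name.lower()) for name in checksums}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    # rule 2b: every listed checksum has to match.
    return all(hashers[name].hexdigest() == expected.lower()
               for name, expected in checksums.items())
```

Reading the file once and feeding every hasher from the same chunk avoids re-reading large files for each algorithm.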
190
191
192 Timestamp verification
193 ----------------------
194
195 The top-level Manifest file can contain a ``TIMESTAMP`` entry to account
196 for attacks against tree update distribution. If such an entry
197 is present, it should be updated every time at least one
198 of the Manifests changes. Every unique timestamp value must correspond
199 to a single tree state.
200
201 During the verification process, the client should compare the timestamp
202 against the update time obtained from a local clock or a trusted time
203 source. If the comparison result indicates that the Manifest at the time
204 of receiving was already significantly outdated, the client should
205 either fail the verification or require manual confirmation from
206 the user.
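
The comparison could be sketched as below, using the ``strftime()`` format string given in the ``TIMESTAMP`` tag definition. The specification does not define what "significantly outdated" means, so the ``max_age`` default of one day is purely a placeholder, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a TIMESTAMP value into an aware UTC datetime."""
    parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def is_outdated(value: str, now: datetime,
                max_age: timedelta = timedelta(days=1)) -> bool:
    """True if the Manifest was older than max_age when received.

    max_age is a hypothetical policy knob, not defined by the spec.
    """
    return now - parse_timestamp(value) > max_age
```

A client would pass the time of the sync (from the local clock or a trusted time source) as ``now`` and either fail or ask the user for confirmation when the function returns true.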
207
208 Furthermore, the Manifest provider may employ additional methods
209 of distributing the timestamps of recently generated Manifests
210 using a secure channel from a trusted source for exact comparison.
211 The exact details of such a solution are outside the scope of this
212 specification.
213
214 ``TIMESTAMP`` entries may also be present in sub-Manifests. Those
215 timestamps must not be newer than the timestamp of the top-level
216 Manifest (if present). This specification does not define any specific
217 use for them.
218
219
220 Modern Manifest tags
221 --------------------
222
223 The Manifest files can specify the following tags:
224
225 ``TIMESTAMP <iso8601>``
226 Specifies a timestamp of when the Manifest file was last updated.
227 The timestamp must be a valid second-precision ISO 8601 extended
228 format combined date and time in UTC timezone, i.e. using
229 the following ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``.
230 Optional. The package manager can use it to detect an outdated
231 repository checkout as described in `Timestamp verification`_.
232
233 ``MANIFEST <path> <size> <checksums>...``
234 Specifies a sub-Manifest. The sub-Manifest must be verified like
235 a regular file. If the verification succeeds, the entries from
236 the sub-Manifest are included for verification as described
237 in `Manifest file locations and nesting`_.
238
239 ``IGNORE <path>``
240 Ignores a subdirectory or file from Manifest checks. If the specified
241 path is present, it and its contents are omitted from the Manifest
242 verification (always pass). *Path* must be a plain file or directory
243 path without a trailing slash. Wildcards are not supported
244 and wildcard characters are interpreted literally.
245
246 ``DATA <path> <size> <checksums>...``
247 Specifies a regular file subject to Manifest verification. The file
248 is required to pass verification. Used for all files that do not match
249 any other type.
250
251 ``DIST <filename> <size> <checksums>...``
252 Specifies a distfile entry used to verify files fetched as part
253 of ``SRC_URI``. The filename must match the filename used to store
254 the fetched file as specified in the PMS [#PMS-FETCH]_. The package
255 manager must reject the fetched file if it fails verification.
256 ``DIST`` entries apply to all packages below the Manifest file
257 specifying them.
258
259
260 Deprecated Manifest tags
261 ------------------------
262
263 For backwards compatibility, the following tags are additionally
264 allowed at the package directory level:
265
266 ``EBUILD <filename> <size> <checksums>...``
267 Equivalent to the ``DATA`` type.
268
269 ``MISC <path> <size> <checksums>...``
270 Equivalent to the ``DATA`` type. Historically indicated that
271 the package manager may ignore a verification failure if operating
272 in non-strict mode. However, that behavior is deprecated.
273
274 ``AUX <filename> <size> <checksums>...``
275 Equivalent to the ``DATA`` type, except that the filename is relative
276 to the ``files/`` subdirectory.
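
One way for an implementation to handle the deprecated tags is to normalize them to ``DATA`` while reading the Manifest. The helper below is a hypothetical sketch of that idea, not something the specification mandates.

```python
from typing import Tuple


def normalize_deprecated(tag: str, path: str) -> Tuple[str, str]:
    """Map a deprecated tag and its path to the DATA equivalent."""
    if tag == "AUX":
        # AUX filenames are relative to the files/ subdirectory.
        return "DATA", "files/" + path
    if tag in ("EBUILD", "MISC"):
        return "DATA", path
    return tag, path
```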
277
278
279 Algorithm for full-tree verification
280 ------------------------------------
281
282 In order to perform full-tree verification, the following algorithm
283 can be used:
284
285 1. Collect all files present in the repository into *present* set.
286
287 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
288 Optionally verify the ``TIMESTAMP`` entry if present as specified
289 in `timestamp verification`_. Remove the top-level Manifest
290 from the *present* set.
291
292 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
293 files according to the `file verification`_ section, and include
294 their entries in the current Manifest entry list (using paths
295 relative to directories containing the Manifests).
296
297 4. Process all ``IGNORE`` entries. Remove any paths matching them
298 from the *present* set.
299
300 5. Collect all files covered by ``DATA``, ``MISC``, ``EBUILD``
301 and ``AUX`` entries into the *covered* set.
302
303 6. Verify the entries in the *covered* set for incompatible duplicates
304 and collisions with ignored files as explained in `Directory tree
305 coverage`_.
306
307 7. Verify all the files in the union of the *present* and *covered*
308 sets, according to the `file verification`_ section.
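
The seven steps can be sketched as follows. This is an illustration under several assumptions rather than a reference implementation: OpenPGP verification and the ``TIMESTAMP`` check are assumed to happen elsewhere, the top-level Manifest is read as plain entries without signature armor, compressed sub-Manifests and the tag-semantics comparison are not handled, and lower-cased Manifest algorithm names are assumed to be valid ``hashlib`` names. All function names are hypothetical.

```python
import hashlib
import itertools
import os
import posixpath

DATA_TAGS = {"DATA", "MISC", "EBUILD", "AUX"}


def file_matches(path, size, checksums):
    """Compare a regular file against one entry's size and checksums."""
    if not os.path.isfile(path) or os.path.getsize(path) != size:
        return False
    with open(path, "rb") as f:
        data = f.read()
    return all(hashlib.new(name.lower(), data).hexdigest() == value.lower()
               for name, value in checksums.items())


def load_entries(root, manifest_rel, covered, ignored):
    """Steps 3 to 5: read a Manifest, recursing into MANIFEST entries."""
    base = posixpath.dirname(manifest_rel)
    with open(os.path.join(root, manifest_rel), encoding="utf-8") as f:
        lines = f.read().splitlines()
    for line in lines:
        fields = line.split()
        if not fields or fields[0] in ("TIMESTAMP", "DIST"):
            continue
        tag, path = fields[0], fields[1]
        if tag == "AUX":
            path = "files/" + path
        rel = posixpath.join(base, path)
        if tag == "IGNORE":
            ignored.add(rel)
        elif tag == "MANIFEST" or tag in DATA_TAGS:
            size = int(fields[2])
            sums = dict(zip(fields[3::2], fields[4::2]))
            # Sub-Manifests are recorded too, so they count as listed.
            covered.setdefault(rel, []).append((size, sums))
            if tag == "MANIFEST":
                if not file_matches(os.path.join(root, rel), size, sums):
                    raise ValueError("sub-Manifest failed: " + rel)
                load_entries(root, rel, covered, ignored)
        else:
            raise ValueError("unknown tag: " + tag)


def under_ignore(rel, ignored):
    return any(rel == i or rel.startswith(i + "/") for i in ignored)


def verify_tree(root):
    """Return the sorted relative paths that fail verification."""
    present = set()                                   # step 1
    for dirpath, _dirs, files in os.walk(root, followlinks=True):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            present.add(rel.replace(os.sep, "/"))
    present.discard("Manifest")                       # step 2
    covered, ignored = {}, set()
    load_entries(root, "Manifest", covered, ignored)  # steps 3 and 5
    present = {p for p in present                     # step 4, dot-names
               if not under_ignore(p, ignored)
               and not any(s.startswith(".") for s in p.split("/"))}
    failed = set()
    for rel, entries in covered.items():              # step 6
        if under_ignore(rel, ignored):
            failed.add(rel)
        for (s1, c1), (s2, c2) in itertools.combinations(entries, 2):
            if s1 != s2 or any(c1[k] != c2[k] for k in c1.keys() & c2.keys()):
                failed.add(rel)
    for rel in present | set(covered):                # step 7
        if rel not in covered or not all(
                file_matches(os.path.join(root, rel), s, c)
                for s, c in covered[rel]):
            failed.add(rel)
    return sorted(failed)
```

An empty return value means the tree passed. Note that ``os.walk(..., followlinks=True)`` does not protect against symbolic link loops, which a real tool would need to handle.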
309
310
311 Algorithm for finding parent Manifests
312 --------------------------------------
313
314 In order to find the top-level Manifest from the current directory
315 the following algorithm can be used:
316
317 1. Store the current directory as *original* and the device ID
318 of the containing filesystem (``st_dev``) as *startdev*.
319
320 2. If the device ID of the containing filesystem (``st_dev``)
321 of the current directory is different from *startdev*, stop.
322
323 3. If the current directory contains a ``Manifest`` file:
324
325 a. If an ``IGNORE`` entry in the ``Manifest`` file covers
326 the *original* directory (or one of the parent directories), stop.
327
328 b. Otherwise, store the current directory as *last_found*.
329
330 4. If the current directory is the root system directory (``/``), stop.
331
332 5. Otherwise, enter the parent directory and jump to step 2.
333
334 Once the algorithm stops, *last_found* will contain the relevant
335 top-level Manifest. If *last_found* is null, then the directory tree
336 does not contain any valid top-level Manifest candidates and one should
337 be created in the *original* directory.
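
The upward search can be sketched as follows. The function names are hypothetical, and the ``IGNORE`` check reads the candidate ``Manifest`` line by line without verifying it, which a real tool would need to reconsider.

```python
import os
from typing import Optional


def ignore_covers(manifest_path: str, manifest_dir: str,
                  original: str) -> bool:
    """True if an IGNORE entry covers original or one of its parents."""
    rel = os.path.relpath(original, manifest_dir).replace(os.sep, "/")
    if rel == ".":
        return False
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2 and fields[0] == "IGNORE":
                if rel == fields[1] or rel.startswith(fields[1] + "/"):
                    return True
    return False


def find_top_level_manifest(start: str) -> Optional[str]:
    """Return the path of the top-level Manifest, or None."""
    original = os.path.realpath(start)
    startdev = os.stat(original).st_dev               # step 1
    current, last_found = original, None
    while True:
        if os.stat(current).st_dev != startdev:       # step 2
            break
        candidate = os.path.join(current, "Manifest")
        if os.path.isfile(candidate):                 # step 3
            if ignore_covers(candidate, current, original):
                break                                 # step 3a
            last_found = candidate                    # step 3b
        parent = os.path.dirname(current)
        if parent == current:                         # step 4: reached /
            break
        current = parent                              # step 5
    return last_found
```

A ``None`` result corresponds to the case where *last_found* is null and a new top-level Manifest should be created in the *original* directory.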
338
339 Once the top-level Manifest is found, its ``MANIFEST`` entries should
340 be used to find any sub-Manifests below the top-level Manifest,
341 up to and including the *original* directory. Note that those
342 sub-Manifests can use different filenames than ``Manifest``.
343
344
345 Checksum algorithms
346 -------------------
347
348 This section is informational only. Specifying the exact set
349 of supported algorithms is outside the scope of this specification.
350
351 The algorithm names reserved at the time of writing are:
352
353 - ``MD5`` [#MD5]_,
354 - ``RMD160`` -- RIPEMD-160 [#RIPEMD160]_,
355 - ``SHA1`` [#SHS]_,
356 - ``SHA256`` and ``SHA512`` -- SHA-2 family of hashes [#SHS]_,
357 - ``WHIRLPOOL`` [#WHIRLPOOL]_,
358 - ``BLAKE2B`` and ``BLAKE2S`` -- BLAKE2 family of hashes [#BLAKE2]_,
359 - ``SHA3_256`` and ``SHA3_512`` -- SHA-3 family of hashes [#SHA3]_,
360 - ``STREEBOG256`` and ``STREEBOG512`` -- Streebog family of hashes
361 [#STREEBOG]_.
362
363 The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
364 It is recommended that any new hashes are named after the Python
365 ``hashlib`` module algorithm names, transformed into uppercase.
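
The naming recommendation can be illustrated with a small lookup sketch. The helper names are hypothetical, and whether a given algorithm is actually available depends on the Python build; ``hashlib.new()`` raises ``ValueError`` for names it does not support (which is expected for the Streebog names in a stock installation).

```python
import hashlib

# Reserved names that predate the upper-cased hashlib convention.
LEGACY_NAMES = {"RMD160": "ripemd160"}


def new_hasher(manifest_name: str):
    """Return a hashlib object for a Manifest algorithm name."""
    hashlib_name = LEGACY_NAMES.get(manifest_name, manifest_name.lower())
    return hashlib.new(hashlib_name)


def manifest_name_for(hashlib_name: str) -> str:
    """Derive the recommended Manifest name for a hashlib algorithm."""
    return hashlib_name.upper()
```

For example, ``manifest_name_for("sha3_256")`` gives ``SHA3_256``, matching the reserved name listed above.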
366
367
368 Manifest compression
369 --------------------
370
371 The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
372 This section merely addresses interoperability issues between Manifest
373 compression and this specification.
374
375 The compressed Manifest files are required to be suffixed for their
376 compression algorithm. This suffix should be used to recognize
377 the compression and decompress Manifests transparently. The exact list
378 of algorithms and their corresponding suffixes are outside the scope
379 of this specification.
380
381 The top-level Manifest file must not be compressed. The OpenPGP
382 signature covers the uncompressed text and would itself be compressed,
383 so the data would have to be decompressed without any prior verification.
384 This could expose users e.g. to zip bombs or exploits on decompressor
385 vulnerabilities.
386
387 Whenever this specification refers to sub-Manifests, they can use any
388 names but are also required to use a specific compression suffix.
389 The ``MANIFEST`` entries are required to specify the full name including
390 compression suffix, and the verification is performed on the compressed
391 file.
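
The order of operations (verify the compressed file first, decompress afterwards) could be sketched like this. The suffix table is a hypothetical example, since the list of algorithms and suffixes is outside the scope of this specification, and ``read_sub_manifest`` is not a name defined anywhere in it.

```python
import bz2
import gzip
import hashlib
import lzma

# Hypothetical suffix table; the real list is left to GLEP 61.
DECOMPRESSORS = {
    ".gz": gzip.decompress,
    ".bz2": bz2.decompress,
    ".xz": lzma.decompress,
}


def read_sub_manifest(path: str, size: int, checksums: dict) -> str:
    """Verify a sub-Manifest as stored, then return its text."""
    with open(path, "rb") as f:
        raw = f.read()
    # The MANIFEST entry describes the compressed file, so the
    # check happens before any decompression takes place.
    if len(raw) != size:
        raise ValueError("size mismatch: " + path)
    for name, expected in checksums.items():
        # Assumption: lower-cased names are valid hashlib names.
        if hashlib.new(name.lower(), raw).hexdigest() != expected.lower():
            raise ValueError("checksum mismatch: " + path)
    for suffix, decompress in DECOMPRESSORS.items():
        if path.endswith(suffix):
            raw = decompress(raw)
            break
    return raw.decode("utf-8")
```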
392
393 The specification permits uncompressed Manifests to exist alongside
394 their compressed counterparts, and multiple compressed formats
395 to coexist. If that is the case, the files must have the same
396 uncompressed content and the specification is free to choose either
397 of the files using the same base name.
398
399
400 Combining multiple Manifest trees (informational)
401 -------------------------------------------------
402
403 This specification permits nesting multiple hierarchical Manifest trees.
404 In this layout, the specific directories of the Manifest tree can
405 be verified both as a part of another top-level Manifest,
406 and as an independent Manifest tree (when obtained without the parent
407 directory).
408
409 For this to work, the sub-Manifest file in the directory must also
410 satisfy the requirements for the top-level Manifest file. That is:
411
412 - it must be named ``Manifest`` and not compressed,
413
414 - it must cover all the files in this directory and its subdirectories
415 (i.e. no files from the directory tree can be covered by parent
416 Manifest),
417
418 - if authenticity verification is desired, it must be OpenPGP-signed.
419
420 It should be noted that if such a directory is a subdirectory of a valid
421 Manifest tree, the sub-Manifest needs to be valid according
422 to the top-level Manifest and the OpenPGP signature is disregarded
423 as detailed in `Manifest file locations and nesting`_. The top-level
424 behavior is exhibited only when the directory is obtained without parent
425 directories.
426
427
428 An example Manifest file (informational)
429 ----------------------------------------
430
431 An example top-level Manifest file for the Gentoo repository would have
432 the following content::
433
434 TIMESTAMP 2017-10-30T10:11:12Z
435 IGNORE distfiles
436 IGNORE local
437 IGNORE lost+found
438 IGNORE packages
439 MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
440 ...
441 MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
442 ...
443
444 An example modern Manifest (disregarding backwards compatibility)
445 for a package directory would have the following content::
446
447 DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
448 DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
449 DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
450 DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
451 DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
452 DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
453 DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
454
455
456 Rationale
457 =========
458
459 Stand-alone format
460 ------------------
461
462 The first question that needed to be asked before proceeding with
463 the design was whether the Manifest file format was supposed to be
464 stand-alone, or tightly bound to the repository format.
465
466 The stand-alone format has been selected because of its three
467 advantages:
468
469 1. It is more future-proof. If an incompatible change to the repository
470 format is introduced, only developers need to upgrade the tools
471 they use to generate the Manifests. The tools used to verify
472 the updated Manifests will continue to work.
473
474 2. It is more flexible and universal. With a dedicated tool,
475 the Manifest files can be used to sign and verify arbitrary file
476 sets.
477
478 3. It keeps the verification tool simpler. In particular, we can easily
479 write an independent verification tool that could work on any
480 distribution without needing to depend on a package manager
481 implementation or rewrite parts of it.
482
483 Designing a stand-alone format requires that the Manifest carries enough
484 information to perform the verification following all the rules specific
485 to the Gentoo repository.
486
487
488 Tree design
489 -----------
490
491 The second important point of the design was determining whether
492 the Manifest files should be structured hierarchically, or independently.
493 Both options have their advantages.
494
495 In the hierarchical model, each sub-Manifest file is covered by a higher
496 level Manifest. As a result, only the top-level Manifest has to be
497 OpenPGP-signed, and subsequent Manifests need to be only verified by
498 checksum stored in the parent Manifest. This has the following
499 implications:
500
501 - Verifying any set of files in the repository requires using checksums
502 from the most relevant Manifests and the parent Manifests.
503
504 - The OpenPGP signature of the top-level Manifest needs to be verified
505 only once per process.
506
507 - Altering any set of files requires updating the relevant Manifests,
508 and their parent Manifests up to the top-level Manifest, and signing
509 the last one.
510
511 - As a result, the top-level Manifest changes on every commit,
512 and various middle-level Manifests change (and need to be transferred)
513 frequently.
514
515 In the independent model, each sub-Manifest file is independent
516 of the parent Manifests. As a result, each of them needs to be signed
517 and verified independently. However, the parent Manifests still need
518 to list sub-Manifests (albeit without verification data) in order
519 to detect removal or replacement of subdirectories. This has
520 the following implications:
521
522 - Verifying any set of files in the repository requires using checksums
523 and verifying signatures of the most relevant Manifest files.
524
525 - Altering any set of files requires updating the relevant Manifests
526 and signing them again.
527
528 - Parent Manifests are updated only when Manifests are added or removed
529 from subdirectories. As a result, they change infrequently.
530
531 While both models have their advantages, the hierarchical model was
532 selected because it reduces the number of OpenPGP operations
533 (which are comparatively costly) to the minimum.
534
535
536 Tree layout restrictions
537 ------------------------
538
539 The algorithm is meant to work primarily with ebuild repositories which
540 normally contain only files and directories. Directories provide
541 no useful metadata for verification, and specifying special entries
542 for additional file types is purposeless. Therefore, the specification
543 is restricted to dealing with regular files.
544
545 The Gentoo repository does not use symbolic links. Some Gentoo
546 repositories do, however. To provide a simple solution for dealing with
547 symlinks without having to take care to implement special handling for
548 them, the common behavior of implicitly resolving them is used.
549 Therefore, symbolic links to files are stored as if they were regular
550 files, and symbolic links to directories are followed as if they were
551 regular directories.
552
553 Dotfiles are implicitly ignored as that is a common notion used
554 in software written for POSIX systems. All other filenames require
555 explicit ``IGNORE`` lines.
556
557 An ability to inject additional ignore entries is provided to account
558 for site configuration affecting the repository tree -- placing
559 additional files in it, skipping some of the categories from syncing.
560 This configuration can extend beyond the limits of this GLEP,
561 e.g. by allowing wildcards or regular expressions.
562
563 The algorithm is restricted to work on a single filesystem. This is
564 mostly relevant when scanning for top-level Manifest -- we do not want
565 to cross filesystem boundaries then. However, to ensure consistent
566 bidirectional behavior, crossing them also needs to be banned when
567 descending the tree.
568
569 The directories and files on different filesystems need to be ignored
570 explicitly as implicitly skipping them would cause confusion.
571 In particular, tools might then claim that a file does not exist when
572 it clearly does because it was skipped due to filesystem boundaries.
573
574
575 Filename character set restriction
576 ----------------------------------
577
578 The valid set of filename characters for the Gentoo repository
579 is restricted by the devmanual 'File Naming Rules' section
580 [#FILE-NAMING-RULES]_, and enforced via a git hook. The valid distfile
581 names are not restricted explicitly -- however, the PMS dependency
582 specification syntax [#PMS-FETCH]_ implicitly makes it impossible to use
583 filenames containing whitespace.
584
585 This specification aims to avoid arbitrary restrictions. For this
586 reason, filename characters are only restricted by excluding three
587 technically problematic groups:
588
589 1. The NULL character (``U+0000``) is normally used to indicate the end
590 of a null-terminated string. Its use could therefore break programs
591 written using C. Furthermore, it is not allowed in any known
592 filesystem.
593
594 2. The backwards slash character (``\``) is frequently used as an escape
595 character, in particular in the languages derived from C and in shell
596 script. Furthermore, it is used as path separator on Windows systems.
597 It is forbidden to avoid implementation mistakes (in particular,
598 attempting to use it to escape whitespace or as path separator
599 on Windows) but also reserved for possible future extension.
600
601 3. Whitespace characters are used to separate Manifest fields
602 and entries. While technically it would be enough to restrict space
603 (``U+0020``) character that is normally used as the separator
604 and newline (``U+000A``) character that is used to separate lines,
605 all whitespace characters are forbidden to avoid confusion
606 and implementation errors.
607
608 While the specification could be extended to allow such filenames
609 by using some form of escaping, there is currently no apparent need
610 for such a feature.
611
612 Historically, Portage attempted to overcome the whitespace limitation
613 by attempting to locate the size field and take everything before it
614 as filename. This was terribly fragile and even if it worked, it would
615 solve the problem only partially.
616
617 Since the same restrictions apply to ``IGNORE`` rules, it is currently
618 not possible to either list or ignore files whose names contain whitespace
619 characters. Therefore, the presence of such files is forbidden entirely.
620
621
622 File verification model
623 -----------------------
624
625 The verification model aims to provide full coverage against different
626 forms of attack. In particular, three different kinds of manipulation
627 are considered:
628
629 1. Alteration of the file content.
630
631 2. Removal of a file.
632
633 3. Addition of a new file.
634
635 In order to protect against all three, the system requires that all
636 files in the repository are listed in Manifests and verified against
637 them.
638
639 As a special case, ignores are allowed to account for directories
640 that are not part of the repository but were traditionally placed inside
641 it. Those directories were ``distfiles``, ``local`` and ``packages``. It
642 could be also used to ignore VCS directories such as ``CVS``.
643
644
645 Non-strict Manifest verification
646 --------------------------------
647
648 Originally the Manifest2 format provided a special ``MISC`` tag that
649 was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
650 indicated that the Manifest verification failures could be ignored for
651 those files unless the package manager was working in strict mode.
652
653 The first versions of this specification continued the use of this tag.
654 However, after a long debate it was decided to deprecate it along with
655 the non-strict behavior, and require all files to strictly match.
656
657 Two arguments were mentioned for the usefulness of a ``MISC`` type:
658
659 1. being able to reduce the checkout size by stripping unnecessary
660 files out, and
661
662 2. being able to update automatically generated files locally
663 without causing unnecessary verification failures.
664
665 However, the usefulness of ``MISC`` in both cases is doubtful.
666
667 The cases for stripping unnecessary files mostly focused around space
668 savings. For this purpose, stripping ``metadata.xml`` and similar files
669 has little value. It is much more common for users to strip whole
670 packages or categories. The ``MISC`` type is not suitable for that,
671 and so a dedicated package manager mechanism needs to be developed
672 instead. The same mechanism can also handle files that historically used
673 the ``MISC`` type. As an example, the package manager may choose
674 to generate both the rsync exclusion list and Manifest ignore list
675 using a single source list.
676
677 The cases for autogenerated files involve such cache files
678 as ``use.local.desc``. However, we cannot include ``md5-cache`` there
679 due to security concerns which results in inconsistent cache handling.
680 Furthermore, the tools were historically modified to provide stable
681 output which means that their content cannot change without
682 a non-``MISC`` content being changed first. This practically defeats
683 the purpose of using ``MISC``.
684
685 Finally, the non-strict mode could be used as means to an attack.
686 The allowance of missing or modified documentation file could be used
687 to spread misinformation, resulting in bad decisions made by the user.
688 A modified file could also be used, e.g. to exploit vulnerabilities
689 of an XML parser.
690
691
692 Timestamp field
693 ---------------
694
695 The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
696 to include a generation timestamp in the Manifest. A similar feature
697 was originally proposed in GLEP 58 [#GLEP58]_.
698
699 A malicious third party may use the principles of exclusion or replay
700 [#C08]_ to deny an update to clients, while at the same time recording
701 the identity of clients to attack. The timestamp field can be used to
702 detect that.
703
704 In order to provide more complete protection, the Gentoo Infrastructure
705 should provide an ability to obtain the timestamps of all Manifests
706 from a recent timeframe over a secure channel from a trusted source
707 for comparison.
708
709 Strictly speaking, this information is provided by the various
710 ``metadata/timestamp*`` files that are already present. However,
711 including the value in the Manifest itself has little cost
712 and provides the ability to perform the verification stand-alone.
713
714 Furthermore, some of the timestamp files are added very late
715 in the distribution process, past the Manifest generation phase. Those
716 files will most likely receive ``IGNORE`` entries and therefore
717 be unsafe to use.
718
719 The specification permits additional timestamps in sub-Manifest files
720 for local use. A generic testing tool should ignore them.
721
722
723 New vs deprecated tags
724 ----------------------
725
726 Out of the four types defined by Manifest2, only one is reused
727 and the remaining three are replaced by a single, universal ``DATA``
728 type.
729
730 The ``DIST`` tag is reused since the specification does not change
731 anything with regard to distfile handling.
732
733 The ``EBUILD`` tag could potentially be reused for generic file
734 verification data. However, it would be confusing if all the different
735 data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
736 type was introduced as a replacement.
737
738 The ``MISC`` tag and the relevant non-strict mode have been removed
739 as being of little value, as detailed in the `Non-strict Manifest
740 verification`_ section.
741
742 The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
743 the limiting property of an implicit ``files/`` path prefix.
744
745
746 Finding top-level Manifest
747 --------------------------
748
749 The development of a reference implementation for this GLEP has brought
750 the following problem: how to find all the relevant Manifests when
751 the Manifest tool is run inside a subdirectory of the repository?
752
753 One of the options would be to provide a bi-directional linking
754 of Manifests via a ``PARENT`` tag. However, that would not solve
755 the problem when a new Manifest file is being created.
756
757 Instead, an algorithm for iterating over parent directories is proposed.
758 Since there is no obligatory explicit indicator for the top-level
759 Manifest, the algorithm assumes that the top-level Manifest
760 is the highest ``Manifest`` in the directory hierarchy that can cover
761 the current directory. This generally makes sense since the Manifest
762 files are required to provide coverage for all subdirectories, so all
763 Manifests starting from that one need to be updated.
764
765 If independent Manifest trees are nested in the directory structure,
766 then an ``IGNORE`` entry needs to be used to separate them.
767
768 Since sub-Manifests can use any filenames, the Manifest finding
769 algorithm must not short-cut the procedure by storing all ``Manifest``
770 files along the parent directories. Instead, it needs to retrace
771 the relevant sub-Manifest files along ``MANIFEST`` entries
772 in the top-level Manifest.
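
The parent-directory walk described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the reference implementation: it assumes POSIX path separators, reads ``IGNORE`` entries only from the ``Manifest`` file in each directory (not from sub-Manifests it references), and does not stop at filesystem boundaries. Both function names are hypothetical.

```python
import os


def ignores(manifest, manifest_dir, target_dir):
    """True if an IGNORE entry in manifest covers target_dir."""
    rel = os.path.relpath(target_dir, manifest_dir)
    if rel == ".":
        return False
    with open(manifest, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2 and fields[0] == "IGNORE":
                ignored = fields[1].rstrip("/")
                if rel == ignored or rel.startswith(ignored + "/"):
                    return True
    return False


def find_top_level_manifest(start_dir):
    """Return the highest 'Manifest' that can cover start_dir, or None."""
    original = os.path.realpath(start_dir)
    current = original
    candidate = None
    while True:
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):
            # An IGNORE entry marks the boundary of a nested, independent tree.
            if ignores(manifest, current, original):
                break
            candidate = manifest
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root.
            break
        current = parent
    return candidate
```

Once the top-level Manifest is known, a tool would follow its ``MANIFEST`` entries downwards to locate the sub-Manifests that apply to the starting directory, rather than relying on the ``Manifest`` files seen during the upward walk.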
773
774
775 Injecting ChangeLogs into the checkout
776 --------------------------------------
777
778 One of the problems considered in the new Manifest format was injecting
779 historical and autogenerated ChangeLogs into the repository. We normally
780 don't include those files, to reduce the checkout size. However, some
781 users have shown interest in them and Infra is working on providing them
782 via an additional rsync module.
783
784 If such files were injected into the repository, they would cause
785 verification failures of Manifests. To account for this, Infra could
786 provide ``IGNORE`` entries to allow them to exist.
787
788
789 Splitting distfile checksums from file checksums
790 ------------------------------------------------
791
792 Another problem with the current Manifest format is that the checksums
793 for fetched files are combined with checksums for local files
794 in a single file inside the package directory. It has been specifically
795 pointed out that:
796
797 - since distfiles are sometimes reused across different packages,
798 the repeating checksums are redundant [#DIST]_.
799
800 - mirror admins were interested in the possibility of verifying all
801 the distfiles with a single tool.
802
803 This specification does not provide a clean solution to this problem.
804 It technically permits moving ``DIST`` entries to higher-level Manifests
805 but the usefulness of such a solution is doubtful.
806
807 However, for the second problem we will probably deliver a dedicated
808 tool working with this Manifest format.
809
810
811 Hash algorithms
812 ---------------
813
814 While maintaining a consistent supported hash set is important
815 for interoperability, it is not a good fit for the generic layout
816 of this GLEP. Furthermore, it would require updating the GLEP
817 in the future every time the used algorithms change.
818
819 Instead, the specification focuses on listing the currently used
820 algorithm names for interoperability, and sets a recommendation
821 for consistent naming of algorithms in the future. The Python
822 ``hashlib`` module is used as a reference since it is used
823 as the provider of hash functions for most of the Python software,
824 including Portage and PkgCore.
825
826 The basic rules for changing hash algorithms are defined in GLEP 59
827 [#GLEP59]_. The implementations can focus only on those algorithms
828 that are actually used or planned on being used. It may be feasible
829 to devise a new GLEP that specifies the currently used hashes (or update
830 GLEP 59 accordingly).
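
As an illustration of the naming recommendation, the sketch below derives the ``hashlib`` constructor name from the Manifest hash name by lowercasing it. This mapping is an assumption that holds for names such as ``SHA256``, ``SHA512``, ``BLAKE2B`` or ``SHA3_256``; legacy names such as ``RMD160`` would need an explicit table. The entry layout (tag, path, size, then name/value pairs) and the helper names are likewise illustrative.

```python
import hashlib


def manifest_checksums(data, names=("BLAKE2B", "SHA512")):
    """Return (name, hexdigest) pairs for the given Manifest hash names."""
    result = []
    for name in names:
        # Assumption: the Manifest name is the hashlib name in upper case.
        digest = hashlib.new(name.lower(), data).hexdigest()
        result.append((name, digest))
    return result


def format_entry(tag, path, data, names=("BLAKE2B", "SHA512")):
    """Build one Manifest line: tag, path, size, then hash name/value pairs."""
    fields = [tag, path, str(len(data))]
    for name, digest in manifest_checksums(data, names):
        fields += [name, digest]
    return " ".join(fields)
```

Because the hash set is passed in as a parameter, changing the algorithms in use does not require touching the formatting logic.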
831
832
833 Manifest compression
834 --------------------
835
836 The support for Manifest compression is introduced with minimal changes
837 to the file format. The ``MANIFEST`` entries are required to provide
838 the real (compressed) file path for compatibility with other file
839 entries and to avoid confusion.
840
841 The compression of the top-level Manifest file has been prohibited
842 as the specification currently does not provide any means of verifying
843 the file prior to decompression. If the top-level Manifest is
844 compressed, tooling will have to unpack the file before being able
845 to verify the contents. This makes it possible for a malicious third
846 party to attack the system by providing a compressed Manifest that
847 exposes decompressor vulnerabilities, or a zip bomb.
848
849 The OpenPGP cleartext signature covers the contents of the Manifest,
850 and is therefore compressed along with them. The possibility of using
851 a detached signature has been considered but it was rejected as
852 unnecessary complexity for minor gain.
853
854 Technically, a similar result could be effected via moving all the data
855 into a compressed sub-Manifest in the top directory (e.g.
856 ``Manifest.sub.gz``), and including a ``MANIFEST`` entry for this file
857 in a signed, uncompressed top-level Manifest.
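
The ordering that makes a compressed sub-Manifest safe to process can be sketched as below: the compressed bytes are checked against the size and checksum taken from an already verified ``MANIFEST`` entry, and only then handed to the decompressor. The choice of gzip and SHA512 and the function name are assumptions for the example.

```python
import gzip
import hashlib


def read_sub_manifest(compressed, expected_size, expected_sha512):
    """Verify the compressed bytes first, then decompress and return the text."""
    if len(compressed) != expected_size:
        raise ValueError("size mismatch")
    if hashlib.sha512(compressed).hexdigest() != expected_sha512:
        raise ValueError("checksum mismatch")
    # Reached only for data that matches the verified parent Manifest.
    return gzip.decompress(compressed).decode("utf-8")
```

With this ordering, untrusted input never reaches the decompressor, which is the property a compressed top-level Manifest could not offer.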
858
859 The existence of additional entries for uncompressed Manifest checksums
860 was debated. However, plain entries for the uncompressed file would
861 be confusing if only the compressed file existed, and conflicting
862 if both uncompressed and compressed variants existed. Furthermore,
863 it has been pointed out that ``DIST`` entries do not have
864 an uncompressed variant either.
865
866
867 Performance considerations
868 --------------------------
869
870 Performing a full-tree verification on every sync raises some
871 performance concerns for end-user systems. The initial testing has shown
872 that a cold-cache verification on a btrfs file system can take around
873 4 minutes, with the process being mostly I/O bound. On the other hand,
874 it can be expected that the verification will be performed directly
875 after syncing, taking advantage of a warm filesystem cache.
876
877 To improve speed on I/O and/or CPU-constrained systems even further,
878 the algorithms can be easily extended to perform incremental
879 verification. Given that rsync does not preserve mtimes by default,
880 the tool can take advantage of mtime and Manifest comparisons to recheck
881 only the parts of the repository that have changed.
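
One possible shape of the mtime half of such an incremental check is sketched below. It assumes the tool keeps a mapping of relative path to mtime from the previous successful verification; the function name and the state format are hypothetical. Removed files and the comparison of Manifest contents are left out.

```python
import os


def changed_files(root, previous_mtimes):
    """Return (sorted paths needing re-verification, updated mtime map)."""
    current = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            mtime = os.stat(path).st_mtime_ns
            current[rel] = mtime
            # New files and files with a different mtime must be rechecked.
            if previous_mtimes.get(rel) != mtime:
                changed.append(rel)
    return sorted(changed), current
```

Only the returned paths, plus any files whose Manifest entries changed, would then need to be hashed again.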
882
883 Furthermore, the package manager implementations can restrict checking
884 only to the parts of the repository that are actually being used.
885
886
887 Backwards Compatibility
888 =======================
889
890 This GLEP provides optional means of preserving backwards compatibility.
891 To preserve the backwards compatibility, the following needs to hold
892 for the ``Manifest`` file in every package directory:
893
894 - all files must be covered by the single ``Manifest`` file,
895
896 - all distfiles used by the package must be included,
897
898 - all files inside the ``files/`` subdirectory need to use
899 the ``AUX`` tag (rather than ``DATA``),
900
901 - all ``.ebuild`` files need to use the ``EBUILD`` tag,
902
903 - the ``metadata.xml`` and ``ChangeLog`` files need to use
904 the ``MISC`` tag,
905
906 - the Manifest can be signed to provide authenticity verification,
907
908 - an uncompressed Manifest must always exist, and a compressed Manifest
909 of identical content may be present.
910
911 Once backwards compatibility is no longer a concern, the above
912 no longer needs to hold and the deprecated tags can be removed.
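
The tag selection implied by the compatibility rules above could look roughly like this for paths relative to a package directory. The function name is hypothetical, and falling back to ``DATA`` for any other file is an assumption of the sketch. The ``files/`` prefix is stripped for ``AUX`` because that tag implies it.

```python
def compat_entry(path):
    """Return (tag, entry_path) for a path relative to the package directory."""
    if path.startswith("files/"):
        # AUX paths carry an implicit files/ prefix, so it is dropped here.
        return "AUX", path[len("files/"):]
    if "/" not in path and path.endswith(".ebuild"):
        return "EBUILD", path
    if path in ("metadata.xml", "ChangeLog"):
        return "MISC", path
    # Assumption: anything else falls back to the generic DATA tag.
    return "DATA", path
```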
913
914
915 Reference Implementation
916 ========================
917
918 The reference implementation for this GLEP is being developed
919 as the gemato project [#GEMATO]_.
920
921
922 Credits
923 =======
924
925 Thanks to all the people whose contributions were invaluable
926 to the creation of this GLEP. This includes but is not limited to:
927
928 - Robin Hugh Johnson,
929 - Ulrich Müller.
930
931 Additionally, thanks to Robin Hugh Johnson for the original
932 MetaManifest GLEP series which served both as inspiration and source
933 of many concepts used in this GLEP. Recursively, also thanks to all
934 the people who contributed to the original GLEPs.
935
936
937 References
938 ==========
939
940 .. [#GLEP44] GLEP 44: Manifest2 format
941 (https://www.gentoo.org/glep/glep-0044.html)
942
943 .. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
944 - Overview
945 (https://www.gentoo.org/glep/glep-0057.html)
946
947 .. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
948 - Infrastructure to User distribution - MetaManifest
949 (https://www.gentoo.org/glep/glep-0058.html)
950
951 .. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
952 (https://www.gentoo.org/glep/glep-0059.html)
953
954 .. [#GLEP60] GLEP 60: Manifest2 filetypes
955 (https://www.gentoo.org/glep/glep-0060.html)
956
957 .. [#GLEP61] GLEP 61: Manifest2 compression
958 (https://www.gentoo.org/glep/glep-0061.html)
959
960 .. [#UNICODE] The Unicode standard
961 (https://unicode.org/versions/latest/)
962
963 .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
964 Format - SRC_URI
965 (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)
966
967 .. [#FILE-NAMING-RULES] Ebuild File Format -- Gentoo Development Guide
968 (https://devmanual.gentoo.org/ebuild-writing/file-format/#file-naming-rules)
969
970 .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
971 (https://www.ietf.org/rfc/rfc1321.txt)
972
973 .. [#RIPEMD160] The hash function RIPEMD-160
974 (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)
975
976 .. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
977 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)
978
979 .. [#WHIRLPOOL] The WHIRLPOOL Hash Function
980 (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)
981
982 .. [#BLAKE2] BLAKE2 -- fast secure hashing
983 (https://blake2.net/)
984
985 .. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
986 and Extendable-Output Functions
987 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)
988
989 .. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
990 (https://www.streebog.net/)
991
992 .. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
993 (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)
994
995 .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
996 at the time of writing are duplicate, representing 2 MiB
997 out of 25 MiB of DIST entries altogether.
998
999 .. [#GEMATO] gemato: Gentoo Manifest Tool
1000 (https://github.com/mgorny/gemato/)
1001
1002
1003 Copyright
1004 =========
1005 This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
1006 Unported License. To view a copy of this license, visit
1007 http://creativecommons.org/licenses/by-sa/3.0/.
1008
1009 --
1010 Best regards,
1011 Michał Górny
