Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [v1.0.4] GLEP 74: Full-tree verification using Manifest files
Date: Mon, 06 Nov 2017 21:53:35
Message-Id: 1510005202.13103.2.camel@gentoo.org
In Reply to: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files by "Michał Górny"
1 Hopefully the last version, after getting all the suggestions
2 from Robin.
3
4 On Thu, 26.10.2017 at 22:12 +0200, Michał Górny
5 wrote:
6 >
7 > ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
8 > HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
9 > impl: https://github.com/mgorny/gemato/
10 >
11
12 ---
13 GLEP: 74
14 Title: Full-tree verification using Manifest files
15 Author: Michał Górny <mgorny@g.o>,
16 Robin Hugh Johnson <robbat2@g.o>,
17 Ulrich Müller <ulm@g.o>
18 Type: Standards Track
19 Status: Draft
20 Version: 1
21 Created: 2017-10-21
22 Last-Modified: 2017-11-06
23 Post-History: 2017-10-26
24 Content-Type: text/x-rst
25 Requires: 59, 61
26 Replaces: 44, 58, 60
27 ---
28
29 Abstract
30 ========
31
32 This GLEP extends the Manifest file format to cover full-tree file
33 integrity and authenticity checks. The format aims to be future-proof,
34 efficient and provide means of backwards compatibility.
35
36
37 Motivation
38 ==========
39
40 The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
41 means of verifying the integrity of distfiles and package files
42 in Gentoo. Combined with OpenPGP signatures, they provide means to
43 ensure the authenticity of the covered files. However, as noted
44 in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
45 authenticity verification as they do not cover any files outside
46 the package directory. In particular, they provide multiple ways
47 for a third party to inject malicious code into the ebuild environment.
48
49 Historically, the topic of providing authenticity coverage for the whole
50 repository has been mentioned multiple times. The most noteworthy efforts
51 are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
52 They were accepted by the Council in 2010 but have never been
53 implemented. When potential implementation work started in 2017, a new
54 discussion about the specification arose. It prompted the creation
55 of a competing GLEP that would provide a redesigned alternative to
56 the old GLEPs.
57
58 This specification is designed with the following goals in mind:
59
60 1. It should provide means to ensure the authenticity of the complete
61 repository, including preventing the injection of additional files.
62
63 2. The format should be universal enough to work both for the Gentoo
64 repository and third-party repositories of different characteristics.
65
66 3. The Manifest files should be verifiable stand-alone, that is without
67 knowing any details about the underlying repository format.
68
69
70 Specification
71 =============
72
73 Manifest file format
74 --------------------
75
76 This specification reuses and extends the Manifest file format defined
77 in GLEP 44 [#GLEP44]_. For the purpose of it, the *file type* field is
78 repurposed as a generic *tag* that could also indicate additional
79 (non-checksum) metadata. Appropriately, those tags can be followed by
80 other space-separated values.
81
82 Unless specified otherwise, the paths used in the Manifest files
83 are relative to the directory containing the Manifest file. The paths
84 must not reference the parent directory (``..``).
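
The entry layout described above can be sketched as a small parser. This is only an illustration and not the reference implementation: the ``Entry`` type and the ``check_path`` and ``parse_entry`` helpers are hypothetical names, and rejecting absolute paths is an assumption made here on top of the ``..`` rule stated in the text.

```python
from typing import Dict, NamedTuple, Optional

# Tags that are followed by "<path> <size> <checksums>..." in this specification.
FILE_TAGS = {"MANIFEST", "DATA", "DIST", "EBUILD", "MISC", "AUX"}


class Entry(NamedTuple):
    """Hypothetical in-memory form of one Manifest line."""
    tag: str
    path: Optional[str] = None
    size: Optional[int] = None
    checksums: Optional[Dict[str, str]] = None
    timestamp: Optional[str] = None


def check_path(path: str) -> str:
    # The text forbids referencing the parent directory; refusing absolute
    # paths as well is an assumption of this sketch.
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError(f"invalid path: {path!r}")
    return path


def parse_entry(line: str) -> Entry:
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    tag, values = fields[0], fields[1:]
    if tag == "TIMESTAMP" and len(values) == 1:
        return Entry(tag, timestamp=values[0])
    if tag == "IGNORE" and len(values) == 1:
        return Entry(tag, path=check_path(values[0]))
    # Path and size, followed by (algorithm, value) pairs.
    if tag in FILE_TAGS and len(values) >= 2 and len(values) % 2 == 0:
        checksums = dict(zip(values[2::2], values[3::2]))
        return Entry(tag, check_path(values[0]), int(values[1]), checksums)
    raise ValueError(f"malformed entry: {line!r}")
```

The tag is the first field and everything after it is tag-specific, which is what allows non-checksum tags such as ``IGNORE`` to share the format.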
85
86
87 Manifest file locations and nesting
88 -----------------------------------
89
90 The ``Manifest`` file located in the root directory of the repository
91 is called top-level Manifest, and it is used to perform the full-tree
92 verification. In order to verify the authenticity, it must be signed
93 using OpenPGP, using the armored cleartext format.
94
95 The top-level Manifest may reference sub-Manifests contained
96 in subdirectories of the repository. The sub-Manifests are traditionally
97 named ``Manifest``; however, the implementation must support arbitrary
98 names, including the possibility of multiple (split) Manifests
99 for a single directory. The sub-Manifest can only cover the files inside
100 the directory tree where it resides.
101
102 The sub-Manifest can also be signed using OpenPGP armored cleartext
103 format. However, the signature verification can be omitted if it is
104 covered by a signed top-level Manifest.
105
106
107 Directory tree coverage
108 -----------------------
109
110 The specification provides three ways of skipping Manifest verification
111 of specific files and directories (recursively):
112
113 1. explicit ``IGNORE`` entries in Manifest files,
114
115 2. injected ignore paths via package manager configuration,
116
117 3. using names starting with a dot (``.``) which are always skipped.
118
119 All files that are not ignored must be covered by at least one
120 of the Manifests.
121
122 A single file may be matched by multiple identical or equivalent
123 Manifest entries, if and only if the entries have the same semantics,
124 specify the same size and the checksums common to both entries match.
125 It is an error for a single file to be matched by multiple entries
126 of different semantics, file size or checksum values. It is an error
127 to specify another entry for a file matching ``IGNORE``, or one of its
128 subdirectories.
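
As a rough illustration of the duplicate rule above, two entries matching the same file could be compared as follows. The tuple layout and the ``semantics`` and ``entries_compatible`` helpers are hypothetical; the sketch assumes that ``AUX`` paths have already been normalized by the caller and that checksum values are hexadecimal strings.

```python
from typing import Dict, Tuple

# Tags that this specification treats as equivalent to DATA.
DATA_LIKE = {"DATA", "EBUILD", "MISC", "AUX"}

EntryInfo = Tuple[str, int, Dict[str, str]]  # (tag, size, checksums)


def semantics(tag: str) -> str:
    return "DATA" if tag in DATA_LIKE else tag


def entries_compatible(a: EntryInfo, b: EntryInfo) -> bool:
    """Return True if two entries for the same file may coexist."""
    tag_a, size_a, sums_a = a
    tag_b, size_b, sums_b = b
    if semantics(tag_a) != semantics(tag_b) or size_a != size_b:
        return False
    # Only the checksums present in both entries need to agree.
    common = sums_a.keys() & sums_b.keys()
    return all(sums_a[n].lower() == sums_b[n].lower() for n in common)
```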
129
130 The file entries (except for ``IGNORE``) can be specified for regular
131 files only. Symbolic links are followed when opening files
132 and traversing directories. It is an error to specify an entry for
133 a different file type. If the tree contains files of other types
134 that are not otherwise ignored, they need to be covered by an explicit
135 ``IGNORE``.
136
137 All the local (non-``DIST``) files covered by a Manifest tree must
138 reside on the same filesystem. It is an error to specify entries
139 applying to files on another filesystem. If files or directories that
140 are not otherwise ignored reside on a different filesystem, or symbolic
141 links point to targets on a different filesystem, they must
142 be explicitly excluded via ``IGNORE``.
143
144
145 File verification
146 -----------------
147
148 When verifying a file against the Manifest, the following rules are
149 used:
150
151 1. If the file is covered directly or indirectly by an entry
152 of the ``IGNORE`` type, the verification always succeeds.
153
154 2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
155 ``MISC``, ``EBUILD`` or ``AUX`` type:
156
157 a. if the file is not present, then the verification fails,
158
159 b. if the file is present but has a different size or one
160 of the checksums does not match, the verification fails,
161
162 c. otherwise, the verification succeeds.
163
164 3. If the file is present but not listed in Manifest, the verification
165 fails.
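
The three rules above can be sketched with the Python ``hashlib`` module. This is a minimal illustration under stated assumptions: ``verify_file`` and the way an entry is passed in (``"IGNORE"``, ``None`` for an unlisted file, or a ``(size, checksums)`` pair) are hypothetical, and ``HASHLIB_NAMES`` covers only a few of the reserved algorithm names.

```python
import hashlib
import os

# Partial, assumed mapping from Manifest names to hashlib names.
HASHLIB_NAMES = {"SHA256": "sha256", "SHA512": "sha512", "BLAKE2B": "blake2b"}


def verify_file(path, entry):
    """entry is "IGNORE", None (not listed) or a (size, checksums) pair."""
    if entry == "IGNORE":
        return True  # rule 1: ignored files always pass
    if entry is None:
        return not os.path.lexists(path)  # rule 3: present but unlisted fails
    size, checksums = entry
    if not os.path.isfile(path):
        return False  # rule 2a: listed file is missing
    if os.path.getsize(path) != size:
        return False  # rule 2b: size mismatch
    hashers = {name: hashlib.new(HASHLIB_NAMES[name]) for name in checksums}
    with open(path, "rb") as f:
        # Read the file once and feed every requested hash.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    # rule 2b/2c: every listed checksum must match
    return all(hashers[name].hexdigest() == value.lower()
               for name, value in checksums.items())
```

Checking the size before hashing lets the sketch reject truncated or padded files without reading them.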
166
167 Unless specified otherwise, the package manager must not allow using
168 any files for which the verification failed. The package manager may
169 reject any package or even the whole repository if it may refer to files
170 for which the verification failed.
171
172
173 Timestamp verification
174 ----------------------
175
176 The Manifest file can contain a ``TIMESTAMP`` entry to account
177 for attacks against tree update distribution. If such an entry
178 is present, it should be updated every time at least one
179 of the Manifests changes. Every unique timestamp value must correspond
180 to a single tree state.
181
182 During the verification process, the client should compare the timestamp
183 against the update time obtained from a local clock or a trusted time
184 source. If the comparison result indicates that the Manifest at the time
185 of receiving was already significantly outdated, the client should
186 either fail the verification or require manual confirmation from user.
187
188 Furthermore, the Manifest provider may employ additional methods
189 of distributing the timestamps of recently generated Manifests
190 using a secure channel from a trusted source for exact comparison.
191 The exact details of such a solution are outside the scope of this
192 specification.
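
A client-side staleness check could look roughly like the sketch below. The ``strftime()`` format string is the one given for the ``TIMESTAMP`` tag in this specification; the ``parse_timestamp`` and ``is_outdated`` helpers and the one-day default threshold are assumptions made for illustration, since the text leaves the meaning of "significantly outdated" to the client.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    # The timestamp is always UTC, so attach the timezone explicitly.
    parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def is_outdated(value, received_at, max_age=timedelta(days=1)):
    """Return True if the Manifest was already too old when received."""
    return received_at - parse_timestamp(value) > max_age
```

A caller would pass the update time taken from the local clock or a trusted time source as ``received_at`` and either fail or ask the user for confirmation when the function returns ``True``.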
193
194
195 Modern Manifest tags
196 --------------------
197
198 The Manifest files can specify the following tags:
199
200 ``TIMESTAMP <iso8601>``
201 Specifies a timestamp of when the Manifest file was last updated.
202 The timestamp must be a valid second-precision ISO8601 extended format
203 combined date and time in UTC timezone, i.e. using the following
204 ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
205 in the top-level Manifest file. The package manager can use it
206 to detect an outdated repository checkout as described in `Timestamp
207 verification`_.
208
209 ``MANIFEST <path> <size> <checksums>...``
210 Specifies a sub-Manifest. The sub-Manifest must be verified like
211 a regular file. If the verification succeeds, the entries from
212 the sub-Manifest are included for verification as described
213 in `Manifest file locations and nesting`_.
214
215 ``IGNORE <path>``
216 Ignores a subdirectory or file from Manifest checks. If the specified
217 path is present, it and its contents are omitted from the Manifest
218 verification (always pass). *Path* must be a plain file or directory
219 path without a trailing slash, and must not contain wildcards.
220
221 ``DATA <path> <size> <checksums>...``
222 Specifies a regular file subject to Manifest verification. The file
223 is required to pass verification. Used for all files that do not match
224 any other type.
225
226 ``DIST <filename> <size> <checksums>...``
227 Specifies a distfile entry used to verify files fetched as part
228 of ``SRC_URI``. The filename must match the filename used to store
229 the fetched file as specified in the PMS [#PMS-FETCH]_. The package
230 manager must reject the fetched file if it fails verification.
231 ``DIST`` entries apply to all packages below the Manifest file
232 specifying them.
233
234
235 Deprecated Manifest tags
236 ------------------------
237
238 For backwards compatibility, the following tags are additionally
239 allowed at the package directory level:
240
241 ``EBUILD <filename> <size> <checksums>...``
242 Equivalent to the ``DATA`` type.
243
244 ``MISC <path> <size> <checksums>...``
245 Equivalent to the ``DATA`` type. Historically indicated that
246 the package manager may ignore a verification failure if operating
247 in non-strict mode. However, that behavior is deprecated.
248
249 ``AUX <filename> <size> <checksums>...``
250 Equivalent to the ``DATA`` type, except that the filename is relative
251 to ``files/`` subdirectory.
252
253
254 Algorithm for full-tree verification
255 ------------------------------------
256
257 In order to perform full-tree verification, the following algorithm
258 can be used:
259
260 1. Collect all files present in the repository into *present* set.
261
262 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
263 Optionally verify the ``TIMESTAMP`` entry if present as specified
264 in `Timestamp verification`_. Remove the top-level Manifest
265 from the *present* set.
266
267 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
268 files according to `file verification`_ section, and include their
269 entries in the current Manifest entry list (using paths relative
270 to directories containing the Manifests).
271
272 4. Process all ``IGNORE`` entries. Remove any paths matching them
273 from the *present* set.
274
275 5. Collect all files covered by ``DATA``, ``MISC``, ``EBUILD``
276 and ``AUX`` entries into the *covered* set.
277
278 6. Verify the entries in *covered* set for incompatible duplicates
279 and collisions with ignored files as explained in `Directory tree
280 coverage`_.
281
282 7. Verify all the files in the union of the *present* and *covered*
283 sets, according to `file verification`_ section.
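
The set handling in steps 1, 4, 5 and 7 can be sketched as follows. This is a simplified illustration only: it skips OpenPGP and checksum verification, assumes that all entries have already been collected with paths relative to the repository root (``AUX`` paths included), and counts ``MANIFEST`` entries as listing their files. The ``is_skipped`` and ``check_coverage`` names are hypothetical.

```python
LISTING_TAGS = {"MANIFEST", "DATA", "MISC", "EBUILD", "AUX"}


def is_skipped(path, ignores):
    # Names starting with a dot are always skipped.
    if any(part.startswith(".") for part in path.split("/")):
        return True
    # IGNORE entries apply to the path itself and to everything below it.
    return any(path == ign or path.startswith(ign + "/") for ign in ignores)


def check_coverage(present, entries, top_level="Manifest"):
    """Return (unlisted, missing) sets of repository-relative paths.

    present -- set of all file paths found in the repository
    entries -- iterable of (tag, path) pairs from all Manifests
    """
    entries = list(entries)
    ignores = {path for tag, path in entries if tag == "IGNORE"}
    listed = {path for tag, path in entries if tag in LISTING_TAGS}
    remaining = {path for path in present
                 if path != top_level and not is_skipped(path, ignores)}
    unlisted = remaining - listed  # present but not covered: fails
    missing = listed - remaining   # covered but not present: fails
    return unlisted, missing
```

Files that appear in both sets would then go through the size and checksum comparison described in the file verification section.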
284
285
286 Algorithm for finding parent Manifests
287 --------------------------------------
288
289 In order to find the top-level Manifest from the current directory
290 the following algorithm can be used:
291
292 1. Store the current directory as *original* and the device ID
293 of the containing filesystem (``st_dev``) as *startdev*.
294
295 2. If the device ID of the containing filesystem (``st_dev``)
296 of the current directory is different than *startdev*, stop.
297
298 3. If the current directory contains a ``Manifest`` file:
299
300 a. If an ``IGNORE`` entry in the ``Manifest`` file covers
301 the *original* directory (or one of the parent directories), stop.
302
303 b. Otherwise, store the current directory as *last_found*.
304
305 4. If the current directory is the root system directory (``/``), stop.
306
307 5. Otherwise, enter the parent directory and jump to step 2.
308
309 Once the algorithm stops, *last_found* will contain the relevant
310 top-level Manifest. If *last_found* is null, then the directory tree
311 does not contain any valid top-level Manifest candidates and one should
312 be created in the *original* directory.
313
314 Once the top-level Manifest is found, its ``MANIFEST`` entries should
315 be used to find any sub-Manifests below the top-level Manifest,
316 up to and including the *original* directory. Note that those
317 sub-Manifests can use different filenames than ``Manifest``.
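
The upward search above can be sketched in Python. The sketch makes several assumptions: the ``ignore_covers`` and ``find_top_level`` helpers are hypothetical, only an uncompressed file named ``Manifest`` is considered, paths use ``/`` as separator, and the follow-up step of retracing sub-Manifests through ``MANIFEST`` entries is left out.

```python
import os


def ignore_covers(manifest, original):
    """Check whether an IGNORE entry in manifest covers the original dir."""
    rel = os.path.relpath(original, os.path.dirname(manifest))
    with open(manifest, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2 and fields[0] == "IGNORE":
                if rel == fields[1] or rel.startswith(fields[1] + "/"):
                    return True
    return False


def find_top_level(start):
    """Return the directory of the top-level Manifest, or None."""
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev          # step 1
    current, last_found = original, None
    while True:
        if os.stat(current).st_dev != startdev:  # step 2
            break
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):             # step 3
            if ignore_covers(manifest, original):
                break
            last_found = current
        parent = os.path.dirname(current)
        if parent == current:                    # step 4: reached the root
            break
        current = parent                         # step 5
    return last_found
```

A ``None`` result corresponds to the null *last_found* case, in which a new top-level Manifest would be created in the *original* directory.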
318
319
320 Checksum algorithms
321 -------------------
322
323 This section is informational only. Specifying the exact set
324 of supported algorithms is outside the scope of this specification.
325
326 The algorithm names reserved at the time of writing are:
327
328 - ``MD5`` [#MD5]_,
329 - ``RMD160`` -- RIPEMD-160 [#RIPEMD160]_,
330 - ``SHA1`` [#SHS]_,
331 - ``SHA256`` and ``SHA512`` -- SHA-2 family of hashes [#SHS]_,
332 - ``WHIRLPOOL`` [#WHIRLPOOL]_,
333 - ``BLAKE2B`` and ``BLAKE2S`` -- BLAKE2 family of hashes [#BLAKE2]_,
334 - ``SHA3_256`` and ``SHA3_512`` -- SHA-3 family of hashes [#SHA3]_,
335 - ``STREEBOG256`` and ``STREEBOG512`` -- Streebog family of hashes
336 [#STREEBOG]_.
337
338 The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
339 It is recommended that any new hashes are named after the Python
340 ``hashlib`` module algorithm names, transformed into uppercase.
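
The naming recommendation can be illustrated by deriving the Manifest algorithm name directly from the ``hashlib`` name. The ``format_data_entry`` helper and its default algorithm pair are assumptions of this sketch and not something mandated by the specification.

```python
import hashlib


def format_data_entry(path, data, algorithms=("blake2b", "sha512")):
    """Build a DATA line for in-memory file content."""
    fields = ["DATA", path, str(len(data))]
    for name in algorithms:
        # Manifest name is the hashlib name transformed into uppercase.
        fields += [name.upper(), hashlib.new(name, data).hexdigest()]
    return " ".join(fields)
```

Note that this rule applies to newly introduced names; some of the reserved names listed above, such as ``RMD160``, predate it and do not follow it.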
341
342
343 Manifest compression
344 --------------------
345
346 The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
347 This section merely addresses interoperability issues between Manifest
348 compression and this specification.
349
350 The compressed Manifest files are required to be suffixed for their
351 compression algorithm. This suffix should be used to recognize
352 the compression and decompress Manifests transparently. The exact list
353 of algorithms and their corresponding suffixes are outside the scope
354 of this specification.
355
356 Whenever this specification refers to top-level Manifest file,
357 the implementation should account for compressed variants of this file
358 with appropriate suffixes (e.g. ``Manifest.gz``).
359
360 Whenever this specification refers to sub-Manifests, they can use any
361 names but are also required to use a specific compression suffix.
362 The ``MANIFEST`` entries are required to specify the full name including
363 compression suffix, and the verification is performed on the compressed
364 file.
365
366 The specification permits uncompressed Manifests to exist alongside
367 their compressed counterparts, and multiple compressed formats
368 to coexist. If that is the case, the files must have the same
369 uncompressed content and the implementation is free to choose either
370 of the files using the same base name.
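
Transparent decompression based on the file suffix could be sketched as follows. The suffix table is purely an assumption, since the list of algorithms and suffixes is outside the scope of this specification, and ``open_manifest`` is a hypothetical helper name.

```python
import bz2
import gzip
import lzma

# Assumed suffix-to-opener table; the real list is defined elsewhere.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path):
    """Open a possibly compressed Manifest for reading as text."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")
```

Such a helper is only suitable for reading entries. The size and checksums in a ``MANIFEST`` entry refer to the compressed file, so verification has to read the raw bytes instead.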
371
372
373 An example Manifest file (informational)
374 ----------------------------------------
375
376 An example top-level Manifest file for the Gentoo repository would have
377 the following content::
378
379 TIMESTAMP 2017-10-30T10:11:12Z
380 IGNORE distfiles
381 IGNORE local
382 IGNORE lost+found
383 IGNORE packages
384 MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
385 ...
386 MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
387 ...
388
389 An example modern Manifest (disregarding backwards compatibility)
390 for a package directory would have the following content::
391
392 DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
393 DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
394 DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
395 DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
396 DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
397 DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
398 DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
399
400
401 Rationale
402 =========
403
404 Stand-alone format
405 ------------------
406
407 The first question that needed to be asked before proceeding with
408 the design was whether the Manifest file format was supposed to be
409 stand-alone, or tightly bound to the repository format.
410
411 The stand-alone format has been selected because of its three
412 advantages:
413
414 1. It is more future-proof. If an incompatible change to the repository
415 format is introduced, only developers need to upgrade the tools
416 they use to generate the Manifests. The tools used to verify
417 the updated Manifests will continue to work.
418
419 2. It is more flexible and universal. With a dedicated tool,
420 the Manifest files can be used to sign and verify arbitrary file
421 sets.
422
423 3. It keeps the verification tool simpler. In particular, we can easily
424 write an independent verification tool that could work on any
425 distribution without needing to depend on a package manager
426 implementation or rewrite parts of it.
427
428 Designing a stand-alone format requires that the Manifest carries enough
429 information to perform the verification following all the rules specific
430 to the Gentoo repository.
431
432
433 Tree design
434 -----------
435
436 The second important point of the design was determining whether
437 the Manifest files should be structured hierarchically, or independent.
438 Both options have their advantages.
439
440 In the hierarchical model, each sub-Manifest file is covered by a higher
441 level Manifest. As a result, only the top-level Manifest has to be
442 OpenPGP-signed, and subsequent Manifests need to be only verified by
443 checksum stored in the parent Manifest. This has the following
444 implications:
445
446 - Verifying any set of files in the repository requires using checksums
447 from the most relevant Manifests and the parent Manifests.
448
449 - The OpenPGP signature of the top-level Manifest needs to be verified
450 only once per process.
451
452 - Altering any set of files requires updating the relevant Manifests,
453 and their parent Manifests up to the top-level Manifest, and signing
454 the last one.
455
456 - As a result, the top-level Manifest changes on every commit,
457 and various middle-level Manifests change (and need to be transferred)
458 frequently.
459
460 In the independent model, each sub-Manifest file is independent
461 of the parent Manifests. As a result, each of them needs to be signed
462 and verified independently. However, the parent Manifests still need
463 to list sub-Manifests (albeit without verification data) in order
464 to detect removal or replacement of subdirectories. This has
465 the following implications:
466
467 - Verifying any set of files in the repository requires using checksums
468 and verifying signatures of the most relevant Manifest files.
469
470 - Altering any set of files requires updating the relevant Manifests
471 and signing them again.
472
473 - Parent Manifests are updated only when Manifests are added or removed
474 from subdirectories. As a result, they change infrequently.
475
476 While both models have their advantages, the hierarchical model was
477 selected because it reduces the number of OpenPGP operations,
478 which are comparatively costly, to the minimum.
479
480
481 Tree layout restrictions
482 ------------------------
483
484 The algorithm is meant to work primarily with ebuild repositories which
485 normally contain only files and directories. Directories provide
486 no useful metadata for verification, and specifying special entries
487 for additional file types is purposeless. Therefore, the specification
488 is restricted to dealing with regular files.
489
490 The Gentoo repository does not use symbolic links. Some Gentoo
491 repositories do, however. To provide a simple solution for dealing with
492 symlinks without having to take care to implement special handling for
493 them, the common behavior of implicitly resolving them is used.
494 Therefore, symbolic links to files are stored as if they were regular
495 files, and symbolic links to directories are followed as if they were
496 regular directories.
497
498 Dotfiles are implicitly ignored as that is a common notion used
499 in software written for POSIX systems. All other filenames require
500 explicit ``IGNORE`` lines.
501
502 An ability to inject additional ignore entries is provided to account
503 for site configuration affecting the repository tree -- placing
504 additional files in it, skipping some of the categories from syncing.
505 This configuration can extend beyond the limits of this GLEP,
506 e.g. by allowing wildcards or regular expressions.
507
508 The algorithm is restricted to work on a single filesystem. This is
509 mostly relevant when scanning for top-level Manifest -- we do not want
510 to cross filesystem boundaries then. However, to ensure consistent
511 bidirectional behavior we need to also ban them when operating down
512 the tree.
513
514 The directories and files on different filesystems need to be ignored
515 explicitly as implicitly skipping them would cause confusion.
516 In particular, tools might then claim that a file does not exist when
517 it clearly does because it was skipped due to filesystem boundaries.
518
519
520 File verification model
521 -----------------------
522
523 The verification model aims to provide full coverage against different
524 forms of attack. In particular, three different kinds of manipulation
525 are considered:
526
527 1. Alteration of the file content.
528
529 2. Removal of a file.
530
531 3. Addition of a new file.
532
533 In order to protect against all three, the system requires that all
534 files in the repository are listed in Manifests and verified against
535 them.
536
537 As a special case, ignores are allowed to account for directories
538 that are not part of the repository but were traditionally placed inside
539 it. Those directories were ``distfiles``, ``local`` and ``packages``. It
540 could be also used to ignore VCS directories such as ``CVS``.
541
542
543 Non-strict Manifest verification
544 --------------------------------
545
546 Originally the Manifest2 format provided a special ``MISC`` tag that
547 was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
548 indicated that the Manifest verification failures could be ignored for
549 those files unless the package manager was working in strict mode.
550
551 The first versions of this specification continued the use of this tag.
552 However, after a long debate it was decided to deprecate it along with
553 the non-strict behavior, and require all files to strictly match.
554
555 Two arguments were mentioned for the usefulness of a ``MISC`` type:
556
557 1. being able to reduce the checkout size by stripping unnecessary
558 files out, and
559
560 2. being able to update automatically generated files locally
561 without causing unnecessary verification failures.
562
563 However, the usefulness of ``MISC`` in both cases is doubtful.
564
565 The cases for stripping unnecessary files mostly focused around space
566 savings. For this purpose, stripping ``metadata.xml`` and similar files
567 has little value. It is much more common for users to strip whole
568 packages or categories. The ``MISC`` type is not suitable for that,
569 and so a dedicated package manager mechanism needs to be developed
570 instead. The same mechanism can also handle files that historically used
571 the ``MISC`` type. As an example, the package manager may choose
572 to generate both the rsync exclusion list and Manifest ignore list
573 using a single source list.
574
575 The cases for autogenerated files involve such cache files
576 as ``use.local.desc``. However, we can not include ``md5-cache`` there
577 due to security concerns which results in inconsistent cache handling.
578 Furthermore, the tools were historically modified to provide stable
579 output which means that their content can not change without
580 a non-``MISC`` content being changed first. This practically defeats
581 the purpose of using ``MISC``.
582
583 Finally, the non-strict mode could be used as means to an attack.
584 The allowance of missing or modified documentation files could be used
585 to spread misinformation, resulting in bad decisions made by the user.
586 A modified file could also be used e.g. to exploit vulnerabilities
587 of an XML parser.
588
589
590 Timestamp field
591 ---------------
592
593 The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
594 to include a generation timestamp in the Manifest. A similar feature
595 was originally proposed in GLEP 58 [#GLEP58]_.
596
597 A malicious third party may use the principles of exclusion or replay
598 [#C08]_ to deny an update to clients, while at the same time recording
599 the identity of clients to attack. The timestamp field can be used to
600 detect that.
601
602 In order to provide a more complete protection, the Gentoo
603 Infrastructure should provide an ability to obtain the timestamps
604 of all Manifests from a recent timeframe over a secure channel
605 from a trusted source for comparison.
606
607 Strictly speaking, this information is already provided by the various
608 ``metadata/timestamp*`` files that are already present. However,
609 including the value in the Manifest itself has little cost
610 and provides the ability to perform the verification stand-alone.
611
612 Furthermore, some of the timestamp files are added very late
613 in the distribution process, past the Manifest generation phase. Those
614 files will most likely receive ``IGNORE`` entries and therefore
615 not be suitable for safe use.
616
617
618 New vs deprecated tags
619 ----------------------
620
621 Out of the four types defined by Manifest2, only one is reused
622 and the remaining three are replaced by a single, universal ``DATA``
623 type.
624
625 The ``DIST`` tag is reused since the specification does not change
626 anything with regard to distfile handling.
627
628 The ``EBUILD`` tag could potentially be reused for generic file
629 verification data. However, it would be confusing if all the different
630 data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
631 type was introduced as a replacement.
632
633 The ``MISC`` tag and the relevant non-strict mode have been removed
634 as being of little value, as detailed in the `Non-strict Manifest
635 verification`_ section.
636
637 The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
638 the limiting property of implicit ``files/`` path prefix.
639
640
641 Finding top-level Manifest
642 --------------------------
643
644 The development of a reference implementation for this GLEP has brought
645 the following problem: how to find all the relevant Manifests when
646 the Manifest tool is run inside a subdirectory of the repository?
647
648 One of the options would be to provide a bi-directional linking
649 of Manifests via a ``PARENT`` tag. However, that would not solve
650 the problem when a new Manifest file is being created.
651
652 Instead, an algorithm for iterating over parent directories is proposed.
653 Since there is no obligatory explicit indicator for the top-level
654 Manifest, the algorithm assumes that the top-level Manifest
655 is the highest ``Manifest`` in the directory hierarchy that can cover
656 the current directory. This generally makes sense since the Manifest
657 files are required to provide coverage for all subdirectories, so all
658 Manifests starting from that one need to be updated.
659
660 If independent Manifest trees are nested in the directory structure,
661 then an ``IGNORE`` entry needs to be used to separate them.
662
663 Since sub-Manifests can use any filenames, the Manifest finding
664 algorithm must not short-cut the procedure by storing all ``Manifest``
665 files along the parent directories. Instead, it needs to retrace
666 the relevant sub-Manifest files along ``MANIFEST`` entries
667 in the top-level Manifest.
668
669
670 Injecting ChangeLogs into the checkout
671 --------------------------------------
672
673 One of the problems considered in the new Manifest format was that
674 of injecting historical and autogenerated ChangeLogs into the repository.
675 Normally we are not including those files to reduce the checkout size.
676 However, some users have shown interest in them and Infra is working
677 on providing them via an additional rsync module.
678
679 If such files were injected into the repository, they would cause
680 verification failures of Manifests. To account for this, Infra could
681 provide ``IGNORE`` entries to allow them to exist.
682
683
684 Splitting distfile checksums from file checksums
685 ------------------------------------------------
686
687 Another problem with the current Manifest format is that the checksums
688 for fetched files are combined with checksums for local files
689 in a single file inside the package directory. It has been specifically
690 pointed out that:
691
692 - since distfiles are sometimes reused across different packages,
693 the repeating checksums are redundant [#DIST]_.
694
695 - mirror admins were interested in the possibility of verifying all
696 the distfiles with a single tool.
697
698 This specification does not provide a clean solution to this problem.
699 It technically permits moving ``DIST`` entries to higher-level Manifests
700 but the usefulness of such a solution is doubtful.
701
702 However, for the second problem we will probably deliver a dedicated
703 tool working with this Manifest format.
704
705
706 Hash algorithms
707 ---------------
708
709 While maintaining a consistent supported hash set is important
710 for interoperability, it is no good fit for the generic layout of this
711 GLEP. Furthermore, it would require updating the GLEP in the future
712 every time the used algorithms change.
713
714 Instead, the specification focuses on listing the currently used
715 algorithm names for interoperability, and sets a recommendation
716 for consistent naming of algorithms in the future. The Python
717 ``hashlib`` module is used as a reference since it is used
718 as the provider of hash functions for most of the Python software,
719 including Portage and PkgCore.
720
721 The basic rules for changing hash algorithms are defined in GLEP 59
722 [#GLEP59]_. The implementations can focus only on those algorithms
723 that are actually used or planned on being used. It may be feasible
724 to devise a new GLEP that specifies the currently used hashes (or update
725 GLEP 59 accordingly).
726
727
728 Manifest compression
729 --------------------
730
731 The support for Manifest compression is introduced with minimal changes
732 to the file format. The ``MANIFEST`` entries are required to provide
733 the real (compressed) file path for compatibility with other file
734 entries and to avoid confusion.
735
736 The existence of additional entries for uncompressed Manifest checksums
737 was debated. However, plain entries for the uncompressed file would
738 be confusing if only the compressed file existed, and conflicting if both
739 uncompressed and compressed variants existed. Furthermore, it has been
740 pointed out that ``DIST`` entries do not have an uncompressed variant
741 either.
742
743
744 Performance considerations
745 --------------------------
746
747 Performing a full-tree verification on every sync raises some
748 performance concerns for end-user systems. The initial testing has shown
749 that a cold-cache verification on a btrfs file system can take around
750 4 minutes, with the process being mostly I/O bound. On the other hand,
751 it can be expected that the verification will be performed directly
752 after syncing, taking advantage of warm filesystem cache.
753
754 To improve speed on I/O and/or CPU-constrained systems even further,
755 the algorithms can be easily extended to perform incremental
756 verification. Given that rsync does not preserve mtimes by default,
757 the tool can take advantage of mtime and Manifest comparisons to recheck
758 only the parts of the repository that have changed.
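
The mtime half of such an incremental check could be sketched roughly as
follows. This is only an illustration, not how the reference implementation
works: the function name ``files_changed_since`` and the idea of storing
the time of the last successful verification are assumptions, and a real
tool would combine this with a comparison of the Manifest files themselves.

```python
import os


def files_changed_since(root, last_verified):
    """Return sorted paths (relative to root) modified after last_verified.

    last_verified is a hypothetical timestamp (seconds since the epoch)
    recorded by the tool after the previous successful verification.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files rewritten by the sync receive a fresh mtime,
            # so anything newer than the last run needs rechecking.
            if os.stat(path).st_mtime > last_verified:
                changed.append(os.path.relpath(path, root))
    return sorted(changed)
```

Only the returned paths (and the Manifests covering them) would then need
their checksums recomputed.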
759
760 Furthermore, the package manager implementations can restrict checking
761 only to the parts of the repository that are actually being used.
762
763
764 Backwards Compatibility
765 =======================
766
767 This GLEP provides optional means of preserving backwards compatibility.
768 To preserve the backwards compatibility, the following needs to hold
769 for the ``Manifest`` file in every package directory:
770
771 - all files must be covered by the single ``Manifest`` file,
772
773 - all distfiles used by the package must be included,
774
775 - all files inside the ``files/`` subdirectory need to use
776 the ``AUX`` tag (rather than ``DATA``),
777
778 - all ``.ebuild`` files need to use the ``EBUILD`` tag,
779
780 - the ``metadata.xml`` and ``ChangeLog`` files need to use
781 the ``MISC`` tag,
782
783 - the Manifest can be signed to provide authenticity verification,
784
785 - an uncompressed Manifest must always exist, and a compressed Manifest
786 of identical content may be present.
787
788 Once backwards compatibility is no longer a concern, the above
789 no longer needs to hold and the deprecated tags can be removed.
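
The tag rules listed above can be sketched as a small helper that picks
the tag for a file inside a package directory. The function name
``legacy_tag`` is hypothetical, and the ``DATA`` fallback for any other
file is an assumption made for this illustration.

```python
def legacy_tag(relpath):
    """Return the Manifest tag for a path relative to the package directory.

    relpath is expected to use forward slashes, e.g. "files/fix.patch".
    """
    if relpath.startswith("files/"):
        # Everything under files/ uses AUX rather than DATA.
        return "AUX"
    if "/" not in relpath and relpath.endswith(".ebuild"):
        return "EBUILD"
    if relpath in ("metadata.xml", "ChangeLog"):
        return "MISC"
    # Assumed fallback for files not covered by the rules above.
    return "DATA"
```

For example, ``legacy_tag("files/fix.patch")`` yields ``AUX`` and
``legacy_tag("foo-1.0.ebuild")`` yields ``EBUILD``.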
790
791
792 Reference Implementation
793 ========================
794
795 The reference implementation for this GLEP is being developed
796 as the gemato project [#GEMATO]_.
797
798
799 Credits
800 =======
801
802 Thanks to all the people whose contributions were invaluable
803 to the creation of this GLEP. This includes but is not limited to:
804
805 - Robin Hugh Johnson,
806 - Ulrich Müller.
807
808 Additionally, thanks to Robin Hugh Johnson for the original
809 MetaManifest GLEP series which served both as inspiration and source
810 of many concepts used in this GLEP. Recursively, also thanks to all
811 the people who contributed to the original GLEPs.
812
813
814 References
815 ==========
816
817 .. [#GLEP44] GLEP 44: Manifest2 format
818 (https://www.gentoo.org/glep/glep-0044.html)
819
820 .. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
821 - Overview
822 (https://www.gentoo.org/glep/glep-0057.html)
823
824 .. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
825 - Infrastructure to User distribution - MetaManifest
826 (https://www.gentoo.org/glep/glep-0058.html)
827
828 .. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
829 (https://www.gentoo.org/glep/glep-0059.html)
830
831 .. [#GLEP60] GLEP 60: Manifest2 filetypes
832 (https://www.gentoo.org/glep/glep-0060.html)
833
834 .. [#GLEP61] GLEP 61: Manifest2 compression
835 (https://www.gentoo.org/glep/glep-0061.html)
836
837 .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
838 Format - SRC_URI
839 (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)
840
841 .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
842 (https://www.ietf.org/rfc/rfc1321.txt)
843
844 .. [#RIPEMD160] The hash function RIPEMD-160
845 (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)
846
847 .. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
848 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)
849
850 .. [#WHIRLPOOL] The WHIRLPOOL Hash Function
851 (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)
852
853 .. [#BLAKE2] BLAKE2 -- fast secure hashing
854 (https://blake2.net/)
855
856 .. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
857 and Extendable-Output Functions
858 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)
859
860 .. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
861 (https://www.streebog.net/)
862
863 .. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
864 (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)
865
866 .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
867 at the time of writing are duplicate, representing 2 MiB
868 out of 25 MiB of DIST entries altogether.
869
870 .. [#GEMATO] gemato: Gentoo Manifest Tool
871 (https://github.com/mgorny/gemato/)
872
873 Copyright
874 =========
875 This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
876 Unported License. To view a copy of this license, visit
877 http://creativecommons.org/licenses/by-sa/3.0/.
878
879 --
880 Best regards,
881 Michał Górny