Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [v1.0.3] GLEP 74: Full-tree verification using Manifest files
Date: Thu, 02 Nov 2017 19:12:11
Message-Id: 1509649919.21210.12.camel@gentoo.org
In Reply to: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files by "Michał Górny"
1 Next version. Now without MISC/OPTIONAL, and with many clarifications.
2
3 On Thu, 26.10.2017 at 22:12 +0200, Michał Górny
4 wrote:
5 > ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
6 > HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
7 > impl: https://github.com/mgorny/gemato/
8
9 ---
10 GLEP: 74
11 Title: Full-tree verification using Manifest files
12 Author: Michał Górny <mgorny@g.o>,
13 Robin Hugh Johnson <robbat2@g.o>,
14 Ulrich Müller <ulm@g.o>
15 Type: Standards Track
16 Status: Draft
17 Version: 1
18 Created: 2017-10-21
19 Last-Modified: 2017-10-30
20 Post-History: 2017-10-26
21 Content-Type: text/x-rst
22 Requires: 59, 61
23 Replaces: 44, 58, 60
24 ---
25
26 Abstract
27 ========
28
29 This GLEP extends the Manifest file format to cover full-tree file
30 integrity and authenticity checks. The format aims to be future-proof,
31 efficient, and to provide a means of backwards compatibility.
32
33
34 Motivation
35 ==========
36
37 The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
38 means of verifying the integrity of distfiles and package files
39 in Gentoo. Combined with OpenPGP signatures, they provide means to
40 ensure the authenticity of the covered files. However, as noted
41 in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
42 authenticity verification as they do not cover any files outside
43 the package directory. In particular, they provide multiple ways
44 for a third party to inject malicious code into the ebuild environment.
45
46 Historically, the topic of providing authenticity coverage for the whole
47 repository has been mentioned multiple times. The most noteworthy efforts
48 are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
49 They were accepted by the Council in 2010 but have never been
50 implemented. When potential implementation work started in 2017, a new
51 discussion about the specification arose. It prompted the creation
52 of a competing GLEP that would provide a redesigned alternative to
53 the old GLEPs.
54
55 This specification is designed with the following goals in mind:
56
57 1. It should provide means to ensure the authenticity of the complete
58 repository, including preventing the injection of additional files.
59
60 2. The format should be universal enough to work both for the Gentoo
61 repository and third-party repositories of different characteristics.
62
63 3. The Manifest files should be verifiable stand-alone, that is without
64 knowing any details about the underlying repository format.
65
66
67 Specification
68 =============
69
70 Manifest file format
71 --------------------
72
73 This specification reuses and extends the Manifest file format defined
74 in GLEP 44 [#GLEP44]_. For the purpose of it, the *file type* field is
75 repurposed as a generic *tag* that could also indicate additional
76 (non-checksum) metadata. Appropriately, those tags can be followed by
77 other space-separated values.
78
79 Unless specified otherwise, the paths used in the Manifest files
80 are relative to the directory containing the Manifest file. The paths
81 must not reference the parent directory (``..``).
82
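The path rule above can be checked mechanically. The following is only
a minimal sketch: the function name is hypothetical, and rejecting
absolute and empty paths is an assumption that goes slightly beyond
the wording of this section.

```python
from pathlib import PurePosixPath


def is_valid_manifest_path(path: str) -> bool:
    """Return True if `path` is relative and never references ``..``.

    Rejecting absolute and empty paths is an assumption of this sketch;
    the specification only states that paths are relative to the
    directory containing the Manifest and must not reference ``..``.
    """
    if not path:
        return False
    parsed = PurePosixPath(path)
    if parsed.is_absolute():
        return False
    # PurePosixPath does not collapse "..", so it stays visible in .parts.
    return ".." not in parsed.parts
```

With this sketch, ``files/gcc.patch`` is accepted while
``../eclass/foo.eclass`` and ``a/../b`` are rejected.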
83
84 Manifest file locations and nesting
85 -----------------------------------
86
87 The ``Manifest`` file located in the root directory of the repository
88 is called top-level Manifest, and it is used to perform the full-tree
89 verification. In order to verify the authenticity, it must be signed
90 using OpenPGP, using the armored cleartext format.
91
92 The top-level Manifest may reference sub-Manifests contained
93 in subdirectories of the repository. The sub-Manifests are traditionally
94 named ``Manifest``; however, the implementation must support arbitrary
95 names, including the possibility of multiple (split) Manifests
96 for a single directory. The sub-Manifest can only cover the files inside
97 the directory tree where it resides.
98
99 The sub-Manifest can also be signed using OpenPGP armored cleartext
100 format. However, the signature verification can be omitted if it is
101 covered by a signed top-level Manifest.
102
103
104 Directory tree coverage
105 -----------------------
106
107 The specification provides three ways of skipping Manifest verification
108 of specific files and directories (recursively):
109
110 1. explicit ``IGNORE`` entries in Manifest files,
111
112 2. injected ignore paths via package manager configuration,
113
114 3. using names starting with a dot (``.``) which are always skipped.
115
116 All files that are not ignored must be covered by at least one
117 of the Manifests.
118
119 A single file may be matched by multiple identical or equivalent
120 Manifest entries, if and only if the entries have the same semantics,
121 specify the same size and the checksums common to both entries match.
122 It is an error for a single file to be matched by multiple entries
123 of different semantics, file size or checksum values. It is an error
124 to specify another entry for a file matching ``IGNORE``, or one of its
125 subdirectories.
126
127 The file entries (except for ``IGNORE``) can be specified for regular
128 files only. Symbolic links are followed when opening files
129 and traversing directories. It is an error to specify an entry for
130 a different file type. If the tree contains files of other types
131 that are not otherwise ignored, they need to be covered by an explicit
132 ``IGNORE``.
133
134 All the local (non-``DIST``) files covered by a Manifest tree must
135 reside on the same filesystem. It is an error to specify entries
136 applying to files on another filesystem. If subdirectories
137 that are not otherwise ignored reside on a different filesystem, they
138 must be explicitly excluded via ``IGNORE``.
139
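The three ways of skipping verification listed in this section could be
combined into a single predicate. This is a sketch under assumptions:
the function name and the ``injected`` parameter are hypothetical, and
paths are taken to be ``/``-separated and relative to the directory of
the Manifest holding the ``IGNORE`` entries.

```python
def is_ignored(path, ignore_entries, injected=()):
    """Return True if `path` is exempt from Manifest verification.

    `ignore_entries` holds the paths of IGNORE entries, `injected` holds
    ignore paths coming from package manager configuration (both are
    plain paths without wildcards or trailing slashes).
    """
    # Rule 3: names starting with a dot are always skipped, recursively.
    if any(part.startswith(".") for part in path.split("/")):
        return True
    # Rules 1 and 2: an ignore path covers itself and everything below it.
    for ignored in list(ignore_entries) + list(injected):
        if path == ignored or path.startswith(ignored + "/"):
            return True
    return False
```

Note that the prefix check appends ``/`` before comparing, so an
``IGNORE distfiles`` entry does not accidentally cover a sibling such
as ``distfiles-old``.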
140
141 File verification
142 -----------------
143
144 When verifying a file against the Manifest, the following rules are
145 used:
146
147 1. If the file is covered directly or indirectly by an entry
148 of the ``IGNORE`` type, the verification always succeeds.
149
150 2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
151 ``MISC``, ``EBUILD`` or ``AUX`` type:
152
153 a. if the file is not present, then the verification fails,
154
155 b. if the file is present but has a different size or one
156 of the checksums does not match, the verification fails,
157
158 c. otherwise, the verification succeeds.
159
160 3. If the file is present but not listed in Manifest, the verification
161 fails.
162
163 Unless specified otherwise, the package manager must not allow using
164 any files for which the verification failed. The package manager may
165 reject any package or even the whole repository if it may refer to files
166 for which the verification failed.
167
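Rule 2 above (presence, size, then checksums) could look roughly like
the following. It is a sketch only: the function name is hypothetical,
and the ``checksums`` mapping is assumed to be keyed by Python
``hashlib`` algorithm names rather than by Manifest hash names.

```python
import hashlib
import os


def verify_file(path, size, checksums):
    """Check one file against a size and a {hashlib_name: hexdigest} map.

    Returns False if the file is missing, is not a regular file, has
    a different size, or if any of the given checksums does not match.
    """
    # os.path.isfile() follows symbolic links, as the specification does.
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) != size:
        return False
    hashers = {name: hashlib.new(name) for name in checksums}
    with open(path, "rb") as f:
        # Read once and feed every hash, so large files are not re-read.
        for chunk in iter(lambda: f.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return all(
        hashers[name].hexdigest() == expected.lower()
        for name, expected in checksums.items()
    )
```

The size is compared first because it is cheap and lets most modified
files be rejected without hashing them.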
168
169 Timestamp verification
170 ----------------------
171
172 The Manifest file can contain a ``TIMESTAMP`` entry to account
173 for attacks against tree update distribution. If such an entry
174 is present, it should be updated every time at least one
175 of the Manifests changes. Every unique timestamp value must correspond
176 to a single tree state.
177
178 During the verification process, the client should compare the timestamp
179 against the update time obtained from a local clock or a trusted time
180 source. If the comparison result indicates that the Manifest at the time
181 of receiving was already significantly outdated, the client should
182 either fail the verification or require manual confirmation from the user.
183
184 Furthermore, the Manifest provider may employ additional methods
185 of distributing the timestamps of recently generated Manifests
186 using a secure channel from a trusted source for exact comparison.
187 The exact details of such a solution are outside the scope of this
188 specification.
189
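A client-side staleness check might be sketched as follows. The
``strftime()`` format string is the one this specification gives for
the ``TIMESTAMP`` tag; the function names and the seven-day threshold
are purely illustrative assumptions, since the specification does not
define what "significantly outdated" means.

```python
from datetime import datetime, timedelta, timezone

# Format string taken from the TIMESTAMP tag definition.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a TIMESTAMP value into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(
        tzinfo=timezone.utc)


def is_outdated(value, now=None, max_age=timedelta(days=7)):
    """Return True if the Manifest is older than `max_age` (assumed limit).

    `now` should come from a local clock or a trusted time source.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - parse_timestamp(value) > max_age
```

A caller would typically treat a ``True`` result as a reason to fail
the verification or to ask the user for confirmation.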
190
191 Modern Manifest tags
192 --------------------
193
194 The Manifest files can specify the following tags:
195
196 ``TIMESTAMP <iso8601>``
197 Specifies a timestamp of when the Manifest file was last updated.
198 The timestamp must be a valid second-precision ISO8601 extended format
199 combined date and time in UTC timezone, i.e. using the following
200 ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
201 in the top-level Manifest file. The package manager can use it
202 to detect an outdated repository checkout as described in `Timestamp
203 verification`_.
204
205 ``MANIFEST <path> <size> <checksums>…``
206 Specifies a sub-Manifest. The sub-Manifest must be verified like
207 a regular file. If the verification succeeds, the entries from
208 the sub-Manifest are included for verification as described
209 in `Manifest file locations and nesting`_.
210
211 ``IGNORE <path>``
212 Ignores a subdirectory or file from Manifest checks. If the specified
213 path is present, it and its contents are omitted from the Manifest
214 verification (always pass). *Path* must be a plain file or directory
215 path without a trailing slash, and must not contain wildcards.
216
217 ``DATA <path> <size> <checksums>…``
218 Specifies a regular file subject to Manifest verification. The file
219 is required to pass verification. Used for all files that do not match
220 any other type.
221
222 ``DIST <filename> <size> <checksums>…``
223 Specifies a distfile entry used to verify files fetched as part
224 of ``SRC_URI``. The filename must match the filename used to store
225 the fetched file as specified in the PMS [#PMS-FETCH]_. The package
226 manager must reject the fetched file if it fails verification.
227 ``DIST`` entries apply to all packages below the Manifest file
228 specifying them.
229
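To illustrate the tag layout, a line parser for the modern tags could
be sketched like this. The ``Entry`` type and ``parse_line`` function
are hypothetical names, the deprecated tags are deliberately left out,
and the sketch assumes that paths contain no whitespace.

```python
from typing import Dict, NamedTuple, Optional


class Entry(NamedTuple):
    tag: str
    path: Optional[str]         # None for TIMESTAMP
    size: Optional[int]         # None for TIMESTAMP and IGNORE
    checksums: Dict[str, str]   # hash name -> hex digest
    timestamp: Optional[str] = None


def parse_line(line):
    """Parse one Manifest line; returns None for an empty line."""
    fields = line.split()
    if not fields:
        return None
    tag = fields[0]
    if tag == "TIMESTAMP":
        if len(fields) != 2:
            raise ValueError("TIMESTAMP takes exactly one value")
        return Entry(tag, None, None, {}, fields[1])
    if tag == "IGNORE":
        if len(fields) != 2:
            raise ValueError("IGNORE takes exactly one path")
        return Entry(tag, fields[1], None, {})
    if tag in ("MANIFEST", "DATA", "DIST"):
        # <path> <size> followed by (hash name, value) pairs.
        if len(fields) < 3 or len(fields) % 2 != 1:
            raise ValueError("malformed %s entry" % tag)
        checksums = dict(zip(fields[3::2], fields[4::2]))
        return Entry(tag, fields[1], int(fields[2]), checksums)
    raise ValueError("unsupported tag: %s" % tag)
```

For example, ``parse_line("IGNORE distfiles")`` yields an entry with
the tag ``IGNORE``, the path ``distfiles`` and no size or checksums.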
230
231 Deprecated Manifest tags
232 ------------------------
233
234 For backwards compatibility, the following tags are additionally
235 allowed at the package directory level:
236
237 ``EBUILD <filename> <size> <checksums>…``
238 Equivalent to the ``DATA`` type.
239
240 ``MISC <path> <size> <checksums>…``
241 Equivalent to the ``DATA`` type. Historically indicated that
242 the package manager may ignore a verification failure if operating
243 in non-strict mode. However, that behavior is deprecated.
244
245 ``AUX <filename> <size> <checksums>…``
246 Equivalent to the ``DATA`` type, except that the filename is relative
247 to ``files/`` subdirectory.
248
249
250 Algorithm for full-tree verification
251 ------------------------------------
252
253 In order to perform full-tree verification, the following algorithm
254 can be used:
255
256 1. Collect all files present in the repository into *present* set.
257
258 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
259 Optionally verify the ``TIMESTAMP`` entry if present as specified
260 in `Timestamp verification`_. Remove the top-level Manifest
261 from the *present* set.
262
263 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
264 files according to `file verification`_ section, and include their
265 entries in the current Manifest entry list (using paths relative
266 to directories containing the Manifests).
267
268 4. Process all ``IGNORE`` entries. Remove any paths matching them
269 from the *present* set.
270
271 5. Collect all files covered by ``DATA``, ``MISC``, ``EBUILD``
272 and ``AUX`` entries into the *covered* set.
273
274 6. Verify the entries in *covered* set for incompatible duplicates
275 and collisions with ignored files as explained in `Directory tree
276 coverage`_.
277
278 7. Verify all the files in the union of the *present* and *covered*
279 sets, according to `file verification`_ section.
280
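The set handling in the steps above can be sketched as follows. This is
a partial illustration rather than a full implementation: it skips
OpenPGP and ``TIMESTAMP`` checks (step 2), assumes the sub-Manifests
have already been merged into one ``covered`` mapping (step 3), and
only checks collisions with ignored paths in step 6. The function name
and the ``verify`` callback are hypothetical.

```python
import os


def full_tree_check(root, ignores, covered, verify, manifest_name="Manifest"):
    """Return the sorted list of paths that fail verification.

    ignores: paths from IGNORE entries, relative to `root`
    covered: {path: (size, checksums)} merged from all Manifests
    verify:  callable(abs_path, size, checksums) -> bool
    """
    def ignored(path):
        if any(part.startswith(".") for part in path.split("/")):
            return True
        return any(path == i or path.startswith(i + "/") for i in ignores)

    # Step 1: collect all files present in the repository.
    # Note: followlinks=True does not protect against symlink loops.
    present = set()
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            present.add(rel.replace(os.sep, "/"))
    # Step 2 (partially): the top-level Manifest is not verified as a file.
    present.discard(manifest_name)
    # Step 4: drop everything matching an ignore.
    present = {p for p in present if not ignored(p)}
    # Step 6 (partially): a covered file must not also be ignored.
    failures = {p for p in covered if ignored(p)}
    # Step 7: verify the union of the present and covered sets.
    for path in (present | set(covered)) - failures:
        if path not in covered:
            failures.add(path)  # present but not listed in any Manifest
        elif not verify(os.path.join(root, path), *covered[path]):
            failures.add(path)  # missing, wrong size or wrong checksum
    return sorted(failures)
```

Working on the union of both sets is what catches all three kinds of
manipulation at once: files that are listed but missing, files that
are present but unlisted, and files whose content differs.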
281
282 Algorithm for finding parent Manifests
283 --------------------------------------
284
285 In order to find the top-level Manifest from the current directory
286 the following algorithm can be used:
287
288 1. Store the current directory as *original* and the device ID
289 of the containing filesystem (``st_dev``) as *startdev*.
290
291 2. If the device ID of the containing filesystem (``st_dev``)
292 of the current directory is different than *startdev*, stop.
293
294 3. If the current directory contains a ``Manifest`` file:
295
296 a. If an ``IGNORE`` entry in the ``Manifest`` file covers
297 the *original* directory (or one of the parent directories), stop.
298
299 b. Otherwise, store the current directory as *last_found*.
300
301 4. If the current directory is the root system directory (``/``), stop.
302
303 5. Otherwise, enter the parent directory and jump to step 2.
304
305 Once the algorithm stops, *last_found* will contain the relevant
306 top-level Manifest. If *last_found* is null, then the directory tree
307 does not contain any valid top-level Manifest candidates and one should
308 be created in the *original* directory.
309
310 Once the top-level Manifest is found, its ``MANIFEST`` entries should
311 be used to find any sub-Manifests below the top-level Manifest,
312 up to and including the *original* directory. Note that those
313 sub-Manifests can use different filenames than ``Manifest``.
314
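The upward search described above might be sketched like this. The
function name and the ``ignore_check`` hook are hypothetical; the hook
stands in for parsing the ``IGNORE`` entries of a candidate Manifest,
and compressed variants of the top-level Manifest are not considered.

```python
import os


def find_top_level_manifest(start, ignore_check=None):
    """Return the directory holding the top-level Manifest, or None.

    ignore_check(manifest_path, rel_path) -> bool is a hypothetical hook
    that reports whether an IGNORE entry in `manifest_path` covers
    `rel_path` (the original directory, relative to that Manifest).
    """
    # Step 1: remember the starting point and its device ID.
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev
    current = original
    last_found = None
    while True:
        # Step 2: never cross a filesystem boundary.
        if os.stat(current).st_dev != startdev:
            break
        # Step 3: a Manifest here is a candidate unless it ignores us.
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):
            rel = os.path.relpath(original, current)
            if ignore_check is not None and ignore_check(manifest, rel):
                break
            last_found = current
        # Steps 4 and 5: stop at the root, otherwise go to the parent.
        parent = os.path.dirname(current)
        if parent == current:
            break
        current = parent
    return last_found
```

A ``None`` result corresponds to the case where no candidate exists
and a new top-level Manifest should be created in the starting
directory.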
315
316 Checksum algorithms
317 -------------------
318
319 This section is informational only. Specifying the exact set
320 of supported algorithms is outside the scope of this specification.
321
322 The algorithm names reserved at the time of writing are:
323
324 - ``MD5`` [#MD5]_,
325 - ``RMD160`` — RIPEMD-160 [#RIPEMD160]_,
326 - ``SHA1`` [#SHS]_,
327 - ``SHA256`` and ``SHA512`` — SHA-2 family of hashes [#SHS]_,
328 - ``WHIRLPOOL`` [#WHIRLPOOL]_,
329 - ``BLAKE2B`` and ``BLAKE2S`` — BLAKE2 family of hashes [#BLAKE2]_,
330 - ``SHA3_256`` and ``SHA3_512`` — SHA-3 family of hashes [#SHA3]_,
331 - ``STREEBOG256`` and ``STREEBOG512`` — Streebog family of hashes
332 [#STREEBOG]_.
333
334 The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
335 It is recommended that any new hashes are named after the Python
336 ``hashlib`` module algorithm names, transformed into uppercase.
337
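The naming recommendation can be illustrated with a small mapping
between Manifest hash names and Python ``hashlib``. The helper names
are hypothetical, and the exception table is an assumption covering
only the one reserved name that clearly predates the rule; whether
a given algorithm is actually available depends on the local Python
and OpenSSL build.

```python
import hashlib

# Reserved names that do not follow the "uppercased hashlib name" rule.
# Only RMD160 is listed here; treat the table as illustrative.
LEGACY_NAMES = {"RMD160": "ripemd160"}


def manifest_hash_name(hashlib_name):
    """Derive the recommended Manifest name from a hashlib name."""
    return hashlib_name.upper()


def new_hasher(manifest_name):
    """Create a hashlib object for a Manifest hash name.

    Raises ValueError if the local hashlib build lacks the algorithm.
    """
    name = LEGACY_NAMES.get(manifest_name, manifest_name.lower())
    return hashlib.new(name)
```

For instance, ``manifest_hash_name("sha3_256")`` gives ``SHA3_256``,
matching the reserved name listed above.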
338
339 Manifest compression
340 --------------------
341
342 The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
343 This section merely addresses interoperability issues between Manifest
344 compression and this specification.
345
346 The compressed Manifest files are required to be suffixed for their
347 compression algorithm. This suffix should be used to recognize
348 the compression and decompress Manifests transparently. The exact list
349 of algorithms and their corresponding suffixes are outside the scope
350 of this specification.
351
352 Whenever this specification refers to the top-level Manifest file,
353 the implementation should account for compressed variants of this file
354 with appropriate suffixes (e.g. ``Manifest.gz``).
355
356 Whenever this specification refers to sub-Manifests, they can use any
357 names but are also required to use a specific compression suffix.
358 The ``MANIFEST`` entries are required to specify the full name including
359 compression suffix, and the verification is performed on the compressed
360 file.
361
362 The specification permits uncompressed Manifests to exist alongside
363 their compressed counterparts, and multiple compressed formats
364 to coexist. If that is the case, the files must have the same
365 uncompressed content and the implementation is free to choose either
366 of the files using the same base name.
367
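Transparent decompression by suffix could be sketched as below. The
suffix table is an assumption made for illustration only, since the
list of algorithms and suffixes is explicitly outside the scope of
this specification; the function name is hypothetical as well.

```python
import bz2
import gzip
import lzma

# Assumed suffix-to-opener table; not defined by this specification.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_manifest(path):
    """Open a Manifest for reading as text, decompressing by suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

This only affects how the entries are read. The checksums in the
``MANIFEST`` entry still apply to the compressed file as stored, so
verification has to hash the raw bytes and not the decompressed text.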
368
369 An example Manifest file (informational)
370 ----------------------------------------
371
372 An example top-level Manifest file for the Gentoo repository would have
373 the following content::
374
375 TIMESTAMP 2017-10-30T10:11:12Z
376 IGNORE distfiles
377 IGNORE local
378 IGNORE lost+found
379 IGNORE packages
380 MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
381
382 MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
383
384
385 An example modern Manifest (disregarding backwards compatibility)
386 for a package directory would have the following content::
387
388 DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
389 DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
390 DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
391 DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
392 DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
393 DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
394 DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
395
396
397 Rationale
398 =========
399
400 Stand-alone format
401 ------------------
402
403 The first question that needed to be asked before proceeding with
404 the design was whether the Manifest file format was supposed to be
405 stand-alone, or tightly bound to the repository format.
406
407 The stand-alone format has been selected because of its three
408 advantages:
409
410 1. It is more future-proof. If an incompatible change to the repository
411 format is introduced, only developers need to upgrade the tools
412 they use to generate the Manifests. The tools used to verify
413 the updated Manifests will continue to work.
414
415 2. It is more flexible and universal. With a dedicated tool,
416 the Manifest files can be used to sign and verify arbitrary file
417 sets.
418
419 3. It keeps the verification tool simpler. In particular, we can easily
420 write an independent verification tool that could work on any
421 distribution without needing to depend on a package manager
422 implementation or rewrite parts of it.
423
424 Designing a stand-alone format requires that the Manifest carries enough
425 information to perform the verification following all the rules specific
426 to the Gentoo repository.
427
428
429 Tree design
430 -----------
431
432 The second important point of the design was determining whether
433 the Manifest files should be structured hierarchically, or independent.
434 Both options have their advantages.
435
436 In the hierarchical model, each sub-Manifest file is covered by a higher
437 level Manifest. As a result, only the top-level Manifest has to be
438 OpenPGP-signed, and subsequent Manifests need to be only verified by
439 checksum stored in the parent Manifest. This has the following
440 implications:
441
442 - Verifying any set of files in the repository requires using checksums
443 from the most relevant Manifests and the parent Manifests.
444
445 - The OpenPGP signature of the top-level Manifest needs to be verified
446 only once per process.
447
448 - Altering any set of files requires updating the relevant Manifests,
449 and their parent Manifests up to the top-level Manifest, and signing
450 the last one.
451
452 - As a result, the top-level Manifest changes on every commit,
453 and various middle-level Manifests change (and need to be transferred)
454 frequently.
455
456 In the independent model, each sub-Manifest file is independent
457 of the parent Manifests. As a result, each of them needs to be signed
458 and verified independently. However, the parent Manifests still need
459 to list sub-Manifests (albeit without verification data) in order
460 to detect removal or replacement of subdirectories. This has
461 the following implications:
462
463 - Verifying any set of files in the repository requires using checksums
464 and verifying signatures of the most relevant Manifest files.
465
466 - Altering any set of files requires updating the relevant Manifests
467 and signing them again.
468
469 - Parent Manifests are updated only when Manifests are added or removed
470 from subdirectories. As a result, they change infrequently.
471
472 While both models have their advantages, the hierarchical model was
473 selected because it reduces the number of OpenPGP operations,
474 which are comparatively costly, to the minimum.
475
476
477 Tree layout restrictions
478 ------------------------
479
480 The algorithm is meant to work primarily with ebuild repositories which
481 normally contain only files and directories. Directories provide
482 no useful metadata for verification, and specifying special entries
483 for additional file types is purposeless. Therefore, the specification
484 is restricted to dealing with regular files.
485
486 The Gentoo repository does not use symbolic links. Some Gentoo
487 repositories do, however. To provide a simple solution for dealing with
488 symlinks without having to take care to implement special handling for
489 them, the common behavior of implicitly resolving them is used.
490 Therefore, symbolic links to files are stored as if they were regular
491 files, and symbolic links to directories are followed as if they were
492 regular directories.
493
494 Dotfiles are implicitly ignored as that is a common notion used
495 in software written for POSIX systems. All other common filenames
496 require explicit ``IGNORE`` lines.
497
498 An ability to inject additional ignore entries is provided to account
499 for site configuration affecting the repository tree — placing
500 additional files in it, skipping some of the categories from syncing.
501
502 The algorithm is restricted to work on a single filesystem. This is
503 mostly relevant when scanning for top-level Manifest — we do not want
504 to cross filesystem boundaries then. However, to ensure consistent
505 bidirectional behavior we need to also ban them when operating down
506 the tree.
507
508 The directories and files on different filesystems need to be ignored
509 explicitly as implicitly skipping them would cause confusion.
510 In particular, tools might then claim that a file does not exist when
511 it clearly does because it was skipped due to filesystem boundaries.
512
513
514 File verification model
515 -----------------------
516
517 The verification model aims to provide full coverage against different
518 forms of attack. In particular, three different kinds of manipulation
519 are considered:
520
521 1. Alteration of the file content.
522
523 2. Removal of a file.
524
525 3. Addition of a new file.
526
527 In order to protect against all three, the system requires that all
528 files in the repository are listed in Manifests and verified against
529 them.
530
531 As a special case, ignores are allowed to account for directories
532 that are not part of the repository but were traditionally placed inside
533 it. Those directories were ``distfiles``, ``local`` and ``packages``. It
534 could be also used to ignore VCS directories such as ``CVS``.
535
536
537 Non-strict Manifest verification
538 --------------------------------
539
540 Originally the Manifest2 format provided a special ``MISC`` tag that
541 was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
542 indicated that the Manifest verification failures could be ignored for
543 those files unless the package manager was working in strict mode.
544
545 The first versions of this specification continued the use of this tag.
546 However, after a long debate it was decided to deprecate it along with
547 the non-strict behavior, and require all files to strictly match.
548
549 Two arguments were mentioned for the usefulness of a ``MISC`` type:
550
551 1. being able to reduce the checkout size by stripping unnecessary
552 files out, and
553
554 2. being able to update automatically generated files locally
555 without causing unnecessary verification failures.
556
557 However, the usefulness of ``MISC`` in both cases is doubtful.
558
559 The cases for stripping unnecessary files mostly focused around space
560 savings. For this purpose, stripping ``metadata.xml`` and similar files
561 has little value. It is much more common for users to strip whole
562 categories, which cannot be handled via the ``MISC`` type and need
563 a dedicated package manager mechanism. The same mechanism can also
564 handle files that used the ``MISC`` type.
565
566 The cases for autogenerated files involve such cache files
567 as ``use.local.desc``. However, we can not include ``md5-cache`` there
568 due to security concerns which results in inconsistent cache handling.
569 Furthermore, the tools were historically modified to provide stable
570 output which means that their content can not change without
571 a non-``MISC`` content being changed first. This practically defeats
572 the purpose of using ``MISC``.
573
574 Finally, the non-strict mode could be used as a means of attack.
575 Allowing a missing or modified documentation file could be used
576 to spread misinformation, resulting in bad decisions made by the user.
577 A modified file could also be used e.g. to exploit vulnerabilities
578 of an XML parser.
579
580
581 Timestamp field
582 ---------------
583
584 The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
585 to include a generation timestamp in the Manifest. A similar feature
586 was originally proposed in GLEP 58 [#GLEP58]_.
587
588 A malicious third party may use the principles of exclusion or replay
589 [#C08]_ to deny an update to clients, while at the same time recording
590 the identity of clients to attack. The timestamp field can be used to
591 detect that.
592
593 In order to provide a more complete protection, the Gentoo
594 Infrastructure should provide an ability to obtain the timestamps
595 of all Manifests from a recent timeframe over a secure channel
596 from a trusted source for comparison.
597
598 Strictly speaking, this information is already provided by the various
599 ``metadata/timestamp*`` files that are already present. However,
600 including the value in the Manifest itself has little cost
601 and provides the ability to perform the verification stand-alone.
602
603 Furthermore, some of the timestamp files are added very late
604 in the distribution process, past the Manifest generation phase. Those
605 files will most likely receive ``IGNORE`` entries and therefore
606 not be suitable for safe use.
607
608
609 New vs deprecated tags
610 ----------------------
611
612 Out of the four types defined by Manifest2, only one is reused
613 and the remaining three are replaced by a single, universal ``DATA``
614 type.
615
616 The ``DIST`` tag is reused since the specification does not change
617 anything with regard to distfile handling.
618
619 The ``EBUILD`` tag could potentially be reused for generic file
620 verification data. However, it would be confusing if all the different
621 data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
622 type was introduced as a replacement.
623
624 The ``MISC`` tag and the relevant non-strict mode have been removed
625 as being of little value, as detailed in the `Non-strict Manifest
626 verification`_ section.
627
628 The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
629 the limiting property of an implicit ``files/`` path prefix.
630
631
632 Finding top-level Manifest
633 --------------------------
634
635 The development of a reference implementation for this GLEP has brought
636 the following problem: how to find all the relevant Manifests when
637 the Manifest tool is run inside a subdirectory of the repository?
638
639 One of the options would be to provide a bi-directional linking
640 of Manifests via a ``PARENT`` tag. However, that would not solve
641 the problem when a new Manifest file is being created.
642
643 Instead, an algorithm for iterating over parent directories is proposed.
644 Since there is no obligatory explicit indicator for the top-level
645 Manifest, the algorithm assumes that the top-level Manifest
646 is the highest ``Manifest`` in the directory hierarchy that can cover
647 the current directory. This generally makes sense since the Manifest
648 files are required to provide coverage for all subdirectories, so all
649 Manifests starting from that one need to be updated.
650
651 If independent Manifest trees are nested in the directory structure,
652 then an ``IGNORE`` entry needs to be used to separate them.
653
654 Since sub-Manifests can use any filenames, the Manifest finding
655 algorithm must not short-cut the procedure by storing all ``Manifest``
656 files along the parent directories. Instead, it needs to retrace
657 the relevant sub-Manifest files along ``MANIFEST`` entries
658 in the top-level Manifest.
659
660
661 Injecting ChangeLogs into the checkout
662 --------------------------------------
663
664 One of the problems considered in the new Manifest format was that
665 of injecting historical and autogenerated ChangeLogs into the repository.
666 Normally we do not include those files to reduce the checkout size.
667 However, some users have shown interest in them and Infra is working
668 on providing them via an additional rsync module.
669
670 If such files were injected into the repository, they would cause
671 verification failures of Manifests. To account for this, Infra could
672 provide ``IGNORE`` entries to allow them to exist.
673
674
675 Splitting distfile checksums from file checksums
676 ------------------------------------------------
677
678 Another problem with the current Manifest format is that the checksums
679 for fetched files are combined with checksums for local files
680 in a single file inside the package directory. It has been specifically
681 pointed out that:
682
683 - since distfiles are sometimes reused across different packages,
684 the repeating checksums are redundant,
685
686 - mirror admins were interested in the possibility of verifying all
687 the distfiles with a single tool.
688
689 This specification does not provide a clean solution to this problem.
690 It technically permits moving ``DIST`` entries to higher-level Manifests
691 but the usefulness of such a solution is doubtful.
692
693 However, for the second problem we will probably deliver a dedicated
694 tool working with this Manifest format.
695
696
697 Hash algorithms
698 ---------------
699
700 While maintaining a consistent supported hash set is important
701 for interoperability, it is not a good fit for the generic layout of this
702 GLEP. Furthermore, it would require updating the GLEP in the future
703 every time the used algorithms change.
704
705 Instead, the specification focuses on listing the currently used
706 algorithm names for interoperability, and sets a recommendation
707 for consistent naming of algorithms in the future. The Python
708 ``hashlib`` module is used as a reference since it is used
709 as the provider of hash functions for most of the Python software,
710 including Portage and PkgCore.
711
712 The basic rules for changing hash algorithms are defined in GLEP 59
713 [#GLEP59]_. The implementations can focus only on those algorithms
714 that are actually used or planned on being used. It may be feasible
715 to devise a new GLEP that specifies the currently used hashes (or update
716 GLEP 59 accordingly).
717
718
719 Manifest compression
720 --------------------
721
722 The support for Manifest compression is introduced with minimal changes
723 to the file format. The ``MANIFEST`` entries are required to provide
724 the real (compressed) file path for compatibility with other file
725 entries and to avoid confusion.
726
727 The existence of additional entries for uncompressed Manifest checksums
728 was debated. However, plain entries for the uncompressed file would
729 be confusing if only compressed file existed, and conflicting if both
730 uncompressed and compressed variants existed. Furthermore, it has been
731 pointed out that ``DIST`` entries do not have uncompressed variant
732 either.
733
734
735 Performance considerations
736 --------------------------
737
738 Performing a full-tree verification on every sync raises some
739 performance concerns for end-user systems. The initial testing has shown
740 that a cold-cache verification on a btrfs file system can take around
741 4 minutes, with the process being mostly I/O bound. On the other hand,
742 it can be expected that the verification will be performed directly
743 after syncing, taking advantage of warm filesystem cache.
744
745 To improve speed on I/O and/or CPU-constrained systems even further,
746 the algorithms can be easily extended to perform incremental
747 verification. Given that rsync does not preserve mtimes by default,
748 the tool can take advantage of mtime and Manifest comparisons to recheck
749 only the parts of the repository that have changed.
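
A minimal sketch of the mtime half of such an incremental check follows. It is not taken from the reference implementation. The helpers ``snapshot`` and ``changed_paths`` are hypothetical, and the sketch assumes the tool stores the previous snapshot only after a successful verification. Comparing Manifest contents is left out.

```python
import os
import tempfile


def snapshot(root):
    """Map each file's relative path to its (mtime_ns, size) pair."""
    stamps = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            st = os.stat(full)
            stamps[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return stamps


def changed_paths(old, new):
    """Return paths that were added, removed or modified between snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "metadata.xml")
    with open(path, "w") as f:
        f.write("<pkgmetadata/>")
    before = snapshot(root)
    with open(path, "a") as f:
        f.write("\n")  # the size changes, so the stamp differs even on coarse clocks
    after = snapshot(root)
    recheck = changed_paths(before, after)
```

Only the paths in ``recheck`` would then need their checksums recomputed. Removed files are reported as well, since a missing file is also a verification failure.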
750
751 Furthermore, the package manager implementations can restrict checking
752 only to the parts of the repository that are actually being used.
753
754
755 Backwards Compatibility
756 =======================
757
758 This GLEP provides optional means of preserving backwards compatibility.
759 To preserve the backwards compatibility, the following needs to hold
760 for the ``Manifest`` file in every package directory:
761
762 - all files must be covered by the single ``Manifest`` file,
763
764 - all distfiles used by the package must be included,
765
766 - all files inside the ``files/`` subdirectory need to use
767 the ``AUX`` tag (rather than ``DATA``),
768
769 - all ``.ebuild`` files need to use the ``EBUILD`` tag,
770
771 - the ``metadata.xml`` and ``ChangeLog`` files need to use
772 the ``MISC`` tag,
773
774 - the Manifest can be signed to provide authenticity verification,
775
776 - an uncompressed Manifest must always exist, and a compressed Manifest
777 of identical content may be present.
778
779 Once the backwards compatibility is no longer a concern, the above
780 no longer needs to hold and the deprecated tags can be removed.
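
The tag rules listed above can be sketched as a small classifier for paths relative to the package directory. The function name ``compat_tag`` is hypothetical, and falling back to ``DATA`` for any path the list does not mention is an assumption of this sketch rather than something this section states.

```python
def compat_tag(path):
    """Pick the backwards-compatible tag for a package-relative path."""
    if path.startswith("files/"):
        return "AUX"
    # Only ebuilds located directly in the package directory qualify.
    if "/" not in path and path.endswith(".ebuild"):
        return "EBUILD"
    if path in ("metadata.xml", "ChangeLog"):
        return "MISC"
    return "DATA"  # assumption: fallback for paths not covered by the list


tags = {p: compat_tag(p) for p in ("files/fix.patch", "foo-1.0.ebuild", "metadata.xml")}
```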
781
782
783 Reference Implementation
784 ========================
785
786 The reference implementation for this GLEP is being developed
787 as the gemato project [#GEMATO]_.
788
789
790 Credits
791 =======
792
793 Thanks to all the people whose contributions were invaluable
794 to the creation of this GLEP. This includes but is not limited to:
795
796 - Robin Hugh Johnson,
797 - Ulrich Müller.
798
799 Additionally, thanks to Robin Hugh Johnson for the original
800 MetaManifest GLEP series which served both as inspiration and source
801 of many concepts used in this GLEP. Recursively, also thanks to all
802 the people who contributed to the original GLEPs.
803
804
805 References
806 ==========
807
808 .. [#GLEP44] GLEP 44: Manifest2 format
809 (https://www.gentoo.org/glep/glep-0044.html)
810
811 .. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
812 - Overview
813 (https://www.gentoo.org/glep/glep-0057.html)
814
815 .. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
816 - Infrastructure to User distribution - MetaManifest
817 (https://www.gentoo.org/glep/glep-0058.html)
818
819 .. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
820 (https://www.gentoo.org/glep/glep-0059.html)
821
822 .. [#GLEP60] GLEP 60: Manifest2 filetypes
823 (https://www.gentoo.org/glep/glep-0060.html)
824
825 .. [#GLEP61] GLEP 61: Manifest2 compression
826 (https://www.gentoo.org/glep/glep-0061.html)
827
828 .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
829 Format - SRC_URI
830 (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)
831
832 .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
833 (https://www.ietf.org/rfc/rfc1321.txt)
834
835 .. [#RIPEMD160] The hash function RIPEMD-160
836 (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)
837
838 .. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
839 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)
840
841 .. [#WHIRLPOOL] The WHIRLPOOL Hash Function
842 (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)
843
844 .. [#BLAKE2] BLAKE2 — fast secure hashing
845 (https://blake2.net/)
846
847 .. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
848 and Extendable-Output Functions
849 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)
850
851 .. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
852 (https://www.streebog.net/)
853
854 .. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
855 (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)
856
857 .. [#GEMATO] gemato: Gentoo Manifest Tool
858 (https://github.com/mgorny/gemato/)
859
860 Copyright
861 =========
862 This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
863 Unported License. To view a copy of this license, visit
864 http://creativecommons.org/licenses/by-sa/3.0/.
865
866 --
867 Best regards,
868 Michał Górny
