Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] [v1.0.1] GLEP 74: Full-tree verification using Manifest files
Date: Sun, 29 Oct 2017 19:08:07
Message-Id: 1509304076.14897.17.camel@gentoo.org
In Reply to: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files by "Michał Górny"
1 On Thu, 26 Oct 2017 at 22:12 +0200, Michał Górny
2 wrote:
3 > After a week of hard work, I'd like to request your comments
4 > on the draft of GLEP 74. This GLEP aims to replace the old tree-signing
5 > GLEPs 58 and 60 with a superior implementation and more complete
6 > specification.
7 >
8 > The original tree-signing GLEPs were accepted a few years back but they
9 > have never been implemented. This specification, on the other hand,
10 > comes with a working reference implementation for the verification
11 > algorithm. I expect to finish the update/generation part in a few days,
12 > then work on additional optimizations (threading, incremental
13 > verification, incremental updates).
14 >
15 > ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
16 > HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
17 > impl: https://github.com/mgorny/gemato/
18 >
19 > Full text following for inline comments.
20 >
21
22 Here's an updated version based on the feedback so far. Gemato is also
23 ready for its first public testing, although it does not yet implement
24 Gentoo-specific rules.
25
26 ---
27 GLEP: 74
28 Title: Full-tree verification using Manifest files
29 Author: Michał Górny <mgorny@g.o>,
30 Robin Hugh Johnson <robbat2@g.o>,
31 Ulrich Müller <ulm@g.o>
32 Type: Standards Track
33 Status: Draft
34 Version: 1
35 Created: 2017-10-21
36 Last-Modified: 2017-10-29
37 Post-History: 2017-10-26
38 Content-Type: text/x-rst
39 Requires: 59, 61
40 Replaces: 44, 58, 60
41 ---
42
43 Abstract
44 ========
45
46 This GLEP extends the Manifest file format to cover full-tree file
47 integrity and authenticity checks. The format aims to be future-proof
48 and efficient, and to provide a means of backwards compatibility.
49
50
51 Motivation
52 ==========
53
54 The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
55 means of verifying the integrity of distfiles and package files
56 in Gentoo. Combined with OpenPGP signatures, they provide means to
57 ensure the authenticity of the covered files. However, as noted
58 in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
59 authenticity verification as they do not cover any files outside
60 the package directory. In particular, they provide multiple ways
61 for a third party to inject malicious code into the ebuild environment.
62
63 Historically, the topic of providing authenticity coverage for the whole
64 repository has been mentioned multiple times. The most noteworthy efforts
65 are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
66 They were accepted by the Council in 2010 but have never been
67 implemented. When potential implementation work started in 2017, a new
68 discussion about the specification arose. It prompted the creation
69 of a competing GLEP that would provide a redesigned alternative to
70 the old GLEPs.
71
72 This specification is designed with the following goals in mind:
73
74 1. It should provide means to ensure the authenticity of the complete
75 repository, including preventing the injection of additional files.
76
77 2. Like the original Manifest2, the files should be split into two
78 groups — files whose authenticity is critical, and those whose
79 mismatch may be accepted in non-strict mode. The same classification
80 should apply both to files listed in Manifests, and to stray files
81 present only in the repository.
82
83 3. The format should be universal enough to work both for the Gentoo
84 repository and third-party repositories of different characteristics.
85
86 4. The Manifest files should be verifiable stand-alone, that is without
87 knowing any details about the underlying repository format.
88
89
90 Specification
91 =============
92
93 Manifest file format
94 --------------------
95
96 This specification reuses and extends the Manifest file format defined
97 in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
98 repurposed as a generic *tag* that could also indicate additional
99 (non-checksum) metadata. Appropriately, those tags can be followed by
100 other space-separated values.
101
102 Unless specified otherwise, the paths used in the Manifest files
103 are relative to the directory containing the Manifest file. The paths
104 must not reference the parent directory (``..``).
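
As an illustration only (not part of the specification), a line in this tag-based format could be split as sketched below. The ``Entry`` type and the ``parse_line`` function are hypothetical names, and treating the first value of every non-``TIMESTAMP`` tag as a path is an assumption of this sketch.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    tag: str                 # e.g. "DATA", "IGNORE", "TIMESTAMP"
    values: Tuple[str, ...]  # remaining space-separated fields


def parse_line(line: str) -> Entry:
    """Split one Manifest line into its tag and the values that follow."""
    fields = line.split()
    if not fields:
        raise ValueError("empty Manifest line")
    tag, values = fields[0], tuple(fields[1:])
    # Assumption: for every tag except TIMESTAMP the first value is a path,
    # and paths must not contain a ".." component.
    if tag != "TIMESTAMP" and values and ".." in values[0].split("/"):
        raise ValueError("path references parent directory: " + values[0])
    return Entry(tag, values)
```

For example, ``parse_line("IGNORE distfiles")`` yields the tag ``IGNORE`` with a single value, while a path such as ``../x`` is rejected.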
105
106
107 Manifest file locations and nesting
108 -----------------------------------
109
110 The ``Manifest`` file located in the root directory of the repository
111 is called top-level Manifest, and it is used to perform the full-tree
112 verification. In order to verify the authenticity, it must be signed
113 using OpenPGP, using the armored cleartext format.
114
115 The top-level Manifest may reference sub-Manifests contained
116 in subdirectories of the repository. The sub-Manifests are traditionally
117 named ``Manifest``; however, the implementation must support arbitrary
118 names, including the possibility of multiple (split) Manifests
119 for a single directory. The sub-Manifest can only cover the files inside
120 the directory tree where it resides.
121
122 The sub-Manifest can also be signed using OpenPGP armored cleartext
123 format. However, the signature verification can be omitted if it is
124 covered by a signed top-level Manifest.
125
126 The Manifest files can also specify ``IGNORE`` entries to skip Manifest
127 verification of subdirectories and/or files. Files and directories
128 starting with a dot are always implicitly ignored. All files that
129 are not ignored must be covered by at least one of the Manifests.
130
131 A single file may be matched by multiple identical or equivalent
132 Manifest entries, if and only if the entries have the same semantics,
133 specify the same size and the checksums common to both entries match.
134 It is an error for a single file to be matched by multiple entries
135 of different semantics, file size or checksum values. It is an error
136 to specify another entry for a file matching ``IGNORE``, or one of its
137 subdirectories.
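
The duplicate-entry rule above can be sketched as a small predicate. This is a minimal illustration under stated assumptions: entries are represented as hypothetical dicts, and "same semantics" is approximated by treating the deprecated ``EBUILD`` and ``AUX`` tags as ``DATA``.

```python
# Assumption of this sketch: deprecated tags count as DATA when
# comparing the semantics of two entries.
SEMANTICS = {"EBUILD": "DATA", "AUX": "DATA"}


def compatible(a, b):
    """Return True if two entries matching the same file may coexist.

    Each entry is a hypothetical dict of the form
    {"tag": str, "size": int, "checksums": {name: hexdigest}}.
    """
    tag_a = SEMANTICS.get(a["tag"], a["tag"])
    tag_b = SEMANTICS.get(b["tag"], b["tag"])
    if tag_a != tag_b or a["size"] != b["size"]:
        return False
    # Only the checksums present in both entries have to agree.
    common = a["checksums"].keys() & b["checksums"].keys()
    return all(a["checksums"][n] == b["checksums"][n] for n in common)
```

Two entries that list disjoint checksum sets are considered compatible here, since no common checksum disagrees.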
138
139 The file entries (except for ``IGNORE``) can be specified for regular
140 files only. Symbolic links are followed when opening files. It is
141 an error to specify an entry for a different file type.
142
143 All the local (non-``DIST``) files covered by a Manifest tree must
144 reside on the same filesystem. It is an error to specify entries
145 applying to files on another filesystem. If subdirectories
146 of the Manifest tree reside on a different filesystem, they must
147 be explicitly excluded via ``IGNORE``.
148
149
150 File verification
151 -----------------
152
153 When verifying a file against the Manifest, the following rules are
154 used:
155
156 - if a file listed in Manifest is not present, then the verification
157 for the file fails,
158
159 - if a file listed in Manifest is present but has a different size
160 or one of the checksums does not match, the verification fails,
161
162 - if a file is present but not listed in Manifest, the verification
163 fails,
164
165 - otherwise, the verification succeeds.
166
167 Unless specified otherwise, the package manager must not allow using
168 any files for which the verification failed. The package manager may
169 reject any package or even the whole repository if it may refer to files
170 for which the verification failed.
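
A minimal sketch of the per-file check for a listed file follows. The ``verify_file`` name and the use of Python ``hashlib`` algorithm names as dictionary keys are assumptions made for illustration; stray-file detection needs the full file list and is not covered here.

```python
import hashlib
import os


def verify_file(path, size, checksums):
    """Check one listed file against its expected size and digests.

    `checksums` maps hashlib algorithm names (an assumption of this
    sketch) to expected hexadecimal digests.
    """
    # A listed file that is absent (or not a regular file) fails.
    if not os.path.isfile(path):
        return False
    # A size mismatch fails without reading the file.
    if os.path.getsize(path) != size:
        return False
    hashers = {name: hashlib.new(name) for name in checksums}
    with open(path, "rb") as f:
        # Read once and feed every hash to avoid repeated I/O.
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return all(hashers[name].hexdigest() == expected.lower()
               for name, expected in checksums.items())
```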
171
172
173 New Manifest tags
174 -----------------
175
176 The Manifest files can specify the following tags:
177
178 ``TIMESTAMP <iso8601>``
179 Specifies a timestamp of when the Manifest file was last updated.
180 The timestamp must be a valid second-precision ISO8601 extended format
181 combined date and time in UTC timezone, i.e. using the following
182 ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
183 in the top-level Manifest file. The package manager can use it
184 to detect an outdated repository checkout as described in `Timestamp
185 verification`_.
186
187 ``MANIFEST <path> <size> <checksums>…``
188 Specifies a sub-Manifest. The sub-Manifest must be verified like
189 a regular file. If the verification succeeds, the entries from
190 the sub-Manifest are included for verification as described
191 in `Manifest file locations and nesting`_.
192
193 ``IGNORE <path>``
194 Ignores a subdirectory or file from Manifest checks. If the specified
195 path is present, it and its contents are omitted from the Manifest
196 verification (always pass).
197
198 ``DATA <path> <size> <checksums>…``
199 Specifies a file subject to obligatory Manifest verification.
200 The file is required to pass verification. Used for all files directly
201 affecting package manager operation (ebuilds, eclasses, profiles).
202
203 ``MISC <path> <size> <checksums>…``
204 Specifies a file subject to non-obligatory Manifest verification.
205 The package manager may ignore a verification failure if operating
206 in non-strict mode. Used for files that do not affect the installed
207 packages (``metadata.xml``, ``use.desc``).
208
209 ``OPTIONAL <path>``
210 Specifies a file that would be subject to non-obligatory Manifest
211 verification if it existed. The package manager may ignore a stray file
212 matching this entry if operating in non-strict mode. Used for paths
213 that would match ``MISC`` if they existed.
214
215 ``DIST <filename> <size> <checksums>…``
216 Specifies a distfile entry used to verify files fetched as part
217 of ``SRC_URI``. The filename must match the filename used to store
218 the fetched file as specified in the PMS [#PMS-FETCH]_. The package
219 manager must reject the fetched file if it fails verification.
220 ``DIST`` entries apply to all packages below the Manifest file
221 specifying them.
222
223
224 Deprecated Manifest tags
225 ------------------------
226
227 For backwards compatibility, the following tags are additionally
228 allowed at the package directory level:
229
230 ``EBUILD <filename> <size> <checksums>…``
231 Equivalent to the ``DATA`` type.
232
233 ``AUX <filename> <size> <checksums>…``
234 Equivalent to the ``DATA`` type, except that the filename is relative
235 to ``files/`` subdirectory.
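
One way a tool might fold the deprecated tags into the new ones is sketched below. The ``normalize`` function is a hypothetical helper, not something defined by this specification.

```python
def normalize(tag, path):
    """Map a deprecated entry to its DATA equivalent.

    Returns a (tag, path) pair with the path relative to the
    directory containing the Manifest.
    """
    if tag == "EBUILD":
        return "DATA", path
    if tag == "AUX":
        # AUX filenames are relative to the files/ subdirectory.
        return "DATA", "files/" + path
    return tag, path
```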
236
237
238 Timestamp verification
239 ----------------------
240
241 The Manifest file can contain a ``TIMESTAMP`` entry to account
242 for attacks against tree update distribution. If such an entry
243 is present, it should be updated every time at least one
244 of the Manifests changes. Every unique timestamp value must correspond
245 to a single tree state.
246
247 During the verification process, the client should compare the timestamp
248 against the update time obtained from a local clock or a trusted time
249 source. If the comparison result indicates that the Manifest at the time
250 of receiving was already significantly outdated, the client should
251 either fail the verification or require manual confirmation from user.
252
253 Furthermore, the Manifest provider may employ additional methods
254 of distributing the timestamps of recently generated Manifests
255 using a secure channel from a trusted source for exact comparison.
256 The exact details of such a solution are outside the scope of this
257 specification.
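
The timestamp comparison could look roughly like the following sketch. The one-day ``max_age`` threshold and the function names are assumptions chosen for illustration; the specification does not define what counts as significantly outdated.

```python
from datetime import datetime, timedelta, timezone

# The strftime()/strptime() format given for the TIMESTAMP tag.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a TIMESTAMP value into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def is_outdated(value, now, max_age=timedelta(days=1)):
    """Return True if the Manifest is older than max_age at time `now`.

    max_age is a hypothetical policy knob, not part of the spec.
    """
    return now - parse_timestamp(value) > max_age
```

A client would pass the time of the sync (from the local clock or a trusted source) as ``now`` and either fail or ask the user for confirmation when the function returns ``True``.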
258
259
260 Algorithm for full-tree verification
261 ------------------------------------
262
263 In order to perform full-tree verification, the following algorithm
264 can be used:
265
266 1. Collect all files present in the repository into *present* set.
267
268 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
269 Optionally verify the ``TIMESTAMP`` entry if present as specified
270 in `Timestamp verification`_. Remove the top-level Manifest
271 from the *present* set.
272
273 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
274 files according to `file verification`_ section, and include their
275 entries in the current Manifest entry list (using paths relative
276 to directories containing the Manifests).
277
278 4. Process all ``IGNORE`` entries. Remove any paths matching them
279 from the *present* set.
280
281 5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
282 ``EBUILD`` and ``AUX`` entries into the *covered* set.
283
284 6. Verify the entries in *covered* set for incompatible duplicates
285 and collisions with ignored files as explained in `Manifest file
286 locations and nesting`_.
287
288 7. Verify all the files in the union of the *present* and *covered*
289 sets, according to `file verification`_ section.
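
The set bookkeeping of steps 4 to 7 can be sketched as follows. This is a simplified illustration, not the reference implementation: it assumes all entries were already flattened to repository-relative paths, it counts ``MANIFEST`` entries as covering the sub-Manifest files themselves, and it leaves out signature, size and checksum checks. The ``classify`` name is hypothetical.

```python
COVERING_TAGS = ("DATA", "MISC", "OPTIONAL", "EBUILD", "AUX", "MANIFEST")


def classify(present, entries):
    """Return (stray, missing, collisions) as sets of paths.

    present: repository-relative paths found on disk, without the
             top-level Manifest.
    entries: (tag, path) pairs collected from all Manifests.
    """
    ignored = [p.rstrip("/") for tag, p in entries if tag == "IGNORE"]

    def under_ignore(path):
        return any(path == i or path.startswith(i + "/") for i in ignored)

    def is_dotted(path):
        # Files and directories starting with a dot are implicitly ignored.
        return any(part.startswith(".") for part in path.split("/"))

    present = {p for p in present
               if not under_ignore(p) and not is_dotted(p)}
    covered = {p: tag for tag, p in entries if tag in COVERING_TAGS}

    # Entries must not point into explicitly ignored paths.
    collisions = {p for p in covered if under_ignore(p)}
    # Files on disk that no Manifest lists.
    stray = present - covered.keys()
    # Listed files that are absent; OPTIONAL paths are allowed to be absent.
    missing = {p for p, tag in covered.items() if tag != "OPTIONAL"} - present
    return stray, missing, collisions
```

An empty result for all three sets means that only the content checks remain to be done for the covered files.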
290
291
292 Algorithm for finding parent Manifests
293 --------------------------------------
294
295 In order to find the top-level Manifest from the current directory
296 the following algorithm can be used:
297
298 1. Store the current directory as *original* and the device ID
299 of the containing filesystem (``st_dev``) as *startdev*.
300
301 2. If the device ID of the containing filesystem (``st_dev``)
302 of the current directory is different from *startdev*, stop.
303
304 3. If the current directory contains a ``Manifest`` file:
305
306 a. If an ``IGNORE`` entry in the ``Manifest`` file covers
307 the *original* directory (or one of the parent directories), stop.
308
309 b. Otherwise, store the current directory as *last_found*.
310
311 4. If the current directory is the root system directory (``/``), stop.
312
313 5. Otherwise, enter the parent directory and jump to step 2.
314
315 Once the algorithm stops, *last_found* will contain the relevant
316 top-level Manifest. If *last_found* is null, then the directory tree
317 does not contain any valid top-level Manifest candidates and one should
318 be created in the *original* directory.
319
320 Once the top-level Manifest is found, its ``MANIFEST`` entries should
321 be used to find any sub-Manifests below the top-level Manifest,
322 up to and including the *original* directory. Note that those
323 sub-Manifests can use different filenames than ``Manifest``.
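
The upward search could be sketched as below, assuming a POSIX system and an uncompressed file named ``Manifest``. The ``find_top_level`` and ``manifest_ignores`` names are hypothetical, and the ``IGNORE`` matching is a simplification that only compares path prefixes.

```python
import os


def manifest_ignores(manifest_dir, target):
    """Return True if an IGNORE entry in manifest_dir/Manifest covers target."""
    rel = os.path.relpath(target, manifest_dir)
    with open(os.path.join(manifest_dir, "Manifest"), encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2 and fields[0] == "IGNORE":
                ignored = fields[1].rstrip("/")
                if rel == ignored or rel.startswith(ignored + "/"):
                    return True
    return False


def find_top_level(start):
    """Return the directory holding the top-level Manifest, or None."""
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev
    current, last_found = original, None
    while True:
        # Do not cross filesystem boundaries.
        if os.stat(current).st_dev != startdev:
            break
        if os.path.isfile(os.path.join(current, "Manifest")):
            # A Manifest that ignores the original directory belongs
            # to a different tree.
            if manifest_ignores(current, original):
                break
            last_found = current
        parent = os.path.dirname(current)
        if parent == current:  # reached the root directory
            break
        current = parent
    return last_found
```

A ``None`` result corresponds to the case where no candidate exists and a new top-level Manifest would be created in the starting directory.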
324
325
326 Checksum algorithms
327 -------------------
328
329 This section is informational only. Specifying the exact set
330 of supported algorithms is outside the scope of this specification.
331
332 The algorithm names reserved at the time of writing are:
333
334 - ``MD5`` [#MD5]_,
335 - ``RMD160`` — RIPEMD-160 [#RIPEMD160]_,
336 - ``SHA1`` [#SHS]_,
337 - ``SHA256`` and ``SHA512`` — SHA-2 family of hashes [#SHS]_,
338 - ``WHIRLPOOL`` [#WHIRLPOOL]_,
339 - ``BLAKE2B`` and ``BLAKE2S`` — BLAKE2 family of hashes [#BLAKE2]_,
340 - ``SHA3_256`` and ``SHA3_512`` — SHA-3 family of hashes [#SHA3]_,
341 - ``STREEBOG256`` and ``STREEBOG512`` — Streebog family of hashes
342 [#STREEBOG]_.
343
344 The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
345 It is recommended that any new hashes are named after the Python
346 ``hashlib`` module algorithm names, transformed into uppercase.
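
To illustrate the naming recommendation, the sketch below maps Manifest hash names onto Python ``hashlib`` names. The helper names are hypothetical. ``RMD160`` is the one reserved name handled here that is not simply the uppercased ``hashlib`` name; whether a given algorithm is actually available depends on the local Python and OpenSSL build, and the Streebog hashes are, as far as the author of this sketch knows, not provided by ``hashlib`` at all.

```python
import hashlib

# Reserved names that differ from the lowercase hashlib name.
SPECIAL_NAMES = {"RMD160": "ripemd160"}


def hashlib_name(manifest_name):
    """Translate a Manifest hash name into a hashlib algorithm name."""
    return SPECIAL_NAMES.get(manifest_name, manifest_name.lower())


def digest(manifest_name, data):
    """Return the hex digest of `data` for the given Manifest hash name.

    Raises ValueError if the local hashlib does not provide the algorithm.
    """
    return hashlib.new(hashlib_name(manifest_name), data).hexdigest()
```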
347
348
349 Manifest compression
350 --------------------
351
352 The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
353 This section merely addresses interoperability issues between Manifest
354 compression and this specification.
355
356 The compressed Manifest files are required to be suffixed for their
357 compression algorithm. This suffix should be used to recognize
358 the compression and decompress Manifests transparently. The exact list
359 of algorithms and their corresponding suffixes are outside the scope
360 of this specification.
361
362 Whenever this specification refers to the top-level Manifest file,
363 the implementation should account for compressed variants of this file
364 with appropriate suffixes (e.g. ``Manifest.gz``).
365
366 Whenever this specification refers to sub-Manifests, they can use any
367 names but are also required to use a specific compression suffix.
368 The ``MANIFEST`` entries are required to specify the full name including
369 compression suffix, and the verification is performed on the compressed
370 file.
371
372 The specification permits uncompressed Manifests to exist alongside
373 their compressed counterparts, and multiple compressed formats
374 to coexist. If that is the case, the files must have the same
375 uncompressed content and the implementation is free to choose either
376 of the files using the same base name.
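
Transparent decompression by suffix might be sketched as follows. The suffix table is an assumption made for illustration, since the list of algorithms and suffixes is outside the scope of this specification; ``open_manifest`` is a hypothetical helper.

```python
import bz2
import gzip
import lzma

# Hypothetical suffix table; the actual list is defined elsewhere.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path):
    """Open a Manifest for reading text, decompressing by file suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    # No known suffix: treat the file as an uncompressed Manifest.
    return open(path, "r", encoding="utf-8")
```

Note that checksum verification of a sub-Manifest is still performed on the compressed file as stored on disk, so this helper would only be used after that check has passed.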
377
378
379 Rationale
380 =========
381
382 Stand-alone format
383 ------------------
384
385 The first question that needed to be asked before proceeding with
386 the design was whether the Manifest file format was supposed to be
387 stand-alone, or tightly bound to the repository format.
388
389 The stand-alone format has been selected because of its three
390 advantages:
391
392 1. It is more future-proof. If an incompatible change to the repository
393 format is introduced, only developers need to upgrade the tools
394 they use to generate the Manifests. The tools used to verify
395 the updated Manifests will continue to work.
396
397 2. It is more flexible and universal. With a dedicated tool,
398 the Manifest files can be used to sign and verify arbitrary file
399 sets.
400
401 3. It keeps the verification tool simpler. In particular, we can easily
402 write an independent verification tool that could work on any
403 distribution without needing to depend on a package manager
404 implementation or rewrite parts of it.
405
406 Designing a stand-alone format requires that the Manifest carries enough
407 information to perform the verification following all the rules specific
408 to the Gentoo repository.
409
410
411 Tree design
412 -----------
413
414 The second important point of the design was determining whether
415 the Manifest files should be structured hierarchically, or independent.
416 Both options have their advantages.
417
418 In the hierarchical model, each sub-Manifest file is covered by a higher
419 level Manifest. As a result, only the top-level Manifest has to be
420 OpenPGP-signed, and subsequent Manifests need to be only verified by
421 checksum stored in the parent Manifest. This has the following
422 implications:
423
424 - Verifying any set of files in the repository requires using checksums
425 from the most relevant Manifests and the parent Manifests.
426
427 - The OpenPGP signature of the top-level Manifest needs to be verified
428 only once per process.
429
430 - Altering any set of files requires updating the relevant Manifests,
431 and their parent Manifests up to the top-level Manifest, and signing
432 the last one.
433
434 - As a result, the top-level Manifest changes on every commit,
435 and various middle-level Manifests change (and need to be transferred)
436 frequently.
437
438 In the independent model, each sub-Manifest file is independent
439 of the parent Manifests. As a result, each of them needs to be signed
440 and verified independently. However, the parent Manifests still need
441 to list sub-Manifests (albeit without verification data) in order
442 to detect removal or replacement of subdirectories. This has
443 the following implications:
444
445 - Verifying any set of files in the repository requires using checksums
446 and verifying signatures of the most relevant Manifest files.
447
448 - Altering any set of files requires updating the relevant Manifests
449 and signing them again.
450
451 - Parent Manifests are updated only when Manifests are added or removed
452 from subdirectories. As a result, they change infrequently.
453
454 While both models have their advantages, the hierarchical model was
455 selected because it reduces to a minimum the number of OpenPGP
456 operations, which are comparatively costly.
457
458
459 Tree layout restrictions
460 ------------------------
461
462 The algorithm is meant to work primarily with ebuild repositories which
463 normally contain only files and directories. Directories provide
464 no useful metadata for verification, and specifying special entries
465 for additional file types is purposeless. Therefore, the specification
466 is restricted to dealing with regular files.
467
468 The Gentoo repository does not use symbolic links. Some Gentoo
469 repositories do, however. To provide a simple solution for dealing with
470 symlinks without having to take care to implement special handling for
471 them, the common behavior of implicitly resolving them is used.
472 Therefore, symbolic links to files are stored as if they were regular
473 files, and symbolic links to directories are followed as if they were
474 regular directories.
475
476 Dotfiles are implicitly ignored as that is a common notion used
477 in software written for POSIX systems. All other filenames require
478 explicit ``IGNORE`` lines.
479
480 The algorithm is restricted to work on a single filesystem. This is
481 mostly relevant when scanning for top-level Manifest — we do not want
482 to cross filesystem boundaries then. However, to ensure consistent
483 bidirectional behavior we also need to ban them when operating down
484 the tree.
485
486 The directories and files on different filesystems need to be ignored
487 explicitly as implicitly skipping them would cause confusion.
488 In particular, tools might then claim that a file does not exist when
489 it clearly does because it was skipped due to filesystem boundaries.
490
491
492 File verification model
493 -----------------------
494
495 The verification model aims to provide full coverage against different
496 forms of attack. In particular, three different kinds of manipulation
497 are considered:
498
499 1. Alteration of the file content.
500
501 2. Removal of a file.
502
503 3. Addition of a new file.
504
505 In order to protect against all three, the system requires that all
506 files in the repository are listed in Manifests and verified against
507 them.
508
509 As a special case, ignores are allowed to account for directories
510 that are not part of the repository but were traditionally placed inside
511 it. Those directories were ``distfiles``, ``local`` and ``packages``. Ignores
512 could also be used to skip VCS directories such as ``CVS``.
513
514
515 Non-obligatory Manifest verification
516 ------------------------------------
517
518 While this specification recommends all tools to use strict verification
519 by default, it allows declaring some files as non-obligatory like
520 the original Manifest2 format did. This could be used on files that do
521 not affect the normal package manager operation.
522
523 It aims to account for two use cases:
524
525 1. Stripping down files that are not strictly required to install
526 packages from repository checkouts.
527
528 2. Accounting for automatically generated files that might be updated
529 by standard tooling.
530
531 The traditional ``MISC`` type is amended with a complementary
532 ``OPTIONAL`` tag to account for files that are not provided
533 in the specific repository. It aims to avoid a situation where the same
534 path would be non-fatal when provided by the repository but fatal when
535 created by the user tooling.
536
537
538 Timestamp field
539 ---------------
540
541 The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
542 to include a generation timestamp in the Manifest. A similar feature
543 was originally proposed in GLEP 58 [#GLEP58]_.
544
545 A malicious third party may use the principles of exclusion and replay
546 to deny an update to clients, while at the same time recording
547 the identity of clients to attack. The timestamp field can be used
548 to detect that.
549
550 In order to provide a more complete protection, the Gentoo
551 Infrastructure should provide an ability to obtain the timestamps
552 of all Manifests from a recent timeframe over a secure channel
553 from a trusted source for comparison.
554
555 Strictly speaking, this is already provided by the various
556 ``metadata/timestamp.*`` files provided already by Gentoo which are also
557 covered by the Manifest. However, including the value in the Manifest
558 itself has little cost and provides the ability to perform
559 the verification stand-alone.
560
561
562 New vs deprecated tags
563 ----------------------
564
565 Out of the four types defined by Manifest2, two are reused and two are
566 marked deprecated.
567
568 The ``DIST`` and ``MISC`` tags are reused since they can be relatively
569 clearly mapped onto the new concept.
570
571 The ``EBUILD`` tag could potentially be reused for generic file
572 verification data. However, it would be confusing if all the different
573 data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
574 type was introduced as a replacement.
575
576 The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
577 the limiting property of implicit ``files/`` path prefix.
578
579
580 Finding top-level Manifest
581 --------------------------
582
583 The development of a reference implementation for this GLEP has brought
584 the following problem: how to find all the relevant Manifests when
585 the Manifest tool is run inside a subdirectory of the repository?
586
587 One of the options would be to provide a bi-directional linking
588 of Manifests via a ``PARENT`` tag. However, that would not solve
589 the problem when a new Manifest file is being created.
590
591 Instead, an algorithm for iterating over parent directories is proposed.
592 Since there is no obligatory explicit indicator for the top-level
593 Manifest, the algorithm assumes that the top-level Manifest
594 is the highest ``Manifest`` in the directory hierarchy that can cover
595 the current directory. This generally makes sense since the Manifest
596 files are required to provide coverage for all subdirectories, so all
597 Manifests starting from that one need to be updated.
598
599 If independent Manifest trees are nested in the directory structure,
600 then an ``IGNORE`` entry needs to be used to separate them.
601
602 Since sub-Manifests can use any filenames, the Manifest finding
603 algorithm must not short-cut the procedure by storing all ``Manifest``
604 files along the parent directories. Instead, it needs to retrace
605 the relevant sub-Manifest files along ``MANIFEST`` entries
606 in the top-level Manifest.
607
608
609 Injecting ChangeLogs into the checkout
610 --------------------------------------
611
612 One of the problems considered in the new Manifest format was that
613 of injecting historical and autogenerated ChangeLog into the repository.
614 Normally we do not include those files, to reduce the checkout size.
615 However, some users have shown interest in them and Infra is working
616 on providing them via an additional rsync module.
617
618 If such files were injected into the repository, they would cause strict
619 verification failures of Manifests. To account for this, Infra could
620 provide either ``OPTIONAL`` entries for the Manifest files to allow them
621 in non-strict verification mode, or ``IGNORE`` entries to allow them
622 in the strict mode.
623
624
625 Splitting distfile checksums from file checksums
626 ------------------------------------------------
627
628 Another problem with the current Manifest format is that the checksums
629 for fetched files are combined with checksums for local files
630 in a single file inside the package directory. It has been specifically
631 pointed out that:
632
633 - since distfiles are sometimes reused across different packages,
634 the repeating checksums are redundant,
635
636 - mirror admins were interested in the possibility of verifying all
637 the distfiles with a single tool.
638
639 This specification does not provide a clean solution to this problem.
640 It technically permits moving ``DIST`` entries to higher-level Manifests
641 but the usefulness of such a solution is doubtful.
642
643 However, for the second problem we will probably deliver a dedicated
644 tool working with this Manifest format.
645
646
647 Hash algorithms
648 ---------------
649
650 While maintaining a consistent supported hash set is important
651 for interoperability, it is not a good fit for the generic layout of this
652 GLEP. Furthermore, it would require updating the GLEP in the future
653 every time the used algorithms change.
654
655 Instead, the specification focuses on listing the currently used
656 algorithm names for interoperability, and sets a recommendation
657 for consistent naming of algorithms in the future. The Python
658 ``hashlib`` module is used as a reference since it is used
659 as the provider of hash functions for most of the Python software,
660 including Portage and PkgCore.
661
662 The basic rules for changing hash algorithms are defined in GLEP 59
663 [#GLEP59]_. The implementations can focus only on those algorithms
664 that are actually used or planned on being used. It may be feasible
665 to devise a new GLEP that specifies the currently used hashes (or update
666 GLEP 59 accordingly).
667
668
669 Manifest compression
670 --------------------
671
672 The support for Manifest compression is introduced with minimal changes
673 to the file format. The ``MANIFEST`` entries are required to provide
674 the real (compressed) file path for compatibility with other file
675 entries and to avoid confusion.
676
677 The existence of additional entries for uncompressed Manifest checksums
678 was debated. However, plain entries for the uncompressed file would
679 be confusing if only compressed file existed, and conflicting if both
680 uncompressed and compressed variants existed. Furthermore, it has been
681 pointed out that ``DIST`` entries do not have uncompressed variant
682 either.
683
684
685 Performance considerations
686 --------------------------
687
688 Performing a full-tree verification on every sync raises some
689 performance concerns for end-user systems. The initial testing has shown
690 that a cold-cache verification on a btrfs file system can take around
691 4 minutes, with the process being mostly I/O bound. On the other hand,
692 it can be expected that the verification will be performed directly
693 after syncing, taking advantage of warm filesystem cache.
694
695 To improve speed on I/O and/or CPU-constrained systems even further,
696 the algorithms can be easily extended to perform incremental
697 verification. Given that rsync does not preserve mtimes by default,
698 the tool can take advantage of mtime and Manifest comparisons to recheck
699 only the parts of the repository that have changed.
700
701 Furthermore, the package manager implementations can restrict checking
702 only to the parts of the repository that are actually being used.
703
704
705 Backwards Compatibility
706 =======================
707
708 This GLEP provides optional means of preserving backwards compatibility.
709 To preserve the backwards compatibility, the following needs to be
710 ensured:
711
712 - all files within the package directory must be covered by ``Manifest``
713 file inside that package directory,
714
715 - all distfiles used by the package must be covered by ``Manifest``
716 file inside the package directory,
717
718 - all files inside the ``files/`` subdirectory of a package directory
719 need to use the deprecated ``AUX`` tag (rather than ``DATA``),
720
721 - all ``.ebuild`` files inside the package directory need to use
722 the deprecated ``EBUILD`` tag (rather than ``DATA``),
723
724 - the Manifest files inside the package directory can be signed
725 to provide authenticity verification,
726
727 - if the Manifest files inside the package directory are compressed,
728 an uncompressed file of identical content must coexist.
729
730 Once the backwards compatibility is no longer a concern, the above
731 no longer needs to hold and the deprecated tags can be removed.
732
733
734 Reference Implementation
735 ========================
736
737 The reference implementation for this GLEP is being developed
738 as the gemato project [#GEMATO]_.
739
740
741 Credits
742 =======
743
744 Thanks to all the people whose contributions were invaluable
745 to the creation of this GLEP. This includes but is not limited to:
746
747 - Robin Hugh Johnson,
748 - Ulrich Müller.
749
750 Additionally, thanks to Robin Hugh Johnson for the original
751 MetaManifest GLEP series which served both as inspiration and source
752 of many concepts used in this GLEP. Recursively, also thanks to all
753 the people who contributed to the original GLEPs.
754
755
756 References
757 ==========
758
759 .. [#GLEP44] GLEP 44: Manifest2 format
760 (https://www.gentoo.org/glep/glep-0044.html)
761
762 .. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
763 - Overview
764 (https://www.gentoo.org/glep/glep-0057.html)
765
766 .. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
767 - Infrastructure to User distribution - MetaManifest
768 (https://www.gentoo.org/glep/glep-0058.html)
769
770 .. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
771 (https://www.gentoo.org/glep/glep-0059.html)
772
773 .. [#GLEP60] GLEP 60: Manifest2 filetypes
774 (https://www.gentoo.org/glep/glep-0060.html)
775
776 .. [#GLEP61] GLEP 61: Manifest2 compression
777 (https://www.gentoo.org/glep/glep-0061.html)
778
779 .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
780 Format - SRC_URI
781 (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)
782
783 .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
784 (https://www.ietf.org/rfc/rfc1321.txt)
785
786 .. [#RIPEMD160] The hash function RIPEMD-160
787 (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)
788
789 .. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
790 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)
791
792 .. [#WHIRLPOOL] The WHIRLPOOL Hash Function
793 (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)
794
795 .. [#BLAKE2] BLAKE2 — fast secure hashing
796 (https://blake2.net/)
797
798 .. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
799 and Extendable-Output Functions
800 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)
801
802 .. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
803 (https://www.streebog.net/)
804
805 .. [#GEMATO] gemato: Gentoo Manifest Tool
806 (https://github.com/mgorny/gemato/)
807
808 Copyright
809 =========
810 This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
811 Unported License. To view a copy of this license, visit
812 http://creativecommons.org/licenses/by-sa/3.0/.
813
814 --
815 Best regards,
816 Michał Górny
