Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files
Date: Mon, 30 Oct 2017 16:51:49
Message-Id: 1509382296.1517.10.camel@gentoo.org
In Reply to: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files by "Michał Górny"
Here's another version with a few clarification-class changes:

62819e2 glep-0074: Clarify OPTIONAL desc
e953eaf glep-0074: Add two example files for reference
f98cabc glep-0074: Reorganize to have tag references after basic algos
56b06b0 glep-0074: Rewrite the file verificaton to cover OPTIONAL
bbabc4d glep-0074: Split 'Directory tree coverage' section out
fe62b50 glep-0074: Apply more suggestions from Robin

On Thu, 26.10.2017 at 22:12 +0200, Michał Górny wrote:
>
> ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
> HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
> impl: https://github.com/mgorny/gemato/
>

---
GLEP: 74
Title: Full-tree verification using Manifest files
Author: Michał Górny <mgorny@g.o>,
        Robin Hugh Johnson <robbat2@g.o>,
        Ulrich Müller <ulm@g.o>
Type: Standards Track
Status: Draft
Version: 1
Created: 2017-10-21
Last-Modified: 2017-10-30
Post-History: 2017-10-26
Content-Type: text/x-rst
Requires: 59, 61
Replaces: 44, 58, 60
---

Abstract
========

This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof,
efficient, and to provide means of backwards compatibility.


Motivation
==========

The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
means of verifying the integrity of distfiles and package files
in Gentoo. Combined with OpenPGP signatures, they provide means to
ensure the authenticity of the covered files. However, as noted
in GLEP 57 [#GLEP57]_, they lack the ability to provide full-tree
authenticity verification as they do not cover any files outside
the package directory. In particular, this leaves multiple ways
for a third party to inject malicious code into the ebuild environment.

Historically, the topic of providing authenticity coverage for the whole
repository has been mentioned multiple times. The most noteworthy efforts
are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
They were accepted by the Council in 2010 but have never been
implemented. When potential implementation work started in 2017, a new
discussion about the specification arose. It prompted the creation
of a competing GLEP that would provide a redesigned alternative to
the old GLEPs.

This specification is designed with the following goals in mind:

1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. Like the original Manifest2, the files should be split into two
   groups: files whose authenticity is critical, and those whose
   mismatch may be accepted in non-strict mode. The same classification
   should apply both to files listed in Manifests, and to stray files
   present only in the repository.

3. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

4. The Manifest files should be verifiable stand-alone, that is, without
   knowing any details about the underlying repository format.


Specification
=============

Manifest file format
--------------------

This specification reuses and extends the Manifest file format defined
in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
repurposed as a generic *tag* that can also indicate additional
(non-checksum) metadata. Accordingly, those tags can be followed by
other space-separated values.

Unless specified otherwise, the paths used in the Manifest files
are relative to the directory containing the Manifest file. The paths
must not reference the parent directory (``..``).


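As an illustration only, the following Python sketch shows one way a tool might split a Manifest line into its tag and values and enforce the path restriction above. The ``ManifestEntry`` type, the ``parse_entry()`` function and the grouping of tags into path-only and sized forms are assumptions of this sketch, based on the tags listed in `Modern Manifest tags`_ and `Deprecated Manifest tags`_. They are not an interface defined by this GLEP or by gemato. Rejecting absolute paths is likewise an assumption derived from the requirement that paths are relative.

```python
from typing import Dict, NamedTuple, Optional

# Assumed grouping of tags for this sketch.
SIZED_TAGS = {"MANIFEST", "DATA", "MISC", "DIST", "EBUILD", "AUX"}
PATH_ONLY_TAGS = {"IGNORE", "OPTIONAL"}


class ManifestEntry(NamedTuple):
    tag: str
    path: str
    size: Optional[int]
    checksums: Dict[str, str]  # algorithm name -> hex digest


def parse_entry(line: str) -> Optional[ManifestEntry]:
    """Parse one Manifest line; return None for blank and TIMESTAMP lines."""
    fields = line.split()
    if not fields or fields[0] == "TIMESTAMP":
        return None
    if len(fields) < 2:
        raise ValueError(f"missing path: {line!r}")
    tag, path, rest = fields[0], fields[1], fields[2:]
    # Paths are relative to the Manifest's directory and must not use "..".
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError(f"path escapes the Manifest directory: {path!r}")
    if tag in PATH_ONLY_TAGS and not rest:
        return ManifestEntry(tag, path, None, {})
    # Sized entries: <size> followed by <algorithm> <digest> pairs.
    if tag in SIZED_TAGS and len(rest) % 2 == 1:
        size, pairs = int(rest[0]), rest[1:]
        return ManifestEntry(tag, path, size, dict(zip(pairs[0::2], pairs[1::2])))
    raise ValueError(f"malformed or unknown entry: {line!r}")
```

For example, ``parse_entry("DATA files/gcc.patch 816 SHA256 b56e SHA512 2468")`` yields a ``DATA`` entry of size 816 with two checksums.
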
Manifest file locations and nesting
-----------------------------------

The ``Manifest`` file located in the root directory of the repository
is called the top-level Manifest, and it is used to perform the full-tree
verification. In order to verify the authenticity, it must be signed
with OpenPGP, using the armored cleartext format.

The top-level Manifest may reference sub-Manifests contained
in subdirectories of the repository. The sub-Manifests are traditionally
named ``Manifest``; however, the implementation must support arbitrary
names, including the possibility of multiple (split) Manifests
for a single directory. A sub-Manifest can only cover the files inside
the directory tree where it resides.

A sub-Manifest can also be signed using the OpenPGP armored cleartext
format. However, the signature verification can be omitted if it is
covered by a signed top-level Manifest.


Directory tree coverage
-----------------------

The Manifest files can also specify ``IGNORE`` entries to skip Manifest
verification of subdirectories and/or files. The package manager can
support injecting ignore paths to account for additional files created,
modified or removed by the user's processes that would not be ignored
by existing rules. Files and directories starting with a dot are always
implicitly ignored. All files that are not ignored must be covered
by at least one of the Manifests.

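The coverage rule for ``IGNORE`` entries and dotfiles could be checked with a helper like the hypothetical ``is_ignored()`` below. It is a sketch that assumes ``IGNORE`` paths, including any injected by the package manager, have already been rebased onto the directory of the top-level Manifest, and that paths use ``/`` as the separator.

```python
from typing import Iterable


def is_ignored(path: str, ignore_paths: Iterable[str]) -> bool:
    """True if a repository-relative path is excluded from verification."""
    parts = path.split("/")
    # Files and directories starting with a dot are always implicitly ignored.
    if any(part.startswith(".") for part in parts):
        return True
    # An IGNORE entry covers the named path itself and everything below it,
    # compared component-wise so that "distfiles" does not match "distfiles2".
    for ignored in ignore_paths:
        prefix = ignored.split("/")
        if parts[:len(prefix)] == prefix:
            return True
    return False
```
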
A single file may be matched by multiple identical or equivalent
Manifest entries, if and only if the entries have the same semantics,
specify the same size and the checksums common to both entries match.
It is an error for a single file to be matched by multiple entries
of different semantics, file size or checksum values. It is an error
to specify another entry for a file matching ``IGNORE``, or one of its
subdirectories.

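A minimal sketch of the duplicate-entry rule follows. ``entries_compatible()`` is a hypothetical helper. Treating ``EBUILD`` and ``AUX`` as having the same semantics as ``DATA`` follows `Deprecated Manifest tags`_, while treating every other tag as distinct is an assumption of this sketch. As worded, the rule compares only the checksums present in both entries, so two entries with disjoint checksum sets are considered compatible.

```python
from typing import Dict, Optional

# EBUILD and AUX share DATA semantics; other tags are assumed distinct.
SEMANTICS = {"EBUILD": "DATA", "AUX": "DATA"}


def entries_compatible(tag_a: str, size_a: Optional[int], sums_a: Dict[str, str],
                       tag_b: str, size_b: Optional[int], sums_b: Dict[str, str]) -> bool:
    """True if two entries matching the same file may legally coexist."""
    if SEMANTICS.get(tag_a, tag_a) != SEMANTICS.get(tag_b, tag_b):
        return False  # different semantics
    if size_a != size_b:
        return False  # different file size
    # Only checksums listed in both entries are compared.
    common = sums_a.keys() & sums_b.keys()
    return all(sums_a[name] == sums_b[name] for name in common)
```
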
The file entries (except for ``IGNORE``) can be specified for regular
files only. Symbolic links are followed when opening files. It is
an error to specify an entry for a different file type.

All the local (non-``DIST``) files covered by a Manifest tree must
reside on the same filesystem. It is an error to specify entries
applying to files on another filesystem. If subdirectories
of the Manifest tree reside on a different filesystem, they must
be explicitly excluded via ``IGNORE``.


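Both layout restrictions above can be enforced with a single ``os.stat()`` call, because it follows symbolic links just as opening the file would. The ``check_local_file()`` helper below is a hypothetical sketch. Its *tree_dev* argument is assumed to be the ``st_dev`` of the directory holding the top-level Manifest.

```python
import os
import stat


def check_local_file(path: str, tree_dev: int) -> os.stat_result:
    """Stat a file named by a non-DIST entry and enforce the layout rules."""
    # os.stat() follows symlinks; a missing or dangling target raises
    # FileNotFoundError, which the caller treats as a missing file.
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        raise ValueError(f"{path}: Manifest entries may only name regular files")
    if st.st_dev != tree_dev:
        raise ValueError(f"{path}: file is on another filesystem; use IGNORE")
    return st
```
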
File verification
-----------------

When verifying a file against the Manifest, the following rules are
used:

1. If the file is covered directly or indirectly by an entry
   of the ``IGNORE`` type, the verification always succeeds.

2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
   ``MISC``, ``EBUILD`` or ``AUX`` type:

   a. if the file is not present, then the verification fails,

   b. if the file is present but has a different size or one
      of the checksums does not match, the verification fails,

   c. otherwise, the verification succeeds.

3. If the file is covered by an entry of the ``OPTIONAL`` type:

   a. if the file is present, then the verification fails,

   b. otherwise, the verification succeeds.

4. If the file is present but not listed in any Manifest, the verification
   fails.

Unless specified otherwise, the package manager must not allow using
any files for which the verification failed. The package manager may
reject any package or even the whole repository if it may refer to files
for which the verification failed.


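The following Python sketch transcribes the four verification rules. It assumes that entry lookup and ``IGNORE`` matching have already been done by the caller, and the *tag* argument is ``None`` for a file not listed in any Manifest. ``verify_file()`` and ``_digest()`` are hypothetical names. Mapping algorithm names to ``hashlib`` by lowercasing follows the naming recommendation in `Checksum algorithms`_ rather than any defined API. The result is the strict-mode outcome; whether a ``MISC`` or ``OPTIONAL`` failure is tolerated in non-strict mode is left to the caller.

```python
import hashlib
import os
from typing import Dict, Optional

CHECKSUM_TAGS = {"MANIFEST", "DATA", "MISC", "EBUILD", "AUX"}


def _digest(path: str, algorithm: str) -> str:
    # Assumes Manifest names map to hashlib names by lowercasing.
    h = hashlib.new(algorithm.lower())
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, tag: Optional[str], size: Optional[int] = None,
                checksums: Optional[Dict[str, str]] = None,
                ignored: bool = False) -> bool:
    """Strict-mode result of the file verification rules."""
    if ignored:                               # rule 1: covered by IGNORE
        return True
    present = os.path.exists(path)
    if tag in CHECKSUM_TAGS:                  # rule 2: size and checksums
        if not present or os.path.getsize(path) != size:
            return False
        return all(_digest(path, name) == value
                   for name, value in (checksums or {}).items())
    if tag == "OPTIONAL" or tag is None:      # rules 3 and 4: must be absent
        return not present
    raise ValueError(f"unexpected tag: {tag!r}")
```
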
Timestamp verification
----------------------

The Manifest file can contain a ``TIMESTAMP`` entry to account
for attacks against tree update distribution. If such an entry
is present, it should be updated every time at least one
of the Manifests changes. Every unique timestamp value must correspond
to a single tree state.

During the verification process, the client should compare the timestamp
against the update time obtained from a local clock or a trusted time
source. If the comparison result indicates that the Manifest at the time
of receiving was already significantly outdated, the client should
either fail the verification or require manual confirmation from the user.

Furthermore, the Manifest provider may employ additional methods
of distributing the timestamps of recently generated Manifests
using a secure channel from a trusted source for exact comparison.
The exact details of such a solution are outside the scope of this
specification.


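A client-side check of the ``TIMESTAMP`` value might look like the sketch below. The specification does not define how old a Manifest must be to count as significantly outdated, so the one-day *max_age* default is purely an assumption, as is the ``check_timestamp()`` name. The format string is the one given for the ``TIMESTAMP`` tag.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # strftime() format of the TIMESTAMP tag


def check_timestamp(value: str, received_at: Optional[datetime] = None,
                    max_age: timedelta = timedelta(days=1)) -> bool:
    """Return False if the Manifest was already older than max_age when received.

    received_at must be timezone-aware; it defaults to the local clock.
    max_age is a hypothetical policy knob, not part of the specification.
    """
    ts = datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    now = received_at or datetime.now(timezone.utc)
    return now - ts <= max_age
```

A client that gets ``False`` would then either fail the verification or ask the user for confirmation, as described above.
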
Modern Manifest tags
--------------------

The Manifest files can specify the following tags:

``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO8601 extended format
  combined date and time in UTC timezone, i.e. using the following
  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
  in the top-level Manifest file. The package manager can use it
  to detect an outdated repository checkout as described in `Timestamp
  verification`_.

``MANIFEST <path> <size> <checksums>…``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Excludes a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (always pass).

``DATA <path> <size> <checksums>…``
  Specifies a file subject to obligatory Manifest verification.
  The file is required to pass verification. Used for all files directly
  affecting package manager operation (ebuilds, eclasses, profiles).

``MISC <path> <size> <checksums>…``
  Specifies a file subject to non-obligatory Manifest verification.
  The package manager may ignore a verification failure if operating
  in non-strict mode. Used for files that do not affect the installed
  packages (``metadata.xml``, ``use.desc``).

``OPTIONAL <path>``
  Specifies a file that does not exist in the distribution but, if it
  did, would be marked as ``MISC``. In strict mode, the file
  must not exist for the verification to pass. The package manager
  may ignore a stray file matching this entry if operating in non-strict
  mode.

``DIST <filename> <size> <checksums>…``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.


Deprecated Manifest tags
------------------------

For backwards compatibility, the following tags are additionally
allowed at the package directory level:

``EBUILD <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type.

``AUX <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to the ``files/`` subdirectory.


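A tool that wants to handle only the modern tags internally could normalize the deprecated ones as they are read. The hypothetical ``normalize_legacy()`` below is a sketch of that idea. It rewrites ``EBUILD`` and ``AUX`` entries into their ``DATA`` equivalents and adds the implicit ``files/`` prefix for ``AUX``.

```python
import posixpath
from typing import Tuple


def normalize_legacy(tag: str, path: str) -> Tuple[str, str]:
    """Map deprecated package-level tags onto their DATA equivalents."""
    if tag == "EBUILD":
        return "DATA", path
    if tag == "AUX":
        # AUX filenames are relative to the files/ subdirectory.
        return "DATA", posixpath.join("files", path)
    return tag, path
```
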
Algorithm for full-tree verification
------------------------------------

In order to perform full-tree verification, the following algorithm
can be used:

1. Collect all files present in the repository into the *present* set.

2. Start at the top-level Manifest file. Verify its OpenPGP signature.
   Optionally verify the ``TIMESTAMP`` entry if present as specified
   in `Timestamp verification`_. Remove the top-level Manifest
   from the *present* set.

3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
   files according to the `File verification`_ section, and include their
   entries in the current Manifest entry list (using paths relative
   to directories containing the Manifests).

4. Process all ``IGNORE`` entries. Remove any paths matching them
   from the *present* set.

5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
   ``EBUILD`` and ``AUX`` entries into the *covered* set.

6. Verify the entries in the *covered* set for incompatible duplicates
   and collisions with ignored files as explained in `Directory tree
   coverage`_.

7. Verify all the files in the union of the *present* and *covered*
   sets, according to the `File verification`_ section.


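To make the steps above concrete, the following Python sketch runs the algorithm over an in-memory repository (a mapping from relative path to file contents) rather than a real directory tree. It is a simplified illustration, not the gemato implementation. The OpenPGP signature and ``TIMESTAMP`` checks of step 2 are assumed to have been done by the caller, ``DIST`` entries are skipped, compressed Manifests are not handled, and all function names are hypothetical.

```python
import hashlib
import posixpath
from typing import Dict, List, Set, Tuple

# (tag, repository-relative path, size or -1, {algorithm: hex digest})
Entry = Tuple[str, str, int, Dict[str, str]]
SIZED_TAGS = {"MANIFEST", "DATA", "MISC"}


def parse_manifest(text: str, base: str) -> List[Entry]:
    """Parse Manifest text, rebasing paths onto the Manifest's directory."""
    entries: List[Entry] = []
    for line in text.splitlines():
        f = line.split()
        if not f or f[0] in ("TIMESTAMP", "DIST"):
            continue  # DIST entries are checked at fetch time, not here
        tag, path = f[0], f[1]
        if tag == "EBUILD":
            tag = "DATA"
        elif tag == "AUX":
            tag, path = "DATA", posixpath.join("files", path)
        path = posixpath.normpath(posixpath.join(base, path))
        if tag in SIZED_TAGS:
            entries.append((tag, path, int(f[2]), dict(zip(f[3::2], f[4::2]))))
        else:  # IGNORE or OPTIONAL
            entries.append((tag, path, -1, {}))
    return entries


def matches(data: bytes, size: int, sums: Dict[str, str]) -> bool:
    return len(data) == size and all(
        hashlib.new(name.lower(), data).hexdigest() == value
        for name, value in sums.items())


def verify_tree(files: Dict[str, bytes]) -> List[str]:
    """Return the sorted paths that fail verification; [] means success."""
    # Steps 1-2 (signature and TIMESTAMP assumed checked by the caller).
    present: Set[str] = set(files)
    present.discard("Manifest")
    entries = parse_manifest(files["Manifest"].decode(), "")
    failures: Set[str] = set()
    i = 0
    while i < len(entries):  # step 3: the list grows as sub-Manifests load
        tag, path, size, sums = entries[i]
        i += 1
        if tag != "MANIFEST":
            continue
        if path in files and matches(files[path], size, sums):
            entries.extend(parse_manifest(files[path].decode(),
                                          posixpath.dirname(path)))
        else:
            failures.add(path)
    ignores = [p for t, p, _, _ in entries if t == "IGNORE"]

    def ignored(path: str) -> bool:
        dotted = any(c.startswith(".") for c in path.split("/"))
        return dotted or any(path == ign or path.startswith(ign + "/")
                             for ign in ignores)

    present = {p for p in present if not ignored(p)}  # step 4
    covered: Dict[str, Entry] = {}
    for entry in entries:  # steps 5-6 (MANIFEST kept so step 7 accepts it)
        tag, path, size, sums = entry
        if tag == "IGNORE":
            continue
        if ignored(path):
            raise ValueError(f"entry collides with IGNORE: {path}")
        prev = covered.setdefault(path, entry)
        if prev is not entry and (prev[0] != tag or prev[2] != size or any(
                prev[3][a] != v for a, v in sums.items() if a in prev[3])):
            raise ValueError(f"incompatible duplicate entries: {path}")
    for path in sorted(present | set(covered)):  # step 7
        entry = covered.get(path)
        if entry is None or entry[0] == "OPTIONAL":  # rules 3 and 4
            ok = path not in files
        else:  # rule 2
            ok = path in files and matches(files[path], entry[2], entry[3])
        if not ok:
            failures.add(path)
    return sorted(failures)
```

Because sub-Manifest entries are appended to the list being scanned, step 3 recurses into nested Manifests without an explicit recursive call.
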
Algorithm for finding parent Manifests
--------------------------------------

In order to find the top-level Manifest from the current directory,
the following algorithm can be used:

1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different from *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of its parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.

Once the algorithm stops, *last_found* will contain the directory
of the relevant top-level Manifest. If *last_found* is null, then
the directory tree does not contain any valid top-level Manifest
candidates and one should be created in the *original* directory.

Once the top-level Manifest is found, its ``MANIFEST`` entries should
be used to find any sub-Manifests below the top-level Manifest,
up to and including the *original* directory. Note that those
sub-Manifests can use filenames other than ``Manifest``.


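The parent search maps directly onto ``os.stat()`` and ``os.path`` calls. In the sketch below, ``find_top_level_dir()`` returns the directory stored as *last_found*, or ``None``. The simple line-based ``IGNORE`` scan and the restriction to an uncompressed file named ``Manifest`` are simplifications of this sketch, and both function names are hypothetical.

```python
import os
from typing import Optional


def _ignore_covers(manifest_path: str, manifest_dir: str, original: str) -> bool:
    """True if an IGNORE entry in the given Manifest covers original."""
    rel = os.path.relpath(original, manifest_dir).split(os.sep)
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "IGNORE":
                ignored = fields[1].split("/")
                # Covers original itself or one of its parent directories.
                if rel[:len(ignored)] == ignored:
                    return True
    return False


def find_top_level_dir(original: str) -> Optional[str]:
    """Return the directory holding the top-level Manifest, or None."""
    original = os.path.abspath(original)
    start_dev = os.stat(original).st_dev              # step 1
    current, last_found = original, None
    while True:
        if os.stat(current).st_dev != start_dev:      # step 2
            break
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):                  # step 3
            if _ignore_covers(manifest, current, original):
                break                                 # step 3a
            last_found = current                      # step 3b
        parent = os.path.dirname(current)
        if parent == current:                         # step 4: reached "/"
            break
        current = parent                              # step 5
    return last_found
```
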
Checksum algorithms
-------------------

This section is informational only. Specifying the exact set
of supported algorithms is outside the scope of this specification.

The algorithm names reserved at the time of writing are:

- ``MD5`` [#MD5]_,
- ``RMD160`` (RIPEMD-160) [#RIPEMD160]_,
- ``SHA1`` [#SHS]_,
- ``SHA256`` and ``SHA512`` (SHA-2 family of hashes) [#SHS]_,
- ``WHIRLPOOL`` [#WHIRLPOOL]_,
- ``BLAKE2B`` and ``BLAKE2S`` (BLAKE2 family of hashes) [#BLAKE2]_,
- ``SHA3_256`` and ``SHA3_512`` (SHA-3 family of hashes) [#SHA3]_,
- ``STREEBOG256`` and ``STREEBOG512`` (Streebog family of hashes)
  [#STREEBOG]_.

The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
It is recommended that any new hashes are named after the Python
``hashlib`` module algorithm names, transformed into uppercase.


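Following the naming recommendation above, a tool can derive ``hashlib`` algorithm names by lowercasing the Manifest names, as in the sketch below. This works only for names that follow the convention. ``RMD160`` would need an explicit mapping to ``ripemd160``, and ``WHIRLPOOL`` and the Streebog hashes are not guaranteed to be available in ``hashlib`` at all. ``manifest_checksums()`` is a hypothetical helper that reads the file once and feeds every requested hash.

```python
import hashlib
from typing import Dict, Iterable


def manifest_checksums(path: str, names: Iterable[str]) -> Dict[str, str]:
    """Compute the named checksums of a file in a single read pass."""
    # Lowercasing recovers the hashlib name only for names that follow the
    # recommended convention (e.g. SHA512 -> sha512, SHA3_256 -> sha3_256).
    hashers = {name: hashlib.new(name.lower()) for name in names}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, ``manifest_checksums("metadata.xml", ["SHA256", "SHA512"])`` would produce the two digests that follow the file size in a ``MISC`` entry.
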
Manifest compression
--------------------

The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
This section merely addresses interoperability issues between Manifest
compression and this specification.

Compressed Manifest files are required to carry a suffix indicating
their compression algorithm. This suffix should be used to recognize
the compression and decompress Manifests transparently. The exact list
of algorithms and their corresponding suffixes is outside the scope
of this specification.

Whenever this specification refers to the top-level Manifest file,
the implementation should account for compressed variants of this file
with appropriate suffixes (e.g. ``Manifest.gz``).

Whenever this specification refers to sub-Manifests, they can use any
names but, if compressed, are also required to use the appropriate
compression suffix. The ``MANIFEST`` entries are required to specify
the full name including the compression suffix, and the verification
is performed on the compressed file.

The specification permits uncompressed Manifests to exist alongside
their compressed counterparts, and multiple compressed formats
to coexist. If that is the case, the files must have the same
uncompressed content and the implementation is free to choose any
of the files using the same base name.


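Transparent decompression by suffix could be implemented with the Python standard library as sketched below. The suffix table (``.gz``, ``.bz2``, ``.xz``) is an assumption for illustration, since the authoritative list is left to GLEP 61, and ``open_manifest()`` is a hypothetical name. The ``MANIFEST`` checksums are still computed over the compressed bytes; decompression is needed only to read the entries.

```python
import bz2
import gzip
import lzma
from typing import IO, Callable, Dict

# Assumed suffix table for this sketch; see GLEP 61 for the real list.
OPENERS: Dict[str, Callable[..., IO[str]]] = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_manifest(path: str) -> IO[str]:
    """Open a possibly compressed Manifest as text, choosing by suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```
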
An example Manifest file (informational)
----------------------------------------

An example top-level Manifest file for the Gentoo repository would have
the following content::

    TIMESTAMP 2017-10-30T10:11:12Z
    IGNORE distfiles
    IGNORE local
    IGNORE lost+found
    IGNORE packages
    MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
    ...
    MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
    ...

An example modern Manifest (disregarding backwards compatibility)
for a package directory would have the following content::

    DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
    DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
    DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
    DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
    DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
    DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
    MISC metadata.xml 664 SHA256 97c6.. SHA512 1175..


Rationale
=========

Stand-alone format
------------------

The first question that needed to be asked before proceeding with
the design was whether the Manifest file format was supposed to be
stand-alone, or tightly bound to the repository format.

The stand-alone format has been selected because of its three
advantages:

1. It is more future-proof. If an incompatible change to the repository
   format is introduced, only developers need to upgrade the tools
   they use to generate the Manifests. The tools used to verify
   the updated Manifests will continue to work.

2. It is more flexible and universal. With a dedicated tool,
   the Manifest files can be used to sign and verify arbitrary file
   sets.

3. It keeps the verification tool simpler. In particular, we can easily
   write an independent verification tool that could work on any
   distribution without needing to depend on a package manager
   implementation or rewrite parts of it.

Designing a stand-alone format requires that the Manifest carries enough
information to perform the verification following all the rules specific
to the Gentoo repository.


Tree design
-----------

The second important point of the design was determining whether
the Manifest files should be structured hierarchically, or independently.
Both options have their advantages.

In the hierarchical model, each sub-Manifest file is covered by a higher
level Manifest. As a result, only the top-level Manifest has to be
OpenPGP-signed, and subsequent Manifests need only be verified by
the checksums stored in the parent Manifest. This has the following
implications:

- Verifying any set of files in the repository requires using checksums
  from the most relevant Manifests and the parent Manifests.

- The OpenPGP signature of the top-level Manifest needs to be verified
  only once per process.

- Altering any set of files requires updating the relevant Manifests,
  and their parent Manifests up to the top-level Manifest, and signing
  the last one.

- As a result, the top-level Manifest changes on every commit,
  and various middle-level Manifests change (and need to be transferred)
  frequently.

In the independent model, each sub-Manifest file is independent
of the parent Manifests. As a result, each of them needs to be signed
and verified independently. However, the parent Manifests still need
to list sub-Manifests (albeit without verification data) in order
to detect removal or replacement of subdirectories. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  and verifying signatures of the most relevant Manifest files.

- Altering any set of files requires updating the relevant Manifests
  and signing them again.

- Parent Manifests are updated only when Manifests are added or removed
  from subdirectories. As a result, they change infrequently.

While both models have their advantages, the hierarchical model was
selected because it reduces the number of OpenPGP operations, which
are comparatively costly, to the minimum.


Tree layout restrictions
------------------------

The algorithm is meant to work primarily with ebuild repositories, which
normally contain only files and directories. Directories provide
no useful metadata for verification, and specifying special entries
for additional file types is purposeless. Therefore, the specification
is restricted to dealing with regular files.

The Gentoo repository does not use symbolic links. Some Gentoo
repositories do, however. To provide a simple solution for dealing with
symlinks without having to implement special handling for them,
the common behavior of implicitly resolving them is used.
Therefore, symbolic links to files are treated as if they were regular
files, and symbolic links to directories are followed as if they were
regular directories.

Dotfiles are implicitly ignored as that is a common notion used
in software written for POSIX systems. All other filenames require
explicit ``IGNORE`` lines.

The algorithm is restricted to work on a single filesystem. This is
mostly relevant when scanning for the top-level Manifest, since we do
not want to cross filesystem boundaries then. However, to ensure
consistent bidirectional behavior, we also need to ban crossing them
when operating down the tree.

The directories and files on different filesystems need to be ignored
explicitly, as implicitly skipping them would cause confusion.
In particular, tools might then claim that a file does not exist when
it clearly does because it was skipped due to filesystem boundaries.


File verification model
-----------------------

The verification model aims to provide full coverage against different
forms of attack. In particular, three different kinds of manipulation
are considered:

1. Alteration of the file content.

2. Removal of a file.

3. Addition of a new file.

In order to protect against all three, the system requires that all
files in the repository are listed in Manifests and verified against
them.

As a special case, ignores are allowed to account for directories
that are not part of the repository but were traditionally placed inside
it. Those directories were ``distfiles``, ``local`` and ``packages``.
Ignores could also be used to exclude VCS directories such as ``CVS``.


Non-obligatory Manifest verification
------------------------------------

While this specification recommends that all tools use strict
verification by default, it allows declaring some files as
non-obligatory like the original Manifest2 format did. This could be
used for files that do not affect the normal package manager operation.

It aims to account for two use cases:

1. Stripping files that are not strictly required to install
   packages from repository checkouts.

2. Accounting for automatically generated files that might be updated
   by standard tooling.

The traditional ``MISC`` type is amended with a complementary
``OPTIONAL`` tag to account for files that are not provided
in the specific repository. It aims to ensure that the same path would
be non-fatal when provided by the repository but fatal when created
by the user tooling.


Timestamp field
---------------

The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
to include a generation timestamp in the Manifest. A similar feature
was originally proposed in GLEP 58 [#GLEP58]_.

A malicious third party may use the principles of exclusion or replay
[#C08]_ to deny an update to clients, while at the same time recording
the identity of clients to attack. The timestamp field can be used to
detect that.

In order to provide more complete protection, the Gentoo
Infrastructure should provide an ability to obtain the timestamps
of all Manifests from a recent timeframe over a secure channel
from a trusted source for comparison.

Strictly speaking, this information is already provided by the various
``metadata/timestamp*`` files that are already present. However,
including the value in the Manifest itself has little cost
and provides the ability to perform the verification stand-alone.

Furthermore, some of the timestamp files are added very late
in the distribution process, past the Manifest generation phase. Those
files will most likely receive ``IGNORE`` entries and therefore
not be suitable for safe use.


New vs deprecated tags
----------------------

Out of the four types defined by Manifest2, two are reused and two are
marked deprecated.

The ``DIST`` and ``MISC`` tags are reused since they map relatively
clearly onto the new concept.

The ``EBUILD`` tag could potentially be reused for generic file
verification data. However, it would be confusing if all the different
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
type was introduced as a replacement.

The ``AUX`` tag is deprecated as it is redundant with ``DATA``, and has
the limiting property of an implicit ``files/`` path prefix.


Finding top-level Manifest
--------------------------

The development of a reference implementation for this GLEP has raised
the following problem: how to find all the relevant Manifests when
the Manifest tool is run inside a subdirectory of the repository?

One of the options would be to provide bi-directional linking
of Manifests via a ``PARENT`` tag. However, that would not solve
the problem when a new Manifest file is being created.

Instead, an algorithm for iterating over parent directories is proposed.
Since there is no obligatory explicit indicator for the top-level
Manifest, the algorithm assumes that the top-level Manifest
is the highest ``Manifest`` in the directory hierarchy that can cover
the current directory. This generally makes sense since the Manifest
files are required to provide coverage for all subdirectories, so all
Manifests starting from that one need to be updated.

If independent Manifest trees are nested in the directory structure,
then an ``IGNORE`` entry needs to be used to separate them.

Since sub-Manifests can use any filenames, the Manifest finding
algorithm must not short-cut the procedure by storing all ``Manifest``
files along the parent directories. Instead, it needs to retrace
the relevant sub-Manifest files along ``MANIFEST`` entries
in the top-level Manifest.


Injecting ChangeLogs into the checkout
--------------------------------------

One of the problems considered in the new Manifest format was that
of injecting historical and autogenerated ChangeLogs into the repository.
Normally we do not include those files, to reduce the checkout size.
However, some users have shown interest in them and Infra is working
on providing them via an additional rsync module.

If such files were injected into the repository, they would cause strict
verification failures of Manifests. To account for this, Infra could
provide either ``OPTIONAL`` entries in the Manifest files to allow them
in non-strict verification mode, or ``IGNORE`` entries to allow them
in strict mode.


Splitting distfile checksums from file checksums
------------------------------------------------

Another problem with the current Manifest format is that the checksums
for fetched files are combined with checksums for local files
in a single file inside the package directory. It has been specifically
pointed out that:

- since distfiles are sometimes reused across different packages,
  the repeating checksums are redundant,

- mirror admins were interested in the possibility of verifying all
  the distfiles with a single tool.

This specification does not provide a clean solution to the first problem.
It technically permits moving ``DIST`` entries to higher-level Manifests
but the usefulness of such a solution is doubtful.

However, for the second problem we will probably deliver a dedicated
tool working with this Manifest format.


Hash algorithms
---------------

While maintaining a consistent supported hash set is important
for interoperability, it is not a good fit for the generic layout of this
GLEP. Furthermore, it would require updating the GLEP in the future
every time the used algorithms change.

Instead, the specification focuses on listing the currently used
algorithm names for interoperability, and sets a recommendation
for consistent naming of algorithms in the future. The Python
``hashlib`` module is used as a reference since it is used
as the provider of hash functions for most of the Python software,
including Portage and PkgCore.

The basic rules for changing hash algorithms are defined in GLEP 59
[#GLEP59]_. The implementations can focus only on those algorithms
that are actually used or planned to be used. It may be feasible
to devise a new GLEP that specifies the currently used hashes (or update
GLEP 59 accordingly).


Manifest compression
--------------------

The support for Manifest compression is introduced with minimal changes
to the file format. The ``MANIFEST`` entries are required to provide
the real (compressed) file path for compatibility with other file
entries and to avoid confusion.

The existence of additional entries for uncompressed Manifest checksums
was debated. However, plain entries for the uncompressed file would
be confusing if only the compressed file existed, and conflicting if both
uncompressed and compressed variants existed. Furthermore, it has been
pointed out that ``DIST`` entries do not have an uncompressed variant
either.


730 Performance considerations
731 --------------------------
732
733 Performing a full-tree verification on every sync raises some
734 performance concerns for end-user systems. The initial testing has shown
735 that a cold-cache verification on a btrfs file system can take up around
736 4 minutes, with the process being mostly I/O bound. On the other hand,
737 it can be expected that the verification will be performed directly
738 after syncing, taking advantage of warm filesystem cache.
739
740 To improve speed on I/O- and/or CPU-constrained systems even further,
741 the algorithms can easily be extended to perform incremental verification.
742 Since rsync does not preserve mtimes by default, files modified by a sync
743 receive fresh mtimes, so the tool can compare mtimes and Manifests to recheck
744 only the parts of the repository that have changed.
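A Python sketch of the two comparisons mentioned above, under assumptions of its own: the verifier keeps a hypothetical cache mapping each relative path to the ``st_mtime_ns`` observed at the last successful verification, and Manifest entries are compared as parsed ``{path: entry_line}`` dictionaries. Neither the cache nor these helper names are specified by this GLEP.

```python
import os


def changed_by_mtime(root, paths, last_verified_mtimes):
    """Return paths whose mtime differs from the one cached at the last
    successful verification (or that no longer exist)."""
    changed = []
    for rel in paths:
        try:
            mtime = os.stat(os.path.join(root, rel)).st_mtime_ns
        except FileNotFoundError:
            # Let the full verification step report the missing file.
            changed.append(rel)
            continue
        if last_verified_mtimes.get(rel) != mtime:
            changed.append(rel)
    return changed


def changed_by_manifest(old_entries, new_entries):
    """Return paths whose Manifest entry was added, removed or altered."""
    all_paths = old_entries.keys() | new_entries.keys()
    return {p for p in all_paths if old_entries.get(p) != new_entries.get(p)}
```

Only the returned paths would need to be rehashed; everything else can be trusted from the previous run, provided the cache itself is stored safely.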
745
746 Furthermore, package manager implementations can restrict checking
747 to the parts of the repository that are actually being used.
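For example, a package manager that only needs a few packages could filter the entries it checks by directory, as in this Python sketch. The entry dictionary and the function name are hypothetical; a real verifier would still have to verify the chain of Manifests from the top-level Manifest down to the selected directories.

```python
from pathlib import PurePosixPath


def entries_under(entries, package_dirs):
    """Select Manifest entries whose path lies inside one of package_dirs.

    `entries` maps repository-relative paths to entries. Paths are
    compared component-wise, so "dev-lang/python" does not match
    "dev-lang/python-exec".
    """
    wanted = [PurePosixPath(d) for d in package_dirs]
    selected = {}
    for path, entry in entries.items():
        p = PurePosixPath(path)
        if any(d == p or d in p.parents for d in wanted):
            selected[path] = entry
    return selected
```

Comparing path components rather than string prefixes avoids accidentally pulling in sibling packages that share a name prefix.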
748
749
750 Backwards Compatibility
751 =======================
752
753 This GLEP provides an optional means of preserving backwards compatibility.
754 To preserve backwards compatibility, the following must be
755 ensured:
756
757 - all files within the package directory must be covered by the ``Manifest``
758 file inside that package directory,
759
760 - all distfiles used by the package must be covered by the ``Manifest``
761 file inside the package directory,
762
763 - all files inside the ``files/`` subdirectory of a package directory
764 need to use the deprecated ``AUX`` tag (rather than ``DATA``),
765
766 - all ``.ebuild`` files inside the package directory need to use
767 the deprecated ``EBUILD`` tag (rather than ``DATA``),
768
769 - the Manifest files inside the package directory can be signed
770 to provide authenticity verification,
771
772 - an uncompressed Manifest file must exist in the package directory,
773 and a compressed Manifest of identical content may be present.
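The tagging rules in the list above can be sketched in Python as follows. This is an illustrative fragment, not part of the specification: the function names are hypothetical, ``AUX`` paths are written relative to ``files/`` as in GLEP 44 Manifests, and the SHA256/SHA512 hash set is arbitrary. Files not covered by the two rules fall back to ``DATA`` here as a simplifying assumption (GLEP 44 Manifests historically used ``MISC`` for files such as ``metadata.xml``).

```python
import hashlib
from pathlib import PurePosixPath


def compat_tag(rel_path):
    """Pick the tag and the path to record for a file, given its path
    relative to the package directory."""
    parts = PurePosixPath(rel_path).parts
    if len(parts) > 1 and parts[0] == "files":
        # Deprecated AUX entries name the file relative to files/.
        return "AUX", str(PurePosixPath(*parts[1:]))
    if len(parts) == 1 and rel_path.endswith(".ebuild"):
        return "EBUILD", rel_path
    return "DATA", rel_path


def compat_entry(rel_path, data):
    """Format a Manifest line for the file contents `data`."""
    tag, path = compat_tag(rel_path)
    return " ".join([
        tag, path, str(len(data)),
        "SHA256", hashlib.sha256(data).hexdigest(),
        "SHA512", hashlib.sha512(data).hexdigest(),
    ])
```

For instance, ``files/fix-build.patch`` is recorded as ``AUX fix-build.patch ...`` while ``foo-1.0.ebuild`` becomes an ``EBUILD`` entry.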
774
775 Once the backwards compatibility is no longer a concern, the above
776 no longer needs to hold and the deprecated tags can be removed.
777
778
779 Reference Implementation
780 ========================
781
782 The reference implementation for this GLEP is being developed
783 as the gemato project [#GEMATO]_.
784
785
786 Credits
787 =======
788
789 Thanks to all the people whose contributions were invaluable
790 to the creation of this GLEP. This includes but is not limited to:
791
792 - Robin Hugh Johnson,
793 - Ulrich Müller.
794
795 Additionally, thanks to Robin Hugh Johnson for the original
796 MetaManifest GLEP series, which served both as an inspiration and as a source
797 of many concepts used in this GLEP. Recursively, thanks also to all
798 the people who contributed to the original GLEPs.
799
800
801 References
802 ==========
803
804 .. [#GLEP44] GLEP 44: Manifest2 format
805 (https://www.gentoo.org/glep/glep-0044.html)
806
807 .. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
808 - Overview
809 (https://www.gentoo.org/glep/glep-0057.html)
810
811 .. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
812 - Infrastructure to User distribution - MetaManifest
813 (https://www.gentoo.org/glep/glep-0058.html)
814
815 .. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
816 (https://www.gentoo.org/glep/glep-0059.html)
817
818 .. [#GLEP60] GLEP 60: Manifest2 filetypes
819 (https://www.gentoo.org/glep/glep-0060.html)
820
821 .. [#GLEP61] GLEP 61: Manifest2 compression
822 (https://www.gentoo.org/glep/glep-0061.html)
823
824 .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
825 Format - SRC_URI
826 (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)
827
828 .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
829 (https://www.ietf.org/rfc/rfc1321.txt)
830
831 .. [#RIPEMD160] The hash function RIPEMD-160
832 (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)
833
834 .. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
835 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)
836
837 .. [#WHIRLPOOL] The WHIRLPOOL Hash Function
838 (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)
839
840 .. [#BLAKE2] BLAKE2 — fast secure hashing
841 (https://blake2.net/)
842
843 .. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
844 and Extendable-Output Functions
845 (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)
846
847 .. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
848 (https://www.streebog.net/)
849
850 .. [#C08] Cappos, J. et al. (2008).
851 "Attacks on Package Managers"
852 (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)
853
854 .. [#GEMATO] gemato: Gentoo Manifest Tool
855 (https://github.com/mgorny/gemato/)
856
857 Copyright
858 =========
859 This work is licensed under the Creative Commons
860 Attribution-ShareAlike 3.0 Unported License. To view a copy
861 of this license, visit
862 http://creativecommons.org/licenses/by-sa/3.0/.
863
864 --
865 Best regards,
866 Michał Górny
