Gentoo Archives: gentoo-commits

From: "Michał Górny" <mgorny@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] data/glep:glep-manifest commit in: /
Date: Sun, 29 Oct 2017 19:05:41
Message-Id: 1509139221.ae28a67b2402e3b37535234441bc97670ba535c4.mgorny@gentoo
commit: ae28a67b2402e3b37535234441bc97670ba535c4
Author: Michał Górny <mgorny <AT> gentoo <DOT> org>
AuthorDate: Sun Oct 22 13:19:20 2017 +0000
Commit: Michał Górny <mgorny <AT> gentoo <DOT> org>
CommitDate: Fri Oct 27 21:20:21 2017 +0000
URL: https://gitweb.gentoo.org/data/glep.git/commit/?id=ae28a67b

glep-0074: Full-tree verification using Manifest files

glep-0074.rst | 749 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 749 insertions(+)

diff --git a/glep-0074.rst b/glep-0074.rst
new file mode 100644
index 0000000..e9f8bad
--- /dev/null
+++ b/glep-0074.rst
@@ -0,0 +1,749 @@
---
GLEP: 74
Title: Full-tree verification using Manifest files
Author: Michał Górny <mgorny@g.o>,
        Robin Hugh Johnson <robbat2@g.o>,
        Ulrich Müller <ulm@g.o>
Type: Standards Track
Status: Draft
Version: 1
Created: 2017-10-21
Last-Modified: 2017-10-26
Post-History: 2017-10-26
Content-Type: text/x-rst
Requires: 59, 61
Replaces: 44, 58, 60
---

Abstract
========

This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof
and efficient, and to provide means of backwards compatibility.


Motivation
==========

The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
means of verifying the integrity of distfiles and package files
in Gentoo. Combined with OpenPGP signatures, they provide means to
ensure the authenticity of the covered files. However, as noted
in GLEP 57 [#GLEP57]_, they lack the ability to provide full-tree
authenticity verification as they do not cover any files outside
the package directory. In particular, this leaves multiple ways
for a third party to inject malicious code into the ebuild environment.

Historically, the topic of providing authenticity coverage for the whole
repository has been mentioned multiple times. The most noteworthy efforts
are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
They were accepted by the Council in 2010 but have never been
implemented. When potential implementation work started in 2017, a new
discussion about the specification arose. It prompted the creation
of a competing GLEP that would provide a redesigned alternative to
the old GLEPs.

This specification is designed with the following goals in mind:

1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. Like the original Manifest2, the files should be split into two
   groups: files whose authenticity is critical, and those whose
   mismatch may be accepted in non-strict mode. The same classification
   should apply both to files listed in Manifests, and to stray files
   present only in the repository.

3. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

4. The Manifest files should be verifiable stand-alone, that is, without
   knowing any details about the underlying repository format.


Specification
=============

Manifest file format
--------------------

This specification reuses and extends the Manifest file format defined
in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
repurposed as a generic *tag* that can also indicate additional
(non-checksum) metadata. Accordingly, those tags can be followed by
other space-separated values.

Unless specified otherwise, the paths used in the Manifest files
are relative to the directory containing the Manifest file. The paths
must not reference the parent directory (``..``).

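As an illustration only, the generalized line format can be parsed by splitting each line into a tag and its whitespace-separated values. The following Python sketch is a minimal parser under stated assumptions: the names ``ManifestEntry``, ``parse_line`` and ``validate_path`` are hypothetical, checksums are assumed to follow the size as name/value pairs as in GLEP 44, and rejecting absolute paths is inferred from the requirement that paths be relative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Tags followed by <path> <size> <checksums>... (assumed set).
SIZED_TAGS = {"MANIFEST", "DATA", "MISC", "DIST", "EBUILD", "AUX"}
# Tags followed by a single <path>.
PATH_ONLY_TAGS = {"IGNORE", "OPTIONAL"}


@dataclass
class ManifestEntry:
    tag: str
    path: Optional[str] = None
    size: Optional[int] = None
    checksums: Dict[str, str] = field(default_factory=dict)
    values: List[str] = field(default_factory=list)  # e.g. the TIMESTAMP value


def validate_path(path: str) -> str:
    """Reject paths that are absolute or reference the parent directory."""
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError(f"invalid Manifest path: {path!r}")
    return path


def parse_line(line: str) -> ManifestEntry:
    fields = line.split()
    if not fields:
        raise ValueError("empty Manifest line")
    tag, rest = fields[0], fields[1:]
    if tag in SIZED_TAGS:
        # <path> <size> followed by an even number of name/value fields.
        if len(rest) < 2 or len(rest) % 2:
            raise ValueError(f"malformed {tag} entry: {line!r}")
        path, size, pairs = rest[0], int(rest[1]), rest[2:]
        checksums = dict(zip(pairs[0::2], pairs[1::2]))
        return ManifestEntry(tag, validate_path(path), size, checksums)
    if tag in PATH_ONLY_TAGS:
        if len(rest) != 1:
            raise ValueError(f"malformed {tag} entry: {line!r}")
        return ManifestEntry(tag, validate_path(rest[0]))
    # TIMESTAMP and tags unknown to this sketch keep their raw values.
    return ManifestEntry(tag, values=rest)
```

For example, ``parse_line("DATA foo-1.ebuild 123 SHA256 abc")`` yields a ``DATA`` entry of size 123 with a single ``SHA256`` checksum.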

Manifest file locations and nesting
-----------------------------------

The ``Manifest`` file located in the root directory of the repository
is called the top-level Manifest, and it is used to perform the full-tree
verification. In order to verify the authenticity, it must be signed
using OpenPGP in the armored cleartext format.

The top-level Manifest may reference sub-Manifests contained
in subdirectories of the repository. The sub-Manifests are traditionally
named ``Manifest``; however, the implementation must support arbitrary
names, including the possibility of multiple (split) Manifests
for a single directory. A sub-Manifest can only cover the files inside
the directory tree where it resides.

A sub-Manifest can also be signed using the OpenPGP armored cleartext
format. However, the signature verification can be omitted if
the sub-Manifest is covered by a signed top-level Manifest.

The Manifest files can also specify ``IGNORE`` entries to skip Manifest
verification of subdirectories and/or files. Files and directories
whose names start with a dot are always implicitly ignored. All files
that are not ignored must be covered by at least one of the Manifests.

A single file may be matched by multiple identical or equivalent
Manifest entries, if and only if the entries have the same semantics,
specify the same size and the checksums common to both entries match.
It is an error for a single file to be matched by multiple entries
of different semantics, file size or checksum values. It is an error
to specify another entry for a path matched by an ``IGNORE`` entry,
or for a path inside one of its subdirectories.

The file entries (except for ``IGNORE``) can be specified for regular
files only. Symbolic links are followed when opening files. It is
an error to specify an entry for a different file type.

All the files covered by a Manifest tree must reside on the same
filesystem. It is an error to specify entries applying to files
on another filesystem. If subdirectories of the Manifest tree reside
on a different filesystem, they must be explicitly excluded
via ``IGNORE``.

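The rule on multiple matching entries can be expressed as a small predicate. This Python sketch assumes both entries were already resolved to the same repository path (with the ``files/`` prefix of ``AUX`` applied), treats ``EBUILD`` and ``AUX`` as having the same semantics as ``DATA`` in line with the deprecated tags defined in this specification, and compares only the checksums common to both entries. ``Entry``, ``entries_compatible`` and ``inside_ignored`` are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional

# EBUILD and AUX are equivalent to DATA; other tags map to themselves.
EQUIVALENT_TAGS = {"EBUILD": "DATA", "AUX": "DATA"}


class Entry(NamedTuple):
    tag: str
    size: Optional[int]
    checksums: Dict[str, str]


def semantics(tag: str) -> str:
    return EQUIVALENT_TAGS.get(tag, tag)


def entries_compatible(a: Entry, b: Entry) -> bool:
    """True if two entries for the same file may legally coexist."""
    if semantics(a.tag) != semantics(b.tag) or a.size != b.size:
        return False
    # Only checksums listed in both entries are compared.
    common = a.checksums.keys() & b.checksums.keys()
    return all(a.checksums[name] == b.checksums[name] for name in common)


def inside_ignored(path: str, ignored: str) -> bool:
    """True if path is the IGNORE-d path itself or lies below it."""
    return path == ignored or path.startswith(ignored.rstrip("/") + "/")
```

Under the rules above, a pair of entries for which ``entries_compatible`` returns false, or a non-``IGNORE`` entry for which ``inside_ignored`` returns true, would be an error.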

File verification
-----------------

When verifying a file against the Manifest, the following rules are
used:

- if a file listed in the Manifest is not present, then the verification
  for the file fails,

- if a file listed in the Manifest is present but has a different size
  or one of the checksums does not match, the verification fails,

- if a file is present but not listed in the Manifest, the verification
  fails,

- otherwise, the verification succeeds.

Unless specified otherwise, the package manager must not allow using
any files for which the verification failed. The package manager may
reject any package, or even the whole repository, that may refer to
files for which the verification failed.


New Manifest tags
-----------------

The Manifest files can specify the following tags:

``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO8601 extended format
  combined date and time in the UTC timezone, i.e. using the following
  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
  in the top-level Manifest file. The package manager can use it
  to detect an outdated repository checkout.

``MANIFEST <path> <size> <checksums>…``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Excludes a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (they always pass).

``DATA <path> <size> <checksums>…``
  Specifies a file subject to obligatory Manifest verification.
  The file is required to pass verification. Used for all files directly
  affecting package manager operation (ebuilds, eclasses, profiles).

``MISC <path> <size> <checksums>…``
  Specifies a file subject to non-obligatory Manifest verification.
  The package manager may ignore a verification failure if operating
  in non-strict mode. Used for files that do not affect the installed
  packages (``metadata.xml``, ``use.desc``).

``OPTIONAL <path>``
  Specifies a file that would be subject to non-obligatory Manifest
  verification if it existed. The package manager may ignore a stray
  file matching this entry if operating in non-strict mode. Used for
  paths that would match ``MISC`` if they existed.

``DIST <filename> <size> <checksums>…``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.

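The ``TIMESTAMP`` format corresponds directly to the ``strftime()`` string given above. The Python sketch below formats and parses it and shows two checks a tool might perform; the maximum-age threshold and the idea of remembering the last accepted timestamp are hypothetical policies for illustration, not requirements of this specification.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_timestamp(when: datetime) -> str:
    """Render a timezone-aware datetime as a TIMESTAMP value in UTC."""
    return when.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def parse_timestamp(value: str) -> datetime:
    """Parse a TIMESTAMP value into an aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def is_outdated(value: str, now: datetime, max_age: timedelta) -> bool:
    """Hypothetical policy: treat checkouts older than max_age as outdated."""
    return now - parse_timestamp(value) > max_age


def is_rollback(value: str, last_seen: str) -> bool:
    """Hypothetical check: a timestamp older than one accepted earlier."""
    return parse_timestamp(value) < parse_timestamp(last_seen)
```

Because the format has fixed second precision and always uses UTC, comparing parsed values is unambiguous.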

Deprecated Manifest tags
------------------------

For backwards compatibility, the following tags are additionally
allowed at the package directory level:

``EBUILD <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type.

``AUX <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to the ``files/`` subdirectory.

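Since both deprecated tags reduce to ``DATA``, a verifier can normalize them while reading a package-level Manifest. A minimal Python sketch, with the hypothetical helper ``normalize_deprecated`` taking a tag and a path relative to the Manifest's directory:

```python
import posixpath
from typing import Tuple


def normalize_deprecated(tag: str, path: str) -> Tuple[str, str]:
    """Map EBUILD/AUX entries onto equivalent DATA entries."""
    if tag == "EBUILD":
        return "DATA", path
    if tag == "AUX":
        # AUX filenames are relative to the files/ subdirectory.
        return "DATA", posixpath.join("files", path)
    return tag, path
```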

Algorithm for full-tree verification
------------------------------------

In order to perform full-tree verification, the following algorithm
can be used:

1. Collect all files present in the repository into the *present* set.

2. Start at the top-level Manifest file. Verify its OpenPGP signature.
   Optionally verify the ``TIMESTAMP`` entry if present. Remove
   the top-level Manifest from the *present* set.

3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
   files according to the `file verification`_ section, and include
   their entries in the current Manifest entry list (using paths relative
   to directories containing the Manifests).

4. Process all ``IGNORE`` entries. Remove any paths matching them
   from the *present* set.

5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
   ``EBUILD`` and ``AUX`` entries into the *covered* set.

6. Verify all the files in the union of the *present* and *covered*
   sets, according to the `file verification`_ section.

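The six steps can be sketched in Python against an in-memory model of the repository, which keeps the example self-contained. Assumptions: ``repo`` is a hypothetical dict mapping repository-relative paths to file contents (a real tool would walk the filesystem, skipping dotfiles); the OpenPGP signature and ``TIMESTAMP`` checks of step 2 are left to the caller; hash names map onto ``hashlib`` by lowercasing; and an ``OPTIONAL`` path is reported only if a file exists there. ``verify_tree`` and ``load_manifest`` are illustrative names, not the gemato API.

```python
import hashlib
import posixpath
from typing import Dict, List, Optional, Set, Tuple

Repo = Dict[str, bytes]                            # path -> file contents
Entry = Tuple[str, Optional[int], Dict[str, str]]  # (tag, size, checksums)
COVERED_TAGS = {"DATA", "MISC", "OPTIONAL", "EBUILD", "AUX"}


def matches(data: Optional[bytes], size: Optional[int],
            sums: Dict[str, str]) -> bool:
    """True if data exists and matches the size and every listed checksum."""
    if data is None or len(data) != size:
        return False
    return all(hashlib.new(name.lower(), data).hexdigest() == value
               for name, value in sums.items())


def load_manifest(repo: Repo, manifest: str, entries: Dict[str, Entry],
                  ignores: List[str]) -> None:
    """Step 3: collect entries, recursing into sub-Manifests that verify."""
    base = posixpath.dirname(manifest)
    for line in repo[manifest].decode("utf-8").splitlines():
        fields = line.split()
        if not fields or fields[0] in ("TIMESTAMP", "DIST"):
            continue  # DIST entries describe fetched files, not the tree
        tag = fields[0]
        prefix = posixpath.join(base, "files") if tag == "AUX" else base
        path = posixpath.join(prefix, fields[1])
        if tag == "IGNORE":
            ignores.append(path)
        elif tag == "OPTIONAL":
            entries[path] = (tag, None, {})
        else:
            size = int(fields[2])
            sums = dict(zip(fields[3::2], fields[4::2]))
            entries[path] = (tag, size, sums)
            if tag == "MANIFEST" and matches(repo.get(path), size, sums):
                load_manifest(repo, path, entries, ignores)


def verify_tree(repo: Repo, top: str = "Manifest") -> List[Tuple[str, str]]:
    """Return (path, problem) pairs; an empty list means success."""
    present: Set[str] = set(repo) - {top}            # steps 1 and 2
    entries: Dict[str, Entry] = {}
    ignores: List[str] = []
    load_manifest(repo, top, entries, ignores)       # step 3

    def ignored(path: str) -> bool:
        return any(path == i or path.startswith(i + "/") for i in ignores)

    present = {p for p in present if not ignored(p)}  # step 4
    covered = {p for p, (tag, _, _) in entries.items()
               if tag in COVERED_TAGS}               # step 5

    problems = []
    for path in sorted(present | covered):           # step 6
        entry, data = entries.get(path), repo.get(path)
        if entry is None:
            problems.append((path, "stray file"))
        elif entry[0] == "OPTIONAL":
            if data is not None:
                problems.append((path, "stray optional file"))
        elif data is None:
            problems.append((path, "missing file"))
        elif not matches(data, entry[1], entry[2]):
            problems.append((path, "size or checksum mismatch"))
    return problems
```

A sub-Manifest is only trusted once its own checksum matches the entry in its parent, so a single signature check on the top-level Manifest anchors the whole tree; this is the property behind the hierarchical model described in the Rationale.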

Algorithm for finding parent Manifests
--------------------------------------

In order to find the top-level Manifest from the current directory,
the following algorithm can be used:

1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different from *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of its parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.

Once the algorithm stops, *last_found* will contain the relevant
top-level Manifest. If *last_found* is null, then the directory tree
does not contain any valid top-level Manifest candidates and one should
be created in the *original* directory.

Once the top-level Manifest is found, its ``MANIFEST`` entries should
be used to find any sub-Manifests below the top-level Manifest,
up to and including the *original* directory. Note that those
sub-Manifests can use filenames other than ``Manifest``.


Checksum algorithms
-------------------

This section is informational only. Specifying the exact set
of supported algorithms is outside the scope of this specification.

The algorithm names reserved at the time of writing are:

- ``MD5`` [#MD5]_,
- ``RMD160``: RIPEMD-160 [#RIPEMD160]_,
- ``SHA1`` [#SHS]_,
- ``SHA256`` and ``SHA512``: SHA-2 family of hashes [#SHS]_,
- ``WHIRLPOOL`` [#WHIRLPOOL]_,
- ``BLAKE2B`` and ``BLAKE2S``: BLAKE2 family of hashes [#BLAKE2]_,
- ``SHA3_256`` and ``SHA3_512``: SHA-3 family of hashes [#SHA3]_,
- ``STREEBOG256`` and ``STREEBOG512``: Streebog family of hashes
  [#STREEBOG]_.

The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
It is recommended that any new hashes be named after the Python
``hashlib`` module algorithm names, transformed into uppercase.

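The naming recommendation means that a Manifest hash name can usually be mapped back to a ``hashlib`` algorithm by lowercasing it. The Python sketch below, illustrative only, checks which of the reserved names that mapping covers on the running interpreter. Older names such as ``RMD160`` predate the convention (the usual ``hashlib`` name is ``ripemd160``, and only when OpenSSL provides it), which is why the recommendation targets new hashes.

```python
import hashlib
from typing import List

RESERVED = ["MD5", "RMD160", "SHA1", "SHA256", "SHA512", "WHIRLPOOL",
            "BLAKE2B", "BLAKE2S", "SHA3_256", "SHA3_512",
            "STREEBOG256", "STREEBOG512"]


def manifest_name(hashlib_name: str) -> str:
    """Name a new hash after its hashlib name, transformed into uppercase."""
    return hashlib_name.upper()


def hashlib_backed(names: List[str]) -> List[str]:
    """Reserved names that lowercase to an algorithm hashlib offers here."""
    available = {name.lower() for name in hashlib.algorithms_available}
    return [name for name in names if name.lower() in available]


def digest(name: str, data: bytes) -> str:
    """Hex digest for a Manifest hash name that follows the convention."""
    return hashlib.new(name.lower(), data).hexdigest()
```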

Manifest compression
--------------------

The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
This section merely addresses interoperability issues between Manifest
compression and this specification.

The compressed Manifest files are required to carry a suffix identifying
their compression algorithm. This suffix should be used to recognize
the compression and decompress Manifests transparently. The exact list
of algorithms and their corresponding suffixes is outside the scope
of this specification.

Whenever this specification refers to the top-level Manifest file,
the implementation should account for compressed variants of this file
with appropriate suffixes (e.g. ``Manifest.gz``).

Whenever this specification refers to sub-Manifests, they can use any
names but, when compressed, are also required to use the appropriate
compression suffix. The ``MANIFEST`` entries are required to specify
the full name including the compression suffix, and the verification
is performed on the compressed file.

The specification permits uncompressed Manifests to exist alongside
their compressed counterparts, and multiple compressed formats
to coexist. If that is the case, the files must have the same
uncompressed content and the implementation is free to choose any
of the files sharing the same base name.


Rationale
=========

Stand-alone format
------------------

The first question that needed to be asked before proceeding with
the design was whether the Manifest file format was supposed to be
stand-alone, or tightly bound to the repository format.

The stand-alone format has been selected because of its three
advantages:

1. It is more future-proof. If an incompatible change to the repository
   format is introduced, only developers need to upgrade the tools
   they use to generate the Manifests. The tools used to verify
   the updated Manifests will continue to work.

2. It is more flexible and universal. With a dedicated tool,
   the Manifest files can be used to sign and verify arbitrary file
   sets.

3. It keeps the verification tool simpler. In particular, we can easily
   write an independent verification tool that could work on any
   distribution without needing to depend on a package manager
   implementation or rewrite parts of it.

Designing a stand-alone format requires that the Manifest carries enough
information to perform the verification following all the rules specific
to the Gentoo repository.


Tree design
-----------

The second important point of the design was determining whether
the Manifest files should be structured hierarchically or independently.
Both options have their advantages.

In the hierarchical model, each sub-Manifest file is covered by
a higher-level Manifest. As a result, only the top-level Manifest has
to be OpenPGP-signed, and subsequent Manifests only need to be verified
against the checksums stored in their parent Manifest. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  from the most relevant Manifests and their parent Manifests.

- The OpenPGP signature of the top-level Manifest needs to be verified
  only once per process.

- Altering any set of files requires updating the relevant Manifests,
  and their parent Manifests up to the top-level Manifest, and signing
  the last one.

- As a result, the top-level Manifest changes on every commit,
  and various middle-level Manifests change (and need to be transferred)
  frequently.

In the independent model, each sub-Manifest file is independent
of the parent Manifests. As a result, each of them needs to be signed
and verified independently. However, the parent Manifests still need
to list sub-Manifests (albeit without verification data) in order
to detect removal or replacement of subdirectories. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  and verifying signatures of the most relevant Manifest files.

- Altering any set of files requires updating the relevant Manifests
  and signing them again.

- Parent Manifests are updated only when Manifests are added to
  or removed from subdirectories. As a result, they change infrequently.

While both models have their advantages, the hierarchical model was
selected because it reduces the number of OpenPGP operations, which
are comparatively costly, to the minimum.


Tree layout restrictions
------------------------

The algorithm is meant to work primarily with ebuild repositories, which
normally contain only files and directories. Directories provide
no useful metadata for verification, and specifying special entries
for additional file types would serve no purpose. Therefore,
the specification is restricted to dealing with regular files.

The Gentoo repository does not use symbolic links. Some other Gentoo
repositories do, however. To provide a simple way of dealing with
symlinks without implementing special handling for them, the common
behavior of implicitly resolving them is used. Therefore, symbolic links
to files are stored as if they were regular files, and symbolic links
to directories are followed as if they were regular directories.

Dotfiles are implicitly ignored as that is a common notion used
in software written for POSIX systems. All other filenames require
explicit ``IGNORE`` lines.

The algorithm is restricted to work on a single filesystem. This is
mostly relevant when scanning for the top-level Manifest, where we do
not want to cross filesystem boundaries. However, to ensure consistent
bidirectional behavior, crossing them also needs to be banned when
descending the tree.

The directories and files on different filesystems need to be ignored
explicitly, as implicitly skipping them would cause confusion.
In particular, tools might then claim that a file does not exist when
it clearly does, because it was skipped due to filesystem boundaries.


File verification model
-----------------------

The verification model aims to provide full coverage against different
forms of attack. In particular, three different kinds of manipulation
are considered:

1. Alteration of the file content.

2. Removal of a file.

3. Addition of a new file.

To protect against all three, the system requires that all
files in the repository are listed in Manifests and verified against
them.

As a special case, ignores are allowed to account for directories
that are not part of the repository but were traditionally placed inside
it. Those directories were ``distfiles``, ``local`` and ``packages``.
Ignores can also be used to skip VCS directories such as ``CVS``.


Non-obligatory Manifest verification
------------------------------------

While this specification recommends that all tools use strict
verification by default, it allows declaring some files as
non-obligatory, like the original Manifest2 format did. This can be
used for files that do not affect the normal package manager operation.

It aims to account for two use cases:

1. Stripping files that are not strictly required to install
   packages from repository checkouts.

2. Accounting for automatically generated files that might be updated
   by standard tooling.

The traditional ``MISC`` type is amended with a complementary
``OPTIONAL`` tag to account for files that are not provided
in the specific repository. It aims to ensure that the same path would
be non-fatal when provided by the repository but fatal when created
by the user tooling.

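How a failure is treated then depends on the matching entry and the verification mode. A minimal Python sketch of that decision follows. It assumes that failures for ``MANIFEST`` and ``DIST`` entries are always fatal, like ``DATA``, and that a stray file with no entry at all is fatal in both modes; ``failure_is_fatal`` is a hypothetical helper.

```python
from typing import Optional

# Assumed classification; MISC and OPTIONAL are the only lenient tags.
OBLIGATORY = {"DATA", "EBUILD", "AUX", "MANIFEST", "DIST"}
NON_OBLIGATORY = {"MISC", "OPTIONAL"}


def failure_is_fatal(tag: Optional[str], strict: bool) -> bool:
    """Decide whether a verification failure must abort.

    `tag` is the tag of the matching entry, or None for a stray file
    that no entry covers.
    """
    if tag is None or tag in OBLIGATORY:
        return True
    if tag in NON_OBLIGATORY:
        return strict
    raise ValueError(f"unknown tag: {tag}")
```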

Timestamp field
---------------

The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
to include a generation timestamp in the Manifest. A similar feature
was originally proposed in GLEP 58 [#GLEP58]_.

The timestamp can be used to detect delay or replay attacks against
Gentoo mirrors.

Strictly speaking, this is already provided by the various
``metadata/timestamp.*`` files that Gentoo already ships, which are also
covered by the Manifest. However, including the value in the Manifest
itself has little cost and provides the ability to perform
the verification stand-alone.


New vs deprecated tags
----------------------

Out of the four types defined by Manifest2, two are reused and two are
marked deprecated.

The ``DIST`` and ``MISC`` tags are reused since they map relatively
clearly onto the new concept.

The ``EBUILD`` tag could potentially be reused for generic file
verification data. However, it would be confusing if all the different
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
type was introduced as a replacement.

The ``AUX`` tag is deprecated as it is redundant with ``DATA``, and has
the limiting property of an implicit ``files/`` path prefix.


Finding top-level Manifest
--------------------------

The development of a reference implementation for this GLEP has raised
the following problem: how to find all the relevant Manifests when
the Manifest tool is run inside a subdirectory of the repository?

One of the options would be to provide bi-directional linking
of Manifests via a ``PARENT`` tag. However, that would not solve
the problem when a new Manifest file is being created.

Instead, an algorithm for iterating over parent directories is proposed.
Since there is no obligatory explicit indicator for the top-level
Manifest, the algorithm assumes that the top-level Manifest
is the highest ``Manifest`` in the directory hierarchy that can cover
the current directory. This generally makes sense since the Manifest
files are required to provide coverage for all subdirectories, so all
Manifests starting from that one need to be updated.

If independent Manifest trees are nested in the directory structure,
then an ``IGNORE`` entry needs to be used to separate them.

Since sub-Manifests can use any filenames, the Manifest finding
algorithm must not short-cut the procedure by storing all ``Manifest``
files along the parent directories. Instead, it needs to retrace
the relevant sub-Manifest files along ``MANIFEST`` entries
in the top-level Manifest.


Injecting ChangeLogs into the checkout
--------------------------------------

One of the problems considered in the new Manifest format was that
of injecting historical and autogenerated ChangeLogs into the repository.
Normally, those files are not included, to reduce the checkout size.
However, some users have shown interest in them and Infra is working
on providing them via an additional rsync module.

If such files were injected into the repository, they would cause strict
verification failures of Manifests. To account for this, Infra could
provide either ``OPTIONAL`` entries in the Manifest files to allow them
in non-strict verification mode, or ``IGNORE`` entries to allow them
in the strict mode.


Splitting distfile checksums from file checksums
------------------------------------------------

Another problem with the current Manifest format is that the checksums
for fetched files are combined with checksums for local files
in a single file inside the package directory. It has been specifically
pointed out that:

- since distfiles are sometimes reused across different packages,
  the repeated checksums are redundant,

- mirror admins were interested in the possibility of verifying all
  the distfiles with a single tool.

This specification does not provide a clean solution to this problem.
It technically permits moving ``DIST`` entries to higher-level Manifests,
but the usefulness of such a solution is doubtful.

However, for the second problem we will probably deliver a dedicated
tool working with this Manifest format.


Hash algorithms
---------------

While maintaining a consistent supported hash set is important
for interoperability, it is not a good fit for the generic layout of this
GLEP. Furthermore, it would require updating the GLEP in the future
every time the used algorithms change.

Instead, the specification focuses on listing the currently used
algorithm names for interoperability, and sets a recommendation
for consistent naming of algorithms in the future. The Python
``hashlib`` module is used as a reference since it is used
as the provider of hash functions for most of the Python software,
including Portage and PkgCore.

The basic rules for changing hash algorithms are defined in GLEP 59
[#GLEP59]_. The implementations can focus only on those algorithms
that are actually used or planned on being used. It may be feasible
to devise a new GLEP that specifies the currently used hashes (or update
GLEP 59 accordingly).


Manifest compression
--------------------

The support for Manifest compression is introduced with minimal changes
to the file format. The ``MANIFEST`` entries are required to provide
the real (compressed) file path for compatibility with other file
entries and to avoid confusion.

The existence of additional entries for uncompressed Manifest checksums
was debated. However, plain entries for the uncompressed file would
be confusing if only the compressed file existed, and conflicting if both
uncompressed and compressed variants existed. Furthermore, it has been
pointed out that ``DIST`` entries do not have an uncompressed variant
either.


Performance considerations
--------------------------

Performing a full-tree verification on every sync raises some
performance concerns for end-user systems. Initial testing has shown
that a cold-cache verification on a btrfs file system can take around
4 minutes, with the process being mostly I/O bound. On the other hand,
it can be expected that the verification will be performed directly
after syncing, taking advantage of the warm filesystem cache.

To improve speed on I/O- and/or CPU-constrained systems even further,
the algorithms can be easily extended to perform incremental
verification. Given that rsync does not preserve mtimes by default,
the tool can take advantage of mtime and Manifest comparisons to recheck
only the parts of the repository that have changed.

Furthermore, the package manager implementations can restrict checking
only to the parts of the repository that are actually being used.


Backwards Compatibility
=======================

This GLEP provides optional means of preserving backwards compatibility.
To preserve backwards compatibility, the following needs to be
ensured:

- all files within the package directory must be covered by
  the ``Manifest`` file inside that package directory,

- all distfiles used by the package must be covered by the ``Manifest``
  file inside the package directory,

- all files inside the ``files/`` subdirectory of a package directory
  need to use the deprecated ``AUX`` tag (rather than ``DATA``),

- all ``.ebuild`` files inside the package directory need to use
  the deprecated ``EBUILD`` tag (rather than ``DATA``),

- the Manifest files inside the package directory can be signed
  to provide authenticity verification.

Once backwards compatibility is no longer a concern, the above
no longer needs to hold and the deprecated tags can be removed.

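When generating a package-level Manifest that older tools can still read, the tag choice follows the list above. A minimal Python sketch follows. Classifying every remaining file (such as ``metadata.xml``) as ``MISC``, as the original Manifest2 did, and the choice of ``SHA256`` and ``SHA512`` checksums are assumptions for illustration; ``compat_tag`` and ``compat_entry`` are hypothetical helpers.

```python
import hashlib
from typing import Tuple


def compat_tag(rel_path: str) -> Tuple[str, str]:
    """Choose a backwards-compatible tag for a file in a package directory."""
    if rel_path.startswith("files/"):
        return "AUX", rel_path[len("files/"):]  # AUX paths omit files/
    if rel_path.endswith(".ebuild"):
        return "EBUILD", rel_path
    return "MISC", rel_path  # assumption: e.g. metadata.xml


def compat_entry(rel_path: str, data: bytes) -> str:
    """Render one Manifest line with SHA256 and SHA512 checksums."""
    tag, name = compat_tag(rel_path)
    sums = " ".join(f"{h} {hashlib.new(h.lower(), data).hexdigest()}"
                    for h in ("SHA256", "SHA512"))
    return f"{tag} {name} {len(data)} {sums}"
```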

Reference Implementation
========================

The reference implementation for this GLEP is being developed
as the gemato project [#GEMATO]_.


Credits
=======

Thanks to all the people whose contributions were invaluable
to the creation of this GLEP. This includes but is not limited to:

- Robin Hugh Johnson,
- Ulrich Müller.

Additionally, thanks to Robin Hugh Johnson for the original
MetaManifest GLEP series, which served both as inspiration and source
of many concepts used in this GLEP. Recursively, also thanks to all
the people who contributed to the original GLEPs.


References
==========

.. [#GLEP44] GLEP 44: Manifest2 format
   (https://www.gentoo.org/glep/glep-0044.html)

.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software -
   Overview
   (https://www.gentoo.org/glep/glep-0057.html)

.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software -
   Infrastructure to User distribution - MetaManifest
   (https://www.gentoo.org/glep/glep-0058.html)

.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
   (https://www.gentoo.org/glep/glep-0059.html)

.. [#GLEP60] GLEP 60: Manifest2 filetypes
   (https://www.gentoo.org/glep/glep-0060.html)

.. [#GLEP61] GLEP 61: Manifest2 compression
   (https://www.gentoo.org/glep/glep-0061.html)

.. [#PMS-FETCH] Package Manager Specification: Dependency Specification
   Format - SRC_URI
   (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)

.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
   (https://www.ietf.org/rfc/rfc1321.txt)

.. [#RIPEMD160] The hash function RIPEMD-160
   (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)

.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)

.. [#WHIRLPOOL] The WHIRLPOOL Hash Function
   (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)

.. [#BLAKE2] BLAKE2: fast secure hashing
   (https://blake2.net/)

.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
   and Extendable-Output Functions
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)

.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
   (https://www.streebog.net/)

.. [#GEMATO] gemato: Gentoo Manifest Tool
   (https://github.com/mgorny/gemato/)

Copyright
=========

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.