Gentoo Archives: gentoo-dev

From: Fabian Groffen <grobian@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v5]
Date: Fri, 01 Dec 2017 11:30:29
Message-Id: 20171201113004.GS829@gentoo.org
In Reply to: Re: [gentoo-dev] [RFC] GLEP 74 post-Council review update [v5] by "Michał Górny"
1 Hi,
2
3 While trying to implement full tree Manifests for the Prefix tree, I ran
4 into the following:
5
6 Would it be possible to add a section to define what directories receive
7 what kind of Manifest?
8
9 I mean in particular what is encoded in gemato/profile.py, the metadata
10 directory is an interesting mix and match of subdirectories that have a
11 Manifest of their own, and subdirectories whose content is included in
12 the Manifest at the metadata level.
13
14 More specifically, the current GLEP doesn't seem to mention which
15 directories should have their own Manifest and which should not. It would be
16 good to know if for instance adding Manifest(.gz) to
17 metadata/install-qa-check.d is ok as per GLEP or not (and if so, the
18 consumer of that directory should be fixed to ignore the Manifest*
19 files, instead of barking it can't source the gz file or doesn't get
20 it). Also, what if someone would want to include all entries in the
21 top-level Manifest, would that be OK (albeit stupid I guess)?
22
23 I think it would be a good addition to specify (for a Gentoo tree) what
24 directories receive a Manifest file and what their content is.
25
26 In addition to this, because it is related, it would be nice to also
27 document the IGNORE entries that seem present at the top-level and
28 metadata-level, or specify where they would come from for the Gentoo
29 case.
30
31 Thanks!
32 Fabian
33
34 On 23-11-2017 21:53:57 +0100, Michał Górny wrote:
35 > On Thu, 16.11.2017 at 11:19 +0100, Michał Górny
36 > wrote:
37 > > Hi, everyone.
38 > >
39 > > Here's the updated version of GLEP 74 taking into consideration
40 > > the points made during the Council pre-review.
41 > >
42 > > ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
43 > > HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
44 > >
45 > > Changes:
46 >
47 > 27c2a9e glep-0074: Grammar corrections from Ulrich Müller
48 > d39f865 glep-0074: Make extended filename encoding optional
49 > ed111f8 glep-0074: Always exclude control characters
50 >
51 > ---
52 > GLEP: 74
53 > Title: Full-tree verification using Manifest files
54 > Author: Michał Górny <mgorny@g.o>,
55 > Robin Hugh Johnson <robbat2@g.o>,
56 > Ulrich Müller <ulm@g.o>
57 > Type: Standards Track
58 > Status: Draft
59 > Version: 1
60 > Created: 2017-10-21
61 > Last-Modified: 2017-11-23
62 > Post-History: 2017-10-26, 2017-11-16
63 > Content-Type: text/x-rst
64 > Requires: 59, 61
65 > Replaces: 44, 58, 60
66 > ---
67 >
68 > Abstract
69 > ========
70 >
71 > This GLEP extends the Manifest file format to cover full-tree file
72 > integrity and authenticity checks. The format aims to be future-proof,
73 > efficient and provide means of backwards compatibility.
74 >
75 >
76 > Motivation
77 > ==========
78 >
79 > The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
80 > means of verifying the integrity of distfiles and package files
81 > in Gentoo. Combined with OpenPGP signatures, they provide means to
82 > ensure the authenticity of the covered files. However, as noted
83 > in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
84 > authenticity verification as they do not cover any files outside
85 > the package directory. In particular, they provide multiple ways
86 > for a third party to inject malicious code into the ebuild environment.
87 >
88 > Historically, the topic of providing authenticity coverage for the whole
89 > repository has been mentioned multiple times. The most noteworthy efforts
90 > are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
91 > They were accepted by the Council in 2010 but have never been
92 > implemented. When potential implementation work started in 2017, a new
93 > discussion about the specification arose. It prompted the creation
94 > of a competing GLEP that would provide a redesigned alternative to
95 > the old GLEPs.
96 >
97 > This specification is designed with the following goals in mind:
98 >
99 > 1. It should provide means to ensure the authenticity of the complete
100 > repository, including preventing the injection of additional files.
101 >
102 > 2. The format should be universal enough to work both for the Gentoo
103 > repository and third-party repositories of different characteristics.
104 >
105 > 3. The Manifest files should be verifiable stand-alone, that is without
106 > knowing any details about the underlying repository format.
107 >
108 >
109 > Specification
110 > =============
111 >
112 > Manifest file format
113 > --------------------
114 >
115 > This specification reuses and extends the Manifest file format defined
116 > in GLEP 44 [#GLEP44]_. For the purpose of it, the *file type* field is
117 > repurposed as a generic *tag* that could also indicate additional
118 > (non-checksum) metadata. Appropriately, those tags can be followed by
119 > other space-separated values.
120 >
121 > Unless specified otherwise, the paths used in the Manifest files
122 > are relative to the directory containing the Manifest file. The paths
123 > must not reference the parent directory (``..``). Forward slash (``/``)
124 > is used as path component separator.
125 >
126 > The Manifest files use UTF-8 encoding.
127 >
128 >
129 > Manifest file locations and nesting
130 > -----------------------------------
131 >
132 > The ``Manifest`` file located in the root directory of the repository
133 > is called top-level Manifest, and it is used to perform the full-tree
134 > verification. In order to verify the authenticity, it must be signed
135 > using OpenPGP, using the armored cleartext format.
136 >
137 > The top-level Manifest may reference sub-Manifests contained
138 > in subdirectories of the repository. The sub-Manifests are traditionally
139 > named ``Manifest``; however, the implementation must support arbitrary
140 > names, including the possibility of multiple (split) Manifests
141 > for a single directory. The sub-Manifest can only cover the files inside
142 > the directory tree where it resides.
143 >
144 > The sub-Manifest can also be signed using OpenPGP armored cleartext
145 > format. However, the signature verification can be omitted since it
146 > already is covered by the signed top-level Manifest.
147 >
148 >
149 > Directory tree coverage
150 > -----------------------
151 >
152 > The specification provides three ways of skipping Manifest verification
153 > of specific files and directories (recursively):
154 >
155 > 1. explicit ``IGNORE`` entries in Manifest files,
156 >
157 > 2. injected ignore paths via package manager configuration,
158 >
159 > 3. using names starting with a dot (``.``) which are always skipped.
160 >
161 > All files that are not ignored must be covered by at least one
162 > of the Manifests.
163 >
164 > A single file may be matched by multiple identical or equivalent
165 > Manifest entries, if and only if the entries have the same semantics,
166 > specify the same size and the checksums common to both entries match.
167 > It is an error for a single file to be matched by multiple entries
168 > of different semantics, file size or checksum values. It is an error
169 > to specify another entry for a file that matches ``IGNORE``, or that
170 > is located inside an ignored directory.
171 >
172 > The file entries (except for ``IGNORE``) can be specified for regular
173 > files only. Symbolic links are followed when opening files
174 > and traversing directories. It is an error to specify an entry for
175 > a different file type. If the tree contains files of other types
176 > that are not otherwise ignored, they need to be covered by an explicit
177 > ``IGNORE``.
178 >
179 > All the local (non-``DIST``) files covered by a Manifest tree must
180 > reside on the same filesystem. It is an error to specify entries
181 > applying to files on another filesystem. If files or directories that
182 > are not otherwise ignored reside on a different filesystem, or symbolic
183 > links point to targets on a different filesystem, they must
184 > be explicitly excluded via ``IGNORE``.
185 >
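To make the duplicate-entry rule above concrete, here is a minimal sketch of the compatibility check. It is not taken from the GLEP or from gemato: the names ``FileEntry`` and ``entries_compatible`` are hypothetical, and treating ``EBUILD``, ``MISC`` and ``AUX`` as having the same semantics as ``DATA`` is my reading of the deprecated-tags section.

```python
from typing import Dict, NamedTuple

# Assumption: the deprecated tags count as "same semantics" as DATA,
# since the GLEP describes them as equivalent to the DATA type.
_DATA_LIKE = {"DATA", "EBUILD", "MISC", "AUX"}


class FileEntry(NamedTuple):
    """Hypothetical in-memory form of one checksum-carrying entry."""
    tag: str
    size: int
    checksums: Dict[str, str]


def _semantics(tag: str) -> str:
    return "DATA" if tag in _DATA_LIKE else tag


def entries_compatible(a: FileEntry, b: FileEntry) -> bool:
    """True if two entries matching the same file may coexist."""
    if _semantics(a.tag) != _semantics(b.tag):
        return False
    if a.size != b.size:
        return False
    # Only the algorithms present in both entries are compared.
    common = a.checksums.keys() & b.checksums.keys()
    return all(a.checksums[name] == b.checksums[name] for name in common)
```

Two entries that list disjoint sets of algorithms pass this check as long as the tag semantics and size agree, which follows from "the checksums common to both entries match".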
186 >
187 > Path and filename encoding
188 > --------------------------
189 >
190 > The path fields in the Manifest file must consist of characters
191 > corresponding to valid UTF-8 code points excluding the backwards slash
192 > (``\``) and characters classified as control characters or as whitespace
193 > in the current version of the Unicode standard [#UNICODE]_.
194 >
195 > The implementation can optionally support extended filename encoding
196 > to support paths containing the excluded characters. If encoding is not supported, the implementation
197 > must reject directories containing any files using non-compliant names,
198 > as well as Manifest files whose filename field contains such filenames.
199 >
200 > If encoding is supported, then all of the excluded characters that
201 > are present in paths must be encoded using one of the following escape
202 > sequences:
203 >
204 > - characters in the ``U+0000`` to ``U+007F`` range can be encoded
205 > as ``\xHH`` where ``HH`` specifies the zero-padded, hexadecimal
206 > character code,
207 >
208 > - characters in the ``U+0000`` to ``U+FFFF`` range can be encoded
209 > as ``\uHHHH`` where ``HHHH`` specifies the zero-padded, hexadecimal
210 > character code,
211 >
212 > - characters in the UCS-4 range can be encoded as ``\UHHHHHHHH``
213 > where ``HHHHHHHH`` specifies the zero-padded, hexadecimal character
214 > code.
215 >
216 > It is invalid for the backwards slash to be used in any other context,
217 > and a backwards slash present in filename must be encoded. A backwards
218 > slash used as a path component separator should be replaced by a forward
219 > slash instead.
220 >
221 > The encoding can be used for other characters as well. In particular,
222 > escaping non-printable characters might be desirable.
223 >
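The escape rules above could be sketched roughly as follows. This is an illustration only: ``encode_path`` and ``decode_path`` are hypothetical names, Python's ``str.isspace()`` and the ``Cc`` category are used as an approximation of the Unicode whitespace and control classes, and accepting lowercase hex digits on decode is my assumption since the text does not say.

```python
import re
import unicodedata


def _must_escape(ch: str) -> bool:
    # Backslash, control characters (category Cc) and whitespace.
    return ch == "\\" or unicodedata.category(ch) == "Cc" or ch.isspace()


def encode_path(path: str) -> str:
    """Escape excluded characters using the shortest applicable form."""
    out = []
    for ch in path:
        if not _must_escape(ch):
            out.append(ch)
            continue
        cp = ord(ch)
        if cp <= 0x7F:
            out.append(f"\\x{cp:02X}")
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))")


def decode_path(field: str) -> str:
    """Decode a Manifest path field; any other use of a backslash is an error."""
    out = []
    pos = 0
    while pos < len(field):
        if field[pos] != "\\":
            out.append(field[pos])
            pos += 1
            continue
        match = _ESCAPE.match(field, pos)
        if match is None:
            raise ValueError(f"invalid backslash use at offset {pos}")
        cp = int(match.group(match.lastindex), 16)
        if match.lastindex == 1 and cp > 0x7F:
            raise ValueError("\\xHH is limited to the 00..7F range")
        out.append(chr(cp))  # chr() rejects values beyond U+10FFFF
        pos = match.end()
    return "".join(out)
```

For example, ``encode_path("my file.txt")`` gives ``my\x20file.txt``, and ``decode_path`` reverses it.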
224 >
225 > File verification
226 > -----------------
227 >
228 > When verifying a file against the Manifest, the following rules are
229 > used:
230 >
231 > 1. If the file is covered directly or indirectly by an entry
232 > of the ``IGNORE`` type, the verification always succeeds.
233 >
234 > 2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
235 > ``MISC``, ``EBUILD`` or ``AUX`` type:
236 >
237 > a. if the file is not present, then the verification fails,
238 >
239 > b. if the file is present but has a different size or one
240 > of the checksums does not match, the verification fails,
241 >
242 > c. otherwise, the verification succeeds.
243 >
244 > 3. If the file is present but not listed in Manifest, the verification
245 > fails.
246 >
247 > Unless specified otherwise, the package manager must not allow using
248 > any files for which the verification failed. The package manager may
249 > reject any package or even the whole repository if it may refer to files
250 > for which the verification failed.
251 >
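The three verification rules could look like this for a single file. It is a sketch under assumptions: ``verify_file`` is a hypothetical helper, ``tag=None`` stands for "no entry matches this path", only three algorithms are mapped, and raising on an unknown algorithm name is my choice rather than something the GLEP specifies.

```python
import hashlib
import os
from typing import Dict, Optional

# Manifest algorithm name -> hashlib name (illustrative subset only).
_HASHLIB = {"SHA256": "sha256", "SHA512": "sha512", "BLAKE2B": "blake2b"}


def _digest(path: str, algo: str) -> str:
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, tag: Optional[str], size: int = 0,
                checksums: Optional[Dict[str, str]] = None) -> bool:
    if tag == "IGNORE":
        return True                      # rule 1: always succeeds
    if tag is None:
        # rule 3: a file that exists but is not listed fails
        return not os.path.lexists(path)
    if not os.path.isfile(path):
        return False                     # rule 2a: missing file
    if os.path.getsize(path) != size:
        return False                     # rule 2b: size mismatch
    for name, expected in (checksums or {}).items():
        if name not in _HASHLIB:
            raise ValueError(f"unsupported checksum: {name}")
        if _digest(path, _HASHLIB[name]) != expected.lower():
            return False                 # rule 2b: checksum mismatch
    return True                          # rule 2c
```

Checking the size before hashing keeps the common failure case cheap, since no file content has to be read.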
252 >
253 > Timestamp verification
254 > ----------------------
255 >
256 > The top-level Manifest file can contain a ``TIMESTAMP`` entry to account
257 > for attacks against tree update distribution. If such an entry
258 > is present, it should be updated every time at least one
259 > of the Manifests changes. Every unique timestamp value must correspond
260 > to a single tree state.
261 >
262 > During the verification process, the client should compare the timestamp
263 > against the update time obtained from a local clock or a trusted time
264 > source. If the comparison result indicates that the Manifest at the time
265 > of receiving was already significantly outdated, the client should
266 > either fail the verification or require manual confirmation from
267 > the user.
268 >
269 > Furthermore, the Manifest provider may employ additional methods
270 > of distributing the timestamps of recently generated Manifests
271 > using a secure channel from a trusted source for exact comparison.
272 > The exact details of such a solution are outside the scope of this
273 > specification.
274 >
275 > ``TIMESTAMP`` entries may also be present in sub-Manifests. Those
276 > timestamps must not be newer than the timestamp of the top-level
277 > Manifest (if present). This specification does not define any specific
278 > use for them.
279 >
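A client-side staleness check might be sketched as below, using the ``strftime()`` format string given for the ``TIMESTAMP`` tag. The names ``parse_timestamp`` and ``is_outdated`` are hypothetical, and the ``max_age`` threshold is left to the caller because the GLEP only says "significantly outdated" without giving a number.

```python
from datetime import datetime, timedelta, timezone

# Format string from the TIMESTAMP tag description.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a TIMESTAMP value into an aware UTC datetime."""
    parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def is_outdated(value: str, now: datetime, max_age: timedelta) -> bool:
    """True if the Manifest was older than max_age at time `now`."""
    return now - parse_timestamp(value) > max_age
```

Passing ``now`` explicitly lets the caller decide whether it comes from the local clock or from a trusted time source.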
280 >
281 > Modern Manifest tags
282 > --------------------
283 >
284 > The Manifest files can specify the following tags:
285 >
286 > ``TIMESTAMP <iso8601>``
287 > Specifies a timestamp of when the Manifest file was last updated.
288 > The timestamp must be a valid second-precision ISO 8601 extended
289 > format combined date and time in UTC timezone, i.e. using
290 > the following ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``.
291 > Optional. The package manager can use it to detect an outdated
292 > repository checkout as described in `Timestamp verification`_.
293 >
294 > ``MANIFEST <path> <size> <checksums>...``
295 > Specifies a sub-Manifest. The sub-Manifest must be verified like
296 > a regular file. If the verification succeeds, the entries from
297 > the sub-Manifest are included for verification as described
298 > in `Manifest file locations and nesting`_.
299 >
300 > ``IGNORE <path>``
301 > Ignores a subdirectory or file from Manifest checks. If the specified
302 > path is present, it and its contents are omitted from the Manifest
303 > verification (always pass). *Path* must be a plain file or directory
304 > path without a trailing slash. Wildcards are not supported
305 > and wildcard characters are interpreted literally.
306 >
307 > ``DATA <path> <size> <checksums>...``
308 > Specifies a regular file subject to Manifest verification. The file
309 > is required to pass verification. Used for all files that do not match
310 > any other type.
311 >
312 > ``DIST <filename> <size> <checksums>...``
313 > Specifies a distfile entry used to verify files fetched as part
314 > of ``SRC_URI``. The filename must match the filename used to store
315 > the fetched file as specified in the PMS [#PMS-FETCH]_. The package
316 > manager must reject the fetched file if it fails verification.
317 > ``DIST`` entries apply to all packages below the Manifest file
318 > specifying them.
319 >
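As an illustration of the entry syntax, a line parser could look roughly like this. ``Entry`` and ``parse_line`` are hypothetical names, rejecting unknown tags is my assumption, and the deprecated ``AUX`` tag is normalised to a ``files/``-relative path while parsing. Filename escapes are not decoded here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

CHECKSUM_TAGS = {"MANIFEST", "DATA", "DIST", "EBUILD", "MISC", "AUX"}


@dataclass
class Entry:
    tag: str
    path: Optional[str] = None
    size: Optional[int] = None
    checksums: Dict[str, str] = field(default_factory=dict)
    timestamp: Optional[str] = None


def _check_path(path: str) -> None:
    # Paths are relative and must not reference the parent directory.
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError(f"invalid path: {path}")


def parse_line(line: str) -> Entry:
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    tag, args = fields[0], fields[1:]
    if tag == "TIMESTAMP":
        if len(args) != 1:
            raise ValueError("TIMESTAMP takes exactly one value")
        return Entry(tag, timestamp=args[0])
    if tag == "IGNORE":
        if len(args) != 1 or args[0].endswith("/"):
            raise ValueError("IGNORE takes one path without trailing slash")
        _check_path(args[0])
        return Entry(tag, path=args[0])
    if tag in CHECKSUM_TAGS:
        # <path> <size> followed by (algorithm, value) pairs
        if len(args) < 2 or len(args) % 2 != 0:
            raise ValueError(f"malformed {tag} entry")
        path = "files/" + args[0] if tag == "AUX" else args[0]
        _check_path(path)
        pairs = args[2:]
        sums = dict(zip(pairs[0::2], pairs[1::2]))
        return Entry(tag, path=path, size=int(args[1]), checksums=sums)
    raise ValueError(f"unknown tag: {tag}")
```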
320 >
321 > Deprecated Manifest tags
322 > ------------------------
323 >
324 > For backwards compatibility, the following tags are additionally
325 > allowed at the package directory level:
326 >
327 > ``EBUILD <filename> <size> <checksums>...``
328 > Equivalent to the ``DATA`` type.
329 >
330 > ``MISC <path> <size> <checksums>...``
331 > Equivalent to the ``DATA`` type. Historically indicated that
332 > the package manager may ignore a verification failure if operating
333 > in non-strict mode. However, that behavior is deprecated.
334 >
335 > ``AUX <filename> <size> <checksums>...``
336 > Equivalent to the ``DATA`` type, except that the filename is relative
337 > to the ``files/`` subdirectory.
338 >
339 >
340 > Algorithm for full-tree verification
341 > ------------------------------------
342 >
343 > In order to perform full-tree verification, the following algorithm
344 > can be used:
345 >
346 > 1. Collect all files present in the repository into *present* set.
347 >
348 > 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
349 > Optionally verify the ``TIMESTAMP`` entry if present as specified
350 > in `Timestamp verification`_. Remove the top-level Manifest
351 > from the *present* set.
352 >
353 > 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
354 > files according to the `file verification`_ section, and include
355 > their entries in the current Manifest entry list (using paths
356 > relative to directories containing the Manifests).
357 >
358 > 4. Process all ``IGNORE`` entries. Remove any paths matching them
359 > from the *present* set.
360 >
361 > 5. Collect all files covered by ``DATA``, ``MISC``, ``EBUILD``
362 > and ``AUX`` entries into the *covered* set.
363 >
364 > 6. Verify the entries in the *covered* set for incompatible duplicates
365 > and collisions with ignored files as explained in `Directory tree
366 > coverage`_.
367 >
368 > 7. Verify all the files in the union of the *present* and *covered*
369 > sets, according to the `file verification`_ section.
370 >
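The seven steps above can be sketched end to end as follows. This is a simplification and not gemato's implementation: the OpenPGP signature is assumed to have been checked beforehand, compressed sub-Manifests and escaped filenames are not handled, only ``SHA256`` is verified (every entry is assumed to carry one), and ``verify_tree`` with its helpers are hypothetical names.

```python
import hashlib
import os

FILE_TAGS = {"DATA", "MISC", "EBUILD", "AUX"}


def _sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def _matches(path, size, sums):
    # Sketch-only rule: every entry must carry a SHA256 checksum.
    return (os.path.isfile(path) and os.path.getsize(path) == size
            and sums.get("SHA256") == _sha256(path))


def _load(root, manifest, covered, ignored, failed):
    """Step 3: read one Manifest, recursing into MANIFEST entries."""
    base = os.path.dirname(manifest)
    with open(os.path.join(root, manifest), encoding="utf-8") as f:
        lines = [line.split() for line in f]
    for fields in lines:
        tag = fields[0] if fields else None
        if tag == "IGNORE" and len(fields) == 2:
            ignored.add(os.path.normpath(os.path.join(base, fields[1])))
        elif (tag == "MANIFEST" or tag in FILE_TAGS) and len(fields) >= 3:
            name = "files/" + fields[1] if tag == "AUX" else fields[1]
            rel = os.path.normpath(os.path.join(base, name))
            size, sums = int(fields[2]), dict(zip(fields[3::2], fields[4::2]))
            first_time = rel not in covered
            covered.setdefault(rel, []).append((size, sums))
            if tag == "MANIFEST" and first_time:
                if _matches(os.path.join(root, rel), size, sums):
                    _load(root, rel, covered, ignored, failed)
                else:
                    failed.add(rel)


def _is_ignored(rel, ignored):
    parts = rel.split(os.sep)
    if any(part.startswith(".") for part in parts):
        return True  # names starting with a dot are always skipped
    return any(os.sep.join(parts[:i]) in ignored
               for i in range(1, len(parts) + 1))


def verify_tree(root, top="Manifest"):
    """Return the sorted list of relative paths that fail verification."""
    present = set()                                            # step 1
    # followlinks=True mirrors the GLEP; symlink loops are not guarded.
    for dirpath, _dirs, files in os.walk(root, followlinks=True):
        for name in files:
            present.add(os.path.relpath(os.path.join(dirpath, name), root))
    present.discard(top)                                       # step 2
    covered, ignored, failed = {}, set(), set()
    _load(root, top, covered, ignored, failed)                 # step 3
    present = {p for p in present if not _is_ignored(p, ignored)}  # step 4
    for rel, entries in covered.items():                       # steps 5-6
        size, sums = entries[0]
        for other_size, other_sums in entries[1:]:
            common = sums.keys() & other_sums.keys()
            if other_size != size or any(
                    sums[k] != other_sums[k] for k in common):
                failed.add(rel)
        if _is_ignored(rel, ignored):
            failed.add(rel)
    for rel in present | set(covered):                         # step 7
        if rel not in covered or not _matches(
                os.path.join(root, rel), *covered[rel][0]):
            failed.add(rel)
    return sorted(failed)
```

An empty result means the tree verified. A file that exists but is not listed shows up in the result just like a file with a wrong checksum, which is how injected files are caught.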
371 >
372 > Algorithm for finding parent Manifests
373 > --------------------------------------
374 >
375 > In order to find the top-level Manifest from the current directory
376 > the following algorithm can be used:
377 >
378 > 1. Store the current directory as *original* and the device ID
379 > of the containing filesystem (``st_dev``) as *startdev*.
380 >
381 > 2. If the device ID of the containing filesystem (``st_dev``)
382 > of the current directory differs from *startdev*, stop.
383 >
384 > 3. If the current directory contains a ``Manifest`` file:
385 >
386 > a. If an ``IGNORE`` entry in the ``Manifest`` file covers
387 > the *original* directory (or one of the parent directories), stop.
388 >
389 > b. Otherwise, store the current directory as *last_found*.
390 >
391 > 4. If the current directory is the root system directory (``/``), stop.
392 >
393 > 5. Otherwise, enter the parent directory and jump to step 2.
394 >
395 > Once the algorithm stops, *last_found* will contain the relevant
396 > top-level Manifest. If *last_found* is null, then the directory tree
397 > does not contain any valid top-level Manifest candidates and one should
398 > be created in the *original* directory.
399 >
400 > Once the top-level Manifest is found, its ``MANIFEST`` entries should
401 > be used to find any sub-Manifests below the top-level Manifest,
402 > up to and including the *original* directory. Note that those
403 > sub-Manifests can use different filenames than ``Manifest``.
404 >
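The upward search could be sketched like this. It only looks for an uncompressed file literally named ``Manifest``, as the algorithm text does, and it does not go on to resolve sub-Manifests. ``find_top_level`` and ``_ignore_paths`` are hypothetical names.

```python
import os


def _ignore_paths(manifest_path):
    """Collect the paths of IGNORE entries from one Manifest file."""
    paths = set()
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2 and fields[0] == "IGNORE":
                paths.add(fields[1])
    return paths


def find_top_level(start):
    """Return the directory holding the top-level Manifest, or None."""
    original = os.path.realpath(start)
    startdev = os.stat(original).st_dev                  # step 1
    current, last_found = original, None
    while True:
        if os.stat(current).st_dev != startdev:          # step 2
            break
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):                     # step 3
            rel = os.path.relpath(original, current)
            covers = False
            if rel != ".":
                parts = rel.split(os.sep)
                # original directory and each of its parents below current
                prefixes = {"/".join(parts[:i])
                            for i in range(1, len(parts) + 1)}
                covers = bool(prefixes & _ignore_paths(manifest))
            if covers:                                   # step 3a
                break
            last_found = current                         # step 3b
        parent = os.path.dirname(current)
        if parent == current:                            # step 4: reached /
            break
        current = parent                                 # step 5
    return last_found
```

A return value of ``None`` corresponds to the case where a new top-level Manifest would be created in the *original* directory.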
405 >
406 > Checksum algorithms
407 > -------------------
408 >
409 > This section is informational only. Specifying the exact set
410 > of supported algorithms is outside the scope of this specification.
411 >
412 > The algorithm names reserved at the time of writing are:
413 >
414 > - ``MD5`` [#MD5]_,
415 > - ``RMD160`` -- RIPEMD-160 [#RIPEMD160]_,
416 > - ``SHA1`` [#SHS]_,
417 > - ``SHA256`` and ``SHA512`` -- SHA-2 family of hashes [#SHS]_,
418 > - ``WHIRLPOOL`` [#WHIRLPOOL]_,
419 > - ``BLAKE2B`` and ``BLAKE2S`` -- BLAKE2 family of hashes [#BLAKE2]_,
420 > - ``SHA3_256`` and ``SHA3_512`` -- SHA-3 family of hashes [#SHA3]_,
421 > - ``STREEBOG256`` and ``STREEBOG512`` -- Streebog family of hashes
422 > [#STREEBOG]_.
423 >
424 > The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
425 > It is recommended that any new hashes are named after the Python
426 > ``hashlib`` module algorithm names, transformed into uppercase.
427 >
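Given the naming recommendation above, mapping a Manifest algorithm name back to ``hashlib`` can be as small as the sketch below. The helper names are hypothetical, and the ``RMD160`` special case is needed because that reserved name predates the convention. Whether a given algorithm is actually available depends on the local Python and OpenSSL build, so the sketch checks ``hashlib.algorithms_available`` rather than assuming it.

```python
import hashlib

# Reserved names that are not simply the uppercased hashlib name.
_LEGACY = {"RMD160": "ripemd160"}


def hashlib_name(manifest_name: str) -> str:
    """Translate a Manifest algorithm name to a hashlib name."""
    return _LEGACY.get(manifest_name, manifest_name.lower())


def compute(manifest_name: str, data: bytes) -> str:
    """Hex digest of `data` using the named Manifest algorithm."""
    name = hashlib_name(manifest_name)
    if name not in hashlib.algorithms_available:
        raise ValueError(f"{manifest_name} is not available in this build")
    return hashlib.new(name, data).hexdigest()
```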
428 >
429 > Manifest compression
430 > --------------------
431 >
432 > The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
433 > This section merely addresses interoperability issues between Manifest
434 > compression and this specification.
435 >
436 > The compressed Manifest files are required to be suffixed for their
437 > compression algorithm. This suffix should be used to recognize
438 > the compression and decompress Manifests transparently. The exact list
439 > of algorithms and their corresponding suffixes are outside the scope
440 > of this specification.
441 >
442 > The top-level Manifest file must not be compressed. Since the OpenPGP
443 > signature covers the uncompressed text and is compressed itself,
444 > the data would have to be decompressed without any prior verification.
445 > This could expose users e.g. to zip bombs or exploits on decompressor
446 > vulnerabilities.
447 >
448 > Whenever this specification refers to sub-Manifests, they can use any
449 > names but are also required to use a specific compression suffix.
450 > The ``MANIFEST`` entries are required to specify the full name including
451 > compression suffix, and the verification is performed on the compressed
452 > file.
453 >
454 > The specification permits uncompressed Manifests to exist alongside
455 > their compressed counterparts, and multiple compressed formats
456 > to coexist. If that is the case, the files must have the same
457 > uncompressed content and the implementation is free to choose either
458 > of the files using the same base name.
459 >
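Transparent decompression by suffix might be sketched like this. The suffix list here (``.gz``, ``.bz2``, ``.xz``) is purely an assumption for illustration, since the GLEP leaves the actual list to GLEP 61, and ``open_manifest`` is a hypothetical helper. Note that the checksum in the parent ``MANIFEST`` entry applies to the compressed file, so verification has to happen before this function is called.

```python
import bz2
import gzip
import lzma
import os

# Assumed suffix -> opener mapping; the real list is outside this GLEP.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path):
    """Open a (possibly compressed) sub-Manifest as UTF-8 text."""
    _, suffix = os.path.splitext(path)
    opener = _OPENERS.get(suffix, open)
    return opener(path, "rt", encoding="utf-8")
```

Unknown suffixes fall back to a plain ``open()``, which also covers sub-Manifests with arbitrary uncompressed names.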
460 >
461 > Combining multiple Manifest trees (informational)
462 > -------------------------------------------------
463 >
464 > This specification permits nesting multiple hierarchical Manifest trees.
465 > In this layout, the specific directories of the Manifest tree can
466 > be verified both as a part of another top-level Manifest,
467 > and as an independent Manifest tree (when obtained without the parent
468 > directory).
469 >
470 > For this to work, the sub-Manifest file in the directory must also
471 > satisfy the requirements for the top-level Manifest file. That is:
472 >
473 > - it must be named ``Manifest`` and not compressed,
474 >
475 > - it must cover all the files in this directory and its subdirectories
476 > (i.e. no files from the directory tree can be covered by parent
477 > Manifest),
478 >
479 > - if authenticity verification is desired, it must be OpenPGP-signed.
480 >
481 > It should be noted that if such a directory is a subdirectory of a valid
482 > Manifest tree, the sub-Manifest needs to be valid according
483 > to the top-level Manifest and the OpenPGP signature is disregarded
484 > as detailed in `Manifest file locations and nesting`_. The top-level
485 > behavior is exhibited only when the directory is obtained without parent
486 > directories.
487 >
488 >
489 > An example Manifest file (informational)
490 > ----------------------------------------
491 >
492 > An example top-level Manifest file for the Gentoo repository would have
493 > the following content::
494 >
495 > TIMESTAMP 2017-10-30T10:11:12Z
496 > IGNORE distfiles
497 > IGNORE local
498 > IGNORE lost+found
499 > IGNORE packages
500 > MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
501 > ...
502 > MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
503 > ...
504 >
505 > An example modern Manifest (disregarding backwards compatibility)
506 > for a package directory would have the following content::
507 >
508 > DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
509 > DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
510 > DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
511 > DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
512 > DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
513 > DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
514 > DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
515 >
516 >
517 > Rationale
518 > =========
519 >
520 > Stand-alone format
521 > ------------------
522 >
523 > The first question that needed to be asked before proceeding with
524 > the design was whether the Manifest file format was supposed to be
525 > stand-alone, or tightly bound to the repository format.
526 >
527 > The stand-alone format has been selected because of its three
528 > advantages:
529 >
530 > 1. It is more future-proof. If an incompatible change to the repository
531 > format is introduced, only developers need to upgrade the tools
532 > they use to generate the Manifests. The tools used to verify
533 > the updated Manifests will continue to work.
534 >
535 > 2. It is more flexible and universal. With a dedicated tool,
536 > the Manifest files can be used to sign and verify arbitrary file
537 > sets.
538 >
539 > 3. It keeps the verification tool simpler. In particular, we can easily
540 > write an independent verification tool that could work on any
541 > distribution without needing to depend on a package manager
542 > implementation or rewrite parts of it.
543 >
544 > Designing a stand-alone format requires that the Manifest carries enough
545 > information to perform the verification following all the rules specific
546 > to the Gentoo repository.
547 >
548 >
549 > Tree design
550 > -----------
551 >
552 > The second important point of the design was determining whether
553 > the Manifest files should be structured hierarchically, or independent.
554 > Both options have their advantages.
555 >
556 > In the hierarchical model, each sub-Manifest file is covered by a higher
557 > level Manifest. As a result, only the top-level Manifest has to be
558 > OpenPGP-signed, and subsequent Manifests need to be only verified by
559 > checksum stored in the parent Manifest. This has the following
560 > implications:
561 >
562 > - Verifying any set of files in the repository requires using checksums
563 > from the most relevant Manifests and the parent Manifests.
564 >
565 > - The OpenPGP signature of the top-level Manifest needs to be verified
566 > only once per process.
567 >
568 > - Altering any set of files requires updating the relevant Manifests,
569 > and their parent Manifests up to the top-level Manifest, and signing
570 > the last one.
571 >
572 > - As a result, the top-level Manifest changes on every commit,
573 > and various middle-level Manifests change (and need to be transferred)
574 > frequently.
575 >
576 > In the independent model, each sub-Manifest file is independent
577 > of the parent Manifests. As a result, each of them needs to be signed
578 > and verified independently. However, the parent Manifests still need
579 > to list sub-Manifests (albeit without verification data) in order
580 > to detect removal or replacement of subdirectories. This has
581 > the following implications:
582 >
583 > - Verifying any set of files in the repository requires using checksums
584 > and verifying signatures of the most relevant Manifest files.
585 >
586 > - Altering any set of files requires updating the relevant Manifests
587 > and signing them again.
588 >
589 > - Parent Manifests are updated only when Manifests are added or removed
590 > from subdirectories. As a result, they change infrequently.
591 >
592 > While both models have their advantages, the hierarchical model was
593 > selected because it reduces the number of OpenPGP operations
594 > (which are comparatively costly) to the minimum.
595 >
596 >
597 > Tree layout restrictions
598 > ------------------------
599 >
600 > The algorithm is meant to work primarily with ebuild repositories which
601 > normally contain only files and directories. Directories provide
602 > no useful metadata for verification, and specifying special entries
603 > for additional file types is purposeless. Therefore, the specification
604 > is restricted to dealing with regular files.
605 >
606 > The Gentoo repository does not use symbolic links. Some Gentoo
607 > repositories do, however. To provide a simple solution for dealing with
608 > symlinks without having to take care to implement special handling for
609 > them, the common behavior of implicitly resolving them is used.
610 > Therefore, symbolic links to files are stored as if they were regular
611 > files, and symbolic links to directories are followed as if they were
612 > regular directories.
613 >
614 > Dotfiles are implicitly ignored as that is a common notion used
615 > in software written for POSIX systems. All other filenames require
616 > explicit ``IGNORE`` lines.
617 >
618 > An ability to inject additional ignore entries is provided to account
619 > for site configuration affecting the repository tree -- placing
620 > additional files in it, skipping some of the categories from syncing.
621 > This configuration can extend beyond the limits of this GLEP,
622 > e.g. by allowing wildcards or regular expressions.
623 >
624 > The algorithm is restricted to work on a single filesystem. This is
625 > mostly relevant when scanning for top-level Manifest -- we do not want
626 > to cross filesystem boundaries then. However, to ensure consistent
627 > bidirectional behavior we also need to ban crossing them when operating
628 > down the tree.
629 >
630 > The directories and files on different filesystems need to be ignored
631 > explicitly as implicitly skipping them would cause confusion.
632 > In particular, tools might then claim that a file does not exist when
633 > it clearly does because it was skipped due to filesystem boundaries.
634 >
635 >
636 > Filename character set restriction
637 > ----------------------------------
638 >
639 > The valid set of filename characters for the Gentoo repository
640 > is restricted by the devmanual 'File Naming Rules' section
641 > [#FILE-NAMING-RULES]_, and enforced via a git hook. The valid distfile
642 > names are not restricted explicitly -- however, the PMS dependency
643 > specification syntax [#PMS-FETCH]_ implicitly makes it impossible to use
644 > filenames containing whitespace.
645 >
646 > This specification aims to avoid arbitrary restrictions. For this
647 > reason, filename characters are only restricted by excluding three
648 > technically problematic groups:
649 >
650 > 1. The backwards slash character (``\``) is used as path separator
651 > on Windows systems, so it's extremely unlikely to be used in real
652 > filenames. For this reason it is used to implement character
653 > encoding with minimal risk of breaking backwards compatibility.
654 >
655 > 2. The control characters can trigger special behavior in various
656 > programs and confuse them from recognizing text files. In particular,
657 > the NULL character (``U+0000``) is normally used to indicate the end
658 > of a null-terminated string. Its use could therefore break
659 > implementations written in the C language. Other control characters
660 > could trigger various formatting routines, garbling text output.
661 >
662 > 3. Whitespace characters are used to separate Manifest fields
663 > and entries. While technically it would be enough to restrict space
664 > (``U+0020``) character that is normally used as the separator
665 > and newline (``U+000A``) character that is used to separate lines,
666 > all whitespace characters are forbidden to avoid confusion
667 > and implementation errors.
668 >
669 > Historically, Portage attempted to overcome the whitespace limitation
670 > by attempting to locate the size field and take everything before it
671 > as filename. This was terribly fragile and even if it worked, it would
672 > solve the problem only partially.
673 >
674 > To preserve compatibility with the current implementations and given
675 > that none of the listed characters are allowed in foreseeable
676 > Gentoo uses, extended encoding support is optional. If such support
677 > is not provided, the implementation must unconditionally reject any
678 > such files. Ignoring them implicitly would be confusing, and it is
679 > not possible to use them in explicit ``IGNORE`` entries.
680 >
681 > The character encoding method provides means to overcome the character
682 > restrictions to extend the tool usability beyond immediate Gentoo uses.
683 > The backslash escape form based on Python unicode strings is used
684 > since it can encode all characters within the Unicode range, the syntax
685 > is familiar to many programmers and the backwards slash character
686 > is extremely unlikely to appear in real filenames.
687 >
688 > Syntax is limited to the minimum necessary to implement the encoding.
689 > Shorthand forms (e.g. ``\t`` or ``\\``) are omitted to avoid unnecessary
690 > complexity, and to reduce the risk of shell users using backslash
691 > to escape space directly. The ``\x`` form is limited to ``\x00..\x7F``
692 > range to avoid ambiguity of higher values which might be interpreted
693 > either as UCS-2 code points or part of a UTF-8 encoded character.
694 >
695 > Encoding stores UCS-2/UCS-4 characters directly rather than hex-encoded
696 > UTF-8 string to simplify the implementation. In particular, it makes it
697 > possible to process the Manifest file as UTF-8 encoded text without
698 > having to perform additional UTF-8 decoding (and verification)
699 > of the escaped data.
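
A minimal sketch of the escape handling described above, not taken from the GLEP or from gemato. The function names `encode_path` and `decode_path` are hypothetical, and the set of characters that gets escaped (backslash, whitespace, C0/C1 control characters) is an assumption based on the restrictions listed in this section. Accepting both upper- and lowercase hex digits is also an assumption.

```python
import re

# Assumed escape forms: \xHH (00..7F only), \uHHHH and \UHHHHHHHH.
# The last alternative catches any backslash that does not start a valid escape.
_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))|(\\)"
)


def decode_path(field):
    """Decode backslash escapes in a Manifest path field."""
    def repl(match):
        if match.group(4) is not None:
            raise ValueError("stray backslash in path field")
        digits = match.group(1) or match.group(2) or match.group(3)
        value = int(digits, 16)
        if match.group(1) is not None and value > 0x7F:
            raise ValueError("\\x escapes are limited to 00..7F")
        return chr(value)  # raises ValueError beyond U+10FFFF

    return _ESCAPE.sub(repl, field)


def encode_path(path):
    """Escape characters that are assumed to be forbidden in plain form."""
    out = []
    for ch in path:
        code = ord(ch)
        needs_escape = (
            ch == "\\" or ch.isspace() or code < 0x20 or 0x7F <= code <= 0x9F
        )
        if not needs_escape:
            out.append(ch)
        elif code <= 0x7F:
            out.append("\\x%02X" % code)
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)
        else:
            out.append("\\U%08X" % code)
    return "".join(out)


print(encode_path("my file.txt"))  # my\x20file.txt
```

Because the escapes carry code points rather than UTF-8 bytes, the decoder only needs `chr()` and no second UTF-8 decoding pass.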
700 >
701 > URL-encoding was considered as an alternative. However, it could collide
702 > with ``DIST`` entries that are implicitly named after the URL filename
703 > part where URL-encoding is pretty common.
704 >
705 >
706 > File verification model
707 > -----------------------
708 >
709 > The verification model aims to provide full coverage against different
710 > forms of attack. In particular, three different kinds of manipulation
711 > are considered:
712 >
713 > 1. Alteration of the file content.
714 >
715 > 2. Removal of a file.
716 >
717 > 3. Addition of a new file.
718 >
719 > In order to protect against all three, the system requires that all
720 > files in the repository are listed in Manifests and verified against
721 > them.
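
The three cases can be illustrated with a small sketch. It assumes the Manifest entries have already been parsed into a dictionary mapping relative paths to `(size, sha512)` pairs; the function name `verify_tree`, the result labels and the choice of SHA512 are hypothetical and only serve the illustration.

```python
import hashlib
import os


def verify_tree(root, entries, ignored=frozenset()):
    """Compare files under root with entries: {relpath: (size, sha512 hex)}.

    Returns a sorted list of (kind, path) pairs, where kind is
    "modified", "missing" or "stray".
    """
    problems = []
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)

        def rel(name):
            joined = os.path.normpath(os.path.join(rel_dir, name))
            return joined.replace(os.sep, "/")

        # Do not descend into ignored directories (e.g. distfiles).
        dirnames[:] = [d for d in dirnames if rel(d) not in ignored]
        for name in filenames:
            path = rel(name)
            if path in ignored:
                continue
            seen.add(path)
            if path not in entries:
                problems.append(("stray", path))      # addition of a new file
                continue
            size, sha512 = entries[path]
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()
            if len(data) != size or hashlib.sha512(data).hexdigest() != sha512:
                problems.append(("modified", path))   # altered content
    # Entries without a matching file on disk: removal of a file.
    problems.extend(("missing", p) for p in entries if p not in seen)
    return sorted(problems)
```

A real tool would also have to exempt the Manifest files themselves and follow sub-Manifests; both are left out here.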
722 >
723 > As a special case, ignores are allowed to account for directories
724 > that are not part of the repository but were traditionally placed inside
725 > it. Those directories were ``distfiles``, ``local`` and ``packages``. It
726 > could also be used to ignore VCS directories such as ``CVS``.
727 >
728 >
729 > Non-strict Manifest verification
730 > --------------------------------
731 >
732 > Originally the Manifest2 format provided a special ``MISC`` tag that
733 > was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
734 > indicated that the Manifest verification failures could be ignored for
735 > those files unless the package manager was working in strict mode.
736 >
737 > The first versions of this specification continued the use of this tag.
738 > However, after a long debate it was decided to deprecate it along with
739 > the non-strict behavior, and require all files to strictly match.
740 >
741 > Two arguments were mentioned for the usefulness of a ``MISC`` type:
742 >
743 > 1. being able to reduce the checkout size by stripping unnecessary
744 > files out, and
745 >
746 > 2. being able to update automatically generated files locally
747 > without causing unnecessary verification failures.
748 >
749 > However, the usefulness of ``MISC`` in both cases is doubtful.
750 >
751 > The cases for stripping unnecessary files mostly focused around space
752 > savings. For this purpose, stripping ``metadata.xml`` and similar files
753 > has little value. It is much more common for users to strip whole
754 > packages or categories. The ``MISC`` type is not suitable for that,
755 > and so a dedicated package manager mechanism needs to be developed
756 > instead. The same mechanism can also handle files that historically used
757 > the ``MISC`` type. As an example, the package manager may choose
758 > to generate both the rsync exclusion list and Manifest ignore list
759 > using a single source list.
760 >
761 > The cases for autogenerated files involve such cache files
762 > as ``use.local.desc``. However, we cannot include ``md5-cache`` there
763 > due to security concerns which results in inconsistent cache handling.
764 > Furthermore, the tools were historically modified to provide stable
765 > output which means that their content cannot change without
766 > a non-``MISC`` content being changed first. This practically defeats
767 > the purpose of using ``MISC``.
768 >
769 > Finally, the non-strict mode could be used as means to an attack.
770 > The allowance of missing or modified documentation files could be used
771 > to spread misinformation, resulting in bad decisions made by the user.
772 > A modified file could also be used, e.g. to exploit vulnerabilities
773 > of an XML parser.
774 >
775 >
776 > Timestamp field
777 > ---------------
778 >
779 > The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
780 > to include a generation timestamp in the Manifest. A similar feature
781 > was originally proposed in GLEP 58 [#GLEP58]_.
782 >
783 > A malicious third party may use the principles of exclusion or replay
784 > [#C08]_ to deny an update to clients, while at the same time recording
785 > the identity of clients to attack. The timestamp field can be used to
786 > detect that.
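
As an illustration of how a client might use the field, the sketch below flags a Manifest whose timestamp is older than a chosen threshold. The ISO 8601 UTC form `YYYY-MM-DDThh:mm:ssZ`, the one-day threshold, the policy of treating a missing timestamp as stale, and the function names are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def manifest_timestamp(lines):
    """Return the TIMESTAMP value as an aware datetime, or None if absent."""
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and fields[0] == "TIMESTAMP":
            # Assumed format: 2017-12-01T11:30:29Z (UTC).
            parsed = datetime.strptime(fields[1], "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    return None


def is_stale(lines, now, max_age=timedelta(days=1)):
    """Treat a missing or too old timestamp as a possible replay."""
    stamp = manifest_timestamp(lines)
    return stamp is None or now - stamp > max_age
```

A client would call `is_stale(manifest_lines, datetime.now(timezone.utc))` after each sync. A stale result does not prove an attack, since a mirror may simply lag behind.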
787 >
788 > In order to provide more complete protection, the Gentoo Infrastructure
789 > should provide an ability to obtain the timestamps of all Manifests
790 > from a recent timeframe over a secure channel from a trusted source
791 > for comparison.
792 >
793 > Strictly speaking, this information is provided by the various
794 > ``metadata/timestamp*`` files that are already present. However,
795 > including the value in the Manifest itself has little cost
796 > and provides the ability to perform the verification stand-alone.
797 >
798 > Furthermore, some of the timestamp files are added very late
799 > in the distribution process, past the Manifest generation phase. Those
800 > files will most likely receive ``IGNORE`` entries and therefore
801 > be unsafe to use.
802 >
803 > The specification permits additional timestamps in sub-Manifest files
804 > for local use. A generic testing tool should ignore them.
805 >
806 >
807 > New vs deprecated tags
808 > ----------------------
809 >
810 > Out of the four types defined by Manifest2, only one is reused
811 > and the remaining three are replaced by a single, universal ``DATA``
812 > type.
813 >
814 > The ``DIST`` tag is reused since the specification does not change
815 > anything with regard to distfile handling.
816 >
817 > The ``EBUILD`` tag could potentially be reused for generic file
818 > verification data. However, it would be confusing if all the different
819 > data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
820 > type was introduced as a replacement.
821 >
822 > The ``MISC`` tag and the relevant non-strict mode have been removed
823 > as being of little value, as detailed in the `Non-strict Manifest
824 > verification`_ section.
825 >
826 > The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
827 > the limiting property of implicit ``files/`` path prefix.
828 >
829 >
830 > Finding top-level Manifest
831 > --------------------------
832 >
833 > The development of a reference implementation for this GLEP has brought
834 > the following problem: how to find all the relevant Manifests when
835 > the Manifest tool is run inside a subdirectory of the repository?
836 >
837 > One of the options would be to provide a bi-directional linking
838 > of Manifests via a ``PARENT`` tag. However, that would not solve
839 > the problem when a new Manifest file is being created.
840 >
841 > Instead, an algorithm for iterating over parent directories is proposed.
842 > Since there is no obligatory explicit indicator for the top-level
843 > Manifest, the algorithm assumes that the top-level Manifest
844 > is the highest ``Manifest`` in the directory hierarchy that can cover
845 > the current directory. This generally makes sense since the Manifest
846 > files are required to provide coverage for all subdirectories, so all
847 > Manifests starting from that one need to be updated.
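
The parent-directory iteration can be sketched as follows. This is a simplified illustration with a hypothetical function name: it only looks for files named `Manifest` and does not check `IGNORE` entries or whether the Manifest found actually covers the starting directory, which a full implementation would have to do.

```python
import os


def find_top_level_manifest(start_dir):
    """Return the highest-level Manifest above start_dir, or None."""
    top = None
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, "Manifest")
        if os.path.isfile(candidate):
            # Keep overwriting: the last hit is the highest in the hierarchy.
            top = candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return top
```

Walking all the way up rather than stopping at the first hit is what makes the result the top-level Manifest and not merely the nearest one.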
848 >
849 > If independent Manifest trees are nested in the directory structure,
850 > then an ``IGNORE`` entry needs to be used to separate them.
851 >
852 > Since sub-Manifests can use any filenames, the Manifest finding
853 > algorithm must not short-cut the procedure by storing all ``Manifest``
854 > files along the parent directories. Instead, it needs to retrace
855 > the relevant sub-Manifest files along ``MANIFEST`` entries
856 > in the top-level Manifest.
857 >
858 >
859 > Injecting ChangeLogs into the checkout
860 > --------------------------------------
861 >
862 > One of the problems considered in the new Manifest format was injecting
863 > historical and autogenerated ChangeLog into the repository. We normally
864 > don't include those files, to reduce the checkout size. However, some
865 > users have shown interest in them and Infra is working on providing them
866 > via an additional rsync module.
867 >
868 > If such files were injected into the repository, they would cause
869 > verification failures of Manifests. To account for this, Infra could
870 > provide ``IGNORE`` entries to allow them to exist.
871 >
872 >
873 > Splitting distfile checksums from file checksums
874 > ------------------------------------------------
875 >
876 > Another problem with the current Manifest format is that the checksums
877 > for fetched files are combined with checksums for local files
878 > in a single file inside the package directory. It has been specifically
879 > pointed out that:
880 >
881 > - since distfiles are sometimes reused across different packages,
882 > the repeating checksums are redundant [#DIST]_.
883 >
884 > - mirror admins were interested in the possibility of verifying all
885 > the distfiles with a single tool.
886 >
887 > This specification does not provide a clean solution to this problem.
888 > It technically permits moving ``DIST`` entries to higher-level Manifests
889 > but the usefulness of such a solution is doubtful.
890 >
891 > However, for the second problem we will probably deliver a dedicated
892 > tool working with this Manifest format.
893 >
894 >
895 > Hash algorithms
896 > ---------------
897 >
898 > While maintaining a consistent supported hash set is important
899 > for interoperability, it is not a good fit for the generic layout
900 > of this GLEP. Furthermore, it would require updating the GLEP
901 > in the future every time the used algorithms change.
902 >
903 > Instead, the specification focuses on listing the currently used
904 > algorithm names for interoperability, and sets a recommendation
905 > for consistent naming of algorithms in the future. The Python
906 > ``hashlib`` module is used as a reference since it is used
907 > as the provider of hash functions for most of the Python software,
908 > including Portage and PkgCore.
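
For hash names that follow the recommendation, a tool could derive the `hashlib` name directly from the Manifest name. The sketch below assumes that lowercasing the Manifest name yields the `hashlib` name, which holds for names such as `SHA512`, `BLAKE2B` or `SHA3_256` but not for historical names such as `RMD160`; the function name and the default hash set are hypothetical.

```python
import hashlib


def manifest_hashes(data, names=("BLAKE2B", "SHA512")):
    """Return alternating hash names and hex digests for the given bytes."""
    fields = []
    for name in names:
        # Assumption: the Manifest name is the uppercased hashlib name.
        digest = hashlib.new(name.lower())
        digest.update(data)
        fields += [name, digest.hexdigest()]
    return fields
```

Historical names would need a small translation table in front of `hashlib.new()`.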
909 >
910 > The basic rules for changing hash algorithms are defined in GLEP 59
911 > [#GLEP59]_. The implementations can focus only on those algorithms
912 > that are actually used or planned to be used. It may be feasible
913 > to devise a new GLEP that specifies the currently used hashes (or update
914 > GLEP 59 accordingly).
915 >
916 >
917 > Manifest compression
918 > --------------------
919 >
920 > The support for Manifest compression is introduced with minimal changes
921 > to the file format. The ``MANIFEST`` entries are required to provide
922 > the real (compressed) file path for compatibility with other file
923 > entries and to avoid confusion.
924 >
925 > The compression of top-level Manifest file has been prohibited
926 > as the specification currently does not provide any means of verifying
927 > the file prior to decompression. If the top-level Manifest is
928 > compressed, tooling will have to unpack the file before being able
929 > to verify the contents. This makes it possible for a malicious third
930 > party to attack the system by providing a compressed Manifest that
931 > exposes decompressor vulnerabilities, or a zip bomb.
932 >
933 > The OpenPGP cleartext signature covers the contents of the Manifest,
934 > and is therefore compressed along with them. The possibility of using
935 > a detached signature has been considered but it was rejected as
936 > unnecessary complexity for minor gain.
937 >
938 > Technically, a similar result could be effected via moving all the data
939 > into a compressed sub-Manifest in the top directory (e.g.
940 > ``Manifest.sub.gz``), and including a ``MANIFEST`` entry for this file
941 > in a signed, uncompressed top-level Manifest.
942 >
943 > The existence of additional entries for uncompressed Manifest checksums
944 > was debated. However, plain entries for the uncompressed file would
945 > be confusing if only the compressed file existed, and conflicting
946 > if both uncompressed and compressed variants existed. Furthermore,
947 > it has been pointed out that ``DIST`` entries do not have
948 > an uncompressed variant either.
949 >
950 >
951 > Performance considerations
952 > --------------------------
953 >
954 > Performing a full-tree verification on every sync raises some
955 > performance concerns for end-user systems. The initial testing has shown
956 > that a cold-cache verification on a btrfs file system can take around
957 > 4 minutes, with the process being mostly I/O bound. On the other hand,
958 > it can be expected that the verification will be performed directly
959 > after syncing, taking advantage of a warm filesystem cache.
960 >
961 > To improve speed on I/O and/or CPU-constrained systems even further,
962 > the algorithms can be easily extended to perform incremental
963 > verification. Given that rsync does not preserve mtimes by default,
964 > the tool can take advantage of mtime and Manifest comparisons to recheck
965 > only the parts of the repository that have changed.
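
One possible shape of the mtime part of such an incremental check is sketched below. The function name and the idea of storing the time of the last successful verification as epoch seconds are assumptions; the Manifest comparison that would catch removed files is not shown.

```python
import os


def files_to_recheck(root, last_verified):
    """Return relative paths modified after last_verified (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files rewritten by the sync get a fresh mtime.
            if os.stat(path).st_mtime > last_verified:
                changed.append(os.path.relpath(path, root))
    return sorted(changed)
```

Since mtimes can be set by anyone with write access, this only saves work after a sync and is not a defence against local tampering.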
966 >
967 > Furthermore, the package manager implementations can restrict checking
968 > only to the parts of the repository that are actually being used.
969 >
970 >
971 > Backwards Compatibility
972 > =======================
973 >
974 > This GLEP provides optional means of preserving backwards compatibility.
975 > To preserve the backwards compatibility, the following needs to hold
976 > for the ``Manifest`` file in every package directory:
977 >
978 > - all files must be covered by the single ``Manifest`` file,
979 >
980 > - all distfiles used by the package must be included,
981 >
982 > - all files inside the ``files/`` subdirectory need to use
983 > the ``AUX`` tag (rather than ``DATA``),
984 >
985 > - all ``.ebuild`` files need to use the ``EBUILD`` tag,
986 >
987 > - the ``metadata.xml`` and ``ChangeLog`` files need to use
988 > the ``MISC`` tag,
989 >
990 > - the Manifest can be signed to provide authenticity verification,
991 >
992 > - an uncompressed Manifest must always exist, and a compressed Manifest
993 > of identical content may be present.
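
The tag rules in the list above could be applied per file roughly as follows. The function name `compat_entry` is hypothetical, and paths are assumed to be relative to the package directory with `/` separators. The `files/` prefix is stripped for `AUX` because that tag implies it.

```python
def compat_entry(relpath):
    """Return (tag, path) for a file in a backwards-compatible Manifest."""
    if relpath.startswith("files/"):
        # AUX paths carry an implicit files/ prefix.
        return ("AUX", relpath[len("files/"):])
    if relpath.endswith(".ebuild"):
        return ("EBUILD", relpath)
    if relpath in ("metadata.xml", "ChangeLog"):
        return ("MISC", relpath)
    # Anything else has no Manifest2 equivalent and stays DATA.
    return ("DATA", relpath)


print(compat_entry("files/fix-build.patch"))  # ('AUX', 'fix-build.patch')
```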
994 >
995 > Once the backwards compatibility is no longer a concern, the above
996 > no longer needs to hold and the deprecated tags can be removed.
997 >
998 >
999 > Reference Implementation
1000 > ========================
1001 >
1002 > The reference implementation for this GLEP is being developed
1003 > as the gemato project [#GEMATO]_.
1004 >
1005 >
1006 > Credits
1007 > =======
1008 >
1009 > Thanks to all the people whose contributions were invaluable
1010 > to the creation of this GLEP. This includes but is not limited to:
1011 >
1012 > - Robin Hugh Johnson,
1013 > - Ulrich Müller.
1014 >
1015 > Additionally, thanks to Robin Hugh Johnson for the original
1016 > MetaManifest GLEP series which served both as inspiration and source
1017 > of many concepts used in this GLEP. Recursively, also thanks to all
1018 > the people who contributed to the original GLEPs.
1019 >
1020 >
1021 > References
1022 > ==========
1023 >
1024 > .. [#GLEP44] GLEP 44: Manifest2 format
1025 > (https://www.gentoo.org/glep/glep-0044.html)
1026 >
1027 > .. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
1028 > - Overview
1029 > (https://www.gentoo.org/glep/glep-0057.html)
1030 >
1031 > .. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
1032 > - Infrastructure to User distribution - MetaManifest
1033 > (https://www.gentoo.org/glep/glep-0058.html)
1034 >
1035 > .. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
1036 > (https://www.gentoo.org/glep/glep-0059.html)
1037 >
1038 > .. [#GLEP60] GLEP 60: Manifest2 filetypes
1039 > (https://www.gentoo.org/glep/glep-0060.html)
1040 >
1041 > .. [#GLEP61] GLEP 61: Manifest2 compression
1042 > (https://www.gentoo.org/glep/glep-0061.html)
1043 >
1044 > .. [#UNICODE] The Unicode standard
1045 > (https://unicode.org/versions/latest/)
1046 >
1047 > .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
1048 > Format - SRC_URI
1049 > (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)
1050 >
1051 > .. [#FILE-NAMING-RULES] Ebuild File Format -- Gentoo Development Guide
1052 > (https://devmanual.gentoo.org/ebuild-writing/file-format/#file-naming-rules)
1053 >
1054 > .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
1055 > (https://www.ietf.org/rfc/rfc1321.txt)
1056 >
1057 > .. [#RIPEMD160] The hash function RIPEMD-160
1058 > (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)
1059 >
1060 > .. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
1061 > (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)
1062 >
1063 > .. [#WHIRLPOOL] The WHIRLPOOL Hash Function
1064 > (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)
1065 >
1066 > .. [#BLAKE2] BLAKE2 -- fast secure hashing
1067 > (https://blake2.net/)
1068 >
1069 > .. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
1070 > and Extendable-Output Functions
1071 > (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)
1072 >
1073 > .. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
1074 > (https://www.streebog.net/)
1075 >
1076 > .. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
1077 > (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)
1078 >
1079 > .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
1080 > at the time of writing are duplicate, representing 2 MiB
1081 > out of 25 MiB of DIST entries altogether.
1082 >
1083 > .. [#GEMATO] gemato: Gentoo Manifest Tool
1084 > (https://github.com/mgorny/gemato/)
1085 >
1086 >
1087 > Copyright
1088 > =========
1089 > This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
1090 > Unported License. To view a copy of this license, visit
1091 > http://creativecommons.org/licenses/by-sa/3.0/.
1092 >
1093 > --
1094 > Best regards,
1095 > Michał Górny
1096 >
1097 >
1098
1099 --
1100 Fabian Groffen
1101 Gentoo on a different level