1 |
Hi, everyone. |
2 |
|
3 |
After a week of hard work, I'd like to request your comments |
4 |
on the draft of GLEP 74. This GLEP aims to replace the old tree-signing |
5 |
GLEPs 58 and 60 with a superior implementation and more complete |
6 |
specification. |
7 |
|
8 |
The original tree-signing GLEPs were accepted a few years back but they |
9 |
have never been implemented. This specification, on the other hand, |
10 |
comes with a working reference implementation for the verification |
11 |
algorithm. I expect to finish the update/generation part in a few days, |
12 |
then work on additional optimizations (threading, incremental |
13 |
verification, incremental updates). |
14 |
|
15 |
ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst |
16 |
HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html |
17 |
impl: https://github.com/mgorny/gemato/ |
18 |
|
19 |
Full text following for inline comments. |
20 |
|
21 |
|
22 |
--- |
23 |
GLEP: 74 |
24 |
Title: Full-tree verification using Manifest files |
25 |
Author: Michał Górny <mgorny@g.o> |
26 |
Type: Standards Track |
27 |
Status: Draft |
28 |
Version: 1 |
29 |
Created: 2017-10-21 |
30 |
Last-Modified: 2017-10-26 |
31 |
Post-History: 2017-10-26 |
32 |
Content-Type: text/x-rst |
33 |
Requires: 59, 61 |
34 |
Replaces: 44, 58, 60 |
35 |
--- |
36 |
|
37 |
Abstract |
38 |
======== |
39 |
|
40 |
This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof,
efficient and provide means of backwards compatibility.
43 |
|
44 |
|
45 |
Motivation |
46 |
========== |
47 |
|
48 |
The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current |
49 |
means of verifying the integrity of distfiles and package files |
50 |
in Gentoo. Combined with OpenPGP signatures, they provide means to |
51 |
ensure the authenticity of the covered files. However, as noted |
52 |
in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree |
53 |
authenticity verification as they do not cover any files outside |
54 |
the package directory. In particular, they provide multiple ways |
55 |
for a third party to inject malicious code into the ebuild environment. |
56 |
|
57 |
Historically, the topic of providing authenticity coverage for the whole |
58 |
repository has been mentioned multiple times. The most noteworthy efforts
are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
60 |
They were accepted by the Council in 2010 but have never been |
61 |
implemented. When potential implementation work started in 2017, a new |
62 |
discussion about the specification arose. It prompted the creation |
63 |
of a competing GLEP that would provide a redesigned alternative to |
64 |
the old GLEPs. |
65 |
|
66 |
This specification is designed with the following goals in mind: |
67 |
|
68 |
1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. Like the original Manifest2, the files should be split into two
   groups: files whose authenticity is critical, and those whose
   mismatch may be accepted in non-strict mode. The same classification
   should apply both to files listed in Manifests, and to stray files
   present only in the repository.

3. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

4. The Manifest files should be verifiable stand-alone, that is without
   knowing any details about the underlying repository format.
82 |
|
83 |
|
84 |
Specification |
85 |
============= |
86 |
|
87 |
Manifest file format |
88 |
-------------------- |
89 |
|
90 |
This specification reuses and extends the Manifest file format defined |
91 |
in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
92 |
repurposed as a generic *tag* that could also indicate additional |
93 |
(non-checksum) metadata. Appropriately, those tags can be followed by |
94 |
other space-separated values. |
95 |
|
96 |
Unless specified otherwise, the paths used in the Manifest files |
97 |
are relative to the directory containing the Manifest file. The paths |
98 |
must not reference the parent directory (``..``). |
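
As an illustration of the entry layout described above, the following Python
sketch splits one checksum-carrying line into its tag, path, size and
checksums, and rejects paths that reference the parent directory. It assumes
the GLEP 44 layout of alternating algorithm names and hex digests after
the size field. The ``Entry`` type and the ``parse_entry()`` name are
hypothetical and not part of this specification.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    tag: str
    path: str
    size: int
    checksums: Dict[str, str]


def parse_entry(line: str) -> Entry:
    """Parse one checksum-carrying Manifest line (e.g. DATA, MISC, MANIFEST)."""
    fields = line.split()
    # Expect: tag, path, size, then (algorithm, digest) pairs.
    if len(fields) < 3 or len(fields) % 2 == 0:
        raise ValueError(f"malformed entry: {line!r}")
    tag, path, size = fields[0], fields[1], int(fields[2])
    # Paths are relative to the Manifest directory and must not use "..".
    if ".." in path.split("/"):
        raise ValueError(f"path references parent directory: {path!r}")
    # Remaining fields alternate between algorithm name and hex digest.
    checksums = dict(zip(fields[3::2], fields[4::2]))
    return Entry(tag, path, size, checksums)
```

Tags that carry no size or checksums (``IGNORE``, ``OPTIONAL``,
``TIMESTAMP``) would need separate handling, which this sketch omits.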
99 |
|
100 |
|
101 |
Manifest file locations and nesting |
102 |
----------------------------------- |
103 |
|
104 |
The ``Manifest`` file located in the root directory of the repository |
105 |
is called top-level Manifest, and it is used to perform the full-tree |
106 |
verification. In order to verify the authenticity, it must be signed |
107 |
using OpenPGP, using the armored cleartext format. |
108 |
|
109 |
The top-level Manifest may reference sub-Manifests contained |
110 |
in subdirectories of the repository. The sub-Manifests are traditionally |
111 |
named ``Manifest``; however, the implementation must support arbitrary |
112 |
names, including the possibility of multiple (split) Manifests |
113 |
for a single directory. The sub-Manifest can only cover the files inside |
114 |
the directory tree where it resides. |
115 |
|
116 |
The sub-Manifest can also be signed using OpenPGP armored cleartext |
117 |
format. However, the signature verification can be omitted if it is |
118 |
covered by a signed top-level Manifest. |
119 |
|
120 |
The Manifest files can also specify ``IGNORE`` entries to skip Manifest |
121 |
verification of subdirectories and/or files. Files and directories |
122 |
starting with a dot are always implicitly ignored. All files that |
123 |
are not ignored must be covered by at least one of the Manifests. |
124 |
|
125 |
A single file may be matched by multiple identical or equivalent |
126 |
Manifest entries, if and only if the entries have the same semantics, |
127 |
specify the same size and the checksums common to both entries match. |
128 |
It is an error for a single file to be matched by multiple entries |
129 |
of different semantics, file size or checksum values. It is an error |
130 |
to specify another entry for a file matching ``IGNORE``, or one of its |
131 |
subdirectories. |
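
The rule for multiple entries matching one file can be sketched as a small
compatibility check. This is only one reading of the paragraph above: entries
are represented as plain dictionaries, the deprecated ``EBUILD`` and ``AUX``
tags are treated as having ``DATA`` semantics, and paths are assumed to be
already normalized. The function name is hypothetical.

```python
# Deprecated tags are assumed to share the semantics of DATA.
SEMANTICS = {"EBUILD": "DATA", "AUX": "DATA"}


def entries_compatible(a: dict, b: dict) -> bool:
    """Return True if two entries for the same file may coexist."""
    tag_a = SEMANTICS.get(a["tag"], a["tag"])
    tag_b = SEMANTICS.get(b["tag"], b["tag"])
    if tag_a != tag_b or a["size"] != b["size"]:
        return False
    # Only the checksums present in both entries have to agree.
    common = set(a["checksums"]) & set(b["checksums"])
    return all(a["checksums"][k] == b["checksums"][k] for k in common)
```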
132 |
|
133 |
The file entries (except for ``IGNORE``) can be specified for regular |
134 |
files only. Symbolic links are followed when opening files. It is |
135 |
an error to specify an entry for a different file type. |
136 |
|
137 |
All the files covered by a Manifest tree must reside on the same |
138 |
filesystem. It is an error to specify entries applying to files |
139 |
on another filesystem. If subdirectories of the Manifest tree reside |
140 |
on a different filesystem, they must be explicitly excluded |
141 |
via ``IGNORE``. |
142 |
|
143 |
|
144 |
File verification |
145 |
----------------- |
146 |
|
147 |
When verifying a file against the Manifest, the following rules are |
148 |
used: |
149 |
|
150 |
- if a file listed in Manifest is not present, then the verification |
151 |
for the file fails, |
152 |
|
153 |
- if a file listed in Manifest is present but has a different size |
154 |
or one of the checksums does not match, the verification fails, |
155 |
|
156 |
- if a file is present but not listed in Manifest, the verification |
157 |
fails, |
158 |
|
159 |
- otherwise, the verification succeeds. |
160 |
|
161 |
Unless specified otherwise, the package manager must not allow using |
162 |
any files for which the verification failed. The package manager may |
163 |
reject any package or even the whole repository if it may refer to files |
164 |
for which the verification failed. |
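
The per-file check for a listed file could look roughly like the sketch below.
It assumes that the checksums are given under Python ``hashlib`` algorithm
names (mapping from Manifest names is left to the caller), and it covers only
the first two rules. Detecting files that are present but not listed requires
knowledge of the whole tree and is not handled here. The ``verify_file()``
name is hypothetical.

```python
import hashlib
import os


def verify_file(path, expected_size, checksums):
    """Return True if `path` matches the expected size and all checksums.

    `checksums` maps hashlib algorithm names (e.g. "sha256") to hex digests.
    """
    # os.path.isfile() follows symbolic links, as the specification requires.
    if not os.path.isfile(path):
        return False  # listed but missing, or not a regular file
    if os.path.getsize(path) != expected_size:
        return False
    hashers = {name: hashlib.new(name) for name in checksums}
    # Read the file once and feed every hash in the same pass.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return all(h.hexdigest() == checksums[name].lower()
               for name, h in hashers.items())
```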
165 |
|
166 |
|
167 |
New Manifest tags |
168 |
----------------- |
169 |
|
170 |
The Manifest files can specify the following tags: |
171 |
|
172 |
``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO8601 extended format
  combined date and time in UTC timezone, i.e. using the following
  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
  in the top-level Manifest file. The package manager can use it
  to detect an outdated repository checkout.

``MANIFEST <path> <size> <checksums>…``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Ignores a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (always pass).

``DATA <path> <size> <checksums>…``
  Specifies a file subject to obligatory Manifest verification.
  The file is required to pass verification. Used for all files directly
  affecting package manager operation (ebuilds, eclasses, profiles).

``MISC <path> <size> <checksums>…``
  Specifies a file subject to non-obligatory Manifest verification.
  The package manager may ignore a verification failure if operating
  in non-strict mode. Used for files that do not affect the installed
  packages (``metadata.xml``, ``use.desc``).

``OPTIONAL <path>``
  Specifies a file that would be subject to non-obligatory Manifest
  verification if it existed. The package manager may ignore a stray file
  matching this entry if operating in non-strict mode. Used for paths
  that would match ``MISC`` if they existed.

``DIST <filename> <size> <checksums>…``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.
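
For the ``TIMESTAMP`` tag, the format string given above can be used directly
with Python's ``datetime`` module. The following sketch parses a timestamp and
compares its age against a threshold. The threshold and the function names are
assumptions made for illustration; this specification does not define when
a checkout counts as outdated.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a TIMESTAMP value into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def is_outdated(value: str, now: datetime, max_age_seconds: int) -> bool:
    """Return True if the Manifest is older than the accepted age."""
    return (now - parse_timestamp(value)).total_seconds() > max_age_seconds
```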
215 |
|
216 |
|
217 |
Deprecated Manifest tags |
218 |
------------------------ |
219 |
|
220 |
For backwards compatibility, the following tags are additionally |
221 |
allowed at the package directory level: |
222 |
|
223 |
``EBUILD <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type.

``AUX <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to ``files/`` subdirectory.
229 |
|
230 |
|
231 |
Algorithm for full-tree verification |
232 |
------------------------------------ |
233 |
|
234 |
In order to perform full-tree verification, the following algorithm |
235 |
can be used: |
236 |
|
237 |
1. Collect all files present in the repository into *present* set. |
238 |
|
239 |
2. Start at the top-level Manifest file. Verify its OpenPGP signature. |
240 |
Optionally verify the ``TIMESTAMP`` entry if present. Remove |
241 |
the top-level Manifest from the *present* set. |
242 |
|
243 |
3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest |
244 |
files according to `file verification`_ section, and include their |
245 |
entries in the current Manifest entry list (using paths relative |
246 |
to directories containing the Manifests). |
247 |
|
248 |
4. Process all ``IGNORE`` entries. Remove any paths matching them |
249 |
from the *present* set. |
250 |
|
251 |
5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``, |
252 |
``EBUILD`` and ``AUX`` entries into the *covered* set. |
253 |
|
254 |
6. Verify all the files in the union of the *present* and *covered* |
255 |
sets, according to `file verification`_ section. |
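
Steps 4 to 6 can be sketched with plain Python sets. The sketch below works
in strict mode and assumes that steps 2 and 3 have already been performed:
the Manifest files themselves are removed from *present*, and all entries are
flattened into one mapping with repository-relative paths. The data layout
and the function names are hypothetical. Entries that conflict with
``IGNORE`` are not diagnosed here.

```python
def is_ignored(path, ignores):
    """True for dotfiles, dot-directories and anything under an IGNORE path."""
    if any(part.startswith(".") for part in path.split("/")):
        return True
    return any(path == ig or path.startswith(ig + "/") for ig in ignores)


def full_tree_failures(present, ignores, covered, verify):
    """Return a dict mapping each failing path to a short reason.

    present: set of repository-relative paths found on disk
    ignores: iterable of paths taken from IGNORE entries
    covered: dict mapping path -> tag ("DATA", "MISC", "OPTIONAL", ...)
    verify:  callable(path) -> bool doing the size and checksum check
    """
    # Step 4: drop ignored paths from the present set.
    present = {p for p in present if not is_ignored(p, ignores)}
    failures = {}
    # Step 6: check the union of the present and covered sets.
    for path in sorted(present | set(covered)):
        if path not in covered:
            failures[path] = "stray file"
        elif covered[path] == "OPTIONAL":
            # An OPTIONAL path may be absent; if present it is a stray file.
            if path in present:
                failures[path] = "stray file (OPTIONAL)"
        elif path not in present:
            failures[path] = "missing"
        elif not verify(path):
            failures[path] = "mismatch"
    return failures
```

A non-strict implementation could downgrade the failures reported for
``MISC`` and ``OPTIONAL`` entries to warnings.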
256 |
|
257 |
|
258 |
Algorithm for finding parent Manifests |
259 |
-------------------------------------- |
260 |
|
261 |
In order to find the top-level Manifest from the current directory |
262 |
the following algorithm can be used: |
263 |
|
264 |
1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different from *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of the parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root system directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.
280 |
|
281 |
Once the algorithm stops, *last_found* will contain the relevant |
282 |
top-level Manifest. If *last_found* is null, then the directory tree |
283 |
does not contain any valid top-level Manifest candidates and one should |
284 |
be created in the *original* directory. |
285 |
|
286 |
Once the top-level Manifest is found, its ``MANIFEST`` entries should |
287 |
be used to find any sub-Manifests below the top-level Manifest, |
288 |
up to and including the *original* directory. Note that those |
289 |
sub-Manifests can use different filenames than ``Manifest``. |
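
The upward search can be sketched in Python as follows. The check for
``IGNORE`` entries is delegated to a caller-supplied callback, since it
requires parsing the Manifest. That callback, the function name and the
restriction to the uncompressed ``Manifest`` filename are simplifications
made for this illustration.

```python
import os


def find_top_level_manifest(start, ignored_by=lambda manifest_dir, original: False):
    """Return the directory holding the top-level Manifest, or None.

    ignored_by(manifest_dir, original) should return True if the Manifest
    in `manifest_dir` has an IGNORE entry covering `original`.
    """
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev          # step 1
    current = original
    last_found = None
    while True:
        if os.stat(current).st_dev != startdev:  # step 2
            break
        if os.path.isfile(os.path.join(current, "Manifest")):  # step 3
            if ignored_by(current, original):
                break
            last_found = current
        parent = os.path.dirname(current)
        if parent == current:                    # step 4: reached the root
            break
        current = parent                         # step 5
    return last_found
```

A return value of ``None`` corresponds to the case where *last_found* is
null and a new top-level Manifest should be created in the *original*
directory.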
290 |
|
291 |
|
292 |
Checksum algorithms |
293 |
------------------- |
294 |
|
295 |
This section is informational only. Specifying the exact set |
296 |
of supported algorithms is outside the scope of this specification. |
297 |
|
298 |
The algorithm names reserved at the time of writing are: |
299 |
|
300 |
- ``MD5`` [#MD5]_, |
301 |
- ``RMD160`` — RIPEMD-160 [#RIPEMD160]_, |
302 |
- ``SHA1`` [#SHS]_, |
303 |
- ``SHA256`` and ``SHA512`` — SHA-2 family of hashes [#SHS]_, |
304 |
- ``WHIRLPOOL`` [#WHIRLPOOL]_, |
305 |
- ``BLAKE2B`` and ``BLAKE2S`` — BLAKE2 family of hashes [#BLAKE2]_, |
306 |
- ``SHA3_256`` and ``SHA3_512`` — SHA-3 family of hashes [#SHA3]_, |
307 |
- ``STREEBOG256`` and ``STREEBOG512`` — Streebog family of hashes |
308 |
[#STREEBOG]_. |
309 |
|
310 |
The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_. |
311 |
It is recommended that any new hashes are named after the Python |
312 |
``hashlib`` module algorithm names, transformed into uppercase. |
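
The naming recommendation amounts to a simple case transformation, as
the sketch below shows. The ``LEGACY`` table lists the one reserved name
above that is not a plain uppercase form of a ``hashlib`` name
(``RMD160``). Whether a given algorithm is actually available depends on
the Python build and its OpenSSL library, and some reserved names (such as
the Streebog family) may have no ``hashlib`` counterpart at all. Both
function names are hypothetical.

```python
import hashlib

# Reserved Manifest names that differ from the uppercase hashlib name.
LEGACY = {"RMD160": "ripemd160"}


def manifest_name(hashlib_algorithm: str) -> str:
    """Recommended Manifest name for a new hashlib algorithm."""
    return hashlib_algorithm.upper()


def hashlib_name(manifest_algorithm: str) -> str:
    """Best-guess hashlib name for a Manifest algorithm name."""
    return LEGACY.get(manifest_algorithm, manifest_algorithm.lower())
```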
313 |
|
314 |
|
315 |
Manifest compression |
316 |
-------------------- |
317 |
|
318 |
The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_. |
319 |
This section merely addresses interoperability issues between Manifest |
320 |
compression and this specification. |
321 |
|
322 |
The compressed Manifest files are required to be suffixed for their |
323 |
compression algorithm. This suffix should be used to recognize |
324 |
the compression and decompress Manifests transparently. The exact list |
325 |
of algorithms and their corresponding suffixes are outside the scope |
326 |
of this specification. |
327 |
|
328 |
Whenever this specification refers to the top-level Manifest file,
329 |
the implementation should account for compressed variants of this file |
330 |
with appropriate suffixes (e.g. ``Manifest.gz``). |
331 |
|
332 |
Whenever this specification refers to sub-Manifests, they can use any |
333 |
names but are also required to use a specific compression suffix. |
334 |
The ``MANIFEST`` entries are required to specify the full name including |
335 |
compression suffix, and the verification is performed on the compressed |
336 |
file. |
337 |
|
338 |
The specification permits uncompressed Manifests to exist alongside |
339 |
their compressed counterparts, and multiple compressed formats |
340 |
to coexist. If that is the case, the files must have the same |
341 |
uncompressed content and the implementation is free to choose either
342 |
of the files using the same base name. |
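
Transparent decompression by suffix could be implemented along the lines of
the sketch below. The suffix table is an assumption chosen for illustration,
since the list of algorithms and suffixes is outside the scope of this
specification. Note that checksum verification of a ``MANIFEST`` entry is
performed on the compressed file, so it has to happen before the file is
opened this way.

```python
import bz2
import gzip
import lzma

# Hypothetical suffix table; the real list is defined elsewhere.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path):
    """Open a Manifest for reading as text, decompressing by suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```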
343 |
|
344 |
|
345 |
Rationale |
346 |
========= |
347 |
|
348 |
Stand-alone format |
349 |
------------------ |
350 |
|
351 |
The first question that needed to be asked before proceeding with |
352 |
the design was whether the Manifest file format was supposed to be |
353 |
stand-alone, or tightly bound to the repository format. |
354 |
|
355 |
The stand-alone format has been selected because of its three |
356 |
advantages: |
357 |
|
358 |
1. It is more future-proof. If an incompatible change to the repository |
359 |
format is introduced, only developers need to upgrade the tools
360 |
they use to generate the Manifests. The tools used to verify |
361 |
the updated Manifests will continue to work. |
362 |
|
363 |
2. It is more flexible and universal. With a dedicated tool, |
364 |
the Manifest files can be used to sign and verify arbitrary file |
365 |
sets. |
366 |
|
367 |
3. It keeps the verification tool simpler. In particular, we can easily |
368 |
write an independent verification tool that could work on any |
369 |
distribution without needing to depend on a package manager |
370 |
implementation or rewrite parts of it. |
371 |
|
372 |
Designing a stand-alone format requires that the Manifest carries enough |
373 |
information to perform the verification following all the rules specific |
374 |
to the Gentoo repository. |
375 |
|
376 |
|
377 |
Tree design |
378 |
----------- |
379 |
|
380 |
The second important point of the design was determining whether |
381 |
the Manifest files should be structured hierarchically or independently.
382 |
Both options have their advantages. |
383 |
|
384 |
In the hierarchical model, each sub-Manifest file is covered by a higher |
385 |
level Manifest. As a result, only the top-level Manifest has to be |
386 |
OpenPGP-signed, and subsequent Manifests need to be only verified by |
387 |
checksum stored in the parent Manifest. This has the following |
388 |
implications: |
389 |
|
390 |
- Verifying any set of files in the repository requires using checksums |
391 |
from the most relevant Manifests and the parent Manifests. |
392 |
|
393 |
- The OpenPGP signature of the top-level Manifest needs to be verified |
394 |
only once per process. |
395 |
|
396 |
- Altering any set of files requires updating the relevant Manifests, |
397 |
and their parent Manifests up to the top-level Manifest, and signing |
398 |
the last one. |
399 |
|
400 |
- As a result, the top-level Manifest changes on every commit, |
401 |
and various middle-level Manifests change (and need to be transferred) |
402 |
frequently. |
403 |
|
404 |
In the independent model, each sub-Manifest file is independent |
405 |
of the parent Manifests. As a result, each of them needs to be signed |
406 |
and verified independently. However, the parent Manifests still need |
407 |
to list sub-Manifests (albeit without verification data) in order |
408 |
to detect removal or replacement of subdirectories. This has |
409 |
the following implications: |
410 |
|
411 |
- Verifying any set of files in the repository requires using checksums |
412 |
and verifying signatures of the most relevant Manifest files. |
413 |
|
414 |
- Altering any set of files requires updating the relevant Manifests |
415 |
and signing them again. |
416 |
|
417 |
- Parent Manifests are updated only when Manifests are added or removed |
418 |
from subdirectories. As a result, they change infrequently. |
419 |
|
420 |
While both models have their advantages, the hierarchical model was
selected because it reduces the number of OpenPGP operations,
which are comparatively costly, to the minimum.
423 |
|
424 |
|
425 |
Tree layout restrictions |
426 |
------------------------ |
427 |
|
428 |
The algorithm is meant to work primarily with ebuild repositories which |
429 |
normally contain only files and directories. Directories provide |
430 |
no useful metadata for verification, and specifying special entries |
431 |
for additional file types is purposeless. Therefore, the specification |
432 |
is restricted to dealing with regular files. |
433 |
|
434 |
The Gentoo repository does not use symbolic links. Some Gentoo |
435 |
repositories do, however. To provide a simple solution for dealing with |
436 |
symlinks without having to take care to implement special handling for |
437 |
them, the common behavior of implicitly resolving them is used. |
438 |
Therefore, symbolic links to files are stored as if they were regular |
439 |
files, and symbolic links to directories are followed as if they were |
440 |
regular directories. |
441 |
|
442 |
Dotfiles are implicitly ignored as that is a common notion used |
443 |
in software written for POSIX systems. All other filenames require |
444 |
explicit ``IGNORE`` lines. |
445 |
|
446 |
The algorithm is restricted to work on a single filesystem. This is |
447 |
mostly relevant when scanning for top-level Manifest — we do not want |
448 |
to cross filesystem boundaries then. However, to ensure consistent |
449 |
bidirectional behavior we need to also ban them when operating downwards |
450 |
the tree. |
451 |
|
452 |
The directories and files on different filesystems need to be ignored
453 |
explicitly as implicitly skipping them would cause confusion. |
454 |
In particular, tools might then claim that a file does not exist when |
455 |
it clearly does because it was skipped due to filesystem boundaries. |
456 |
|
457 |
|
458 |
File verification model |
459 |
----------------------- |
460 |
|
461 |
The verification model aims to provide full coverage against different |
462 |
forms of attack. In particular, three different kinds of manipulation |
463 |
are considered: |
464 |
|
465 |
1. Alteration of the file content. |
466 |
|
467 |
2. Removal of a file. |
468 |
|
469 |
3. Addition of a new file. |
470 |
|
471 |
In order to protect against all three, the system requires that all
472 |
files in the repository are listed in Manifests and verified against |
473 |
them. |
474 |
|
475 |
As a special case, ignores are allowed to account for directories |
476 |
that are not part of the repository but were traditionally placed inside |
477 |
it. Those directories were ``distfiles``, ``local`` and ``packages``. It |
478 |
could be also used to ignore VCS directories such as ``CVS``. |
479 |
|
480 |
|
481 |
Non-obligatory Manifest verification |
482 |
------------------------------------ |
483 |
|
484 |
While this specification recommends all tools to use strict verification |
485 |
by default, it allows declaring some files as non-obligatory like |
486 |
the original Manifest2 format did. This could be used on files that do |
487 |
not affect the normal package manager operation. |
488 |
|
489 |
It aims to account for two use cases: |
490 |
|
491 |
1. Stripping down files that are not strictly required to install |
492 |
packages from repository checkouts. |
493 |
|
494 |
2. Accounting for automatically generated files that might be updated |
495 |
by standard tooling. |
496 |
|
497 |
The traditional ``MISC`` type is amended with a complementary |
498 |
``OPTIONAL`` tag to account for files that are not provided |
499 |
in the specific repository. It aims to avoid the inconsistency where
the same path would be non-fatal when provided by the repository but
fatal when created by the user tooling.
502 |
|
503 |
|
504 |
Timestamp field |
505 |
--------------- |
506 |
|
507 |
The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
508 |
to include a generation timestamp in the Manifest. A similar feature |
509 |
was originally proposed in GLEP 58 [#GLEP58]_. |
510 |
|
511 |
The timestamp can be used to detect delay or replay attacks against |
512 |
Gentoo mirrors. |
513 |
|
514 |
Strictly speaking, this is already provided by the various |
515 |
``metadata/timestamp.*`` files provided already by Gentoo which are also |
516 |
covered by the Manifest. However, including the value in the Manifest |
517 |
itself has little cost and provides the ability to perform
518 |
the verification stand-alone. |
519 |
|
520 |
|
521 |
New vs deprecated tags |
522 |
---------------------- |
523 |
|
524 |
Out of the four types defined by Manifest2, two are reused and two are |
525 |
marked deprecated. |
526 |
|
527 |
The ``DIST`` and ``MISC`` tags are reused since they can be relatively
clearly mapped onto the new concept.
529 |
|
530 |
The ``EBUILD`` tag could potentially be reused for generic file |
531 |
verification data. However, it would be confusing if all the different |
532 |
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA`` |
533 |
type was introduced as a replacement. |
534 |
|
535 |
The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has |
536 |
the limiting property of implicit ``files/`` path prefix. |
537 |
|
538 |
|
539 |
Finding top-level Manifest |
540 |
-------------------------- |
541 |
|
542 |
The development of a reference implementation for this GLEP has brought |
543 |
the following problem: how to find all the relevant Manifests when |
544 |
the Manifest tool is run inside a subdirectory of the repository? |
545 |
|
546 |
One of the options would be to provide a bi-directional linking |
547 |
of Manifests via a ``PARENT`` tag. However, that would not solve |
548 |
the problem when a new Manifest file is being created. |
549 |
|
550 |
Instead, an algorithm for iterating over parent directories is proposed. |
551 |
Since there is no obligatory explicit indicator for the top-level |
552 |
Manifest, the algorithm assumes that the top-level Manifest |
553 |
is the highest ``Manifest`` in the directory hierarchy that can cover |
554 |
the current directory. This generally makes sense since the Manifest |
555 |
files are required to provide coverage for all subdirectories, so all |
556 |
Manifests starting from that one need to be updated. |
557 |
|
558 |
If independent Manifest trees are nested in the directory structure, |
559 |
then an ``IGNORE`` entry needs to be used to separate them. |
560 |
|
561 |
Since sub-Manifests can use any filenames, the Manifest finding |
562 |
algorithm must not short-cut the procedure by storing all ``Manifest`` |
563 |
files along the parent directories. Instead, it needs to retrace |
564 |
the relevant sub-Manifest files along ``MANIFEST`` entries |
565 |
in the top-level Manifest. |
566 |
|
567 |
|
568 |
Injecting ChangeLogs into the checkout |
569 |
-------------------------------------- |
570 |
|
571 |
One of the problems considered in the new Manifest format was that |
572 |
of injecting historical and autogenerated ChangeLogs into the repository.
573 |
Normally we are not including those files to reduce the checkout size. |
574 |
However, some users have shown interest in them and Infra is working |
575 |
on providing them via an additional rsync module. |
576 |
|
577 |
If such files were injected into the repository, they would cause strict |
578 |
verification failures of Manifests. To account for this, Infra could |
579 |
provide either ``OPTIONAL`` entries for the Manifest files to allow them |
580 |
in non-strict verification mode, or ``IGNORE`` entries to allow them |
581 |
in the strict mode. |
582 |
|
583 |
|
584 |
Splitting distfile checksums from file checksums |
585 |
------------------------------------------------ |
586 |
|
587 |
Another problem with the current Manifest format is that the checksums |
588 |
for fetched files are combined with checksums for local files |
589 |
in a single file inside the package directory. It has been specifically |
590 |
pointed out that: |
591 |
|
592 |
- since distfiles are sometimes reused across different packages, |
593 |
the repeating checksums are redundant, |
594 |
|
595 |
- mirror admins were interested in the possibility of verifying all |
596 |
the distfiles with a single tool. |
597 |
|
598 |
This specification does not provide a clean solution to this problem. |
599 |
It technically permits moving ``DIST`` entries to higher-level Manifests |
600 |
but the usefulness of such a solution is doubtful. |
601 |
|
602 |
However, for the second problem we will probably deliver a dedicated |
603 |
tool working with this Manifest format. |
604 |
|
605 |
|
606 |
Hash algorithms |
607 |
--------------- |
608 |
|
609 |
While maintaining a consistent supported hash set is important |
610 |
for interoperability, it is not a good fit for the generic layout of this
611 |
GLEP. Furthermore, it would require updating the GLEP in the future |
612 |
every time the used algorithms change. |
613 |
|
614 |
Instead, the specification focuses on listing the currently used |
615 |
algorithm names for interoperability, and sets a recommendation |
616 |
for consistent naming of algorithms in the future. The Python |
617 |
``hashlib`` module is used as a reference since it is used |
618 |
as the provider of hash functions for most of the Python software, |
619 |
including Portage and PkgCore. |
620 |
|
621 |
The basic rules for changing hash algorithms are defined in GLEP 59 |
622 |
[#GLEP59]_. The implementations can focus only on those algorithms |
623 |
that are actually used or planned on being used. It may be feasible |
624 |
to devise a new GLEP that specifies the currently used hashes (or update |
625 |
GLEP 59 accordingly). |
626 |
|
627 |
|
628 |
Manifest compression |
629 |
-------------------- |
630 |
|
631 |
The support for Manifest compression is introduced with minimal changes |
632 |
to the file format. The ``MANIFEST`` entries are required to provide |
633 |
the real (compressed) file path for compatibility with other file |
634 |
entries and to avoid confusion. |
635 |
|
636 |
The existence of additional entries for uncompressed Manifest checksums |
637 |
was debated. However, plain entries for the uncompressed file would |
638 |
be confusing if only compressed file existed, and conflicting if both |
639 |
uncompressed and compressed variants existed. Furthermore, it has been |
640 |
pointed out that ``DIST`` entries do not have uncompressed variant |
641 |
either. |
642 |
|
643 |
|
644 |
Performance considerations |
645 |
-------------------------- |
646 |
|
647 |
Performing a full-tree verification on every sync raises some |
648 |
performance concerns for end-user systems. The initial testing has shown |
649 |
that a cold-cache verification on a btrfs file system can take around
650 |
4 minutes, with the process being mostly I/O bound. On the other hand, |
651 |
it can be expected that the verification will be performed directly |
652 |
after syncing, taking advantage of warm filesystem cache. |
653 |
|
654 |
To improve speed on I/O and/or CPU-constrained systems even further,
655 |
the algorithms can be easily extended to perform incremental |
656 |
verification. Given that rsync does not preserve mtimes by default, |
657 |
the tool can take advantage of mtime and Manifest comparisons to recheck |
658 |
only the parts of the repository that have changed. |
659 |
|
660 |
Furthermore, the package manager implementations can restrict checking |
661 |
only to the parts of the repository that are actually being used. |
662 |
|
663 |
|
664 |
Backwards Compatibility |
665 |
======================= |
666 |
|
667 |
This GLEP provides optional means of preserving backwards compatibility. |
668 |
To preserve the backwards compatibility, the following needs to be |
669 |
ensured: |
670 |
|
671 |
- all files within the package directory must be covered by ``Manifest`` |
672 |
file inside that package directory, |
673 |
|
674 |
- all distfiles used by the package must be covered by ``Manifest`` |
675 |
file inside the package directory, |
676 |
|
677 |
- all files inside the ``files/`` subdirectory of a package directory |
678 |
need to use the deprecated ``AUX`` tag (rather than ``DATA``),
679 |
|
680 |
- all ``.ebuild`` files inside the package directory need to use |
681 |
the deprecated ``EBUILD`` tag (rather than ``DATA``), |
682 |
|
683 |
- the Manifest files inside the package directory can be signed |
684 |
to provide authenticity verification. |
685 |
|
686 |
Once the backwards compatibility is no longer a concern, the above |
687 |
no longer needs to hold and the deprecated tags can be removed. |
688 |
|
689 |
|
690 |
Reference Implementation |
691 |
======================== |
692 |
|
693 |
The reference implementation for this GLEP is being developed |
694 |
as the gemato project [#GEMATO]_. |
695 |
|
696 |
|
697 |
References |
698 |
========== |
699 |
|
700 |
.. [#GLEP44] GLEP 44: Manifest2 format |
701 |
(https://www.gentoo.org/glep/glep-0044.html) |
702 |
|
703 |
.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software |
704 |
- Overview |
705 |
(https://www.gentoo.org/glep/glep-0057.html) |
706 |
|
707 |
.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software |
708 |
- Infrastructure to User distribution - MetaManifest |
709 |
(https://www.gentoo.org/glep/glep-0058.html) |
710 |
|
711 |
.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications |
712 |
(https://www.gentoo.org/glep/glep-0059.html) |
713 |
|
714 |
.. [#GLEP60] GLEP 60: Manifest2 filetypes |
715 |
(https://www.gentoo.org/glep/glep-0060.html) |
716 |
|
717 |
.. [#GLEP61] GLEP 61: Manifest2 compression |
718 |
(https://www.gentoo.org/glep/glep-0061.html) |
719 |
|
720 |
.. [#PMS-FETCH] Package Manager Specification: Dependency Specification |
721 |
Format - SRC_URI |
722 |
(https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10) |
723 |
|
724 |
.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm |
725 |
(https://www.ietf.org/rfc/rfc1321.txt) |
726 |
|
727 |
.. [#RIPEMD160] The hash function RIPEMD-160 |
728 |
(https://homes.esat.kuleuven.be/~bosselae/ripemd160.html) |
729 |
|
730 |
.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS) |
731 |
(http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf) |
732 |
|
733 |
.. [#WHIRLPOOL] The WHIRLPOOL Hash Function |
734 |
(http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html) |
735 |
|
736 |
.. [#BLAKE2] BLAKE2 — fast secure hashing |
737 |
(https://blake2.net/) |
738 |
|
739 |
.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash |
740 |
and Extendable-Output Functions |
741 |
(http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf) |
742 |
|
743 |
.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function |
744 |
(https://www.streebog.net/) |
745 |
|
746 |
.. [#GEMATO] gemato: Gentoo Manifest Tool |
747 |
(https://github.com/mgorny/gemato/) |
748 |
|
749 |
Copyright
=========

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.
754 |
|
755 |
|
756 |
-- |
757 |
Best regards, |
758 |
Michał Górny |