On Thu, 16.11.2017 at 11:19 +0100, Michał Górny wrote:
> Hi, everyone.
>
> Here's the updated version of GLEP 74 taking into consideration
> the points made during the Council pre-review.
>
> ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
> HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
>
> Changes:

27c2a9e glep-0074: Grammar corrections from Ulrich Müller
d39f865 glep-0074: Make extended filename encoding optional
ed111f8 glep-0074: Always exclude control characters
|
---
GLEP: 74
Title: Full-tree verification using Manifest files
Author: Michał Górny <mgorny@g.o>,
        Robin Hugh Johnson <robbat2@g.o>,
        Ulrich Müller <ulm@g.o>
Type: Standards Track
Status: Draft
Version: 1
Created: 2017-10-21
Last-Modified: 2017-11-23
Post-History: 2017-10-26, 2017-11-16
Content-Type: text/x-rst
Requires: 59, 61
Replaces: 44, 58, 60
---
|
Abstract
========

This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof,
efficient and provide means of backwards compatibility.
|
Motivation
==========

The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
means of verifying the integrity of distfiles and package files
in Gentoo. Combined with OpenPGP signatures, they provide means to
ensure the authenticity of the covered files. However, as noted
in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
authenticity verification as they do not cover any files outside
the package directory. In particular, they provide multiple ways
for a third party to inject malicious code into the ebuild environment.

Historically, the topic of providing authenticity coverage for the whole
repository has been mentioned multiple times. The most noteworthy efforts
are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
They were accepted by the Council in 2010 but have never been
implemented. When potential implementation work started in 2017, a new
discussion about the specification arose. It prompted the creation
of a competing GLEP that would provide a redesigned alternative to
the old GLEPs.

This specification is designed with the following goals in mind:

1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

3. The Manifest files should be verifiable stand-alone, that is without
   knowing any details about the underlying repository format.
|
Specification
=============

Manifest file format
--------------------

This specification reuses and extends the Manifest file format defined
in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
repurposed as a generic *tag* that could also indicate additional
(non-checksum) metadata. Appropriately, those tags can be followed by
other space-separated values.

Unless specified otherwise, the paths used in the Manifest files
are relative to the directory containing the Manifest file. The paths
must not reference the parent directory (``..``). Forward slash (``/``)
is used as path component separator.

The Manifest files use UTF-8 encoding.
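
The path rules above could be checked along the following lines. This is
only a sketch: the helper name ``is_valid_manifest_path`` is hypothetical,
and rejecting empty and absolute paths is an assumption derived from
the paths being relative to the Manifest directory.

```python
def is_valid_manifest_path(path: str) -> bool:
    """Check the relative-path rules (hypothetical helper, not part of the GLEP)."""
    if not path or path.startswith("/"):
        # Assumption: paths are relative, so empty and absolute paths are rejected.
        return False
    # Forward slash is the only component separator; ".." must not appear
    # as a path component (a name such as "a..b" is still fine).
    return ".." not in path.split("/")
```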
|
Manifest file locations and nesting
-----------------------------------

The ``Manifest`` file located in the root directory of the repository
is called top-level Manifest, and it is used to perform the full-tree
verification. In order to verify the authenticity, it must be signed
using OpenPGP, using the armored cleartext format.

The top-level Manifest may reference sub-Manifests contained
in subdirectories of the repository. The sub-Manifests are traditionally
named ``Manifest``; however, the implementation must support arbitrary
names, including the possibility of multiple (split) Manifests
for a single directory. The sub-Manifest can only cover the files inside
the directory tree where it resides.

The sub-Manifest can also be signed using OpenPGP armored cleartext
format. However, the signature verification can be omitted since it
already is covered by the signed top-level Manifest.
|
Directory tree coverage
-----------------------

The specification provides three ways of skipping Manifest verification
of specific files and directories (recursively):

1. explicit ``IGNORE`` entries in Manifest files,

2. injected ignore paths via package manager configuration,

3. using names starting with a dot (``.``) which are always skipped.

All files that are not ignored must be covered by at least one
of the Manifests.

A single file may be matched by multiple identical or equivalent
Manifest entries, if and only if the entries have the same semantics,
specify the same size and the checksums common to both entries match.
It is an error for a single file to be matched by multiple entries
of different semantics, file size or checksum values. It is an error
to specify another entry for a file that matches ``IGNORE``, or that
is located inside an ignored directory.
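
The duplicate-entry rule can be illustrated with a short Python sketch.
The ``Entry`` type and the ``entries_compatible`` helper are hypothetical
names. Treating the deprecated ``EBUILD``, ``MISC`` and ``AUX`` tags
as having ``DATA`` semantics is an assumption based on their description
as equivalent to ``DATA``, and collisions with ``IGNORE`` entries are not
handled here.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    tag: str                   # e.g. "DATA", "EBUILD", "MANIFEST"
    size: int
    checksums: Dict[str, str]  # algorithm name -> hex digest


# Assumption: the deprecated tags count as having DATA semantics.
DATA_LIKE = {"DATA", "MISC", "EBUILD", "AUX"}


def semantics(tag: str) -> str:
    return "DATA" if tag in DATA_LIKE else tag


def entries_compatible(a: Entry, b: Entry) -> bool:
    """True if both checksum-bearing entries may match the same file."""
    if semantics(a.tag) != semantics(b.tag) or a.size != b.size:
        return False
    # Only the algorithms present in both entries are compared.
    common = a.checksums.keys() & b.checksums.keys()
    return all(a.checksums[name] == b.checksums[name] for name in common)
```

Two entries that share no checksum algorithm at all are accepted by this
sketch as long as tag semantics and size agree; a stricter implementation
might want to reject that case.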
|
The file entries (except for ``IGNORE``) can be specified for regular
files only. Symbolic links are followed when opening files
and traversing directories. It is an error to specify an entry for
a different file type. If the tree contains files of other types
that are not otherwise ignored, they need to be covered by an explicit
``IGNORE``.

All the local (non-``DIST``) files covered by a Manifest tree must
reside on the same filesystem. It is an error to specify entries
applying to files on another filesystem. If files or directories that
are not otherwise ignored reside on a different filesystem, or symbolic
links point to targets on a different filesystem, they must
be explicitly excluded via ``IGNORE``.
|
Path and filename encoding
--------------------------

The path fields in the Manifest file must consist of characters
corresponding to valid UTF-8 code points excluding the backwards slash
(``\``) and characters classified as control characters or as whitespace
in the current version of the Unicode standard [#UNICODE]_.

The implementation can optionally support extended filename encoding
to support those paths. If encoding is not supported, the implementation
must reject directories containing any files using non-compliant names,
as well as Manifest files whose filename field contains such filenames.

If encoding is supported, then all of the excluded characters that
are present in paths must be encoded using one of the following escape
sequences:

- characters in the ``U+0000`` to ``U+007F`` range can be encoded
  as ``\xHH`` where ``HH`` specifies the zero-padded, hexadecimal
  character code,

- characters in the ``U+0000`` to ``U+FFFF`` range can be encoded
  as ``\uHHHH`` where ``HHHH`` specifies the zero-padded, hexadecimal
  character code,

- characters in the UCS-4 range can be encoded as ``\UHHHHHHHH``
  where ``HHHHHHHH`` specifies the zero-padded, hexadecimal character
  code.

It is invalid for the backwards slash to be used in any other context,
and a backwards slash present in a filename must be encoded. A backwards
slash used as a path component separator should be replaced by a forward
slash instead.

The encoding can be used for other characters as well. In particular,
escaping non-printable characters might be desirable.
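
A decoder for the three escape forms could look roughly like the Python
sketch below. The name ``decode_path`` is hypothetical, and accepting both
upper- and lowercase hexadecimal digits is an assumption, since the text
does not state the case of ``HH``.

```python
import re

# \xHH (only 00..7F), \uHHHH and \UHHHHHHHH; any other backslash use is invalid.
_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))"
)


def decode_path(field):
    """Decode an escaped Manifest path field (hypothetical helper)."""

    def replace(match):
        code = int(match.group(1) or match.group(2) or match.group(3), 16)
        if match.group(1) is not None and code > 0x7F:
            raise ValueError("\\x escapes are limited to the 00..7F range")
        return chr(code)

    decoded = _ESCAPE.sub(replace, field)
    # Every backslash in the raw field must have started a valid escape.
    if field.count("\\") != len(_ESCAPE.findall(field)):
        raise ValueError("stray backslash in path field")
    return decoded
```

For instance, the field ``files/foo\x20bar`` decodes to a filename
containing a space, while ``bad\tname`` is rejected because ``\t`` is not
one of the permitted forms.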
|
File verification
-----------------

When verifying a file against the Manifest, the following rules are
used:

1. If the file is covered directly or indirectly by an entry
   of the ``IGNORE`` type, the verification always succeeds.

2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
   ``MISC``, ``EBUILD`` or ``AUX`` type:

   a. if the file is not present, then the verification fails,

   b. if the file is present but has a different size or one
      of the checksums does not match, the verification fails,

   c. otherwise, the verification succeeds.

3. If the file is present but not listed in Manifest, the verification
   fails.

Unless specified otherwise, the package manager must not allow using
any files for which the verification failed. The package manager may
reject any package or even the whole repository if it may refer to files
for which the verification failed.
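
Rule 2 for a single covered file might be implemented as in the following
Python sketch. ``verify_file`` is a hypothetical name, and the checksum
dictionary is assumed to be keyed by Python ``hashlib`` algorithm names
rather than by Manifest hash names.

```python
import hashlib
import os


def verify_file(path, size, checksums):
    """Return True if `path` matches `size` and every digest in `checksums`.

    `checksums` maps hashlib algorithm names (e.g. "sha256") to hex digests.
    """
    if not os.path.isfile(path):  # follows symlinks; a missing file fails (2a)
        return False
    if os.path.getsize(path) != size:  # size mismatch fails (2b)
        return False
    hashers = {name: hashlib.new(name) for name in checksums}
    with open(path, "rb") as f:
        # Read once and feed every hash, so large files are not re-read.
        for chunk in iter(lambda: f.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return all(
        hashers[name].hexdigest() == digest.lower()
        for name, digest in checksums.items()
    )
```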
|
Timestamp verification
----------------------

The top-level Manifest file can contain a ``TIMESTAMP`` entry to account
for attacks against tree update distribution. If such an entry
is present, it should be updated every time at least one
of the Manifests changes. Every unique timestamp value must correspond
to a single tree state.

During the verification process, the client should compare the timestamp
against the update time obtained from a local clock or a trusted time
source. If the comparison result indicates that the Manifest at the time
of receiving was already significantly outdated, the client should
either fail the verification or require manual confirmation from
the user.

Furthermore, the Manifest provider may employ additional methods
of distributing the timestamps of recently generated Manifests
using a secure channel from a trusted source for exact comparison.
The exact details of such a solution are outside the scope of this
specification.

``TIMESTAMP`` entries may also be present in sub-Manifests. Those
timestamps must not be newer than the timestamp of the top-level
Manifest (if present). This specification does not define any specific
use for them.
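
The comparison against the update time could be sketched in Python
as follows. It uses the ``strftime()`` format string that this
specification gives for the ``TIMESTAMP`` tag. The function names and
the seven-day threshold are illustrative assumptions, as the specification
does not define what "significantly outdated" means.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a TIMESTAMP value into an aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def is_outdated(manifest_ts, received_at, max_age=timedelta(days=7)):
    """True if the Manifest was older than `max_age` when it was received.

    `max_age` is an arbitrary illustrative threshold, not a GLEP value.
    """
    return received_at - parse_timestamp(manifest_ts) > max_age
```

A client would call ``is_outdated()`` with the time of the sync and then
either fail or ask the user for confirmation.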
|
Modern Manifest tags
--------------------

The Manifest files can specify the following tags:

``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO 8601 extended
  format combined date and time in UTC timezone, i.e. using
  the following ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``.
  Optional. The package manager can use it to detect an outdated
  repository checkout as described in `Timestamp verification`_.

``MANIFEST <path> <size> <checksums>...``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Ignores a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (always pass). *Path* must be a plain file or directory
  path without a trailing slash. Wildcards are not supported
  and wildcard characters are interpreted literally.

``DATA <path> <size> <checksums>...``
  Specifies a regular file subject to Manifest verification. The file
  is required to pass verification. Used for all files that do not match
  any other type.

``DIST <filename> <size> <checksums>...``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.
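
As an illustration of the entry layout, the following Python sketch splits
one Manifest line into its tag and fields. The ``parse_entry`` name and
the returned dictionary layout are hypothetical, and the checksums are
assumed to be alternating algorithm name and value pairs, as in
the example Manifests in this document.

```python
def parse_entry(line):
    """Split one Manifest line into (tag, fields). Hypothetical helper."""
    tag, *values = line.split()
    if tag == "TIMESTAMP":
        (timestamp,) = values
        return tag, {"timestamp": timestamp}
    if tag == "IGNORE":
        (path,) = values
        return tag, {"path": path}
    if tag in ("MANIFEST", "DATA", "DIST"):
        path, size, *rest = values
        if not rest or len(rest) % 2:
            raise ValueError("checksums must be algorithm/value pairs")
        # Pair up "SHA256 <hex> SHA512 <hex> ..." into a dictionary.
        checksums = dict(zip(rest[0::2], rest[1::2]))
        return tag, {"path": path, "size": int(size), "checksums": checksums}
    raise ValueError("unknown tag: " + tag)
```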
|
Deprecated Manifest tags
------------------------

For backwards compatibility, the following tags are additionally
allowed at the package directory level:

``EBUILD <filename> <size> <checksums>...``
  Equivalent to the ``DATA`` type.

``MISC <path> <size> <checksums>...``
  Equivalent to the ``DATA`` type. Historically indicated that
  the package manager may ignore a verification failure if operating
  in non-strict mode. However, that behavior is deprecated.

``AUX <filename> <size> <checksums>...``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to the ``files/`` subdirectory.
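
An implementation might fold the deprecated tags into ``DATA`` while
parsing. The short Python sketch below shows one way to do that;
``normalize_entry`` is a hypothetical helper name.

```python
def normalize_entry(tag, path):
    """Map a deprecated tag to its DATA equivalent (hypothetical helper)."""
    if tag in ("EBUILD", "MISC"):
        return "DATA", path
    if tag == "AUX":
        # AUX filenames are relative to the files/ subdirectory.
        return "DATA", "files/" + path
    return tag, path
```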
|
Algorithm for full-tree verification
------------------------------------

In order to perform full-tree verification, the following algorithm
can be used:

1. Collect all files present in the repository into *present* set.

2. Start at the top-level Manifest file. Verify its OpenPGP signature.
   Optionally verify the ``TIMESTAMP`` entry if present as specified
   in `Timestamp verification`_. Remove the top-level Manifest
   from the *present* set.

3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
   files according to the `File verification`_ section, and include
   their entries in the current Manifest entry list (using paths
   relative to directories containing the Manifests).

4. Process all ``IGNORE`` entries. Remove any paths matching them
   from the *present* set.

5. Collect all files covered by ``DATA``, ``MISC``, ``EBUILD``
   and ``AUX`` entries into the *covered* set.

6. Verify the entries in the *covered* set for incompatible duplicates
   and collisions with ignored files as explained in `Directory tree
   coverage`_.

7. Verify all the files in the union of the *present* and *covered*
   sets, according to the `File verification`_ section.
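
The steps above can be sketched in Python as follows. This is a simplified
illustration under several assumptions, not a reference implementation:
OpenPGP and ``TIMESTAMP`` checks (step 2) are left out, compressed
Manifests and escaped paths are not handled, only identical duplicate
entries are accepted, and hash names are assumed to be uppercase
``hashlib`` names. All function names are hypothetical.

```python
import hashlib
import os
import posixpath

FILE_TAGS = {"MANIFEST", "DATA", "MISC", "EBUILD", "AUX"}


def file_ok(root, path, size, checksums):
    """File verification for one covered file (size plus all checksums)."""
    full = os.path.join(root, path)
    if not os.path.isfile(full) or os.path.getsize(full) != size:
        return False
    with open(full, "rb") as f:
        data = f.read()
    return all(hashlib.new(alg.lower(), data).hexdigest() == digest
               for alg, digest in checksums.items())


def load(root, manifest, covered, ignored):
    """Steps 3-5: read a Manifest, recursing into verified sub-Manifests."""
    base = posixpath.dirname(manifest)
    with open(os.path.join(root, manifest), encoding="utf-8") as f:
        lines = [line.split() for line in f if line.strip()]
    for tag, *values in lines:
        if tag == "IGNORE":
            ignored.add(posixpath.join(base, values[0]))
        elif tag in FILE_TAGS:
            name = "files/" + values[0] if tag == "AUX" else values[0]
            path = posixpath.join(base, name)
            entry = (int(values[1]), dict(zip(values[2::2], values[3::2])))
            # Step 6, simplified: only identical duplicates are accepted.
            if covered.setdefault(path, entry) != entry:
                raise ValueError("incompatible duplicate: " + path)
            if tag == "MANIFEST":
                if not file_ok(root, path, *entry):
                    raise ValueError("sub-Manifest failed: " + path)
                load(root, path, covered, ignored)


def under_ignore(path, ignored):
    """True if the path or one of its parent directories matches an IGNORE."""
    parts = path.split("/")
    return any("/".join(parts[:i]) in ignored for i in range(1, len(parts) + 1))


def verify_tree(root):
    """Return the sorted list of paths failing verification (empty = success)."""
    present = set()
    for dirpath, dirs, files in os.walk(root, followlinks=True):   # step 1
        dirs[:] = [d for d in dirs if not d.startswith(".")]       # dotfiles
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        for name in files:
            if not name.startswith("."):
                present.add(name if rel == "." else rel + "/" + name)
    present.discard("Manifest")                                    # step 2
    covered, ignored = {}, set()
    load(root, "Manifest", covered, ignored)                       # steps 3-5
    present = {p for p in present if not under_ignore(p, ignored)} # step 4
    if any(under_ignore(p, ignored) for p in covered):             # step 6
        raise ValueError("entry collides with an ignored path")
    return sorted(p for p in present | set(covered)                # step 7
                  if p not in covered or not file_ok(root, p, *covered[p]))
```

Note that following symbolic links while walking the tree can loop forever
on cyclic links; a real implementation would need to guard against that.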
|
Algorithm for finding parent Manifests
--------------------------------------

In order to find the top-level Manifest from the current directory,
the following algorithm can be used:

1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different from *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of the parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root system directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.

Once the algorithm stops, *last_found* will contain the relevant
top-level Manifest. If *last_found* is null, then the directory tree
does not contain any valid top-level Manifest candidates and one should
be created in the *original* directory.

Once the top-level Manifest is found, its ``MANIFEST`` entries should
be used to find any sub-Manifests below the top-level Manifest,
up to and including the *original* directory. Note that those
sub-Manifests can use different filenames than ``Manifest``.
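
The upward search can be sketched in Python as follows. The function name
is hypothetical, and ``ignore_covers`` is an assumed caller-supplied
predicate that reports whether an ``IGNORE`` entry in a given Manifest
covers the *original* directory; parsing the Manifest for that purpose
is left out of the sketch.

```python
import os


def find_top_level_manifest(original, ignore_covers):
    """Walk upwards from `original` and return the directory holding
    the top-level Manifest, or None if there is no candidate.

    `ignore_covers(manifest_path, original)` is a hypothetical predicate
    telling whether an IGNORE entry in that Manifest covers `original`
    or one of its parent directories.
    """
    current = original = os.path.abspath(original)
    startdev = os.stat(current).st_dev            # step 1
    last_found = None
    while True:
        if os.stat(current).st_dev != startdev:   # step 2
            break
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):              # step 3
            if ignore_covers(manifest, original):
                break
            last_found = current
        parent = os.path.dirname(current)
        if parent == current:                     # step 4: reached the root
            break
        current = parent                          # step 5
    return last_found
```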
|
Checksum algorithms
-------------------

This section is informational only. Specifying the exact set
of supported algorithms is outside the scope of this specification.

The algorithm names reserved at the time of writing are:

- ``MD5`` [#MD5]_,
- ``RMD160`` -- RIPEMD-160 [#RIPEMD160]_,
- ``SHA1`` [#SHS]_,
- ``SHA256`` and ``SHA512`` -- SHA-2 family of hashes [#SHS]_,
- ``WHIRLPOOL`` [#WHIRLPOOL]_,
- ``BLAKE2B`` and ``BLAKE2S`` -- BLAKE2 family of hashes [#BLAKE2]_,
- ``SHA3_256`` and ``SHA3_512`` -- SHA-3 family of hashes [#SHA3]_,
- ``STREEBOG256`` and ``STREEBOG512`` -- Streebog family of hashes
  [#STREEBOG]_.

The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
It is recommended that any new hashes are named after the Python
``hashlib`` module algorithm names, transformed into uppercase.
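
The naming recommendation can be illustrated in Python. Both helper names
are hypothetical. Mapping a Manifest name back to ``hashlib``
by lowercasing is an assumption that only holds for names following
the recommendation; older names such as ``RMD160`` would need an explicit
lookup table, and not every reserved algorithm is guaranteed to be
available in ``hashlib``.

```python
import hashlib


def manifest_hash_name(hashlib_name):
    """Recommended Manifest name for a new hash: the hashlib name in uppercase."""
    return hashlib_name.upper()


def digest(manifest_name, data):
    """Hex digest for names that follow the hashlib convention (assumption)."""
    return hashlib.new(manifest_name.lower(), data).hexdigest()
```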
|
Manifest compression
--------------------

The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
This section merely addresses interoperability issues between Manifest
compression and this specification.

The compressed Manifest files are required to be suffixed for their
compression algorithm. This suffix should be used to recognize
the compression and decompress Manifests transparently. The exact list
of algorithms and their corresponding suffixes are outside the scope
of this specification.

The top-level Manifest file must not be compressed. Since the OpenPGP
signature covers the uncompressed text and is compressed itself,
the data would have to be decompressed without any prior verification.
This could expose users e.g. to zip bombs or exploits on decompressor
vulnerabilities.

Whenever this specification refers to sub-Manifests, they can use any
names but are also required to use a specific compression suffix.
The ``MANIFEST`` entries are required to specify the full name including
compression suffix, and the verification is performed on the compressed
file.

The specification permits uncompressed Manifests to exist alongside
their compressed counterparts, and multiple compressed formats
to coexist. If that is the case, the files must have the same
uncompressed content and the implementation is free to choose either
of the files using the same base name.
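
Transparent decompression by suffix could be sketched in Python as below.
The suffix table is purely illustrative, since the list of algorithms and
suffixes is outside the scope of this specification, and ``open_manifest``
is a hypothetical helper name. The caller is expected to have verified
the compressed file against its ``MANIFEST`` entry before opening it.

```python
import bz2
import gzip
import lzma

# Assumption: an illustrative suffix table, not a list defined by the GLEP.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path):
    """Open a (possibly compressed) sub-Manifest as UTF-8 text."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```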
|
Combining multiple Manifest trees (informational)
-------------------------------------------------

This specification permits nesting multiple hierarchical Manifest trees.
In this layout, the specific directories of the Manifest tree can
be verified both as a part of another top-level Manifest,
and as an independent Manifest tree (when obtained without the parent
directory).

For this to work, the sub-Manifest file in the directory must also
satisfy the requirements for the top-level Manifest file. That is:

- it must be named ``Manifest`` and not compressed,

- it must cover all the files in this directory and its subdirectories
  (i.e. no files from the directory tree can be covered by parent
  Manifest),

- if authenticity verification is desired, it must be OpenPGP-signed.

It should be noted that if such a directory is a subdirectory of a valid
Manifest tree, the sub-Manifest needs to be valid according
to the top-level Manifest and the OpenPGP signature is disregarded
as detailed in `Manifest file locations and nesting`_. The top-level
behavior is exhibited only when the directory is obtained without parent
directories.
|
An example Manifest file (informational)
----------------------------------------

An example top-level Manifest file for the Gentoo repository would have
the following content::

  TIMESTAMP 2017-10-30T10:11:12Z
  IGNORE distfiles
  IGNORE local
  IGNORE lost+found
  IGNORE packages
  MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
  ...
  MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
  ...

An example modern Manifest (disregarding backwards compatibility)
for a package directory would have the following content::

  DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
  DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
  DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
  DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
  DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
  DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
  DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
|
Rationale
=========

Stand-alone format
------------------

The first question that needed to be asked before proceeding with
the design was whether the Manifest file format was supposed to be
stand-alone, or tightly bound to the repository format.

The stand-alone format has been selected because of its three
advantages:

1. It is more future-proof. If an incompatible change to the repository
   format is introduced, only developers need to upgrade the tools
   they use to generate the Manifests. The tools used to verify
   the updated Manifests will continue to work.

2. It is more flexible and universal. With a dedicated tool,
   the Manifest files can be used to sign and verify arbitrary file
   sets.

3. It keeps the verification tool simpler. In particular, we can easily
   write an independent verification tool that could work on any
   distribution without needing to depend on a package manager
   implementation or rewrite parts of it.

Designing a stand-alone format requires that the Manifest carries enough
information to perform the verification following all the rules specific
to the Gentoo repository.
|
Tree design
-----------

The second important point of the design was determining whether
the Manifest files should be structured hierarchically or independently.
Both options have their advantages.

In the hierarchical model, each sub-Manifest file is covered by a higher
level Manifest. As a result, only the top-level Manifest has to be
OpenPGP-signed, and subsequent Manifests need to be only verified by
checksum stored in the parent Manifest. This has the following
implications:

- Verifying any set of files in the repository requires using checksums
  from the most relevant Manifests and the parent Manifests.

- The OpenPGP signature of the top-level Manifest needs to be verified
  only once per process.

- Altering any set of files requires updating the relevant Manifests,
  and their parent Manifests up to the top-level Manifest, and signing
  the last one.

- As a result, the top-level Manifest changes on every commit,
  and various middle-level Manifests change (and need to be transferred)
  frequently.

In the independent model, each sub-Manifest file is independent
of the parent Manifests. As a result, each of them needs to be signed
and verified independently. However, the parent Manifests still need
to list sub-Manifests (albeit without verification data) in order
to detect removal or replacement of subdirectories. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  and verifying signatures of the most relevant Manifest files.

- Altering any set of files requires updating the relevant Manifests
  and signing them again.

- Parent Manifests are updated only when Manifests are added or removed
  from subdirectories. As a result, they change infrequently.

While both models have their advantages, the hierarchical model was
selected because it reduces the number of OpenPGP operations
(which are comparatively costly) to the minimum.
|
Tree layout restrictions
------------------------

The algorithm is meant to work primarily with ebuild repositories which
normally contain only files and directories. Directories provide
no useful metadata for verification, and specifying special entries
for additional file types is purposeless. Therefore, the specification
is restricted to dealing with regular files.

The Gentoo repository does not use symbolic links. Some Gentoo
repositories do, however. To provide a simple solution for dealing with
symlinks without having to take care to implement special handling for
them, the common behavior of implicitly resolving them is used.
Therefore, symbolic links to files are stored as if they were regular
files, and symbolic links to directories are followed as if they were
regular directories.

Dotfiles are implicitly ignored as that is a common notion used
in software written for POSIX systems. All other filenames require
explicit ``IGNORE`` lines.

An ability to inject additional ignore entries is provided to account
for site configuration affecting the repository tree -- placing
additional files in it, skipping some of the categories from syncing.
This configuration can extend beyond the limits of this GLEP,
e.g. by allowing wildcards or regular expressions.

The algorithm is restricted to work on a single filesystem. This is
mostly relevant when scanning for top-level Manifest -- we do not want
to cross filesystem boundaries then. However, to ensure consistent
bidirectional behavior we need to also ban them when operating down
the tree.

The directories and files on different filesystems need to be ignored
explicitly as implicitly skipping them would cause confusion.
In particular, tools might then claim that a file does not exist when
it clearly does because it was skipped due to filesystem boundaries.
|
Filename character set restriction
----------------------------------

The valid set of filename characters for the Gentoo repository
is restricted by the devmanual 'File Naming Rules' section
[#FILE-NAMING-RULES]_, and enforced via a git hook. The valid distfile
names are not restricted explicitly -- however, the PMS dependency
specification syntax [#PMS-FETCH]_ implicitly makes it impossible to use
filenames containing whitespace.

This specification aims to avoid arbitrary restrictions. For this
reason, filename characters are only restricted by excluding three
technically problematic groups:

1. The backwards slash character (``\``) is used as path separator
   on Windows systems, so it's extremely unlikely to be used in real
   filenames. For this reason it is used to implement character
   encoding with minimal risk of breaking backwards compatibility.

2. The control characters can trigger special behavior in various
   programs and prevent them from recognizing text files. In particular,
   the NULL character (``U+0000``) is normally used to indicate the end
   of a null-terminated string. Its use could therefore break
   implementations written in the C language. Other control characters
   could trigger various formatting routines, garbling text output.

3. Whitespace characters are used to separate Manifest fields
   and entries. While technically it would be enough to restrict
   the space (``U+0020``) character, which is normally used
   as the separator, and the newline (``U+000A``) character, which
   is used to separate lines, all whitespace characters are forbidden
   to avoid confusion and implementation errors.

Historically, Portage attempted to overcome the whitespace limitation
by attempting to locate the size field and take everything before it
as filename. This was terribly fragile and even if it worked, it would
solve the problem only partially.

To preserve compatibility with the current implementations and given
that none of the listed characters are allowed in the foreseeable
Gentoo uses, extended encoding support is optional. If such support
is not provided, the implementation must unconditionally reject any
such files. Ignoring them implicitly would be confusing, and it is
not possible to use them in explicit ``IGNORE`` entries.

The character encoding method provides means to overcome the character
restrictions to extend the tool usability beyond immediate Gentoo uses.
The backslash escape form based on Python Unicode strings is used
since it can encode all characters within the Unicode range, the syntax
is familiar to many programmers and the backwards slash character
is extremely unlikely to appear in real filenames.

Syntax is limited to the minimum necessary to implement the encoding.
Shorthand forms (e.g. ``\t`` or ``\\``) are omitted to avoid unnecessary
complexity, and to reduce the risk of shell users using backslash
to escape space directly. The ``\x`` form is limited to ``\x00..\x7F``
range to avoid ambiguity of higher values which might be interpreted
either as UCS-2 code points or part of a UTF-8 encoded character.

Encoding stores UCS-2/UCS-4 characters directly rather than hex-encoded
UTF-8 string to simplify the implementation. In particular, it makes it
possible to process the Manifest file as UTF-8 encoded text without
having to perform additional UTF-8 decoding (and verification)
of the escaped data.

URL-encoding was considered as an alternative. However, it could collide
with ``DIST`` entries that are implicitly named after the URL filename
part where URL-encoding is pretty common.
670 |
|
671 |
|
File verification model
-----------------------

The verification model aims to provide full coverage against different
forms of attack. In particular, three different kinds of manipulation
are considered:

1. Alteration of the file content.

2. Removal of a file.

3. Addition of a new file.

To protect against all three, the system requires that all files
in the repository are listed in Manifests and verified against them.

As a special case, ignores are allowed to account for directories
that are not part of the repository but were traditionally placed inside
it. Those directories were ``distfiles``, ``local`` and ``packages``.
Ignores could also be used for VCS directories such as ``CVS``.
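
The three kinds of manipulation map onto three simple set comparisons.
The sketch below is only an illustration under simplifying assumptions:
the expected state is passed in as a plain dictionary of relative path
to SHA-512 hex digest instead of being read from Manifest files, and
the function name ``verify_tree`` and its return shape are hypothetical.

```python
import hashlib
import os


def verify_tree(root, expected, ignored=()):
    """Compare files under root against {relative path: SHA-512 hex}.

    Returns (altered, missing, extra) as sorted lists of relative paths.
    """
    found = set()
    altered = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories so their contents are never visited.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(rel_dir, d)) not in ignored
        ]
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if rel in ignored:
                continue
            found.add(rel)
            if rel in expected:
                with open(os.path.join(dirpath, name), "rb") as f:
                    digest = hashlib.sha512(f.read()).hexdigest()
                if digest != expected[rel]:
                    altered.append(rel)  # 1. alteration of content
    missing = sorted(set(expected) - found)  # 2. removal of a file
    extra = sorted(found - set(expected))    # 3. addition of a new file
    return sorted(altered), missing, extra
```

A tree passes only if all three lists are empty; an ignored directory
such as ``distfiles`` contributes to none of them.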
|
Non-strict Manifest verification
--------------------------------

Originally the Manifest2 format provided a special ``MISC`` tag that
was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
indicated that the Manifest verification failures could be ignored for
those files unless the package manager was working in strict mode.

The first versions of this specification continued the use of this tag.
However, after a long debate it was decided to deprecate it along with
the non-strict behavior, and require all files to strictly match.

Two arguments were mentioned for the usefulness of a ``MISC`` type:

1. being able to reduce the checkout size by stripping unnecessary
   files out, and

2. being able to update automatically generated files locally
   without causing unnecessary verification failures.

However, the usefulness of ``MISC`` in both cases is doubtful.
|
The cases for stripping unnecessary files mostly focused on space
savings. For this purpose, stripping ``metadata.xml`` and similar files
has little value. It is much more common for users to strip whole
packages or categories. The ``MISC`` type is not suitable for that,
and so a dedicated package manager mechanism needs to be developed
instead. The same mechanism can also handle files that historically used
the ``MISC`` type. As an example, the package manager may choose
to generate both the rsync exclusion list and Manifest ignore list
using a single source list.

The cases for autogenerated files involve such cache files
as ``use.local.desc``. However, we cannot include ``md5-cache`` there
due to security concerns, which results in inconsistent cache handling.
Furthermore, the tools were historically modified to provide stable
output, which means that their content cannot change without
non-``MISC`` content being changed first. This practically defeats
the purpose of using ``MISC``.

Finally, the non-strict mode could be used as a means of attack.
Allowing a missing or modified documentation file could be used
to spread misinformation, resulting in bad decisions made by the user.
A modified file could also be used, e.g. to exploit vulnerabilities
of an XML parser.
|
Timestamp field
---------------

The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
to include a generation timestamp in the Manifest. A similar feature
was originally proposed in GLEP 58 [#GLEP58]_.

A malicious third party may use the principles of exclusion or replay
[#C08]_ to deny an update to clients, while at the same time recording
the identity of clients to attack. The timestamp field can be used to
detect that.
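
A client-side check against replay could look roughly like the sketch
below. It assumes that the timestamp is written as an ISO 8601 UTC date
of the form ``YYYY-MM-DDThh:mm:ssZ``; the helper names and the one-day
default threshold are hypothetical choices for this example, not values
taken from this GLEP.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(line):
    """Parse a 'TIMESTAMP <date>' line (the date format is an assumption)."""
    tag, value = line.split()
    if tag != "TIMESTAMP":
        raise ValueError("not a TIMESTAMP entry")
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_stale(manifest_time, now, max_age=timedelta(days=1)):
    """True if the Manifest is older than the accepted maximum age."""
    return now - manifest_time > max_age


def is_replayed(manifest_time, last_seen):
    """True if the Manifest predates one that was already accepted."""
    return manifest_time < last_seen
```

A stale result hints at updates being withheld, while a replayed result
means that an older tree was served after a newer one had been seen.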
|
In order to provide more complete protection, the Gentoo Infrastructure
should provide an ability to obtain the timestamps of all Manifests
from a recent timeframe over a secure channel from a trusted source
for comparison.

Strictly speaking, this information is provided by the various
``metadata/timestamp*`` files that are already present. However,
including the value in the Manifest itself has little cost
and provides the ability to perform the verification stand-alone.

Furthermore, some of the timestamp files are added very late
in the distribution process, past the Manifest generation phase. Those
files will most likely receive ``IGNORE`` entries and therefore
be unsafe to use.

The specification permits additional timestamps in sub-Manifest files
for local use. A generic testing tool should ignore them.
|
New vs deprecated tags
----------------------

Out of the four types defined by Manifest2, only one is reused
and the remaining three are replaced by a single, universal ``DATA``
type.

The ``DIST`` tag is reused since the specification does not change
anything with regard to distfile handling.

The ``EBUILD`` tag could potentially be reused for generic file
verification data. However, it would be confusing if all the different
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
type was introduced as a replacement.

The ``MISC`` tag and the related non-strict mode have been removed
as being of little value, as detailed in the `Non-strict Manifest
verification`_ section.

The ``AUX`` tag is deprecated as it is redundant with ``DATA``, and has
the limiting property of an implicit ``files/`` path prefix.
|
Finding top-level Manifest
--------------------------

The development of a reference implementation for this GLEP has raised
the following problem: how to find all the relevant Manifests when
the Manifest tool is run inside a subdirectory of the repository?

One of the options would be to provide a bi-directional linking
of Manifests via a ``PARENT`` tag. However, that would not solve
the problem when a new Manifest file is being created.

Instead, an algorithm for iterating over parent directories is proposed.
Since there is no obligatory explicit indicator for the top-level
Manifest, the algorithm assumes that the top-level Manifest
is the highest ``Manifest`` in the directory hierarchy that can cover
the current directory. This generally makes sense since the Manifest
files are required to provide coverage for all subdirectories, so all
Manifests starting from that one need to be updated.
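
The core of the parent-directory iteration can be sketched in a few
lines of Python. This is a deliberately simplified illustration: it only
looks for files named ``Manifest`` and does not check whether a found
Manifest can actually cover the starting directory (for instance, it
does not inspect ``IGNORE`` entries). The function name
``find_top_level_manifest`` is hypothetical.

```python
import os


def find_top_level_manifest(start_dir):
    """Return the path of the highest 'Manifest' above start_dir, or None."""
    current = os.path.abspath(start_dir)
    top = None
    while True:
        candidate = os.path.join(current, "Manifest")
        if os.path.isfile(candidate):
            # Remember it but keep climbing: a higher Manifest may exist.
            top = candidate
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root.
            return top
        current = parent
```

Note that the loop does not stop at the first hit, which is what makes
it return the highest Manifest rather than the nearest one.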
|
If independent Manifest trees are nested in the directory structure,
then an ``IGNORE`` entry needs to be used to separate them.

Since sub-Manifests can use any filenames, the Manifest finding
algorithm must not short-cut the procedure by storing all ``Manifest``
files along the parent directories. Instead, it needs to retrace
the relevant sub-Manifest files along ``MANIFEST`` entries
in the top-level Manifest.
|
Injecting ChangeLogs into the checkout
--------------------------------------

One of the problems considered in the new Manifest format was injecting
historical and autogenerated ChangeLogs into the repository. We normally
don't include those files, to reduce the checkout size. However, some
users have shown interest in them and Infra is working on providing them
via an additional rsync module.

If such files were injected into the repository, they would cause
verification failures of Manifests. To account for this, Infra could
provide ``IGNORE`` entries to allow them to exist.
|
Splitting distfile checksums from file checksums
------------------------------------------------

Another problem with the current Manifest format is that the checksums
for fetched files are combined with checksums for local files
in a single file inside the package directory. It has been specifically
pointed out that:

- since distfiles are sometimes reused across different packages,
  the repeating checksums are redundant [#DIST]_.

- mirror admins were interested in the possibility of verifying all
  the distfiles with a single tool.

This specification does not provide a clean solution to this problem.
It technically permits moving ``DIST`` entries to higher-level Manifests
but the usefulness of such a solution is doubtful.

However, for the second problem we will probably deliver a dedicated
tool working with this Manifest format.
|
Hash algorithms
---------------

While maintaining a consistent supported hash set is important
for interoperability, it is not a good fit for the generic layout
of this GLEP. Furthermore, it would require updating the GLEP
in the future every time the used algorithms change.

Instead, the specification focuses on listing the currently used
algorithm names for interoperability, and sets a recommendation
for consistent naming of algorithms in the future. The Python
``hashlib`` module is used as a reference since it is used
as the provider of hash functions for most of the Python software,
including Portage and PkgCore.
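
To show why ``hashlib`` is a convenient reference, the sketch below
computes several digests of one file in a single pass. It assumes that
Manifest hash names are the upper-case form of the ``hashlib`` names
(e.g. ``SHA512`` for ``sha512``); that mapping, the default hash set and
the function name ``compute_hashes`` are assumptions of this example.

```python
import hashlib


def compute_hashes(path, names=("BLAKE2B", "SHA512"), chunk_size=65536):
    """Return {name: hex digest} for the file, reading it only once."""
    # Assumed mapping: Manifest name -> lower-case hashlib name.
    hashers = {name: hashlib.new(name.lower()) for name in names}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Reading the file once matters because verification is mostly I/O bound,
so adding a second hash costs little beyond the extra CPU time.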
|
The basic rules for changing hash algorithms are defined in GLEP 59
[#GLEP59]_. The implementations can focus only on those algorithms
that are actually used or planned to be used. It may be feasible
to devise a new GLEP that specifies the currently used hashes (or update
GLEP 59 accordingly).
|
Manifest compression
--------------------

The support for Manifest compression is introduced with minimal changes
to the file format. The ``MANIFEST`` entries are required to provide
the real (compressed) file path for compatibility with other file
entries and to avoid confusion.
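
Because a ``MANIFEST`` entry refers to the compressed file, its checksum
can be verified before any decompression takes place. The following
sketch illustrates that order of operations for a gzip-compressed
sub-Manifest. The function name, the use of SHA-512 and the size cap on
decompressed data are assumptions made for this example only.

```python
import gzip
import hashlib
import io


def read_sub_manifest(path, expected_sha512, max_size=16 * 1024 * 1024):
    """Verify the compressed bytes first, then decompress with a size cap."""
    with open(path, "rb") as f:
        compressed = f.read()
    if hashlib.sha512(compressed).hexdigest() != expected_sha512:
        raise ValueError("checksum mismatch: " + path)
    # Decompress the bytes that were just verified, not the file again.
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
        data = gz.read(max_size + 1)
    if len(data) > max_size:
        raise ValueError("decompressed data exceeds the size limit")
    return data.decode("utf-8")
```

The decompressor is only ever fed data whose checksum matched an entry
in an already verified parent Manifest.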
|
Compression of the top-level Manifest file is prohibited
as the specification currently does not provide any means of verifying
the file prior to decompression. If the top-level Manifest were
compressed, tooling would have to unpack the file before being able
to verify the contents. This would make it possible for a malicious
third party to attack the system by providing a compressed Manifest that
exploits decompressor vulnerabilities, or a zip bomb.

The OpenPGP cleartext signature covers the contents of the Manifest,
and is therefore compressed along with them. The possibility of using
a detached signature has been considered but it was rejected as
unnecessary complexity for minor gain.

Technically, a similar result could be achieved by moving all the data
into a compressed sub-Manifest in the top directory (e.g.
``Manifest.sub.gz``), and including a ``MANIFEST`` entry for this file
in a signed, uncompressed top-level Manifest.

The existence of additional entries for uncompressed Manifest checksums
was debated. However, plain entries for the uncompressed file would
be confusing if only the compressed file existed, and conflicting
if both uncompressed and compressed variants existed. Furthermore,
it has been pointed out that ``DIST`` entries do not have
an uncompressed variant either.
|
Performance considerations
--------------------------

Performing a full-tree verification on every sync raises some
performance concerns for end-user systems. The initial testing has shown
that a cold-cache verification on a btrfs file system can take around
4 minutes, with the process being mostly I/O bound. On the other hand,
it can be expected that the verification will be performed directly
after syncing, taking advantage of a warm filesystem cache.

To improve speed on I/O and/or CPU-constrained systems even further,
the algorithms can be easily extended to perform incremental
verification. Given that rsync does not preserve mtimes by default,
the tool can take advantage of mtime and Manifest comparisons to recheck
only the parts of the repository that have changed.
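
The mtime half of such an incremental check might be sketched as
follows. The example assumes that the tool records the time of the last
successful verification as a POSIX timestamp; the function name
``files_to_recheck`` is hypothetical, and the comparison of Manifest
entries mentioned above is left out.

```python
import os


def files_to_recheck(root, paths, last_verified):
    """Return the paths that changed since the last verification.

    last_verified is a POSIX timestamp (seconds) recorded by the tool.
    """
    changed = []
    for rel in paths:
        try:
            mtime = os.stat(os.path.join(root, rel)).st_mtime
        except FileNotFoundError:
            # Missing files must be reported, not silently skipped.
            changed.append(rel)
            continue
        if mtime >= last_verified:
            # Ties are rechecked to stay on the safe side.
            changed.append(rel)
    return changed
```

Files written by the sync carry a fresh mtime and are therefore
selected, while untouched files keep their old mtime and are skipped.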
|
Furthermore, the package manager implementations can restrict checking
only to the parts of the repository that are actually being used.
|
Backwards Compatibility
=======================

This GLEP provides optional means of preserving backwards compatibility.
To preserve the backwards compatibility, the following needs to hold
for the ``Manifest`` file in every package directory:

- all files must be covered by the single ``Manifest`` file,

- all distfiles used by the package must be included,

- all files inside the ``files/`` subdirectory need to use
  the ``AUX`` tag (rather than ``DATA``),

- all ``.ebuild`` files need to use the ``EBUILD`` tag,

- the ``metadata.xml`` and ``ChangeLog`` files need to use
  the ``MISC`` tag,

- the Manifest can be signed to provide authenticity verification,

- an uncompressed Manifest must always exist, and a compressed Manifest
  of identical content may be present.
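
The tag-related rules in the list above can be condensed into a small
selection function. This is only a sketch: the name ``compat_tag`` is
hypothetical, paths are assumed to be relative to the package directory
with ``/`` as separator, and the ``files/`` prefix is stripped for
``AUX`` because that tag implies it.

```python
def compat_tag(path):
    """Return (tag, entry path) for a file in a package directory."""
    if path.startswith("files/"):
        # AUX paths carry an implicit files/ prefix.
        return "AUX", path[len("files/"):]
    if "/" not in path and path.endswith(".ebuild"):
        return "EBUILD", path
    if path in ("metadata.xml", "ChangeLog"):
        return "MISC", path
    # Anything else falls back to the universal type.
    return "DATA", path
```

For example, ``compat_tag("files/fix.patch")`` gives
``("AUX", "fix.patch")``.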
|
Once the backwards compatibility is no longer a concern, the above
no longer needs to hold and the deprecated tags can be removed.
|
Reference Implementation
========================

The reference implementation for this GLEP is being developed
as the gemato project [#GEMATO]_.


Credits
=======

Thanks to all the people whose contributions were invaluable
to the creation of this GLEP. This includes but is not limited to:

- Robin Hugh Johnson,
- Ulrich Müller.

Additionally, thanks to Robin Hugh Johnson for the original
MetaManifest GLEP series which served both as inspiration and source
of many concepts used in this GLEP. Recursively, also thanks to all
the people who contributed to the original GLEPs.
|
References
==========

.. [#GLEP44] GLEP 44: Manifest2 format
   (https://www.gentoo.org/glep/glep-0044.html)

.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
   - Overview
   (https://www.gentoo.org/glep/glep-0057.html)

.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
   - Infrastructure to User distribution - MetaManifest
   (https://www.gentoo.org/glep/glep-0058.html)

.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
   (https://www.gentoo.org/glep/glep-0059.html)

.. [#GLEP60] GLEP 60: Manifest2 filetypes
   (https://www.gentoo.org/glep/glep-0060.html)

.. [#GLEP61] GLEP 61: Manifest2 compression
   (https://www.gentoo.org/glep/glep-0061.html)

.. [#UNICODE] The Unicode standard
   (https://unicode.org/versions/latest/)

.. [#PMS-FETCH] Package Manager Specification: Dependency Specification
   Format - SRC_URI
   (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)

.. [#FILE-NAMING-RULES] Ebuild File Format -- Gentoo Development Guide
   (https://devmanual.gentoo.org/ebuild-writing/file-format/#file-naming-rules)
|
.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
   (https://www.ietf.org/rfc/rfc1321.txt)

.. [#RIPEMD160] The hash function RIPEMD-160
   (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)

.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)

.. [#WHIRLPOOL] The WHIRLPOOL Hash Function
   (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)

.. [#BLAKE2] BLAKE2 -- fast secure hashing
   (https://blake2.net/)

.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
   and Extendable-Output Functions
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)

.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
   (https://www.streebog.net/)

.. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
   (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)

.. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
   at the time of writing are duplicate, representing 2 MiB
   out of 25 MiB of DIST entries altogether.

.. [#GEMATO] gemato: Gentoo Manifest Tool
   (https://github.com/mgorny/gemato/)
|
Copyright
=========
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.

--
Best regards,
Michał Górny