Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files
Date: Thu, 02 Nov 2017 19:10:27
Message-Id: 1509649805.21210.10.camel@gentoo.org
In Reply to: Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files by "Robin H. Johnson"
1 W dniu pon, 30.10.2017 o godzinie 19∶56 +0000, użytkownik Robin H.
2 Johnson napisał:
3 > On Mon, Oct 30, 2017 at 05:51:36PM +0100, Michał Górny wrote:
4 > ...
5 > > Directory tree coverage
6 > > -----------------------
7 >
8 > This section should maybe cover OPTIONAL in more detail (see more
9 > below).
10
11 So I've removed OPTIONAL from the new version, so I'm going to skip all
12 the comments regarding it.
13
14 >
15 > > The Manifest files can also specify ``IGNORE`` entries to skip Manifest
16 > > verification of subdirectories and/or files. The package manager can
17 > > support injecting ignore paths to account for additional files created,
18 > > modified or removed by user's processes that would not be ignored
19 > > by existing rules. Files and directories starting with a dot are always
20 > > implicitly ignored. All files that are not ignored must be covered
21 > > by at least one of the Manifests.
22 >
23 > (English) There are multiple points in this paragraph, and I missed on first
24 > reading. The package manager part is esp. lost.
25
26 Replaced this with an enumeration.
27
28 > > > All files that are not ignored must be covered by at least one of the
29 > > > Manifests. Files may be ignored by several ways:
30 > > > - Files and directories starting with a dot are always implicitly
31 > > > ignored.
32 > > > - The Manifest files can specify ``IGNORE`` entries to skip
33 > > > verification of ubdirectories and/or files.
34 > > > - The package manager can support injecting ignore paths to account for
35 > > > additional files created modified or removed by user's processes that
36 > > > would not be ignored by existing rules.
37 > > File verification
38 > > -----------------
39 > > When verifying a file against the Manifest, the following rules are
40 > > used:
41 >
42 > ...
43 > > 3. If the file is covered by an entry of the ``OPTIONAL`` type:
44 > > a. if the file is present, then the verification fails,
45 > > b. otherwise, the verification succeeds.
46 >
47 > Move the OPTIONAL type further up in the verification list maybe? See
48 > the interpretation question below.
49 >
50 >
51 > > Modern Manifest tags
52 > > --------------------
53 >
54 > ...
55 > > ``IGNORE <path>``
56 > > Ignores a subdirectory or file from Manifest checks. If the specified
57 > > path is present, it and its contents are omitted from the Manifest
58 > > verification (always pass).
59 >
60 > Clarification Needed:
61 > Should subdirectories have a trailing slash in the Manifest or not?
62 > This affects matching of the type.
63 > Case 1.1:
64 > Manifest has 'IGNORE foo'; 'foo' is a file; => ignored.
65 > Case 1.2:
66 > Manifest has 'IGNORE foo'; 'foo' is a directory; => ignored.
67 > Case 2.1:
68 > Manifest has 'IGNORE foo/'; 'foo' is a file; => FAIL
69 > Case 2.2:
70 > Manifest has 'IGNORE foo/'; 'foo' is a directory; => ignored.
71
72 Added a syntax clarification, and noted that we don't support wildcards.
73
74 >
75 >
76 > > ``OPTIONAL <path>``
77 > > Specifies a file that does not exist in the distribution but if it
78 > > did, it would be marked as ``MISC``. In the strict mode, the file
79 > > must not exist for the verification to pass. The package manager
80 > > may ignore a stray file matching this entry if operating in non-strict
81 > > mode.
82 >
83 > This has gotten less clear.
84 > Is the following correct interpretation?
85 > if(package manager is strict) then
86 > Treat the OPTIONAL entry as NOT present in the Manifest.
87 > This will cause files to be in the present set but not the covered set.
88 > else
89 > Treat the OPTIONAL entry as 'IGNORE <path>'
90 > endif
91 >
92 > > ``DIST <filename> <size> <checksums>…``
93 > > Specifies a distfile entry used to verify files fetched as part
94 > > of ``SRC_URI``. The filename must match the filename used to store
95 > > the fetched file as specified in the PMS [#PMS-FETCH]_. The package
96 > > manager must reject the fetched file if it fails verification.
97 > > ``DIST`` entries apply to all packages below the Manifest file
98 > > specifying them.
99 >
100 > Nit: You have used a unicode ellipsis '…' in some places and plain ASCII
101 > ellipsis '...' in others. Stick to ASCII?
102
103 Well, I've actually used ASCII only in the code samples because it felt
104 more right but I've switched to Unicode everywhere now, to match dashes.
105
106 >
107 >
108 > > An example Manifest file (informational)
109 > > ----------------------------------------
110 >
111 > Can you add an example for OPTIONAL?
112 >
113 > > Tree layout restrictions
114 > > ------------------------
115 > > The Gentoo repository does not use symbolic links. Some Gentoo
116 > > repositories do, however. To provide a simple solution for dealing with
117 > > symlinks without having to take care to implement special handling for
118 > > them, the common behavior of implicitly resolving them is used.
119 > > Therefore, symbolic links to files are stored as if they were regular
120 > > files, and symbolic links to directories are followed as if they were
121 > > regular directories.
122 >
123 > Clarification: should cross-device symlinks be rejected? (perhaps
124 > implicit, but wanted to check)
125 > If so, need to add to 'Algorithm for full-tree verification' section.
126
127 Yes, it's implied by the rules in `Directory tree coverage`_. Not sure
128 about adding it there. We also don't explicitly tell people to verify
129 file type. And in any case, I think it belongs more in `File
130 verification`_ algorithm since that needs to be done separately for
131 every file.
132
133 > > Dotfiles are implicitly ignored as that is a common notion used
134 > > in software written for POSIX systems. All other filenames require
135 > > explicit ``IGNORE`` lines.
136 >
137 > This paragraph should re-iterate that the package manager may specify
138 > additional files to be ignored per the user.
139
140 Added extra paragraph about it, with example uses.
141
142 >
143 > > The algorithm is restricted to work on a single filesystem. This is
144 > > mostly relevant when scanning for top-level Manifest — we do not want
145 > > to cross filesystem boundaries then. However, to ensure consistent
146 > > bidirectional behavior we need to also ban them when operating downwards
147 > > the tree.
148 > >
149 > > The directories and files on different filesystems needs to be ignored
150 > > explicitly as implicitly skipping them would cause confusion.
151 > > In particular, tools might then claim that a file does not exist when
152 > > it clearly does because it was skipped due to filesystem boundaries.
153 >
154 > The downward path needs to check the device on files.
155 > Otherwise:
156 > cat/pn/Manifest
157 > cat/pn/files/ <-- different filesystem here
158
159 I don't understand how this comment is relevant here. It's required
160 earlier.
161
162 >
163 > > Non-obligatory Manifest verification
164 > > ------------------------------------
165 >
166 > ...
167 > > The traditional ``MISC`` type is amended with a complementary
168 > > ``OPTIONAL`` tag to account for files that are not provided
169 > > in the specific repository. It aims to ensure that the same path would
170 > > be non-fatal when provided by the repository but fatal when created
171 > > by the user tooling.
172 >
173 > Clarify the last sentence to be for strict mode only?
174
175 This wholes ection has been rewritten.
176
177 >
178 > > Splitting distfile checksums from file checksums
179 > > ------------------------------------------------
180 > >
181 > > Another problem with the current Manifest format is that the checksums
182 > > for fetched files are combined with checksums for local files
183 > > in a single file inside the package directory. It has been specifically
184 > > pointed out that:
185 > >
186 > > - since distfiles are sometimes reused across different packages,
187 > > the repeating checksums are redundant,
188 > >
189 > > - mirror admins were interested in the possibility of verifying all
190 > > the distfiles with a single tool.
191 > >
192 > > This specification does not provide a clean solution to this problem.
193 > > It technically permits moving ``DIST`` entries to higher-level Manifests
194 > > but the usefulness of such a solution is doubtful.
195 >
196 > Clarification of validity:
197 > If cat/pn1 and cat/pn2 share 1000 DIST files; would it be valid to
198 > have: the following:
199 > cat/pn1/Manifest:MANIFEST ../Manifest.some-shared-name 1234 ...
200 > cat/pn1/Manifest:DIST unique-pn1-dist.tgz 1234 ...
201 > cat/pn2/Manifest:MANIFEST ../Manifest.some-shared-name 1234 ...
202 > cat/pn2/Manifest:DIST unique-pn2-dist.tgz 1234 ...
203
204 Nope. Parent directory references are forbidden at the top. Also
205 MANIFEST is not a dumb INCLUDE but local path split.
206
207 > > Performance considerations
208 > > --------------------------
209 >
210 > ...
211 > > To improve speed on I/O and/or CPU-restrained systems even further,
212 > > the algorithms can be easily extended to perform incremental
213 > > verification. Given that rsync does not preserve mtimes by default,
214 > > the tool can take advantage of mtime and Manifest comparisons to recheck
215 > > only the parts of the repository that have changed.
216 >
217 > Implementation note, not for GLEP addition:
218 > If the package manager caches by filename,inode,mtime locally, it can
219 > then avoid repeat-checking of the hashes (it only needs a stat),
220 > provided that it is happy there was no local attacker who might perform
221 > an in-place modification of a file (mtime&inode remain the same).
222
223 I've went for mtime matching against a single value for all files, plus
224 size checks (since we need to read inode anyway). Also, I don't see
225 the case for inode change without mtime change.
226
227 > >
228 > > Furthermore, the package manager implementations can restrict checking
229 > > only to the parts of the repository that are actually being used.
230 > >
231 > >
232 > > Backwards Compatibility
233 > > =======================
234 > >
235 > > This GLEP provides optional means of preserving backwards compatibility.
236 > > To preserve the backwards compatibility, the following needs to be
237 > > ensured:
238 >
239 > "package directory" is common to all of the items here, if you move that
240 > to the list preamble, it's a lot cleaner to read.
241 > > > To preserve the backwards compatibility, the following needs to be
242 > > > ensured about package directories:
243 >
244 > And cleanup the list:
245 > s/in(side)? (that|the) package directory//
246 > s/of a package directory//
247 >
248
249 Done.
250
251 --
252 Best regards,
253 Michał Górny