Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [v1.0.3] GLEP 74: Full-tree verification using Manifest files
Date: Sun, 05 Nov 2017 21:10:44
Message-Id: 1509916232.21193.19.camel@gentoo.org
In Reply to: Re: [gentoo-dev] [v1.0.3] GLEP 74: Full-tree verification using Manifest files by "Robin H. Johnson"
1 On Thu, 02.11.2017 at 23:43 +0000, Robin H.
2 Johnson wrote:
3 > On Thu, Nov 02, 2017 at 08:11:59PM +0100, Michał Górny wrote:
4 > > Next version. Now without MISC/OPTIONAL, and with many clarifications.
5 >
6 > Huge improvements in this version, I found it much easier to understand.
7 >
8 > Nits:
9 > - please stick to ASCII ellipsis. The unicode ellipsis is unreadable in
10 > some monospace fonts.
11
12 Done. Also replaced '—' for consistency.
13
14 >
15 > Further items inline:
16 > > Directory tree coverage
17 > > -----------------------
18 >
19 > ...
20 > > The file entries (except for ``IGNORE``) can be specified for regular
21 > > files only. Symbolic links are followed when opening files
22 > > and traversing directories. It is an error to specify an entry for
23 > > a different file type. If the tree contains files of other types
24 > > that are not otherwise ignored, they need to be covered by an explicit
25 > > ``IGNORE``.
26 > >
27 > > All the local (non-``DIST``) files covered by a Manifest tree must
28 > > reside on the same filesystem. It is an error to specify entries
29 > > applying to files on another filesystem. If subdirectories
30 > > that are not otherwise ignored reside on a different filesystem, they
31 > > must be explicitly excluded via ``IGNORE``.
32 >
33 > I would prefer this to say:
34 > 'If files that are not otherwise ignored reside on a different
35 > filesystem', as expanded from sub-directories.
36 > This implicitly forbids following a symlink that crosses a filesystem
37 > boundary, and then matches the similar part of 'Tree layout
38 > restrictions'.
39
40 I've gone for something even more explicit:
41
42 | If files or directories that are not otherwise ignored reside
43 | on a different filesystem, or symbolic links point to targets
44 | on a different filesystem, they must be explicitly excluded
45 | via ``IGNORE``.
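
To illustrate the rule above, here is a minimal sketch of how a tool might find paths that would need an explicit ``IGNORE``. It assumes that comparing ``st_dev`` values is an adequate test for "same filesystem"; the function name ``find_cross_filesystem_paths`` is hypothetical and not part of the GLEP or any package manager.

```python
import os


def find_cross_filesystem_paths(root):
    """List paths under root whose (symlink-resolved) target is on another filesystem."""
    root_dev = os.stat(root).st_dev
    offending = []
    # Note: os.walk() does not descend into symlinked directories by default;
    # they are still checked here, but their contents are not visited.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            try:
                # os.stat() follows symlinks, so a link is judged by its target.
                dev = os.stat(path).st_dev
            except OSError:
                continue  # e.g. a dangling symlink; out of scope for this sketch
            if dev != root_dev:
                offending.append(os.path.relpath(path, root))
                if name in dirnames:
                    dirnames.remove(name)  # do not descend into a foreign mount
    return offending
```

Every path returned would either have to be covered by ``IGNORE`` or be treated as an error.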
46
47
48 >
49 > > Rationale
50 > > =========
51 >
52 > ...
53 > > Tree layout restrictions
54 > > ------------------------
55 > >
56 > > The algorithm is meant to work primarily with ebuild repositories which
57 > > normally contain only files and directories. Directories provide
58 > > no useful metadata for verification, and specifying special entries
59 > > for additional file types is purposeless. Therefore, the specification
60 > > is restricted to dealing with regular files.
61 > >
62 > > The Gentoo repository does not use symbolic links. Some Gentoo
63 > > repositories do, however. To provide a simple solution for dealing with
64 > > symlinks without having to take care to implement special handling for
65 > > them, the common behavior of implicitly resolving them is used.
66 > > Therefore, symbolic links to files are stored as if they were regular
67 > > files, and symbolic links to directories are followed as if they were
68 > > regular directories.
69 > >
70 > > Dotfiles are implicitly ignored as that is a common notion used
71 > > in software written for POSIX systems. All other common filenames
72 > > require explicit ``IGNORE`` lines.
73 >
74 > 'common' in the second sentence seems odd. What about uncommon
75 > filenames? Maybe just s/other common filenames/other filenames/.
76
77 Done. The idea was to say 'do not put IGNORE for corner cases which are
78 better handled via PM config' but I guess it's not necessary here.
79
80 >
81 > > An ability to inject additional ignore entries is provided to account
82 > > for site configuration affecting the repository tree — placing
83 > > additional files in it, skipping some of the categories from syncing.
84 >
85 > Mention that the package manager may provide wildcards or regex in the
86 > additional entries. Eg: 'IGNORE **/metadata.xml'
87
88 Done.
89
90 | This configuration can extend beyond the limits of this GLEP,
91 | e.g. by allowing wildcards or regular expressions.
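
As a rough illustration of such an extension, a package manager could match repository-relative paths against shell-style patterns. This is only a sketch: the pattern semantics are an assumption (Python's ``fnmatch``, where ``*`` also matches ``/``, so ``**`` has no special meaning), and ``is_ignored`` and the example patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """Return True if the repository-relative path matches any extra ignore pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Hypothetical site configuration, not a format defined by the GLEP.
extra_ignores = ["**/metadata.xml", "games-*"]

print(is_ignored("app-editors/vim/metadata.xml", extra_ignores))    # True
print(is_ignored("app-editors/vim/vim-8.0.ebuild", extra_ignores))  # False
```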
92
93 >
94 > > Non-strict Manifest verification
95 > > --------------------------------
96 >
97 > ...
98 > > The cases for stripping unnecessary files mostly focused around space
99 > > savings. For this purpose, stripping ``metadata.xml`` and similar files
100 > > has little value. It is much more common for users to strip whole
101 > > categories which can not be handled via the ``MISC`` type, and needs
102 > > a dedicated package manager mechanism. The same mechanism can also
103 > > handle files that used the ``MISC`` type.
104 >
105 > Exclusion by package does happen as well. A list of categories or
106 > packages can be used for both the rsync exclusion and the IGNORE.
107
108 Rewritten to:
109
110 | It is much more common for users to strip whole packages
111 | or categories. The ``MISC`` type is not suitable for that,
112 | and so a dedicated package manager mechanism needs to be developed
113 | instead, possibly combined with an rsync exclusion list. The same
114 | mechanism can also handle files that historically used the ``MISC``
115 | type.
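
One way such a combined mechanism could look, as a sketch only: a single list of excluded categories and packages is turned into both rsync exclude patterns and ``IGNORE`` entries. It assumes the entries are relative to the repository root and that the ``IGNORE`` lines are injected at the level of the top-level Manifest; ``exclusion_entries`` is a hypothetical helper.

```python
def exclusion_entries(excluded):
    """Derive rsync exclude patterns and IGNORE lines from one exclusion list."""
    cleaned = [path.strip("/") for path in excluded]
    # A leading "/" anchors the rsync pattern at the transfer root,
    # a trailing "/" restricts it to directories.
    rsync_excludes = ["/" + path + "/" for path in cleaned]
    ignore_lines = ["IGNORE " + path for path in cleaned]
    return rsync_excludes, ignore_lines


rsync_excludes, ignore_lines = exclusion_entries(["games-action", "dev-lang/python"])
print(rsync_excludes)  # ['/games-action/', '/dev-lang/python/']
print(ignore_lines)    # ['IGNORE games-action', 'IGNORE dev-lang/python']
```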
116
117 But it's merely a rationale, so I'd rather not spend another hour trying
118 to cover every corner case in it.
119
120 >
121 > > Splitting distfile checksums from file checksums
122 > > ------------------------------------------------
123 > >
124 > > Another problem with the current Manifest format is that the checksums
125 > > for fetched files are combined with checksums for local files
126 > > in a single file inside the package directory. It has been specifically
127 > > pointed out that:
128 > >
129 > > - since distfiles are sometimes reused across different packages,
130 > > the repeating checksums are redundant,
131 >
132 > Comment: 8.4% of all DIST entries are duplicate, representing a 2MiB
133 > saving in tree size (25MiB of DIST entries altogether).
134
135 Included as footnote:
136
137 .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
138 at the time of writing are duplicates, representing 2 MiB
139 out of 25 MiB of DIST entries altogether.
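
For reference, a figure of this kind could be reproduced along the following lines. This is a sketch of one plausible counting method, not necessarily the one used for the numbers above: it treats every repeated distfile name across Manifests as a duplicate. The sample Manifest contents are made up.

```python
from collections import Counter


def duplicate_dist_count(manifest_texts):
    """Return (duplicate DIST entries, total DIST entries) over all Manifests."""
    names = Counter()
    for text in manifest_texts:
        for line in text.splitlines():
            fields = line.split()
            # A DIST line has the form: DIST <filename> <size> <hash-type> <hash> ...
            if len(fields) >= 2 and fields[0] == "DIST":
                names[fields[1]] += 1
    total = sum(names.values())
    return total - len(names), total


# Made-up Manifest contents for two packages sharing one distfile.
pkg_a = "DIST shared-1.0.tar.gz 1234 SHA512 abc\nDIST a-1.0.tar.gz 10 SHA512 def\n"
pkg_b = "DIST shared-1.0.tar.gz 1234 SHA512 abc\n"
print(duplicate_dist_count([pkg_a, pkg_b]))  # (1, 3)
```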
140
141 >
142 > > - mirror admins were interested in the possibility of verifying all
143 > > the distfiles with a single tool.
144 > >
145 > > This specification does not provide a clean solution to this problem.
146 > > It technically permits moving ``DIST`` entries to higher-level Manifests
147 > > but the usefulness of such a solution is doubtful.
148 >
149 > This solution would require the package manager to consider
150 > higher-level Manifests or all Manifests in the tree when searching for
151 > the DIST entry. The most useful implementation of this would be for the
152 > git->rsync process to move all DIST entries elsewhere (metadata/ maybe).
153
154 Technically speaking, the package manager needs to consider parent
155 Manifests anyway in order to verify the deeper Manifests, and I think we
156 can reasonably assume it will keep them cached.
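
A minimal sketch of that lookup, assuming uncompressed files named ``Manifest`` and a simple in-process cache: the search starts in the package directory and walks up to the repository root. ``dist_entries`` and ``find_dist`` are hypothetical names, not package manager API.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def dist_entries(manifest_path):
    """Parse one Manifest once and return {distfile name: rest of the DIST line}."""
    entries = {}
    if manifest_path.is_file():
        for line in manifest_path.read_text().splitlines():
            fields = line.split(None, 2)
            if len(fields) == 3 and fields[0] == "DIST":
                entries[fields[1]] = fields[2]
    return entries


def find_dist(repo_root, package_dir, distfile):
    """Search package_dir and its parents up to repo_root for a DIST entry."""
    repo_root = Path(repo_root).resolve()
    current = Path(package_dir).resolve()
    while True:
        entry = dist_entries(current / "Manifest").get(distfile)
        if entry is not None:
            return entry
        # Stop at the repository root (or the filesystem root as a safety net).
        if current == repo_root or current == current.parent:
            return None
        current = current.parent
```

Because the parsed Manifests are cached, looking up many distfiles costs at most one read per Manifest on the path to the root.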
157
158 >
159 > Either way, this would have many downsides, and make manual work on the
160 > Manifest DIST entries painful.
161
162 That's what 'doubtful usefulness' means ;-P.
163
164 --
165 Best regards,
166 Michał Górny
