On Sun, 2018-11-18 at 12:00 +0100, Fabian Groffen wrote:
> On 18-11-2018 10:38:51 +0100, Michał Górny wrote:
> > On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> > > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > > > Problems with the current binary package format
> > > > -----------------------------------------------
> > > >
> > > > The following problems were identified with the package format currently
> > > > in use:
> > > >
> > > > 1. **The packages rely on custom binary archive format to store
> > > > metadata.** It is entirely Gentoo invented, and requires dedicated
> > > > tooling to work with it. In fact, the reference implementation
> > > > in Portage does not even include a CLI tool to work with tbz2
> > > > packages; an unofficial implementation is provided as part
> > > > of portage-utils toolkit [#PORTAGE-UTILS]_.
> > >
> > > I think you should rewrite this section to the argument that the
> > > metadata is hard to edit, and that there is only one tool to do so
> > > (except a python interface from Portage?).
> > > On a separate note, I don't think portage-utils can be considered
> > > "unofficial", it is a Gentoo official project as far as I am aware.
> >
> > In this context, Portage is 'official'. Portage-utils is a project
> > that's developed entirely separately from Portage and doesn't use
> > Portage APIs but instead reinvents everything. As such, it is easy for
> > the two to go out of sync. Or for one of them to have bugs that
> > the other one doesn't have (say, with endianness).
>
> I'm not sure if it's actually true, I was under the impression the same
> author(s) worked on the Portage as well as portage-utils code. Anyway,
> aren't quickpkg and emerge enough from a user's perspective?

Gentoo users have a wide perspective. Assuming that you can think of
everything the users need, and that you don't need to care beyond that,
is plain wrong and results in Windows.

> > > > 2. **The format relies on obscure compressor feature of ignoring
> > > > trailing garbage**. While this behavior is traditionally implemented
> > > > by many compressors, the original reasons for it have become long
> > > > irrelevant and it is not surprising that new compressors do not
> > > > support it. In particular, Portage already hit this problem twice:
> > > > once when users replaced bzip2 with parallel-capable pbzip2
> > > > implementation [#PBZIP2]_, and the second time when support for zstd
> > > > compressor was added [#ZSTD]_.
> > >
> > > I think this is actually the result of a rather opportunistic
> > > implementation. The fault is that we chose to use an extension that
> > > suggests the file is a regular compressed tarball.
> > > When one detects that a file is xpak padded, it is trivial to feed the
> > > decompressor just the relevant part of the datastream. The format
> > > itself isn't bad, and doesn't rely on obscure behaviour.
> >
> > Except if you don't have the proper tools installed. In which case
> > the 'opportunistic' behavior made it possible to extract the contents
> > without special tools... except when it actually happens not to work
> > anymore. Roy's reply indicates that there is actually interest in this
> > design feature.
>
> Your point is that the format is broken (== relies on obscure compressor
> feature). My point is that the format simply requires a special tool.
> The fact that we prefer to use existing tools doesn't imply in any way
> that the format is broken to me.
> I think you should rewrite your point to mention that you don't want to
> use a tool that doesn't exist in @system (?) to unpack a binpkg. My
> guess is that you could use some head/tail magic in a script if the
> trailing block is upsetting the decompressor.
>
> I'm not saying this may look ugly, I'm just saying that your point seems
> biased.

I've spent significant effort rewriting those points to make it clear
what the problem is, and separating it from other changes 'worth doing
while we're changing stuff'. Hope that satisfies your nitpicking.

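For reference, the quoted idea of feeding the decompressor "just the
relevant part" can be sketched in a few lines of Python. This is only a
minimal sketch, assuming the trailer layout documented in xpak(5): the file
ends with a 4-byte big-endian length of the xpak segment followed by the
literal "STOP", and the segment itself is framed by "XPAKPACK" and
"XPAKSTOP". The function name split_tbz2 is hypothetical; it is not part of
Portage or portage-utils.

```python
import struct


def split_tbz2(blob):
    """Split a tbz2 blob into (compressed tarball, xpak segment).

    Assumed layout: <tarball><xpak><4-byte big-endian len(xpak)>"STOP".
    """
    if len(blob) < 8 or blob[-4:] != b"STOP":
        raise ValueError("no tbz2 trailer found")
    # The length field sits right before the final "STOP" marker.
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    xpak_start = len(blob) - 8 - xpak_len
    if xpak_start < 0:
        raise ValueError("xpak length exceeds file size")
    xpak = blob[xpak_start:-8]
    if not (xpak.startswith(b"XPAKPACK") and xpak.endswith(b"XPAKSTOP")):
        raise ValueError("malformed xpak segment")
    # Everything before the segment is the plain compressed tarball.
    return blob[:xpak_start], xpak
```

The first element of the returned pair can be handed to a decompressor that
rejects trailing data. Note that this still needs a small purpose-written
program, which is the point under discussion.
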
> > > > 3. **Placing metadata at the end of file makes partial fetches
> > > > complex.** While it is technically possible to obtain package
> > > > metadata remotely without fetching the whole package, it usually
> > > > requires e.g. 2-3 HTTP requests with rather complex driver. For
> > > > comparison, if metadata was placed at the beginning of the file,
> > > > early-terminated pipeline with a single fetch request would suffice.
> > >
> > > I think this point needs to be quantified somewhat why it is so
> > > important.
> > > I may be wrong, but the average binpkg is small, <1MiB, bigger packages
> > > are <50MiB.
> > > So what is the gain to be saved here? A "few" MiBs for what operation
> > > exactly? I say "few" because I know for some users this is actually not
> > > just a blip before it's downloaded. So if this is possible to achieve,
> > > in what scenarios is this going to be used (and is this often?).
> >
> > Last I checked, Gentoo aimed to support more users than the 'majority'
> > of people with high-throughput Internet access. If there's no cost
> > in doing things better, why not do them better?
>
> You didn't address the critical question, but instead just repeated what
> I said.
> So again, why do you need to read just the metadata?

The original idea was to make it possible to index remote packages
without having a server-side cache available (or up-to-date). In order
to do that, the package manager would need to fetch the metadata of all
packages (but there's no need to fetch the whole packages).
However, that's merely a possible future idea. It's not worth debating
today.

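To illustrate what "metadata first" buys, here is a minimal sketch in
Python. It assumes a plain tar container whose first member holds the
metadata; the member names ("metadata.tar", "image.tar") and the helpers
build_package, CountingReader and read_member are hypothetical and only
serve the illustration. The stream is read sequentially and abandoned as
soon as the wanted member has been read.

```python
import io
import tarfile


def build_package(members):
    """Build an in-memory tar with members in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


class CountingReader:
    """Read-only wrapper that records how many bytes were consumed."""

    def __init__(self, raw):
        self.raw = raw
        self.consumed = 0

    def read(self, size=-1):
        chunk = self.raw.read(size)
        self.consumed += len(chunk)
        return chunk


def read_member(stream, wanted):
    """Return the contents of `wanted`, reading the stream only as far as needed."""
    # "r|" is tarfile's non-seekable stream mode for uncompressed tar.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            if member.name == wanted:
                return tar.extractfile(member).read()
    return None
```

With a real remote package, the stream would be the body of a single HTTP
response that is closed early instead of the in-memory reader used here.
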
Today I really understood the point of avoiding premature optimization.
Even if the change is practically zero-cost and harmless (as it's simply
reordering files), it's going to cost you a lot of time because someone
will keep nitpicking on it, even though any other order will not change
anything.

> > > > 4. **Extending the format with OpenPGP signatures is non-trivial.**
> > > > Depending on the implementation details, it either requires fetching
> > > > additional detached signature, breaking backwards compatibility or
> > > > introducing more custom logic to reassemble OpenPGP packets.
> > >
> > > I think one could add an extra key to the xpak that holds a gpg sig or
> > > something. Perhaps this point is better phrased as that current binpkgs
> > > don't have any validation options defined.
> >
> > ...which extra key would mean that the two disjoint implementations
> > in use would need more custom code that extracts the signature,
> > reconstructs signed data for verification and verifies it. Or, in other
> > words, that user needs even more custom tooling to manually verify
> > the package he just fetched.
>
> I don't see your point. If you define what the package format looks
> like, you just need to implement that. There is no point in having a
> binpkg format that Portage doesn't implement properly. Portage is
> well-equipped to implement any of the approaches. A user should use
> Portage to install a package. A poweruser could use a separate tool for
> a scenario where he/she's in charge of keeping things sane. Relevancy?
>
> I just don't agree that extending the format is non-trivial. You seem
> to have no arguments other than adding "custom logic", which is what you
> eventually also do in the reference implementation of your new approach.

The difference is that my format is transparent. You file(1) it, you
see a .tar archive. You extract the archive, you see subarchives
and .sig files, which are widely recognized. You don't have to read the
spec, you don't have to get special tools. If you have ever verified
a detached signature, you know how to proceed. If you haven't, you'll
learn something you can reuse.

Now, implementing signatures on top of XPAK is more effort, and yields
something that is more fragile and in the end doesn't benefit anyone.

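As a rough illustration of that transparency, the sketch below lists the
members of an outer tar and pairs each one with its detached signature
using nothing but Python's standard tarfile module. The function name
detached_signatures and the "<member>.sig" naming are assumptions made for
the example, not something fixed by the spec text quoted in this thread.

```python
import io
import tarfile


def detached_signatures(package):
    """Map each non-signature member of a plain tar to its '.sig' member.

    Members without a matching signature map to None.
    """
    # "r:" opens an uncompressed tar only, matching what file(1) reports.
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:") as tar:
        names = tar.getnames()
    present = set(names)
    return {
        name: (name + ".sig" if name + ".sig" in present else None)
        for name in names
        if not name.endswith(".sig")
    }
```

Each pair found this way can then be checked the usual way, e.g. with
`gpg --verify metadata.tar.sig metadata.tar` after extraction.
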
>
> > > > 5. **Metadata is not compressed.** This is not a significant problem,
> > > > it is just listed for completeness.
> > > >
> > > >
> > > > Goals for a new container format
> > > > --------------------------------
> > > >
> > > > The following goals have been set for a replacement format:
> > > >
> > > > 1. **The packages must remain contained in a single file.** As a matter
> > > > of user convenience, it should be possible to transfer binary
> > > > packages without having to use multiple files, and to install them
> > > > from any location.
> > > >
> > > > 2. **The file format must be entirely based on common file formats,
> > > > respecting best practices, with as little customization as necessary
> > > > to satisfy the requirements.** In particular, it is unacceptable
> > > > to create new binary formats.
> > >
> > > I take this as your personal opinion. I don't quite get why it is
> > > unacceptable to create a new binary format though. In particular when
> > > you're looking for efficiency, such format could serve your purposes.
> > > As long as it's clearly defined, I don't see the problem with a binary
> > > format either.
> > > Could you add why it is you think binary formats are unacceptable here?
> >
> > Because custom binary formats require specialized tooling, and are
> > a royal PITA when the user wants to do something that the author of
> > specialized tooling just happened not to think worthwhile, or when
> > the tooling is not available for some reason. And before you ask really
> > silly questions, yes, I did fight binary packages over hex editor
> > at some point.
>
> Which I still don't understand, to be frank. I think even Portage
> exposes python APIs to get to the data.

Compare the time needed to make a trivial (but unforeseen) change
to a format that's transparent vs a format that requires you to learn
its spec and/or API, write a program and debug it.

> > The most trivial case is an attempted recovery of a broken system.
> > If you don't have Portage working and don't have portage-utils
> > installed, do you really prefer a custom format which will require you
> > to fetch and compile special tools? Or is one that can be processed
> > with tools you're quite likely to have on every system, like tar?
>
> Well, I think the idea behind the original binpkg format was to use tar
> directly on the files in emergency scenarios like these...
> The assumption was bzip2 decompressor and tar being available.
> I think it is an example of how you add something, while still allowing
> to fallback on existing tools.

Except progress in compressors has made it work less and less reliably.
It's mostly an example of how to be *clever*. However, being clever
usually doesn't pay off in the long term, compared to doing things *in a
simple way*.

> > > > 3. **The file format should provide for partial fetching of binary
> > > > packages.** It should be possible to easily fetch and read
> > > > the package metadata without having to download the whole package.
> > >
> > > Like above, what is the use-case here? Why would you want this? I
> > > think I'm missing something here.
> >
> > Does this harm anything? Even if there's little real use for this, is
> > there any harm in supporting it? Are we supposed to do things the other
> > way around with no benefit just because you don't see any real use for
> > it?
>
> Well, you make a huge point out of it. And if it isn't used, then why
> bother so much about it. Then it just looks like you want to use it as
> an argument to get rid of something you just don't like.
>
> In my opinion you better just say "hey I would like to implement this
> binpkg format, because I think it would be easier to support with
> minimal tools since it doesn't have custom features". I would have
> nothing against that. Simple and elegant is nice, you don't need to
> invent arguments for that, in my opinion.

The spec is now more focused on that.

>
> Fabian
>
> > > > 4. **The file format must provide support for OpenPGP signatures.**
> > > > Preferably, it should use standard OpenPGP message formats.
> > > >
> > > > 5. **The file format must allow for efficient metadata updates.**
> > > > In particular, it should be possible to update the metadata without
> > > > having to recompress package files.
> > > >
> > > > 6. **The file format should account for easy recognition both through
> > > > filename and through contents.** Preferably, it should have distinct
> > > > features making it possible to detect it via file(1).
> > > >
> > > > 7. **The file format should allow for metadata compression.**
> > > >
> > > > 8. **The file format should make future extensions easily possible
> > > > without breaking backwards compatibility.**
> > >
> > >
> >
> > --
> > Best regards,
> > Michał Górny
>
>
>

--
Best regards,
Michał Górny