On 18-11-2018 10:38:51 +0100, Michał Górny wrote:
> On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > > Problems with the current binary package format
> > > -----------------------------------------------
> > >
> > > The following problems were identified with the package format currently
> > > in use:
> > >
> > > 1. **The packages rely on a custom binary archive format to store
> > > metadata.** It is entirely Gentoo-invented, and requires dedicated
> > > tooling to work with it. In fact, the reference implementation
> > > in Portage does not even include a CLI tool to work with tbz2
> > > packages; an unofficial implementation is provided as part
> > > of the portage-utils toolkit [#PORTAGE-UTILS]_.
> >
> > I think you should rewrite this section around the argument that the
> > metadata is hard to edit, and that there is only one tool to do so
> > (apart from a Python interface from Portage?).
> > On a separate note, I don't think portage-utils can be considered
> > "unofficial"; it is an official Gentoo project as far as I am aware.
>
> In this context, Portage is 'official'. Portage-utils is a project
> that's developed entirely separately from Portage and doesn't use
> Portage APIs but instead reinvents everything. As such, it is easy for
> the two to go out of sync. Or for one of them to have bugs that
> the other one doesn't have (say, with endianness).

I'm not sure that's actually true; I was under the impression the same
author(s) worked on the Portage code as well as the portage-utils code.
Anyway, aren't quickpkg and emerge enough from a user's perspective?
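
To put a rough size on how much "dedicated tooling" reading the metadata actually takes, here is a minimal Python sketch of an xpak segment writer and reader. The layout is my reading of Portage's xpak module and should be treated as an assumption: the segment starts with `XPAKPACK`, followed by the index and data lengths as big-endian 32-bit integers, then index entries of (name length, name, data offset, data length), the data area, and a closing `XPAKSTOP`. `build_xpak` and `parse_xpak` are hypothetical names, not Portage APIs.

```python
import struct


def _u32(buf: bytes, pos: int) -> int:
    # xpak stores every integer as an unsigned 32-bit big-endian value
    return struct.unpack(">I", buf[pos:pos + 4])[0]


def build_xpak(items: dict[str, bytes]) -> bytes:
    """Serialize {key: value} into an xpak segment (hypothetical helper)."""
    index, data = b"", b""
    for name, value in items.items():
        raw = name.encode("utf-8")
        # index entry: name length, name, offset into data area, value length
        index += struct.pack(">I", len(raw)) + raw
        index += struct.pack(">II", len(data), len(value))
        data += value
    return (b"XPAKPACK" + struct.pack(">II", len(index), len(data))
            + index + data + b"XPAKSTOP")


def parse_xpak(segment: bytes) -> dict[str, bytes]:
    """Parse an xpak segment (XPAKPACK ... XPAKSTOP) back into {key: value}."""
    if segment[:8] != b"XPAKPACK" or segment[-8:] != b"XPAKSTOP":
        raise ValueError("not an xpak segment")
    index_len, data_len = _u32(segment, 8), _u32(segment, 12)
    data_start = 16 + index_len
    if data_start + data_len + 8 != len(segment):
        raise ValueError("inconsistent xpak sizes")
    result, pos = {}, 16
    while pos < data_start:
        name_len = _u32(segment, pos)
        name = segment[pos + 4:pos + 4 + name_len].decode("utf-8")
        pos += 4 + name_len
        offset, length = _u32(segment, pos), _u32(segment, pos + 4)
        pos += 8
        start = data_start + offset
        result[name] = segment[start:start + length]
    return result


if __name__ == "__main__":
    seg = build_xpak({"CATEGORY": b"app-misc\n", "PF": b"hello-1.0\n"})
    print(parse_xpak(seg))
```

It is a few dozen lines of standard-library code; the big-endian handling in `_u32` is exactly the kind of detail two independent implementations could disagree on.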

> > > 2. **The format relies on an obscure compressor feature of ignoring
> > > trailing garbage**. While this behavior is traditionally implemented
> > > by many compressors, the original reasons for it have long since become
> > > irrelevant and it is not surprising that new compressors do not
> > > support it. In particular, Portage already hit this problem twice:
> > > once when users replaced bzip2 with the parallel-capable pbzip2
> > > implementation [#PBZIP2]_, and the second time when support for the zstd
> > > compressor was added [#ZSTD]_.
> >
> > I think this is actually the result of a rather opportunistic
> > implementation. The fault is that we chose to use an extension that
> > suggests the file is a regular compressed tarball.
> > When one detects that a file is xpak-padded, it is trivial to feed the
> > decompressor just the relevant part of the datastream. The format
> > itself isn't bad, and doesn't rely on obscure behaviour.
>
> Except if you don't have the proper tools installed. In which case
> the 'opportunistic' behavior made it possible to extract the contents
> without special tools... except when it actually happens not to work
> anymore. Roy's reply indicates that there is actually interest in this
> design feature.

Your point is that the format is broken (== relies on an obscure compressor
feature). My point is that the format simply requires a special tool.
To me, the fact that we prefer to use existing tools doesn't in any way
imply that the format is broken.
I think you should rewrite your point to mention that you don't want to
use a tool that doesn't exist in @system (?) to unpack a binpkg. My
guess is that you could use some head/tail magic in a script if the
trailing block is upsetting the decompressor.

I'm not saying this wouldn't look ugly, I'm just saying that your point
seems biased.

> > > 3. **Placing metadata at the end of the file makes partial fetches
> > > complex.** While it is technically possible to obtain package
> > > metadata remotely without fetching the whole package, it usually
> > > requires e.g. 2-3 HTTP requests with a rather complex driver. For
> > > comparison, if metadata were placed at the beginning of the file,
> > > an early-terminated pipeline with a single fetch request would suffice.
> >
> > I think this point needs to be quantified somewhat: why is it so
> > important?
> > I may be wrong, but the average binpkg is small (<1MiB), and bigger
> > packages are <50MiB.
> > So what is there to gain here? A "few" MiBs for what operation
> > exactly? I say "few" because I know for some users this is actually not
> > just a blip before it's downloaded. So if this is possible to achieve,
> > in what scenarios is this going to be used (and how often)?
>
> Last I checked, Gentoo aimed to support more users than the 'majority'
> of people with high-throughput Internet access. If there's no cost
> in doing things better, why not do them better?

You didn't address the critical question, but instead just repeated what
I said.
So again, why do you need to read just the metadata?
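
For reference, the "2-3 HTTP requests" from point 3 can be reduced to two suffix-range requests under the same assumed trailer layout: one for the last 16 bytes, which carry the xpak length, and one for the segment itself. This is a sketch, not Portage code; `fetch_xpak_remote` and `http_suffix_fetcher` are hypothetical names, and it relies on the server honouring `Range: bytes=-N` (a suffix range) with a 206 response.

```python
import struct
import urllib.request
from typing import Callable

# A fetcher returns the last n bytes of the remote file.
SuffixFetcher = Callable[[int], bytes]


def http_suffix_fetcher(url: str) -> SuffixFetcher:
    """Build a fetcher that issues HTTP suffix-range requests (hypothetical)."""
    def fetch(n: int) -> bytes:
        req = urllib.request.Request(url, headers={"Range": f"bytes=-{n}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                # 200 means the server ignored Range and is sending everything
                raise RuntimeError("server did not honour the Range header")
            return resp.read()
    return fetch


def fetch_xpak_remote(fetch_suffix: SuffixFetcher) -> bytes:
    """Fetch only the xpak segment of a remote tbz2 with two range requests."""
    # Request 1: the fixed-size trailer, b"XPAKSTOP" + length + b"STOP".
    trailer = fetch_suffix(16)
    if len(trailer) != 16 or trailer[:8] != b"XPAKSTOP" or trailer[12:] != b"STOP":
        raise ValueError("no xpak trailer found")
    xpak_len = struct.unpack(">I", trailer[8:12])[0]
    # Request 2: the segment itself plus the 8-byte length/STOP footer.
    tail = fetch_suffix(xpak_len + 8)
    segment = tail[:xpak_len]
    if len(tail) != xpak_len + 8 or not segment.startswith(b"XPAKPACK"):
        raise ValueError("corrupt xpak segment")
    return segment


if __name__ == "__main__":
    # Offline demo: serve suffixes of an in-memory "package" instead of HTTP.
    seg = b"XPAKPACK" + struct.pack(">II", 0, 0) + b"XPAKSTOP"
    blob = b"\0" * 4096 + seg + struct.pack(">I", len(seg)) + b"STOP"
    print(fetch_xpak_remote(lambda n: blob[-n:]) == seg)
```

The second request depends on the first, so the two cannot be issued in parallel; with metadata at the front, a single request cut off early would do.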

> > > 4. **Extending the format with OpenPGP signatures is non-trivial.**
> > > Depending on the implementation details, it either requires fetching
> > > an additional detached signature, breaking backwards compatibility or
> > > introducing more custom logic to reassemble OpenPGP packets.
> >
> > I think one could add an extra key to the xpak that holds a gpg sig or
> > something. Perhaps this point is better phrased as: current binpkgs
> > don't have any validation options defined.
>
> ...which extra key would mean that the two disjoint implementations
> in use would need more custom code that extracts the signature,
> reconstructs the signed data for verification and verifies it. Or, in
> other words, that the user needs even more custom tooling to manually
> verify the package he just fetched.

I don't see your point. If you define what the package format looks
like, you just need to implement that. There is no point in having a
binpkg format that Portage doesn't implement properly. Portage is
well-equipped to implement any of the approaches. A user should use
Portage to install a package. A power user could use a separate tool for
a scenario where he/she's in charge of keeping things sane. Relevancy?

I just don't agree that extending the format is non-trivial. You seem
to have no arguments other than adding "custom logic", which is what you
eventually also do in the reference implementation of your new approach.

> > > 5. **Metadata is not compressed.** This is not a significant problem;
> > > it is just listed for completeness.
> > >
> > >
> > > Goals for a new container format
> > > --------------------------------
> > >
> > > The following goals have been set for a replacement format:
> > >
> > > 1. **The packages must remain contained in a single file.** As a matter
> > > of user convenience, it should be possible to transfer binary
> > > packages without having to use multiple files, and to install them
> > > from any location.
> > >
> > > 2. **The file format must be entirely based on common file formats,
> > > respecting best practices, with as little customization as necessary
> > > to satisfy the requirements.** In particular, it is unacceptable
> > > to create new binary formats.
> >
> > I take this as your personal opinion. I don't quite get why it is
> > unacceptable to create a new binary format though. In particular when
> > you're looking for efficiency, such a format could serve your purposes.
> > As long as it's clearly defined, I don't see the problem with a binary
> > format either.
> > Could you add why you think binary formats are unacceptable here?
>
> Because custom binary formats require specialized tooling, and are
> a royal PITA when the user wants to do something that the author of
> the specialized tooling just happened not to think worthwhile, or when
> the tooling is not available for some reason. And before you ask really
> silly questions, yes, I did fight binary packages with a hex editor
> at some point.

Which I still don't understand, to be frank. I think even Portage
exposes Python APIs to get to the data.

> The most trivial case is an attempted recovery of a broken system.
> If you don't have Portage working and don't have portage-utils
> installed, do you really prefer a custom format which will require you
> to fetch and compile special tools? Or one that can be processed
> with tools you're quite likely to have on every system, like tar?

Well, I think the idea behind the original binpkg format was to use tar
directly on the files in emergency scenarios like these...
The assumption was that a bzip2 decompressor and tar were available.
I think it is an example of how you can add something while still
allowing a fallback to existing tools.

> > > 3. **The file format should provide for partial fetching of binary
> > > packages.** It should be possible to easily fetch and read
> > > the package metadata without having to download the whole package.
> >
> > Like above, what is the use case here? Why would you want this? I
> > think I'm missing something here.
>
> Does this harm anything? Even if there's little real use for this, is
> there any harm in supporting it? Are we supposed to do things the other
> way around with no benefit just because you don't see any real use for
> it?

Well, you make a huge point out of it. And if it isn't used, then why
bother so much about it? Then it just looks like you want to use it as
an argument to get rid of something you just don't like.

In my opinion you should just say "hey, I would like to implement this
binpkg format, because I think it would be easier to support with
minimal tools since it doesn't have custom features". I would have
nothing against that. Simple and elegant is nice; you don't need to
invent arguments for that, in my opinion.

Fabian

> > > 4. **The file format must provide support for OpenPGP signatures.**
> > > Preferably, it should use standard OpenPGP message formats.
> > >
> > > 5. **The file format must allow for efficient metadata updates.**
> > > In particular, it should be possible to update the metadata without
> > > having to recompress package files.
> > >
> > > 6. **The file format should account for easy recognition both through
> > > filename and through contents.** Preferably, it should have distinct
> > > features making it possible to detect it via file(1).
> > >
> > > 7. **The file format should allow for metadata compression.**
> > >
> > > 8. **The file format should make future extensions easily possible
> > > without breaking backwards compatibility.**
> >
> >
>
> --
> Best regards,
> Michał Górny



--
Fabian Groffen
Gentoo on a different level