On 11/10/2018 06:37 AM, Alec Warner wrote:
>
> On Sat, Nov 10, 2018 at 8:09 AM Michał Górny <mgorny@g.o
> <mailto:mgorny@g.o>> wrote:
>
>> Hi, everyone.
>>
>> Gentoo's tbz2/xpak package format is quite old. We've made a few
>> incompatible changes in the past (most notably, allowing non-bzip2
>> compression and multi-instance naming) but the core design stayed
>> the same. I think we should consider changing it, for the reasons
>> outlined below.
>>
>> The rough format description can be found in xpak(5). Basically, it's
>> a regular compressed tarball with a binary metadata blob appended
>> to the end. As such, it looks like a regular compressed tarball
>> to the compression tools (with some ignored junk at the end).
>> The metadata is in an entirely custom format and needs dedicated tools
>> to manipulate.
>>
>>
>> The current format has a few advantages that would probably be worth
>> preserving:
>>
>> + The binary package is a single flat file.
>>
>> + It is reasonably compatible with a regular compressed tarball,
>> so the users can unpack it using standard tools (except for metadata).
>>
>> + The metadata is uncompressed and can be quickly found without touching
>> the compressed data.
>>
>> + The metadata can be updated (e.g. as a result of pkgmove) without
>> touching the compressed data.
>>
>>
>> However, it has a few disadvantages as well:
>>
>> - The metadata is in an entirely custom binary format, requiring dedicated
>> tools to read or edit.
>>
>> - The metadata format relies on the customary behavior of compression
>> tools that ignore junk following the compressed data.
>
>
> I agree this is a problem in theory, but I haven't seen it as a problem
> in practice. Have you observed any problems around this setup?

In portage we use head -c to select the compressed data, since zstd
doesn't handle the xpak trailer well.
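
For reference, the byte count that the `head -c` step needs can be computed from the xpak trailer alone. The Python sketch below assumes the trailer layout as we understand it from xpak(5) and Portage's xpak module: the xpak block (`XPAKPACK` ... `XPAKSTOP`) is followed by a 4-byte big-endian length of that block and then the 4 bytes `STOP`. The function name is hypothetical and not part of Portage's API.

```python
import struct

TRAILER_LEN = 16  # b"XPAKSTOP" + 4-byte block length + b"STOP"


def compressed_payload_size(blob: bytes) -> int:
    """Return how many leading bytes of a tbz2 are the compressed tarball.

    Assumed layout: <compressed tar> XPAKPACK <index_len> <data_len>
    <index> <data> XPAKSTOP <xpak_len> STOP, with big-endian 32-bit ints,
    where xpak_len covers XPAKPACK through XPAKSTOP.
    """
    if len(blob) < TRAILER_LEN:
        raise ValueError("file too short for an xpak trailer")
    tail = blob[-TRAILER_LEN:]
    if tail[-4:] != b"STOP" or tail[:8] != b"XPAKSTOP":
        raise ValueError("no xpak trailer found")
    (xpak_len,) = struct.unpack(">I", tail[8:12])
    # 8 = the 4-byte length field plus the final b"STOP".
    start = len(blob) - xpak_len - 8
    if start < 0 or blob[start:start + 8] != b"XPAKPACK":
        raise ValueError("xpak length does not point at XPAKPACK")
    return start
```

The returned value is the N one would pass to `head -c N` before piping the result into the decompressor.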

>>
>> - By placing the metadata at the end of file, we make it rather hard to
>> read the metadata from a remote location (via FTP, HTTP) without fetching
>> the whole file. [NB: it's technically possible but probably not worth
>> the effort]
>>
>>
>> - By requiring the custom format to be at the end of file, we make it
>> impossible to trivially cover it with an OpenPGP signature without
>> introducing another custom format.
>
>
> It's trivial to cover with a detached sig, no?
>
>
>
>> - While the format might allow for some extensibility, it's rather
>> an evolutionary dead end.
>
>
> I'm not even sure how to quantify this, it just sounds like your
> subjective opinion (which is fine, but it's not factual.)

Yeah the xpak trailer is flexible enough, but I'm not opposed to
supporting a different format.

>>
>> I think the key points of the new format should be:
>>
>> 1. It should reuse common file formats as much as possible,
>> inventing as little custom code as possible.
>>
>> 2. It should allow for easy introspection and editing by users without
>> dedicated tools.
>
>
> So I'm less confident in the editing use cases; do users edit their
> binpkgs on a regular basis?

Yes, gentoo/profiles/updates package renames and slot moves are a form of
this.

>>
>> 3. The metadata should allow for lookup without fetching the whole
>> binary package.
>>
>> 4. The format should allow for some extensions without having to
>> reinvent the wheel every time.
>>
>> 5. It would be nice to preserve the existing advantages.
>>
>>
>> My proposal
>> ===========
>>
>> Basic format
>> ------------
>> The base of the format is a regular compressed tarball. There's no junk
>> appended to it but the metadata is stored inside it as
>> /var/db/pkg/${PF}. The contents are as compatible with the actual vdb
>> format as possible.
>
>
> Just to clarify, you are suggesting we store the metadata inside the
> contents of the binary package itself (e.g. where the other files that
> get merged to the liveFS are?) What about collisions?
>
> E.g. I install 'machine-images/gentoo-disk-image-1.2.3' on a machine
> that already has 'machine-images/gentoo-disk-image-1.2.3' installed,
> won't it overwrite files in the VDB at qmerge time?

I haven't looked into it but maybe we can use "nil control directory
names" to embed things, like http://savannah.gnu.org/projects/swbis
claims to use.

>> This has the following advantages:
>>
>> + The binary package is still stored as a single file.
>>
>> + It uses a standard compressed .tar format, with minimal customization.
>>
>> + The user can easily inspect and modify the packages with standard
>> tools (tar and the compressor).
>>
>> + If we can maintain a reasonable level of vdb compatibility, the user can
>> even emergency-install a package without causing too much hassle (as it
>> will be recorded in vdb); ideally Portage would detect this vdb entry
>> and support fixing the install afterwards.
>
>
> I'm not certain this is really desired.

Yeah I don't like it either, I'd prefer to keep the metadata someplace
where it can't overwrite files in the installed package database.

>>
>> Optimizing for easy recognition
>> -------------------------------
>> In order to make it possible for magic-based tools such as file(1) to
>> easily distinguish Gentoo binary packages from regular tarballs, we
>> could (ab)use the volume label field, e.g. use:
>>
>> $ tar -V 'gpkg: app-foo/bar-1' -c ...
>>
>> This will add a volume label as the first file entry inside the tarball,
>> which does not affect extracting but can be trivially matched via magic
>> rules.
>>
>> Note: this is meant to be used as a method for fast binary package
>> recognition; I don't think we should reject (hand-modified) binary
>> packages that lack this label.
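
To make the magic-matching idea concrete, here is a rough Python sketch of the check, assuming GNU tar's volume header layout: the first 512-byte header carries the label in its name field (bytes 0 to 99) and the typeflag `V` at byte 156. Since the label sits inside the compressed stream, a matcher has to decompress the first block (file(1) only looks inside compressed files when run with `-z`). The function names are hypothetical; only gzip, bzip2 and xz are recognized here, and bzip2 may need a prefix as large as its first compressed block before it yields any output.

```python
import bz2
import lzma
import zlib
from typing import Optional

BLOCK = 512  # size of one tar header block
LABEL_PREFIX = b"gpkg: "  # prefix taken from the `tar -V` example above


def first_tar_block(prefix: bytes) -> bytes:
    """Decompress the first tar header from the leading bytes of a package."""
    if prefix.startswith(b"\x1f\x8b"):
        dec = zlib.decompressobj(wbits=31)  # 31 = zlib's gzip-container mode
    elif prefix.startswith(b"BZh"):
        dec = bz2.BZ2Decompressor()
    elif prefix.startswith(b"\xfd7zXZ\x00"):
        dec = lzma.LZMADecompressor()
    else:
        raise ValueError("unsupported or unknown compression")
    # Ask for at most one tar block of decompressed output.
    return dec.decompress(prefix, BLOCK)


def gpkg_label(header: bytes) -> Optional[str]:
    """Return the atom from a 'gpkg: <atom>' volume header, else None."""
    # Typeflag b"V" marks a GNU tar volume label entry.
    if len(header) < BLOCK or header[156:157] != b"V":
        return None
    name = header[:100].split(b"\0", 1)[0]
    if not name.startswith(LABEL_PREFIX):
        return None
    return name[len(LABEL_PREFIX):].decode("utf-8", "replace")
```

Because the decompressors are fed a byte prefix rather than a path, the same check could run on the first few KiB fetched with an HTTP range request.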
>>
>>
>> Optimizing for metadata reading/manipulation performance
>> --------------------------------------------------------
>> The main problem with using a single tarball for both metadata and data
>> is that normally you'd have to decompress everything to reliably unpack
>> metadata, and recompress everything to update it. This problem can be
>> addressed by a few optimization tricks.
>
>
> These performance goals seem a little bit ill-defined.
>
> 1) Where are users reporting slowness in binpkg operations?
> 2) What is the cause of the slowness?

Yeah I'd like more information here too.

> Like I could easily see a potential user with many large binpkgs, and
> the current implementation causing them issues because
> they have to decompress and seek a bunch to read the metadata out of
> their 1.2GB binpkg. But I'm pretty sure this isn't most users.
>
>
>
>> Firstly, all metadata files are packed into the archive before data files.
>> With a slightly customized unpacker, we can stop decompressing as soon
>> as we're past the metadata and avoid decompressing the whole archive. This
>> will also make it possible to read metadata from remote files without
>> fetching far past the compressed metadata block.
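
The "slightly customized unpacker" could be as small as the following Python sketch, which uses `tarfile` in streaming mode (`r|*`) so that members are decompressed strictly in order. It assumes, hypothetically, that metadata members live under a `var/db/pkg/` prefix (the proposal's /var/db/pkg/${PF}, minus the leading slash) and are packed before all data files; the function names are illustrative.

```python
import io
import tarfile

META_PREFIX = "var/db/pkg/"  # hypothetical metadata location inside the tarball


def build_package(meta: dict[str, bytes], data: dict[str, bytes]) -> bytes:
    """Pack metadata members first, then data members, into a .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in list(meta.items()) + list(data.items()):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_metadata(fileobj) -> dict[str, bytes]:
    """Read the leading metadata members and stop at the first data member."""
    meta = {}
    # "r|*" treats the input as a forward-only stream with auto-detected
    # compression, so nothing past the current member has to be decoded.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.name.startswith(META_PREFIX):
                break  # past the metadata; stop reading here
            if member.isfile():
                meta[member.name] = tar.extractfile(member).read()
    return meta
```

`tarfile` reads ahead in fixed-size chunks, so "stop decompressing" is approximate: a little data past the metadata may still be decoded, but not the whole archive.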
>>
>
> So this seems to basically go against your goals of simple common tooling?
>
>
>
>> Secondly, if we're up for some more tricks, we could technically split
>> the tarball into metadata and data blocks compressed separately. This
>> will need a bit of archiver customization but it will make it possible
>> to decompress the metadata part without even touching compressed data,
>> and to replace it without recompressing data.
>>
>> What's important is that both proposed tricks maintain backwards
>> compatibility with regular compressed tarballs. That is, the user will
>> still be able to extract it with regular archiving tools.
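
One way the split could work, sketched in Python under the assumption that gzip is the compressor: the metadata members and the data members are compressed as two separate gzip members and simply concatenated. gzip readers (including Python's `gzip` and `tarfile`) treat concatenated members as one stream, so a regular extractor still sees a single tarball, while a custom tool can decompress only the first member and swap it out. The helper names are hypothetical.

```python
import gzip
import io
import tarfile
import zlib


def tar_members(files: dict[str, bytes], end_of_archive: bool = False) -> bytes:
    """Serialize members as raw tar blocks, optionally with the end marker."""
    out = bytearray()
    for name, payload in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        out += info.tobuf(format=tarfile.GNU_FORMAT)
        out += payload
        out += bytes(-len(payload) % tarfile.BLOCKSIZE)  # pad to 512 bytes
    if end_of_archive:
        out += bytes(2 * tarfile.BLOCKSIZE)  # two zero blocks end a tarball
    return bytes(out)


def build_split(meta: dict[str, bytes], data: dict[str, bytes]) -> bytes:
    """Compress metadata and data as two independent gzip members."""
    return gzip.compress(tar_members(meta)) + gzip.compress(
        tar_members(data, end_of_archive=True)
    )


def split_metadata(pkg: bytes) -> tuple[bytes, bytes]:
    """Return (uncompressed metadata tar blocks, untouched compressed data)."""
    dec = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header
    meta_tar = dec.decompress(pkg)  # stops at the end of the first member
    if not dec.eof:
        raise ValueError("truncated metadata member")
    return meta_tar, dec.unused_data


def replace_metadata(pkg: bytes, new_meta: dict[str, bytes]) -> bytes:
    """Swap the metadata member without recompressing the data member."""
    _, data_part = split_metadata(pkg)
    return gzip.compress(tar_members(new_meta)) + data_part
```

Whether bzip2, xz or zstd tooling handles concatenated streams equally well would need checking per compressor; the sketch only relies on gzip's multi-member behavior.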
>
>
> So my recollection is that Debian uses the common ar archive format for
> the main .deb.
> Then they have 2 compressed tarballs, one for metadata, and one for data.
>
> This format seems to jive with many of your requirements:
>
> - 'ar' can retrieve individual files from the archive.
> - The deb file itself is not compressed, but the tarballs inside *are*
> compressed.
> - The metadata and data are compressed separately.
> - Anyone can edit this with normal tooling (ar, tar)
>
> In short, why should we invent a new format?
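
For comparison, the common ar format is simple enough to read without dedicated tooling: an 8-byte `!<arch>\n` signature, then for each member a fixed 60-byte ASCII header (name, mtime, uid, gid, mode, size, and the two bytes `` `\n ``) followed by the member data, padded to an even length. The Python sketch below is a minimal reader and writer written from that layout; it ignores the GNU and BSD long-name extensions, and the function names are hypothetical. The member names in the example only mirror a .deb's `debian-binary`, `control.tar.*` and `data.tar.*`.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed-size ASCII member header


def ar_members(blob: bytes) -> dict[str, bytes]:
    """Return the members of a simple ar archive (no long-name extensions)."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = {}
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(blob):
        header = blob[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        # Fields: name[16] mtime[12] uid[6] gid[6] mode[8] size[10] magic[2].
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = pos + HEADER_LEN
        members[name] = blob[start:start + size]
        pos = start + size + (size % 2)  # member data is padded to even length
    return members


def ar_archive(files: dict[str, bytes]) -> bytes:
    """Build a minimal ar archive: short names only, zeroed mtime/uid/gid."""
    out = bytearray(AR_MAGIC)
    for name, data in files.items():
        if len(name) > 16:
            raise ValueError(f"name too long for a short ar header: {name}")
        header = f"{name:<16}{0:<12}{0:<6}{0:<6}{'100644':<8}{len(data):<10}`\n"
        out += header.encode("ascii") + data
        if len(data) % 2:
            out += b"\n"  # padding byte
    return bytes(out)
```

Fetching only the metadata member of such a container remotely would still need the member offsets, which is one reason a fixed member order matters.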

Maybe we can borrow some ideas from
http://savannah.gnu.org/projects/swbis which claims to be capable of
creating and verifying a tarball with GPG signatures embedded in the
tarball.

>>
>> Adding OpenPGP signatures
>> -------------------------
>> This is the main XXX here.
>>
>> Technically, the most obvious solution is to cover the entire tarball
>> with an OpenPGP signature. However, this has the disadvantage that
>> the verification requires fetching the whole file.
>>
>> I will look into the possibility of having partial signatures.
>>
>>
>> --
>> Best regards,
>> Michał Górny
>>


--
Thanks,
Zac