On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> Problems with the current binary package format
> -----------------------------------------------
>
> The following problems were identified with the package format currently
> in use:
>
> 1. **The packages rely on custom binary archive format to store
>    metadata.** It is entirely Gentoo invented, and requires dedicated
>    tooling to work with it. In fact, the reference implementation
>    in Portage does not even include a CLI tool to work with tbz2
>    packages; an unofficial implementation is provided as part
>    of portage-utils toolkit [#PORTAGE-UTILS]_.

I think you should reframe this point as the argument that the metadata
is hard to edit, and that there is only one tool to do so (apart from a
Python interface in Portage?).
On a separate note, I don't think portage-utils can be considered
"unofficial"; it is an official Gentoo project as far as I am aware.

> 2. **The format relies on obscure compressor feature of ignoring
>    trailing garbage**. While this behavior is traditionally implemented
>    by many compressors, the original reasons for it have become long
>    irrelevant and it is not surprising that new compressors do not
>    support it. In particular, Portage already hit this problem twice:
>    once when users replaced bzip2 with parallel-capable pbzip2
>    implementation [#PBZIP2]_, and the second time when support for zstd
>    compressor was added [#ZSTD]_.

I think this is actually the result of a rather opportunistic
implementation. The fault is that we chose to use an extension that
suggests the file is a regular compressed tarball.
Once one detects that a file is xpak-padded, it is trivial to feed the
decompressor just the relevant part of the datastream. The format
itself isn't bad, and doesn't rely on obscure behaviour.
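
To illustrate, here is a minimal Python sketch of that idea; it is not the
actual Portage code. It assumes the layout as I understand it from xpak(5):
the file ends with a 4-byte big-endian length of the xpak blob followed by
"STOP", and the blob itself runs from "XPAKPACK" to "XPAKSTOP". The function
names are made up for this example.

```python
import bz2
import struct


def split_tbz2(blob: bytes) -> tuple[bytes, bytes]:
    """Split a tbz2 file image into (compressed tarball, xpak blob).

    Assumed layout (hypothetical reading of xpak(5)):
        <tarball> "XPAKPACK" ... "XPAKSTOP" <4-byte BE xpak length> "STOP"
    """
    if len(blob) < 16 or blob[-4:] != b"STOP" or blob[-16:-8] != b"XPAKSTOP":
        raise ValueError("no xpak trailer found")
    # Length of the xpak blob, excluding the 8-byte length+"STOP" trailer.
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    xpak_start = len(blob) - 8 - xpak_len
    if xpak_start < 0 or blob[xpak_start:xpak_start + 8] != b"XPAKPACK":
        raise ValueError("xpak length does not point at an XPAKPACK header")
    return blob[:xpak_start], blob[xpak_start:-8]


def decompress_payload(blob: bytes) -> bytes:
    """Decompress only the tarball part, never the appended metadata."""
    tarball, _xpak = split_tbz2(blob)
    return bz2.decompress(tarball)
```

With something like this, the decompressor only ever sees the compressed
stream, so whether it tolerates trailing data no longer matters.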

> 3. **Placing metadata at the end of file makes partial fetches
>    complex.** While it is technically possible to obtain package
>    metadata remotely without fetching the whole package, it usually
>    requires e.g. 2-3 HTTP requests with rather complex driver. For
>    comparison, if metadata was placed at the beginning of the file,
>    early-terminated pipeline with a single fetch request would suffice.

I think this point needs some quantification of why it is so
important.
I may be wrong, but the average binpkg is small, <1MiB, and bigger
packages are <50MiB.
So what is there to gain here? A "few" MiBs for what operation
exactly? I say "few" because I know that for some users this is actually
not just a blip before it's downloaded. So if this is possible to achieve,
in what scenarios is it going to be used (and how often)?
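
For reference, the multi-request fetch from point 3 could look roughly like
the sketch below. It only computes the HTTP Range headers (suffix byte
ranges), so it does no network I/O itself. It assumes the file ends with
"XPAKSTOP", a 4-byte big-endian xpak length, and "STOP", which is my reading
of xpak(5); the helper names are hypothetical.

```python
import struct

TRAILER_LEN = 16  # assumed: "XPAKSTOP" + 4-byte length + "STOP"


def first_range() -> str:
    """Range header for request 1: just the last 16 bytes of the file."""
    return f"bytes=-{TRAILER_LEN}"


def second_range(trailer: bytes) -> str:
    """Range header for request 2, derived from the trailer of request 1."""
    if (len(trailer) != TRAILER_LEN
            or trailer[:8] != b"XPAKSTOP"
            or trailer[12:] != b"STOP"):
        raise ValueError("not an xpak trailer")
    (xpak_len,) = struct.unpack(">I", trailer[8:12])
    # The xpak blob plus the 8 trailing bytes (length + "STOP").
    return f"bytes=-{xpak_len + 8}"


def extract_xpak(tail: bytes) -> bytes:
    """Drop the 8-byte length+"STOP" suffix from the second response body."""
    return tail[:-8]
```

That is two round trips for the metadata alone, and if the server ignores
Range headers the client ends up downloading the whole file anyway.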

> 4. **Extending the format with OpenPGP signatures is non-trivial.**
>    Depending on the implementation details, it either requires fetching
>    additional detached signature, breaking backwards compatibility or
>    introducing more custom logic to reassemble OpenPGP packets.

I think one could add an extra key to the xpak that holds a GPG signature
or something similar. Perhaps this point is better phrased as: current
binpkgs don't have any validation options defined.

> 5. **Metadata is not compressed.** This is not a significant problem,
>    it is just listed for completeness.
>
>
> Goals for a new container format
> --------------------------------
>
> The following goals have been set for a replacement format:
>
> 1. **The packages must remain contained in a single file.** As a matter
>    of user convenience, it should be possible to transfer binary
>    packages without having to use multiple files, and to install them
>    from any location.
>
> 2. **The file format must be entirely based on common file formats,
>    respecting best practices, with as little customization as necessary
>    to satisfy the requirements.** In particular, it is unacceptable
>    to create new binary formats.

I take this as your personal opinion. I don't quite get why it is
unacceptable to create a new binary format, though. In particular when
you're looking for efficiency, such a format could serve your purposes.
As long as it's clearly defined, I don't see the problem with a binary
format either.
Could you add why you think binary formats are unacceptable here?

> 3. **The file format should provide for partial fetching of binary
>    packages.** It should be possible to easily fetch and read
>    the package metadata without having to download the whole package.

As above, what is the use case here? Why would you want this? I
think I'm missing something here.

> 4. **The file format must provide support for OpenPGP signatures.**
>    Preferably, it should use standard OpenPGP message formats.
>
> 5. **The file format must allow for efficient metadata updates.**
>    In particular, it should be possible to update the metadata without
>    having to recompress package files.
>
> 6. **The file format should account for easy recognition both through
>    filename and through contents.** Preferably, it should have distinct
>    features making it possible to detect it via file(1).
>
> 7. **The file format should allow for metadata compression.**
>
> 8. **The file format should make future extensions easily possible
>    without breaking backwards compatibility.**

-- 
Fabian Groffen
Gentoo on a different level