On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > Problems with the current binary package format
> > -----------------------------------------------
> >
> > The following problems were identified with the package format currently
> > in use:
> >
> > 1. **The packages rely on custom binary archive format to store
> >    metadata.** It is entirely Gentoo invented, and requires dedicated
> >    tooling to work with it. In fact, the reference implementation
> >    in Portage does not even include a CLI tool to work with tbz2
> >    packages; an unofficial implementation is provided as part
> >    of portage-utils toolkit [#PORTAGE-UTILS]_.
>
> I think you should rewrite this section to the argument that the
> metadata is hard to edit, and that there is only one tool to do so
> (except a python interface from Portage?).
> On a separate note, I don't think portage-utils can be considered
> "unofficial", it is a Gentoo official project as far as I am aware.

In this context, Portage is 'official'. Portage-utils is a project
that's developed entirely separately from Portage and doesn't use
Portage APIs but instead reinvents everything. As such, it is easy for
the two to go out of sync, or for one of them to have bugs that
the other one doesn't have (say, with endianness).

> > 2. **The format relies on obscure compressor feature of ignoring
> >    trailing garbage**. While this behavior is traditionally implemented
> >    by many compressors, the original reasons for it have become long
> >    irrelevant and it is not surprising that new compressors do not
> >    support it. In particular, Portage already hit this problem twice:
> >    once when users replaced bzip2 with parallel-capable pbzip2
> >    implementation [#PBZIP2]_, and the second time when support for zstd
> >    compressor was added [#ZSTD]_.
>
> I think this is actually the result of a rather opportunistic
> implementation. The fault is that we chose to use an extension that
> suggests the file is a regular compressed tarball.
> When one detects that a file is xpak padded, it is trivial to feed the
> decompressor just the relevant part of the datastream. The format
> itself isn't bad, and doesn't rely on obscure behaviour.

Except if you don't have the proper tools installed. In which case
the 'opportunistic' behavior made it possible to extract the contents
without special tools... except when it actually happens not to work
anymore. Roy's reply indicates that there is actually interest in this
design feature.
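
To make the "trailing garbage" point concrete, here is a minimal Python
sketch using only the standard bz2 module. It is an illustration, not
how Portage or portage-utils handle packages: the function name
split_payload and the placeholder trailer bytes are made up for the
example. It shows a decompressor that stops at the end of the first
bzip2 stream and hands back whatever follows, which is the behavior
the current format depends on.

```python
import bz2


def split_payload(blob):
    """Decompress the first bzip2 stream in blob.

    Returns (decompressed data, bytes found after the stream).
    """
    dec = bz2.BZ2Decompressor()
    data = dec.decompress(blob)
    # Once the end of the stream is reached, any extra input is kept
    # in unused_data instead of being treated as an error.
    return data, dec.unused_data


# Hypothetical stand-ins: a compressed payload followed by metadata
# appended after the end of the compressed stream.
payload = bz2.compress(b"package contents")
trailer = b"XPAKPACK...metadata...XPAKSTOP"

data, rest = split_payload(payload + trailer)
print(data)
print(rest)
```

A tool that rejects input after the end of the stream would fail on
the same bytes, which is the failure mode described in point 2.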

>
> > 3. **Placing metadata at the end of file makes partial fetches
> >    complex.** While it is technically possible to obtain package
> >    metadata remotely without fetching the whole package, it usually
> >    requires e.g. 2-3 HTTP requests with rather complex driver. For
> >    comparison, if metadata was placed at the beginning of the file,
> >    early-terminated pipeline with a single fetch request would suffice.
>
> I think this point needs to be quantified somewhat why it is so
> important.
> I may be wrong, but the average binpkg is small, <1MiB, bigger packages
> are <50MiB.
> So what is the gain to be saved here? A "few" MiBs for what operation
> exactly? I say "few" because I know for some users this is actually not
> just a blip before it's downloaded. So if this is possible to achieve,
> in what scenarios is this going to be used (and is this often?).

Last I checked, Gentoo aimed to support more users than the 'majority'
of people with high-throughput Internet access. If there's no cost
in doing things better, why not do them better?
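
For reference, the multi-request dance from point 3 can be sketched
roughly as follows. This assumes the tbz2 layout as I understand it
from the xpak documentation (compressed tarball, then the xpak
segment, then a 4-byte big-endian xpak length, then the literal
"STOP"); treat that layout as an assumption to verify. fetch_range is
a hypothetical stand-in for an HTTP Range request, and the sample
bytes are fabricated for the example.

```python
import struct


def read_xpak_via_ranges(fetch_range, total_size):
    """Return the xpak segment using two ranged reads.

    fetch_range(start, end) must return the bytes in [start, end).
    """
    # Read 1: the 8-byte trailer (xpak length + b"STOP").
    tail = fetch_range(total_size - 8, total_size)
    xpak_len, magic = struct.unpack(">I4s", tail)
    if magic != b"STOP":
        raise ValueError("no tbz2 trailer found")
    # Read 2: the xpak segment that sits right before the trailer.
    start = total_size - 8 - xpak_len
    return fetch_range(start, total_size - 8)


# Fabricated package image, for illustration only.
xpak = b"XPAKPACK" + b"\0" * 8 + b"XPAKSTOP"
blob = (b"compressed-tarball-bytes" + xpak
        + struct.pack(">I", len(xpak)) + b"STOP")

requests = []


def fetch_range(start, end):
    # Records each read so the number of round trips is visible.
    requests.append((start, end))
    return blob[start:end]


meta = read_xpak_via_ranges(fetch_range, len(blob))
print(len(requests), meta == xpak)
```

A real client would additionally need to learn the total size (or use
a suffix range) before the first read. With metadata at the start of
the file, a single request that is cut short would do.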

>
> > 4. **Extending the format with OpenPGP signatures is non-trivial.**
> >    Depending on the implementation details, it either requires fetching
> >    additional detached signature, breaking backwards compatibility or
> >    introducing more custom logic to reassemble OpenPGP packets.
>
> I think one could add an extra key to the xpak that holds a gpg sig or
> something. Perhaps this point is better phrased as that current binpkgs
> don't have any validation options defined.

...which extra key would mean that the two disjoint implementations
in use would need more custom code that extracts the signature,
reconstructs signed data for verification and verifies it. Or, in other
words, that the user needs even more custom tooling to manually verify
the package he just fetched.

>
> > 5. **Metadata is not compressed.** This is not a significant problem,
> >    it is just listed for completeness.
> >
> >
> > Goals for a new container format
> > --------------------------------
> >
> > The following goals have been set for a replacement format:
> >
> > 1. **The packages must remain contained in a single file.** As a matter
> >    of user convenience, it should be possible to transfer binary
> >    packages without having to use multiple files, and to install them
> >    from any location.
> >
> > 2. **The file format must be entirely based on common file formats,
> >    respecting best practices, with as little customization as necessary
> >    to satisfy the requirements.** In particular, it is unacceptable
> >    to create new binary formats.
>
> I take this as your personal opinion. I don't quite get why it is
> unacceptable to create a new binary format though. In particular when
> you're looking for efficiency, such format could serve your purposes.
> As long as it's clearly defined, I don't see the problem with a binary
> format either.
> Could you add why it is you think binary formats are unacceptable here?

Because custom binary formats require specialized tooling, and are
a royal PITA when the user wants to do something that the author of
specialized tooling just happened not to think worthwhile, or when
the tooling is not available for some reason. And before you ask really
silly questions, yes, I did fight binary packages with a hex editor
at some point.

The most trivial case is an attempted recovery of a broken system.
If you don't have Portage working and don't have portage-utils
installed, do you really prefer a custom format which will require you
to fetch and compile special tools? Or one that can be processed
with tools you're quite likely to have on every system, like tar?
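
As a rough illustration of what "processable with stock tools" buys,
the sketch below builds a plain uncompressed tar with the metadata
member first and then reads only that member back in streaming mode,
using nothing but Python's standard tarfile module. The member names
(metadata.txt, image.bin) and both helper functions are invented for
the example and say nothing about the final container layout.

```python
import io
import tarfile


def build_container(metadata, image):
    """Pack metadata first, then the image, into an uncompressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as outer:
        for name, payload in (("metadata.txt", metadata),
                              ("image.bin", image)):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            outer.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_first_member(stream):
    """Read only the first member from a non-seekable style stream."""
    # "r|" reads strictly front to back, as a pipe or socket would.
    with tarfile.open(fileobj=stream, mode="r|") as outer:
        member = outer.next()
        return member.name, outer.extractfile(member).read()


# Fabricated contents, for illustration only.
container = build_container(b"CATEGORY=app-misc\n", b"\0" * 4096)
name, content = read_first_member(io.BytesIO(container))
print(name, content)
```

The same file can be listed or unpacked with an ordinary tar
command on a system where neither Portage nor portage-utils works.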

>
> > 3. **The file format should provide for partial fetching of binary
> >    packages.** It should be possible to easily fetch and read
> >    the package metadata without having to download the whole package.
>
> Like above, what is the use-case here? Why would you want this? I
> think I'm missing something here.

Does this harm anything? Even if there's little real use for this, is
there any harm in supporting it? Are we supposed to do things the other
way around with no benefit just because you don't see any real use for
it?

>
> > 4. **The file format must provide support for OpenPGP signatures.**
> >    Preferably, it should use standard OpenPGP message formats.
> >
> > 5. **The file format must allow for efficient metadata updates.**
> >    In particular, it should be possible to update the metadata without
> >    having to recompress package files.
> >
> > 6. **The file format should account for easy recognition both through
> >    filename and through contents.** Preferably, it should have distinct
> >    features making it possible to detect it via file(1).
> >
> > 7. **The file format should allow for metadata compression.**
> >
> > 8. **The file format should make future extensions easily possible
> >    without breaking backwards compatibility.**
>
>

--
Best regards,
Michał Górny