Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format
Date: Sun, 18 Nov 2018 09:39:02
Message-Id: 1542533931.1293.23.camel@gentoo.org
In Reply to: Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format by Fabian Groffen
1 On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
2 > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
3 > > Problems with the current binary package format
4 > > -----------------------------------------------
5 > >
6 > > The following problems were identified with the package format currently
7 > > in use:
8 > >
9 > > 1. **The packages rely on a custom binary archive format to store
10 > > metadata.** It is entirely Gentoo-invented, and requires dedicated
11 > > tooling to work with it. In fact, the reference implementation
12 > > in Portage does not even include a CLI tool to work with tbz2
13 > > packages; an unofficial implementation is provided as part
14 > > of the portage-utils toolkit [#PORTAGE-UTILS]_.
15 >
16 > I think you should rewrite this section to the argument that the
17 > metadata is hard to edit, and that there is only one tool to do so
18 > (except a python interface from Portage?).
19 > On a separate note, I don't think portage-utils can be considered
20 > "unofficial", it is a Gentoo official project as far as I am aware.
21
22 In this context, Portage is 'official'. Portage-utils is a project
23 that's developed entirely separately from Portage and doesn't use
24 Portage APIs but instead reinvents everything. As such, it is easy for
25 the two to go out of sync. Or for one of them to have bugs that
26 the other one doesn't have (say, with endianness).
27
28 > > 2. **The format relies on an obscure compressor feature of ignoring
29 > > trailing garbage**. While this behavior is traditionally implemented
30 > > by many compressors, the original reasons for it have long become
31 > > irrelevant and it is not surprising that new compressors do not
32 > > support it. In particular, Portage already hit this problem twice:
33 > > once when users replaced bzip2 with parallel-capable pbzip2
34 > > implementation [#PBZIP2]_, and the second time when support for zstd
35 > > compressor was added [#ZSTD]_.
36 >
37 > I think this is actually the result of a rather opportunistic
38 > implementation. The fault is that we chose to use an extension that
39 > suggests the file is a regular compressed tarball.
40 > When one detects that a file is xpak padded, it is trivial to feed the
41 > decompressor just the relevant part of the datastream. The format
42 > itself isn't bad, and doesn't rely on obscure behaviour.
43
44 Except if you don't have the proper tools installed. In which case
45 the 'opportunistic' behavior made it possible to extract the contents
46 without special tools... except when it actually happens not to work
47 anymore. Roy's reply indicates that there is actually interest in this
48 design feature.
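
To make the "feed the decompressor just the relevant part" idea concrete, here is a minimal sketch in Python. It assumes the layout described in xpak(5): the file ends with a 4-byte big-endian length of the xpak segment followed by "STOP", and the segment itself is delimited by "XPAKPACK" and "XPAKSTOP". The function name split_tbz2 is made up for illustration and is not part of Portage or portage-utils.

```python
import struct


def split_tbz2(blob: bytes) -> tuple[bytes, bytes]:
    """Split the bytes of a tbz2 file into (compressed tarball, xpak segment).

    Assumed layout (per xpak(5)):
        <compressed tarball> <xpak segment> <4-byte BE xpak length> b"STOP"
    """
    # The trailer is the last 8 bytes: segment length, then the b"STOP" marker.
    if len(blob) < 8 or blob[-4:] != b"STOP":
        raise ValueError("no xpak trailer found")
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])

    xpak_start = len(blob) - 8 - xpak_len
    if xpak_start < 0:
        raise ValueError("xpak length exceeds file size")

    xpak = blob[xpak_start:-8]
    if not (xpak.startswith(b"XPAKPACK") and xpak.endswith(b"XPAKSTOP")):
        raise ValueError("malformed xpak segment")

    # Everything before the xpak segment is the plain compressed tarball,
    # which can be handed to any decompressor without trailing data.
    return blob[:xpak_start], xpak
```

The first element of the result can then be passed to bz2.decompress() or any other decompressor, so nothing depends on the decompressor tolerating trailing data. The point stands, though, that this is still a dedicated tool the user needs to have at hand.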
49
50 >
51 > > 3. **Placing metadata at the end of file makes partial fetches
52 > > complex.** While it is technically possible to obtain package
53 > > metadata remotely without fetching the whole package, it usually
54 > > requires e.g. 2-3 HTTP requests with a rather complex driver. For
55 > > comparison, if metadata was placed at the beginning of the file,
56 > > early-terminated pipeline with a single fetch request would suffice.
57 >
58 > I think this point needs to be quantified somewhat why it is so
59 > important.
60 > I may be wrong, but the average binpkg is small, <1MiB, bigger packages
61 > are <50MiB.
62 > So what is the gain to be saved here? A "few" MiBs for what operation
63 > exactly? I say "few" because I know for some users this is actually not
64 > just a blip before it's downloaded. So if this is possible to achieve,
65 > in what scenarios is this going to be used (and is this often?).
66
67 Last I checked, Gentoo aimed to support more users than the 'majority'
68 of people with high-throughput Internet access. If there's no cost
69 in doing things better, why not do them better?
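
For reference, the "2-3 HTTP requests" needed with the current format look roughly like the sketch below. It assumes the xpak(5) trailer layout (4-byte big-endian segment length followed by "STOP" at the very end of the file). The fetch callback is a hypothetical stand-in for an HTTP Range request returning bytes [start, end) of the remote file, and the total size is assumed to be known already, e.g. from an earlier HEAD request.

```python
import struct
from typing import Callable


def fetch_xpak(size: int, fetch: Callable[[int, int], bytes]) -> bytes:
    """Fetch only the xpak segment of a remote tbz2 of `size` bytes.

    `fetch(start, end)` is a hypothetical callback that returns the bytes
    in [start, end), e.g. by issuing one HTTP Range request per call.
    """
    if size < 8:
        raise ValueError("file too small to contain an xpak trailer")

    # Round trip 1: the 8-byte trailer says how long the xpak segment is.
    trailer = fetch(size - 8, size)
    if trailer[4:] != b"STOP":
        raise ValueError("no xpak trailer found")
    (xpak_len,) = struct.unpack(">I", trailer[:4])

    start = size - 8 - xpak_len
    if start < 0:
        raise ValueError("xpak length exceeds file size")

    # Round trip 2: only now is the position of the metadata known.
    return fetch(start, size - 8)
```

The second request cannot be issued before the first one completes, and that dependency is what makes the driver more complex than a single fetch that is simply cut short.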
70
71 >
72 > > 4. **Extending the format with OpenPGP signatures is non-trivial.**
73 > > Depending on the implementation details, it either requires fetching
74 > > additional detached signature, breaking backwards compatibility or
75 > > introducing more custom logic to reassemble OpenPGP packets.
76 >
77 > I think one could add an extra key to the xpak that holds a gpg sig or
78 > something. Perhaps this point is better phrased as that current binpkgs
79 > don't have any validation options defined.
80
81 ...which extra key would mean that the two disjoint implementations
82 in use would need more custom code that extracts the signature,
83 reconstructs signed data for verification and verifies it. Or, in other
84 words, that the user needs even more custom tooling to manually verify
85 the package he just fetched.
86
87 >
88 > > 5. **Metadata is not compressed.** This is not a significant problem,
89 > > it is just listed for completeness.
90 > >
91 > >
92 > > Goals for a new container format
93 > > --------------------------------
94 > >
95 > > The following goals have been set for a replacement format:
96 > >
97 > > 1. **The packages must remain contained in a single file.** As a matter
98 > > of user convenience, it should be possible to transfer binary
99 > > packages without having to use multiple files, and to install them
100 > > from any location.
101 > >
102 > > 2. **The file format must be entirely based on common file formats,
103 > > respecting best practices, with as little customization as necessary
104 > > to satisfy the requirements.** In particular, it is unacceptable
105 > > to create new binary formats.
106 >
107 > I take this as your personal opinion. I don't quite get why it is
108 > unacceptable to create a new binary format though. In particular when
109 > you're looking for efficiency, such format could serve your purposes.
110 > As long as it's clearly defined, I don't see the problem with a binary
111 > format either.
112 > Could you add why it is you think binary formats are unacceptable here?
113
114 Because custom binary formats require specialized tooling, and are
115 a royal PITA when the user wants to do something that the author of
116 specialized tooling just happened not to think worthwhile, or when
117 the tooling is not available for some reason. And before you ask really
118 silly questions, yes, I did fight binary packages with a hex editor
119 at some point.
120
121 The most trivial case is an attempted recovery of a broken system.
122 If you don't have Portage working and don't have portage-utils
123 installed, do you really prefer a custom format which will require you
124 to fetch and compile special tools? Or one that can be processed
125 with tools you're quite likely to have on every system, like tar?
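
As a rough illustration of what a tar-based container buys, the sketch below reads the first member of an uncompressed tar stream using nothing but the Python standard library, and stops reading afterwards. The layout (a small metadata member placed first) and all names in it are assumptions made for the example, not the layout proposed in the pre-GLEP.

```python
import tarfile


class CountingReader:
    """Wrap a binary stream and count how many bytes were actually read."""

    def __init__(self, raw):
        self.raw = raw
        self.consumed = 0

    def read(self, size=-1):
        chunk = self.raw.read(size)
        self.consumed += len(chunk)
        return chunk


def read_first_member(stream):
    """Return (name, data) of the first member of a tar stream.

    Mode "r|" makes tarfile read strictly sequentially without seeking,
    so the stream may be a pipe or a network response body.
    """
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        member = tar.next()
        if member is None:
            raise ValueError("empty archive")
        extracted = tar.extractfile(member)
        if extracted is None:
            raise ValueError("first member is not a regular file")
        return member.name, extracted.read()
```

Wrapping the input in CountingReader shows that only the first buffered block of the stream is consumed when the first member is small, regardless of how large the rest of the archive is. This is the early-terminated single fetch mentioned in point 3, and the same file can be inspected with plain tar on a broken system.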
126
127 >
128 > > 3. **The file format should provide for partial fetching of binary
129 > > packages.** It should be possible to easily fetch and read
130 > > the package metadata without having to download the whole package.
131 >
132 > Like above, what is the use-case here? Why would you want this? I
133 > think I'm missing something here.
134
135 Does this harm anything? Even if there's little real use for this, is
136 there any harm in supporting it? Are we supposed to do things the other
137 way around with no benefit just because you don't see any real use for
138 it?
139
140 >
141 > > 4. **The file format must provide support for OpenPGP signatures.**
142 > > Preferably, it should use standard OpenPGP message formats.
143 > >
144 > > 5. **The file format must allow for efficient metadata updates.**
145 > > In particular, it should be possible to update the metadata without
146 > > having to recompress package files.
147 > >
148 > > 6. **The file format should account for easy recognition both through
149 > > filename and through contents.** Preferably, it should have distinct
150 > > features making it possible to detect it via file(1).
151 > >
152 > > 7. **The file format should allow for metadata compression.**
153 > >
154 > > 8. **The file format should make future extensions easily possible
155 > > without breaking backwards compatibility.**
156 >
157 >
158
159 --
160 Best regards,
161 Michał Górny
