Gentoo Archives: gentoo-dev

From: Fabian Groffen <grobian@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format
Date: Sun, 18 Nov 2018 09:16:55
Message-Id: 20181118091644.GA880@gentoo.org
In Reply to: [gentoo-dev] [pre-GLEP] Gentoo binary package container format by "Michał Górny"
1 On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
2 > Problems with the current binary package format
3 > -----------------------------------------------
4 >
5 > The following problems were identified with the package format currently
6 > in use:
7 >
8 > 1. **The packages rely on custom binary archive format to store
9 > metadata.** It is entirely Gentoo invented, and requires dedicated
10 > tooling to work with it. In fact, the reference implementation
11 > in Portage does not even include a CLI tool to work with tbz2
12 > packages; an unofficial implementation is provided as part
13 > of portage-utils toolkit [#PORTAGE-UTILS]_.
14
15 I think you should rewrite this section around the argument that the
16 metadata is hard to edit, and that there is only one tool to do so
17 (except a Python interface from Portage?).
18 On a separate note, I don't think portage-utils can be considered
19 "unofficial", it is a Gentoo official project as far as I am aware.
20
21 > 2. **The format relies on obscure compressor feature of ignoring
22 > trailing garbage**. While this behavior is traditionally implemented
23 > by many compressors, the original reasons for it have become long
24 > irrelevant and it is not surprising that new compressors do not
25 > support it. In particular, Portage already hit this problem twice:
26 > once when users replaced bzip2 with parallel-capable pbzip2
27 > implementation [#PBZIP2]_, and the second time when support for zstd
28 > compressor was added [#ZSTD]_.
29
30 I think this is actually the result of a rather opportunistic
31 implementation. The fault is that we chose to use an extension that
32 suggests the file is a regular compressed tarball.
33 When one detects that a file is xpak padded, it is trivial to feed the
34 decompressor just the relevant part of the datastream. The format
35 itself isn't bad, and doesn't rely on obscure behaviour.
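
To illustrate what I mean by feeding the decompressor just the relevant part, here is a minimal sketch in Python. It assumes the trailer layout as I understand it from xpak(5): the file ends with a 4-byte big-endian length of the xpak segment followed by "STOP", and the segment itself is wrapped in "XPAKPACK" ... "XPAKSTOP". The function names are made up for this example; this is not how Portage does it.

```python
import bz2
import struct


def split_tbz2(blob: bytes) -> tuple[bytes, bytes]:
    """Return (compressed_tarball, xpak_segment) for an xpak-padded file.

    Assumed layout: <tar.bz2> <xpak> <4-byte big-endian xpak length> "STOP"
    """
    if len(blob) < 8 or blob[-4:] != b"STOP":
        raise ValueError("no xpak trailer found")
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    xpak_start = len(blob) - 8 - xpak_len
    if xpak_start < 0:
        raise ValueError("xpak length exceeds file size")
    xpak = blob[xpak_start:-8]
    if not (xpak.startswith(b"XPAKPACK") and xpak.endswith(b"XPAKSTOP")):
        raise ValueError("xpak markers missing")
    return blob[:xpak_start], xpak


def decompress_payload(blob: bytes) -> bytes:
    """Decompress only the tarball part, so the compressor never sees the padding."""
    payload, _ = split_tbz2(blob)
    return bz2.decompress(payload)
```

With the payload cut off at the right offset, it no longer matters whether the decompressor tolerates trailing garbage.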
36
37 > 3. **Placing metadata at the end of file makes partial fetches
38 > complex.** While it is technically possible to obtain package
39 > metadata remotely without fetching the whole package, it usually
40 > requires e.g. 2-3 HTTP requests with rather complex driver. For
41 > comparison, if metadata was placed at the beginning of the file,
42 > early-terminated pipeline with a single fetch request would suffice.
43
44 I think this point needs some quantification to show why it is so
45 important.
46 I may be wrong, but the average binpkg is small, <1MiB, bigger packages
47 are <50MiB.
48 So what is there to be gained here? A "few" MiBs for what operation
49 exactly? I say "few" because I know for some users this is actually not
50 just a blip before it's downloaded. So if this is possible to achieve,
51 in what scenarios is this going to be used (and how often?).
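
For reference, the remote metadata fetch with the current format could look roughly like the sketch below. It assumes the same xpak(5) trailer layout (4-byte big-endian segment length followed by "STOP") and a server that honours HTTP suffix ranges ("Range: bytes=-N"). The fetch_tail callable is a hypothetical stand-in for whatever HTTP client one would use, so nothing here touches the network.

```python
import struct
from typing import Callable

# fetch_tail(n) is assumed to return the last n bytes of the remote file,
# e.g. by sending an HTTP request with the header produced by suffix_range(n).
FetchTail = Callable[[int], bytes]


def suffix_range(n: int) -> str:
    """Value of the HTTP Range header asking for the last n bytes."""
    return f"bytes=-{n}"


def fetch_xpak(fetch_tail: FetchTail) -> bytes:
    """Fetch only the xpak segment of a remote binpkg in two range requests."""
    # Request 1: the 8-byte trailer tells us how long the xpak segment is.
    trailer = fetch_tail(8)
    if len(trailer) != 8 or trailer[-4:] != b"STOP":
        raise ValueError("no xpak trailer found")
    (xpak_len,) = struct.unpack(">I", trailer[:4])
    # Request 2: the xpak segment plus the trailer we already have.
    tail = fetch_tail(xpak_len + 8)
    return tail[:xpak_len]
```

That is two requests and a few lines of driver code, so the question remains how much a single early-terminated fetch would really save.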
52
53 > 4. **Extending the format with OpenPGP signatures is non-trivial.**
54 > Depending on the implementation details, it either requires fetching
55 > additional detached signature, breaking backwards compatibility or
56 > introducing more custom logic to reassemble OpenPGP packets.
57
58 I think one could add an extra key to the xpak that holds a gpg sig or
59 something. Perhaps this point is better phrased as: current binpkgs
60 don't have any validation options defined.
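
A rough sketch of the extra-key idea, assuming the xpak layout as I read it in xpak(5): "XPAKPACK", index length, data length, index, data, "XPAKSTOP", where each index entry is name length, name, data offset and data length, all integers 4-byte big-endian. The key name OPENPGP_SIG and the signature bytes are purely hypothetical; what exactly would be signed is left open here.

```python
import struct


def encode_xpak(entries: dict[str, bytes]) -> bytes:
    """Serialise key/value pairs into an xpak segment (assumed layout)."""
    index = b""
    data = b""
    for name, value in entries.items():
        raw = name.encode()
        index += struct.pack(">I", len(raw)) + raw
        index += struct.pack(">II", len(data), len(value))
        data += value
    header = b"XPAKPACK" + struct.pack(">II", len(index), len(data))
    return header + index + data + b"XPAKSTOP"


def decode_xpak(xpak: bytes) -> dict[str, bytes]:
    """Parse an xpak segment back into key/value pairs."""
    index_len, data_len = struct.unpack(">II", xpak[8:16])
    index = xpak[16:16 + index_len]
    data = xpak[16 + index_len:16 + index_len + data_len]
    entries = {}
    pos = 0
    while pos < index_len:
        (name_len,) = struct.unpack(">I", index[pos:pos + 4])
        name = index[pos + 4:pos + 4 + name_len].decode()
        off, length = struct.unpack(
            ">II", index[pos + 4 + name_len:pos + 12 + name_len])
        entries[name] = data[off:off + length]
        pos += 12 + name_len
    return entries


# Existing metadata plus one hypothetical key carrying a detached signature.
meta = {"CATEGORY": b"app-misc\n", "PF": b"foo-1.0\n"}
meta["OPENPGP_SIG"] = b"<signature bytes would go here>"
segment = encode_xpak(meta)
```

Readers that don't know the key would simply ignore it, so backwards compatibility would not need to break.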
61
62 > 5. **Metadata is not compressed.** This is not a significant problem,
63 > it is just listed for completeness.
64 >
65 >
66 > Goals for a new container format
67 > --------------------------------
68 >
69 > The following goals have been set for a replacement format:
70 >
71 > 1. **The packages must remain contained in a single file.** As a matter
72 > of user convenience, it should be possible to transfer binary
73 > packages without having to use multiple files, and to install them
74 > from any location.
75 >
76 > 2. **The file format must be entirely based on common file formats,
77 > respecting best practices, with as little customization as necessary
78 > to satisfy the requirements.** In particular, it is unacceptable
79 > to create new binary formats.
80
81 I take this as your personal opinion. I don't quite get why it is
82 unacceptable to create a new binary format though. In particular when
83 you're looking for efficiency, such a format could serve your purposes.
84 As long as it's clearly defined, I don't see the problem with a binary
85 format either.
86 Could you add why it is you think binary formats are unacceptable here?
87
88 > 3. **The file format should provide for partial fetching of binary
89 > packages.** It should be possible to easily fetch and read
90 > the package metadata without having to download the whole package.
91
92 Like above, what is the use-case here? Why would you want this? I
93 think I'm missing something here.
94
95 > 4. **The file format must provide support for OpenPGP signatures.**
96 > Preferably, it should use standard OpenPGP message formats.
97 >
98 > 5. **The file format must allow for efficient metadata updates.**
99 > In particular, it should be possible to update the metadata without
100 > having to recompress package files.
101 >
102 > 6. **The file format should account for easy recognition both through
103 > filename and through contents.** Preferably, it should have distinct
104 > features making it possible to detect it via file(1).
105 >
106 > 7. **The file format should allow for metadata compression.**
107 >
108 > 8. **The file format should make future extensions easily possible
109 > without breaking backwards compatibility.**
110
111 --
112 Fabian Groffen
113 Gentoo on a different level
