Gentoo Archives: gentoo-portage-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-portage-dev <gentoo-portage-dev@l.g.o>
Subject: [gentoo-portage-dev] [RFC] Improving Gentoo package format
Date: Sat, 10 Nov 2018 13:09:13
Message-Id: 1541855343.23469.13.camel@gentoo.org
Hi, everyone.

Gentoo's tbz2/xpak package format is quite old. We've made a few
incompatible changes in the past (most notably, allowing non-bzip2
compression and multi-instance naming) but the core design has stayed
the same. I think we should consider changing it, for the reasons
outlined below.

A rough description of the format can be found in xpak(5). Basically,
it's a regular compressed tarball with a binary metadata blob appended
to the end. As such, it looks like a regular compressed tarball
to the compression tools (with some ignored junk at the end).
The metadata uses an entirely custom format and needs dedicated tools
to manipulate.
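
As a rough illustration of what "dedicated tools" means here, the sketch
below builds and parses an xpak trailer following the layout described in
xpak(5) as I understand it (32-bit big-endian lengths, an index segment
followed by a data segment). The helper names build_xpak and read_xpak,
and the sample keys and values, are made up for this example. This is not
Portage's own implementation.

```python
import bz2
import struct


def build_xpak(entries):
    """Pack {name: bytes} into an xpak blob (layout per xpak(5))."""
    index = b""
    data = b""
    for name, value in entries.items():
        raw = name.encode()
        # Index entry: name_len, name, data_offset, data_len.
        index += struct.pack(">I", len(raw)) + raw
        index += struct.pack(">II", len(data), len(value))
        data += value
    return (b"XPAKPACK" + struct.pack(">II", len(index), len(data))
            + index + data + b"XPAKSTOP")


def read_xpak(blob):
    """Read the metadata from the tail of a package, decompressing nothing."""
    if blob[-4:] != b"STOP":
        raise ValueError("no xpak trailer")
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    xpak = blob[-8 - xpak_len:-8]
    if xpak[:8] != b"XPAKPACK" or xpak[-8:] != b"XPAKSTOP":
        raise ValueError("corrupt xpak segment")
    index_len, data_len = struct.unpack(">II", xpak[8:16])
    index = xpak[16:16 + index_len]
    data = xpak[16 + index_len:16 + index_len + data_len]
    entries = {}
    pos = 0
    while pos < index_len:
        (name_len,) = struct.unpack(">I", index[pos:pos + 4])
        name = index[pos + 4:pos + 4 + name_len].decode()
        offset, length = struct.unpack(
            ">II", index[pos + 4 + name_len:pos + 12 + name_len])
        entries[name] = data[offset:offset + length]
        pos += 12 + name_len
    return entries


# Stand-in for a real package: bzip2 data, the xpak blob, then the trailer
# holding the blob length and the "STOP" marker.
xpak = build_xpak({"CATEGORY": b"app-foo\n", "PF": b"bar-1\n"})
tbz2 = (bz2.compress(b"pretend this is a tarball")
        + xpak + struct.pack(">I", len(xpak)) + b"STOP")
metadata = read_xpak(tbz2)
```

The reader only looks at the last bytes of the file, which is why the
metadata can be found and rewritten without touching the compressed data,
and also why nothing but a purpose-built tool can do it.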


The current format has a few advantages that would probably be worth
preserving:

+ The binary package is a single flat file.

+ It is reasonably compatible with a regular compressed tarball,
so users can unpack it using standard tools (except for the metadata).

+ The metadata is uncompressed and can be quickly found without touching
the compressed data.

+ The metadata can be updated (e.g. as a result of pkgmove) without
touching the compressed data.


However, it has a few disadvantages as well:

- The metadata is an entirely custom binary format, requiring dedicated
tools to read or edit.

- The metadata format relies on the customary behavior of compression
tools, which ignore junk following the compressed data.

- By placing the metadata at the end of the file, we make it rather hard
to read the metadata from a remote location (via FTP, HTTP) without
fetching the whole file. [NB: it's technically possible but probably not
worth the effort]

- By requiring the custom format to be at the end of the file, we make
it impossible to trivially cover it with an OpenPGP signature without
introducing another custom format.

- While the format might allow for some extensibility, it's rather
an evolutionary dead end.


I think the key points of the new format should be:

1. It should reuse common file formats as much as possible, inventing
as little custom code as possible.

2. It should allow for easy introspection and editing by users without
dedicated tools.

3. The metadata should allow for lookup without fetching the whole
binary package.

4. The format should allow for some extensions without having to
reinvent the wheel every time.

5. It would be nice to preserve the existing advantages.


My proposal
===========

Basic format
------------
The base of the format is a regular compressed tarball. There's no junk
appended to it; instead, the metadata is stored inside it as
/var/db/pkg/${PF}. The contents are as compatible with the actual vdb
format as possible.

This has the following advantages:

+ The binary package is still stored as a single file.

+ It uses a standard compressed .tar format, with minimal customization.

+ The user can easily inspect and modify the packages with standard
tools (tar and the compressor).

+ If we can maintain a reasonable level of vdb compatibility, the user
can even emergency-install a package without causing too much hassle (as
it will be recorded in vdb); ideally Portage would detect this vdb entry
and support fixing the install afterwards.
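
To make the basic layout concrete, here is a minimal sketch using only
Python's standard tarfile module. The package name bar-1, the metadata
keys, the installed file and the helper names are all hypothetical. The
point is merely that no custom code is needed to write or read such
a package.

```python
import io
import tarfile

PF = "bar-1"  # hypothetical ${PF}
VDB_PREFIX = f"var/db/pkg/{PF}/"


def add_bytes(tar, name, payload):
    """Add an in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def build_package(metadata, files):
    """Write vdb-style metadata and image files into one .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for key, value in metadata.items():
            add_bytes(tar, VDB_PREFIX + key, value)
        for path, payload in files.items():
            add_bytes(tar, path, payload)
    return buf.getvalue()


def read_metadata(package):
    """Collect every regular file stored under the vdb prefix."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.startswith(VDB_PREFIX):
                key = member.name[len(VDB_PREFIX):]
                result[key] = tar.extractfile(member).read()
    return result


package = build_package(
    {"CATEGORY": b"app-foo\n", "SLOT": b"0\n"},
    {"usr/bin/bar": b"#!/bin/sh\necho bar\n"},
)
metadata = read_metadata(package)
```

Because the result is an ordinary .tar.gz, the same contents can be
listed or extracted with plain tar as well.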


Optimizing for easy recognition
-------------------------------
In order to make it possible for magic-based tools such as file(1) to
easily distinguish Gentoo binary packages from regular tarballs, we
could (ab)use the volume label field, e.g. use:

  $ tar -V 'gpkg: app-foo/bar-1' -c ...

This will add a volume label as the first entry inside the tarball,
which does not affect extraction but can be trivially matched via magic
rules.
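
The check a magic rule would perform can be sketched as follows. This
assumes the label is stored the way GNU tar stores volume labels, i.e. as
a header block with type flag 'V' whose name field holds the label, and
it assumes gzip compression. The function names are invented for this
example, and the label block is built by hand rather than by tar -V.

```python
import gzip
import io
import tarfile

LABEL_PREFIX = b"gpkg: "


def label_block(label):
    """Build a 512-byte GNU-style volume header carrying the label."""
    info = tarfile.TarInfo(label)
    info.type = b"V"  # GNU tar's volume header type flag
    return info.tobuf(format=tarfile.GNU_FORMAT)


def package_label(compressed):
    """Return the package named in the label, or None if there is none."""
    with gzip.open(io.BytesIO(compressed)) as stream:
        header = stream.read(512)  # only the first tar block is needed
    if len(header) < 512 or header[156:157] != b"V":
        return None
    name = header[:100].rstrip(b"\0")
    if not name.startswith(LABEL_PREFIX):
        return None
    return name[len(LABEL_PREFIX):].decode()


# A labelled archive (label block plus end-of-archive marker) ...
labelled = gzip.compress(label_block("gpkg: app-foo/bar-1") + b"\0" * 1024)

# ... and an ordinary tarball without any label.
plain_buf = io.BytesIO()
with tarfile.open(fileobj=plain_buf, mode="w:gz") as tar:
    info = tarfile.TarInfo("usr/bin/bar")
    info.size = 3
    tar.addfile(info, io.BytesIO(b"bar"))
plain = plain_buf.getvalue()

found = package_label(labelled)
missing = package_label(plain)
```

Only the first 512 decompressed bytes are inspected, so the check stays
cheap regardless of package size.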

Note: this is meant to be used as a method for fast binary package
recognition; I don't think we should reject (hand-modified) binary
packages that lack this label.


Optimizing for metadata reading/manipulation performance
--------------------------------------------------------
The main problem with using a single tarball for both metadata and data
is that normally you'd have to decompress everything to reliably unpack
the metadata, and recompress everything to update it. This problem can
be addressed by a few optimization tricks.

Firstly, all metadata files are packed into the archive before the data
files. With a slightly customized unpacker, we can stop decompressing as
soon as we're past the metadata and avoid decompressing the whole
archive. This will also make it possible to read metadata from remote
files without fetching far past the compressed metadata block.
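
A sketch of the first trick, assuming gzip compression and metadata
stored under a hypothetical var/db/pkg/bar-1/ prefix: the archive is read
as a forward-only stream and the reader stops at the first member outside
that prefix. CountingReader is only there to show how little of the
compressed file gets consumed. All names are invented for the example.

```python
import io
import os
import tarfile

VDB_PREFIX = "var/db/pkg/bar-1/"  # hypothetical ${PF}


class CountingReader:
    """Wrap a file object and count the compressed bytes handed out."""

    def __init__(self, raw):
        self.raw = raw
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self.raw.read(size)
        self.bytes_read += len(chunk)
        return chunk


def add_bytes(tar, name, payload):
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def read_leading_metadata(fileobj):
    """Read metadata members and stop at the first data member."""
    metadata = {}
    # "r|gz" treats the archive as a stream, so nothing is read ahead
    # beyond tarfile's internal buffer.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if not member.name.startswith(VDB_PREFIX):
                break  # past the metadata: the rest is payload
            if member.isfile():
                key = member.name[len(VDB_PREFIX):]
                metadata[key] = tar.extractfile(member).read()
    return metadata


# Metadata first, then 1 MiB of incompressible payload.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    add_bytes(tar, VDB_PREFIX + "SLOT", b"0\n")
    add_bytes(tar, "usr/lib/libbar.so", os.urandom(1 << 20))
package = buf.getvalue()

reader = CountingReader(io.BytesIO(package))
metadata = read_leading_metadata(reader)
```

The same loop would work on a remote file if the reader fetched byte
ranges on demand instead of reading from memory.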

Secondly, if we're up for some more tricks, we could technically split
the tarball into metadata and data blocks that are compressed
separately. This will need a bit of archiver customization but it will
make it possible to decompress the metadata part without even touching
the compressed data, and to replace it without recompressing the data.
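
One possible way to do the split, sketched here under the assumption that
gzip is the compressor: the metadata members and the data members are
written as two independent gzip members and concatenated. Standard
decompressors treat the result as a single stream, so it still unpacks as
one tarball. The helper names and the sample contents are hypothetical.

```python
import gzip
import io
import tarfile
import zlib

END_OF_ARCHIVE = b"\0" * 1024  # two zero blocks terminate a tar archive


def tar_member(name, payload):
    """Return one raw tar member: header plus data padded to 512 bytes."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    return info.tobuf() + payload + b"\0" * (-len(payload) % 512)


def pack(entries):
    return b"".join(tar_member(name, data) for name, data in entries.items())


def build_split_package(metadata, files):
    # The metadata part carries no end-of-archive marker, so the data
    # part continues the same tar stream after decompression.
    return gzip.compress(pack(metadata)) + gzip.compress(pack(files) + END_OF_ARCHIVE)


def split_members(package):
    """Return (decompressed metadata tar, still-compressed data member)."""
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)  # expect gzip framing
    meta_tar = decomp.decompress(package)  # stops at the end of member one
    return meta_tar, decomp.unused_data


def read_metadata(package):
    meta_tar, _ = split_members(package)
    with tarfile.open(fileobj=io.BytesIO(meta_tar + END_OF_ARCHIVE)) as tar:
        return {m.name: tar.extractfile(m).read() for m in tar if m.isfile()}


def replace_metadata(package, metadata):
    """Swap the metadata member, reusing the compressed data verbatim."""
    _, data_member = split_members(package)
    return gzip.compress(pack(metadata)) + data_member


package = build_split_package(
    {"var/db/pkg/bar-1/SLOT": b"0\n"},
    {"usr/bin/bar": b"#!/bin/sh\necho bar\n"},
)
updated = replace_metadata(package, {"var/db/pkg/bar-1/SLOT": b"1\n"})

# A regular reader still sees a single tarball.
with tarfile.open(fileobj=io.BytesIO(updated), mode="r:gz") as tar:
    names = tar.getnames()
```

Whether other compressors concatenate as cleanly as gzip does would need
to be checked for each of them separately.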

What's important is that both proposed tricks maintain backwards
compatibility with regular compressed tarballs. That is, the user will
still be able to extract the package with regular archiving tools.


Adding OpenPGP signatures
-------------------------
This is the main open question (XXX) here.

Technically, the most obvious solution is to cover the entire tarball
with an OpenPGP signature. However, this has the disadvantage that
verification requires fetching the whole file.

I will look into the possibility of having partial signatures.


--
Best regards,
Michał Górny
