Gentoo Archives: gentoo-portage-dev

From: Fabian Groffen <grobian@g.o>
To: gentoo-portage-dev@l.g.o
Cc: Tim Harder <radhermit@g.o>
Subject: Re: [gentoo-portage-dev] Changing the VDB format
Date: Mon, 14 Mar 2022 12:26:57
Message-Id: Yi8zlPpzE+KjS8Uu@gentoo.org
In Reply to: [gentoo-portage-dev] Changing the VDB format by Matt Turner
1 Hi,
2
3 I've recently been thinking about this too.
4
5 On 13-03-2022 18:06:21 -0700, Matt Turner wrote:
6 > The VDB uses a one-file-per-variable format. This has some
7 > inefficiencies, with many file systems. For example the 'EAPI' file
8 > that contains a single character will consume a 4K block on disk.
9 > I recommend json and think it is the best choice because:
10
11 [snip]
12
13 > - json provides the smallest on-disk footprint
14 > - json is part of Python's standard library (so is yaml, and toml will
15 > be in Python 3.11)
16 > - Every programming language has multiple json parsers
17 > -- lots of effort has been spent making them extremely fast.
18
19 I would like to suggest using "tar". The reasoning behind this is a bit
20 convoluted, but I'll try to be as clear and sound as I can:
21 - "new style" bin-packages use tar too
22 - tar-file allows keeping all individual files/members, e.g. for legacy
23 tools to unpack and look at the VDB that way
24 - tar-file allows streaming, so single file read, for efficient
25 retrieval
26 - single tar-file for entire VDB allows making it "atomic": one can
27 modify tar archives later on to add new vdb entries, or perform
28 updates (again, when not done in place, e.g. with memory backing, this
29 could be done atomically)
30 - tar-file could be used for (rsync) tree metadata (md5-cache) in the
31 same way, e.g. re-use streaming approach, or unpack for legacy tools
32 - tar-file could be used for Packages file, instead of flat file with
33 keys, basically just write VDB entries with some additional keys, very
34 similar in practice.
35 - tar-files are slightly easier to manage from the command line; tools to do
36 so have existed for a long time and are installed. (jq isn't pulled in by
37 @system these days, I think)
38 - tar-files can easily (optionally) be compressed while retaining streaming
39 abilities (for these usages this is very likely to pay off), with a much
40 higher dictionary benefit for a single tar vs many files.
41 - single tar-file is much more efficient to GPG-sign (which would allow
42 some securing of the VDB, not sure if useful though)
43 - going back to the first point, vdb entry from binary package could
44 simply be dropped into the vdb tar, and vice-versa
45 - going back to metadata, dep-resolving could simply load all
46 available/installed packages of the system into memory with two reads (if
47 it has enough of that -- pretty common these days), which should allow
48 for vast speedups, especially on cold(ish) filesystems.
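
To make the streaming and compression points above concrete, here is a minimal sketch using Python's standard tarfile module. The member layout (<category>/<pkg-ver>/<VAR>) and the helper names build_vdb_tar and load_vdb_tar are assumptions made up for illustration; nothing like this exists in Portage today.

```python
import io
import tarfile


def build_vdb_tar(entries, compress=True):
    """Pack {"cat/pkg-ver": {"VAR": "value"}} into one tar archive (bytes).

    Hypothetical layout: one member per variable, named cat/pkg-ver/VAR.
    """
    buf = io.BytesIO()
    mode = "w:gz" if compress else "w"
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for cpv in sorted(entries):
            for var, value in sorted(entries[cpv].items()):
                data = value.encode("utf-8")
                info = tarfile.TarInfo(name=f"{cpv}/{var}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def load_vdb_tar(fileobj):
    """Read every member in a single forward pass, without seeking."""
    vdb = {}
    # "r|*" selects stream mode with transparent compression detection.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            cpv, _, var = member.name.rpartition("/")
            value = tar.extractfile(member).read().decode("utf-8")
            vdb.setdefault(cpv, {})[var] = value
    return vdb
```

The reader never seeks, so the same loop would work on a pipe or a compressed stream; whether this actually beats the current layout would have to be measured.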
49
50 > I think we would have a significant time period for the transition. I
51 > think I would include support for the new format in Portage, and ship
52 > a tool with portage to switch back and forth between old and new
53 > formats on-disk. Maybe after a year, drop the code from Portage to
54 > support the old format?
55
56 Here I believe that with the tar format, code could initially be written so
57 that, instead of accessing a file directly, it opens the tar-file,
58 locates the member it needs, and retrieves that instead. This is a
59 bit naive, but probably sort of manageable, and allows having a switch
60 that specifies which format to write. It's easy to detect automatically
61 which form you have. I.e. nothing has to change for users unless
62 they actively make a change for it.
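
As a rough sketch of that naive approach: the function below detects which form is on disk and reads one variable either way. The name read_vdb_var and the member naming inside the archive are hypothetical, chosen only for this example.

```python
import os
import tarfile


def read_vdb_var(vdb_path, cpv, var):
    """Return the value of one VDB variable, whichever format is on disk."""
    if os.path.isdir(vdb_path):
        # Legacy layout: <vdb>/<category>/<pkg-ver>/<VAR>
        with open(os.path.join(vdb_path, cpv, var), encoding="utf-8") as f:
            return f.read()
    if tarfile.is_tarfile(vdb_path):
        # Assumed tar layout: members named <category>/<pkg-ver>/<VAR>
        with tarfile.open(vdb_path, mode="r:*") as tar:
            member = tar.getmember(f"{cpv}/{var}")  # KeyError if absent
            return tar.extractfile(member).read().decode("utf-8")
    raise ValueError(f"unrecognised VDB format: {vdb_path}")
```

Opening the archive for every lookup is obviously wasteful; it is only meant to show that callers would not need to know which format is in use.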
63
64 Like you, I think the main reason for doing this should be performance,
65 basically allowing faster operations.
66
67 I feel though that we should aim to use a single solution to maintain a
68 number of "trees" that we have: metadata, vdb, Packages/binpkgs, for
69 they all seem to exhibit a similar (IO) behaviour when being employed.
70
71 Thanks,
72 Fabian
73
74 --
75 Fabian Groffen
76 Gentoo on a different level


Replies

Subject Author
Re: [gentoo-portage-dev] Changing the VDB format Florian Schmaus <flow@g.o>