Gentoo Archives: gentoo-portage-dev

From: Florian Schmaus <flow@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Changing the VDB format
Date: Mon, 14 Mar 2022 15:35:15
Message-Id: 17083fb4-ea69-f145-a74d-4f21e56f4ee0@gentoo.org
In Reply to: Re: [gentoo-portage-dev] Changing the VDB format by Fabian Groffen
1 On 14/03/2022 13.22, Fabian Groffen wrote:
2 > Hi,
3 >
4 > I've recently been thinking about this too.
5 >
6 > On 13-03-2022 18:06:21 -0700, Matt Turner wrote:
7 >> The VDB uses a one-file-per-variable format. This has some
8 >> inefficiencies, with many file systems. For example the 'EAPI' file
9 >> that contains a single character will consume a 4K block on disk.
10 >> I recommend json and think it is the best choice because:
11 >
12 > [snip]
13 >
14 >> - json provides the smallest on-disk footprint
15 >> - json is part of Python's standard library (so is yaml, and toml will
16 >> be in Python 3.11)
17 >> - Every programming language has multiple json parsers
18 >> -- lots of effort has been spent making them extremely fast.
19 >
20 > I would like to suggest to use "tar".
21
22 Your idea sounds very appealing, and I am by no means an expert on the
23 tar file format, but
24 https://www.gnu.org/software/tar/manual/html_node/Standard.html states
25
26 """
27 …an archive consists of a series of file entries terminated by an
28 end-of-archive entry, which consists of two 512 blocks of zero bytes.
29 """
30
31 and the Wikipedia entry for 'tar' [1] states
32
33 """
34 Each file object includes any file data, and is preceded by a 512-byte
35 header record. The file data is written unaltered except that its length
36 is rounded up to a multiple of 512 bytes.
37 """
38
39 and furthermore
40
41 """
42 The end of an archive is marked by at least two consecutive zero-filled
43 records.
44 """
45
46 That sounds like a lot of overhead if no compression is involved. I am not
47 sure whether this should be considered a knock-out criterion for tar.
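
To put rough numbers on this, here is a minimal back-of-the-envelope sketch
in Python. It only applies the rules quoted above (one 512-byte header per
file, data rounded up to 512 bytes, two zero-filled records at the end) and
compares the result with one-file-per-variable storage on a file system with
4K blocks, as in Matt's example. The function names and the sample file
sizes are made up for illustration, inode and directory overhead are
ignored, and none of this reflects how Portage actually works.

```python
import io
import tarfile

TAR_BLOCK = 512  # tar record size in bytes


def round_up(size, block):
    """Round size up to the next multiple of block."""
    return -(-size // block) * block


def estimated_tar_size(payload_sizes):
    """Uncompressed tar size according to the rules quoted above."""
    total = 0
    for size in payload_sizes:
        total += TAR_BLOCK                    # header record
        total += round_up(size, TAR_BLOCK)    # padded file data
    return total + 2 * TAR_BLOCK              # end-of-archive marker


def per_file_size(payload_sizes, fs_block=4096):
    """Data blocks used when every variable lives in its own file."""
    return sum(round_up(size, fs_block) for size in payload_sizes)


def actual_tar_size(files):
    """Build an in-memory ustar archive and return its length."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


if __name__ == "__main__":
    # Hypothetical payload sizes in bytes, not measured from a real VDB.
    sizes = [1, 5, 300, 600]
    print("payload      :", sum(sizes))
    print("tar estimate :", estimated_tar_size(sizes))
    print("one file each:", per_file_size(sizes))
    print("tarfile size :", actual_tar_size({"EAPI": b"8\n"}))
```

Under these assumptions the four tiny files take 5632 bytes as a tar archive
versus 16384 bytes as individual files, so the tar overhead is real but still
smaller than the 4K-per-file cost. Note that Python's tarfile module also
pads the finished archive to a full 10240-byte record, so an archive written
that way can be larger than the estimate.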
48
49 - Flow