Gentoo Archives: gentoo-portage-dev

From:	Florian Schmaus <flow@g.o>
To:	gentoo-portage-dev@l.g.o
Subject:	Re: [gentoo-portage-dev] Changing the VDB format
Date:	Mon, 14 Mar 2022 15:35:15
Message-Id:	`17083fb4-ea69-f145-a74d-4f21e56f4ee0@gentoo.org`
In Reply to:	Re: [gentoo-portage-dev] Changing the VDB format by Fabian Groffen

On 14/03/2022 13.22, Fabian Groffen wrote:
> Hi,
>
> I've recently been thinking about this too.
>
> On 13-03-2022 18:06:21 -0700, Matt Turner wrote:
>> The VDB uses a one-file-per-variable format. This has some
>> inefficiencies, with many file systems. For example the 'EAPI' file
>> that contains a single character will consume a 4K block on disk.
>> I recommend json and think it is the best choice because:
>
> [snip]
>
>> - json provides the smallest on-disk footprint
>> - json is part of Python's standard library (so is yaml, and toml will
>> be in Python 3.11)
>> - Every programming language has multiple json parsers
>> -- lots of effort has been spent making them extremely fast.
>
> I would like to suggest to use "tar".

Your idea sounds very appealing, and I am by no means an expert on the
tar file format, but
https://www.gnu.org/software/tar/manual/html_node/Standard.html states

"""
…an archive consists of a series of file entries terminated by an
end-of-archive entry, which consists of two 512 blocks of zero bytes.
"""

and the Wikipedia entry of 'tar' [1] states

"""
Each file object includes any file data, and is preceded by a 512-byte
header record. The file data is written unaltered except that its length
is rounded up to a multiple of 512 bytes.
"""

and furthermore

"""
The end of an archive is marked by at least two consecutive zero-filled
records.
"""

That sounds like a lot of overhead if no compression is involved. I am
not sure whether this should be considered a knock-out criterion for
tar.
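
To put rough numbers on that overhead, here is a minimal sketch that
applies only the rules quoted above: one 512-byte header per member,
data rounded up to a multiple of 512 bytes, and two zero-filled 512-byte
blocks at the end. The file names and payload sizes are made up for
illustration and are not measured from a real VDB entry. It also ignores
any extra padding a particular tar implementation may add.

```python
BLOCK = 512  # tar block size in bytes, per the passages quoted above


def member_size(data_len: int) -> int:
    """Bytes used by one archive member: header block plus padded data."""
    padded = -(-data_len // BLOCK) * BLOCK  # round up to a multiple of BLOCK
    return BLOCK + padded


def archive_size(data_lens) -> int:
    """Minimum uncompressed archive size, including the end-of-archive marker."""
    return sum(member_size(n) for n in data_lens) + 2 * BLOCK


# Hypothetical payload sizes in bytes (assumptions, not real VDB data).
example = {"EAPI": 2, "SLOT": 2, "KEYWORDS": 12, "CONTENTS": 3000}

payload = sum(example.values())
total = archive_size(example.values())
print(f"payload: {payload} bytes, tar archive: {total} bytes, "
      f"overhead: {total - payload} bytes")
```

Under these assumptions every tiny file costs at least 1024 bytes inside
the archive, so whether this counts as a lot depends on what it is
compared against (for example the 4K block per file mentioned earlier in
the thread).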

- Flow
