On 14/03/2022 13.22, Fabian Groffen wrote:
> Hi,
>
> I've recently been thinking about this too.
>
> On 13-03-2022 18:06:21 -0700, Matt Turner wrote:
>> The VDB uses a one-file-per-variable format. This has some
>> inefficiencies, with many file systems. For example the 'EAPI' file
>> that contains a single character will consume a 4K block on disk.
>> I recommend json and think it is the best choice because:
>
> [snip]
>
>> - json provides the smallest on-disk footprint
>> - json is part of Python's standard library (so is yaml, and toml will
>> be in Python 3.11)
>> - Every programming language has multiple json parsers
>> -- lots of effort has been spent making them extremely fast.
>
> I would like to suggest to use "tar".

Your idea sounds very appealing, and I am by no means an expert on the
tar file format, but
https://www.gnu.org/software/tar/manual/html_node/Standard.html states

"""
…an archive consists of a series of file entries terminated by an
end-of-archive entry, which consists of two 512 blocks of zero bytes.
"""

and the Wikipedia entry of 'tar' [1] states

"""
Each file object includes any file data, and is preceded by a 512-byte
header record. The file data is written unaltered except that its length
is rounded up to a multiple of 512 bytes.
"""

and furthermore

"""
The end of an archive is marked by at least two consecutive zero-filled
records.
"""

That sounds like a lot of overhead if no compression is involved. I am not
sure whether this can be considered a knock-out criterion for tar.
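To put rough numbers on that overhead, here is a small stdlib-only Python sketch. The VDB entry in it is hypothetical (four small files with made-up contents, not taken from a real system). estimate_tar_size only applies the three rules quoted above: a 512-byte header per file, data rounded up to 512 bytes, and two zero blocks at the end. build_tar writes the same files as an uncompressed ustar archive with Python's tarfile module for comparison, and the JSON size is printed only as a point of reference for the proposal quoted above.

```python
import io
import json
import tarfile

BLOCK = 512  # tar block size in bytes

# Hypothetical VDB entry: file name -> file content (values are made up).
SAMPLE_ENTRY = {
    "EAPI": "8\n",
    "SLOT": "0\n",
    "CATEGORY": "app-misc\n",
    "PF": "hello-1.0\n",
}


def estimate_tar_size(sizes):
    """Header plus padded data per member, plus the two zero end blocks.

    Ignores any extra padding an archiver adds up to its record size.
    """
    total = 0
    for size in sizes:
        total += BLOCK                      # 512-byte header record
        total += -(-size // BLOCK) * BLOCK  # data rounded up to 512 bytes
    return total + 2 * BLOCK                # end-of-archive marker


def build_tar(entry):
    """Write the entry as an uncompressed ustar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, text in entry.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    sizes = [len(v.encode("utf-8")) for v in SAMPLE_ENTRY.values()]
    print("payload bytes:       ", sum(sizes))
    print("tar estimate bytes:  ", estimate_tar_size(sizes))
    print("tarfile output bytes:", len(build_tar(SAMPLE_ENTRY)))
    print("json bytes:          ", len(json.dumps(SAMPLE_ENTRY).encode("utf-8")))
```

With these made-up values the four files carry 23 bytes of payload but occupy 5120 bytes of tar blocks. Python's tarfile additionally pads the archive on close to a multiple of its default 10240-byte record size, so the file it writes here is 10240 bytes.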
- Flow |