Gentoo Archives: gentoo-dev

From: Heiko Wundram <heikowu@×××××.de>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] portage with a filesystem in a box
Date: Thu, 22 Apr 2004 10:06:19
Message-Id: 200404221206.14454.heikowu@ceosg.de
Hi to all!

I've been implementing a filesystem in a box (in Python) for a project of
mine. (The filesystem-in-a-box part is called PyVFS, by the way, and will be
on SourceForge soon enough.) The filesystem part isn't production-ready yet,
but it "works well enough" (TM) that I trust it to do the backend storage for
a web script I'm writing (that's the actual project I'm working on).

I've seen several attempts out there to have Python store the portage DB in
something other than the normal filesystem. That makes sense: the portage DB
takes up considerably more disk space than the actual size of its contents,
and it is also quite compressible. But all of those attempts used something
like an SQL database or other external modules, which would mean compiling
more things during bootstrapping.

The module I've written is entirely in Python, requires Python 2.3.x (I've
only tested it with 2.3.3), and has only moderate overhead. Basically, it
creates a real (ext2-like) filesystem inside a single file, and you can give
that filesystem a considerably smaller block size than usual. Files are
loaded and stored through an interface that closely resembles the standard
Python way of working with files.

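PyVFS's actual interface isn't shown here, so the following is only a
minimal sketch of the basic idea of a filesystem living inside one file:
fixed-size blocks addressed by number, with block n stored at byte offset
n * block_size. The `BlockFile` class and its methods are hypothetical names,
not PyVFS's API, and the code targets modern Python 3 rather than 2.3.

```python
import io


class BlockFile:
    """Hypothetical fixed-size block store kept inside one container file.

    Not PyVFS's API: just the layout idea that block n lives at byte
    offset n * block_size, as on an ext2-like filesystem on a device.
    """

    def __init__(self, container, block_size=512):
        self.container = container      # any seekable binary file object
        self.block_size = block_size

    def write_block(self, n, data):
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        # Pad short writes so every block occupies exactly block_size bytes.
        self.container.seek(n * self.block_size)
        self.container.write(data.ljust(self.block_size, b"\0"))

    def read_block(self, n):
        self.container.seek(n * self.block_size)
        return self.container.read(self.block_size)


# An in-memory container; a real one would be open(path, "r+b").
store = BlockFile(io.BytesIO(), block_size=512)
store.write_block(0, b"superblock")
store.write_block(3, b"file data")   # blocks 1 and 2 are zero-filled
```

A real implementation would add inodes, directories and a file-like
read()/write()/seek() layer on top of these raw block operations.
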
As a test, I loaded the portage DB into the filesystem, and the result was an
astonishing size reduction: from about 300MB on a ReiserFS partition with a
4KB block size to about 120MB for the single file (with the filesystem
created using a 512-byte block size).

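To illustrate why the block size matters so much for a tree of many small
files, here is a rough sketch that models only per-file slack: every file is
rounded up to a whole number of blocks. It ignores inodes, directory entries
and filesystem-specific features such as ReiserFS tail packing, so it does not
reproduce the 300MB/120MB figures above; the file sizes are made-up samples.

```python
def allocated_bytes(file_sizes, block_size):
    """Space used when each file is rounded up to whole blocks.

    Models only slack (internal fragmentation); metadata is ignored,
    and a zero-byte file is counted as using no data blocks.
    """
    # Integer ceiling division: number of blocks needed per file.
    return sum((size + block_size - 1) // block_size * block_size
               for size in file_sizes)


# Hypothetical sample: small text files, like ebuilds and digests.
sizes = [300, 1200, 700, 4500, 90]         # 6790 bytes of actual data
print(allocated_bytes(sizes, 4096))        # 24576
print(allocated_bytes(sizes, 512))         # 8192
```
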
I have yet to implement compression for parts of the filesystem (which would
decrease the size even further), because compressed parts would have to be
preloaded completely into memory before they could be accessed. I am,
however, planning to implement something like attributes that can be set per
directory (e.g. compress all sub-elements of this directory into a single
virtual filesystem that gets loaded into memory, decompressed, and then
transparently mounted at that point in the tree).

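The per-directory compression idea could look roughly like the following
sketch: a subtree is serialized into one length-prefixed byte string and
zlib-compressed, and "mounting" it means decompressing the whole blob into an
in-memory dict. The function names, record format and sample paths are
hypothetical illustrations, not part of PyVFS.

```python
import struct
import zlib


def pack_directory(files):
    """Serialize a {relative_path: bytes} mapping into one compressed blob."""
    parts = []
    for path, data in sorted(files.items()):
        name = path.encode("utf-8")
        # Record layout: name length, data length (big-endian u32), name, data.
        parts.append(struct.pack(">II", len(name), len(data)) + name + data)
    return zlib.compress(b"".join(parts), 9)


def mount_packed(blob):
    """Decompress the whole blob into memory and rebuild the mapping.

    Every file in the subtree is decompressed up front, which is the
    preload cost described above.
    """
    raw = zlib.decompress(blob)
    files, pos = {}, 0
    while pos < len(raw):
        name_len, data_len = struct.unpack_from(">II", raw, pos)
        pos += 8
        name = raw[pos:pos + name_len].decode("utf-8")
        pos += name_len
        files[name] = raw[pos:pos + data_len]
        pos += data_len
    return files


tree = {
    "cat/pkg/pkg-1.0.ebuild": b'DESCRIPTION="example"\n' * 20,
    "cat/pkg/Manifest": b"MD5 0123 pkg-1.0.ebuild 440\n",
}
blob = pack_directory(tree)
mounted = mount_packed(blob)   # same mapping as tree
```
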
I can already hear plenty of objections coming (for example, what about
rsync?), but implementing an rsync-like protocol on top of this filesystem in
Python is entirely doable. Even reimplementing rsync in Python shouldn't be a
real problem, or at least the subset of the protocol needed to fetch the
tree, since we only download from the server and never upload to it.

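For the rsync question, the central piece that would have to be
reimplemented is rsync's weak rolling checksum: two 16-bit sums, a and b, that
can be updated in constant time as a window slides one byte along a file, so
every offset can be tested cheaply against blocks the other side already has.
This is a sketch of that checksum only; the function names are hypothetical,
and where real rsync confirms a candidate with a strong hash, this sketch
simply compares the bytes directly.

```python
M = 1 << 16  # both sums are kept modulo 2**16


def weak_checksum(block):
    """Weak checksum of a whole window, returned as (a, b)."""
    a = sum(block) % M
    # The first byte gets weight len(block), the last byte weight 1.
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def find_block(data, block):
    """Offsets in data where block occurs, found via the rolling checksum."""
    n = len(block)
    if n == 0 or len(data) < n:
        return []
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    hits = []
    for k in range(len(data) - n + 1):
        # Cheap weak match first; a direct byte comparison stands in
        # for the strong hash real rsync would use.
        if (a, b) == target and data[k:k + n] == block:
            hits.append(k)
        if k + n < len(data):
            a, b = roll(a, b, data[k], data[k + n], n)
    return hits


print(find_block(b"the quick brown fox jumps", b"brown"))  # [10]
```
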
Mainly, I'm posting to ask whether any of the other Gentoo developers out
there are interested in following this project with me, working on it
together, or anything along those lines. Feel free to mail me.

As a side note, I can't release PyVFS yet, as I'm bound by my employer on
that, but I've gotten the green light to release it under the LGPL sometime
at the beginning of next week... I just need that little signature from my
boss... :)

Heiko Wundram.

--
gentoo-dev@g.o mailing list