Gentoo Archives: gentoo-dev

From: Ned Ludd <solar@g.o>
To: "Robin H. Johnson" <robbat2@g.o>
Cc: Gentoo Developers <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] A few modest suggestions regarding tree size
Date: Thu, 14 Oct 2004 14:08:43
Message-Id: 1097762832.1944.39.camel@simple
In Reply to: Re: [gentoo-dev] A few modest suggestions regarding tree size by "Robin H. Johnson"
On Wed, 2004-10-13 at 03:31, Robin H. Johnson wrote:
> On Wed, Oct 13, 2004 at 09:01:06AM +0200, Spider wrote:
> > > For real benefits, reducing the number of files, or using a filesystem
> > > that performs tail packing reduces the amount of disk seek that must
> > > be done, really increases performance given the number of small files.
> This is still applicable to your method as well.
>
> The one thing that your (previously known) method does bring out is that
> reducing the I/O required really helps.
>
> > Well, here's another method ;)
> >
> > /root/portage.img on /usr/portage type ext2 (rw,noatime,loop=/dev/loop0)
> > -rw-r--r-- 1 root root 293M Oct 12 23:17 /root/portage.img
> > /root/portage.img 257M 195M 62M 77% /usr/portage
> >
> >
> > some varied interesting things from tune2fs -l
> > Filesystem features: dir_index sparse_super
> > Inode count: 300144
> > Block count: 300000
> > Free blocks: 62825
> > Free inodes: 154512
> > Block size: 1024
> > Fragment size: 1024
> Pack it into a loopback reiserfs instead, way better performance. For
> an even bigger boost, put the loop file into tmpfs or use some other
> direct memory scheme.
>
> See:
> http://dev.gentoo.org/~robbat2/fastcvstest
>
> I developed the above when I was working on super-fast CVS repositories,
> as I needed my client to not be the bottleneck ;-). My record for a
> complete CVS checkout of gentoo-x86 (over the network to a remote
> client) stands at 65 seconds. This is quite a bit more work than an
> rsync checkout as well.
>
> Provided you can assure only a single client is using the loopback
> system, here is a very good way of keeping it fast, but not needing the
> network traffic of a full checkout:
> portage loop file is usually on disk; when a sync is needed:
> 1. umount loop file
> 2. copy loop file to /dev/shm or other fast place
> 3. mount loop file again (from new location)
> 4. run updates to loop filesystem ('cvs up; emerge metadata' or 'emerge sync')
> 5. umount loop file, copy back to disk
> 6. mount loop file again
>
> The optimal reiserfs mount options are approximately:
> noexec,nosuid,nodev,noatime,nodiratime,nolog
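For concreteness, the six-step dance above can be sketched as a shell function. All paths here (the image name, the /dev/shm staging copy, the mount options) are illustrative assumptions, not anyone's actual script, and it only runs when the image really exists and you are root:

```shell
# Sketch of the loop-file sync procedure described above.
# Paths and mount options are assumptions -- adjust for your setup.

IMG=/root/portage.img          # loop file normally kept on disk
FAST=/dev/shm/portage.img      # fast tmpfs staging copy
MNT=/usr/portage

sync_portage_img() {
    umount "$MNT"                                   # 1. umount loop file
    cp "$IMG" "$FAST"                               # 2. copy to /dev/shm
    mount -o loop,noatime,nodiratime "$FAST" "$MNT" # 3. remount from RAM
    emerge sync                                     # 4. update the tree
    umount "$MNT"                                   # 5. umount, copy back
    cp "$FAST" "$IMG" && rm -f "$FAST"
    mount -o loop,noatime "$IMG" "$MNT"             # 6. mount from disk again
}

# Only attempt the dance when the image exists and we have root.
if [ -f "$IMG" ] && [ "$(id -u)" -eq 0 ]; then
    sync_portage_img
fi
```

The point of steps 2–3 is that every small-file read and write during the sync hits RAM instead of disk; only the two whole-image copies touch the disk, as sequential I/O.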
>
> Your performance may vary with nolog, I use it for the workload of the
> CVS server tmpdir, which is a very frequent creation of 50,000 tiny
> files [for every checkout/update].
>
> Solar has been doing work on putting the contents of the tree into a
> read-only squashfs filesystem and distributing that.

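The squashfs approach would look roughly like the following. This is a hedged sketch only: the image path and mksquashfs options are assumptions, not the invocation actually used, and the build is skipped entirely when squashfs-tools isn't installed:

```shell
# Rough sketch of packing the tree into a read-only squashfs image.
# Paths and options are assumptions for illustration.

build_portage_sqfs() {
    tree=${1:-/usr/portage}
    img=${2:-/root/portage.sqfs}
    mksquashfs "$tree" "$img" -noappend   # build a fresh compressed image
    # then loop-mount it read-only:
    #   mount -t squashfs -o loop,ro "$img" /usr/portage
}

# Only attempt the build when squashfs-tools is actually available.
if command -v mksquashfs >/dev/null 2>&1 && [ -d /usr/portage ]; then
    build_portage_sqfs
fi
```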
New loopback size is 11M after reading this thread and dumping ChangeLog
& metadata.xml files, which does seem like a perfectly feasible thing for
us to do. Removing leading/trailing whitespace and erroneous newlines
yielded no noticeable gains.
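Dumping those files is just a find(1) pass over the tree before repacking the image. A minimal sketch (the helper name and the /tmp demo paths are made up for illustration; point it at a scratch copy, not your live /usr/portage):

```shell
# Prune ChangeLog and metadata.xml from a copy of the tree before
# rebuilding the loopback image. Deletes in place under the given path.

prune_tree() {
    find "$1" -type f \( -name ChangeLog -o -name metadata.xml \) -delete
}

# Demonstration against a throwaway directory:
rm -rf /tmp/ptree
mkdir -p /tmp/ptree/app-misc/foo
echo log    > /tmp/ptree/app-misc/foo/ChangeLog
echo xml    > /tmp/ptree/app-misc/foo/metadata.xml
echo ebuild > /tmp/ptree/app-misc/foo/foo-1.0.ebuild
prune_tree /tmp/ptree
ls /tmp/ptree/app-misc/foo     # only foo-1.0.ebuild remains
```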

For fun I took it a step further to see what we could get if we moved
away from having locally stored digest/Manifest files, then re-compressed
and got the portage tree down to 8.5M. Yeah, that's 8.5M down from Spider's
195M, a savings of 186.5M. I don't think dumping the digest/Manifest
files would be too feasible at this time, however.

--
Ned Ludd <solar@g.o>
Gentoo (hardened,security,infrastructure,embedded,toolchain) Developer

Attachments

File name MIME type
signature.asc application/pgp-signature