On Wed, 2004-10-13 at 03:31, Robin H. Johnson wrote:
> On Wed, Oct 13, 2004 at 09:01:06AM +0200, Spider wrote:
> > > For real benefits, reducing the number of files, or using a filesystem
> > > that performs tail packing (which reduces the number of disk seeks
> > > needed), really increases performance given the number of small files.
> This is still applicable to your method as well.
>
> The one thing that your (previously known) method does bring out is that
> reducing the I/O required really helps.
>
> > Well, here's another method ;)
> >
> > /root/portage.img on /usr/portage type ext2 (rw,noatime,loop=/dev/loop0)
> > -rw-r--r-- 1 root root 293M Oct 12 23:17 /root/portage.img
> > /root/portage.img 257M 195M 62M 77% /usr/portage
> >
> >
> > Some interesting values from tune2fs -l:
> > Filesystem features: dir_index sparse_super
> > Inode count: 300144
> > Block count: 300000
> > Free blocks: 62825
> > Free inodes: 154512
> > Block size: 1024
> > Fragment size: 1024
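The tune2fs counters above can be turned into a few derived figures. The Python sketch below is not from the thread; it only does arithmetic on the quoted numbers, and the function name `summarize` is hypothetical. About 145,600 of the 300,144 inodes (roughly 48.5%) are in use, with about 1.6 KiB of used blocks per used inode. That is consistent with a tree made mostly of very small files, where 1 KiB blocks waste less slack than larger blocks would. The per-inode figure is an upper bound, because ext2 counts metadata such as inode tables as used blocks.

```python
# Values copied from the tune2fs -l output quoted above.
BLOCK_SIZE = 1024     # bytes per block ("Block size")
BLOCK_COUNT = 300000  # "Block count"
FREE_BLOCKS = 62825   # "Free blocks"
INODE_COUNT = 300144  # "Inode count"
FREE_INODES = 154512  # "Free inodes"

MIB = 1024 * 1024


def summarize():
    """Derive usage figures from the raw tune2fs counters (hypothetical helper)."""
    used_blocks = BLOCK_COUNT - FREE_BLOCKS
    used_inodes = INODE_COUNT - FREE_INODES
    return {
        # Total image size; lines up with the 293M shown by ls -lh above.
        "image_mib": BLOCK_COUNT * BLOCK_SIZE / MIB,
        "used_blocks": used_blocks,
        "used_inodes": used_inodes,
        "inode_use_pct": 100 * used_inodes / INODE_COUNT,
        # Upper bound on the average footprint per file/directory, because
        # "used" blocks also include metadata such as the inode tables.
        "kib_per_used_inode": used_blocks * BLOCK_SIZE / 1024 / used_inodes,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```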
> Pack it into a loopback reiserfs instead, way better performance. For
> an even bigger boost, put the loop file into tmpfs or use some other
> direct memory scheme.
>
> See:
> http://dev.gentoo.org/~robbat2/fastcvstest
>
> I developed the above when I was working on super-fast CVS repositories,
> as I needed my client to not be the bottleneck ;-). My record for a
> complete CVS checkout of gentoo-x86 (over the network to a remote
> client), stands at 65 seconds. This is quite a bit more work than an
> rsync checkout as well.
>
> Provided you can assure only a single client is using the loopback
> system, here is a very good way of keeping it fast, but not needing the
> network traffic of a full checkout:
> The portage loop file normally lives on disk; when a sync is needed:
> 1. umount loop file
> 2. copy loop file to /dev/shm or other fast place
> 3. mount loop file again (from new location)
> 4. run updates to loop filesystem ('cvs up; emerge metadata' or 'emerge sync')
> 5. umount loop file, copy back to disk
> 6. mount loop file again
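The six steps above can be sketched as a small Python driver. This is an illustration, not Robin's script. The image path and mount point come from Spider's output, `/dev/shm` and `emerge sync` come from the steps themselves, and the `loop,noatime` options are an assumption copied from Spider's mount line. `build_plan` and `execute` are hypothetical names. As stated above, this is only safe when a single client uses the loop file. The final unlink of the RAM copy is an extra step that is not in the list.

```python
import os
import shutil
import subprocess

# Paths follow the thread; adjust them for your system.
DISK_IMAGE = "/root/portage.img"
RAM_IMAGE = "/dev/shm/portage.img"
MOUNTPOINT = "/usr/portage"
MOUNT_OPTS = "loop,noatime"     # assumption: same options as Spider's mount
SYNC_CMD = ["emerge", "sync"]   # or run 'cvs up' and then 'emerge metadata'


def build_plan(disk_image, ram_image, mountpoint, sync_cmd, opts=MOUNT_OPTS):
    """Return the six steps as a list of (action, args...) tuples."""
    return [
        ("run", ["umount", mountpoint]),                          # 1. umount
        ("copy", disk_image, ram_image),                          # 2. copy to RAM
        ("run", ["mount", "-o", opts, ram_image, mountpoint]),    # 3. mount RAM copy
        ("run", list(sync_cmd)),                                  # 4. update the tree
        ("run", ["umount", mountpoint]),                          # 5. umount...
        ("copy", ram_image, disk_image),                          #    ...copy back
        ("run", ["mount", "-o", opts, disk_image, mountpoint]),   # 6. remount from disk
        ("unlink", ram_image),  # extra, not in the list: free the RAM copy
    ]


def execute(plan, dry_run=True):
    """Print each step; perform it only when dry_run is False (needs root).

    check=True stops at the first failing command, so a failed sync never
    reaches step 5: the on-disk image is untouched and /usr/portage stays
    mounted from the RAM copy until you fix things by hand.
    """
    for action, *args in plan:
        print(action, *args)
        if dry_run:
            continue
        if action == "run":
            subprocess.run(args[0], check=True)
        elif action == "copy":
            shutil.copyfile(args[0], args[1])
        elif action == "unlink":
            os.remove(args[0])


if __name__ == "__main__":
    execute(build_plan(DISK_IMAGE, RAM_IMAGE, MOUNTPOINT, SYNC_CMD))
```

Keeping the plan as data means a dry run shows exactly which commands would run before anything touches the mounted tree.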
>
> The optimal reiserfs mount options are approximately:
> noexec,nosuid,nodev,noatime,nodiratime,nolog
>
> Your performance may vary with nolog; I use it for the workload of the
> CVS server tmpdir, which constantly creates 50,000 tiny files
> [for every checkout/update].
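Since the benefit of nolog and the other options depends on the workload, a rough microbenchmark helps compare mounts on your own machine. The Python sketch below is an illustration, not Robin's test. It times the creation of 50,000 tiny files, the figure mentioned above. The 64-byte file size and the name `create_tiny_files` are assumptions. With `flush=True`, `os.sync()` runs inside the timed region so that buffered writes are included.

```python
import os
import time


def create_tiny_files(directory, count=50_000, size=64, flush=True):
    """Create `count` files of `size` bytes in `directory`; return seconds taken.

    With flush=True, os.sync() is included in the timing so that data still
    sitting in the page cache is written out before the clock stops.
    """
    payload = b"x" * size  # assumed file size; tune to match your workload
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"f{i:06d}"), "wb") as fh:
            fh.write(payload)
    if flush:
        os.sync()
    return time.perf_counter() - start
```

For example, call `create_tiny_files(tempfile.mkdtemp(dir="/mnt/test"))` once per mount option set, where `/mnt/test` is a hypothetical mount point for the filesystem under test.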
>
> Solar has been doing work on putting the contents of the tree into a
> read-only squashfs filesystem and distributing that.
New loopback size is 11M after reading this thread and dumping the
ChangeLog & metadata.xml files, which does seem like a perfectly feasible
thing for us to do. Removing leading/trailing whitespace and superfluous
newlines yielded no noticeable gains.
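To check how much the ChangeLog and metadata.xml files account for in your own tree, a sketch like the following copies the tree without them and compares apparent sizes. It is not solar's procedure. It does not reproduce the compression step behind the 11M image, and the names `tree_size` and `pruned_copy` are hypothetical. It relies on the standard `shutil.copytree` and `shutil.ignore_patterns`.

```python
import os
import shutil

# File names reported as dropped; shell-style patterns for ignore_patterns.
PRUNE = ("ChangeLog", "metadata.xml")


def tree_size(root):
    """Sum of apparent file sizes under root (not disk blocks used)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def pruned_copy(src, dst, patterns=PRUNE):
    """Copy src into the new directory dst, skipping matching files.

    Returns (bytes_before, bytes_after) so the saving can be compared.
    """
    shutil.copytree(src, dst, symlinks=True,
                    ignore=shutil.ignore_patterns(*patterns))
    return tree_size(src), tree_size(dst)
```

For example, `pruned_copy("/usr/portage", "/tmp/portage-pruned")` uses a hypothetical destination, which must not already exist.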
For fun I took it a step further to see what we could get if we moved
away from having locally stored digest/Manifest files, then re-compressed,
and got the portage tree down to 8.5M. Yeah, that's 8.5M down from Spider's
195M, a savings of 186.5M. I don't think dropping the digest/Manifest
files would be feasible at this time, however.
--
Ned Ludd <solar@g.o>
Gentoo (hardened,security,infrastructure,embedded,toolchain) Developer