I originally sent the comments below to GWN as feedback on the recent call
for rsync etiquette. Kurt Lieber [firstname.lastname@example.org] asked me to forward
them on to this list for further discussion.
Please be gentle - this is my first posting to the list ;-)
From: Stuart Herbert [mailto:stuart@...]
Sent: 09 May 2003 11:24
Subject: rsync etiquette follow-up
I read with great interest your article on the need for rsync etiquette.
I'm sure lots of people have already written to you making the point below;
I'd like to add my voice to theirs.
Gentoo's 'emerge' is a wonderful tool in many ways - but when it comes to a
site containing multiple machines, improvements in its design could help
reduce the load on your rsync servers.
I run four Gentoo boxes at the moment, and they're all connected to the
Internet through a single firewall configured with masquerading. If I were
to rsync each machine just once a day, I wonder whether it would look like
one machine had rsync'd four times from the point of view of the Gentoo
rsync mirror that I use?
Anyway, whenever I install a new package onto a machine, it is important to
my work that *each* of these machines has the same version of the package.
The machines are different x86 architectures, so building a binary package
on one machine isn't my preferred choice. The whole point of using Gentoo
is that each machine runs code that is specifically optimised for the
hardware it runs on.
To achieve this, there's no getting away from it: I have to run 'emerge
rsync' on each machine, followed by 'emerge <package>'. The 'emerge rsync'
is there to ensure that each machine picks up the same version of both the
package and its dependencies. The sync can be run from cron once every
couple of days, but it still means that I end up hitting your rsync mirrors
*once* for every Gentoo machine I run. I also end up hitting your distfile
mirrors *once* per machine.
I know that I could run an rsync mirror just for internal use - and that
would help a lot. Running a distfiles mirror is a lot less practical. It
would be much better if there were a way to share '/usr/portage' across
multiple machines. You can't do this safely via NFS: if two machines try
to download the same distfile at the same time, they interfere with each
other.
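In principle the interference could be avoided with some coordination between the machines. As a minimal sketch (hypothetical helpers, not anything emerge actually does), each machine could claim a distfile by creating a `<name>.lock` file with an exclusive create before fetching it, and back off if another machine already holds the claim. This assumes exclusive create is atomic on the shared filesystem, which over NFS depends on the NFS version and client; stale locks left behind by a crashed download are also not handled here. Those gaps are part of why sharing /usr/portage this way is fragile.

```python
import os
import socket


def claim_distfile(distdir: str, filename: str) -> bool:
    """Try to become the one machine that downloads `filename`.

    Hypothetical helper: creates `<filename>.lock` with O_CREAT | O_EXCL,
    which fails if the lock file already exists. Returns True if this
    process created the lock (and should download), False otherwise.
    """
    lock_path = os.path.join(distdir, filename + ".lock")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # Another machine (or process) already claimed this distfile.
        return False
    with os.fdopen(fd, "w") as lock_file:
        # Record who holds the claim, to help diagnose stale locks by hand.
        lock_file.write(f"{socket.gethostname()} {os.getpid()}\n")
    return True


def release_distfile(distdir: str, filename: str) -> None:
    """Remove the lock once the download has finished (or failed)."""
    os.remove(os.path.join(distdir, filename + ".lock"))
```

A machine that gets `False` back would wait and re-check for the finished distfile rather than start a second, competing download.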
Better still, emerge could be changed to a client/server model, where the
emerge command becomes a client that contacts the (possibly remote) emerge
server to do all the downloading and rsyncing. For people running both
client and server on the same machine, there'd be no noticeable difference.
For people like me, trying to run a site of machines, I'd be able to reduce
the load I place on your rsync servers and your distfile mirrors.
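To make the proposal a little more concrete, here is a minimal sketch of the server half, assuming clients ask it for distfiles by name. `DistfileServer` and `fetch_from_mirror` are hypothetical names, not part of emerge or Portage, and the networking between client and server is left out. The point it illustrates is that however many clients ask for the same file at once, the public mirrors are hit only once.

```python
import threading
from typing import Callable, Dict


class DistfileServer:
    """Hypothetical emerge 'server' that downloads each distfile at most once.

    `fetch_from_mirror` is an assumed function that downloads a distfile
    from a public mirror and returns its bytes. Concurrent requests for the
    same file wait for the first download instead of starting their own.
    """

    def __init__(self, fetch_from_mirror: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_mirror
        self._cache: Dict[str, bytes] = {}
        self._in_flight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()
        self.mirror_fetches = 0  # how many times the public mirrors were hit

    def get(self, name: str) -> bytes:
        with self._lock:
            if name in self._cache:
                return self._cache[name]  # already downloaded: no mirror traffic
            event = self._in_flight.get(name)
            is_owner = event is None
            if is_owner:
                # First client to ask for this file: this call will download it.
                event = threading.Event()
                self._in_flight[name] = event
        if not is_owner:
            # Another client's request is already downloading this file.
            event.wait()
            # Re-check the cache; if that download failed, this call retries it.
            return self.get(name)
        try:
            data = self._fetch(name)
            with self._lock:
                self._cache[name] = data
                self.mirror_fetches += 1
            return data
        finally:
            with self._lock:
                del self._in_flight[name]
            event.set()  # wake any clients waiting on this file
```

With four clients calling `server.get("foo-1.0.tar.bz2")` at the same time, `fetch_from_mirror` runs once and all four receive the same bytes. A real server would keep the files on disk in its own distfiles directory rather than in memory.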
This gives me another advantage. I don't have to have a (potentially large)
/usr/portage/distfiles on each machine. If three of my machines are
clients, and the server only runs on the fourth machine, I can set up a
cron job to clear out /usr/portage/distfiles on each of the clients on a
nightly basis, and keep my disk space usage down. (As an aside, it'd be
great to see /usr/portage moved into /var. One of my few true
disappointments with Gentoo is having to keep /usr mounted read-write a lot
of the time.)
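The nightly clean-out on the clients could be as simple as the following sketch. `clear_distfiles` is a hypothetical helper, run from cron on each client; it assumes nothing on a client needs its local copies of the distfiles once the packages have been built, because the server keeps the canonical copies.

```python
import os


def clear_distfiles(distdir: str) -> int:
    """Delete the plain files directly inside `distdir`; return how many.

    Subdirectories and symlinks are left alone. Intended to be run nightly
    from cron on a client machine whose distfiles are kept by the server.
    """
    removed = 0
    with os.scandir(distdir) as entries:
        for entry in entries:
            # Only delete regular files; skip directories and symlinks.
            if entry.is_file(follow_symlinks=False):
                os.remove(entry.path)
                removed += 1
    return removed
```

On a client this would be called as `clear_distfiles("/usr/portage/distfiles")`; the path is passed explicitly so the function cannot be pointed at the server's directory by accident.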
I hope that I've explained it right - and that some intrepid emerge
developer will take on the task. And soon.
email@example.com mailing list