List Archive: gentoo-dev
To: gentoo-dev@g.o
From: Ed W <lists@...>
Subject: Re: avoiding urgent stabilizations
Date: Sat, 26 Feb 2011 11:45:37 +0000

> But, for me, even a trimmed-down Gentoo is still too large
> (has to contain the whole base packages, from portage to
> toolchain, includes, etc). I'd prefer having only the essential
> runtime stuff within the containers.

I'm building some embedded devices on the side using Gentoo and my
minimal builds are only a few MB.  I'm curious why you feel you need to
move away from Gentoo to get the size smaller?

It seems like your complaint is that you have Gentoo installs which are
full featured, with a toolchain and portage, and you are comparing them
to an installation built with a different tool that doesn't install a
toolchain.  However, you can do the same using Gentoo if you wish
(you just need a lightweight package installer to avoid installing

I think your main options are:

1) Build your base images without a toolchain or portage and use a 
minimal package installer to install pre-built binary packages.  This 
seems fraught with issues long term though...

2) Build your base images without a toolchain, but with portage (and 
perhaps a very minimal python). This gives you full dependency tracking 
and obviously bind mount/nfs mount the actual portage tree to avoid 
space used there. This seems workable and minimal?

3) If we are talking virtual machines, then who cares if your containers
are individually quite large when the files in them are duplicated
across all containers?  Simply use an appropriate de-duplication
strategy to coalesce the space and most of the disadvantages disappear.
Eg with linux-vserver you can simply hardlink all the common files
across your installations and let the COW patch break the hardlinks if
anyone alters a file in a single instance.  Or you could use aufs to
mount a writeable layer over your common base VM instance.  Or you could
use one of the filesystems which de-duplicate files in the background
(some caveats apply here to avoid memory still being used multiple times
in each VM).  Or under KVM there is the memory coalescing feature which
merges identical memory pages (KSM, kernel samepage merging).
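
As a rough illustration of the hardlink idea in option 3, here is a minimal sketch using GNU cp. The directory names are hypothetical and it works in a throwaway temp directory. Note that plain hardlinks are not copy-on-write: it is the linux-vserver COW patch mentioned above that breaks the link when a guest modifies a file.

```shell
#!/bin/sh
# Sketch only: share one base tree across guests via hardlinks.
# "base", "guest1" and "guest2" are hypothetical names.
set -eu

work=$(mktemp -d)
mkdir -p "$work/base/bin"
echo "shared binary" > "$work/base/bin/tool"

# cp -al recreates the directory structure but hardlinks every regular
# file instead of copying its data (needs a single filesystem).
cp -al "$work/base" "$work/guest1"
cp -al "$work/base" "$work/guest2"

# Every name points at the same inode, so the data is stored only once.
links=$(stat -c %h "$work/base/bin/tool")
echo "hardlink count for bin/tool: $links"
```

Without a COW mechanism, a write through any one of those names changes the file for every guest, so this is only safe for read-only or COW-protected trees.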

Personally I think option 3) is quite interesting for a medium number
of virtual machines, ie in the tens to hundreds: simply don't worry
about it and let the OS do the work.  At the hundreds to thousands plus
level I guess you have unique challenges, and I would be wrong to try
and suggest a solution from the comfort of a laptop without having that
responsibility, but I would have thought there was some advantage in a
very rigidly deployed base OS, generated and updated very precisely?

> For this we need a different approach (strictly separating build
> and production environments). Binary distros (eg. Debian) might
> be one option, but they're lacking the configurability and mostly
> are still too large. So I'm going a different route using my own
> buildsystem - called Briegel - which originally was designed for
> embedded/small-device targets.
> For now I didn't have the spare time to port all the packages
> required for complete server systems (most of it is making
> them all cleanly crosscompile'able, as this is a fundamental
> concept of Briegel). But maybe you'd like to join in and try it :)

Sounds like an interesting challenge, but I'm unconvinced you can't
solve 90% of your problem within the constraints of Gentoo.  That would
save you a bunch of time which could be invested in solving the last 10%
through more traditional means?

>> It does appear like managing large numbers of virtual machines is one
>> area where gentoo could score very well?  Interested to see any chatter
>> on how others solve this problem, or any general advocacy?  Probably we
>> should start a new thread though...
> I'm not sure if Gentoo really is the right distro for that purpose,
> as it's targeted to very different systems (i.g. Gentoo boxes are
> expected to be quite unique, beginning with different per-package
> useflags, even down to cflags, etc). But it might still be a good
> basis for building specific system images (let's call them stage5 ;-))

I won't disagree on your "where it's targeted", but just to reiterate:
the reason I think Gentoo works well here is that it has a very
workable binary package feature!

My way of working is to use (several) shared binary package repos and
the guests largely pull from those shared package directories.  In fact
what I do is keep a minimal number of shared "/usr/portage/packages"
directories and mount the one appropriate to the guest type at boot
time.  At the moment my two main options for the package mounts are
"32bit" and "64bit", but I recently introduced a new machine type which
is held back to perl 5.8, and that guest gets its own package mount
since it's obviously linking a lot of binaries differently.

So, my process is to test an update on a small number of guests, either
dedicated test guests or less important live guests.  If this looks good
then I run the upgrade against all other VMs of the same type and they
will update quickly from package binaries.

Now, the icing is that this works extremely well even once you decide to 
lightly customise machine types.  So for example my binary packages are 
very high level (eg 32/64bit), my "profiles" would be fairly high level, 
eg I have www-apache and www-nginx base profiles.  However, a specific 
virtual machine running say nginx might itself need a specific PHP 
application installed, and that itself might need some dependencies, 
which in turn might require a specific set of customisation of use flags 
and versions.

Now, the neat thing is that the binary upgrade options are *either* to 
use *only* binary packages, OR to use binary packages *if* they were 
built with the correct USE flags. So for example I haven't bothered to 
split out my packages directory to be specific to the nginx/apache 
machines, however, this causes the PHP package to be regularly rebuilt 
depending on whether it was last used to upgrade an nginx or apache 
guest (different use flags needed for each guest).  I could fix this
easily enough, but it's not a problem for me and it's handled
automatically by the portage binary package updates.

So the end result is that you can make efficient use of binary updates, 
but portage will still customise the odd package here or there where a 
local machine requires something which differs from the norm.  To my eye 
this keeps most of the benefits of an RPM/DEB style binary updater, with 
the flexibility of a per machine, customised USE flag gentoo installation?
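
The two upgrade styles described above can be sketched roughly as follows. The emerge options used (--usepkgonly, --usepkg, --binpkg-respect-use, --newuse) are real portage options, but the helper function and its mode names are hypothetical, and the command is only printed here, not run.

```shell
#!/bin/sh
# Sketch only: choose emerge arguments for a guest upgrade.
# upgrade_args and the mode names "strict"/"flexible" are made up.
upgrade_args() {
    case "$1" in
        # Only ever install pre-built binary packages.
        strict)
            echo "--update --deep --usepkgonly world" ;;
        # Prefer binaries, but rebuild from source when the binary was
        # built with USE flags that differ from this guest's settings.
        flexible)
            echo "--update --deep --newuse --usepkg --binpkg-respect-use=y world" ;;
        *)
            echo "unknown mode: $1" >&2
            return 1 ;;
    esac
}

# Printed rather than executed; on a real guest you would run it.
echo "emerge $(upgrade_args flexible)"
```

With a shared package directory, the "flexible" mode is what produces the occasional PHP rebuild mentioned above when the apache and nginx guests need different USE flags.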

> A setup for 100 equal webserver vm's could look like this:
> * run a normal Gentoo vm (tailored for the webserver appliance),
>    where you do regular updates (emerge, revdep-rebuild, etc, etc)
> * from time to time take a snapshot, strip off the buildtime-only
>    stuff (hmm, could turn out to be a bit tricky ;-o)
> * this stripped snapshot now goes into testing vm's
> * when approved, the individual production vm's are switched over
>    to the new image (maybe using some mount magic, unionfs, etc)

This could work, and perhaps for 100 identical VMs you have enough meat
to work on something quite customised anyway?

Personally, for 20-80 identical VMs running a very limited variety of
web software I would go for:
- Slightly cut down gentoo VM
- Hardlinked across all instances OR single installation which is read only
- Writeable data areas mounted to their own space (/var/www, /tmp,
/home, etc)

By separating the data from the OS you have a lot of flexibility to
upgrade the base webserver install and mount the data back on the new
VM.  With linux-vservers or another container style, you will find that
the OS shares code segments across all virtual machines (because the
hardlinked files share the same inode), so memory usage should be much
lower, nearer to firing up one instance of the shared app and having it
fork (ie data is duplicated, but the code segment is shared).

For 100+ VMs I guess I would be looking very strongly at a common
read-only OS partition and container style virtualisation.

For 20-80 near identical VMs, but running a wider variety of web 
software I would go for the hardlinked option with a straightforward 
"emerge" upgrade option across them.  Hardlinking keeps the memory usage 
sane where possible, without the pain of trying to keep the base install 
absolutely identical and read-only to make the common mount option work?

> At this point I've got a question for the other folks here:
> emerge has an --root option which allows to (un)merge in a separate
> system image. So it should be possible to unmerge a lot of system
> packages which are just required for updating/building (even
> portage itself), but this still will be manual - what about
> dependency handling ?

This is correct.  In fact this is how you build a stage 1,2,3 etc and 
how catalyst works!

The information is a bit spread out over several out of date wiki 
articles, but perhaps start with:

Roughly speaking you could "freshen" your current installation with 
(from memory):
     ROOT="/tmp/new_build" emerge -av world

This has minor gremlins when I test it, probably due to some symlinks 
being created differently if you follow the current catalyst build 
script through stage 1,2,3 etc, but roughly speaking it does the same 
thing only jumping straight to the end result and building a completely 
new identical install to your current OS...

Even more special is that you can set an alternative portage
configuration source, so if you want to build your new ROOT with an
alternative make.conf, /etc/portage/*, etc then just put your new files
somewhere and set PORTAGE_CONFIGROOT to point to it.  Cross compiling is
also done through an extension of this basic method.
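
To make the ROOT plus PORTAGE_CONFIGROOT combination a little more concrete, here is a minimal sketch. The temp directories and the USE/FEATURES values are hypothetical examples, and the emerge command is printed rather than run. A real config root also needs a make.profile link, which is omitted here.

```shell
#!/bin/sh
# Sketch only: prepare an alternative config tree and show the command
# that would build into a separate ROOT. All values are examples.
set -eu

new_root=$(mktemp -d)   # target tree that would receive the packages
cfg_root=$(mktemp -d)   # alternative configuration tree

# Portage reads its configuration relative to PORTAGE_CONFIGROOT,
# ie $cfg_root/etc/portage/ (older versions use $cfg_root/etc/make.conf).
mkdir -p "$cfg_root/etc/portage"
cat > "$cfg_root/etc/portage/make.conf" <<'EOF'
USE="-X minimal"
FEATURES="buildpkg"
EOF

# Printed rather than executed so the sketch is harmless anywhere;
# run the command itself on a real Gentoo host.
cmd="ROOT=$new_root PORTAGE_CONFIGROOT=$cfg_root emerge -av world"
echo "$cmd"
```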

So, following your chain of thought - yes it's not too hard to quickly 
generate a customised base OS installation to use for your future VMs.  
Further, if you wish you can make those VMs have a reduced or missing 
toolchain etc.  In fact if you google a bit I think you will find some 
recipes for very minimal VMs using this method where the base VM is a 
very minimal install...

> Is there some way to drop at least parts of the standard system set,
> so eg. portage, python, gcc, etc, etc get unmerged by --depclean
> if nobody else (in world set) doesn't explicitly require them ?

You are almost thinking about it all wrong.  ("There is no spoon...")

This is gentoo, so at this more advanced level, stop thinking about 
"standard system set" and instead free your mind to start with 
"nothing".  Go read the old bootstrap from stage 1 instructions, plus 
the TinyGentoo pages and you can quickly see that Catalyst builds your 
working installation by starting from a working installation, creating 
an empty directory, adding some minimal packages to that directory and 
building up from there.

So absolutely nothing stops you from just starting with an empty
directory, emerging a few basic packages into it (a couple of MB) and
then chrooting into it and having some fun...  There is *no* minimal
package set, you can install whatever you want (as long as it boots).
Largely the portage dependency tracker will help you pull in the minimal
needed dependencies, but beware that system packages aren't generally
explicitly tracked, so you may stumble across some missing deps when you
are going really basic and omitting standard system packages (just use
common sense: it should be fairly obvious that if an application requires
a compiler and you didn't install one then you have a conflict of
interest...)
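
A minimal sketch of the "start from an empty directory" idea follows. The target path and the choice of packages are only an assumed example, and the commands are printed rather than run. Because system packages are not tracked as dependencies, busybox would need to be built statically, or a libc installed into the target too, before the chroot would work.

```shell
#!/bin/sh
# Sketch only: target path and package selection are hypothetical.
target="/tmp/minimal_root"
packages="sys-apps/baselayout sys-apps/busybox"

# ROOT redirects the install into the (initially empty) target tree;
# --usepkg reuses binary packages where they already exist.
cmd="ROOT=$target emerge --usepkg $packages"

# Printed rather than executed; run these as root on a Gentoo host.
echo "$cmd"
echo "chroot $target /bin/busybox sh"
```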

Have another look at gentoo!  I definitely believe that its flexibility
to build you highly customised packages, plus strong templating of those
packages, plus a decent ability to distribute binaries of the end result
is a very strong combo!  Better binary support is really the only thing
missing here, but it's pretty adequate as it stands!

Good luck

Ed W
