Gentoo Archives: gentoo-dev

From: Ed W <lists@××××××××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] avoiding urgent stabilizations
Date: Sat, 26 Feb 2011 12:22:21
Message-Id: 4D68E7E1.1050109@wildgooses.com
In Reply to: Re: [gentoo-dev] avoiding urgent stabilizations by Enrico Weigelt
Hi

> But, for me, even a trimmed-down Gentoo is still too large
> (has to contain the whole base packages, from portage to
> toolchain, includes, etc). I'd prefer having only the essential
> runtime stuff within the containers.

I'm just building some embedded devices on the side using Gentoo, and my
minimal builds are only a few MB. Curious why you feel you need to move
away from Gentoo to get the size smaller?

It seems your complaint is that you are comparing full-featured Gentoo
installs, complete with a toolchain and Portage, against an installation
you built with a different tool that has no toolchain installed. However,
you can do the same with Gentoo if you wish (you just need a lightweight
package installer to avoid installing Portage).

I think your main options are:

1) Build your base images without a toolchain or Portage and use a
minimal package installer to install pre-built binary packages. This
seems fraught with issues long term though...

2) Build your base images without a toolchain, but with Portage (and
perhaps a very minimal Python). This gives you full dependency tracking,
and you can bind-mount or NFS-mount the actual Portage tree to avoid the
space it would otherwise use. This seems workable and minimal.

3) If we are talking virtual machines, then who cares if your containers
are individually quite large, when the files in them are duplicated
across all containers? Simply use an appropriate de-duplication strategy
to coalesce the space, and most of the disadvantages disappear. E.g.
with linux-vserver you can simply hardlink all the common files across
your installations and let the COW patch break the hardlink if anyone
alters a file in a single instance. Or you could use aufs to mount a
writeable layer over your common base VM instance. Or you could use one
of the filesystems which de-duplicate files in the background (some
caveats apply here, since memory may still be used multiple times, once
in each VM). Or under KVM there is the memory-coalescing feature, KSM
(Kernel Samepage Merging), which merges identical memory pages.
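
A minimal sketch of the hardlinking idea from option 3) in plain shell,
assuming two unpacked guest trees on the same filesystem (hard links
cannot cross filesystems) and the standard find, cmp and ln tools. The
function name hardlink_identical is hypothetical. It replaces each file
in the guest tree that is byte-identical to the file at the same path in
the base tree with a hard link to the base copy. Without a copy-on-write
link-breaking mechanism such as the linux-vserver patch, a later in-place
write through one link changes the file for every guest.

```shell
#!/bin/sh
# Hypothetical helper: hard-link files in GUEST that are byte-identical to the
# file at the same relative path in BASE, so both trees share one inode.
# Note: the linked file takes on BASE's owner, mode and timestamps.
hardlink_identical() {
    base=$1
    guest=$2
    # List regular files only; directories and devices are skipped.
    (cd "$guest" && find . -type f) | while IFS= read -r rel; do
        src="$base/$rel"
        dst="$guest/$rel"
        # Only consider paths that are regular, non-symlink files in BASE too.
        [ -f "$src" ] && [ ! -L "$src" ] || continue
        # Replace the guest copy with a hard link when the contents match.
        if cmp -s "$src" "$dst"; then
            ln -f "$src" "$dst"
        fi
    done
}
```

For example, hardlink_identical /vservers/web1 /vservers/web2 (hypothetical
paths) would make web2 share every unchanged file with web1.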

Personally I think option 3) is quite interesting for a medium number
of virtual machines, i.e. in the tens to hundreds: simply don't worry
about it and let the OS do the work. At the hundreds-to-thousands-plus
level I guess you have unique challenges, and I would be wrong to try to
suggest a solution from the comfort of a laptop without having that
responsibility, but I would have thought there was some advantage in a
very rigidly deployed base OS, generated and updated very precisely.


> For this we need a different approach (strictly separating build
> and production environments). Binary distros (eg. Debian) might
> be one option, but they're lacking the configurability and mostly
> are still too large. So I'm going a different route using my own
> buildsystem - called Briegel - which originally was designed for
> embedded/small-device targets.
>
> For now I didn't have the spare time to port all the packages
> required for complete server systems (most of it is making
> them all cleanly crosscompile'able, as this is a fundamental
> concept of Briegel). But maybe you'd like to join in and try it :)

Sounds like an interesting challenge, but I'm not convinced you can't
solve 90% of your problem within the constraints of Gentoo. That would
save you a lot of time, which could be invested in the last 10% through
more traditional means.


>> It does appear like managing large numbers of virtual machines is one
>> area where Gentoo could score very well? Interested to see any chatter on
>> how others solve this problem, or any general advocacy? Probably we
>> should start a new thread though...
> I'm not sure if Gentoo really is the right distro for that purpose,
> as it's targeted to very different systems (e.g. Gentoo boxes are
> expected to be quite unique, beginning with different per-package
> useflags, even down to cflags, etc). But it might still be a good
> basis for building specific system images (let's call them stage5 ;-))

I won't disagree with your "where it's targeted", but just to reiterate
why I think Gentoo works well: it does have a very workable binary
package feature!

My way of working is to use (several) shared binary package repos, and
the guests largely pull from those shared package directories. In fact,
what I do is keep a minimal number of shared "/usr/portage/packages"
directories and mount the appropriate one for each guest type at boot
time. At the moment my two main options for the package mounts are
"32bit" and "64bit", but I recently introduced a new machine type which
is held back to Perl 5.8, and that guest gets its own package mount,
since a lot of its binaries obviously link differently.
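
A hedged sketch of the make.conf side of this for one guest type. The
paths and the NFS server name buildhost are hypothetical; PKGDIR and
FEATURES="buildpkg" are standard Portage settings (make.conf lived at
/etc/make.conf on systems of this era and lives at /etc/portage/make.conf
on current ones). With buildpkg enabled on every guest of a type, whichever
guest builds a package first leaves a binary in the shared directory for
the others:

```shell
# make.conf on every guest of the "64bit" type (hypothetical layout).
# The shared directory is mounted read-write at boot, e.g. via /etc/fstab:
#   buildhost:/srv/binpkgs/64bit  /usr/portage/packages  nfs  rw  0 0
PKGDIR="/usr/portage/packages"

# Write a binary package of everything this guest compiles, so other
# guests of the same type can install it instead of rebuilding it.
FEATURES="buildpkg"
```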

So, my process is to test an update on a small number of guests, either
dedicated test guests or less important live guests. If this looks good,
then I run the upgrade against all other VMs of the same type, and they
update quickly from the binary packages.
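
This staged process can be scripted. What follows is a minimal sketch,
assuming linux-vserver guests managed with util-vserver ("vserver NAME
exec COMMAND"); update_guest and rollout are hypothetical names, and you
would swap in ssh or whatever else reaches your guests. The sketch treats
a successful exit status on the test guests as "looks good", whereas the
process described above includes a human check before touching the rest:

```shell
#!/bin/sh
# Update one guest, using binary packages where available and building
# from source otherwise (hypothetical: assumes util-vserver guests).
update_guest() {
    vserver "$1" exec emerge --update --deep --newuse --usepkg @world
}

# Usage: rollout TEST_GUEST... -- OTHER_GUEST...
rollout() {
    # Phase 1: test guests. Stop the whole rollout on the first failure.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        if ! update_guest "$1"; then
            echo "test guest $1 failed; not touching the others" >&2
            return 1
        fi
        shift
    done
    [ "$#" -gt 0 ] && shift  # drop the "--" separator
    # Phase 2: all other guests of the same type. Report failures but
    # keep going, so one broken guest does not block the rest.
    failed=0
    for guest in "$@"; do
        if ! update_guest "$guest"; then
            echo "update failed on $guest" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, rollout test1 -- web1 web2 web3 (hypothetical guest names)
updates test1 first and continues to the web guests only if that succeeded.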

Now, the icing is that this works extremely well even once you decide to
lightly customise machine types. For example, my binary packages are
split at a very high level (e.g. 32/64bit), and my "profiles" are also
fairly high level, e.g. I have www-apache and www-nginx base profiles.
However, a specific virtual machine running, say, nginx might need a
specific PHP application installed, which might need some dependencies,
which in turn might require a specific set of USE flag customisations
and versions.

Now, the neat thing is that the binary upgrade options are *either* to
use *only* binary packages, OR to use binary packages *if* they were
built with the correct USE flags. For example, I haven't bothered to
split out my packages directory to be specific to the nginx/apache
machines; as a result, the PHP package is regularly rebuilt depending on
whether it was last used to upgrade an nginx or an apache guest (each
needs different USE flags). I could fix this easily enough, but it's
not a problem for me, and it's handled automatically by Portage's binary
package updates.
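
In emerge terms the two modes are --usepkgonly (-K: binary packages
only) and --usepkg (-k: binary packages where available, otherwise build
from source). Current Portage also has --binpkg-respect-use, which makes
emerge ignore a binary package whose USE flags don't match the local
configuration; the Portage of this era may not have it. A sketch of
setting a per-guest default through EMERGE_DEFAULT_OPTS in make.conf,
with hypothetical choices:

```shell
# make.conf on a guest that may compile when it has to: use a binary
# package when one with matching USE flags exists, else build from source.
EMERGE_DEFAULT_OPTS="--usepkg --binpkg-respect-use=y"

# Alternative for a guest that must never compile anything:
#EMERGE_DEFAULT_OPTS="--usepkgonly"
```

Routine updates then stay a plain "emerge --update --deep --newuse @world"
on every guest.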

So the end result is that you can make efficient use of binary updates,
but Portage will still customise the odd package here or there where a
local machine requires something which differs from the norm. To my eye
this keeps most of the benefits of an RPM/DEB-style binary updater, with
the flexibility of a per-machine Gentoo installation with customised USE
flags.


> A setup for 100 equal webserver vm's could look like this:
>
> * run a normal Gentoo vm (tailored for the webserver appliance),
>   where you do regular updates (emerge, revdep-rebuild, etc, etc)
> * from time to time take a snapshot, strip off the buildtime-only
>   stuff (hmm, could turn out to be a bit tricky ;-o)
> * this stripped snapshot now goes into testing vm's
> * when approved, the individual production vm's are switched over
>   to the new image (maybe using some mount magic, unionfs, etc)

This could work, and perhaps with 100 identical VMs you have enough meat
to justify working on something quite customised anyway.

Personally, for 20-80 identical VMs running a very limited variety of web
software, I would go for:
- A slightly cut-down Gentoo VM
- Hardlinked across all instances, OR a single read-only installation
- Writeable data areas mounted in their own space (/var/www, /tmp, /home, etc.)

By separating the data from the OS, you have a lot of flexibility to
upgrade the base webserver install and mount the data back on the new
VM. With linux-vservers or other container-style virtualisation, you
will find that the OS shares code segments across all virtual machines
(because the hardlinked files share the same inode), so memory usage
should be much lower, closer to starting one instance of the shared app
and then forking it (i.e. data is duplicated, but the code segment is
shared).


For 100+ VMs I guess I would be looking very strongly at a common
read-only OS partition and container-style virtualisation.

For 20-80 near-identical VMs running a wider variety of web software, I
would go for the hardlinked option with a straightforward "emerge"
upgrade across them. Hardlinking keeps memory usage sane where possible,
without the pain of keeping the base install absolutely identical and
read-only, as the common-mount option requires.


> At this point I've got a question for the other folks here:
>
> emerge has an --root option which allows to (un)merge in a separate
> system image. So it should be possible to unmerge a lot of system
> packages which are just required for updating/building (even
> portage itself), but this still will be manual - what about
> dependency handling ?

This is correct. In fact, this is how you build a stage 1, 2, 3 etc.,
and how Catalyst works!

The information is a bit spread out over several out-of-date wiki
articles, but perhaps start with:
http://en.gentoo-wiki.com/wiki/Tiny_Gentoo

Roughly speaking, you could "freshen" your current installation with
(from memory):
ROOT="/tmp/new_build" emerge -av world

This has minor gremlins when I test it, probably because some symlinks
are created differently when you follow the current Catalyst build
script through stages 1, 2, 3 etc., but roughly speaking it does the
same thing, only jumping straight to the end result and building a
completely new install identical to your current OS...

Even more special is that you can set an alternative Portage
configuration source: if you want to build your new ROOT with an
alternative make.conf, /etc/portage/*, etc., then just put your new files
somewhere and set PORTAGE_CONFIGROOT to point to it. Cross-compiling is
also done through an extension of this basic method.
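
A hedged sketch, assuming a hypothetical config root at
/srv/configroot/web. Portage reads its configuration from
$PORTAGE_CONFIGROOT/etc/portage/ (older releases also read etc/make.conf
and etc/make.profile there), so that tree needs at least a make.conf and
a make.profile symlink pointing at a profile in the Portage tree. The
make.conf itself is an ordinary one, e.g. for a stripped-down web image:

```shell
# /srv/configroot/web/etc/portage/make.conf (hypothetical path)
# Used only when building with PORTAGE_CONFIGROOT=/srv/configroot/web;
# the host's own /etc/portage is left untouched.
CFLAGS="-O2 -pipe"
CXXFLAGS="${CFLAGS}"
USE="-X -gtk"
FEATURES="buildpkg"
```

The image would then be built with PORTAGE_CONFIGROOT=/srv/configroot/web
ROOT=/tmp/new_build emerge -av world, i.e. the earlier command plus the
extra variable.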

So, following your chain of thought: yes, it's not too hard to quickly
generate a customised base OS installation to use for your future VMs.
Further, if you wish, you can give those VMs a reduced or missing
toolchain etc. In fact, if you google a bit, I think you will find some
recipes for very minimal VMs built using this method...

> Is there some way to drop at least parts of the standard system set,
> so eg. portage, python, gcc, etc, etc get unmerged by --depclean
> if nobody else (in world set) explicitly requires them ?

You are almost thinking about it all wrong. ("There is no spoon...")

This is Gentoo, so at this more advanced level, stop thinking about the
"standard system set" and instead free your mind to start with
"nothing". Go read the old bootstrap-from-stage-1 instructions plus the
TinyGentoo pages, and you will quickly see that Catalyst builds your
working installation by starting from a working installation, creating
an empty directory, adding some minimal packages to that directory and
building up from there.

So absolutely nothing stops you from starting with an empty directory,
emerging a few basic packages into it (a couple of MB), then chrooting
into it and having some fun... There is *no* minimal package set; you
can install whatever you want (as long as it boots). The Portage
dependency tracker will largely help you pull in the minimal needed
dependencies, but beware that dependencies on system packages aren't
generally tracked explicitly, so you may stumble across some missing
deps when you go really basic and omit standard system packages (just
use common sense: if an application requires a compiler and you didn't
install one, it should be fairly obvious that you have a conflict of
interest...)


Have another look at Gentoo! I definitely believe that its flexibility
to build highly customised packages, plus strong templating of those
packages, plus a decent ability to distribute binaries of the end
result, is a very strong combo! Better binary support is really the
only thing missing here, but it's pretty adequate as it stands!

Good luck

Ed W

Replies

Subject Author
Re: [gentoo-dev] avoiding urgent stabilizations Enrico Weigelt <weigelt@×××××.de>