Hi

> But, for me, even a trimmed-down Gentoo is still too large
> (has to contain the whole base packages, from portage to
> toolchain, includes, etc). I'd prefer having only the essential
> runtime stuff within the containers.

I'm building some embedded devices on the side using Gentoo, and my
minimal builds are only a few MB. Curious why you feel you need to move
away from Gentoo to get the size smaller?

It seems your complaint is that you have Gentoo installs which are
full featured, with a toolchain and portage, and you are comparing them
to an installation built with a different tool that doesn't have a
toolchain installed. However, you can do the same using Gentoo if you
wish (you just need a lightweight package installer to avoid installing
portage).

I think your main options are:

1) Build your base images without a toolchain or portage and use a
minimal package installer to install pre-built binary packages. This
seems fraught with issues long term though...

2) Build your base images without a toolchain, but with portage (and
perhaps a very minimal python). This gives you full dependency tracking,
and you would obviously bind mount or NFS mount the actual portage tree
to avoid the space used there. This seems workable and minimal?

3) If we are talking virtual machines, then who cares if your containers
are individually quite large, if the files in them are duplicated across
all containers? Simply use an appropriate de-duplication strategy to
coalesce the space and most of the disadvantages disappear. E.g. with
linux-vserver you can simply hardlink all the common files across your
installations and allow the COW patch to break hardlinks if anyone
alters a file in a single instance. Or you could use aufs to mount a
writeable layer over your common base VM instance. Or you could use one
of the filesystems which de-duplicates files in the background (some
caveats apply here to avoid memory still being used multiple times in
each VM). Or under KVM there is the memory coalescing feature which
merges identical memory pages (KSM, kernel samepage merging).
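As a rough illustration of the hardlink idea in option 3, here is a minimal shell sketch. It assumes both guest trees live on the same filesystem. The function name dedupe_tree and the directory layout are made up for this example, and it does not compare ownership or permissions, which a real tool would have to do. Without the vserver COW patch (or something equivalent), a write through one link changes the file for every guest, so treat this as a sketch only.

```shell
# Sketch only: replace files in a guest tree with hardlinks to the
# byte-identical file in a base tree. Both trees must be on one filesystem.
# dedupe_tree, "base" and "guest" are hypothetical names for illustration.
dedupe_tree() {
    base=$1
    guest=$2
    # Walk every regular file in the base tree (paths relative to it).
    (cd "$base" && find . -type f) | while IFS= read -r rel; do
        src="$base/$rel"
        dst="$guest/$rel"
        # Link only when the guest has a regular file with identical
        # content that is not already the same inode.
        if [ -f "$dst" ] && [ ! "$src" -ef "$dst" ] && cmp -s "$src" "$dst"; then
            ln -f "$src" "$dst"
        fi
    done
}
```

Calling dedupe_tree /vservers/base /vservers/web1 (hypothetical paths) would leave locally modified files in web1 untouched and link everything else to the base copy.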

Personally I think option 3) is quite interesting for a medium number
of virtual machines, i.e. in the tens to hundreds: simply don't worry
about it and let the OS do the work. At the hundreds to thousands plus
level I guess you have unique challenges, and I would be wrong to try and
suggest a solution from the comfort of a laptop without having that
responsibility, but I would have thought there was some advantage in a
very rigidly deployed base OS, generated and updated very precisely?


> For this we need a different approach (strictly separating build
> and production environments). Binary distros (eg. Debian) might
> be one option, but they're lacking the configurability and mostly
> are still too large. So I'm going a different route using my own
> buildsystem - called Briegel - which originally was designed for
> embedded/small-device targets.
>
> For now I didn't have the spare time to port all the packages
> required for complete server systems (most of it is making
> them all cleanly crosscompile'able, as this is a fundamental
> concept of Briegel). But maybe you'd like to join in and try it :)

Sounds like an interesting challenge, but I'm unconvinced you can't
solve 90% of your problem within the constraints of Gentoo. That would
save you a bunch of time that could be invested in the last 10% through
more traditional means?


>> It does appear like managing large numbers of virtual machines is one
>> are that gentoo could score very well? Interested to see any chatter on
>> how others solve this problem, or any general advocacy? Probably we
>> should start a new thread though...
> I'm not sure if Gentoo really is the right distro for that purpose,
> as it's targeted to very different systems (i.g. Gentoo boxes are
> expected to be quite unique, beginning with different per-package
> useflags, even down to cflags, etc). But it might still be a good
> basis for building specific system images (let's call them stage5 ;-))

I won't disagree on your "where it's targeted", but just to re-iterate:
the reason I think Gentoo works well is that it does have a very
workable binary package feature!

My way of working is to use (several) shared binary package repos, and
the guests largely pull from those shared package directories. In fact
what I do is have a minimal number of shared "/usr/portage/package"
directories, and I mount the one appropriate to the guest type at boot
time. At the moment my main two options are "32bit" and "64bit" for the
package mounts, but I recently introduced a new machine type which is
held back to perl 5.8, and that guest gets its own package mount since
it's obviously linking a lot of binaries differently.
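To make the "one package mount per guest type" idea concrete, here is a small shell sketch. The store location /srv/binpkgs, the type name perl58 and both function names are assumptions invented for the example; only the 32bit/64bit/perl 5.8 split and the "/usr/portage/package" mount point come from the description above. It only prints the bind mount it would perform, so it is safe to try.

```shell
# Hypothetical layout: one shared binary package directory per guest type.
PKG_STORE=${PKG_STORE:-/srv/binpkgs}

# Map a guest type to its shared package directory.
pkgdir_for_guest() {
    case "$1" in
        32bit|64bit|perl58) printf '%s/%s\n' "$PKG_STORE" "$1" ;;
        *) echo "unknown guest type: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the bind mount that exposes the shared directory
# inside a guest root at boot time.
mount_cmd_for_guest() {
    gtype=$1
    guest_root=$2
    src=$(pkgdir_for_guest "$gtype") || return 1
    printf 'mount --bind %s %s/usr/portage/package\n' "$src" "$guest_root"
}
```

A boot script could then eval the printed line for each guest, or the same mapping could be written straight into the host's fstab.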

So, my process is to test an update on a small number of guests, either
dedicated test guests or less important live guests. If this looks good
then I run the upgrade against all other VMs of the same type and they
will update quickly from package binaries.

Now, the icing is that this works extremely well even once you decide to
lightly customise machine types. So for example my binary packages are
very high level (e.g. 32/64bit), and my "profiles" would be fairly high
level, e.g. I have www-apache and www-nginx base profiles. However, a
specific virtual machine running, say, nginx might itself need a specific
PHP application installed, and that itself might need some dependencies,
which in turn might require a specific set of customised USE flags
and versions.

Now, the neat thing is that the binary upgrade options are *either* to
use *only* binary packages, OR to use binary packages *if* they were
built with the correct USE flags. So for example I haven't bothered to
split out my packages directory to be specific to the nginx/apache
machines. This causes the PHP package to be regularly rebuilt,
depending on whether it was last used to upgrade an nginx or apache
guest (different USE flags needed for each guest). I could fix this
easily enough, but it's not a problem for me and it's automatically
handled through the portage binary package updates.
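The two upgrade modes described above can be sketched as emerge command lines. I believe they correspond to --usepkgonly on the one hand and --usepkg with --binpkg-respect-use=y on the other, but that mapping is my assumption, so check "man emerge" for your portage version. The function name and the "only"/"prefer" mode names are invented for this example, and the function only prints the command rather than running it.

```shell
# Build (but do not run) an emerge command line for a binary-first upgrade.
#   only   : use binary packages exclusively, never compile
#   prefer : use a binary package when its USE flags match, otherwise
#            compile locally and save the result as a new binary package
binpkg_upgrade_cmd() {
    case "$1" in
        only)   opts="--usepkgonly" ;;
        prefer) opts="--usepkg --binpkg-respect-use=y --buildpkg" ;;
        *) echo "usage: binpkg_upgrade_cmd only|prefer" >&2; return 1 ;;
    esac
    printf 'emerge --update --deep --newuse %s @world\n' "$opts"
}
```

In "prefer" mode the --buildpkg flag is what writes a rebuilt package (PHP in the case above) back into the shared directory, which is why the next guest with different USE flags triggers another rebuild.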

So the end result is that you can make efficient use of binary updates,
but portage will still customise the odd package here or there where a
local machine requires something which differs from the norm. To my eye
this keeps most of the benefits of an RPM/DEB style binary updater, with
the flexibility of a per machine, customised USE flag Gentoo installation.


> An setup for 100 equal webserver vm's could look like this:
>
> * run a normal Gentoo vm (tailored for the webserver appliance),
> where do you do regular updates (emerge, revdep-rebuild, etc, etc)
> * from time to time take a snapshot, strip off the buildtime-only
> stuff (hmm, could turn out to be a bit tricky ;-o)
> * this stripped snapshot now goes into testing vm's
> * when approved, the individual production vm's are switched over
> to the new image (maybe using some mount magic, unionfs, etc)

This could work, and perhaps for 100 identical VMs you have enough meat
to work on something quite customised anyway?

Personally, for 20-80 identical VMs running a very limited variety of web
software, I would go for:
- Slightly cut down Gentoo VM
- Hardlinked across all instances OR single installation which is read only
- Writeable data areas mounted to their own space (/var/www, /tmp,
  /home, etc)

By separating the data from the OS you have a lot of flexibility to
upgrade the base webserver install and mount the data back on the new
VM. With linux-vserver or other container style virtualisation, you will
find that the OS shares code segments across all virtual machines (due to
all files sharing the same inode), so the memory usage should be much
lower, nearer to firing up one instance of the shared app and having it
fork (i.e. data is duplicated, but the code segment is shared).


For 100+ VMs I guess I would be looking very strongly at a common
read-only OS partition and container style virtualisation.

For 20-80 near identical VMs, but running a wider variety of web
software, I would go for the hardlinked option with a straightforward
"emerge" upgrade option across them. Hardlinking keeps the memory usage
sane where possible, without the pain of trying to keep the base install
absolutely identical and read-only to make the common mount option work.


> At this point I've got a question for to the other folks here:
>
> emerge has an --root option which allows to (un)merge in a separate
> system image. So it should be possible to unmerge a lot of system
> packages which are just required for updating/building (even
> portage itself), but this still will be manual - what about
> dependency handling ?

This is correct. In fact this is how you build a stage 1,2,3 etc and
how catalyst works!

The information is a bit spread out over several out of date wiki
articles, but perhaps start with:
http://en.gentoo-wiki.com/wiki/Tiny_Gentoo

Roughly speaking you could "freshen" your current installation with
(from memory):
    ROOT="/tmp/new_build" emerge -av world

This has minor gremlins when I test it, probably due to some symlinks
being created differently if you follow the current catalyst build
script through stage 1,2,3 etc, but roughly speaking it does the same
thing, only jumping straight to the end result and building a completely
new install identical to your current OS...

Even more special is that you can set an alternative portage
configuration root, so if you want to build your new ROOT with an
alternative make.conf, /etc/portage/*, etc then just put your new files
somewhere and set PORTAGE_CONFIGROOT to point to it. Cross compiling is
also done through an extension of this basic method.
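Combining the two variables might look like the sketch below. The function name and both example paths are hypothetical. It assumes the directory given as PORTAGE_CONFIGROOT contains its own etc/portage (or etc/make.conf on older portage) in the same layout as /. It only prints the command so you can review it before running it as root.

```shell
# Print (do not run) the command that would build a fresh tree under
# new_root, using the portage configuration found under config_root
# instead of the one in /etc.
fresh_root_cmd() {
    new_root=$1
    config_root=$2
    printf 'ROOT=%s PORTAGE_CONFIGROOT=%s emerge -av world\n' \
        "$new_root" "$config_root"
}

# Example with made-up paths:
fresh_root_cmd /tmp/new_build /srv/configs/www-nginx
```

Keeping one configuration directory per machine type (say www-apache and www-nginx) would let the same build host produce both images.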

So, following your chain of thought - yes it's not too hard to quickly
generate a customised base OS installation to use for your future VMs.
Further, if you wish you can make those VMs have a reduced or missing
toolchain etc. In fact if you google a bit I think you will find some
recipes for very minimal VMs using this method, where the base VM is a
very minimal install...

> Is there some way to drop at least parts of the standard system set,
> so eg. portage, python, gcc, etc, etc get unmerged by --depclean
> if nobody else (in world set) doesn't explicitly require them ?

You are almost thinking about it all wrong. ("There is no spoon...")

This is Gentoo, so at this more advanced level, stop thinking about
"standard system set" and instead free your mind to start with
"nothing". Go read the old bootstrap from stage 1 instructions, plus
the TinyGentoo pages, and you can quickly see that Catalyst builds your
working installation by starting from a working installation, creating
an empty directory, adding some minimal packages to that directory and
building up from there.

So absolutely nothing stops you from just starting with an empty
directory and just emerging a few basic packages into it (couple MB) and
then chrooting into it and having some fun... There is *no* minimal
package set, you can install whatever you want (as long as it boots).
Largely the portage dependency tracker will help you pull in the minimal
needed dependencies, but beware that system packages aren't generally
explicitly tracked, so you may stumble across some missing deps when you
are going really basic and omitting standard system packages (just use
common sense: it should be fairly obvious that if an application requires
a compiler and you didn't install one, you have a conflict of interest...)
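The "start from an empty directory" approach can be sketched as three steps. The package list (baselayout, glibc, busybox) is only my guess at a plausible tiny starting point, not a tested recipe, and the function name and target directory are made up. The function prints the steps instead of executing them.

```shell
# Print the steps for populating an empty directory with a few packages
# and entering it. Nothing is executed; review before running as root.
# Usage: tiny_root_plan <root-dir> <package>...
tiny_root_plan() {
    root=$1
    shift
    printf 'mkdir -p %s\n' "$root"
    # All remaining arguments are treated as package atoms.
    printf 'ROOT=%s emerge --oneshot %s\n' "$root" "$*"
    # Assumes busybox was among the packages, to provide a shell.
    printf 'chroot %s /bin/busybox sh\n' "$root"
}

tiny_root_plan /tmp/tiny sys-apps/baselayout sys-libs/glibc sys-apps/busybox
```

Because system packages are not explicitly tracked as dependencies, expect to add a few atoms to the list by hand the first time something fails to start inside the chroot.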


Have another look at Gentoo! I definitely believe that its flexibility
to build you highly customised packages, plus strong templating of those
packages, plus decent ability to distribute binaries of the end result
is a very strong combo! Better binary support is really the only thing
missing here, but it's pretty adequate as it stands!

Good luck

Ed W