Re: [gentoo-dev] avoiding urgent stabilizations - gentoo-dev

From:	Ed W <lists@××××××××××.com>
To:	gentoo-dev@l.g.o
Subject:	Re: [gentoo-dev] avoiding urgent stabilizations
Date:	Sat, 26 Feb 2011 12:22:21
Message-Id:	`4D68E7E1.1050109@wildgooses.com`
In Reply to:	Re: [gentoo-dev] avoiding urgent stabilizations by Enrico Weigelt

Hi

> But, for me, even a trimmed-down Gentoo is still too large
> (has to contain the whole base packages, from portage to
> toolchain, includes, etc). I'd prefer having only the essential
> runtime stuff within the containers.

I'm just building some embedded devices on the side using Gentoo, and my
minimal builds are only a few MB. So I'm curious why you feel you need to
move away from Gentoo to get the size down?

It seems your complaint is that you are comparing full-featured Gentoo
installs, with a toolchain and portage, against an installation built
with a different tool that has no toolchain installed. However, you can
do the same with Gentoo if you wish (you just need a lightweight package
installer so that you don't have to install portage).

I think your main options are:

1) Build your base images without a toolchain or portage and use a
minimal package installer to install pre-built binary packages. This
seems fraught with issues in the long term, though...

2) Build your base images without a toolchain, but with portage (and
perhaps a very minimal python). This gives you full dependency tracking,
and you can bind mount or NFS mount the actual portage tree to avoid the
space it would otherwise use. This seems workable and minimal?

3) If we are talking virtual machines, does it matter that your
containers are individually quite large, if the files in them are
duplicated across all containers? Simply use an appropriate
de-duplication strategy to coalesce the space and most of the
disadvantages disappear. E.g. with linux-vserver you can simply hardlink
all the common files across your installations and let the COW patch
break the hardlink if anyone alters a file in a single instance. Or you
could use aufs to mount a writeable layer over your common base VM
instance. Or you could use one of the filesystems which de-duplicate
files in the background (with some caveats: memory for those files may
still be used multiple times, once in each VM). Or under KVM there is
the memory-merging feature, KSM (Kernel Samepage Merging), which merges
identical memory pages across guests.

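The hardlinking approach from option 3) can be sketched as a small bash
function. This is a hypothetical helper (unify_guest is my name for it,
it is not part of the linux-vserver tools), assuming GNU coreutils and
that the base and guest trees sit on the same filesystem, since hardlinks
cannot cross filesystems. Because hardlinked copies share one inode, it
also requires matching owner, group and mode before linking. Without
something like the vserver COW patch, a later write to a unified file
would show up in every guest.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: hardlink files in a guest tree to identical files in
# a base tree. Assumes GNU stat and that both trees are on one filesystem.
set -euo pipefail

# unify_guest BASE GUEST
#   For each regular file under GUEST whose relative path also exists under
#   BASE with identical contents, owner, group and mode, replace the GUEST
#   copy with a hardlink to the BASE copy. Prints the number of files linked.
unify_guest() {
    local base=$1 guest=$2 linked=0 rel
    while IFS= read -r -d '' rel; do
        local b="$base/$rel" g="$guest/$rel"
        [[ -f $b && ! -L $b ]] || continue   # need a real file in BASE
        [[ $b -ef $g ]] && continue           # already the same inode
        # One shared inode means one set of metadata, so it must match too.
        [[ $(stat -c '%u:%g:%a' "$b") == "$(stat -c '%u:%g:%a' "$g")" ]] || continue
        cmp -s "$b" "$g" || continue          # contents must be identical
        ln -f "$b" "$g"                       # replace guest copy with a link
        linked=$((linked + 1))
    done < <(cd "$guest" && find . -type f -print0)
    echo "$linked"
}
```

Usage would be something like `unify_guest /vservers/base /vservers/web1`
(hypothetical paths); it prints how many files were linked, and a second
run prints 0 because those files already share an inode.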

Personally I think option 3) is quite interesting for a medium number of
virtual machines, i.e. tens to hundreds: simply don't worry about it and
let the OS do the work. At the hundreds-to-thousands-plus level I guess
you have unique challenges, and I would be wrong to try to suggest a
solution from the comfort of a laptop without having that
responsibility, but I would have thought there was some advantage in a
very rigidly deployed base OS, generated and updated very precisely?

> For this we need a different approach (strictly separating build
> and production environments). Binary distros (eg. Debian) might
> be one option, but they're lacking the configurability and mostly
> are still too large. So I'm going a different route using my own
> buildsystem - called Briegel - which originally was designed for
> embedded/small-device targets.
>
> For now I didn't have the spare time to port all the packages
> required for complete server systems (most of it is making
> them all cleanly crosscompile'able, as this is a fundamental
> concept of Briegel). But maybe you'd like to join in and try it :)

Sounds like an interesting challenge, but I'm not convinced you can't
solve 90% of your problem within the constraints of Gentoo? That would
save you a bunch of time, which could be invested in the last 10% through
more traditional means?

>> It does appear like managing large numbers of virtual machines is one
>> area that gentoo could score very well? Interested to see any chatter on
>> how others solve this problem, or any general advocacy? Probably we
>> should start a new thread though...
> I'm not sure if Gentoo really is the right distro for that purpose,
> as it's targeted to very different systems (e.g. Gentoo boxes are
> expected to be quite unique, beginning with different per-package
> useflags, even down to cflags, etc). But it might still be a good
> basis for building specific system images (let's call them stage5 ;-))

I won't disagree with your "where it's targeted", but just to reiterate
why I think Gentoo works well: it does have a very workable binary
package feature!

My way of working is to use (several) shared binary package repos, and
the guests largely pull from those shared package directories. In fact,
what I do is keep a minimal number of shared "/usr/portage/packages"
directories and mount the appropriate one for each guest type at boot
time. At the moment my two main options are "32bit" and "64bit" for the
package mounts, but I recently introduced a new machine type which is
held back to perl 5.8, and that guest gets its own package mount since
it obviously links a lot of binaries differently.

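Mounting the right shared package directory per guest type might look
roughly like this. A minimal sketch: the pool location /srv/binpkgs, the
type names and the mount_pkgdir helper are hypothetical; only the bind
mount itself and /usr/portage/packages (portage's usual default PKGDIR
at the time) are standard. Setting MOUNT=echo turns it into a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: bind-mount the shared binary package directory for a
# guest's type onto that guest's PKGDIR, e.g. at guest boot time.
set -euo pipefail

PKG_POOL=${PKG_POOL:-/srv/binpkgs}   # one subdirectory per guest type
MOUNT=${MOUNT:-mount}                 # MOUNT=echo for a dry run

# mount_pkgdir GUEST_ROOT TYPE
mount_pkgdir() {
    local guest_root=$1 type=$2
    local src="$PKG_POOL/$type"
    local dst="$guest_root/usr/portage/packages"
    if [[ ! -d $src ]]; then
        echo "no package pool for guest type '$type'" >&2
        return 1
    fi
    mkdir -p "$dst"                   # the mount point must exist
    $MOUNT --bind "$src" "$dst"
}
```

A machine type held back to perl 5.8 would then just be another
subdirectory of the pool, e.g. `mount_pkgdir /vservers/legacy1 perl58`.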

So, my process is to test an update on a small number of guests, either
dedicated test guests or less important live guests. If this looks
good, I run the upgrade against all the other VMs of the same type, and
they update quickly from the binary packages.

Now, the icing is that this works extremely well even once you decide
to lightly customise machine types. For example, my binary packages are
split at a very high level (e.g. 32/64bit), and my "profiles" are fairly
high level too: I have www-apache and www-nginx base profiles. However,
a specific virtual machine running, say, nginx might need a specific PHP
application installed, which might need some dependencies, which in turn
might require a specific set of USE flag and version customisations.

Now, the neat thing is that the binary upgrade options are *either* to
use *only* binary packages, OR to use binary packages *if* they were
built with the correct USE flags. For example, I haven't bothered to
split out my packages directory per nginx/apache machine type, so the
PHP package is regularly rebuilt, depending on whether it was last used
to upgrade an nginx or an apache guest (each needs different USE flags).
I could fix this easily enough, but it's not a problem for me, and
portage's binary package handling deals with it automatically.

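On the portage side, the pieces involved are roughly these (an
illustrative /etc/make.conf fragment; the values are examples only).
FEATURES="buildpkg" makes emerge save a binary package of everything it
builds, and the two binary modes correspond to emerge's --usepkg (-k)
and --usepkgonly (-K) options. Running with --pretend first shows which
packages would come from binaries ([binary]) and which would be built
([ebuild]).

```bash
# /etc/make.conf fragment (illustrative values only)

# Where binary packages are read from and written to: here, the shared
# directory bind-mounted for this guest's type.
PKGDIR="/usr/portage/packages"

# FEATURES is incremental, so this adds to the defaults: save a binary
# package of everything emerge builds, for the other guests to reuse.
FEATURES="buildpkg"

# On the remaining guests of the same type, once the test guests look good:
#   emerge -puDN --usepkg world     # preview: [binary] vs [ebuild] entries
#   emerge -uDN --usepkg world      # -k: use a binary package if available,
#                                   #     otherwise build from source
#   emerge -uDN --usepkgonly world  # -K: binary packages only
```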

So the end result is that you can make efficient use of binary updates,
but portage will still customise the odd package here or there where a
local machine requires something that differs from the norm. To my eye
this keeps most of the benefits of an RPM/DEB-style binary updater, with
the flexibility of a per-machine, customised-USE-flag Gentoo install.

> A setup for 100 equal webserver vm's could look like this:
>
> * run a normal Gentoo vm (tailored for the webserver appliance),
>    where you do regular updates (emerge, revdep-rebuild, etc, etc)
> * from time to time take a snapshot, strip off the buildtime-only
>    stuff (hmm, could turn out to be a bit tricky ;-o)
> * this stripped snapshot now goes into testing vm's
> * when approved, the individual production vm's are switched over
>    to the new image (maybe using some mount magic, unionfs, etc)

This could work, and perhaps with 100 identical VMs you have enough meat
to justify something quite customised anyway?

Personally, for 20-80 identical VMs running a very limited variety of
web software, I would go for:
- A slightly cut-down Gentoo VM
- Hardlinked across all instances, OR a single read-only installation
- Writeable data areas mounted in their own space (/var/www, /tmp,
/home, etc)

By separating the data from the OS you have a lot of flexibility to
upgrade the base webserver install and mount the data back on the new
VM. With linux-vserver or another container-style setup, you will find
that the OS shares code segments across all the virtual machines
(because the hardlinked files share the same inode), so memory usage
should be much lower, nearer to firing up one instance of the shared app
and then forking it (i.e. data is duplicated, but the code segment is
shared).

For 100+ VMs I guess I would look very strongly at a common read-only
OS partition and container-style virtualisation.

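A read-only shared OS with per-guest writable areas could be assembled
with plain bind mounts, something like this sketch. The assemble_guest
name, the paths and the list of writable directories are all assumptions
for illustration. Older mount versions ignore "ro" on the initial bind,
hence the explicit remount, and the mount points (var/www, tmp, home)
must already exist in the shared tree, since it cannot be written to
once mounted. MOUNT=echo gives a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: one shared OS tree mounted read-only per guest, with
# the guest's own writable data directories bound on top of it.
set -euo pipefail

MOUNT=${MOUNT:-mount}   # MOUNT=echo for a dry run

# assemble_guest OS_TREE DATA_DIR GUEST_ROOT
assemble_guest() {
    local os=$1 data=$2 root=$3 dir
    # Shared OS: bind it, then remount that bind read-only.
    $MOUNT --bind "$os" "$root"
    $MOUNT -o remount,ro,bind "$root"
    # Writable areas come from this guest's own data directory; the mount
    # points themselves must already exist inside OS_TREE.
    for dir in var/www tmp home; do
        mkdir -p "$data/$dir"
        $MOUNT --bind "$data/$dir" "$root/$dir"
    done
}
```

For example `assemble_guest /srv/os/web-base /srv/data/web1
/vservers/web1` (hypothetical paths). Upgrading then means building a
new OS tree and pointing the guests at it, leaving the data directories
untouched.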

For 20-80 near-identical VMs running a wider variety of web software, I
would go for the hardlinked option with a straightforward "emerge"
upgrade across them. Hardlinking keeps memory usage sane where possible,
without the pain of keeping the base install absolutely identical and
read-only, which the common-mount option requires.

> At this point I've got a question for the other folks here:
>
> emerge has an --root option which allows to (un)merge in a separate
> system image. So it should be possible to unmerge a lot of system
> packages which are just required for updating/building (even
> portage itself), but this still will be manual - what about
> dependency handling ?

This is correct. In fact this is how you build a stage 1, 2, 3 etc., and
how catalyst works!

The information is a bit spread out over several out-of-date wiki
articles, but perhaps start with:
     http://en.gentoo-wiki.com/wiki/Tiny_Gentoo

Roughly speaking, you could "freshen" your current installation with
(from memory):
     ROOT="/tmp/new_build" emerge -av world

This has minor gremlins when I test it, probably because some symlinks
are created differently if you follow the current catalyst build script
through stages 1, 2, 3 etc., but roughly speaking it does the same thing,
only jumping straight to the end result and building a completely new
install identical to your current OS...

Even better, you can point portage at an alternative configuration
source: if you want to build your new ROOT with an alternative
make.conf, /etc/portage/*, etc., just put your new files under a
directory laid out like / (i.e. DIR/etc/make.conf, DIR/etc/portage/) and
set PORTAGE_CONFIGROOT to that directory. Cross-compiling is also done
through an extension of this basic method.

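Combining ROOT and PORTAGE_CONFIGROOT, a build into a fresh root might
be wrapped like this. A hedged sketch: build_root, WORLD_SRC and the
paths are hypothetical. It also seeds the new root's world file, on the
assumption (as I understand portage) that the world set is read from
under ROOT rather than from the running system. EMERGE can be
overridden for a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: emerge into NEW_ROOT using a separate config tree.
set -euo pipefail

EMERGE=${EMERGE:-emerge}                         # override for a dry run
WORLD_SRC=${WORLD_SRC:-/var/lib/portage/world}   # this host's world file

# build_root CONFIG_ROOT NEW_ROOT [emerge arguments...]
#   CONFIG_ROOT is laid out like /, e.g. CONFIG_ROOT/etc/make.conf and
#   CONFIG_ROOT/etc/portage/.
build_root() {
    local config_root=$1 new_root=$2
    shift 2
    if [[ ! -d $config_root/etc ]]; then
        echo "no etc/ under $config_root" >&2
        return 1
    fi
    mkdir -p "$new_root/var/lib/portage"
    # Assumption: world is read from under ROOT, so seed it if missing.
    if [[ ! -e $new_root/var/lib/portage/world && -f $WORLD_SRC ]]; then
        cp "$WORLD_SRC" "$new_root/var/lib/portage/world"
    fi
    PORTAGE_CONFIGROOT=$config_root ROOT=$new_root $EMERGE "$@"
}
```

So `build_root /srv/configs/webbase /tmp/new_build -av world` ends up
running `PORTAGE_CONFIGROOT=/srv/configs/webbase ROOT=/tmp/new_build
emerge -av world`.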

So, following your chain of thought: yes, it's not too hard to quickly
generate a customised base OS installation to use for your future VMs.
Further, if you wish, you can give those VMs a reduced or missing
toolchain, etc. In fact, if you google a bit I think you will find some
recipes for very minimal VMs built this way, where the base VM is a very
minimal install...

> Is there some way to drop at least parts of the standard system set,
> so eg. portage, python, gcc, etc, etc get unmerged by --depclean
> if nobody else (in world set) doesn't explicitly require them ?

You are almost thinking about it all wrong. ("There is no spoon...")

This is Gentoo, so at this more advanced level, stop thinking about the
"standard system set" and instead free your mind to start with
"nothing". Go read the old bootstrap-from-stage-1 instructions, plus
the TinyGentoo pages, and you will quickly see that Catalyst builds your
working installation by starting from a working installation, creating
an empty directory, adding some minimal packages to that directory and
building up from there.

So absolutely nothing stops you from starting with an empty directory,
emerging a few basic packages into it (a couple of MB) and then
chrooting into it and having some fun... There is *no* minimal package
set; you can install whatever you want (as long as it boots). Largely
the portage dependency tracker will help you pull in the minimum needed
dependencies, but beware that system packages aren't generally tracked
explicitly as dependencies, so you may stumble across some missing deps
when you go really basic and omit standard system packages (just use
common sense: if an application requires a compiler and you didn't
install one, it should be fairly obvious that you have a conflict of
interest...).

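As a concrete starting point, a tiny root containing just a statically
linked busybox might be produced like this. A sketch only: tiny_root and
/tmp/tiny are hypothetical, and it relies on busybox's "static" USE
flag. Setting USE in the environment applies to every package in that
emerge run, so a package.use entry under a separate PORTAGE_CONFIGROOT
would be the tidier choice.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: emerge a single static busybox into an empty root.
set -euo pipefail

EMERGE=${EMERGE:-emerge}   # override for a dry run

# tiny_root DIR
tiny_root() {
    local dir=$1
    mkdir -p "$dir"
    # A static busybox has no shared-library dependencies, so the result
    # can be chrooted into without a libc installed in DIR.
    USE="static" ROOT=$dir $EMERGE --verbose sys-apps/busybox
}
```

After `tiny_root /tmp/tiny`, running `chroot /tmp/tiny /bin/busybox sh`
as root should drop you into a shell inside it; anything else gets added
the same way.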

Have another look at Gentoo! I definitely believe that its flexibility
to build you highly customised packages, plus strong templating of those
packages, plus a decent ability to distribute binaries of the end
result, is a very strong combo! Better binary support is really the only
thing missing here, but it's pretty adequate as it stands!

Good luck

Ed W
