On Sat, Nov 14, 2009 at 5:09 PM, Alex Schuster <wonko@×××××××××.org> wrote:
> Alan McKinnon writes:
>
>> On Saturday 14 November 2009 19:36:06 Alex Schuster wrote:
>>> Alan McKinnon wrote:
>
>>>> clusterssh will let you log into many machines at once and run emerge
>>>> -avuND world everywhere
>>> This is way cool. I just started using it on eight Fedora servers I am
>>> administrating. Nice, now this is an improvement over my 'for h in
>>> $HOSTS; do ssh $h "yum install foo"; done' approach.
>>
>> I feel your pain :-)
>>
>> We used to have the same problem adding new admins to 87 machines. Now
>> we have a bespoke provisioner that does it all.
>
> Sorry, I just do not get 'bespoke provisioner'. Some sort of software,
> like clusterssh? Or a person, one admin instead of many?
>
>
>>> What do you guys think about using Gentoo for servers? At the institute
>>> I partially work we chose Fedora. There is no special reason for that -
>>> we already had some Fedora machines, the setup seemed to work, the
>>> reputation was good, so we kept it. That was okay for me, why choose
>>> many different environments and learn everything again. I mentioned
>>> Gentoo, but did not really suggest to actually use it. Maybe I should
>>> have.
>>
>> I'm a huge fan of Gentoo
>
> Now who would have thought of that!
>
>> and all my personal machines (except the new netbook) have run it for the
>> last 5 years.
>>
>> But I will never install Gentoo on a production server at work.
>>
>> Why?
>>
>> Because it is too time consuming, because no two machines are set up the
>> same, because I can't trust that other admins used the flags they should
>> have. So updates become a case of logging into 80+ machines individually
>> and doing emerge world by hand. Gentoo allows you to customize things to
>> the nth degree - that is its strength - so people WILL use this one
>> discriminating factor.
>>
>> If OTOH I had a server farm of 80+ machines, all identical, I'd put
>> Gentoo on them in a flash. But I don't have that
>
> Of our 8 machines, 7 are essentially the same and differ only in hard
> drive space and CPU speed. The other machine is Intel, not AMD, and needs
> different IDE drivers. At the moment it has a different initrd (I set up a
> minimal fedora install to generate it after the cloned system did not
> boot), the rest is - apart from some config files - identical.
>
> So I would make sure that about everything is exactly the same, well,
> maybe except for hostnames, udev net-persistent-rules, ssh keys... what
> more?
> The last, a little different machine is a problem though. With optimized
> CFLAGS, this one would have to compile all stuff again, while for the
> others I could use binpkgs. Updating them all with clusterssh should not
> be much more work than updating a single one. Well, not completely true, I
> would have the double work, as I would upgrade one server first to test if
> there are problems, and then do it for the others. Maybe I could use the
> special machine to test stuff, and then update all the others.
>
> If they would differ, Gentoo would of course be too much work. I already
> have this problem now... there is my desktop machine, my notebook running
> a Gentoo VM, a second desktop machine at my other home, the living-room
> machine of my flat share, the machine of a friend I also administrate, the
> server of my flat share I need to set up again... and clusterssh is no
> option here.
|
My potentially ill-informed thoughts on the above issues/ideas:

1) Pick one machine to host both your make.conf as well as your
portage tree and distfiles, potentially splitting them into separate
NFS mounts shared out for the rest of the hosts (having the portage
tree itself ro on all but its owning machine forces centralization of
syncing).

2) /etc/make.conf should simply be a symlink to the centrally located
copy. If you must use binpackages, set -march to something that will
run on every machine involved, then set -mtune (spelled -mcpu on older
GCC) to whatever machine is most common if you want to get just a bit
more performance here or there. If you don't mind compiling on every
host, though, set portage niceness to something friendly to your users
and -march to native (if you plan to use distcc, this is a BAD idea;
use the binpackages instead).

3) Use a replaceable (otherwise identical to the others, and therefore
able to be brought back online by just cloning it over) system for
your testing and keep frequent scheduled backups of whichever system
plays host to your portage tree, binpackages, and distfiles.

4) Build your kernel with built-in drivers for every piece of
boot-time essential hardware in your systems. You'll still be on a far
cleaner setup than a mass-produced, distro-provided kernel, you'll only
need to maintain one for all your systems, and you'll only have one
kernel to worry about building against if you need any out-of-kernel
modules as well.

5) Script the changing of ssh host keys (or even redistribution of
them, if you ), removal of persistent net rules, and prompting for the
setting of host name and you'll have a nice, tiny, postinstall tool
for the rare case in which you need to re-deploy a system. You may
wish to restore things like ssh host keys from backups as well, in the
case of re-deployment of systems, since changing them means adjusting
known hosts lists elsewhere.
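
A rough sketch of the postinstall tool from point 5, just to show how
small it can be. The file locations are the usual Gentoo ones as far as
I know, and the function names are made up for illustration; every
function takes the root of the target system as its first argument, so
you can try it on a scratch directory before pointing it at /.

```shell
#!/bin/sh
# Post-install helper sketch for a freshly cloned host.
# Assumptions (not from the thread): paths are the common Gentoo
# defaults, and the function names are hypothetical.

# Forget the NIC-to-name mapping inherited from the source machine;
# udev writes a fresh rules file on the next boot.
reset_net_rules() {
    rm -f "$1/etc/udev/rules.d/70-persistent-net.rules"
}

# Record the new host name. The variable name in conf.d/hostname differs
# between baselayout versions (HOSTNAME= vs hostname=); adjust as needed.
set_host_name() {
    case "$2" in
        ''|*[!A-Za-z0-9-]*) echo "invalid host name: $2" >&2; return 1 ;;
    esac
    mkdir -p "$1/etc/conf.d" &&
        printf 'hostname="%s"\n' "$2" > "$1/etc/conf.d/hostname"
}

# Replace the cloned ssh host keys with a new one. To keep existing
# known_hosts entries valid, skip this and restore the old keys from
# backup instead.
regen_host_keys() {
    mkdir -p "$1/etc/ssh" || return 1
    rm -f "$1"/etc/ssh/ssh_host_*key "$1"/etc/ssh/ssh_host_*key.pub
    ssh-keygen -q -t rsa -b 4096 -N '' -f "$1/etc/ssh/ssh_host_rsa_key"
}
```

On a re-deployed box you would prompt for the name and then run the
three steps against the real root, something like:

    printf 'Host name: '; read -r name
    set_host_name / "$name" && reset_net_rules / && regen_host_keys /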
|
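For points 1 and 2 above, the shared make.conf might look roughly like
the fragment below. Treat every concrete value as an assumption for
illustration: the mount points, the 64-bit baseline, the k8 tuning
target and the niceness level are placeholders, not settings anyone in
this thread has tested.

```shell
# /etc/make.conf, kept on the NFS host and symlinked on every machine.
# All values below are illustrative assumptions.

# Baseline every box can execute, tuned for the most common CPU.
# (On x86, GCC spells the tuning flag -mtune; -mcpu is a deprecated alias.)
CFLAGS="-O2 -march=x86-64 -mtune=k8 -pipe"
CXXFLAGS="${CFLAGS}"

# Shared locations, NFS-mounted from the central host.
PORTDIR="/usr/portage"
DISTDIR="/mnt/shared/distfiles"
PKGDIR="/mnt/shared/packages"

# Save a binary package of everything that gets compiled.
FEATURES="buildpkg"

# Keep emerge from starving interactive users and running jobs.
PORTAGE_NICENESS="15"
```

The other hosts would then install from the shared PKGDIR with
'emerge --usepkg' instead of compiling again.
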
>>> Now I am thinking about a Gentoo installation instead.
>>>
>>> Pros:
>>> - Continuous updates, no downtime for upgrading, only when I decide to
>>> install a new kernel. This is really really cool. I fear the upgrade
>>> from Fedora 10 to 12 which has to be done soon.
>>
>> Do not upgrade, especially not with a version jump of 2 or more. If you
>> have a lot of machines, I assume you are a decent shop, and that you
>> have some form of formal process for upgrades and changes.
>
> Not really, I think. We are not very professional I must admit. We have
> two capable admins, but one is specialized in network stuff and Windows,
> the other has to do with our big Sun servers, huuge storage systems and
> such. They do not much about the Linux cluster. Another user sometimes
> installs a package on a machine, but usually I do this. For me, it is not
> my main job, I work only about ten hours per week there, mostly being some
> 100 km away.
> We are a research institute. We do neurological research, PET and MRI
> tomography. The Linux servers do number crunching, and of course they
> should work and have good uptimes, but it is not as important as if we
> were an ISP.
>
>> What you do instead is a formal migration - copy the data off,
>> reinstall, restore data.
>
> Advice noted. Yes, this sounds like the better idea, giving a cleaner
> setup. And if some things break I do not have to wonder if it was some
> strange side effect from the upgrade process.
>
>> If you can't afford to do that every six or twelve months, then
>> I have to ask - what the hell is the organization doing using a distro
>> that is unsupported after 12 months?
>
> Well, I do not think this was considered much. One machine was set up with
> Fedora for no specific reason, and we kept this distro then. This does not
> sound too professional, I know. BTW, what distro would you suggest?
|
In the times I've used it, while a bit overweight for my tastes in
server work, Ubuntu handled updates quite gracefully, but needed
reboots somewhat often. You might get the same or better out of
Debian, as it's built to be a little less desktop-centric,
while being the source of the package management that gives Ubuntu
what advantages it might have for the role.
|
>>> - Some improvement in speed. Those machines do A LOT of
>>> numbercrunching, with jobs often lasting for days, so even small
>>> improvements would be nice.
>>
>> Don't fool yourself. Unless you need what Google needs, there is very
>> little speed difference between Gentoo and Fedora. I/O improvements you
>> need can be easily gotten by fiddling the kernel tuning knobs.
>
> I know the difference will not be huge, I see this as a little bonus -
> nice if it is there, but nothing really important. But in the comparison with
> Ubuntu that came in a thread a few weeks ago, for some applications the
> speed increase was about 30 percent. Although I would not necessarily
> expect the difference to be noticeable, I would also not be surprised too
> much if it were noticeable for some number-crunching applications if they
> were optimized for the CPU.
|
Are the pieces of software you're using for the number crunching work
open source, and will you be recompiling those on Gentoo, with all the
optimizations, as well? In the long run, if they're not, you'll get
far more out of the I/O improvements Alan mentioned than you ever
would out of aggressive use of CFLAGS.
|
>>> - Easier debugging. When things do not work, I think it's easier to
>>> dig into the problem. No fancy, but sometimes buggy GUIs hiding basic
>>> functionality.
>>
>> Errrrrrrrrrrrrrrrmmmmmmmmmmmmmm, Fedora does not require a GUI :-)
>
> Right, and now that I think of it I do not use it anyway... Well, I did do
> some things with netsetup (or whatever it's called), now that I know the
> system a little better I edit things directly in /etc/sysconfig.
> But the installer is a GUI, right? And if I remember this correctly, I
> cannot even switch to a text console and do stuff there while installing.
> Or I could, but did not have utilities like LVM. Something like that. I
> have to use the installer and its capabilities.
>
>>> - Heck, Gentoo is _cooler_ than typical distributions. And emerging
>>> with distcc on about 8*4 cores would be fun :)
>>
>> Can't argue with that.
>>
>> But that is your ego talking and the machines do not belong to you but
>> to the institute. Your ego has no place in that.
>
> You're right, thanks for the reminder. But also note the smiley. I know my
> boss (who is also into geeky things) would also like this - as long as it
> would work.
|
If you've a moderately capable system sitting spare, throw VirtualBox
or similar on it and bring up a few VMs to test the setup in (since
with that, you can get away with ). My little Core 2 here can handle
3-4 VMs without fussing at all, and that's with
|
>>> - I am probably the only one who can administrate them.
>>
>> This is not a benefit. It is a severe liability.
>
> That's why I listed it also on the contra side. Forgot to add a smiley
> here, it was not meant seriously.
> But when I think about it... the others also do not know much about
> Fedora. Not even I do this well. There you use 'yum install <package>',
> with Gentoo it's 'emerge <package>'. Daily work would be similar.
> Upgrades would be a different thing, though. Gentoo's portage blockers
> would not be understood easily, they would prefer to take the servers down
> and just install the current Fedora distro. Which hopefully would work.
>
>
>>> Cons:
>>> - If something will not work with this not so common
>>> (meta)distribution, people will say "always trouble with your Gentoo
>>> Schmentoo, it works fine in Fedora". Fedora is more mainstream, if
>>> something does not work there, then it's okay for the people to accept
>>> it.
>>
>> Those same people are likely to say the same about linux vs windows.
>
> Right, but we already have Linux, and we need it for our software. Gentoo
> would not really be needed.
>
>>> - I am probably the only one who can administrate them. I think Gentoo
>>> is easier to maintain in the long run, but only when you take the time
>>> to learn it. With Fedora, you do not need much more than the 'yum
>>> install' command. There is no need to read complicated X.org upgrade
>>> guides and such.
>>>
>>> I think I already made my decision, but I am still interested in your
>>> opinions, maybe some of you are in a similar position and like to share
>>> your experiences. Whether I will be allowed to use Gentoo is another
>>> question, I guess my boss will not like my idea at first, and I am not
>>> even sure if he is right. But maybe I can test-install Gentoo on one
>>> machine in a chroot, and see if things work fine.
>>
>> Depends how critical these machines are. If you want to change them just
>> because you feel like it, then I do not see how that can possibly be a
>> valid reason.
>>
>> Remember, the institute's needs and desires trump yours every time
>
> No, it's not just because I feel like it. The main advantages would be:
> - No downtime between upgrades. Our jobs run for several days, every
> downtime has to be planned in advance. People understand this, but they do
> not like it. They would be very happy if this were no longer necessary.
> And I would not fear that during the upgrade something breaks, and it
> would take me long to fix it.
> - I know this distro well, and this is not at all true about Fedora. I
> know how to fix problems, I know how things work here. I would feel better
> with Gentoo, more competent. It just does not feel so well to administrate
> Fedora.
>
> Thanks for your opinions, Alan. As always.
>
> Wonko
|
As a final note... whatever path you take in either implementing a new
setup or just updating the old one, document it, and especially
document guides for upkeep and general maintenance. Your boss, Windows
guy, and Sun guy are going to be like fish out of water if you get this
whole thing put in place and get hit by a bus the next day. *This* is
why the "I'm the only one that can..." bit is such a dangerous thing.
It's not the fear that you'll try to use it as a bargaining chip down
the road; given that, they'd just take the hit, replace you, and then
replace the setup with a better documented one. It's the fear that
if for any reason you drop out of the picture for them, they're stuck
with the cost of doing that. Period. (This is also why, when actively
and intentionally done, it's a fireable offense in many places.)
|
--
Poison [BLX]
Joshua M. Murphy