On Sun, Nov 20, 2005 at 08:51:13PM -0500, Stéphane Lacasse wrote:
[snip discussion about installing]

I've built a cluster system (128 nodes + 1 master) in a similar fashion
to what you are after.
1. PXE-boot install environment for performing installs of both the
master and all of the nodes.
2. The install environment uses the Gentoo Installer, with the CLI
frontend I wrote for the GLI project, and performs complete installs of
nodes in under 20 minutes (depending on network traffic).

By using GLI, it's a simple matter of altering the install profiles to
reconfigure the cluster and to wipe the nodes when changing their
purpose (presently we have an MPI mode and a MOSIX mode). Some of the
cluster users need assurances that none of their data remains on the
cluster after they are done, hence the need to reinstall easily.

For regular system operation, we specifically left out boot loaders on
all machines, as we've hit cases where the MBR is in a state that just
hangs the machine instead of falling through to PXE. By always
enforcing PXE boot, and controlling how each machine boots via PXE
instead, we've had much better results.

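As a rough sketch of what controlling the boot via PXE can look like:
this assumes PXELINUX (not named above), and the kernel names, initrd
name and root device are hypothetical placeholders. The boot server
writes one config file per node, named after the node's MAC address,
and points it at either the installed system or the install
environment.

```python
# Sketch only: assumes PXELINUX. Kernel/initrd names and the root
# device are hypothetical placeholders, not the real cluster's values.


def pxelinux_filename(mac: str) -> str:
    """Per-host config name: '01-' + MAC, lowercase, dash-separated."""
    return "01-" + mac.lower().replace(":", "-")


def pxelinux_config(mode: str) -> str:
    """Return config text for a node that should either run or reinstall."""
    entries = {
        # Normal operation: kernel is fetched over the network and the
        # root filesystem is assumed to live on the local disk.
        "run": ("kernel-cluster", "root=/dev/sda3"),
        # Reinstall: boot the install environment instead.
        "install": ("kernel-install", "initrd=initrd-install.gz"),
    }
    if mode not in entries:
        raise ValueError(f"unknown mode: {mode!r}")
    kernel, append = entries[mode]
    return (
        f"DEFAULT {mode}\n"
        f"LABEL {mode}\n"
        f"  KERNEL {kernel}\n"
        f"  APPEND {append}\n"
    )


if __name__ == "__main__":
    print(pxelinux_filename("00:11:22:AA:BB:CC"))
    print(pxelinux_config("install"))
```

Switching a node between modes then only means rewriting one small
file on the boot server; nothing on the node's disk has to be touched.
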
The above design also allows re-configuring the cluster into multiple
smaller clusters with physical network separation (using VLAN-capable
switches).

Also, make use of your cluster tools to administer the cluster. OpenPBS
allows running a job on all nodes, so use it to run emerge -K [package]
(not -k, as binpkgs don't currently have any locking in $PKGDIR and can
get corrupted if two emerge processes try to create a binpkg at the
same time).

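A minimal sketch of the command each node's job would run. The helper
names are hypothetical, and how the job is submitted through OpenPBS
depends on the site's setup, so that part is deliberately not shown.
The point is only to spell out -K in its long form so that -k cannot
slip in by typo.

```python
# Sketch only: builds the emerge command line for a per-node job.
# Function names are illustrative; submission via PBS is not shown.
import shlex


def emerge_binpkg_command(packages):
    """Return argv that installs from prebuilt binary packages only."""
    if not packages:
        raise ValueError("at least one package is required")
    # --usepkgonly is the long form of -K. The lowercase -k is
    # --usepkg, the variant warned about above.
    return ["emerge", "--usepkgonly", *packages]


def as_shell_line(argv):
    """Quote each argument so the line can be pasted into a job script."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    print(as_shell_line(emerge_binpkg_command(["app-editors/vim"])))
```
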
--
Robin Hugh Johnson
E-Mail : robbat2@g.o
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85