On Thursday 25 May 2006 14:13, "Bruno Lustosa" <bruno.lists@×××××.com>
wrote about '[gentoo-user] Linux Cluster':
> - Distributed filesystem, so that all machines can share the same
> filesystem. Something like RAID-over-ethernet.

You probably want Red Hat's GFS (there are probably other cluster-aware
filesystems available for Linux that I'm not aware of) and some sort of
external storage that lets you hook two machines up to it. You might
also look into multipathing, which would help in case of a cable failure.

For maximum availability, you want your enclosure to have two SCSI disk
controllers, each with two separate SCSI ports (these ports are on
different chains). You'll hook each of the two computers in the cluster to
one port on each controller and then use multipathing to tell Linux that
both SCSI paths lead to the same device. You'll have a second external
storage enclosure connected the same way and use software mirroring across
the two. Then partition the mirror set (you could also partition at the
external storage, but then you have to update the partitions on each
storage) and lay GFS down.
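
A rough way to sanity-check the wiring described above is to model each
host-to-storage path as a set of components and knock them out one at a
time. This is only a sketch: the node and enclosure names are made up, and
I'm assuming one HBA per controller index on each host, which is just one
way the cabling could be done.

```python
from itertools import product

HOSTS = ["node1", "node2"]       # hypothetical names
ENCLOSURES = ["encA", "encB"]    # two mirrored external storage units
CONTROLLERS = [0, 1]             # two controllers per enclosure


def build_paths(hosts, enclosures, controllers):
    """Return every host-to-storage path as a frozenset of component names."""
    paths = []
    for host, enc, ctrl in product(hosts, enclosures, controllers):
        paths.append(frozenset({
            host,
            f"{host}/hba{ctrl}",                 # assumed: one HBA per controller index
            f"cable:{host}-{enc}/ctrl{ctrl}",    # the cable for this particular path
            f"{enc}/ctrl{ctrl}",
            enc,
        }))
    return paths


def storage_reachable(paths, failed):
    """True if at least one path contains no failed component."""
    return any(not (path & failed) for path in paths)


def single_points_of_failure(paths):
    """List the components whose failure alone cuts off all storage access."""
    components = set().union(*paths)
    return sorted(c for c in components if not storage_reachable(paths, {c}))


if __name__ == "__main__":
    print(single_points_of_failure(build_paths(HOSTS, ENCLOSURES, CONTROLLERS)))
    print(single_points_of_failure(build_paths(HOSTS, ["encA"], CONTROLLERS)))
```

With both enclosures the list comes back empty; with only one enclosure,
the enclosure itself shows up as the single point of failure. "Reachable"
here means some surviving host can still get to some surviving copy of the
data, which is what the mirror set buys you.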

At this point, you don't lose connectivity to your storage if a cable, an
HBA, an enclosure, a controller, or a computer goes down. Of course, the
controllers will handle RAID 5 or RAID 6, so you won't lose even a single
path in case of HD failure. GFS should allow concurrent access --
possibly even with multiple r/w mounts. ext2/3, jfs, xfs, reiserfs, and
even reiser4 are not cluster-aware, so they will only work properly in a
configuration with multiple r/o mounts *OR* a single r/w mount.
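
The mount rule above is easy to get wrong, so here it is spelled out as a
small check. It is a sketch of the rule as stated, not anything the kernel
or the filesystems enforce for you; the function name and the "ro"/"rw"
strings are my own.

```python
# Filesystems treated as cluster-aware for this check (GFS is the only one
# discussed here; add others at your own risk).
CLUSTER_AWARE = {"gfs"}


def mounts_are_safe(fs_type, mount_modes):
    """Check a set of simultaneous mounts of one shared block device.

    mount_modes is a list with one entry per node that has the device
    mounted, each entry either "ro" or "rw".
    """
    if fs_type in CLUSTER_AWARE:
        return True
    writers = mount_modes.count("rw")
    # Not cluster-aware: read-only everywhere, or exactly one mount in
    # total and that one read-write.
    return writers == 0 or (writers == 1 and len(mount_modes) == 1)


if __name__ == "__main__":
    print(mounts_are_safe("ext3", ["ro", "ro"]))
    print(mounts_are_safe("ext3", ["rw", "ro"]))
    print(mounts_are_safe("gfs", ["rw", "rw"]))
```

Note that one writer plus any number of readers is still rejected for the
non-cluster-aware filesystems: the readers have no way to learn that the
on-disk structures changed underneath them.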

> - Load balancing. Tasks should migrate between nodes.

HP's ServiceGuard for Linux is the only software I know of that will do
this (I'm *sure* there are other commercial solutions), and there is still
some small amount of downtime when a task migrates, so migrations aren't
triggered automatically.

Also, some software (IIRC, WebLogic) is able to run in a clustered
environment with some method to sync state across individual nodes
(possibly using the clustered FS), so that instead of
jobs/packages/daemons/tasks migrating, it just runs on all nodes all the
time.

The second option (a cluster-aware program) is usually preferable, because
the program itself is better at determining what state needs to be shared,
so you get less inter-node communication and less downtime in case a node
fails. *However*, an external failover/load-balancer may either be your
only solution (if you are already tied to a certain non-cluster-aware
program) or provide better behavior in case the program is buggy
(especially if its failure mode corrupts and/or brings down other nodes).

> - Redundancy, so that the death of a machine doesn't take the cluster
> or any processes down.

I believe there's a userland implementation of the CARP protocol that may
work for Linux. It allows 2 (or more) machines on the same network to
share an IP and fail over and/or load-balance the handling of packets
directed to that IP.
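
To give a feel for how the shared-IP failover behaves, here is a toy model
of the election. It is not the protocol itself (no advertisements, no
timers, no networking). It only captures the rule, as I understand CARP,
that each node advertises at an interval of advbase + advskew/256 seconds
and the live node with the shortest interval holds the address. The node
names are invented.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    advbase: int = 1     # base advertisement interval, in seconds
    advskew: int = 0     # extra delay in 1/256 s; higher means less preferred
    alive: bool = True

    @property
    def interval(self):
        """Advertisement interval in seconds, as modeled here."""
        return self.advbase + self.advskew / 256


def current_master(nodes):
    """Return the name of the live node that would hold the shared IP."""
    live = [n for n in nodes if n.alive]
    if not live:
        return None
    return min(live, key=lambda n: n.interval).name


if __name__ == "__main__":
    web1 = Node("web1", advskew=0)
    web2 = Node("web2", advskew=100)
    print(current_master([web1, web2]))   # web1 is preferred
    web1.alive = False                    # simulate web1 dying
    print(current_master([web1, web2]))   # web2 takes over the address
```

Clients keep talking to the same IP throughout; only the machine
answering for it changes.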

> So, anyone doing linux clusters?

Not personally, but I was looking into them some during my last job.
(Trying to get a customer to switch to linux.)

--
"If there's one thing we've established over the years,
it's that the vast majority of our users don't have the slightest
clue what's best for them in terms of package stability."
-- Gentoo Developer Ciaran McCreesh