Hey all,

We're seeing occasional issues with a number of machines we have in a
datacenter, most of which are currently running Gentoo. The machines will
run solid for days, weeks, even months, and then just lock up completely.
The box still pings and an nmap scan shows all the normal ports open, but
nothing responds on any port and nothing shows up in the system logs. The
times we've had console access to a machine when it happened, a login prompt
would show up, but it would just hang if you tried to log in.

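For anyone who wants to check a box in this state, a minimal sketch along these lines (Python 3; `probe_port` is a hypothetical helper, not an existing tool) separates "the port is open" from "the service is answering". It assumes a banner-first protocol such as SSH. On Linux the kernel completes the TCP handshake for a listening socket even if the process never calls accept(), so an open port in nmap mostly shows the network stack is alive, while a missing banner points at userspace being stuck.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP service as 'closed', 'open-silent', or 'responding'.

    'closed'      -- the connection could not be established (or was dropped).
    'open-silent' -- the handshake completed but no bytes arrived in time.
    'responding'  -- at least one byte (e.g. an SSH banner) was received.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "closed"
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(1)  # banner-first services send immediately
        except socket.timeout:
            return "open-silent"  # kernel accepted, application never spoke
        except OSError:
            return "closed"
    # An empty read means the peer closed the connection without sending.
    return "responding" if data else "closed"
```

Running `probe_port("10.0.0.5", 22)` (address is a placeholder) against a hung machine would be expected to return `"open-silent"`, versus `"responding"` on a healthy one.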
This generally indicates hardware issues to me, but it has been happening
across a wide array of both well-tested and new machines. In addition, it
happens on machines running Red Hat 7.1 through 9.0 as well as Gentoo. The
problem seems random, and there is almost always close to zero load on the
machine when it locks up (only once were we actively using the machine, and
it locked up while uncompressing a tar file).

The Gentoo systems use the deadline I/O scheduler because it's deemed the
most reliable, but the problem has also shown up with the default
anticipatory I/O scheduler.

The only common factor seems to be that they are all plugged into a
questionable HP ProCurve switch that we've been considering replacing.
Would that simply be a waste of time (I don't think a buggy switch should be
able to lock up boxes...)? Any recommendations for what to investigate at
this point?

Cheers,
--
Casey Allen Shobe | SeattleServer, Inc.
cshobe@×××××××××××××.com | cell 425-443-4653
http://www.seattleserver.com
--
gentoo-server@g.o mailing list