We are also seeing something similar on Dell 1750s running
recent 2.4 kernels. Our monitoring software shows that the servers
run for 1-6 months straight w/o problems and then suddenly allocate all
memory to something and hang, forcing us to use the DRACs to
reboot.

This started happening about 9 months ago; before that, the same hardware
ran for up to a year at a time w/o problems. It also seems to happen more
often on Apache/PHP/MySQL systems. Postfix and Tomcat boxes running on the
same hardware and kernels have no problems (~300 days current uptime on a
few).

Regards,
Matt

--- Original Message ---
To: gentoo-server@l.g.o
From: Sean Cook <scook@×××××.net>
Sent: 4/24/2005 1:30PM
Subject: Re: [gentoo-server] Server lockups (still ping) (OT because not Gentoo-specific?)

>> Is it a Dell 1550 by any chance?
>>
>> On Sun, 2005-04-24 at 10:43 -0400, Robert Sanders wrote:
>> > Casey,
>> >
>> > We've been seeing issues like this for probably the last year. I was
>> > never able to tie it to any specific action. We implemented remote
>> > reboot hardware and called it a day.
>> >
>> > Some of them had strange activity, but over a larger group of machines I
>> > could never find a pattern to it. It almost seems as if the machine
>> > cannot spawn any new processes.
>> >
>> > I can't help except to say you're not alone.
>> >
>> > Rob
>> >
>> > Casey Allen Shobe - SeattleServer Mailing Lists wrote:
>> > > Hey all,
>> > >
>> > > We're seeing occasional issues with a bunch of machines we have in a
>> > > datacenter, most of which are currently running Gentoo. The machines will
>> > > run solid and fine for days, weeks, even months, and then just lock up solid
>> > > - the box still pings and an nmap scan shows all the normal ports open, but
>> > > nothing responds on any port, nothing shows up in system logs, and when
>> > > we've had console access to a machine at the time, a login prompt would show
>> > > up, but it would just hang if you tried to log in.
>> > >
>> > > This generally indicates hardware issues to me, but it has been happening
>> > > across a wide array of both well-tested and new machines. In addition, it
>> > > happens on machines that are running Red Hat 7.1 through 9.0 as well as
>> > > Gentoo. The problem seems random, and there is almost always close to zero
>> > > load on the machine when it locks up (only once were we actively using the
>> > > machine, and it locked up while uncompressing a tar file).
>> > >
>> > > The Gentoo systems use the deadline I/O scheduler as it's deemed the most
>> > > reliable, but this has shown up with the default anticipatory I/O scheduler
>> > > as well.
>> > >
>> > > The only common factor seems to be that they are all plugged into a
>> > > questionable HP Procurve switch that we've been contemplating replacing.
>> > > Would that simply be wasting our time (I don't think a buggy switch should be
>> > > able to lock up boxes...)? Any recommendations for what to investigate at
>> > > this point?
>> > >
>> > > Cheers,
>> >
>>
>> --
>> gentoo-server@g.o mailing list

--
gentoo-server@g.o mailing list