Gentoo Archives: gentoo-server

From: Matthew Marlowe <matt@×××××××××××.net>
To: gentoo-server@l.g.o
Subject: Re: Re: [gentoo-server] Server lockups (still ping) (OT because not Gentoo-specific?)
Date: Mon, 25 Apr 2005 00:20:42
Message-Id: 20050425002038.2330AF5A3C@mail.deploylinux.net
1 We are also seeing something similar on Dell 1750s running
2 recent 2.4 kernels. Our monitoring software shows that the servers
3 run for 1-6 months straight w/o problem and then suddenly allocate all
4 memory to something and hang, forcing us to use the DRACs to
5 reboot.
6
7 This started happening about 9 months ago; before that, the same hardware
8 ran for up to 1 year at a time w/o problems. It also seems to happen more often
9 on apache/php/mysql systems. Postfix and tomcat boxes running on the same
10 hardware and kernels have no problems (~300 days current uptime on a few).
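
For anyone wanting to catch the "suddenly allocate all memory" moment before the hang, here is a minimal sketch of a logger that could run from cron. It is not the monitoring software mentioned above. It assumes /proc/meminfo-style "Name: value kB" lines; the function names and the 5% threshold are made up for illustration.

```python
# Sketch only: parse /proc/meminfo-style text and flag low memory.
# Function names and the 5% default threshold are illustrative assumptions.

def parse_meminfo(text):
    """Return {field: kB} for lines of the form 'Name:   1234 kB'."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip header lines and anything without a leading integer value.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(info):
    """Free + buffers + page cache, as a fraction of total RAM."""
    reclaimable = (info.get("MemFree", 0)
                   + info.get("Buffers", 0)
                   + info.get("Cached", 0))
    return reclaimable / info["MemTotal"]


def is_low(info, threshold=0.05):
    """True when less than `threshold` of RAM is free or reclaimable."""
    return available_fraction(info) < threshold


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Logging `available_fraction(read_meminfo())` once a minute, together with a process listing whenever `is_low()` fires, might at least show which process is eating the memory on the apache/php/mysql boxes before they stop responding.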
11
12 Regards,
13 Matt
14
15 --- Original Message---
16 To: gentoo-server@l.g.o
17 From: Sean Cook <scook@×××××.net>
18 Sent: 4/24/2005 1:30PM
19 Subject: Re: [gentoo-server] Server lockups (still ping) (OT because not Gentoo-specific?)
20
21 >> Is it a dell 1550 by any chance?
22 >>
23 >> On Sun, 2005-04-24 at 10:43 -0400, Robert Sanders wrote:
24 >> > Casey,
25 >> >
26 >> > We've been seeing issues like this for probably the last year. I was
27 >> > never able to pinpoint it to any action. We implemented remote reboot
28 >> > hardware and called it a day.
29 >> >
30 >> > Some of them had strange activity, but over a larger group of machines I
31 >> > could never find a pattern to it. It almost seems as if it cannot spawn
32 >> > any new processes.
33 >> >
34 >> > I can't help except to say you're not alone.
35 >> >
36 >> > Rob
37 >> >
38 >> > Casey Allen Shobe - SeattleServer Mailing Lists wrote:
39 >> > > Hey all,
40 >> > >
41 >> > > We're seeing occasional issues with a bunch of machines we have in a
42 >> > > datacenter, most of which are currently running Gentoo. The machines will
43 >> > > run solid and fine for days, weeks, even months, and then just lock up solid
44 >> > > - the box still pings and an nmap scan shows all the normal ports open, but
45 >> > > nothing responds on any port, nothing shows up in system logs, and the times
46 >> > > we've had console access to a machine at the time, a login prompt would show
47 >> > > up, but it would just hang if you tried to log in.
48 >> > >
49 >> > > This generally indicates hardware issues to me, but it has been happening
50 >> > > across a wide array of both well-tested and new machines. In addition, it
51 >> > > happens on machines that are running Red Hat 7.1 through 9.0 as well as
52 >> > > Gentoo. The problem seems random, and there is almost always close to zero
53 >> > > load on the machine when it locks up (only once were we actively using the
54 >> > > machine, and it locked up while uncompressing a tar file).
55 >> > >
56 >> > > The Gentoo systems use the deadline I/O scheduler as it's deemed the most
57 >> > > reliable, but this has shown up with the default anticipatory I/O scheduler
58 >> > > as well.
59 >> > >
60 >> > > The only common factor seems to be that they are all plugged into a
61 >> > > questionable HP Procurve switch that we've been contemplating replacing.
62 >> > > Would that simply be wasting our time (I don't think a buggy switch should be
63 >> > > able to lock up boxes...)? Any recommendations for what to investigate at
64 >> > > this point?
65 >> > >
66 >> > > Cheers,
67 >> >
68 >>
69 >> --
70 >> gentoo-server@g.o mailing list
75
76 --
77 gentoo-server@g.o mailing list
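
Since the symptom described in this thread is a box that still pings and still shows open ports while no service answers, a ping or port-scan check will not notice the lockup. Below is a rough sketch of a check that only passes when the service sends data back. The function name, the timeout, and the optional `probe` argument are assumptions for illustration, not part of any existing monitoring tool.

```python
import socket


def service_responds(host, port, timeout=5.0, probe=b""):
    """Return True only if the service sends at least one byte back.

    A successful TCP connect is not enough: the kernel can complete the
    handshake even when the userland daemon is hung, which matches the
    "port open but nothing responds" symptom.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            if probe:
                # Needed for protocols where the client speaks first (e.g. HTTP).
                sock.sendall(probe)
            # b"" means the peer closed without answering.
            return len(sock.recv(1)) > 0
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For a service that greets first (SSH, SMTP), `service_responds(host, 22)` is enough; for HTTP one would pass something like `probe=b"HEAD / HTTP/1.0\r\n\r\n"`. A failing check combined with a successful ping would be the cue to trigger the remote reboot hardware.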