Gentoo Archives: gentoo-server

From: fire-eyes <sgtphou@×××××××××.org>
To: gentoo-server@l.g.o
Subject: [gentoo-server] Server goes down twice in two days, looking for input
Date: Thu, 22 Sep 2005 14:36:53
Message-Id: 4332C0DE.5060503@fire-eyes.org
Hi there!

I have had our Gentoo server go down twice in under two days. I am
currently trying to figure out what is happening.

Facts:
- Dual PIII 933 MHz system (ServerWorks OSB4)
- 3.5GB RAM
- 2.6.11.2-grsec-20050614 kernel (self rolled)
- SCSI: Adaptec AIC-7892P, 32MB cache
  + Disks
    + For Operating System
      - 2x IBM DDYS-T09170N SCSI U160 10KRPM 9.1GB in a RAID1, 1x of the
        same for hotspare
    + For storage etc
      - 3x IBM IC35L036UWD210-0 SCSI U160 10KRPM
      - 1x IBM DDYS-T36950N SCSI U160 10KRPM
      - In a RAID5

Tuesday afternoon, I was informed that there might be problems with this
server. I had just been working on it via shell. I went back, and found
it unresponsive.

I went into the server room, only to catch it finishing a reboot and
almost totally back up. It behaved the rest of the day. I was not able
to find any indications of problems in the logs.

Wednesday evening, I was again working on the system via ssh, and it
stopped responding. I got into the server room fast enough this time. I
tried to log in as root, and could not. I could type the username, but
upon hitting enter, nothing happened. That was true for any console.

I have syslogd output *.* to console 10, so flipping over there, I saw
nothing out of the ordinary. The last log entry, from the time I noticed
it stop responding, was a simple run-of-the-mill firewall log.

After a few more minutes, the system was completely unresponsive, save
for SysRq. I Synced, tErmed, Synced again, remounted everything
read-only and forced it to reboot.

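For reference, the SysRq sequence described above (sync, term, sync,
remount read-only, reboot) corresponds to the command letters s, e, s, u,
b. Below is a minimal sketch of the same sequence driven through
/proc/sysrq-trigger instead of the keyboard. It assumes a kernel built
with CONFIG_MAGIC_SYSRQ and a root shell that still responds, which was
not the case in this incident. The names sysrq, emergency_reboot and
DRY_RUN are made up for illustration, and DRY_RUN defaults to 1 so that
running the script as-is only prints what it would do.

```shell
#!/bin/sh
# Sketch only: mirror the keyboard sequence Alt+SysRq+S, E, S, U, B
# by writing command letters to /proc/sysrq-trigger (requires root).
# DRY_RUN is a hypothetical switch; leave it at 1 to print the plan only.
DRY_RUN="${DRY_RUN:-1}"

sysrq() {
    # $1: single SysRq command letter, $2: human-readable description
    if [ "$DRY_RUN" = "1" ]; then
        echo "would send '$1' ($2)"
    else
        echo "$1" > /proc/sysrq-trigger
        sleep 2   # give the kernel a moment before the next step
    fi
}

emergency_reboot() {
    sysrq s "sync"
    sysrq e "SIGTERM to all processes except init"
    sysrq s "sync"
    sysrq u "remount filesystems read-only"
    sysrq b "reboot immediately"
}

emergency_reboot
```

Setting DRY_RUN=0 would actually reboot the machine, so this is meant
as documentation of the sequence rather than something to run casually.
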
Again I was not able to find any logs indicating any errors at all.

The only two possibilities I see are these. The first is that I was
goofing with samba at various points, both days. However, samba was not
running at either time the system went down.

The other, more interesting one, is that at both times when the system
went down, I was creating a tar.bz2 out of a kernel source tree. The
problems happened well after I had started them.

Wondering about disks, I threw smartctl -a at both of the arrays (sda,
sdb), which didn't give anything out of the ordinary.

However, when I run smartctl -t offline or -t short or -t long on sda or
sdb, it immediately reports a failure on stdout. This I find odd, because
I have done these tests in the past. Granted, it was on a different
kernel, which I no longer have around.

Here is an example:

# smartctl -t short /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short Background Self Test Failed

Looking at the logs, including dmesg, I don't see anything strange.

I am worried by the smartctl results; however, I realize there is a small
possibility that it's due to kernel changes.

Any ideas out there? Thank you for reading this! I *LOVE* Gentoo in
production.
--
gentoo-server@g.o mailing list
