Gentoo Archives: gentoo-user

From: Dale <rdalek1967@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] spontaneous reboots.. what to look for
Date: Mon, 16 Feb 2009 00:27:50
Message-Id: 4998B2AF.4070509@gmail.com
In Reply to: Re: [gentoo-user] spontaneous reboots.. what to look for by Mark Knecht
1 Mark Knecht wrote:
2 > On Sun, Feb 15, 2009 at 3:42 PM, Harry Putnam <reader@×××××××.com> wrote:
3 >
4 >> I've been experiencing spontaneous reboots on one gentoo machine
5 >> lately. Looking thru /var/log/messages... I see the restarts but
6 >> looking above that... I'm not seeing anything I recognize as being a
7 >> culprit.
8 >>
9 >> It's been happening for a few weeks... but I've been busy and am only now
10 >> digging into it (the machine is no kind of server).
11 >>
12 >> It appears to only happen in X (I'm using xfce4) and I've only noticed
13 >> it since I started running 2.6.28 kernels. Although I couldn't say
14 >> that it seemed to be directly related.
15 >>
16 >> I mean I didn't boot into 2.6.28 and suddenly notice spontaneous
17 >> rebooting.
18 >>
19 >> It does not appear to be heat related... but I am only now using
20 >> lm_sensors to keep an accurate record and see if there appears to be a
21 >> relationship.
22 >>
23 >> I've had two today so either it's happening more often or I'm just
24 >> spending more time on that machine.
25 >>
26 >> It may also be only the first or second time it's happened while I was
27 >> actually right at the keyboard.
28 >>
29 >> I'm sorry to be so vague about it, but in truth, I've been pretty lazy
30 >> about it... since no real harm comes of an unexpected reboot on that
31 >> machine (so far anyway). But clearly something that has to be figured
32 >> out.
33 >>
34 >> The only things I've checked so far...
35 >> 1) browsing thru /var/log/messages (Having trouble recognizing
36 >> anything that looks suspicious).
37 >>
38 >> I have noticed what appears to be a time/date anomaly where the
39 >> progression of time is suddenly irregular. That is, an earlier
40 >> time shows up amongst some later times.
41 >>
42 >> It appears to have been me sudoing to visudo. And apparently
43 >> having /etc/sudoers open long enough for the closing of it to be
44 >> earlier than other events taking place.
45 >>
46 >> Again ... I'm not real sure exactly what happened there but it
47 >> does not appear to coincide with a reboot anyway.
48 >>
49 >> 2) checking how hot the cpu is getting (Doesn't appear to be a
50 >> problem) But now running a cron job recording temperatures every 10
51 >> minutes. So that may turn up something.
52 >>
53 >> 3) checking for overfilled disks. (none show in df -h)
54 >>
55 >>
56 >
57 > Reseat memory and PCI cards, etc. Consider removing for a period of
58 > time any hardware not absolutely necessary to debug the problem. (I.e.
59 > - second video card, extra disk drives, extra network adapters, etc.)
60 > Run memtest86 for a few days if you can spare the machine. Run
61 > spinrite, etc., to look for drive problems. Open the box up and place
62 > a fan blowing extra air for additional cooling.
63 >
64 > good luck,
65 > Mark
66 >
67 >
68 >
69
70 To add another test. I had this issue once before and it was a faulty
71 driver for my hard drives. I ran a command like this to test mine:
72
73 hdparm -Tt /dev/hda && hdparm -Tt /dev/hda && hdparm -Tt /dev/hda &&
74 hdparm -Tt /dev/hda && hdparm -Tt /dev/hda
75
76 If it can pass that then it should be all right and you can look
77 elsewhere. Mine would only fail when the drives were very busy, and that
78 test should keep them busy pretty well.
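
The chained command above can also be written as a small loop, which
makes it easier to change the number of passes. This is only a sketch:
the function name repeat_test is made up for illustration, and it
assumes a POSIX shell. Like the && chain, it stops at the first run
that fails.

```shell
#!/bin/sh
# repeat_test (hypothetical helper): run a command COUNT times in a row,
# stopping at the first failure, just as a chain of && would.
# Usage: repeat_test COUNT command [args...]
repeat_test() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        # Run the command exactly as given; bail out if it fails.
        if ! "$@"; then
            printf 'run %s of %s failed\n' "$i" "$count" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

With the function defined, the five back-to-back runs become (as root,
since hdparm needs raw access to the device):

    repeat_test 5 hdparm -Tt /dev/hda
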
79
80 Hope that helps.
81
82 Dale
83
84 :-) :-)