Gentoo Archives: gentoo-user

From: Mark Knecht <markknecht@×××××.com>
To: gentoo-user@l.g.o
Cc: Dale <rdalek1967@×××××.com>
Subject: Re: [gentoo-user] Random reboots. Where to start?
Date: Fri, 25 Feb 2011 18:08:17
Message-Id: AANLkTimkvV527Nerz-4=WzHDxSD1zewJfoj8HUmbmAUy@mail.gmail.com
In Reply to: [gentoo-user] Random reboots. Where to start? by Dale
On Fri, Feb 25, 2011 at 7:33 AM, Dale <rdalek1967@×××××.com> wrote:
> Well, I think my machine is possessed or something.  I'm getting random
> reboots here.  When it does this, it is like hitting the reset button.  It
> is sitting on the grub screen when it does this.  I noticed the first time
> the other day and this was before adding the extra memory.  I seemed to be
> stable at 4GB but I seem to be rebooting at random.  I ran memtest
> yesterday and it checked fine.  It didn't find an error but it looked like it
> was only testing part of it.  Memtest recognized all 16GB on the last run
> but it didn't seem to be testing it all.  Is there a trick to getting it to
> test the whole thing?
>
> These are the last few lines from messages before the reboot:
>
> Feb 25 05:10:01 localhost cron[5697]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:14:47 localhost smartd[3902]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 112
> Feb 25 05:14:47 localhost smartd[3902]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 80 to 78
> Feb 25 05:14:47 localhost smartd[3902]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 75 to 74
> Feb 25 05:20:01 localhost cron[5850]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:30:01 localhost cron[5994]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:40:01 localhost cron[6136]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:41:49 localhost uptimed: moving up to position 20: 0 days, 01:27:23
> Feb 25 05:44:47 localhost smartd[3902]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 78 to 77
> Feb 25 05:50:01 localhost cron[6284]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:59:01 localhost cron[6413]: (root) CMD (rm -f /var/spool/cron/lastrun/cron.hourly)
> Feb 25 06:00:01 localhost cron[6429]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:10:01 localhost cron[6573]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:14:47 localhost smartd[3902]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 77 to 76
> Feb 25 06:20:01 localhost cron[6722]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:30:01 localhost cron[6865]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:40:01 localhost cron[7008]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:50:01 localhost cron[7156]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:59:01 localhost cron[7286]: (root) CMD (rm -f /var/spool/cron/lastrun/cron.hourly)
> Feb 25 07:00:01 localhost cron[7301]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:10:01 localhost cron[7444]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:20:01 localhost cron[7592]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:30:01 localhost cron[7741]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:40:01 localhost cron[7884]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:42:49 localhost uptimed: moving up to position 19: 0 days, 03:28:23
> Feb 25 07:50:01 localhost cron[8032]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
>
> I don't see anything out of the norm, do you?  What else should I check?  I
> have a Gigabyte mobo, anything in the BIOS I should check?  After I added
> the last two sticks of ram, I loaded the optimized settings.  No
> overclocking or anything here.
>
> It does this while logged into KDE and after running a while.  I have shut
> down folding and the CPU is running below 85F and all the fans are running
> fine.  I don't think this could be a heat issue.  It's a Cooler Master
> HAF 932 case with lots of cooling.
>
> I'm going to reboot and let memtest run a while and see exactly what it was
> that makes me think it is not testing ALL the memory.
>
> Thanks.
>
> Dale
>
> :-)  :-)
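The two uptimed entries in the quoted log let you cross-check when that session started. Subtract the reported uptime from each timestamp and both give 04:14:26 (05:41:49 minus 01:27:23, and 07:42:49 minus 03:28:23). The Python sketch below does that arithmetic. It assumes the uptimed line format shown in the log, with each entry joined onto one line. Syslog timestamps carry no year, so the `year` parameter is an assumption the caller supplies. `boot_times` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

# Matches uptimed lines such as:
# "Feb 25 05:41:49 localhost uptimed: moving up to position 20: 0 days, 01:27:23"
UPTIMED_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ uptimed: .*?"
    r"(?P<days>\d+) days?, (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})\s*$"
)


def boot_times(lines, year=2011):
    """Return the boot time implied by each uptimed line (timestamp minus uptime)."""
    result = []
    for line in lines:
        m = UPTIMED_RE.match(line)
        if not m:
            continue  # cron, smartd, etc.: not an uptimed entry
        # syslog omits the year, so it is supplied by the caller (assumption)
        stamp = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        uptime = timedelta(days=int(m["days"]), hours=int(m["h"]),
                           minutes=int(m["m"]), seconds=int(m["s"]))
        result.append(stamp - uptime)
    return result


if __name__ == "__main__":
    log = [
        "Feb 25 05:41:49 localhost uptimed: moving up to position 20: 0 days, 01:27:23",
        "Feb 25 07:42:49 localhost uptimed: moving up to position 19: 0 days, 03:28:23",
    ]
    for t in boot_times(log):
        print(t)  # prints 2011-02-25 04:14:26 for both lines
```

If the implied boot times from one log disagree, there was a restart in between that the log lines around it may help place.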

Is folding pretty CPU intensive? If it is, shut it off completely until
you find the root cause. Extra CPU heat raises temperatures all through
the machine, which matters if, for example, you have a broken trace
somewhere that only comes apart when the motherboard heats up.
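You can check whether heat lines up with the reboots by logging temperatures to disk every minute or so. The last line written before a reset then shows how hot things were. The Python sketch below assumes a Linux kernel that exposes sensors as /sys/class/thermal/thermal_zone*/temp, in millidegrees Celsius. Not every board exposes useful zones this way, and lm_sensors may be needed instead. The log path and the 60-second interval are arbitrary, hypothetical choices.

```python
import glob
import os
import time


def millidegrees_to_celsius(raw):
    """Thermal zone 'temp' files hold an integer in millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def read_zone_temps(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {path: degrees C} for every thermal zone that can be read."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = millidegrees_to_celsius(f.read())
        except (OSError, ValueError):
            continue  # some zones refuse reads or return junk; skip them
    return temps


def log_forever(logfile="/var/tmp/templog.txt", interval=60):
    """Append one line of readings per interval until interrupted."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        fields = " ".join(
            f"{c:.1f}C/{celsius_to_fahrenheit(c):.1f}F"
            for c in read_zone_temps().values()
        )
        with open(logfile, "a") as f:
            f.write(f"{stamp} {fields}\n")
            f.flush()
            os.fsync(f.fileno())  # push the line to disk before a possible hard reset
        time.sleep(interval)
```

Call `log_forever()` from a small script left running in the background. The fsync costs little at one write per minute, and it keeps the last reading from sitting in the page cache when the machine resets.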

The order I walk through this sort of problem is:

1) Google, Google, Google for your exact hardware, looking for similar
problems (and hopefully solutions...). The main culprits are
generally:
- Motherboard
- Power supply
- VGA (graphics card)

2) This is unlikely to matter if this is your new machine, but use some
canned air to blow out any heat sinks that have collected dust.

3) Remove _ALL_ adapter cards and any external devices that you don't
absolutely need for testing. Run for a number of hours or days.
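On an unattended run of hours or days, you can tell whether the box restarted by recording wall-clock time and uptime periodically, for example from cron. Each sample implies a boot time of wall clock minus uptime. That value stays constant while the machine is up and jumps forward after a restart. The Python sketch below assumes Linux's /proc/uptime, whose first field is seconds since boot. The log file and helper names are hypothetical. The 5-second tolerance is also an assumption: it absorbs rounding jitter, but a large NTP clock step could still produce a false positive.

```python
import time


def read_proc_uptime(path="/proc/uptime"):
    """The first field of /proc/uptime is seconds since boot (Linux)."""
    with open(path) as f:
        return float(f.read().split()[0])


def append_sample(logfile):
    """Record 'wall_clock uptime'; meant to be run periodically, e.g. from cron."""
    with open(logfile, "a") as f:
        f.write(f"{time.time():.0f} {read_proc_uptime():.0f}\n")


def load_samples(logfile):
    """Read back (wall_clock, uptime) pairs written by append_sample."""
    with open(logfile) as f:
        return [tuple(float(x) for x in line.split()) for line in f if line.strip()]


def find_reboots(samples, tolerance=5.0):
    """Return the implied boot times of restarts seen in (wall, uptime) samples.

    While the machine stays up, wall - uptime is constant apart from clock
    jitter. If it jumps forward by more than `tolerance` seconds, the box
    restarted somewhere between those two samples.
    """
    reboots = []
    prev_boot = None
    for wall, up in samples:
        boot = wall - up
        if prev_boot is not None and boot - prev_boot > tolerance:
            reboots.append(boot)
        prev_boot = boot
    return reboots
```

Compared with watching the screen, this records every restart, including one that happens overnight and lands back at the grub screen.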

If you are still rebooting, then consider changing your power supply
first. What sort of supply are you using now? Does it have _more_ than
enough power for your machine?
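To put a number on "more than enough power", add up rough per-component draws and compare the total, plus a margin, against the supply's rating. The Python sketch below does that back-of-envelope check. The wattage figures in the example are hypothetical placeholders, not measurements of any real parts. The 30% default margin is an assumed rule of thumb, not a specification.

```python
def required_watts(loads, margin=0.3):
    """Sum estimated component draws (watts) and add a safety margin."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return sum(loads.values()) * (1 + margin)


def has_headroom(psu_rating, loads, margin=0.3):
    """True if the supply's label rating covers the estimated load plus margin."""
    return psu_rating >= required_watts(loads, margin)


if __name__ == "__main__":
    # Hypothetical figures only; look up the ratings of your own parts.
    example_loads = {"cpu": 125, "gpu": 150, "board_and_ram": 60, "drives": 30, "fans": 15}
    print(f"estimated need: {required_watts(example_loads):.0f} W")
    print("500 W supply OK?", has_headroom(500, example_loads))
```

A total-wattage check is only a first pass. Supply labels also list per-rail current limits, and the +12V rail in particular can run short even when the overall rating looks fine.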

I hope you find it soon. This can be very frustrating. (From experience...)

Good luck,
Mark