Rich Freeman wrote:
> On Sat, Dec 15, 2018 at 10:54 PM Dale <rdalek1967@×××××.com> wrote:
>> I checked the messages log. Before with the memory hogging Dolphin it
>> had logged the problem. Today, it shows this:
>>
>> Dec 15 20:40:01 fireball CROND[30668]: (root) CMD (/usr/lib64/sa/sa1 1 1)
>> Dec 15 20:50:01 fireball CROND[1532]: (root) CMD (/usr/lib64/sa/sa1 1 1)
>> Dec 15 21:00:01 fireball CROND[5513]: (root) CMD (/usr/lib64/sa/sa1 1 1)
>> Dec 15 21:01:01 fireball CROND[5718]: (root) CMD (run-parts /etc/cron.hourly)
>> Dec 15 21:08:34 fireball syslog-ng[4370]: syslog-ng starting up; version='3.17.2'
>> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: *** info [daemon/startup.c(136)]:
>> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: Started gpm successfully. Entered daemon mode.
>>
>> As you can see, it went from running a normal cron job to me booting
>> back up. I don't see any error at all. Not even one electron.
> This is pretty typical if you aren't taking special steps to log this
> sort of thing. There are a couple of ways the kernel can crash:
>
> 1. OOPS/BUG - these are semi-recoverable errors. I believe they can
> get logged unless they occur in a manner that disrupts your userspace
> logger, vfs, filesystem, or disk. If the error happens in one of
> those subsystems then your filesystems will stop syncing and it won't
> be logged normally.
>
> 2. PANIC - these are unrecoverable and are NOT logged. When the
> kernel PANICs it halts all disk IO and just about everything else.
> This is to prevent damage to anything already written on disk. You
> don't want a corrupt OS trying to write to your disk - that makes a
> bad situation MUCH worse. It would be like sending a drunk surgeon
> into the operating room to fix up a trauma patient.
>
> 3. Hardware reset. This isn't a kernel issue but I'll toss it in.
> If your CPU gets a reset signal it forgets it was ever running Linux
> and starts executing the firmware as if it had been freshly powered
> on. There is no opportunity to capture anything. The only way to log
> something like this is hardware-level monitoring.
>
> Issues #1-2 CAN be logged, but not conventionally. There are a few
> routes for this:
>
> 1. Remote console logging. Serial and network are the two main
> options for this. If you have a hardware serial port you can capture
> its output and any kernel errors will be output to it (just the
> text/backtrace/etc). A network console is very easy to set up if you
> have a remote host that can run netcat on the same LAN:
> https://www.kernel.org/doc/Documentation/networking/netconsole.txt
>
> 2. Recovery kernel. Gentoo doesn't have tooling for this but you can
> follow https://wiki.gentoo.org/wiki/Kernel_Crash_Dumps . Disclaimer -
> I haven't done this in ages so it could be dated in parts. If the
> kernel panics then it will run the recovery kernel, which boots in a
> clean state and dumps the old kernel's RAM to disk for subsequent
> analysis.
>
> #1 gets the job done most of the time, but #2 is more thorough. If
> you have a host that is tending to reset you should consider network
> logging as a starting point - it is easy to set up.
>
> I'm not sure why your UPS display is coming on. It might be some kind
> of spurious data on the USB port if it is connected. It might be a
> result of something the PC is doing. It is also possible it is due to
> a brownout or other power issues going into your house, but if your
> UPS is in good shape and not overloaded then it should be shielding
> your PC from the effects of these. A PC power supply issue sounds
> plausible. I've had my CP UPS flicker its display and a light might
> flicker a bit at the same time, but the PC was unaffected. I'll also
> note that these kinds of transient issues are often mitigated by
> having a good quality PC power supply that is not overloaded, and that
> this probably also helps with any latency in the UPS switching in. If
> your PC power supply is strained to the point of breaking then any
> transients in the input supply are going to get through to the output
> rails. This is one of those areas where spending an extra $30 on your
> build can make a significant difference.
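
For the network console route mentioned above, the receiving end does not have to be netcat. Here is a minimal Python sketch of a listener for the remote host, not a tested setup. It assumes the crashing machine has netconsole pointed at this host on UDP port 6666 (the default target port as I understand the linked kernel documentation); the file name netconsole.log and the function names are just placeholders I made up.

```python
import socket
from datetime import datetime, timezone


def format_record(payload: bytes, sender_ip: str, received_at: datetime) -> str:
    """Turn one received UDP datagram into a single timestamped log line."""
    # Kernel messages are normally ASCII; replace anything undecodable
    # instead of crashing the logger on a garbled packet.
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    stamp = received_at.isoformat(timespec="seconds")
    return f"{stamp} {sender_ip} {text}\n"


def listen(bind_ip: str = "0.0.0.0", port: int = 6666,
           log_path: str = "netconsole.log") -> None:
    """Append every datagram received on the port to log_path, forever.

    The port and log path are assumptions; match the port to whatever
    the sending machine's netconsole setting actually uses.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(log_path, "a", encoding="utf-8") as log:
        sock.bind((bind_ip, port))
        while True:
            payload, (sender_ip, _sender_port) = sock.recvfrom(65535)
            log.write(format_record(payload, sender_ip,
                                    datetime.now(timezone.utc)))
            # Flush after each message so the last lines before a crash
            # are not left sitting in a buffer.
            log.flush()


if __name__ == "__main__":
    listen()
```

Left running on another box on the same LAN, this should capture whatever the kernel manages to print before an OOPS or PANIC takes the machine down. It will not help with a hardware reset or a plain loss of power, since nothing gets sent in those cases.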

I've seen kernel panics in the past. Keep in mind, different panics can
behave differently, but in the past I got a console-type screen with
some weird error messages. Those are what I usually see. This time,
though, it was as if the power button was pushed and held down. The
system didn't reboot, it powered off. I was asleep but it did beep,
which is what woke me up. Generally in the past when I've seen something
like this, it either goes to the console and sits there until I hit
reset or just reboots. This is the first time I've seen my system power
off like this. This is what has me curious.

My BIOS is set to remain off in the event of a power failure. A power
glitch or even a short-term outage shouldn't reach the PC because of the
UPS, but if power does fail and the machine shuts off, it is set to stay
off. This is what makes me think power supply. It's not that old but
that doesn't rule it out either. I've read about units that were bad out
of the box before. Thing is, that is what it sort of acts like.

My power supply is a 650 watt unit. It can power over twice what I
pull. When I built this rig, it could power up to three times what I
pull. Keep in mind, I'm measuring not only the puter but also the
monitor, speakers, modem and router as well. That wattage is from the
UPS itself. I try to allow for a lot of headroom power-wise to
compensate for that turn-on surge when several drives and fans are
spinning up. I've got five hard drives, three 230 mm fans and a 140 mm
fan just for the case. Then comes the CPU etc. I haven't calculated
the surge or anything but I figure it is a good bit more than what it
pulls when already running.
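
The headroom arithmetic above can be written down as a rough Python sketch. The 650 W rating is the real figure; the 300 W UPS reading is only a placeholder, not an actual measurement, and the function name is made up for illustration.

```python
def headroom_ratio(psu_rated_watts: float, measured_watts: float) -> float:
    """Return how many times the measured draw fits into the PSU rating."""
    if measured_watts <= 0:
        raise ValueError("measured draw must be a positive number of watts")
    return psu_rated_watts / measured_watts


PSU_RATED_WATTS = 650.0    # rating of the power supply
UPS_READING_WATTS = 300.0  # HYPOTHETICAL steady-state reading from the UPS

if __name__ == "__main__":
    ratio = headroom_ratio(PSU_RATED_WATTS, UPS_READING_WATTS)
    print(f"PSU rating is {ratio:.2f}x the measured draw")
```

Since the UPS reading is taken at the wall and also covers the monitor, speakers, modem and router, the real margin on the power supply itself should be somewhat larger than this ratio suggests. It only describes steady-state draw and says nothing about the turn-on surge.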

It may be that this has to happen a few times to see if anything can be
narrowed down. Maybe it will do it while I'm sitting at it next time
and I can see from start to finish what it is doing. May help, may
not. One reason for the thread, tips on what to look for. A good tip
could come in handy. ;-) Plus, I thought there may be another log I
wasn't aware of to look at.

Thanks. Gives me things to think on.

Dale

:-) :-)