On Sat, Dec 15, 2018 at 10:54 PM Dale <rdalek1967@×××××.com> wrote:
>
> I checked the messages log. Before with the memory hogging Dolphin it
> had logged the problem. Today, it shows this:
>
>
> Dec 15 20:40:01 fireball CROND[30668]: (root) CMD (/usr/lib64/sa/sa1 1 1)
> Dec 15 20:50:01 fireball CROND[1532]: (root) CMD (/usr/lib64/sa/sa1 1 1)
> Dec 15 21:00:01 fireball CROND[5513]: (root) CMD (/usr/lib64/sa/sa1 1 1)
> Dec 15 21:01:01 fireball CROND[5718]: (root) CMD (run-parts
> /etc/cron.hourly)
> Dec 15 21:08:34 fireball syslog-ng[4370]: syslog-ng starting up;
> version='3.17.2'
> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: *** info
> [daemon/startup.c(136)]:
> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: Started gpm successfully.
> Entered daemon mode.
>
>
> As you can see, it went from running a normal cron job to me booting
> back up. I don't see any error at all. Not even one electron.

This is pretty typical if you aren't taking special steps to log this
sort of thing. There are a couple of ways the kernel can crash:

1. OOPS/BUG - these are semi-recoverable errors. I believe they can
get logged unless they occur in a manner that disrupts your userspace
logger, VFS, filesystem, or disk. If the error happens in one of
those subsystems then your filesystems will stop syncing and it won't
be logged normally.

2. PANIC - these are unrecoverable and are NOT logged to disk. When
the kernel PANICs it halts all disk IO and just about everything else.
This is to prevent damage to anything already written on disk. You
don't want a corrupt OS trying to write to your disk - that makes a
bad situation MUCH worse. It would be like sending a drunk surgeon
into the operating room to fix up a trauma patient.

3. Hardware reset. This isn't a kernel issue but I'll toss it in.
If your CPU gets a reset signal it forgets it was ever running Linux
and starts executing the firmware as if it had been freshly powered
on. There is no opportunity to capture anything. The only way to log
something like this is hardware-level monitoring.

Issues #1-2 CAN be logged, but not conventionally. There are a few
routes for this:

1. Remote console logging. Serial and network are the two main
options for this. If you have a hardware serial port you can capture
its output, and any kernel errors will be written to it (just the
text/backtrace/etc). A network console is very easy to set up if you
have a remote host that can run netcat on the same LAN:
https://www.kernel.org/doc/Documentation/networking/netconsole.txt

2. Recovery kernel. Gentoo doesn't have tooling for this but you can
follow https://wiki.gentoo.org/wiki/Kernel_Crash_Dumps . Disclaimer -
I haven't done this in ages so it could be dated in parts. If the
kernel panics then it will run the recovery kernel, which boots in a
clean state and dumps the old kernel's RAM to disk for subsequent
analysis.
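
If you go the recovery kernel route, a quick sanity check before you
wait for the next crash might look like the sketch below. It assumes
the usual kexec-based setup, where memory is reserved with a
crashkernel= boot parameter and the recovery kernel is loaded with
kexec -p. The function name crash_dump_status is made up for
illustration, and this only checks those two prerequisites, not that a
dump will actually be written.

```python
from pathlib import Path


def crash_dump_status(cmdline_path="/proc/cmdline",
                      loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Report whether the two usual crash-dump prerequisites look met.

    Hypothetical helper: the paths are parameters so it can be pointed
    at test files instead of the live system.
    """
    # crashkernel=... on the boot command line reserves RAM for the
    # recovery kernel; without it there is nowhere to load that kernel.
    args = Path(cmdline_path).read_text().split()
    reserved = any(arg.startswith("crashkernel=") for arg in args)

    # The kernel reports "1" here once a recovery kernel has been loaded.
    try:
        loaded = Path(loaded_path).read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: most likely no kexec support.
        loaded = False

    return {"memory_reserved": reserved, "crash_kernel_loaded": loaded}
```

Calling crash_dump_status() with no arguments inspects the running
system. If either value comes back False, a panic will most likely just
hang or reboot without leaving a dump behind.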

#1 gets the job done most of the time, but #2 is more thorough. If
you have a host that tends to reset you should consider network
logging as a starting point - it is easy to set up.
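
To give a rough idea of what the receiving end involves, here is a
minimal Python stand-in for the netcat listener, assuming the remote
host has Python 3. The names netconsole_param, log_console and listen
are made up for this sketch, and eth0, 192.168.1.10 and port 6666 are
placeholders for your own interface, log host and port. The parameter
string follows the format in the kernel's netconsole documentation,
with the optional source and MAC fields left blank, so on the crashing
machine you would load the module with something like
modprobe netconsole netconsole=@/eth0,6666@192.168.1.10/

```python
import socket


def netconsole_param(target_ip, target_port=6666, dev="eth0"):
    """Build a netconsole= value for the machine that is crashing.

    Format from the kernel docs:
      [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    Source port, source IP and target MAC are left empty here so the
    kernel falls back to its defaults.
    """
    return f"@/{dev},{target_port}@{target_ip}/"


def log_console(sock, logfile, max_packets=None):
    """Append each UDP datagram received on sock to logfile."""
    received = 0
    while max_packets is None or received < max_packets:
        data, _sender = sock.recvfrom(65535)
        logfile.write(data.decode("utf-8", errors="replace"))
        # Flush every packet: a trace is no use sitting in a buffer.
        logfile.flush()
        received += 1


def listen(port=6666, path="netconsole.log"):
    """Run on the remote log host: log console output until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(path, "a", encoding="utf-8") as logfile:
        sock.bind(("", port))  # all interfaces
        log_console(sock, logfile)
```

Run listen() on the log host and leave it going. Whatever the kernel
prints to its console before it dies ends up in netconsole.log, even
though nothing reached the local disk.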

I'm not sure why your UPS display is coming on. It might be some kind
of spurious data on the USB port if it is connected. It might be a
result of something the PC is doing. It is also possible it is due to
a brownout or other power issue on the supply to your house, but if
your UPS is in good shape and not overloaded then it should be
shielding your PC from the effects of these. A PC power supply issue
sounds plausible. I've had my CP UPS flicker its display while a light
flickered a bit at the same time, but the PC was unaffected. I'll also
note that these kinds of transient issues are often mitigated by
having a good quality PC power supply that is not overloaded, and that
this probably also helps cover any latency when the UPS switches in.
If your PC power supply is strained to the point of breaking then any
transients in the input supply are going to get through to the output
rails. This is one of those areas where spending an extra $30 on your
build can make a significant difference.

--
Rich