Gentoo Archives: gentoo-user

From:	Rich Freeman <rich0@g.o>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] System reboot
Date:	Sun, 16 Dec 2018 12:18:33
Message-Id:	`CAGfcS_mtq-ZcG0avJyh6XuNYOXE5-AMzd_Uu=eeCRdrx0bgr0A@mail.gmail.com`
In Reply to:	[gentoo-user] System reboot by Dale

1	On Sat, Dec 15, 2018 at 10:54 PM Dale <rdalek1967@×××××.com> wrote:
2	>
3	> I checked the messages log. Before with the memory hogging Dolphin it
4	> had logged the problem. Today, it shows this:
5	>
6	>
7	> Dec 15 20:40:01 fireball CROND[30668]: (root) CMD (/usr/lib64/sa/sa1 1 1)
8	> Dec 15 20:50:01 fireball CROND[1532]: (root) CMD (/usr/lib64/sa/sa1 1 1)
9	> Dec 15 21:00:01 fireball CROND[5513]: (root) CMD (/usr/lib64/sa/sa1 1 1)
10	> Dec 15 21:01:01 fireball CROND[5718]: (root) CMD (run-parts
11	> /etc/cron.hourly)
12	> Dec 15 21:08:34 fireball syslog-ng[4370]: syslog-ng starting up;
13	> version='3.17.2'
14	> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: *** info
15	> [daemon/startup.c(136)]:
16	> Dec 15 21:08:34 fireball /usr/sbin/gpm[4400]: Started gpm successfully.
17	> Entered daemon mode.
18	>
19	>
20	> As you can see, it went from running a normal cron job to me booting
21	> back up. I don't see any error at all. Not even one electron.
22
23	This is pretty typical if you aren't taking special steps to log this
24	sort of thing. There are a couple of ways the kernel can crash:
25
26	1. OOPS/BUG - these are semi-recoverable errors. I believe they can
27	get logged unless they occur in a manner that disrupts your userspace
28	logger, vfs, filesystem, or disk. If the error happens in one of
29	those subsystems then your filesystems will stop syncing and it won't
30	be logged normally.
31
32	2. PANIC - these are unrecoverable and are NOT logged to disk. When the
33	kernel PANICs it halts all disk IO and just about everything else.
34	This is to prevent damage to anything already written on disk. You
35	don't want a corrupt OS trying to write to your disk - that makes a
36	bad situation MUCH worse. It would be like sending a drunk surgeon
37	into the operating room to fix up a trauma patient.
38
39	3. Hardware reset. This isn't a kernel issue but I'll toss it in.
40	If your CPU gets a reset signal it forgets it was ever running linux
41	and starts executing the firmware as if it had been freshly powered
42	on. There is no opportunity to capture anything. Only way to log
43	something like this is hardware-level monitoring.
44
45	Issues #1-2 CAN be logged, but not conventionally. There are a few
46	routes for this:
47
48	1. Remote console logging. Serial and network are the two main
49	options for this. If you have a hardware serial port you can capture
50	its output and any kernel errors will be output to it (just the
51	text/backtrace/etc). A network console is very easy to set up if you
52	have a remote host that can run netcat on the same LAN:
53	https://www.kernel.org/doc/Documentation/networking/netconsole.txt
54
55	2. Recovery kernel. Gentoo doesn't have tooling for this but you can
56	follow https://wiki.gentoo.org/wiki/Kernel_Crash_Dumps . Disclaimer -
57	I haven't done this in ages so it could be dated in parts. If the
58	kernel panics then it will run the recovery kernel, which boots in a
59	clean state and dumps the old kernel's RAM to disk for subsequent
60	analysis.
61
62	#1 gets the job done most of the time, but #2 is more thorough. If
63	you have a host that is tending to reset you should consider network
64	logging as a starting point - it is easy to set up.
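
If netcat isn't handy on the remote host, a small listener can do the
same job. This is only a minimal sketch, assuming netconsole on the
crashing box has been pointed at this host on UDP port 6666 (the default
target port in the netconsole documentation linked above). The file name
netconsole.log and the function names are made up for illustration.

```python
import socket
import time

# Assumption: netconsole's default target port; change it if you configured another.
NETCONSOLE_PORT = 6666


def format_record(sender_ip, payload, now=None):
    """Return one log line: local timestamp, sender address, message text."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{stamp} {sender_ip} {text}"


def capture(sock, logfile, max_packets=None):
    """Append each datagram received on sock to logfile, flushing every line."""
    received = 0
    while max_packets is None or received < max_packets:
        payload, (sender_ip, _port) = sock.recvfrom(65535)
        logfile.write(format_record(sender_ip, payload) + "\n")
        # Flush per message so the file is current even if this host is interrupted.
        logfile.flush()
        received += 1
    return received


def main(path="netconsole.log", port=NETCONSOLE_PORT):
    """Listen on all interfaces and log until interrupted (hypothetical defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        with open(path, "a", encoding="utf-8") as logfile:
            capture(sock, logfile)

# To run it as a listener, call main() and stop it with Ctrl-C.
```

The timestamps come from the logging host's clock, not the crashing
machine's, which is usually what you want when the sender is in the
middle of falling over. Since this is UDP, anything sent after the
network stack dies is simply lost.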
65
66	I'm not sure why your UPS display is coming on. It might be some kind
67	of spurious data on the USB port if it is connected. It might be a
68	result of something the PC is doing. It is also possible it is due to
69	a brownout or other power issues going into your house, but if your
70	UPS is in good shape and not overloaded then it should be shielding
71	your PC from the effects of these. A PC power supply issue sounds
72	plausible. I've had my CP UPS flicker its display and a light might
73	flicker a bit at the same time, but the PC was unaffected. I'll also
74	note that these kinds of transient issues are often mitigated by
75	having a good quality PC power supply that is not overloaded, and that
76	this probably also helps with any latency in the UPS switching in. If
77	your PC power supply is strained to the point of breaking then any
78	transients in the input supply are going to get through to the output
79	rails. This is one of those areas where spending an extra $30 on your
80	build can make a significant difference.
81
82	--
83	Rich

Replies

Subject	Author
Re: [gentoo-user] System reboot	Dale <rdalek1967@×××××.com>
