On Wednesday 17 February 2010 08:49:28 Harry Putnam wrote:
> I have caught the freeze in the early stages before completely losing
> the network when just mouse and keyboard became unresponsive, was able
> to ssh in and noticed that restarting hald held off the freeze for
> some (again unspecified) amount of time.
>
> So cutting the lengthy narrative down a bit, and briefly put, I'm
> looking for anything unusual that is causing this.  The hdc messages
> are the only odd thing I'm seeing.
>
> Something appears to be jamming up the hal layer somehow, but not
> leaving findable tracks.  At least not findable by someone with
> many yrs experience with linux but not much real debugging of
> complicated problems under his belt.

You say the box runs ssh, implying that other hosts are nearby, so what I
would suggest is to configure your syslogger to send all logs to another host
and have that host write them to a known location.
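
In practice the other host would simply run a normal syslog daemon set up to
accept remote messages. Purely to illustrate the receiving side, here is a
minimal sketch in Python, assuming the logs arrive as plain UDP datagrams. The
port number 5514 and the function names are my own placeholders, not anything
your syslogger dictates.

```python
import os
import socket


def open_listener(host="0.0.0.0", port=5514):
    """Bind a UDP socket for incoming syslog datagrams (port is an assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_logs(sock, path, max_messages=None):
    """Append each datagram to `path`, one line per message.

    Every line is flushed and fsync'ed immediately, so the last message
    sent before a freeze is already on the remote disk.
    Runs forever unless max_messages is given.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        while max_messages is None or count < max_messages:
            data, (sender, _port) = sock.recvfrom(8192)
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            out.write(f"{sender} {text}\n")
            out.flush()
            os.fsync(out.fileno())
            count += 1
    return count
```

Calling receive_logs(open_listener(), "/var/log/remote/frozenbox.log") would
then sit there collecting messages (the path is only an example). On the
sending box, an rsyslog-style rule such as "*.* @loghost:5514" forwards
everything over UDP, where "loghost" stands in for the real hostname; other
syslog daemons use their own syntax for this.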

I find that machines that freeze often still send logs to syslog properly
right up to the moment of the freeze, but these do not get written to disk as
IO is blocked. Then we restart the box, guaranteeing that the logs are lost
:-)

Set up remote logging and just leave it until the machine freezes again; that
will hopefully give you the logs you need to identify the problem. To save
disk space you can configure logrotate on the remote logger to delete the
previous day's logs - you don't need logs from days when the box was working
fine.
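
A logrotate stanza along these lines would do it. This is a sketch only: the
/var/log/remote/ path is a placeholder for wherever the remote host actually
writes the forwarded logs.

```
# /etc/logrotate.d/remote-logs (hypothetical file name and log path)
/var/log/remote/*.log {
    daily
    rotate 1
    missingok
    notifempty
    copytruncate
}
```

I'd keep "rotate 1" rather than "rotate 0", so that a freeze shortly after
the nightly rotation doesn't leave you with an almost empty log. copytruncate
is there so the stanza doesn't have to assume how the logging daemon is told
to reopen its files.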

Another option is to look at the pattern here: one day out of the blue a
stable system developed problems and they still surface at random times. This
is one of the characteristics of failing hardware. Have you done a thorough
hardware test, including such things as memtest and SMART?
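
For the SMART part, a quick first check can be scripted. The sketch below
assumes smartmontools is installed and only wraps "smartctl -H"; the function
names are mine, and the device you pass in is whatever your disk is really
called.

```python
import subprocess


def parse_health(output):
    """Return the verdict from `smartctl -H` output, or None if absent.

    ATA disks report 'SMART overall-health self-assessment test result: X',
    SCSI disks report 'SMART Health Status: X'.
    """
    markers = ("self-assessment test result:", "SMART Health Status:")
    for line in output.splitlines():
        for marker in markers:
            if marker in line:
                return line.split(marker, 1)[1].strip()
    return None


def smart_health(device):
    """Run `smartctl -H` on `device` and return the parsed verdict."""
    # smartctl's exit status is a bitmask, so a non-zero code is not
    # treated as a failure of the command itself.
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True, check=False,
    )
    return parse_health(result.stdout)
```

smart_health("/dev/sda") normally has to run as root, and a passing verdict
proves little by itself; an extended self-test ("smartctl -t long") and a look
at the full attribute list ("smartctl -a") are more telling. memtest has to be
booted separately and left to run for several passes.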

-- 
alan dot mckinnon at gmail dot com