Do you have CONFIG_CPU_FREQ defined in your kernel config?
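One way to check is to grep the kernel config file. This is a minimal sketch: cpufreq_enabled is a made-up name, and the default path assumes the kernel was built from the usual Gentoo source tree in /usr/src/linux. CONFIG_CPU_FREQ is a yes/no option, so an enabled kernel has the line CONFIG_CPU_FREQ=y and a disabled one has "# CONFIG_CPU_FREQ is not set".

```shell
#!/bin/bash
# Succeeds if the given kernel config enables CONFIG_CPU_FREQ.
# The default path is an assumption (usual Gentoo source tree);
# pass another config file as $1.
cpufreq_enabled() {
    local config=${1:-/usr/src/linux/.config}
    # Enabled:  CONFIG_CPU_FREQ=y
    # Disabled: # CONFIG_CPU_FREQ is not set
    # The anchors keep CONFIG_CPU_FREQ_TABLE and friends from matching.
    grep -q '^CONFIG_CPU_FREQ=y$' "$config"
}
```

If the running kernel was built with CONFIG_IKCONFIG_PROC, its own configuration can be checked instead of the source tree's:
cpufreq_enabled <(zcat /proc/config.gz) && echo enabled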

I have an HP laptop where I have seen similar behavior. After dealing with
it for some time, I tracked it down to a problem with changing the CPU's
frequency. For a very short period after the clock is changed, the thermal
sensor reads back nonsense. I've seen readings like "69... 69... 95...
70..." and that's with 0.5 second sampling. I've found 2 workarounds:

1) The quick and easy way:
/etc/init.d/powernowd stop
Now, build x.org
/etc/init.d/powernowd start

Of course you'll need to replace powernowd with whatever power
management daemon you have emerged.
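Workaround 1 can also be wrapped in a shell function so that the daemon comes back even when the build fails. This is only a sketch: run_without_pm and PM_INIT_SCRIPT are hypothetical names, and the default init script is the powernowd one used above.

```shell
#!/bin/bash
# Stop the power management daemon, run one command, then restart the
# daemon whether or not the command succeeded. Returns the command's status.
# PM_INIT_SCRIPT is a hypothetical override; the default is an assumption.
run_without_pm() {
    local pm=${PM_INIT_SCRIPT:-/etc/init.d/powernowd}
    local status

    # Do not start the command if the daemon could not be stopped.
    "$pm" stop || return 1

    "$@"
    status=$?

    # Bring the daemon back in both the success and the failure case.
    "$pm" start
    return "$status"
}
```

For example:
run_without_pm emerge xorg-x11
A Ctrl-C during the build will most likely abort the function before the restart step, so check the daemon afterwards in that case.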

2) The uglier, but potentially more useful fix:
Save this as thermal.diff:
-----------------------------------------------------------------------------------------
--- orig/drivers/acpi/thermal.c 2005-07-07 22:37:42.000000000 -0400
+++ new/drivers/acpi/thermal.c 2005-06-15 18:30:43.000000000 -0400
@@ -61,7 +61,8 @@
 #define ACPI_THERMAL_MODE_ACTIVE 0x00
 #define ACPI_THERMAL_MODE_PASSIVE 0x01
 #define ACPI_THERMAL_MODE_CRITICAL 0xff
-#define ACPI_THERMAL_PATH_POWEROFF "/sbin/poweroff"
+//#define ACPI_THERMAL_PATH_POWEROFF "/sbin/poweroff"
+#define ACPI_THERMAL_PATH_POWEROFF "/sbin/overheat"
 
 #define ACPI_THERMAL_MAX_ACTIVE 10
 #define ACPI_THERMAL_MAX_LIMIT_STR_LEN 65
-----------------------------------------------------------------------------------------
Patch the kernel by cd'ing to /usr/src/linux and typing:
patch -p1 -l < <path-to>/thermal.diff
(-l makes patch ignore whitespace differences, since mail tends to turn the
kernel's tabs into spaces.) Then rebuild and install the kernel as usual
and reboot into it.

This will cause the kernel to call /sbin/overheat instead of
/sbin/poweroff if your laptop hits a critical temperature.
Save this as /sbin/overheat:
-----------------------------------------------------------------------------------------
#!/bin/bash
# Run by the patched kernel instead of /sbin/poweroff when a thermal
# zone reaches its critical temperature.

# Init script of your power management daemon; adjust to match yours.
POWER_MGT_COMMAND=/etc/init.d/powernowd

# If the daemon is not running, this script does nothing.
if ${POWER_MGT_COMMAND} status > /dev/null ; then
	${POWER_MGT_COMMAND} stop

	# Pin CPU 0 at its lowest frequency. scaling_setspeed is only
	# honored by the "userspace" cpufreq governor.
	cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq \
		> /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

	# THRM is this laptop's thermal zone; yours may have another name.
	# 0 selects ACPI_THERMAL_MODE_ACTIVE (see thermal.diff above).
	echo -n 0 > /proc/acpi/thermal_zone/THRM/cooling_mode

	(
		echo System switched to low power mode for cooling
		cat /proc/acpi/thermal_zone/THRM/temperature
	) | wall
fi
-----------------------------------------------------------------------------------------
Make /sbin/overheat executable by typing:
chmod 755 /sbin/overheat
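Before relying on this, it may be worth confirming that both halves are in place. The hypothetical check_overheat_setup function below looks for the patched define in the kernel source and for an executable helper at the path it names. It inspects the source tree, not the running kernel, so it cannot tell whether the patched kernel was actually rebuilt and booted.

```shell
#!/bin/bash
# Hypothetical preflight check for workaround 2. Both default paths are
# assumptions taken from the steps above; override them with $1 and $2.
check_overheat_setup() {
    local src=${1:-/usr/src/linux/drivers/acpi/thermal.c}
    local helper=${2:-/sbin/overheat}
    local status=0

    # The live (uncommented) define must point at the helper.
    if ! grep -q "^#define ACPI_THERMAL_PATH_POWEROFF[[:space:]]*\"${helper}\"" "$src"; then
        echo "patched define not found in $src" >&2
        status=1
    fi

    # The kernel can only run the helper if it exists and is executable.
    if [ ! -x "$helper" ]; then
        echo "$helper is missing or not executable" >&2
        status=1
    fi

    return "$status"
}
```

Run it as:
check_overheat_setup && echo ok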

Now, when the thermal sensor reports crazy values, my laptop just
slows way down instead of powering off.

On my todo list:
 o After the temperature comes down, re-enable power management.
 o If the temperature does not come down in a reasonable period,
   then shut it down.
 o A better patch that takes cpufreq changes into account and
   disables the thermal faults for a few ms after a frequency change. I need
   to get a better idea of how long the sensor gives erroneous readings.
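The first two todo items could look something like the sketch below. Everything in it is an assumption rather than something tested on this laptop: the zone name THRM (copied from /sbin/overheat), the 60 C re-enable threshold, the 300-second deadline, and all function and variable names. As a crude guard against the bogus readings described above, it requires several consecutive cool readings instead of trusting a single one.

```shell
#!/bin/bash
# Sketch only: paths, thresholds and names are assumptions.
TEMP_FILE=${TEMP_FILE:-/proc/acpi/thermal_zone/THRM/temperature}
SAFE_TEMP=${SAFE_TEMP:-60}          # re-enable at or below this many degrees C
GOOD_READINGS=${GOOD_READINGS:-3}   # consecutive cool readings required
GIVE_UP_AFTER=${GIVE_UP_AFTER:-300} # seconds before giving up
INTERVAL=${INTERVAL:-5}             # seconds between readings

# The ACPI file reads like "temperature:             69 C"; print the number.
read_temp() {
    awk '{ print $2 }' "$TEMP_FILE"
}

# Succeed once GOOD_READINGS consecutive readings are at or below SAFE_TEMP,
# so a single out-of-range sample cannot end the wait early. Fail if that
# has not happened within GIVE_UP_AFTER seconds.
wait_for_cooldown() {
    local waited=0 good=0 t
    while [ "$waited" -lt "$GIVE_UP_AFTER" ]; do
        t=$(read_temp)
        if [ "$t" -le "$SAFE_TEMP" ] 2>/dev/null; then
            good=$((good + 1))
            if [ "$good" -ge "$GOOD_READINGS" ]; then
                return 0
            fi
        else
            good=0   # hot (or unreadable) sample: start counting again
        fi
        sleep "$INTERVAL"
        waited=$((waited + INTERVAL))
    done
    return 1
}

# Todo items 1 and 2: restart power management once cool, else power off.
after_overheat() {
    if wait_for_cooldown; then
        "${PM_INIT_SCRIPT:-/etc/init.d/powernowd}" start
    else
        "${POWEROFF_CMD:-/sbin/poweroff}"
    fi
}
```

One possible, untested use: paste these functions into /sbin/overheat and start after_overheat in the background at the end of the if block, after the wall message. The third item needs kernel changes and is not covered here.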

dcm

On 12/12/05, Mariusz Pękala <skoot@××.pl> wrote:
>
> > On Sunday, 11 December 2005 11:42, C. Beamer wrote:
> > > My issue is this: The computer powered off in the middle of the install
> > > of xorg-x11. This has happened a couple of times. I haven't been
> > > having problems with the laptop, so I'm pretty sure the issue has
> > > something to do with power management since I built power management
> > > into the kernel, but didn't emerge acpid. Anyway, since the emerge of
> > > xorg-x11 has bombed a couple of times, is there anything that I should
> > > do in the way of cleanup before trying to emerge it again?
> > > Colleen
>
> > On 2005-12-11 17:32:46 +0100 (Sun, Dec), Rafael Fernández López wrote:
> > I can't make sense of this issue: I can't understand what would make
> > your computer turn off during a compilation.
> >
> > Well... I'm afraid it's the temperature. I hope that's not the reason, but
> > it's the first thing that came to my mind. Maybe your laptop (I have a
> > Fujitsu Siemens Amilo, and it gets really hot when compiling OO or KDE)
> > turns off for safety reasons when it reaches a certain temperature.
> >
> > I cannot find any other reason.
>
> I vote for temperature issues too. That is my experience with an
> Aristo laptop - it gets very hot very easily and powers off when the
> temperature exceeds 85 C.
>
> You may try to run something like this while emerging:
> # while sleep 5 ; do cat /proc/acpi/thermal_zone/THM0/temperature >> /tmp/temper ; done &
>
> and hope that part of that file will survive the poweroff - you will see
> whether the temperature was rising before the end.
>
> Or you may put something like:
> ... do cat /proc/acp..... | tee -a /tmp/temper ; done &
> in the background of the session in which emerge runs and watch the
> temperature between compilation lines.
>
> The exact path to the temperature file may differ; it will be something like
> /proc/acpi/thermal_zone/*/temperature - and it will exist only if your
> kernel has the necessary drivers compiled in (or modules inserted).
>
> The /proc/acpi/thermal_zone/*/temperature file is about 30 bytes, so
> 35 thousand copies make a 1MB file: your loop may run for 9 hours if
> storing one copy every second, or 48 hours if appending one copy
> every 5 seconds.
>
> HTH.
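The quoted logging loop can be extended a little: a timestamp on each sample shows when the temperature climbed, and a sync after every write makes it more likely that the last samples reach the disk before the power is cut. This is a sketch with a made-up function name (log_temps); the zone path THM0 comes from the quoted message and may differ on your machine. If /tmp is a tmpfs on your system, nothing written there survives a power-off, so the default log path below is /var/tmp, which is normally on disk.

```shell
#!/bin/bash
# Quoted logging loop with timestamps and a sync after each sample.
# log_temps is a hypothetical name; all defaults are assumptions.
log_temps() {
    local zone_file=${1:-/proc/acpi/thermal_zone/THM0/temperature}
    local log=${2:-/var/tmp/temper}   # /var/tmp is normally disk-backed
    local interval=${3:-5}            # seconds between samples
    local count=${4:-0}               # number of samples; 0 = until killed
    local n=0

    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        # One line per sample, e.g. "14:03:27 temperature:             69 C"
        printf '%s %s\n' "$(date +%T)" "$(cat "$zone_file")" >> "$log"
        sync    # flush buffered writes to disk before the next sample
        n=$((n + 1))
        sleep "$interval"
    done
}
```

For example, start it before the emerge and kill the background job afterwards:
log_temps /proc/acpi/thermal_zone/THM0/temperature /var/tmp/temper 5 &
Each line is 9 bytes longer than in the quoted loop (the "HH:MM:SS " prefix), so the quoted 1MB estimates shrink by a bit under a quarter.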