Stroller writes:

> On 21 Aug 2010, at 14:25, Alex Schuster wrote:
> > ...
> > I want to monitor the power status of my hard drives, so I wrote a
> > little script that gives me this output:
> >
> > sda: standby
> > sdb: standby
> > sdc: active/idle 32°C
> > sdd: active/idle 37°C
> >
> > This script is called every minute via an fcron entry, output goes
> > into a log file, and I use the file monitor plasmoid to watch this log
> > file in KDE.
> >
> > It's working fine, but I also monitor my syslog in another file
> > monitor plasmoid, and now I get lots of these entries:
> >
> > Aug 21 14:21:06 [fcron] pam_unix(fcron:session): session opened for user root by (uid=0)
> > Aug 21 14:21:06 [fcron] Job /usr/local/sbin/hdstate >> /var/log/hdstate started for user root (pid 24483)
> > Aug 21 14:21:08 [fcron] Job /usr/local/sbin/hdstate >> /var/log/hdstate completed
> > Aug 21 14:21:08 [fcron] pam_unix(fcron:session): session closed for user root
>
> #!/bin/bash
> while true
> do
>     for drive in a b c d
>     do
>         /usr/sbin/smartctl /dev/sd$drive --whatever >> /var/log/hdstate
>     done
>     sleep 60
> done

I use hdparm and hddtemp:

for hd in sda sdb sdc sdd
do
    # hdparm -C reports the power state after "drive state is:"
    str=$( /sbin/hdparm -C /dev/$hd )
    state=${str##*is: }
    # ask for the temperature only on the system drive (sdc), and only if it is spinning
    if [[ $state == active/idle ]] && [[ $hd =~ sd[c] ]]
    then
        temp=$( /usr/sbin/hddtemp -q /dev/$hd )
        temp=${temp% or *}
        temp=${temp##* }
    else
        temp=
    fi
    echo "$hd: $state $temp"
done

Unfortunately, reading the temperature makes a drive in standby spin up,
and prevents automatic spindown after some idle time. So now I ask
for the temperature only on my system drive; the others should sleep most
of the time anyway.
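As a rough sketch of the two parsing steps in the script above, the extractions can be pulled into small functions that work on captured output, so they can be tried without touching a drive. The function names and the sample strings are assumptions for illustration only, and the exact output of hdparm and hddtemp may differ between versions. The state parser here takes the last word after "is:", which tolerates one or more spaces in front of the state.

```bash
#!/bin/bash
# Hypothetical helpers; names and sample inputs are illustrative only.

# Extract the power state from captured `hdparm -C` output.
# Assumes the state is the last whitespace-separated word after "is:".
parse_state() {
    local str=$1
    str=${str##*is:}               # drop everything up to and including "is:"
    echo "${str##*[[:space:]]}"    # keep only the last word
}

# Extract the temperature from captured `hddtemp -q` output.
# Assumes the reading is the last word, optionally followed by " or <other unit>".
parse_temp() {
    local temp=$1
    temp=${temp% or *}             # drop a trailing " or ..." alternative, if present
    echo "${temp##* }"             # keep only the last word
}

# Example with made-up input:
# parse_state "/dev/sda: drive state is:  standby"
# parse_temp  "/dev/sdc: MODEL: 32°C"
```

Neither function calls hdparm or hddtemp itself, so trying them out does not wake up a sleeping drive.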

> I would personally update more often than this, and my concern would
> be that if the process fails then your plasmoid isn't showing the
> correct data.
>
> I presume this is the same with your current setup: if cron dies then
> the current temperature will not be read to file, and the plasmoid
> will continue reading the last lines in /var/log/hdstate - the drive
> can overheat without you knowing about it.

Nah, it's really not that important for me. I show the temperature just
for the fun of it, and for extreme temperatures I have smartd running, see
below.
I'm more interested in the active/standby state. I just added two old
IDE drives for additional backups, and I want them to be silent
most of the time. So I wrote a little script to show the status so I see
when they spin up again (and they do this sometimes), and used fcron to
get the data into a log file that the plasmoid shows.

The problem with cron is that I get those cron logs I do not like, and
that the update time of 60 seconds is a little long. Running the script in
a loop, started in .kde4/Autostart, would be better, but as a user I have
no permission to call hdparm or hddtemp. I do not want to be part of the
disk group, and when using sudo I would get the sudo logs I wanted to
avoid. So now I SUID'ed hdparm and hddtemp, changed the group to wheel and
disabled execution for others. The cron problem is not solved, but worked
around.
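A minimal sketch of such a loop, assuming a hypothetical helper named poll_loop; the 10 second interval and the log path in the example are made up, and /usr/local/sbin/hdstate is the script path from the log entries quoted above. The run count exists only so the loop can be tried out without running forever.

```bash
#!/bin/bash
# Hypothetical helper, not part of the setup described above.
# usage: poll_loop INTERVAL LOGFILE COUNT COMMAND [ARGS...]
# Runs COMMAND every INTERVAL seconds and appends its output to LOGFILE.
# COUNT limits the number of runs; 0 means run until killed.
poll_loop() {
    local interval=$1 logfile=$2 count=$3
    shift 3
    local i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]
    do
        "$@" >> "$logfile" 2>&1    # append stdout and stderr to the log
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example (assumed paths): update every 10 seconds until killed
# poll_loop 10 "$HOME/hdstate.log" 0 /usr/local/sbin/hdstate
```

Started from .kde4/Autostart, a loop like this runs as the normal user, so it only works if that user is allowed to execute hdparm and hddtemp, as with the SUID change described above.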

> So I would expect there to be a better "plasmid" for this task. I'm
> completely unfamiliar with plasmids, but what you really want is a
> plasmid that itself runs a script and displays the stdout on your
> screen. That way if there's no data, or an error, then _you see that
> in the plasmid_, instead of silently ignoring it (as you may be at
> present).
>
> The easiest (but dumb) way to handle this is to add the date to your
> plasmid's display so that at least you can see that something's wrong
> if it doesn't match the clock. A better way is not to have to watch a
> status monitor at all, and just have a script running that emails you
> if the temperature is above a specified range.

I have smartd running, which should send me mails about such things. For
each drive, I have a line like this in /etc/smartd.conf:

/dev/sdc -a -n standby -o on -S on -W 5,40,45 \
    -s (S/../.././12|L/../../7/06) -m root@×××××××××.org

This does some regular health checks on the drive when it is not in
standby mode. Temperature changes of at least 5 degrees and temperatures
of 40 degrees or more are logged. I will receive an email when the
temperature reaches 45 degrees, or when it reaches a new maximum. The
minimum and maximum values are preserved across boot cycles if smartd
saves its state (smartd's -s command-line option; the -S on directive
above enables attribute autosave on the drive). Every day at 12:00, a
short self test is scheduled, and a long self test every Sunday at 06:00.
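To see how a self-test schedule like this is evaluated, here is a small sketch, assuming the T/MM/DD/d/HH form described in the smartd.conf manual page for the -s directive, where d is a single digit from 1 (Monday) to 7 (Sunday). The function name due_tests is hypothetical, it checks only the L and S test types, and it uses grep -E as a stand-in for smartd's own regular expression matching, so take it as an illustration rather than a description of what smartd does internally.

```bash
#!/bin/bash
# Hypothetical helper: print the test types (L = long, S = short) whose
# "T/MM/DD/d/HH" string matches the given schedule regex.
# usage: due_tests REGEX MM DD d HH
due_tests() {
    local regex=$1 mm=$2 dd=$3 dow=$4 hh=$5 t
    for t in L S
    do
        if printf '%s\n' "$t/$mm/$dd/$dow/$hh" | grep -Eq -- "$regex"
        then
            echo "$t"
        fi
    done
}

# Example: which tests would be due right now?
# (date +%u prints the day of the week, 1 = Monday ... 7 = Sunday)
# due_tests '(S/../.././12|L/../../7/06)' $(date +'%m %d %u %H')
```

For a Sunday at 06:00 this prints L, and for any day at 12:00 it prints S.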

	Wonko