On 3/17/20 10:14 AM, Rich Freeman wrote:
> On Tue, Mar 17, 2020 at 1:59 AM <tuxic@××××××.de> wrote:

> Finally, ALL DRIVES FAIL. It doesn't matter what the underlying
> storage technology is. I've seen hard drives fail in less than a
> year, with the warranty replacement drive failing less than a year
> after that. I think next warranty replacement (still in the original
> warranty period) lasted 5+ years of near-continuous use. The typical
> failure modes of hard drives and solid state storage are different,
> but they all fail. You can't perfectly predict WHEN they will fail
> either. Most drives have SMART and sometimes it can detect failure
> conditions before failure, but not always.

Hello Rich, et al.

I have trimmed most of the quoted text because I agree with the details
in the thread. You get what you pay for, but paying extra is rarely
rewarded...

HEAT is the enemy of all electronic and mechanical things, and computer
drives and memory are no exception. Modern motherboards expose a myriad
of interfaces and codes that track heat, and quite a few legacy
motherboards do too. Some are not very accurate, but most are reasonable.

Hopefully, you kept your mobo book. A section somewhere talks about
temperature sensors. If the CPU is loaded, the drives are most likely
getting hot. If the fans are running at a relatively high speed, the
system is generating tons of heat. If the GPU(s) are running hot, the
drives are hot. Tools that scan the hardware for sensors are great; use
them!

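As a rough sketch of what such a sensor scan can look like, the snippet
below walks the Linux hwmon sysfs tree and prints every temperature it
finds in degrees F. It assumes a kernel that exposes sensors under
/sys/class/hwmon with tempN_input files in millidegrees Celsius; the
function names and the HWMON_ROOT variable are made up for this example,
and your board may expose few or no sensors this way.

```shell
#!/bin/sh
# Sketch only: list temperatures from the Linux hwmon sysfs interface.
# milli_c_to_f, list_hwmon_temps and HWMON_ROOT are hypothetical names.

# Convert millidegrees Celsius (the unit hwmon reports) to Fahrenheit.
milli_c_to_f() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 * 9 / 5 + 32 }'
}

# Print "<chip> <sensor> <temp> F" for each readable tempN_input file.
# Plain sh has no local variables, so the names below are global.
list_hwmon_temps() {
    hw_root="${1:-/sys/class/hwmon}"
    for hw_file in "$hw_root"/hwmon*/temp*_input; do
        [ -r "$hw_file" ] || continue      # glob matched nothing, or unreadable
        hw_dir=$(dirname "$hw_file")
        hw_chip=$(cat "$hw_dir/name" 2>/dev/null || echo unknown)
        hw_raw=$(cat "$hw_file" 2>/dev/null) || continue
        [ -n "$hw_raw" ] || continue       # some sensors return nothing
        printf '%s %s %s F\n' "$hw_chip" \
            "$(basename "$hw_file" _input)" "$(milli_c_to_f "$hw_raw")"
    done
}

list_hwmon_temps "${HWMON_ROOT:-/sys/class/hwmon}"
```

Passing a different directory as the first argument points the scan
somewhere else, which is handy for trying it out on a copy of the tree.
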
I now install 'water coolers' from Thermaltake on all my chassis-based
systems. New or large video cards have tons of processing going on inside
the GPUs, and are thus a large source of heat. Systems with lots of GPU
cards are like ovens. All of this heat, regardless of source, KILLS all
forms of memory, especially 'drives'. Keep everything monitored and well
vented, in a room that is as cool as possible. Many server farm rooms run
below 50 degrees F to extend the performance and life of electronics,
particularly HDDs and other forms of memory. Many chipsets scale down
auto-magically as heat increases.

Another (indirect) way to monitor heat is to monitor the power
consumption of a component. A (relatively) large power draw goes hand in
hand with heat production. Heat kills drives and memory... no exceptions!

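To make the power-draw idea concrete, here is a minimal sketch that
turns two readings of a cumulative energy counter into average watts.
It assumes an Intel system where the kernel powercap framework exposes
/sys/class/powercap/intel-rapl:0/energy_uj (microjoules, usually
readable by root only); the function names and the 5 second default are
made up for this example, and other hardware will need a different
counter.

```shell
#!/bin/sh
# Sketch only: estimate average power from a cumulative energy counter.
# avg_watts and sample_rapl_watts are hypothetical helper names.

# avg_watts START_UJ END_UJ SECONDS [MAX_RANGE_UJ]
# Energy difference in microjoules divided by elapsed seconds gives watts.
avg_watts() {
    awk -v a="$1" -v b="$2" -v s="$3" -v max="${4:-0}" 'BEGIN {
        d = b - a
        if (d < 0) d += max          # counter wrapped around once
        printf "%.2f\n", d / 1000000 / s
    }'
}

# sample_rapl_watts [ZONE_DIR] [SECONDS]
# Reads the counter twice, SECONDS apart, and prints the average draw.
sample_rapl_watts() {
    rp_zone="${1:-/sys/class/powercap/intel-rapl:0}"
    rp_secs="${2:-5}"
    [ -r "$rp_zone/energy_uj" ] || {
        echo "cannot read $rp_zone/energy_uj (try as root?)" >&2
        return 1
    }
    rp_e1=$(cat "$rp_zone/energy_uj")
    sleep "$rp_secs"
    rp_e2=$(cat "$rp_zone/energy_uj")
    rp_max=$(cat "$rp_zone/max_energy_range_uj" 2>/dev/null || echo 0)
    avg_watts "$rp_e1" "$rp_e2" "$rp_secs" "$rp_max"
}
```

Running sample_rapl_watts with no arguments samples the first RAPL zone
for 5 seconds. A package that sits at a high wattage for long stretches
is dumping that energy into the case as heat.
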
Here are a few one-liners I use to monitor (use/load == heat):

watch -n12 sensors -f

dstat -tcndylp --top-cpu 10

htop

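The one-liners above cover the board and the workload; for the drives
themselves, something along these lines can pull the temperature out of
SMART. It is only a sketch: it assumes smartmontools is installed, that
the drive is an ATA device reporting attribute 194 (Temperature_Celsius)
with the temperature as the first number of the raw value, and that you
run it as root. The function names and the 45 C default limit are
arbitrary choices for this example, and NVMe drives report temperature
in a different format.

```shell
#!/bin/sh
# Sketch only: read a drive temperature from `smartctl -A` output.
# smart_temp_c and check_drive_temp are hypothetical helper names.

# Read attribute-table text on stdin, print the raw value of attribute 194.
# In the ATA attribute table the raw value starts at column 10.
smart_temp_c() {
    awk '$1 == 194 { print $10; exit }'
}

# check_drive_temp DEVICE [LIMIT_C]
# Returns 0 if at or below the limit, 1 if above, 2 if nothing was reported.
check_drive_temp() {
    dt_dev="$1"
    dt_limit="${2:-45}"
    dt_temp=$(smartctl -A "$dt_dev" 2>/dev/null | smart_temp_c)
    [ -n "$dt_temp" ] || {
        echo "$dt_dev: no attribute 194 reported" >&2
        return 2
    }
    if [ "$dt_temp" -gt "$dt_limit" ]; then
        echo "$dt_dev: $dt_temp C exceeds $dt_limit C"
        return 1
    fi
    echo "$dt_dev: $dt_temp C"
}
```

For example, check_drive_temp /dev/sda 40 prints the current reading and
returns non-zero when the drive is above 40 C, so it can be dropped into
a cron job or a loop over several devices.
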
What would be great is if folks would just list what they use to monitor
the workload (and therefore heat, indirectly) or the actual temperatures
of given chipsets and "smart drives". Perhaps we can then cull the
responses and update the Gentoo help pages online with more detailed
examples, scripts and tools to better organize heat, current and other
related performance parameters.

hth,
James