Suggestion:

... upgrade the cooling capacity.

The CPU in my box is an AMD FX-9590. TDP is 220 watts. Running at 4.7 GHz.

With cooling for TDP 250 watts, it ran hot under load.

With cooling for TDP 900 watts, it rarely gets close to 110 F under
heavy load.
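
One rough way to check load temperatures on Linux is to read the hwmon
sensors from sysfs while the machine is busy. Below is a minimal sketch,
assuming the kernel exposes temp*_input files under /sys/class/hwmon
(values in millidegrees Celsius). The function names are hypothetical, and
the 110 F limit is only the figure mentioned above, not a recommended
threshold.

```python
import glob
import os


def read_temps_f(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: degrees Fahrenheit} for each readable temp*_input file."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                millideg_c = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable or not numeric; skip it
        # hwmon reports millidegrees Celsius; convert to Fahrenheit.
        temps[path] = millideg_c / 1000.0 * 9.0 / 5.0 + 32.0
    return temps


def sensors_over(limit_f, temps):
    """Return only the sensors whose reading exceeds limit_f (Fahrenheit)."""
    return {path: t for path, t in temps.items() if t > limit_f}


if __name__ == "__main__":
    readings = read_temps_f()
    for path, temp_f in readings.items():
        print(f"{path}: {temp_f:.1f} F")
    hot = sensors_over(110.0, readings)
    print(f"{len(hot)} sensor(s) above 110 F")
```

Run it a few times during a heavy build and compare the readings. Which
hwmon entry belongs to the CPU varies by board and driver, so check the
name file next to each sensor before drawing conclusions.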

On 04/20/2018 09:11 AM, R0b0t1 wrote:
> On Fri, Apr 20, 2018 at 7:21 AM, Mick <michaelkintzios@×××××.com> wrote:
>> On Friday, 20 April 2018 12:55:13 BST Corbin Bird wrote:
>>> Oak Ridge National Laboratory uses these processors ( Rhea Cluster ) and
>>> has numerous heat failures.
>>>
>>> Due to poor cooling ... surprised?
>>>
>>> The cooling is not working right. Something is still wrong.
>>>
>>> On 04/19/2018 09:33 PM, R0b0t1 wrote:
>>>> Dell Precision T7600, two 16 thread Xeons, 192GB of RAM, two Quadro
>>>> cards and a Tesla card.
>>>>
>>>> The system is a few years old at this point. Old enough that the
>>>> thermal compound could have hardened, which is why I replaced it.
>> If the problem started suddenly, rather than getting progressively worse over
>> time, it may have something to do with kernel drivers, or some change in
>> firmware.
>>
> As far as I know it has always been like this. It may be why it was
> hardly used before it came into my care. Looking at the server I could
> blame poor design; the inside is rather cramped, despite the care
> taken with the internal baffles. They may not have run a good flow
> simulation.
>
> Mr. Bird's observation seems to support this.
>
>> If the cause is mechanical, I'd also suggest checking the heat sink contact
>> surface. Some heat sinks are poorly manufactured and require flattening with
>> wet 'n dry sandpaper to get a flat enough surface and improve their contact
>> with the CPU. I've seen 15°C improvement in a Zalman CPU cooler after excess
>> metal was removed from copper pipes, which were manufactured proud. Hardcore
>> O/C's flatten the CPU too, but I'd avoid anything as radical because it can go
>> badly wrong if you remove more than the surface varnish from the chip.
>>
>> In the interim, opening the side panel may also help in hot weather.
>>
> The internals are custom made to fit the motherboard, cards, and drive
> slots. It may work better if I move it to another tower but it will be
> a while before I can find one. I will look at the interface between
> the heatsink and processor again, but it looked fine.
>
>
> How concerned should I be about overheating machine check errors? I
> used to think that it was best to avoid them, as the threshold was
> high enough that very small parts of the die could overshoot and fail,
> but I was informed that is not the case. Besides the throttling (which
> is fairly bad) I am not sure if there are any drawbacks to the
> overheating.
>
> I am wondering what the point of 32 threads is if you can't use them at 100%.
>
> Cheers,
> R0b0t1