Gentoo Archives: gentoo-user

From: Corbin Bird <corbinbird@×××××××.net>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Dell Precision Workstation Overheating
Date: Fri, 20 Apr 2018 14:44:17
Message-Id: 0cc20525-91dc-8fd8-8b08-99ea775bea01@charter.net
In Reply to: Re: [gentoo-user] Dell Precision Workstation Overheating by R0b0t1
One suggestion:

 ... upgrade the cooling capacity.

The CPU in my box is an AMD FX-9590 (TDP 220 W), running at 4.7 GHz.

With cooling rated for a 250 W TDP, it ran hot under load.

With cooling rated for a 900 W TDP, it rarely gets close to 110 °F under
heavy load.
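To put those figures side by side, and in Celsius to match the units Mick uses below, here is a back-of-envelope sketch. It assumes the cooler ratings and the CPU TDP are directly comparable numbers (cooler "TDP ratings" are vendor figures, so this is only an approximation) and that 110 °F refers to the reported CPU temperature.

```python
# Back-of-envelope sketch: cooler capacity relative to CPU TDP, plus a
# Fahrenheit-to-Celsius conversion. Figures come from the post above;
# treating cooler ratings as comparable to CPU TDP is an assumption.

CPU_TDP_W = 220.0  # AMD FX-9590 rated TDP


def headroom(cooler_rating_w: float, tdp_w: float = CPU_TDP_W) -> float:
    """Ratio of the cooler's rated capacity to the CPU's TDP."""
    return cooler_rating_w / tdp_w


def f_to_c(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


if __name__ == "__main__":
    for rating in (250.0, 900.0):
        print(f"{rating:.0f} W cooler: {headroom(rating):.2f}x the CPU TDP")
    print(f"110 F = {f_to_c(110.0):.1f} C")
```

This prints:

    250 W cooler: 1.14x the CPU TDP
    900 W cooler: 4.09x the CPU TDP
    110 F = 43.3 C

In other words, a cooler with only about 14% headroom over TDP ran hot, while one with roughly four times the TDP keeps the chip near 43 °C.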

On 04/20/2018 09:11 AM, R0b0t1 wrote:
> On Fri, Apr 20, 2018 at 7:21 AM, Mick <michaelkintzios@×××××.com> wrote:
>> On Friday, 20 April 2018 12:55:13 BST Corbin Bird wrote:
>>> Oak Ridge National Laboratory uses these processors (Rhea cluster) and
>>> has had numerous heat failures.
>>>
>>> Due to poor cooling ... surprised?
>>>
>>> The cooling is not working right. Something is still wrong.
>>>
>>> On 04/19/2018 09:33 PM, R0b0t1 wrote:
>>>> Dell Precision T7600, two 16-thread Xeons, 192 GB of RAM, two Quadro
>>>> cards and a Tesla card.
>>>>
>>>> The system is a few years old at this point, old enough that the
>>>> thermal compound could have hardened, which is why I replaced it.
>> If the problem started suddenly, rather than getting progressively worse over
>> time, it may have something to do with kernel drivers or some change in
>> firmware.
>>
> As far as I know it has always been like this. It may be why it was
> hardly used before it came into my care. Looking at the server, I could
> blame poor design; the inside is rather cramped, despite the care
> taken with the internal baffles. They may not have run a good airflow
> simulation.
>
> Mr. Bird's observation seems to support this.
>
>> If the cause is mechanical, I'd also suggest checking the heat sink's contact
>> surface. Some heat sinks are poorly manufactured and need to be flattened
>> with wet-and-dry sandpaper to get a flat enough surface and improve their
>> contact with the CPU. I've seen a 15 °C improvement in a Zalman CPU cooler
>> after excess metal was removed from copper pipes that had been manufactured
>> proud. Hardcore overclockers flatten the CPU too, but I'd avoid anything that
>> radical, because it can go badly wrong if you remove more than the surface
>> varnish from the chip.
>>
>> In the interim, opening the side panel may also help in hot weather.
>>
> The internals are custom-made to fit the motherboard, cards, and drive
> slots. It may work better if I move it to another tower, but it will be
> a while before I can find one. I will look at the interface between
> the heatsink and processor again, but it looked fine.
>
>
> How concerned should I be about overheating machine check errors? I
> used to think that it was best to avoid them, as the threshold was
> high enough that very small parts of the die could overshoot and fail,
> but I was informed that is not the case. Besides the throttling (which
> is fairly bad), I am not sure if there are any drawbacks to the
> overheating.
>
> I am wondering what the point of 32 threads is if you can't use them at 100%.
>
> Cheers,
> R0b0t1
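On the throttling question: rather than inferring throttling from temperatures alone, the Linux kernel exposes per-CPU thermal-throttle counters on Intel processors such as these Xeons. The sketch below reads and sums them. The sysfs layout (`/sys/devices/system/cpu/cpuN/thermal_throttle/core_throttle_count`) is assumed from the kernel's Intel thermal-throttle reporting and may not be present on every kernel or CPU; the helper name `read_core_throttle_counts` is hypothetical. Missing or unreadable counters are simply skipped.

```python
# Hypothetical helper: read the kernel's per-CPU core thermal-throttle counters.
# Assumes the Intel thermal_throttle sysfs layout:
#   <cpu_root>/cpuN/thermal_throttle/core_throttle_count
# CPUs whose counter file is missing or unreadable are skipped.
import glob
import os
from typing import Dict


def read_core_throttle_counts(cpu_root: str = "/sys/devices/system/cpu") -> Dict[str, int]:
    """Return {'cpuN': core_throttle_count} for every CPU that exposes one."""
    counts: Dict[str, int] = {}
    # "cpu[0-9]*" matches cpu0, cpu17, ... but not cpufreq or cpuidle
    for cpu_dir in sorted(glob.glob(os.path.join(cpu_root, "cpu[0-9]*"))):
        path = os.path.join(cpu_dir, "thermal_throttle", "core_throttle_count")
        try:
            with open(path) as f:
                counts[os.path.basename(cpu_dir)] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # counter not available for this CPU
    return counts


if __name__ == "__main__":
    counts = read_core_throttle_counts()
    if not counts:
        print("no thermal_throttle counters found")
    else:
        total = sum(counts.values())
        print(f"{len(counts)} CPUs, {total} core throttle events recorded")
```

Running it before and after a heavy job shows whether the count climbs under load, which is more direct evidence of throttling than a single temperature reading.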