Gentoo Archives: gentoo-user

From: R0b0t1 <r030t1@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Dell Precision Workstation Overheating
Date: Fri, 20 Apr 2018 14:11:55
Message-Id: CAAD4mYgKuaF58paM-qLxFscJQ_JfLdcfMiAXJc=60raX3gG6Eg@mail.gmail.com
In Reply to: Re: [gentoo-user] Dell Precision Workstation Overheating by Mick
On Fri, Apr 20, 2018 at 7:21 AM, Mick <michaelkintzios@×××××.com> wrote:
> On Friday, 20 April 2018 12:55:13 BST Corbin Bird wrote:
>> Oak Ridge National Laboratory uses these processors ( Rhea Cluster ) and
>> has numerous heat failures.
>>
>> Due to poor cooling ... surprised?
>>
>> The cooling is not working right. Something is still wrong.
>>
>> On 04/19/2018 09:33 PM, R0b0t1 wrote:
>> > Dell Precision T7600, two 16 thread Xeons, 192GB of RAM, two Quadro
>> > cards and a Tesla card.
>> >
>> > The system is a few years old at this point. Old enough that the
>> > thermal compound could have hardened, which is why I replaced it.
>
> If the problem started suddenly, rather than getting progressively worse over
> time, it may have something to do with kernel drivers, or some change in
> firmware.
>

As far as I know it has always been like this. It may be why it was
hardly used before it came into my care. Looking at the server I could
blame poor design; the inside is rather cramped, despite the care
taken with the internal baffles. They may not have run a good flow
simulation.

Mr. Bird's observation seems to support this.

> If the cause is mechanical, I'd also suggest checking the heat sink contact
> surface. Some heat sinks are poorly manufactured and require flattening with
> wet 'n dry sandpaper to get a flat enough surface and improve their contact
> with the CPU. I've seen 15°C improvement in a Zalman CPU cooler after excess
> metal was removed from copper pipes, which were manufactured proud. Hardcore
> O/C's flatten the CPU too, but I'd avoid anything as radical because it can go
> badly wrong if you remove more than the surface varnish from the chip.
>
> In the interim, opening the side panel may also help in hot weather.
>

The internals are custom made to fit the motherboard, cards, and drive
slots. It may work better if I move it to another tower but it will be
a while before I can find one. I will look at the interface between
the heatsink and processor again, but it looked fine.


How concerned should I be about overheating machine check errors? I
used to think that it was best to avoid them, as the threshold was
high enough that very small parts of the die could overshoot and fail,
but I was informed that is not the case. Besides the throttling (which
is fairly bad) I am not sure if there are any drawbacks to the
overheating.
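
One way to put a number on the throttling mentioned above is to read the
kernel's per-CPU thermal throttle counters. The following is a minimal
sketch, assuming a Linux kernel on Intel x86 that exposes
thermal_throttle/core_throttle_count and
thermal_throttle/package_throttle_count under /sys/devices/system/cpu.
The function name read_throttle_counts is made up for illustration, and
CPUs that do not expose the counters are simply skipped.

```python
from pathlib import Path


def read_throttle_counts(base="/sys/devices/system/cpu"):
    """Return {cpu_name: (core_count, package_count)}.

    Assumption: each cpuN directory may contain a thermal_throttle/
    subdirectory with the two counter files. CPUs without readable
    counters are left out of the result.
    """
    counts = {}
    # "cpu[0-9]*" matches cpu0, cpu31, ... but not cpufreq or cpuidle.
    for cpu_dir in sorted(Path(base).glob("cpu[0-9]*")):
        throttle_dir = cpu_dir / "thermal_throttle"
        try:
            core = int((throttle_dir / "core_throttle_count").read_text())
            package = int((throttle_dir / "package_throttle_count").read_text())
        except (OSError, ValueError):
            # Counter files missing or unreadable on this CPU or kernel.
            continue
        counts[cpu_dir.name] = (core, package)
    return counts


if __name__ == "__main__":
    for cpu, (core, package) in read_throttle_counts().items():
        print(f"{cpu}: core={core} package={package}")
```

Running it before and after a full 32-thread load and comparing the
counters would show whether, and on which cores, throttling events were
recorded during the run.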

I am wondering what the point of 32 threads is if you can't use them at 100%.

Cheers,
R0b0t1
