Gentoo Archives: gentoo-user

From: R0b0t1 <r030t1@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Instrumenting the GPU
Date: Tue, 14 Nov 2017 06:37:44
Message-Id: CAAD4mYh=oL20ahsGqii12ncLCTm6b7c_VMBA2FuFCcBW4K+e6g@mail.gmail.com
In Reply to: Re: [gentoo-user] Instrumenting the GPU by Peter Humphrey
Hello,

On Mon, Nov 13, 2017 at 9:46 AM, Peter Humphrey <peter@××××××××××××.uk> wrote:
> On Monday, 13 November 2017 15:12:56 GMT Daniel Frey wrote:
>> On 11/13/17 02:59, Peter Humphrey wrote:
>> > Hello list,
>> >
>> > I'm hunting a problem with cooling in this box, and I've got as far as
>> > suspecting my new AMD WX 5100 GPU.
>> >
>> > One of my BOINC projects causes the GPU temperature, as shown by
>> > gkrellm, to shoot up to 75C or more and cause intolerable system
>> > cooling noise. If I suspend that project but leave the other seven
>> > running, the temperature returns to what I hope is a normal 55C. Those
>> > seven projects are supposed to use the GPU, but I'm not sure whether
>> > they do in fact.
>> >
>> > Is there any way I can monitor what is using the GPU, to find out?
>>
>> I don't know if there's a utility for consumer level cards that can do
>> this. I do remember for Nvidia there's nvidia-smi but I don't think it
>> will list processes for desktop cards.
>
> This isn't consumer grade (look it up in your local shops ;-) ):
>
> # lspci -v -s 01:00.0
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Ellesmere [Radeon Pro WX 5100] (prog-if 00 [VGA controller])
> Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon
> Pro WX 5100]
> Flags: bus master, fast devsel, latency 0, IRQ 34, NUMA node 0
> Memory at c0000000 (64-bit, prefetchable) [size=256M]
> Memory at d0000000 (64-bit, prefetchable) [size=2M]
> I/O ports at e000 [size=256]
> Memory at fbe00000 (32-bit, non-prefetchable) [size=256K]
> Expansion ROM at 000c0000 [disabled] [size=128K]
> Capabilities: [48] Vendor Specific Information: Len=08 <?>
> Capabilities: [50] Power Management version 3
> Capabilities: [58] Express Legacy Endpoint, MSI 00
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
> Capabilities: [150] Advanced Error Reporting
> Capabilities: [200] #15
> Capabilities: [270] #19
> Capabilities: [2b0] Address Translation Service (ATS)
> Capabilities: [2c0] Page Request Interface (PRI)
> Capabilities: [2d0] Process Address Space ID (PASID)
> Capabilities: [320] Latency Tolerance Reporting
> Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
> Capabilities: [370] L1 PM Substates
> Kernel driver in use: amdgpu
>
>> The only other generic ones I can think of are cuda-z and gputop. Have
>> you tried one of those? Although I don't think it'll give you the
>> information you need either.
>
> As it's AMD, not nVidia, nvidia-smi and cuda aren't suitable. I hadn't heard
> of GPU Top - thanks. I'll have a look at it.
>
> I forgot to add that I'm using the proprietary dev-libs/amdgpu-pro-opencl
> because mesa hasn't caught up yet.
>

The level of detail you want will likely necessitate the use of a GPU
debugger. AMD provides CodeXL, located at
https://gpuopen.com/compute-product/codexl/. I suggest looking at the
profiling features.
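
Short of a full profiler, a coarser check may already narrow things down:
log the GPU temperature while suspending the BOINC projects one at a time
and see which suspensions move the reading. The following is only a minimal
sketch, not a replacement for CodeXL. It assumes the amdgpu driver exposes
its sensor through the standard hwmon sysfs interface and that the card is
card0; the path pattern, the function names and the sampling defaults are
all illustrative, and it does not show which process is using the GPU.

```python
#!/usr/bin/env python3
"""Log the GPU temperature that a driver exposes through hwmon sysfs."""
import glob
import time

# ASSUMPTION: the GPU of interest is card0; adjust on multi-GPU machines.
HWMON_GLOB = "/sys/class/drm/card0/device/hwmon/hwmon*/temp1_input"


def parse_millidegrees(raw):
    """Convert a hwmon reading (millidegrees Celsius, as text) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_gpu_temp(pattern=HWMON_GLOB):
    """Return the first readable sensor value in degrees C, or None."""
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as sensor:
                return parse_millidegrees(sensor.read())
        except (OSError, ValueError):
            # Unreadable or malformed sensor file: try the next match.
            continue
    return None


def main(interval=5.0, samples=12):
    """Print one timestamped reading every `interval` seconds."""
    for _ in range(samples):
        temp = read_gpu_temp()
        stamp = time.strftime("%H:%M:%S")
        print(stamp, "n/a" if temp is None else f"{temp:.1f} C")
        time.sleep(interval)


if __name__ == "__main__":
    main()
```

Run it in a terminal, suspend one project in the BOINC manager, wait a few
samples, then resume it and move on to the next. Projects whose suspension
leaves the reading unchanged are probably not loading the GPU much.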

You may want to communicate your findings to the relevant BOINC projects.

Cheers,
R0b0t1
