Gentoo Archives: gentoo-user

From: R0b0t1 <r030t1@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Instrumenting the GPU
Date: Tue, 14 Nov 2017 06:37:44
In Reply to: Re: [gentoo-user] Instrumenting the GPU by Peter Humphrey

On Mon, Nov 13, 2017 at 9:46 AM, Peter Humphrey <peter@××××××××××××.uk> wrote:
> On Monday, 13 November 2017 15:12:56 GMT Daniel Frey wrote:
>> On 11/13/17 02:59, Peter Humphrey wrote:
>> > Hello list,
>> >
>> > I'm hunting a problem with cooling in this box, and I've got as far as
>> > suspecting my new AMD WX 5100 GPU.
>> >
>> > One of my BOINC projects causes the GPU temperature, as shown by
>> > gkrellm, to shoot up to 75C or more and cause intolerable system
>> > cooling noise. If I suspend that project but leave the other seven
>> > running, the temperature returns to what I hope is a normal 55C. Those
>> > seven projects are supposed to use the GPU, but I'm not sure whether
>> > they do in fact.
>> >
>> > Is there any way I can monitor what is using the GPU, to find out?
>>
>> I don't know if there's a utility for consumer level cards that can do
>> this. I do remember for Nvidia there's nvidia-smi but I don't think it
>> will list processes for desktop cards.
>
> This isn't consumer grade (look it up in your local shops ;-) ):
>
> # lspci -v -s 01:00.0
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 5100] (prog-if 00 [VGA controller])
>     Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 5100]
>     Flags: bus master, fast devsel, latency 0, IRQ 34, NUMA node 0
>     Memory at c0000000 (64-bit, prefetchable) [size=256M]
>     Memory at d0000000 (64-bit, prefetchable) [size=2M]
>     I/O ports at e000 [size=256]
>     Memory at fbe00000 (32-bit, non-prefetchable) [size=256K]
>     Expansion ROM at 000c0000 [disabled] [size=128K]
>     Capabilities: [48] Vendor Specific Information: Len=08 <?>
>     Capabilities: [50] Power Management version 3
>     Capabilities: [58] Express Legacy Endpoint, MSI 00
>     Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>     Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
>     Capabilities: [150] Advanced Error Reporting
>     Capabilities: [200] #15
>     Capabilities: [270] #19
>     Capabilities: [2b0] Address Translation Service (ATS)
>     Capabilities: [2c0] Page Request Interface (PRI)
>     Capabilities: [2d0] Process Address Space ID (PASID)
>     Capabilities: [320] Latency Tolerance Reporting
>     Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
>     Capabilities: [370] L1 PM Substates
>     Kernel driver in use: amdgpu
>
>> The only other generic ones I can think of are cuda-z and gputop. Have
>> you tried one of those? Although I don't think it'll give you the
>> information you need either.
>
> As it's AMD, not nVidia, nvidia-smi and cuda aren't suitable. I hadn't heard
> of GPU Top - thanks. I'll have a look at it.
>
> I forgot to add that I'm using the proprietary dev-libs/amdgpu-pro-opencl
> because mesa hasn't caught up yet.
>
The level of detail you want will likely necessitate the use of a GPU
debugger. AMD provides CodeXL, located at [...]. I suggest looking at its
profiling features.

You may want to communicate your findings to the relevant BOINC projects.

Cheers,
R0b0t1
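A cruder check than profiling is to see which processes hold the GPU's device nodes open; run as root, `fuser -v /dev/dri/*` or `lsof /dev/dri/*` does this in one line. The Python sketch below does the same by walking `/proc/<pid>/fd`. It assumes the card's nodes are `/dev/dri/card*` and `/dev/dri/renderD*`, plus `/dev/kfd` if a ROCm-style compute stack is in use; which node the amdgpu-pro OpenCL runtime actually opens is an assumption worth checking on the box itself. An open handle only shows that a process can submit work to the GPU, not how hard it drives it, so this answers "which BOINC tasks touch the GPU at all" rather than "which one heats it". Run it as root, since BOINC tasks usually run under their own user and their fd directories are otherwise unreadable.

```python
#!/usr/bin/env python3
"""List processes that hold a GPU device node open (Linux).

Sketch only: an open /dev/dri/* or /dev/kfd handle shows that a process
*can* submit work to the GPU, not how busy it keeps it.
"""
import os
import re

# Assumption: the card's nodes are /dev/dri/card<N> and /dev/dri/renderD<N>;
# /dev/kfd exists only when a ROCm/HSA compute stack is installed.
GPU_NODE = re.compile(r"^/dev/(dri/(card|renderD)\d+|kfd)$")


def gpu_users(proc_root="/proc"):
    """Return {pid: (command name, sorted list of GPU device paths)}."""
    users = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # not permitted (other user) or process already exited
        nodes = set()
        for fd in fds:
            try:
                # Each fd is a symlink whose target names the open file.
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed between listdir() and readlink()
            if GPU_NODE.match(target):
                nodes.add(target)
        if nodes:
            try:
                with open(os.path.join(proc_root, entry, "comm")) as f:
                    comm = f.read().strip()
            except OSError:
                comm = "?"
            users[int(entry)] = (comm, sorted(nodes))
    return users


if __name__ == "__main__":
    for pid, (comm, nodes) in sorted(gpu_users().items()):
        print(f"{pid:>7}  {comm:<20} {' '.join(nodes)}")
```

Running it once with the suspect project active and once with it suspended should show whether the other seven projects' tasks open the GPU at all.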


Subject Author
Re: [gentoo-user] Instrumenting the GPU Peter Humphrey <peter@××××××××××××.uk>