Gentoo Archives: gentoo-soc

From: wuyy <xgreenlandforwyy@×××××.com>
To: gentoo-soc <gentoo-soc@l.g.o>
Subject: [gentoo-soc] Week 10 Report for Refining ROCm Packages in Gentoo
Date: Tue, 23 Aug 2022 14:03:12
Message-Id: YwTeGYxG4a+vvi+1@HEPwuyy
Hello all,

Sorry for the late report. I was busy last week, so actual progress was
slower than expected.

This week I learned a lot from Ulrich's comments on rocm.eclass. I
polished the eclass to v3 and sent it to the gentoo-dev mailing list.
However, I found another error introduced in v3, and I'll include a fix
for it in v4 in the coming days.

The other half of my time was spent testing sci-libs/roc-* packages on
various platforms using rocm.eclass. rocm.eclass did its job as
expected, so I believe it can be merged after v4.

With src_test enabled, I have found various test failures. rocBLAS-5.1.3
fails 3 tests on Radeon RX 6700XT, slightly exceeding tolerance, which
does not seem to be a big issue. rocFFT-5.1.3 fails 16 suites on Radeon
VII [1], which is serious and confirmed by upstream, so I suggest
masking the amdgpu_targets_gfx906 USE flag for rocFFT-5.1.3. Just today
I observed that MIOpen is failing many tests, probably due to vanilla
clang. I'll open issues and report these test failures upstream.
Running the test suites takes a lot of time and often drains the GPU.
Testing rocBLAS may take more than 15 hours, even on a performant CPU
like the Ryzen 5950X. If I use the GPU to render graphics (run a desktop
environment) and run tests at the same time, it often results in an
amdgpu driver failure. I hope one day we can have a testing farm for
ROCm packages, but that would be expensive because there are many GPU
architectures and compilation takes a long time.

I planned to finish the drafts of the wiki pages [2,3], but it turns out
I'm running out of time. I'll catch up in week 11. My mentor was also
busy in week 10, so my PR for rocm-opencl-runtime is still pending
review. We are now working on a dependency issue of ROCm packages:
incompatibilities with gcc-12 and gcc-11.3.0. Due to two bugs, the
current stable gcc, gcc-11.3.0, cannot compile some ROCm packages [4],
and the current unstable gcc, gcc-12, is unable to compile nearly all
ROCm packages [5].

I'll continue with what was postponed in week 10: landing rocm.eclass
and the sci-libs packages, preparing cupy, fixing bugs, and writing the
wiki pages. I'll investigate MIOpen's situation as well.

[1] https://github.com/ROCmSoftwarePlatform/rocFFT/issues/369
[2] https://wiki.gentoo.org/wiki/ROCm
[3] https://wiki.gentoo.org/wiki/HIP
[4] https://bugs.gentoo.org/842405
[5] https://bugs.gentoo.org/857660

Yours,
--
Yiyang Wu
