Gentoo Archives: gentoo-soc

From: Benda Xu <heroxbd@g.o>
To: gentoo-soc <gentoo-soc@l.g.o>
Subject: Re: [gentoo-soc] Week 10 Report for Refining ROCm Packages in Gentoo
Date: Tue, 23 Aug 2022 14:51:26
Message-Id: 87sfln109l.fsf@gentoo.org
In Reply to: [gentoo-soc] Week 10 Report for Refining ROCm Packages in Gentoo by wuyy
Hi Yiyang,

wuyy <xgreenlandforwyy@×××××.com> writes:

> Sorry for the late report. I was busy last week, so the actual
> progress is slower than expected.
>
> This week I have learned a lot from Ulrich's comments on rocm.eclass. I
> polished the eclass to v3 and sent it to the gentoo-dev mailing list.
> However, I observed another error introduced in v3, and I'll include a
> fix for it in v4 in the following days.
>
> The other half of my time was spent testing sci-libs/roc-* packages on
> various platforms using rocm.eclass. I can say that rocm.eclass did
> its job as expected, so I believe it can be merged after v4.

Very good progress despite being busy.

> With src_test enabled, I have found various test failures.
> rocBLAS-5.1.3 fails 3 tests on the Radeon RX 6700 XT, slightly
> exceeding the tolerance, which does not seem to be a big issue.
> rocFFT-5.1.3 fails 16 suites on the Radeon VII [1], which is serious and
> confirmed by upstream, so I suggest masking the amdgpu_targets_gfx906
> USE flag for rocFFT-5.1.3. Just today I observed that MIOpen is failing
> many tests, probably due to vanilla clang. I'll open issues and report
> those test failures upstream. Running the test suites takes a lot of
> time and often drains the GPU. Testing rocBLAS may take more than 15
> hours, even on a performant CPU like the Ryzen 5950X. If I use the GPU
> to render graphics (run a desktop environment) and run tests at the
> same time, it often results in an amdgpu driver failure. I hope one day
> we can have a testing farm for ROCm packages, but that would be
> expensive because there are many GPU architectures, and compilation
> takes a lot of time.

Have the problems on the Radeon VII been reproduced on the MI100?

> I planned to finish the drafts of the wiki pages [2,3], but it turns out
> I'm running out of time. I'll catch up in week 11. My mentor was also
> busy in week 10, so my PR about rocm-opencl-runtime is still pending
> review. Now we are working on solving the dependency issue of ROCm
> packages: the gcc-12 and gcc-11.3.0 incompatibilities. Due to two bugs,
> the current stable gcc, gcc-11.3.0, cannot compile some ROCm packages
> [4], and the current unstable gcc, gcc-12, is unable to compile nearly
> all ROCm packages [5].

If we cannot backport the gcc-12 compatibility patch to llvm-14, the only
way left is to live with gcc-11.4 while waiting for llvm-15. Good
documentation on the wiki will explain to users why we are in this
situation.

> I'll continue with what was postponed in week 10: landing rocm.eclass
> and the sci-libs packages, preparing cupy, fixing bugs, and writing the
> wiki pages. I'll investigate MIOpen's situation as well.

Cheers,
Benda