Hi Yiyang,

wuyy <xgreenlandforwyy@×××××.com> writes:

> Sorry for the late report. I was busy last week, so the actual
> progress is slower than expected.
>
> This week I have learnt a lot from Ulrich's comments on rocm.eclass. I
> polished the eclass to v3 and sent it to the gentoo-dev mailing list.
> However, I observed another error introduced in v3, and I'll include a
> fix for it in v4 in the following days.
>
> The other half of my time was spent on testing sci-libs/roc-* packages on
> various platforms, utilizing rocm.eclass. I can say that rocm.eclass did
> its job as expected, so I believe after v4 it can be merged.

Very good progress despite being busy.

> With src_test enabled, I have found various test
> failures. rocBLAS-5.1.3 fails 3 tests on Radeon RX 6700XT, slightly
> exceeding tolerance, which does not seem a big issue; rocFFT-5.1.3 fails
> 16 suites on Radeon VII [1], which is serious and confirmed by
> upstream, so I suggest masking the amdgpu_targets_gfx906 USE flag for
> rocFFT-5.1.3; just today I observed MIOpen failing many tests,
> probably due to vanilla clang. I'll open issues and report those test
> failures to upstream. Running the test suites takes a lot of time, and
> often drains the GPU. It may take more than 15 hours to test rocBLAS,
> even on a performant CPU like the Ryzen 5950X. If I use the GPU to render
> graphics (run a desktop environment) and run tests simultaneously, it
> often results in an amdgpu driver failure. I hope one day we can have a
> testing farm for ROCm packages, but that would be expensive because
> there are a lot of GPU architectures, and the compilation takes a lot
> of time.

Have the problems on Radeon VII been reproduced on MI100?

> I planned to finish the draft of wiki pages [2,3], but it turns out I'm
> running out of time. I'll catch up in week 11. My mentor is also busy in
> week 10, so my PR about rocm-opencl-runtime is still pending review.
> Now we are working on solving the dependency issue of ROCm packages --
> gcc-12 and gcc-11.3.0 incompatibilities. Due to two bugs, the current
> stable gcc, gcc-11.3.0, cannot compile some ROCm packages [4], and the
> current unstable gcc, gcc-12, is unable to compile nearly all ROCm
> packages [5].

If we cannot backport the gcc-12 compatibility patch to llvm-14, the only
way left is to live with gcc-11.4 while waiting for llvm-15. Good
documentation on the wiki will help users understand why we are here.

> I'll continue to do what was postponed in week 10 -- landing rocm.eclass
> and sci-libs packages, preparing cupy, fixing bugs, and writing the wiki
> pages. I'll investigate MIOpen's situation as well.

Cheers,
Benda