Hello all,

Sorry for the late report. I was busy last week, so actual progress
is slower than expected.

This week I have learned a lot from Ulrich's comments on rocm.eclass. I
polished the eclass to v3 and sent it to the gentoo-dev mailing list.
However, I noticed another error introduced in v3, and I'll include a
fix for it in v4 in the following days.

The other half of my time was spent testing sci-libs/roc-* packages on
various platforms, using rocm.eclass. I can say that rocm.eclass did
its job as expected, so I believe it can be merged after v4.

With src_test enabled, I have found various test failures. rocBLAS-5.1.3
fails 3 tests on Radeon RX 6700XT, slightly exceeding tolerance, which
does not seem to be a big issue; rocFFT-5.1.3 fails 16 suites on Radeon
VII [1], which is serious and confirmed by upstream, so I suggest
masking the amdgpu_targets_gfx906 USE flag for rocFFT-5.1.3; just today
I observed that MIOpen is failing many tests, probably due to vanilla
clang. I'll open issues and report those test failures upstream. Running
the test suites takes a lot of time and often drains the GPU. It may
take more than 15 hours to test rocBLAS, even on a performant CPU like
the Ryzen 5950X. If I use the GPU to render graphics (run a desktop
environment) and run tests simultaneously, it often results in an amdgpu
driver failure. I hope one day we can have a testing farm for ROCm
packages, but that would be expensive because there are a lot of GPU
architectures, and compilation takes a lot of time.

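For reference, the rocFFT mask suggested above could look roughly like
the following entry. This is only a sketch: it assumes the package atom
is sci-libs/rocFFT and that the entry would go into a
profiles/package.use.mask file; the exact location and comment header
are up to whoever commits it.

```
# rocFFT-5.1.3 fails 16 test suites on Radeon VII (gfx906), confirmed
# by upstream: https://github.com/ROCmSoftwarePlatform/rocFFT/issues/369
# (assumed atom and file location, see note above)
=sci-libs/rocFFT-5.1.3 amdgpu_targets_gfx906
```
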
I planned to finish the draft of the wiki pages [2,3], but it turns out
I'm running out of time. I'll catch up in week 11. My mentor was also
busy in week 10, so my PR for rocm-opencl-runtime is still pending
review. Now we are working on solving the dependency issue of ROCm
packages -- incompatibilities with gcc-12 and gcc-11.3.0. Due to two
bugs, the current stable gcc, gcc-11.3.0, cannot compile some ROCm
packages [4], and the current unstable gcc, gcc-12, is unable to compile
nearly all ROCm packages [5].

I'll continue with what was postponed in week 10 -- landing rocm.eclass
and the sci-libs packages, preparing cupy, fixing bugs, and writing the
wiki pages. I'll investigate MIOpen's situation as well.

[1] https://github.com/ROCmSoftwarePlatform/rocFFT/issues/369
[2] https://wiki.gentoo.org/wiki/ROCm
[3] https://wiki.gentoo.org/wiki/HIP
[4] https://bugs.gentoo.org/842405
[5] https://bugs.gentoo.org/857660

Yours,
--
Yiyang Wu