Hello all,

Sorry for the late report. I was focused on a server hardware upgrade at the
lab I work at, and forgot to send this report yesterday. I also apologize for
not taking the blog seriously: I hadn't written any interesting posts there,
and thought of it as just another place to archive weekly reports unless I had
new material to add. Yury kindly reminded me last week that there were no posts
from me yet, so I uploaded the weekly reports and two figures of Blender
rendering with HIP Cycles. I'll make better use of this platform and spend more
time collecting material for posts in the coming days.

The fourth week of working on packaging ROCm went quite smoothly. There were
some bug fixes, and also major improvements to rocm.eclass.

Bug fixes cover rocBLAS and rocFFT. For rocBLAS, I backported a patch to
sci-libs/rocBLAS-5.0.2-r1 and dev-util/Tensile-r1, to pass `-j N` from
${MAKEOPTS} to TensileCreateLibrary when building rocBLAS, which fixed [1]. As
for rocFFT, I corrected its BDEPEND [2], added the missing sys-libs/omp for
omp.h [3], and let it depend on dev-util/rocm-cmake-5.0.2-r1, which does not
install files to unexpected paths [4]. However, as gcc-12.1.0 lands, bugs
caused by clang expanding the __noinline__ macro in
g++-v12/bits/shared_ptr_base.h have emerged [5,6]. Details can be found in [5],
and I'm working on resolving this (see PR [7]).
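
To illustrate the idea behind the rocBLAS fix, here is a minimal standalone
sketch of pulling the job count out of a MAKEOPTS-style string. It is not the
backported patch itself: get_jobs is a hypothetical name, and in a real ebuild
makeopts_jobs from multiprocessing.eclass already does this.

```bash
# Hypothetical helper: extract the job count N from a MAKEOPTS-style string.
# Falls back to 1 when no job count is given.
get_jobs() {
	local opts=" ${1} " jobs=1
	# Matches "-jN", "-j N", "--jobs=N" and "--jobs N".
	local re='[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+)[[:space:]]'
	if [[ ${opts} =~ ${re} ]]; then
		jobs=${BASH_REMATCH[2]}
	fi
	echo "${jobs}"
}

MAKEOPTS="-j8 -l9"
# Only prints the flag; the real build would hand it to TensileCreateLibrary.
echo "TensileCreateLibrary would get: -j $(get_jobs "${MAKEOPTS}")"
```

With the MAKEOPTS above this prints "-j 8" as the flag to forward.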

For rocm.eclass, I finished the draft of three major parts: USE_EXPAND,
src_configure and src_test. I also wrote the get_amdgpu_flags function used by
src_configure. My latest work on rocm.eclass is located at
https://github.com/littlewu2508/gentoo/blob/rocm-5.1.3/eclass/rocm.eclass. Below
are its status and the questions I'd like to share:

1. Default architectures. Now that I implement the USE_EXPAND of
AMDGPU_TARGETS, I need to specify the default value of each USE flag. The
straightforward way is to enable all targets by default, but that can be
**extremely** slow and disk-hungry when compiling ROCm libraries such as
rocBLAS or rocFFT (expect compilation to take several hours if the CPU is not
powerful enough). Currently I define a variable OFFICIAL_AMDGPU_TARGETS, taken
from the ROCm installation documents [8]. Although the actual support range is
much larger, and different components have their own support matrices, AMD
promises to fully support these enterprise cards. Enterprise users can just
emerge ROCm packages without setting any specific USE flag and get an
out-of-the-box experience on Gentoo. Users with consumer cards can read the
wiki page (covered later in my GSoC project) for instructions on setting the
correct USE flags.

2. Whether to set -DSKIP_RPATH=true in mycmakeargs. Previously this was set to
avoid including rpath if USE=benchmark when building ROCm packages like
sci-libs/roc-* and sci-libs/hip-*. The test and benchmark executables are
called "clients" (taking rocBLAS as an example, clients are programs that use
its functions and link to librocblas.so). In order to run tests and benchmarks
before installing the libraries to the system, rpath is set on these
executables. But Gentoo does not have a src_benchmark phase, so the benchmark
binaries are simply installed, and users can run them afterwards (I actually
use them in my research to tune algorithms). So there should be no rpath in the
benchmark binaries, and this is achieved by setting -DSKIP_RPATH=true. However,
with this setting the test programs cannot execute because their rpath is also
removed, so I have to specify LD_LIBRARY_PATH in src_test manually. Another
option is not to skip rpath but to run chrpath on the affected binaries, which
means maintainers have to write a dedicated src_install and remember to add a
chrpath command for every new executable when bumping versions. A third option
is to patch CMakeLists.txt to include rpath only in test programs, but this
also introduces more maintenance work. What's your opinion?

3. Detecting AMDGPU in src_test. This blocks https://bugs.gentoo.org/817440,
and I also raised questions in the bug report. Tinderbox cannot run tests on
ROCm packages like rocBLAS, because there is no AMDGPU available. I implemented
the detection mechanism, with one problem left: if no GPU is available, should
the test fail or exit normally? Personally, I think the best solution is to
detect the AMDGPU during the pretend or setup phase and turn off the test USE
flag if no GPU is available or if the compiled architecture does not match the
detected GPU. But is manipulating USE flags inside ebuild phase functions
possible?
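
For readers who prefer code to prose, the three points above can be sketched
roughly as follows. This is an illustration only, not the code in rocm.eclass:
the function names (gen_iuse, test_ld_library_path, have_amdgpu) are
hypothetical, and both target lists are assumed example values rather than the
authoritative support matrix.

```bash
# Item 1: only "official" targets are enabled by default.
# Both lists are assumed values for illustration.
OFFICIAL_AMDGPU_TARGETS=( gfx906 gfx908 gfx90a gfx1030 )
ALL_AMDGPU_TARGETS=( gfx803 gfx900 gfx906 gfx908 gfx90a gfx1010 gfx1030 )

# Build an IUSE string where official targets carry the "+" default marker.
gen_iuse() {
	local target iuse=()
	for target in "${ALL_AMDGPU_TARGETS[@]}"; do
		if [[ " ${OFFICIAL_AMDGPU_TARGETS[*]} " == *" ${target} "* ]]; then
			iuse+=( "+amdgpu_targets_${target}" )
		else
			iuse+=( "amdgpu_targets_${target}" )
		fi
	done
	echo "${iuse[*]}"
}

# Item 2: with -DSKIP_RPATH=true the test clients need the build directory
# (passed as $1) prepended to LD_LIBRARY_PATH.
test_ld_library_path() {
	echo "${1}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

# Item 3: amdkfd exposes /dev/kfd; a missing or inaccessible node is treated
# as "no usable GPU". The node path can be overridden for testing.
have_amdgpu() {
	local node=${1:-/dev/kfd}
	[[ -r ${node} && -w ${node} ]]
}

echo "IUSE=\"$(gen_iuse)\""
if have_amdgpu; then
	echo "AMDGPU accessible, tests can run"
else
	echo "no usable AMDGPU, tests would have to be skipped or failed"
fi
```

Whether the last branch should skip or fail is exactly the open question in
item 3; the sketch deliberately leaves that decision to the caller.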

Despite these issues, I managed to get a working version of rocm.eclass and
used it on rocBLAS. The USE_EXPAND works successfully, and src_test can
properly detect hardware and execute in both sandboxed vanilla Gentoo and
non-sandboxed Gentoo Prefix. There are still things to work on in rocm.eclass:

1. ROCM_USEDEP, similar to PYTHON_USEDEP. For example, if hipBLAS uses
architectures gfx906 and gfx1030, then its dependency, rocBLAS, must contain
gfx906 and gfx1030.

2. SRC_URI.

3. A way to automatically add PORTAGE_USERNAME to the render group, so that it
can access the amdgpu device and perform src_test. I don't have any clue on
this yet; maybe a meta package in acct-group can do this?
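
As a rough idea of what ROCM_USEDEP could expand to, the sketch below builds a
USE dependency string from a list of targets, in the same style that
PYTHON_USEDEP uses for Python targets. It is not implemented in rocm.eclass
yet; gen_rocm_usedep is a hypothetical name and the final format may differ.

```bash
# Hypothetical sketch: turn a list of GPU targets into a USE dependency
# string, so a dependency must enable every target the dependent enables.
gen_rocm_usedep() {
	local target deps=()
	for target in "$@"; do
		# "flag(-)?" means: required on the dependency if enabled here,
		# and treated as disabled if the dependency lacks the flag.
		deps+=( "amdgpu_targets_${target}(-)?" )
	done
	local IFS=,
	echo "${deps[*]}"
}

ROCM_USEDEP=$(gen_rocm_usedep gfx906 gfx1030)
echo "sci-libs/rocBLAS[${ROCM_USEDEP}]"
```

For the hipBLAS example this prints
sci-libs/rocBLAS[amdgpu_targets_gfx906(-)?,amdgpu_targets_gfx1030(-)?].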

In the coming week I'll finish rocm.eclass as planned and send it out for early
review. Meanwhile I'll continue fixing bugs [5,6,9,10], answering questions
about enabling ROCm in packages [11,12], and preparing to land ROCm-5.1.3. One
of my friends is also plugging a Radeon VII into their arm64 server, and if
everything goes well I can try ROCm on arm64 (according to the kernel
documentation, the GPGPU driver, amdkfd, supports amd64, arm64 and ppc64) and
add the ~arm64 keyword in the future.

[1] https://bugs.gentoo.org/852236
[2] https://bugs.gentoo.org/836248
[3] https://bugs.gentoo.org/850937
[4] https://bugs.gentoo.org/836274
[5] https://bugs.gentoo.org/857126
[6] https://bugs.gentoo.org/857660
[7] https://github.com/gentoo/gentoo/pull/26311
[8] https://docs.amd.com/bundle/ROCm-Getting-Started-Guide-v5.1.3/page/Overview_of_ROCm_Installation.html
[9] https://bugs.gentoo.org/842366
[10] https://bugs.gentoo.org/836275
[11] https://github.com/gentoo/gentoo/pull/25836
[12] https://github.com/gentoo/gentoo/pull/25837

Cheers,
--
Yiyang Wu