wuyy <xgreenlandforwyy@×××××.com> writes:

> Bug fixes cover rocBLAS and rocFFT. For rocBLAS, I backported a patch to
> sci-libs/rocBLAS-5.0.2-r1 and dev-util/Tensile-r1, to pass `-j N` from
> ${MAKEOPTS} to TensileCreateLibrary when building rocBLAS, which fixed [1]. As
> for rocFFT, I corrected its BDEPEND [2], added missing sys-libs/omp for omp.h
> [3], and let it depend on dev-util/rocm-cmake-5.0.2-r1 which does not install
> files to unexpected paths [4]. However, as gcc-12.1.0 lands, bugs about
> clang expanding the __noinline__ macro in g++-v12/bits/shared_ptr_base.h have
> emerged [5,6]. Details can be seen in [5], and I'm working on resolving this
> (see PR [7]).

Good. Nice progress!
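For anyone following along, the job-count handling could look roughly like
the sketch below. It is not the backported patch itself:
jobs_from_makeopts is a hypothetical helper written for illustration, and
in a real ebuild the makeopts_jobs function from multiprocessing.eclass
would be the usual choice.

```shell
# Hypothetical helper (not part of the actual patch): extract the job count
# from a MAKEOPTS-style string. Understands "-j8", "-j 8" and "--jobs=8";
# prints 1 when no job count is given.
jobs_from_makeopts() {
	local jobs=1 prev="" word
	# Unquoted on purpose: split the option string into words.
	for word in $1; do
		case "${word}" in
			-j[0-9]*)      jobs="${word#-j}" ;;
			--jobs=[0-9]*) jobs="${word#--jobs=}" ;;
			[0-9]*)
				# A bare number only counts when it follows "-j".
				if [ "${prev}" = "-j" ]; then
					jobs="${word}"
				fi
				;;
		esac
		prev="${word}"
	done
	echo "${jobs}"
}

# Example: turn MAKEOPTS into the "-j N" form mentioned above.
MAKEOPTS="-j8 -l9"
echo "-j $(jobs_from_makeopts "${MAKEOPTS}")"   # prints: -j 8
```

Falling back to 1 keeps the build serial when MAKEOPTS carries no job
count, which is the conservative default.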

> For rocm.eclass, I finished the draft for three major functions: USE_EXPAND,
> src_configure and src_test. I also wrote the get_amdgpu_flags function used by
> src_configure. My latest work on rocm.eclass is located at
> https://github.com/littlewu2508/gentoo/blob/rocm-5.1.3/eclass/rocm.eclass. Below
> are its status and the questions I'd like to share:
>
> 1. Default architectures. Now that I have implemented the USE_EXPAND of
> AMDGPU_TARGETS, I need to specify the default value of each USE flag. The
> straightforward way is to enable all targets by default, but that can be
> **extremely** slow and disk-hungry when compiling ROCm libraries such as
> rocBLAS or rocFFT (expect to compile for several hours if the CPU is not
> powerful enough). Currently I define a variable OFFICIAL_AMDGPU_TARGETS,
> taken from the ROCm installation documents [8]. Although the actual support
> range is much larger, and different components have their own support
> matrices, AMD promises to fully support these enterprise cards. Enterprise
> users can just emerge ROCm packages without setting any specific USE flag and
> get an out-of-the-box experience on Gentoo. Users with consumer cards can read
> the wiki page (covered later in my GSoC project) for instructions on setting
> the correct USE flags.

Fine.
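A minimal sketch of how such defaults could be expressed, assuming the
eclass builds IUSE from two space-separated lists. amdgpu_iuse is a
hypothetical name, and the target lists below are placeholders (only
gfx906 and gfx1030 are named in this thread); the real
OFFICIAL_AMDGPU_TARGETS would come from [8].

```shell
# Hypothetical helper: build an IUSE string for the AMDGPU_TARGETS USE_EXPAND.
# Targets listed in the second argument are enabled by default ("+" prefix).
amdgpu_iuse() {
	local all="$1" official="$2" target flag out=""
	for target in ${all}; do
		flag="amdgpu_targets_${target}"
		# Whole-word match against the list of default targets.
		case " ${official} " in
			*" ${target} "*) flag="+${flag}" ;;
		esac
		out="${out}${out:+ }${flag}"
	done
	echo "${out}"
}

# Placeholder lists for illustration only, not the real support matrix.
ALL_AMDGPU_TARGETS="gfx803 gfx906 gfx1030"
OFFICIAL_AMDGPU_TARGETS="gfx906"
IUSE="$(amdgpu_iuse "${ALL_AMDGPU_TARGETS}" "${OFFICIAL_AMDGPU_TARGETS}")"
echo "${IUSE}"
```

With this shape, users of other cards only need to set AMDGPU_TARGETS in
make.conf, and the defaults stay in one variable.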

> 2. Whether to set -DSKIP_RPATH=true in mycmakeargs. Previously this was set to
> avoid including rpath if USE=benchmark when building ROCm packages like
> sci-libs/roc-* and sci-libs/hip-*. The test and benchmark executables are named
> "clients" (taking rocBLAS as an example, clients are programs that use functions
> from and link against librocblas.so). In order to run tests and benchmarks
> before installing the libraries to the system, rpath is set on these
> executables, but Gentoo does not have a src_benchmark phase, so the benchmark
> binaries are just installed, and users can run them afterwards (actually I use
> them in my research to tune algorithms). So there should not be an rpath in
> the benchmark binaries, and this is achieved by setting -DSKIP_RPATH=true.
> However, after this, the test programs cannot execute because their rpath is
> also eliminated, so I have to specify LD_LIBRARY_PATH in src_test manually.

Go for it. This is exactly what LD_LIBRARY_PATH is designed for: testing
libraries installed in non-standard locations.
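As a rough sketch of the src_test side: prepend_path is a hypothetical
helper, and the library subdirectory and client name in the comment are
assumptions about the build tree, not something taken from the ebuild.

```shell
# Hypothetical helper: prepend a directory to a colon-separated search path
# without leaving an empty entry (which the loader treats as the current
# working directory).
prepend_path() {
	local dir="$1" current="$2"
	echo "${dir}${current:+:${current}}"
}

# Sketch of the src_test usage; BUILD_DIR is set by cmake.eclass, while the
# "library/src" subdirectory and the client name are assumptions:
#   LD_LIBRARY_PATH="$(prepend_path "${BUILD_DIR}/library/src" "${LD_LIBRARY_PATH}")" \
#       ./rocblas-test
prepend_path "/tmp/build/library/src" "/usr/lib64"
```

Setting the variable only on the test command line keeps it out of the
rest of the phase.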

> [...]
>
> 3. Detect AMDGPU in src_test. This blocks https://bugs.gentoo.org/817440, and I
> also raised questions in the bug report. Tinderbox cannot run tests on ROCm
> packages like rocBLAS, because there is no AMDGPU available. I implemented a
> detection mechanism, with one problem left: if no GPU is available, should the
> test fail or exit normally?

Just follow the CI runner's advice at https://bugs.gentoo.org/817440#c4

,----
| Agostino Sarubbo gentoo-dev 2021-10-11 07:16:42 UTC
|
| (In reply to Benda Xu from comment #0)
| > Tinderbox does not have the hardware for GPGPU. The ROCm GPGPU ebuilds
| > unconditionally fail.
|
| Is there a way for the ebuild to die if the hw does not meet the requisites?
`----

> Personally, I think the best solution is to detect AMDGPU during the
> pretend or setup phase,

Just die otherwise.

> turn off the test USE flag if no GPU is available, or if the architecture
> compiled does not match the detected GPU. But is changing USE flags
> inside ebuild phase functions possible?

No, don't modify USE flags at ebuild runtime.
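The detect-and-die approach could be sketched as follows. This is only an
illustration: amdgpu_accessible is a hypothetical helper, and treating
read and write access to /dev/kfd (the amdkfd device node) as "GPU
available" is an assumption; the detection in rocm.eclass may check
something else, and restricting the check to USE=test is likewise assumed.

```shell
# Hypothetical helper: succeed when the given device node (default /dev/kfd)
# exists and is readable and writable by the current user.
amdgpu_accessible() {
	local node="${1:-/dev/kfd}"
	[ -r "${node}" ] && [ -w "${node}" ]
}

# In an ebuild or eclass this would sit in pkg_pretend or pkg_setup, e.g.:
#   if use test && ! amdgpu_accessible; then
#       die "No accessible AMDGPU at /dev/kfd; cannot run tests"
#   fi
# Outside of Portage, just report the result:
if amdgpu_accessible; then
	echo "AMDGPU compute node is accessible"
else
	echo "no accessible AMDGPU compute node"
fi
```

Dying early leaves the USE flags untouched and gives the tinderbox a clear
failure reason instead of a failing test suite.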

> Despite these issues I managed a working version of rocm.eclass, and used it on
> rocBLAS. The USE_EXPAND works successfully, while src_test can properly detect
> hardware and execute in both sandboxed vanilla Gentoo and non-sandboxed Gentoo
> Prefix. There are still things to work on in rocm.eclass:
>
> 1. ROCM_USEDEP, similar to PYTHON_USEDEP. For example, if hipBLAS uses
> architectures gfx906 and gfx1030, then its dependency, rocBLAS, must also
> contain gfx906 and gfx1030.
> 2. SRC_URI.
> 3. A way to automatically add PORTAGE_USERNAME to the render group, to access
> amdgpu and perform src_test. I don't have any clue on this yet; maybe a meta
> package in acct-group can do this?

I have no idea.
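On item 1, though, ROCM_USEDEP could plausibly be built the same way
PYTHON_USEDEP is, as a comma-separated list of conditional USE
dependencies. A sketch under that assumption, with rocm_usedep as a
hypothetical function name:

```shell
# Hypothetical helper: turn a list of GPU targets into a USE dependency
# string. "flag(-)?" means: if the flag is enabled on this package, require
# it on the dependency too, treating a missing flag as disabled.
rocm_usedep() {
	local target out=""
	for target in "$@"; do
		out="${out}${out:+,}amdgpu_targets_${target}(-)?"
	done
	echo "${out}"
}

# Example with the two architectures from the hipBLAS case above.
ROCM_USEDEP="$(rocm_usedep gfx906 gfx1030)"
echo "sci-libs/rocBLAS[${ROCM_USEDEP}]"
```

hipBLAS would then carry the printed atom in its dependencies, so enabling
gfx906 there forces the same flag on rocBLAS.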

> In the coming week I'll finish rocm.eclass as planned, and send it out for early
> review. Meanwhile I'll continue fixing bugs [5,6,9,10], answering questions
> about enabling rocm in packages [11,12], and preparing to land ROCm-5.1.3. One of
> my friends is also plugging a Radeon VII into their arm64 server, and if
> everything goes well I can try ROCm on arm64 (according to the kernel
> documentation, the GPGPU driver, amdkfd, supports amd64, arm64 and ppc64), and
> add the ~arm64 KEYWORD in the future.

Cheers,
Benda