Hello all,

I'm back on GSoC after a two-week leave.

This week there was major progress on two fronts: dev-util/rocprofiler
and rocm.eclass.

I have implemented all the functions I consider necessary for
rocm.eclass. I have just sent the rocm.eclass draft to the gentoo-dev
mailing list (along with a GitHub PR at [1]); please take a look and
review it. In the following weeks, I will collect feedback and continue
to polish it.

In summary, I have implemented the following functions listed in my
proposal:
- USE_EXPAND of amdgpu_targets_, and ROCM_USEDEP to keep the USE flags
coherent among dependencies;
- rocm_src_configure, which contains the common configure arguments;
- rocm_src_test, which checks the permissions on /dev/kfd and
/dev/dri/render*.

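To illustrate what USE_EXPAND plus ROCM_USEDEP is meant to achieve, here
is a minimal Python sketch (the eclass itself is bash; this is not its
code) that derives both the IUSE flags and a USE-dependency string from
one list of GPU targets. The target list is hypothetical, and the
"flag(-)?" syntax is an assumption modeled on python-r1.eclass's
PYTHON_USEDEP; the draft rocm.eclass may format it differently.

```python
"""Sketch: derive amdgpu_targets_* USE flags and a USE-dependency string.

Assumption: the "(-)?" USE-dependency syntax mirrors python-r1.eclass's
PYTHON_USEDEP; the target list below is hypothetical.
"""

USE_EXPAND_PREFIX = "amdgpu_targets_"

# Hypothetical subset of GPU targets a package might support.
EXAMPLE_TARGETS = ["gfx803", "gfx906", "gfx908"]


def iuse_flags(targets: list[str]) -> list[str]:
    """Return one USE flag per GPU target, e.g. amdgpu_targets_gfx906."""
    return [USE_EXPAND_PREFIX + t for t in targets]


def rocm_usedep(targets: list[str]) -> str:
    """Build a comma-separated USE dependency.

    Each "flag(-)?" entry means: if this package has the flag enabled,
    the dependency must have it enabled too ("?"); if the dependency does
    not declare the flag at all, treat it as disabled ("(-)").
    """
    return ",".join(flag + "(-)?" for flag in iuse_flags(targets))


if __name__ == "__main__":
    print(" ".join(iuse_flags(EXAMPLE_TARGETS)))
    # A consumer ebuild could then write something like:
    #   DEPEND="sci-libs/rocBLAS[${ROCM_USEDEP}]"
    print(f"sci-libs/rocBLAS[{rocm_usedep(EXAMPLE_TARGETS)}]")
```

Generating both strings from a single target list is what keeps IUSE
and the dependency constraints in sync, instead of spelling out every
flag in every ebuild.
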
There are also some items listed in the proposal that I decided not to
implement for now:
- rocm_src_prepare: although there are some similarities among ebuilds,
src_prepare is highly customized for each ROCm component. Unifying it
would take extra work.
- SRC_URI: currently SRC_URI is already specified in each ebuild. It
does not hurt to keep the status quo.

Moreover, during implementation I found another feature necessary:
- rocm_src_test: correctly handle different scenarios. ROCm packages
may have CMake tests, which can be run using cmake_src_test, or may only
compile some test binaries that need to be executed from the command
line. I made rocm_src_test detect the method automatically, so ROCm
packages just have to call this function directly without doing
anything else.

Honestly, I never imagined rocm.eclass would end up in this shape.
Initially I thought it would just provide some utilities, mainly
src_test and USE_EXPAND. But while implementing it, I found that all
these features require careful treatment. The comments (mainly
examples) also take up half of its length. It ended up at 278 lines,
which is middle-sized among current eclasses. It may be trimmed down
further after polishing, since there could be awkward implementations
or reinventions in it.

Based on my draft rocm.eclass, I have prepared sci-libs/roc*-5.1.3,
sci-libs/hip-*-5.1.3 and dev-python/cupy ebuilds that make use of it. It
feels great to simplify the ebuilds, and portage handles USE_EXPAND and
the dependencies just as expected. Once rocm.eclass gets into the tree,
I'll push those ROCm-5.1.3 ebuilds.

Another thing to mention is that the ROCm-5.1.3 toolchain has finally
been merged [5], together with the fixed
dev-util/rocprofiler-{4.3.0,5.0.2,5.1.3}. rocprofiler was actually buggy
before: I thought I had committed the patch that strips the loading of
libhsa-amd-aqlprofile.so (I even said so in the commit message), but it
was never committed and got lost in history. So I recreated the patch.

I also did some research on this proprietary library. By default,
without loading it, tracing hsa/hip is supposed to be impossible: you
only get basic information such as the name and time of each GPU kernel
execution, but not the pipeline of kernel execution (which kernel
spawned which). AQL should stand for HSA Architected Queuing Language
(HSA AQL), which is documented at
https://llvm.org/docs/AMDGPUUsage.html#hsa-aql-queue, and it does sound
related to the pipeline of kernel dispatching. According to its
description, libhsa-amd-aqlprofile.so is an extension API of AQL
Profile. In practice, however, patching the source code so that
rocprofiler does not load libhsa-amd-aqlprofile.so does not break
hsa/hip tracing. So I'm not sure why libhsa-amd-aqlprofile.so is
needed, and I raised a question at [2]. I completed the fix in [3,4].

According to the revised proposal (the plan changed because I was away
for two weeks), I should now collect feedback to refine rocm.eclass, and
prepare dev-python/cupy and sci-libs/rocWMMA. I'll investigate ROCgdb,
too. Also, rocm-device-libs is a major package because many users rely
on it to provide OpenCL, so I'll work on bumping its version as well.
What's more, with hip-5.1.3 built against vanilla clang, ROCm support
for Blender can land in ::gentoo.

[1] https://github.com/gentoo/gentoo/pull/26784
[2] https://github.com/RadeonOpenCompute/ROCm/issues/1781
[3] https://github.com/gentoo/gentoo/pull/26755
[4] https://github.com/gentoo/gentoo/pull/26771
[5] https://github.com/gentoo/gentoo/pull/26441

Best wishes,
--
Yiyang Wu