Hello all,

Although this is the final week, I would like to say that it is as
exciting as the first week.

I kept polishing rocm.eclass with the help of Michał and my mentor, and
it is now in good shape [1]. I must admit that writing an eclass took a
beginner like me much longer than I expected. In my proposal, I allotted
4 weeks to finish it: 2 weeks for implementation and 2 weeks for
polishing. In reality, I implemented it within 2 weeks but polished it
for 4 weeks. I introduced many QA issues without being aware of them,
which increased the number of review-modify cycles. During this process,
I learned a lot:

1. Always re-read the eclass, especially comments and examples
thoroughly after modification. Many times I forgot that an example far
from the change also needed updating because a function had changed its
behavior.

2. Read the bash manual carefully, because proper use of features
like bash arrays can greatly simplify code.

3. Consider the maintenance difficulty of the eclass. I wrote an oddly
specific `src_test` that could cover all the cases of ROCm packages, but
it was not worth it, because specialized code should be placed in
ebuilds, not in an eclass. So instead, I kept only the most common part,
`check_amdgpu`, and got rid of the phase functions, which made the
eclass much cleaner.

I also found some bugs and their solutions. As I mentioned in week 10's
report, I observed many test failures in sci-libs/miopen built with
vanilla clang. This week, I figured out that they have 3 different
causes, and I have provided fixes for two of them ([2, 3]). For the
third, I have found its root cause [4], and I believe there should be a
simple solution.

For the gcc-12 issues, I also came up with a crude workaround [5]:
undefine the `__noinline__` macro before including the libstdc++ headers
and define it again afterwards. I also observed that clang-15 does not
fix this issue as expected, and provided an MWE at [6].
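
The undef/redefine workaround described above can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the
actual patch in [5]: the macro definition at the top stands in for what
the HIP headers are assumed to provide, and `answer` and `greet` are
hypothetical names used only to show that both the macro and the standard
headers remain usable.

```cpp
// Stand-in for the definition that a HIP header is assumed to provide
// (assumption for illustration only).
#define __noinline__ __attribute__((noinline))

// Workaround: remove the macro while the standard library headers are
// parsed, so that an attribute spelled __attribute__((__noinline__))
// inside them is not macro-expanded into invalid code.
#undef __noinline__
#include <memory>
#include <string>

// Define the macro again so that code after this point can keep using it.
#define __noinline__ __attribute__((noinline))

// Hypothetical user function that relies on the restored macro.
__noinline__ int answer() { return 42; }

// Hypothetical helper that exercises the headers included above.
std::string greet(const std::string& name) {
    auto p = std::make_shared<std::string>("hello, " + name);
    return *p;
}
```

The order matters: the headers have to be included while the macro is
undefined, because include guards prevent them from being parsed again
later.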

I'm also writing wiki pages, filling in the installation and development
guides.

In this 12-week project, I proposed to deliver rocm.eclass and packages
like pytorch and tensorflow with ROCm enabled. In the end, I delivered
rocm.eclass as proposed, but instead of those packages I migrated the
ROCm toolchain to vanilla clang. I think porting the ROCm toolchain to
vanilla clang is closer to my project title, "Refining ROCm Packages" :-)

[1] https://github.com/gentoo/gentoo/pull/26784
[2] https://github.com/littlewu2508/gentoo/commit/2bfae2e26a23d78b634a87ef4a0b3f0cc242dbc4
[3] https://github.com/littlewu2508/gentoo/commit/cd11b542aec825338ec396bce5c63bbced534e27
[4] https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1731
[5] https://github.com/littlewu2508/gentoo/commit/2a49b4db336b075f2ac1fdfbc907f828105ea7e1
[6] https://github.com/llvm/llvm-project/issues/57544
--
Yiyang Wu