Gentoo Archives: gentoo-soc

From: wuyy <xgreenlandforwyy@×××××.com>
To: gentoo-soc <gentoo-soc@l.g.o>
Subject: [gentoo-soc] Week 12 Report for Refining ROCm Packages in Gentoo
Date: Mon, 05 Sep 2022 14:16:37
Message-Id: YxYEwYeLJJlC8bfs@HEPwuyy
Hello all,

Although this is the final week, I would like to say that it is as
exciting as the first week.

I kept polishing rocm.eclass with the help of Michał and my mentor, and
it is now in good shape [1]. I must admit that writing an eclass takes a
beginner like me much more time than I expected. In my proposal, I
allotted 4 weeks to finish it: 2 weeks of implementation and 2 weeks of
polishing. In reality, I implemented it within 2 weeks but polished it
for 4 weeks. I introduced a lot of QA issues without being aware of
them, which increased the number of review-modify cycles. During this
process, I learned a lot:

1. Always re-read the eclass thoroughly after a modification,
especially the comments and examples. Many times I forgot that an
example far from the change needed updating because a function had
changed its behavior.

2. Read the bash manual carefully, because proper use of features like
bash arrays can greatly simplify code.

3. Consider the maintenance difficulty of the eclass. I wrote an oddly
specific `src_test` that could cover all the cases of ROCm packages, but
it was not worth it, because specialized code belongs in ebuilds, not in
an eclass. So instead I kept only the most common part, `check_amdgpu`,
and got rid of the phase functions, which made the eclass much cleaner.

I also found some bugs and their solutions. As I mentioned in week 10's
report, I observed many test failures in sci-libs/miopen based on
vanilla clang. This week I figured out that they have 3 different
causes. I have provided fixes for two of them ([2, 3]), and for the
third I have found the root cause [4]. I believe there will be a simple
solution to it.

For the gcc-12 issues, I also came up with a crude workaround [5]:
undefine the __noinline__ macro before including the libstdc++ headers
and define it again afterwards. I also observed that clang-15 does not
fix this issue as expected, and provided an MWE at [6].

I am also writing wiki pages, filling in the installation and
development guides.

In this 12-week project, I proposed to deliver rocm.eclass and packages
like pytorch and tensorflow with ROCm enabled. In the end, I delivered
rocm.eclass as proposed but, instead of those packages, migrated the
ROCm toolchain to vanilla clang. I think porting the ROCm toolchain to
vanilla clang is closer to my project title, "Refining ROCm Packages" :-)

[1] https://github.com/gentoo/gentoo/pull/26784
[2] https://github.com/littlewu2508/gentoo/commit/2bfae2e26a23d78b634a87ef4a0b3f0cc242dbc4
[3] https://github.com/littlewu2508/gentoo/commit/cd11b542aec825338ec396bce5c63bbced534e27
[4] https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1731
[5] https://github.com/littlewu2508/gentoo/commit/2a49b4db336b075f2ac1fdfbc907f828105ea7e1
[6] https://github.com/llvm/llvm-project/issues/57544
--
Yiyang Wu
