Re: [gentoo-soc] Week 4 Report for Refining ROCm Packages in Gentoo - gentoo-soc

From:	Benda Xu <heroxbd@g.o>
To:	gentoo-soc <gentoo-soc@l.g.o>
Subject:	Re: [gentoo-soc] Week 4 Report for Refining ROCm Packages in Gentoo
Date:	Tue, 12 Jul 2022 10:23:08
Message-Id:	`8735f6635n.fsf@gentoo.org`
In Reply to:	[gentoo-soc] Week 4 Report for Refining ROCm Packages in Gentoo by wuyy

wuyy <xgreenlandforwyy@×××××.com> writes:

> Bug fixes cover rocBLAS and rocFFT. For rocBLAS, I backported a patch to
> sci-libs/rocBLAS-5.0.2-r1 and dev-util/Tensile-r1, to pass `-j N` from
> ${MAKEOPTS} to TensileCreateLibrary when building rocBLAS, which fixed [1].
> As for rocFFT, I corrected its BDEPEND [2], added the missing sys-libs/omp
> for omp.h [3], and let it depend on dev-util/rocm-cmake-5.0.2-r1, which does
> not install files to unexpected paths [4]. However, as gcc-12.1.0 lands,
> bugs about clang expanding the __noinline__ macro in
> g++-v12/bits/shared_ptr_base.h emerge [5,6]. Details can be seen in [5], and
> I'm working on resolving this (see PR [7]).

Good.  Nice progress!
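
For illustration, a minimal sketch of pulling the job count out of a
MAKEOPTS-style string in plain shell. Inside an ebuild one would normally
call makeopts_jobs from multiprocessing.eclass instead; the helper name
jobs_from_makeopts is hypothetical, and it only understands the -jN, -j N,
--jobs=N and --jobs N spellings.

```bash
# Hypothetical helper: print the job count from a MAKEOPTS-style string,
# or a fallback (default 1) when no -j/--jobs option is present.
jobs_from_makeopts() {
	local opts="$1" fallback="${2:-1}" jobs="" prev="" word
	# Unquoted on purpose: MAKEOPTS is a whitespace-separated option list.
	for word in ${opts}; do
		case ${word} in
			-j[0-9]*)      jobs=${word#-j} ;;       # "-j8"
			--jobs=[0-9]*) jobs=${word#--jobs=} ;;  # "--jobs=8"
			*[!0-9]*)      ;;                       # any other option
			*)
				# A bare number counts only right after "-j" or "--jobs".
				case ${prev} in
					-j|--jobs) jobs=${word} ;;
				esac
				;;
		esac
		prev=${word}
	done
	echo "${jobs:-${fallback}}"
}
```

The result can then be handed to the tool as `-j N`, for example
`-j "$(jobs_from_makeopts "${MAKEOPTS}")"`.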

> For rocm.eclass, I finished the draft for three major functions: USE_EXPAND,
> src_configure and src_test. I also wrote the get_amdgpu_flags function used
> by src_configure. My latest work on rocm.eclass is located at
> https://github.com/littlewu2508/gentoo/blob/rocm-5.1.3/eclass/rocm.eclass.
> Below are its status and the questions I'd like to share:
>
> 1. Default architectures. Now that I implement the USE_EXPAND of
> AMDGPU_TARGETS, I need to specify the default value of each USE flag. The
> straightforward way is to enable all targets by default, but that can be
> **extremely** slow and disk-hungry when compiling ROCm libraries such as
> rocBLAS or rocFFT (expect to compile for several hours if the CPU is not
> powerful enough). Currently I defined a variable OFFICIAL_AMDGPU_TARGETS,
> which is taken from the ROCm installation documents [8]. Although the
> support range is much larger, and different components have their own
> support matrices, AMD promises to fully support these enterprise cards.
> Enterprise users can just emerge ROCm packages without setting a specific
> USE flag, and have an out-of-box experience on Gentoo. Users with
> consumer-end cards can read the wiki page (covered later in my GSoC
> project) and find instructions to set the correct USE flag.

Fine.
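
A rough sketch of how the enabled AMDGPU_TARGETS flags could be collected
into one value for src_configure. This is not the get_amdgpu_flags from the
draft eclass: the function name, the target list, and the choice of a
semicolon-separated (CMake-style) list are assumptions, and `use` is
replaced by a stand-in so the snippet runs outside Portage.

```bash
# Stand-in for Portage's `use` so the sketch runs outside an ebuild:
# succeeds when the flag appears in the space-separated USE variable.
use() {
	case " ${USE} " in
		*" $1 "*) return 0 ;;
		*) return 1 ;;
	esac
}

# Illustrative candidate list only; the eclass draft keeps its own.
ALL_AMDGPU_TARGETS="gfx803 gfx900 gfx906 gfx908 gfx90a gfx1030"

# Print the enabled targets joined with ';' (assumed list separator).
enabled_amdgpu_targets() {
	local target list=""
	for target in ${ALL_AMDGPU_TARGETS}; do
		if use "amdgpu_targets_${target}"; then
			list="${list:+${list};}${target}"
		fi
	done
	echo "${list}"
}
```

With USE="amdgpu_targets_gfx906 amdgpu_targets_gfx1030" this prints
gfx906;gfx1030, following the order of the candidate list.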

> 2. Whether to set -DSKIP_RPATH=true in mycmakeargs. Previously this was set
> to avoid including rpath if USE=benchmark when building ROCm packages like
> sci-libs/roc-* and sci-libs/hip-*. The test and benchmark executables are
> named "clients" (taking rocBLAS as an example, clients are programs that use
> functions from, and link to, librocblas.so). In order to run tests and
> benchmarks before installing the libraries to the system, rpath is set on
> these executables, but Gentoo does not have a src_benchmark phase, so the
> benchmark binaries are just installed, and the user can run them afterwards
> (actually I use them in my research to tune algorithms). So there should
> not be rpath in benchmark binaries, and this is achieved by setting
> -DSKIP_RPATH=true. However, after this, the test programs cannot execute
> because rpath is also eliminated, so I have to specify LD_LIBRARY_PATH in
> src_test manually.

Go for it. This is exactly what LD_LIBRARY_PATH is designed for: testing
libraries installed in non-standard locations.
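
As a sketch of that approach: a small wrapper that prepends the directory
holding the freshly built library to LD_LIBRARY_PATH for one command only.
The helper name run_with_libdir is hypothetical, and the actual library
directory depends on each package's build layout.

```bash
# Hypothetical helper: run a command with an uninstalled library directory
# prepended to LD_LIBRARY_PATH, leaving the caller's environment untouched.
run_with_libdir() {
	local libdir="$1"
	shift
	# env applies the variable to the child process only.
	env LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" "$@"
}
```

In src_test this might look like
`run_with_libdir "${BUILD_DIR}/library/src" ./rocblas-test`, where the
library path is an assumed example rather than a confirmed location.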

> [...]
>
> 3. Detect AMDGPU in src_test. This blocks https://bugs.gentoo.org/817440,
> and I also raised questions in the bug report. Tinderbox cannot run tests on
> ROCm packages like rocBLAS, because there is no AMDGPU available. I
> implemented the detection mechanism, with one problem left: if no GPU is
> available, fail the test or exit normally?

Just follow the CI runner's advice at https://bugs.gentoo.org/817440#c4

,----
| Agostino Sarubbo gentoo-dev 2021-10-11 07:16:42 UTC
|
| (In reply to Benda Xu from comment #0)
| > Tinderbox does not have the hardware for GPGPU.  The ROCm GPGPU ebuilds
| > unconditionally fail.
|
| Is there a way for the ebuild to die if the hw does not meet the requisites?
`----

> Personally, I think the best solution is to detect AMDGPU during the
> pretend or setup phase,

Just die otherwise.

> turn off the test USE flag if no GPU is available, or the compiled
> architecture does not match the detected GPU. But is manipulating USE flags
> inside ebuild phase functions possible?

No, don't modify USE flags at ebuild runtime.
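
A minimal sketch of the "detect, die otherwise" approach, assuming the GPU
is reached through device nodes such as /dev/kfd (the amdkfd node) and the
render nodes under /dev/dri. The function name check_gpu_devices is
hypothetical, and a fallback `die` is defined only so the snippet runs
outside Portage.

```bash
# Fallback so the sketch runs outside Portage, which provides `die`.
if ! command -v die >/dev/null 2>&1; then
	die() { echo "ERROR: $*" >&2; exit 1; }
fi

# Hypothetical check: every device node passed in must exist and be
# readable and writable by the user running the tests.
check_gpu_devices() {
	local dev
	[ $# -gt 0 ] || die "no GPU device nodes given"
	for dev in "$@"; do
		[ -e "${dev}" ] || die "${dev} not found: no usable AMDGPU for tests"
		if [ ! -r "${dev}" ] || [ ! -w "${dev}" ]; then
			die "cannot read and write ${dev}"
		fi
	done
}
```

A call such as `check_gpu_devices /dev/kfd /dev/dri/render*` dies when the
glob matches nothing, because the literal pattern is then tested and does
not exist.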

> Despite these issues I managed a working version of rocm.eclass, and used
> it on rocBLAS. The USE_EXPAND works successfully, while src_test can
> properly detect hardware and execute in both sandboxed vanilla Gentoo and
> non-sandboxed Gentoo Prefix. There are still things to work on in
> rocm.eclass:
>
> 1. ROCM_USEDEP, similar to PYTHON_USEDEP. For example, if hipBLAS uses
> architectures gfx906 and gfx1030, then its dependency, rocBLAS, must
> contain gfx906 and gfx1030.
> 2. SRC_URI.
> 3. A way to automatically add PORTAGE_USERNAME to the render group, to
> access amdgpu and perform src_test. I don't have any clue on this yet;
> maybe a meta package in acct-group can do this?

I have no idea.
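
On the ROCM_USEDEP item, a sketch of one way such a string could be
generated, by analogy with PYTHON_USEDEP: one conditional USE dependency
per target, using the `flag(-)?` form. The function name gen_rocm_usedep
and the amdgpu_targets_ prefix are assumptions, not the eclass's actual
interface.

```bash
# Hypothetical generator: for each target a package lists, emit
# "amdgpu_targets_<target>(-)?" and join the entries with commas.
gen_rocm_usedep() {
	local target dep=""
	for target in "$@"; do
		dep="${dep:+${dep},}amdgpu_targets_${target}(-)?"
	done
	echo "${dep}"
}
```

For gfx906 and gfx1030 this yields
amdgpu_targets_gfx906(-)?,amdgpu_targets_gfx1030(-)? which a package like
hipBLAS could place in the brackets of its rocBLAS dependency, so that
every target enabled on hipBLAS is also required on rocBLAS.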

> In the coming week I'll finish rocm.eclass as planned, and send it out for
> early review. Meanwhile I'll continue fixing bugs [5,6,9,10], answering
> questions about enabling rocm in packages [11,12], and preparing to land
> ROCm-5.1.3. One of my friends is also plugging a Radeon VII into their
> arm64 server, and if everything goes well I can try ROCm on arm64
> (according to the kernel documentation, the GPGPU driver, amdkfd, supports
> amd64, arm64 and ppc64), and add the ~arm64 KEYWORD in the future.

Cheers,
Benda

