Re: [gentoo-dev] New project: LLVM - gentoo-dev

From:	james <garftd@×××××××.net>
To:	gentoo-dev@l.g.o
Subject:	Re: [gentoo-dev] New project: LLVM
Date:	Fri, 19 Aug 2016 20:53:07
Message-Id:	`905c6752-af7d-f19e-2698-b31f2ee5f89d@verizon.net`
In Reply to:	Re: [gentoo-dev] New project: LLVM by "C Bergström"

On 08/19/2016 02:20 PM, C Bergström wrote:
> Sorry to be the party crasher, but...
>
> I'd love to have optimizations for everything out there, but it takes
> a lot of work to fine tune for something specific.

Agreed. Right now, on ARMv8 alone, there are dozens of teams working on
the same concepts presented in this thread. Most are also targeting
specific domains. At some point there will be pathways, just as in
Computational Chemistry, where the optimization pathway for new silicon
is fast because previous work helps tremendously. That is, you are not
alone in your quests, far, far from it.

> Right now I see a few variants of ARMv8
> ------------
> ARM reference stuff - A57 cores and the newer bits.. The scheduling
> and stuff seems more-or-less similar enough that one tuning could
> probably work for the vast majority of these parts.
>
> Cavium ThunderX - It's ground up and quite different from the ARM
> reference stuff under the hood
>
> APM - Mustang, again ground up and different. I don't have enough
> hands on to know how different from reference.
>
> Broadcom - Coming Soon(tm) - Again no hands on or any data, but
> certainly very interesting..
>
> ... now add in every variant of ground up implementation and you have
> 50 shades of gray..

And billions of dollars are financing those efforts in parallel. It's an
arms race (like the pun?). Ever wonder why a Japanese conglomerate
offered to purchase ARM Ltd. for such a large figure? Or why Intel holds
ARM licenses now? Your group might only be able to focus on a few ARM
offerings, but there are dozens and dozens of ARM teams alone that would
dispute your arithmetic above.

> -------------
> Soo.. depending on your target hardware, you may be better off with
> gcc if the end goal is general all-around performance. (It does a
> quite respectable job of being generic) I realize a lot of people have
> strong feelings for or against it. I leave that to the reader to
> decide..

You misconstrue the concepts. Nobody, least of all me, is implying that
one pathway (to a Unikernel [1], if you like) suits all near-optimized
solutions. That would be pointless. What you allude to already exists
in the clouds of some of the more progressive data/cloud vendors. We are
talking about unikernels for different classes of problems, across
ARMv8, x86-64 and GPU architectures, not thousands of processor (arch)
variants. However, those other processor (arch) variants, and the folks
who earn a living off of them, are not sitting back idle either.

> Back to my own glass house.. It will take a few years, but I am trying
> to make it easier (internally) to expose in some clear way all the
> pieces which compose a fine tuning per-processor. If this was "just"
> scheduling models it would be really easy, but it's not.. Those
> latencies and other magic bits decide things like.. "should I unroll
> this loop or do something else" and then you venture into the land of
> accelerators where a custom regalloc may be what you really need and
> *nothing* off the shelf fits to meet your goals.. (projects like that
> can take 9 months and in the end only give a general 1-5% median
> performance gain..)

If this is your mantra, I rescind the generous comments. Cray used to
work that way, milking the petroleum industry for tons of money, but
things have changed, and the change is accelerating rapidly. Perhaps too
many of those Cray patents that your company owns are leaking toxins
into the brain-trust where you park?

Vendor walk-back is sad, imho. ymmv.

Best of luck to your company's 5-year plan....

[1] http://unikernel.org/

hth,
James

> --------------
>
>
> On Sat, Aug 20, 2016 at 2:02 AM, james <garftd@×××××××.net> wrote:
>> On 08/19/2016 11:15 AM, C Bergström wrote:
>>>
>>> On Fri, Aug 19, 2016 at 11:01 PM, Luca Barbato <lu_zero@g.o> wrote:
>>>>
>>>> BTW is pathscale ready to be used as system compiler as well?
>>>
>>>
>>> I wish, but no. We have known issues when building grub2, glibc and
>>> the Linux kernel at the very least. Someone* did report a long time
>>> ago that with their unofficial port, were able to build/boot the
>>> NetBSD kernel.
>>> (*A community dev we trusted with our sources and was helping us with
>>> portability across platforms)
>>>
>>> The stuff with grub2 may potentially be fixed in the "near" future...
>>> the others are more tricky. In general if clang can do it, we have a
>>> strong chance as well.
>>>
>>> As a philosophy - "we" aren't really trying to be the best generic
>>> compiler in the world. We aim more on optimizing as much for known
>>> targets. So if by system you mean, a compiler that would produce an
>>> "OS" which only runs on a single class of hardware, then yeah it could
>>> work at some point in the future. Specifically, on x86 we default on
>>> host CPU optimizations. So on newer Intel hardware it's easy to get a
>>> binary that won't run on AMD or older 64bit Intel.
>>>
>>> More recently on ARMv8 - we turn on processor specific tuning. So
>>> while it may "run", the difference between APM's mustang and Cavium
>>> ThunderX is pretty big and running binaries intended for A and ran on
>>> B would certainly take a hit.. (this is just the tip of the iceberg)
>>>
>>> For general scalar OS code it isn't likely to matter... the real
>>> impact being like 1-10% difference (being very general.. it could be
>>> less or more in the real world..)
>>>
>>> For HPC codes or anything where you get loops or computationally
>>> complex - the gloves are off and I could see big differences... (again
>>> being general and maybe a bit dramatic for fun)
>>
>>
>>
>> OK (actually fantastic!). Looking at the pathscale site pages and github,
>> perhaps a cheap arm embedded board where llvm is the centerpiece of
>> compiling a minimal system to entice gentoo-llvm testers, would be possible
>> in the near future?. I have a 96boards, HiKey arm64v8 that I could dedicate
>> to gentoo+armv8-llvm testing, if that'd help. [1]
>>
>> Perhaps a baseline bootstrap iso (or such) version targeted at
>> llvm-centric testers on x86-64 or armv8 ? Skip grub2 and use grub-legacy or
>> lilo or (?), since there seems to be issues with llvm-grub2.
>>
>>
>> [1] http://dev.gentoo.org/~tgall/
>>
>>
>> No matter how you slice it, from someone who is focused on building
>> minimized and embedded (bare metal) systems that are customized and
>> coalesced into a heterogeneous gentoo cluster for HPC, this is wonderful
>> news. Finally a vendor in the cluster space, with some vision and
>> common-sense, imho. Heterogeneous and open HPC is where is at, imho. If
>> there is a forum where the community and pathscale folks discuss issues,
>> point that out as I could not find one for deeper reading....
>>
>>
>> hth,
>> James
>>
>
>

Gentoo Archives: gentoo-dev
