On 08/19/2016 02:20 PM, C Bergström wrote:
> Sorry to be the party crasher, but...
>
> I'd love to have optimizations for everything out there, but it takes
> a lot of work to fine tune for something specific.

Agreed. Right now on ARMv8 alone, there are dozens of teams working on
the identical concepts presented in this thread. Most are also targeting
specific domains. At some point there will be pathways, just like in
Computational Chemistry, where the optimization pathway for new silicon
is fast and previous work helps tremendously. That is, you are not alone
in your quest; far, far from it.


> Right now I see a few variants of ARMv8
> ------------
> ARM reference stuff - A57 cores and the newer bits.. The scheduling
> and stuff seems more-or-less similar enough that one tuning could
> probably work for the vast majority of these parts.
>
> Cavium ThunderX - It's ground up and quite different from the ARM
> reference stuff under the hood
>
> APM - Mustang, again ground up and different. I don't have enough
> hands on to know how different from reference.
>
> Broadcom - Coming Soon(tm) - Again no hands on or any data, but
> certainly very interesting..
>
> ... now add in every variant of ground up implementation and you have
> 50 shades of gray..

And billions of dollars are financing those efforts in parallel. It's an
arms race (like the pun?). Wonder why a Japanese conglomerate offered
to purchase ARM Ltd. for such a large figure? Wonder why Intel has ARM
licenses now? Your group might only be able to focus on a few ARM
offerings, but there are dozens and dozens of ARM teams alone that would
dispute your arithmetic above.

> -------------
> Soo.. depending on your target hardware, you may be better off with
> gcc if the end goal is general all-around performance. (It does a
> quite respectable job of being generic) I realize a lot of people have
> strong feelings for or against it. I leave that to the reader to
> decide..

You misconstrue concepts. Nobody, least of all me, implies that one
pathway (to a Unikernel [2] if you like) suits all near-optimized
solutions. That would be pointless. What you allude to already exists
at some of the more progressive data/cloud vendors. We are talking
about a unikernel for different classes of problems, across ARMv8,
x86-64 and GPU architectures, not thousands of (arch) processor
variants. However, those other processor (arch) variants, and the folks
that earn a living off of those variants, are not sitting back idle either.


> Back to my own glass house.. It will take a few years, but I am trying
> to make it easier (internally) to expose in some clear way all the
> pieces which compose a fine tuning per-processor. If this was "just"
> scheduling models it would be really easy, but it's not.. Those
> latencies and other magic bits decide things like.. "should I unroll
> this loop or do something else" and then you venture into the land of
> accelerators where a custom regalloc may be what you really need and
> *nothing* off the shelf fits to meet your goals.. (projects like that
> can take 9 months and in the end only give a general 1-5% median
> performance gain..)

If this is your mantra, I rescind the generous comments. Cray used to work
that way, milking the Petroleum Industry for tons of money, but things
have changed and the change is accelerating rapidly. Perhaps too many
of those Cray patents that your company owns are leaking toxins into
the brain-trust where you park?

Vendor walk-back is sad, imho. ymmv.

Best of luck to your company's 5-year plan....


[2] http://unikernel.org/

hth,
James


> --------------
>
>
> On Sat, Aug 20, 2016 at 2:02 AM, james <garftd@×××××××.net> wrote:
>> On 08/19/2016 11:15 AM, C Bergström wrote:
>>>
>>> On Fri, Aug 19, 2016 at 11:01 PM, Luca Barbato <lu_zero@g.o> wrote:
>>>>
>>>> BTW is pathscale ready to be used as system compiler as well?
>>>
>>>
>>> I wish, but no. We have known issues when building grub2, glibc and
>>> the Linux kernel at the very least. Someone* did report a long time
>>> ago that with their unofficial port, were able to build/boot the
>>> NetBSD kernel.
>>> (*A community dev we trusted with our sources and was helping us with
>>> portability across platforms)
>>>
>>> The stuff with grub2 may potentially be fixed in the "near" future...
>>> the others are more tricky. In general if clang can do it, we have a
>>> strong chance as well.
>>>
>>> As a philosophy - "we" aren't really trying to be the best generic
>>> compiler in the world. We aim more on optimizing as much for known
>>> targets. So if by system you mean, a compiler that would produce an
>>> "OS" which only runs on a single class of hardware, then yeah it could
>>> work at some point in the future. Specifically, on x86 we default on
>>> host CPU optimizations. So on newer Intel hardware it's easy to get a
>>> binary that won't run on AMD or older 64bit Intel.
>>>
>>> More recently on ARMv8 - we turn on processor specific tuning. So
>>> while it may "run", the difference between APM's mustang and Cavium
>>> ThunderX is pretty big and running binaries intended for A and ran on
>>> B would certainly take a hit.. (this is just the tip of the iceberg)
>>>
>>> For general scalar OS code it isn't likely to matter... the real
>>> impact being like 1-10% difference (being very general.. it could be
>>> less or more in the real world..)
>>>
>>> For HPC codes or anything where you get loops or computationally
>>> complex - the gloves are off and I could see big differences... (again
>>> being general and maybe a bit dramatic for fun)
>>
>>
>>
>> OK (actually fantastic!). Looking at the pathscale site pages and github,
>> perhaps a cheap arm embedded board where llvm is the centerpiece of
>> compiling a minimal system to entice gentoo-llvm testers, would be possible
>> in the near future?. I have a 96boards, HiKey arm64v8 that I could dedicate
>> to gentoo+armv8-llvm testing, if that'd help. [1]
>>
>> Perhaps a baseline bootstrap iso (or such) version targeted at
>> llvm-centric testers on x86-64 or armv8 ? Skip grub2 and use grub-legacy or
>> lilo or (?), since there seems to be issues with llvm-grub2.
>>
>>
>> [1] http://dev.gentoo.org/~tgall/
>>
>>
>> No matter how you slice it, from someone who is focused on building
>> minimized and embedded (bare metal) systems that are customized and
>> coalesced into a heterogeneous gentoo cluster for HPC, this is wonderful
>> news. Finally a vendor in the cluster space, with some vision and
>> common-sense, imho. Heterogeneous and open HPC is where is at, imho. If
>> there is a forum where the community and pathscale folks discuss issues,
>> point that out as I could not find one for deeper reading....
>>
>>
>> hth,
>> James
>>
>
>