Sorry to be the party crasher, but...

I'd love to have optimizations for everything out there, but it takes
a lot of work to fine-tune for something specific.

Right now I see a few variants of ARMv8
------------
ARM reference stuff - A57 cores and the newer bits. The scheduling
seems similar enough across them that one tuning could probably work
for the vast majority of these parts.

Cavium ThunderX - It's a ground-up design and quite different from the
ARM reference stuff under the hood.

APM - Mustang, again ground-up and different. I don't have enough
hands-on time to know how different it is from reference.

Broadcom - Coming Soon(tm) - Again no hands-on or any data, but
certainly very interesting.

... now add in every variant of ground-up implementation and you have
50 shades of gray.
-------------
Soo.. depending on your target hardware, you may be better off with
gcc if the end goal is general all-around performance. (It does a
quite respectable job of being generic.) I realize a lot of people have
strong feelings for or against it. I leave that to the reader to
decide.

Back to my own glass house.. It will take a few years, but I am trying
to make it easier (internally) to expose, in some clear way, all the
pieces that make up per-processor fine-tuning. If this were "just"
scheduling models it would be really easy, but it's not. Those
latencies and other magic bits decide things like "should I unroll
this loop or do something else", and then you venture into the land of
accelerators, where a custom regalloc may be what you really need and
*nothing* off the shelf meets your goals. (Projects like that
can take 9 months and in the end only give a general 1-5% median
performance gain.)
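
To make the "should I unroll this loop" point a bit more concrete, here
is a toy sketch in Python. It is not how any real compiler decides this;
the record fields, the core names and every number are made up purely
for illustration. The only point is that the same loop body can get a
different answer once the per-core numbers change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreTuning:
    """Hypothetical per-core tuning record (all values are invented)."""
    name: str
    issue_width: int      # instructions issued per cycle
    fp_mul_latency: int   # cycles before a dependent FP multiply can start
    max_unroll: int       # cap to limit code size and register pressure


def pick_unroll_factor(tuning: CoreTuning, body_insns: int) -> int:
    """Toy heuristic: unroll enough to cover the multiply latency.

    Returns a power of two between 1 and tuning.max_unroll.
    """
    # Rough count of independent iterations needed to keep the core busy.
    wanted = max(1, (tuning.fp_mul_latency * tuning.issue_width) // max(1, body_insns))
    limit = min(wanted, tuning.max_unroll)
    # Round down to a power of two so the remainder loop stays simple.
    factor = 1
    while factor * 2 <= limit:
        factor *= 2
    return factor


# Two made-up cores; these do not describe any real ARMv8 part.
CORES = {
    "core_a": CoreTuning("core_a", issue_width=3, fp_mul_latency=6, max_unroll=8),
    "core_b": CoreTuning("core_b", issue_width=2, fp_mul_latency=4, max_unroll=4),
}

if __name__ == "__main__":
    for core in CORES.values():
        print(core.name, pick_unroll_factor(core, body_insns=2))
```

With these invented numbers a 2-instruction loop body is unrolled 8x on
core_a and 4x on core_b, while a 40-instruction body is left alone on
both. A real tuning has many more inputs than this, which is the whole
problem.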
--------------

On Sat, Aug 20, 2016 at 2:02 AM, james <garftd@×××××××.net> wrote:
> On 08/19/2016 11:15 AM, C Bergström wrote:
>>
>> On Fri, Aug 19, 2016 at 11:01 PM, Luca Barbato <lu_zero@g.o> wrote:
>>>
>>> BTW is pathscale ready to be used as system compiler as well?
>>
>> I wish, but no. We have known issues when building grub2, glibc and
>> the Linux kernel at the very least. Someone* did report a long time
>> ago that with their unofficial port, they were able to build/boot the
>> NetBSD kernel.
>> (*A community dev we trusted with our sources and who was helping us
>> with portability across platforms)
>>
>> The stuff with grub2 may potentially be fixed in the "near" future...
>> the others are more tricky. In general if clang can do it, we have a
>> strong chance as well.
>>
>> As a philosophy - "we" aren't really trying to be the best generic
>> compiler in the world. We aim more at optimizing as much as possible
>> for known targets. So if by system you mean a compiler that would
>> produce an "OS" which only runs on a single class of hardware, then
>> yeah, it could work at some point in the future. Specifically, on x86
>> we default to host CPU optimizations. So on newer Intel hardware it's
>> easy to get a binary that won't run on AMD or older 64-bit Intel.
>>
>> More recently on ARMv8 - we turn on processor-specific tuning. So
>> while it may "run", the difference between APM's Mustang and Cavium
>> ThunderX is pretty big, and binaries intended for A and run on
>> B would certainly take a hit.. (this is just the tip of the iceberg)
>>
>> For general scalar OS code it isn't likely to matter... the real
>> impact being like a 1-10% difference (being very general.. it could be
>> less or more in the real world..)
>>
>> For HPC codes or anything with loops or computationally complex
>> code - the gloves are off and I could see big differences... (again
>> being general and maybe a bit dramatic for fun)
>
> OK (actually fantastic!). Looking at the pathscale site pages and github,
> perhaps a cheap arm embedded board where llvm is the centerpiece of
> compiling a minimal system, to entice gentoo-llvm testers, would be possible
> in the near future? I have a 96boards HiKey arm64v8 that I could dedicate
> to gentoo+armv8-llvm testing, if that'd help. [1]
>
> Perhaps a baseline bootstrap iso (or such) version targeted at
> llvm-centric testers on x86-64 or armv8? Skip grub2 and use grub-legacy or
> lilo or (?), since there seem to be issues with llvm-grub2.
>
> [1] http://dev.gentoo.org/~tgall/
>
> No matter how you slice it, from someone who is focused on building
> minimized and embedded (bare metal) systems that are customized and
> coalesced into a heterogeneous gentoo cluster for HPC, this is wonderful
> news. Finally a vendor in the cluster space with some vision and
> common sense, imho. Heterogeneous and open HPC is where it's at, imho. If
> there is a forum where the community and pathscale folks discuss issues,
> please point it out, as I could not find one for deeper reading.
>
> hth,
> James