Volker Armin Hemmann <volkerarmin <at> googlemail.com> writes:

> >>>>>> http://www.agner.org/optimize/calling_conventions.pdf
> >>>>>
> >>>>> Not sure what you're trying to say.
> >>>>>
> >>>>
> >>>> that simd is not save in kernel if not carefully guarded.
> >>>>
> >>>> Really people, just don't fuck around with the cflags.
> >>>
> >>> I still fail to see the relevance. Unless you mean using a different
> >>> -O level. In that case, yes. You shouldn't. But I was talking about
> >>> -march.
> >>>
> >>
> >> you said this
> >>
> >>>
> >>> (note that SIMD is not FP and is perfectly fine in the kernel.)
> >>
> >> and I have shown you that you are wrong.
> >
> > Not sure why you think that. The kernel crypto routines are full of
> > SIMD code (like SSE and AVX.) Automatic vectorization wouldn't work.
> > But -march is not going to introduce that
>
> and never used in interrupt context and carefully guarded. You act like
> 'oh, you can use simd instructions without any consideration' and that
> is just not true.

Volker,
Historically, you are correct. Looking forward, GCC 5.x will (can?) change
this as SIMD and other hardware, including (DDR5) memory, all become
available for (compiler) usage. For the longest time, we, the FOSS
communities, have at best been given access to low-level APIs for some of
these hardware resources. All processor architectures are at war. Intel
(the bastards) have had FPGAs and tools to reconfigure the amount and
types of hardware in some of their processors for quite some time.

The Arm64 cores have SIMD-centric (GPU, if you like) cores on the same SoC
as the licensed 64-bit Arm CPU cores. The new GPU has already been
integrated into the processor cores (same substrate), just as the i387 FPU
was some decades ago. So Arm is providing 'bare metal' access to various
customers and compilers. Since there are thousands of vendors building
custom arm64 SoCs, there is no way for Arm to constrict access the way
Intel, Nvidia and AMD have historically done. Game_set_match.

Even though the GPU cores available via arm64 are very weak compared to
Nvidia's and AMD's, bare metal access to those (GPU) resources is far
superior to what Intel (dragging their feet), Nvidia or AMD are offering.
Just look at how AMD's Mantle has stalled for the FOSS communities. AMD,
via competition from a myriad of Arm SoC vendors, is being forced to roll
out 64-bit Arm server chips just to stay relevant. Both of you guys are
looking at this issue through historically color-coded sunglasses. Change
is here; get onboard with helping the masses help themselves to the
feeding (coding) frenzy.

What a pair of really smart guys like you (2) should be doing is setting up
a Gentoo wiki page listing and demonstrating for others how to "profile"
low-level code, both kernel and system level, so these other Gentoo folks
*can learn* about what you are saying by example: running tools such as
kernelshark and other performance/profiling types of analysis. Providing
seamless and generic access to the GPU resources will go a long way towards
revitalizing FOSS cryptographic dominance, and that is a very good thing.
ymmv.

For the record, most SIMD hardware really sucks for dense-matrix
requirements. Most SIMD hardware only really works for sparse-matrix
apps, like x264, because the overlying (embedded) algorithms used are
poorly documented, by intention, by the hardware vendors. I do not have
direct proof, but I strongly suspect this is the case because the SIMD
pipelined memory that these low-level APIs give to the FOSS community is
memory constricted by design.

peace,
James