Gentoo Archives: gentoo-user

From: James <wireless@×××××××××××.com>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: CFLAGs for kernel compilation
Date: Sat, 02 May 2015 18:35:16
Message-Id: loom.20150502T200340-93@post.gmane.org
In Reply to: Re: [gentoo-user] Re: CFLAGs for kernel compilation by Volker Armin Hemmann
1 Volker Armin Hemmann <volkerarmin <at> googlemail.com> writes:
2
3
4 > >>>>>> http://www.agner.org/optimize/calling_conventions.pdf
5 > >>>>>
6 > >>>>> Not sure what you're trying to say.
7 > >>>>>
8 > >>>>
9 > >>>> that simd is not safe in kernel if not carefully guarded.
10 > >>>>
11 > >>>> Really people, just don't fuck around with the cflags.
12 > >>>
13 > >>> I still fail to see the relevance. Unless you mean using a different
14 > >>> -O level. In that case, yes. You shouldn't. But I was talking about
15 > >>> -march.
16 > >>>
17 > >>
18 > >> you said this
19 > >>
20 > >>>
21 > >>> (note that SIMD is not FP and is perfectly fine in the kernel.)
22 > >>
23 > >> and I have shown you that you are wrong.
24 > >
25 > > Not sure why you think that. The kernel crypto routines are full of
26 > > SIMD code (like SSE and AVX.) Automatic vectorization wouldn't work.
27 > > But -march is not going to introduce that
28 >
29 > and never used in interrupt context and carefully guarded. You act like
30 > 'oh, you can use simd instructions without any consideration' and that
31 > is just not true.
32
33
34 Volker,
35 Historically, you are correct. Looking forward, GCC-5.x will (can?) change
36 this as the SIMD and other hardware, including (DDR_5) memory, all become
37 available for (compiler) usage. For the longest time we, the FOSS
38 communities, have at best been given low-level APIs for access to
39 some of these hardware resources. All processor architectures are at war.
40 Intel (the bastards) have had FPGA and tools to reconfigure the amount and
41 types of hardware in some of their processors for quite some time.
42
43 The Arm64 designs have SIMD-centric (GPU, if you like) cores on the same SOC
44 as the licensed 64-bit Arm CPU cores. The new GPU has already been integrated
45 into the processor cores (same substrate), just as the i387 FPU was some
46 decades ago. So Arm is providing 'bare metal' access to various customers
47 and compilers. Since there are thousands of vendors building custom
48 arm64 SOCs, there is no way for Arm to constrict access the way Intel, Nvidia
49 and AMD have historically done. Game_set_match.
50
51 Even though the GPU cores available via arm64 are very weak compared to
52 Nvidia's and AMD's, bare metal access to those (GPU) resources is far superior
53 to what Intel (dragging their feet), Nvidia or AMD are offering. Just look
54 at how AMD's Mantle has stalled for the FOSS communities. AMD, via
55 competition from a myriad of Arm SOC vendors, is being forced to roll out
56 64-bit Arm server chips just to stay relevant. Both of you guys are looking
57 at this issue through historically color-coded sunglasses. Change is here; get
58 onboard with helping the masses help themselves to the feeding (coding) frenzy.
59
60
61 What a pair of really smart guys like you (2) should be doing is setting up
62 a Gentoo wiki page listing and demonstrating for others how to "profile"
63 low-level code, both kernel and system level, so these other Gentoo folks *can
64 learn* about what you are saying by example: running tools such as
65 kernelshark, and other performance/profiling types of analysis. Providing
66 seamless and generic access to the GPU resources will go a long way towards
67 revitalizing FOSS cryptographic dominance, and that is a very good thing. ymmv.
68
69
70 For the record, most SIMD hardware really sucks for dense_matrix
71 requirements. Most SIMD hardware only really works for sparse matrix
72 apps, like x264, because the overlying (embedded) algorithms used are
73 intentionally poorly documented by the hardware vendors. I do not have direct
74 proof, but I strongly suspect this is the case because the SIMD pipelined
75 memory that these low-level APIs give to the FOSS community is
76 constricted by design.
77
78
79 peace,
80 James