On 02.05.2015 at 07:04, Nikos Chantziaras wrote:
> On 01/05/15 10:44, Andrew Savchenko wrote:
>> On Fri, 1 May 2015 05:09:51 +0000 (UTC) Martin Vaeth wrote:
>>> Andrew Savchenko <bircoph@g.o> wrote:
>>>>
>>>> That's why the kernel makes sure that no floating point instructions
>>>> sneak in via CFLAGS; you can see a lot of -mno-${instruction_set}
>>>> flags when running make V=1.
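
One way to see what those -mno-* flags actually change is to ask the
preprocessor. GCC and Clang predefine __MMX__, __SSE__, __SSE2__ and
__AVX__ on x86 only while the matching instruction set is enabled, and
flags such as -mno-sse, -mno-sse2 or -mno-mmx (the kind the x86 kernel
Makefile passes) remove them. This is a userspace sketch, not kernel
code; the function enabled_simd_isas() and the ISA_* bit names are
made up for the example.

```c
#include <assert.h>

/* Hypothetical bit names for this sketch; they come from no real header. */
enum {
    ISA_MMX  = 1 << 0,
    ISA_SSE  = 1 << 1,
    ISA_SSE2 = 1 << 2,
    ISA_AVX  = 1 << 3,
};

/* Reports which SIMD instruction sets the compiler may use for this
 * translation unit. Each macro is predefined by GCC/Clang on x86 only
 * while the matching instruction set is enabled, so building with e.g.
 * "-mno-sse -mno-sse2 -mno-mmx" clears those bits. On non-x86 targets
 * none of the macros exist and the mask is 0. */
int enabled_simd_isas(void)
{
    int mask = 0;
#ifdef __MMX__
    mask |= ISA_MMX;
#endif
#ifdef __SSE__
    mask |= ISA_SSE;
#endif
#ifdef __SSE2__
    mask |= ISA_SSE2;
#endif
#ifdef __AVX__
    mask |= ISA_AVX;
#endif
    return mask;
}
```

Building the same file once with default flags and once with
-mno-sse -mno-sse2 -mno-mmx on x86-64 should show the corresponding
bits drop out of the returned mask.
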
>>>
>>> So it should be sufficient that the kernel does not use "float"
>>> or "double", shouldn't it?
>>
>> No. Optimizer paths may be very non-obvious, e.g. I wouldn't be
>> surprised if under some conditions the vectorizer used float
>> instructions for int code.
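
The vectorizer does routinely use vector registers for pure integer
code. Compiled with gcc -O3 -S for x86-64, a loop like the one below is
typically turned into SSE2 integer instructions such as paddd operating
on %xmm registers (the exact output depends on compiler version and
flags); with -mno-sse -mno-sse2 -mno-mmx, as in the kernel, those
registers are simply unavailable to it. These are SIMD integer
instructions, not x87 floating point, but they use the same XMM
registers as SSE float/double code. The function name is only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise sum of two int arrays. Nothing here is floating point,
 * yet at -O3 on x86-64 GCC typically vectorizes the loop into SSE2
 * integer instructions (paddd) on XMM registers. 'restrict' promises
 * the arrays do not overlap, so no runtime alias check is needed. */
void add_int_arrays(int *restrict dst, const int *restrict a,
                    const int *restrict b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```
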
>
> The kernel uses -O2 and several -march variants (e.g. -march=core2).
> Several other options are used to prevent GCC from generating
> unsuitable code.
>
> Specifying another -march variant does not affect the optimizer
> though. It only affects the code generator. If you don't modify the
> other CFLAGS and only change -march, you will not get FP instructions
> unless you use FP in the code.
>
> Also, I'd be very interested to see *any* optimization that would
> somehow transform integer code to FP code (note that SIMD is not FP
> and is perfectly fine in the kernel). In fact, optimizers tend to
> transform FP into SIMD, at least on x86 (and other architectures that
> have fast SIMD instructions). If I inspect the generated assembly from
> GCC or Clang, I cannot find FP anywhere, even for code using "float"
> and "double" operations. They get converted to SIMD on modern CPUs
> (unless you specify a compiler flag that tells the compiler to use the
> FPU, for example if you need 80-bit extended precision, which is
> supported by the x86 FPU).
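
Which unit handles float and double can be checked from <float.h>.
FLT_EVAL_METHOD is 0 when expressions are evaluated in their own type
(SSE scalar math, the x86-64 default) and 2 when they are evaluated in
long double precision (x87 math, typically 32-bit x86 or -mfpmath=387).
LDBL_MANT_DIG is 64 when long double is the x87 80-bit extended format,
which GCC on x86-64 still computes with x87 instructions, and GCC
predefines __SSE2_MATH__ when double arithmetic is done with SSE2. The
helper names are made up for this userspace sketch.

```c
#include <assert.h>
#include <float.h>

/* How float/double expressions are evaluated:
 *    0 -> in their own type (SSE scalar math, x86-64 default)
 *    2 -> in long double precision (x87 FPU, e.g. 32-bit x86)
 *   -1 -> indeterminable */
int float_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Significand bits of long double: 64 for the x87 80-bit extended
 * format, 53 where long double is just double, 113 for IEEE quad. */
int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}

/* 1 if the compiler was told to do double arithmetic with SSE2
 * (GCC then predefines __SSE2_MATH__), 0 otherwise. */
int double_math_uses_sse2(void)
{
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}
```

On x86-64 with GCC's default flags one would expect 0, 64 and 1; the
same file built with -m32 typically reports 2 for the evaluation
method.
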

http://www.agner.org/optimize/calling_conventions.pdf

Device drivers under Linux

Linux systems use lazy saving of floating point registers and vector
registers. This means that these registers are not saved and restored
on every task switch. Instead they are saved/restored on the first
access after a task switch. This method saves time in case no more
than one thread uses these registers.

The lazy saving scheme is not supported in kernel mode. Any device
driver that attempts to use these registers improperly will cause an
exception that will probably make the system crash. A device driver
that needs to use vector registers must first save these registers by
calling the function kernel_fpu_begin() and restore the registers by
calling kernel_fpu_end() before returning or sleeping. These functions
also prevent pre-emptive interruption of the device driver, which
could otherwise mess up the registers. kernel_fpu_begin() saves all
floating point registers and vector registers if available.
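
The lazy scheme described above can be modelled in a few lines. This
is a toy userspace model, not kernel code: the struct and function
names are hypothetical, a task is just an integer ID, and "access"
stands for the trap raised by the first FPU/vector instruction after a
switch. It counts register-file saves and restores so the benefit is
visible: a single FPU user sharing the CPU with integer-only tasks pays
for one restore, while two FPU users pay on every hand-over. Later x86
kernels moved to eager saving, so treat this only as a model of the
scheme the excerpt describes.

```c
#include <assert.h>

#define NO_TASK (-1)

/* Toy model of lazy FPU/vector-register switching on one CPU. */
struct lazy_fpu {
    int current;        /* task currently running */
    int owner;          /* task whose state is live in the registers */
    unsigned switches;  /* context switches performed */
    unsigned saves;     /* register-file saves to memory */
    unsigned restores;  /* register-file loads from memory */
};

void lazy_fpu_init(struct lazy_fpu *f)
{
    f->current = NO_TASK;
    f->owner = NO_TASK;
    f->switches = f->saves = f->restores = 0;
}

/* Under the lazy scheme a context switch does no register work; it only
 * arms the trap for the first FPU access (on x86 this was the CR0.TS
 * bit, which makes the next FPU instruction raise #NM). */
void lazy_fpu_switch_to(struct lazy_fpu *f, int task)
{
    f->current = task;
    f->switches++;
}

/* First FPU/vector instruction after a switch: only now is the previous
 * owner's state saved and the current task's state loaded. */
void lazy_fpu_access(struct lazy_fpu *f)
{
    assert(f->current != NO_TASK);
    if (f->owner == f->current)
        return;               /* registers still hold our state */
    if (f->owner != NO_TASK)
        f->saves++;           /* spill the other task's registers */
    f->restores++;            /* load (or initialize) ours */
    f->owner = f->current;
}
```

A naive eager scheme would do this work at every switch instead, which
avoids the trap but costs time even when the incoming task never
touches the registers.
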

There is no red zone in 64-bit Linux kernel mode.

The programmer should be aware of these restrictions if calling any
library other than the system kernel libraries from a device driver.
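
The kernel_fpu_begin()/kernel_fpu_end() rule from the excerpt boils
down to a bracket around the vector code. The sketch below is userspace
C, because those functions exist only inside the kernel (on recent x86
kernels they are declared in <asm/fpu/api.h>); comments mark where a
driver would call them. The helper xor_buffers() is a hypothetical
example of the kind of work drivers do with SSE (RAID parity, crypto),
written with the standard SSE2 intrinsics and a plain C fallback. In
the kernel itself the vector part is mostly written in assembly, since
the kernel's own CFLAGS disable SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* dst[i] ^= src[i] for n bytes (hypothetical example helper). */
void xor_buffers(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* In a kernel driver: kernel_fpu_begin();
     * Until kernel_fpu_end() the code runs with preemption disabled
     * and must not sleep. */
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(d, s));
    }
#endif
    /* In a kernel driver: kernel_fpu_end(); */

    /* Remaining bytes (or everything, without SSE2) in plain C. */
    for (; i < n; i++)
        dst[i] ^= src[i];
}
```

Keeping the bracket short matters, since everything inside it runs with
preemption disabled.
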