Gentoo Archives: gentoo-user

From: James <wireless@×××××××××××.com>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: CFLAGs for kernel compilation
Date: Fri, 01 May 2015 17:21:51
Message-Id: loom.20150501T182640-190@post.gmane.org
In Reply to: Re: [gentoo-user] Re: CFLAGs for kernel compilation by Andrew Savchenko
Andrew Savchenko <bircoph <at> gentoo.org> writes:

> > I can hardly imagine that otherwise the compiler converts integer
> > or pointer arithmetic into floating point arithmetics, or is
> > this really the case for certain flags? If yes, why should these
> > flags *ever* be useful?
> > I mean: The context switching happens for non-kernel code as well,
> > doesn't it?

First off, reading this thread, I cannot really tell what the intended use
of these "highly tuned" kernels is. For almost all workstation and server
purposes, what has been stated previously is mostly correct. If you really
want to test these waters, do it on a system that is not in your critical
path. If you tune and experiment, you are going to bork your box at some
point. Water cooling the CPUs is a good idea when taxing the FPU and other
SIMD hardware on the CPU, imho. sys-power/powertop is your friend.

> Yes, context switching happens for all code and has its costs. But
> for userspace code context switching happens for many other
> reasons, e.g. on each syscall (userspace <-> kernelspace switching).
> Also some user applications may need high precision, or context
> switching pays off due to mass parallel data processing, e.g. SIMD
> instructions in scientific or multimedia applications.

Hear, hear, I knew we had an LU expert in the crowd. Most scientific or
highly parallelized number crunching does benefit from experimenting with
settings and *profiling* the results; trace-cmd and kernelshark are in
portage and are very useful for analysing hardware timings, context
switching and a myriad of other issues. Be careful: you can sink a
lifetime into such efforts with little to show for it. The best thing is
to read up on optimizations for your specific codes, as validated on the
specific hardware in your processors. Tuning for one need will most likely
degrade other kinds of performance; that is why, before you delve into
these waters, you really need to learn to profile both the target
(application) and kernel codes, *BEFORE* randomly tuning the advanced
numerical intricacies of your hardware resources. Start with memory and
cgroups before worrying about the hardware inside your processors (CPU
and GPU).
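
Before reaching for trace-cmd and kernelshark, a cheap first look at how often a workload is switched out is to read the per-process counters that getrusage(2) reports, before and after the code of interest. The following is a minimal sketch assuming Linux with glibc, where ru_nvcsw (voluntary) and ru_nivcsw (involuntary) are filled in; the struct and helper names, and the sleep loop standing in for "your workload", are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <time.h>

/* Hypothetical helper type: this process's context-switch counters. */
struct csw_count {
    long voluntary;   /* ru_nvcsw: the process gave up the CPU (blocked, slept) */
    long involuntary; /* ru_nivcsw: the scheduler preempted the process */
};

/* Snapshot the counters for the calling process; 0 on success, -1 on error. */
static int csw_snapshot(struct csw_count *out)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->voluntary = ru.ru_nvcsw;
    out->involuntary = ru.ru_nivcsw;
    return 0;
}

/* Difference between two snapshots; both counters only ever grow. */
static struct csw_count csw_delta(struct csw_count before, struct csw_count after)
{
    assert(after.voluntary >= before.voluntary);
    assert(after.involuntary >= before.involuntary);
    struct csw_count d = { after.voluntary - before.voluntary,
                           after.involuntary - before.involuntary };
    return d;
}

/* Hypothetical stand-in workload: n short sleeps. Each sleep blocks the
 * process, which on Linux normally shows up as a voluntary switch. */
static void sleep_n_times(int n)
{
    struct timespec ts = { 0, 1000000L }; /* 1 ms */
    for (int i = 0; i < n; i++)
        nanosleep(&ts, NULL);
}
```

Usage: take a snapshot, run the workload, take a second snapshot, and compare csw_delta() for the same binary under a stock kernel and a tuned one. A large difference in involuntary switches is a hint that it is time for the real tracing tools; these counters say how often switches happened, not why.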

> But barring the special conditions mentioned above, fixed point is
> still faster in userspace; some ffmpeg codecs have both fixed and
> floating point implementations, so you may compare them. Programming
> in fixed point is much harder, so most people avoid it unless they
> have a very good reason to use it. And don't forget that the kernel is
> performance critical, unlike most userspace applications.
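
To make the fixed- versus floating-point trade-off concrete: in fixed point the programmer, not the FPU, tracks where the binary point sits. The following is a minimal sketch in a Q16.16 format (16 integer bits, 16 fractional bits) using only integer instructions; the format choice and the fx_* names are illustrative assumptions, not code taken from ffmpeg or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16 fixed point: the stored integer is the real value times 65536. */
typedef int32_t fx16;

#define FX_ONE ((int32_t)1 << 16) /* 1.0 in Q16.16 */

/* Integer -> Q16.16; the integer part must fit in 16 signed bits. */
static fx16 fx_from_int(int x)
{
    assert(x >= -32768 && x <= 32767);
    return (fx16)x * FX_ONE;
}

/* a * b: widen to 64 bits so the raw product (scaled by 65536 twice)
 * cannot overflow, then divide one scale factor back out.
 * Division truncates toward zero; the result is assumed to fit in Q16.16. */
static fx16 fx_mul(fx16 a, fx16 b)
{
    int64_t p = (int64_t)a * b;
    return (fx16)(p / FX_ONE);
}

/* a / b: pre-scale the dividend by 65536 so the quotient keeps
 * 16 fractional bits. Truncates toward zero. */
static fx16 fx_div(fx16 a, fx16 b)
{
    assert(b != 0);
    return (fx16)(((int64_t)a * FX_ONE) / b);
}

/* For printing/checking only: this one does use the FPU. */
static double fx_to_double(fx16 x)
{
    return (double)x / FX_ONE;
}
```

For example, fx_div(fx_from_int(1), fx_from_int(3)) yields the raw value 21845, i.e. 21845/65536 (about 0.33333): the scaling, range and truncation decisions that floating point hides are all explicit here. That bookkeeping is the extra effort Andrew describes; the payoff is code that never touches FPU state.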

Video (MPEG, H.264 and such) benefits massively from the matrix
capabilities of the SIMD hardware in your video card's GPU. Support for
these bare-metal resources is being integrated into gcc-5.1+ for
experimentation, but it will likely take a year or so before ordinary
Linux users see these performance gains. I would encourage you to
experiment, but *never on your main workstation*. I'm purchasing a new
nvidia video card just to benchmark and tune some numerically intensive
codes that use sci-libs/magma. Although it will be my fastest video card
to date, it will sit in a box that is not used for visual eye candy
(gaming, anime, ray tracing etc).

The Mesos clustering codes (Shark, Storm, Tachyon etc) and MP(I) codes are
going to fundamentally change the numerical processing landscape, even for
small Linux clusters. An excellent bit of code to get your feet wet is
sys-apps/hwloc. More than the FPU, MPI (sys-cluster/openmpi) and other
clustering codes are going to let you use the DDR(4|5) memory found on
many video cards (GPUs) via *RDMA*. The world is rapidly changing, and many
old "fixed point integer" folks do not see the tsunami that is just
offshore. Many computationally expensive codes have development projects
to move to an "in-memory" [1] environment, where hard disk resources are
avoided as much as possible in a cluster environment. Clustered resources
"tuned" for things such as a video rendering farm will have very
different optimized kernels than your KDE/GNOME workstation or web server.
media-gfx/blender is another excellent collection of codes that benefits
from all sorts of tuning on a special-purpose system.

So, do you really have a valid need to tune FPU performance for a
numerically demanding application? YMMV.

> Best regards,
> Andrew Savchenko

hth,
James

[1] https://amplab.cs.berkeley.edu/