Andrew Savchenko <bircoph <at> gentoo.org> writes:

> > I can hardly imagine that otherwise the compiler converts integer
> > or pointer arithmetic into floating point arithmetic, or is
> > this really the case for certain flags? If yes, why should these
> > flags *ever* be useful?
> > I mean: The context switching happens for non-kernel code as well,
> > doesn't it?

First off, reading this thread, I cannot really tell what the intended use
of the "highly tuned" kernels is supposed to be. For almost all workstation
and server purposes, what has been previously stated is mostly correct. If
you really want to test these waters, do it on a system that is not in your
critical path: if you tune and experiment, you are going to bork your box.
Water coolers on the CPUs are a good idea when taxing the FPU and other SIMD
hardware on the CPU, imho. sys-power/powertop is your friend.

> Yes, context switching happens for all code and has its costs. But
> for userspace code context switching happens for many other
> reasons, e.g. on each syscall (userspace <-> kernelspace switching).
> Also some user applications may need high precision, or context
> switching pays off due to mass parallel data processing, e.g. SIMD
> instructions in scientific or multimedia applications.
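
To make that kernel-side cost concrete: the kernel deliberately does not
save and restore FPU/SIMD state on every entry, so any in-kernel floating
point or SSE/AVX use has to be bracketed explicitly. A minimal sketch,
assuming a recent kernel where kernel_fpu_begin()/kernel_fpu_end() live in
<asm/fpu/api.h> (older kernels used <asm/i387.h>):

    #include <linux/module.h>
    #include <linux/init.h>
    #include <asm/fpu/api.h>    /* kernel_fpu_begin()/kernel_fpu_end() */

    static int __init fpu_demo_init(void)
    {
            /* The state save/restore done by this pair is exactly the
             * per-use cost under discussion; it is why kernel code
             * avoids the FPU unless the payoff is large. */
            kernel_fpu_begin();
            /* ... SSE/AVX work would go here ... */
            kernel_fpu_end();
            return 0;
    }

    static void __exit fpu_demo_exit(void) { }

    module_init(fpu_demo_init);
    module_exit(fpu_demo_exit);
    MODULE_LICENSE("GPL");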

Hear, hear! I knew we had an LU expert in the crowd. Most scientific
or highly parallelized number crunching does benefit from experimenting
with settings and *profiling* the results. trace-cmd and kernelshark
are in portage and are very useful for analyzing hardware timings,
context switching, and a myriad of other issues. Be careful: you can
sink a lifetime into such efforts with little to show for them.
The best thing is to read up on specific optimizations for specific
codes as vetted on the specific hardware in your processors. Tuning for
one need will most likely degrade other types of performance; that is
why, before you delve into these waters, you really need to learn about
profiling both target (application) and kernel codes *before* randomly
tuning the advanced numerical intricacies of your hardware resources.
Start with memory and cgroups before worrying about the hardware inside
your processors (CPU and GPU).
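
Before reaching for the tracing tools, getrusage(2) will at least tell you
how often your application is being context switched, voluntarily (blocking
in syscalls) versus involuntarily (preemption). A minimal sketch:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rusage ru;

        /* ... run the workload you want to characterize here ... */

        if (getrusage(RUSAGE_SELF, &ru) == 0) {
            printf("voluntary context switches:   %ld\n", ru.ru_nvcsw);
            printf("involuntary context switches: %ld\n", ru.ru_nivcsw);
        }
        return 0;
    }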

> But unless the special conditions mentioned above apply, fixed point is
> still faster in userspace; some ffmpeg codecs have both fixed and floating
> point implementations, so you may compare them. Programming in fixed point
> is much harder, so most people avoid it unless they have a very
> good reason to use it. And don't forget that the kernel is
> performance critical, unlike most userspace applications.
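
For anyone who has not written fixed point before, the whole trick is
keeping fractions as scaled integers. A minimal Q16.16 sketch (the helper
names are just for illustration) shows why it is fast -- integer ops only
-- and why it is error prone -- the widening and shifting are all manual:

    #include <stdint.h>
    #include <stdio.h>

    /* Q16.16: 16 integer bits, 16 fraction bits; value = raw / 65536.0 */
    typedef int32_t q16_16;

    #define Q16_ONE (1 << 16)

    static q16_16 q16_from_double(double d) { return (q16_16)(d * Q16_ONE); }
    static double q16_to_double(q16_16 q)   { return (double)q / Q16_ONE; }

    /* Multiply: widen to 64 bits first, then drop the extra 16 fraction
     * bits -- forgetting either step is the classic fixed-point bug. */
    static q16_16 q16_mul(q16_16 a, q16_16 b)
    {
        return (q16_16)(((int64_t)a * b) >> 16);
    }

    int main(void)
    {
        q16_16 x = q16_from_double(1.5);
        q16_16 y = q16_from_double(2.25);
        printf("1.5 * 2.25 = %f\n", q16_to_double(q16_mul(x, y)));
        return 0;
    }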

Video (mpeg, h.264 and such) massively benefits from the enhanced matrix
abilities of the SIMD hardware in your video card's GPU. Support for
offloading to these bare-metal resources is being integrated into gcc-5.1+
for experimentation, but it is likely going to take a year or so before
ordinary users of linux resources see these performance gains. I would
encourage you to experiment, but *never on your main workstation*. I'm
purchasing a new nvidia video card just to benchmark and tune some
numerically intensive codes that use sci-libs/magma. Although this will be
my currently fastest video card, it will sit in a box that is not used
for visual eye candy (gaming, anime, ray traces, etc.).
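
The GPU offload story is still settling, but the same data-parallel idea
is available today in the CPU's own SIMD units. A minimal SSE sketch
(compile with gcc -msse) that does four float additions in one instruction
instead of four:

    #include <stdio.h>
    #include <xmmintrin.h>  /* SSE intrinsics */

    int main(void)
    {
        /* _mm_set_ps takes its arguments in reverse lane order. */
        __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
        float out[4];

        /* One vector add does the work of four scalar adds. */
        _mm_storeu_ps(out, _mm_add_ps(a, b));
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }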

The mesos clustering codes (shark, storm, tachyon, etc.) and MPI codes are
going to fundamentally change the numerical processing landscape for even
small linux clusters. An excellent bit of code to get your feet wet with is
sys-apps/hwloc. More than the FPU, MPI {sys-cluster/openmpi} and other
clustering codes are going to let you use the GDDR5 memory found on many
video cards (GPUs) via *RDMA*. The world is rapidly changing, and many old
"fixed point integer" folks do not see the tsunami that is just offshore.
Many computationally expensive codes have development projects to move to
an "in-memory" [1] environment where hard disk resources are avoided as
much as possible in the cluster. Clustered resources "tuned" for such
things as a video rendering farm will have very different optimized kernels
than your KDE(G*) workstation or web server. media-gfx/blender is another
excellent collection of codes that benefits from all sorts of tuning on a
special-purpose system.
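
If you do install sys-apps/hwloc, its C API makes a nice first experiment.
A minimal sketch that loads the machine topology and counts cores and
hardware threads (link with -lhwloc):

    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;

        /* Discover the machine: packages, caches, cores, PUs. */
        hwloc_topology_init(&topo);
        hwloc_topology_load(&topo);

        printf("cores: %d, hardware threads: %d\n",
               hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE),
               hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));

        hwloc_topology_destroy(&topo);
        return 0;
    }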

So do you really have a valid need to tune FPU performance for a
numerically demanding application? YMMV.

> Best regards,
> Andrew Savchenko

hth,
James

[1] https://amplab.cs.berkeley.edu/