Olivier Crete <tester@g.o> posted
1158340730.14248.61.camel@×××××××××××××.internal, excerpted below, on Fri,
15 Sep 2006 13:18:49 -0400:

> On Fri, 2006-09-15 at 10:08 -0700, felix@×××××××.com wrote:
>> On Fri, Sep 15, 2006 at 04:47:14PM +0000, Duncan wrote:
>>
>> > I'm unclear as to what "vectorization" means as used here.  My
>> > understanding of "vector" is as a synonym for "line", thus implying
>> > loop unrolling of some form or another, which will increase size.
>> >
>> > I am however aware that vectorization has a somewhat different
>> > meaning in programming terms than the above, but am not sufficiently
>> > educated on the topic to make an informed choice[.]
>> >
>> > If you can sufficiently explain the concept to me such that I
>> > understand enough about it to feel comfortable going with other than
>> > the default (which means I can explain why I chose it and why it
>> > won't interfere with my overall strategy as outlined in the
>> > grandparent, or is worth it even if it does), I'd be very grateful!
>>
>> Back in the day, vectorization was, I believe, a supercomputer SIMD
>> (single instruction, multiple data) concept, where instruction
>> operands were pointers to data, so it would, for instance, add two
>> arrays of numbers to produce a third array.  Isn't this what the
>> Altivec instructions do?
>
> That's exactly what it means.  On x86/amd64 some MMX, SSE/SSE2 and
> 3dnow operations are SIMD operations.  Vectorizing a loop means that
> if you add two arrays of, let's say, 12000 elements, instead of
> running the loop 12000 times for 1 element each time, it runs, say,
> 3000 times with 4 elements each time.  Which should be faster... (but
> isn't always, depending on whether the vector ops have been
> implemented properly).

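The 4-elements-per-iteration idea from the reply above can be sketched in
plain C.  This is only a hand-written illustration of the loop shape that
-ftree-vectorize arranges automatically (and with real SIMD instructions,
not four scalar adds); the array size is just the example figure from the
post:

```c
#include <stddef.h>

#define N 12000  /* example size from the post; assumed divisible by 4 */

/* Scalar version: one element per iteration, 12000 iterations. */
static void add_scalar(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < N; i++)
        out[i] = a[i] + b[i];
}

/* Vectorized shape: four elements per iteration, 3000 iterations.
 * A vectorizing compiler would emit a single SSE addps (or Altivec
 * vaddfp) per iteration here, instead of four separate adds. */
static void add_by_fours(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < N; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
}
```

Both produce identical results; the second just exposes the 4-wide
grouping the vectorizer exploits when the element count and alignment
cooperate.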
I was somewhat aware of that, but hadn't considered the effect on loops,
and don't understand it well enough to be able to explain it as you did,
nor enough to grok why, if it's so much more efficient, gcc doesn't do
it by default, at least on archs sufficiently specified for it to know
the instructions are there and that using them makes sense.  (amd64 is
new enough that it doesn't have all the different generations of
mmx/sse/sse2/etc to worry about, so it should make sense there, as it
would on x86 with -march=pentium4 or whatever, where gcc knows what
vector instructions are available, as opposed to plain pentium, which
was pre-MMX, let alone the later vector extensions.)

IOW, that explains why it should be more efficient, but not why gcc
isn't already doing it on amd64.  Or maybe it is, and specifying the
flag would be redundant?  This is precisely what I mean when I say I
don't have enough information to make a defensible decision, so I've
chosen to stick with the safe defaults.  If it's not being done by
default, there's likely a good reason somewhere, and lacking enough
information to make an informed decision, the defaults are the safe way
to go.

This is also one of those places where the manpage is frustratingly
uninformative.  The entry for -ftree-vect-loop-version says it's enabled
by default, and that both vectorized and unvectorized versions of a loop
are created where the compiler can't tell at compile time whether
vectorizing is safe, /except/ at -Os.  Since this flag duplicates code
in some cases, disabling it for -Os makes perfect sense, so no problem
there.  The problem is the implication: where gcc /can/ tell
vectorization is possible, it should be vectorizing by default as well
-- only the manpage never /says/ so, neither under the -ftree-vectorize
description itself, nor in the lists of what each -OX level enables.
The documentation thus leaves the question of whether it's enabled by
default very much up in the air: implied in the description of a
different flag, but stated explicitly nowhere.

Another flag whose default is unclearly specified is -ftree-pre
(partial redundancy elimination).  It's certainly on at -O2 and -O3.
The -Os section says it enables all of -O2 except where that would
increase size, and doesn't list -ftree-pre as disabled, and there's no
direct indication this one increases size; but the -ftree-pre
description names only -O2 and -O3 specifically, so one is left
wondering which side of "all of -O2 except where that would increase
size" it falls on, and why.  As you can see, I've chosen to include it
in my CFLAGS because it seems like it should be of benefit (compare
-ftree-fre, enabled at -O and higher including -Os) and shouldn't
increase size /too/ much, just in case it's /not/ on by default at -Os
for some reason.

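For what it's worth, the transformation -ftree-pre performs (partial
redundancy elimination) can be illustrated by hand.  This is a made-up
before/after sketch of the idea, not gcc output:

```c
/* Before: x*y is computed on the flag path and then unconditionally
 * again at the return, so on the flag!=0 path it is computed twice --
 * a *partial* redundancy (redundant on some paths, not all). */
static int before_pre(int x, int y, int flag)
{
    int t = 0;
    if (flag)
        t = x * y;      /* first computation, only on this path */
    return t + x * y;   /* recomputed on every path */
}

/* After PRE: the expression is computed exactly once per path and the
 * result is reused at the return. */
static int after_pre(int x, int y, int flag)
{
    int prod = x * y;   /* hoisted; each path computes it once */
    int t = flag ? prod : 0;
    return t + prod;
}
```

The reason PRE can cut both ways on size: to make an expression fully
redundant it sometimes has to /insert/ a computation on a path that
didn't previously have one, which is presumably why its status under
-Os is worth asking about at all.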
With -ftree-pre I can be pretty sure it's safe to include, since -O2 is
known to include it; -ftree-vectorize is different, as nothing says
/where/ it's the default (if anywhere), tho as I explained, it's
implied as the default by the -ftree-vect-loop-version entry.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
gentoo-amd64@g.o mailing list