Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: gcc4 CFLAGS Was: gcc 4.1 upgrade - bad desktop interactivity anyone?
Date: Fri, 15 Sep 2006 20:41:32
Message-Id: eef31b$mjs$3@sea.gmane.org
In Reply to: Re: [gentoo-amd64] gcc4 CFLAGS Was: gcc 4.1 upgrade - bad desktop interactivity anyone? by Olivier Crete
1 Olivier Crete <tester@g.o> posted
2 1158340730.14248.61.camel@×××××××××××××.internal, excerpted below, on Fri,
3 15 Sep 2006 13:18:49 -0400:
4
5 > On Fri, 2006-15-09 at 10:08 -0700, felix@×××××××.com wrote:
6 >> On Fri, Sep 15, 2006 at 04:47:14PM +0000, Duncan wrote:
7 >>
8 >> > I'm unclear as to what "vectorization" means as used here. My
9 >> > understanding of "vector" is as a synonym for "line", thus implying
10 >> > loop unrolling of some form or another, which will increase size.
11 >> >
12 >> > I am however aware that vectorization has a somewhat different
13 >> > meaning in programming terms than the above, but am not sufficiently
14 >> > educated on the topic to make an informed choice[.]
15 >> >
16 >> > If you can sufficiently explain the concept to me such that I
17 >> > understand enough about it to feel comfortable going with other than
18 >> > the default (which means I can explain why I chose it and why it
19 >> > won't interfere with my overall strategy as outlined in the
20 >> > grandparent, or is worth it even if it does), I'd be very grateful!
21 >>
22 >> Back in the day, vectorization was, I believe, a supercomputer SIMD
23 >> (single instruction multiple data) concept, where instruction operands
24 >> were pointers to data, so it would, for instance, add two arrays of
25 >> numbers to produce a third array. Isn't this what the Altivec
26 >> instructions do?
27 >
28 > That's exactly what it means. On x86/amd64 some MMX, SSE/SSE2 and 3dnow
29 > operations are SIMD operations. Vectorizing a loop means that if you try
30 > to add two tables of lets say 12000 elements, instead of doing the loops
31 > 12000 times for 1 element each time, it will do the loop lets say 3000
32 > times with 4 element each time. Which should be faster... (but isn't
33 > always depending if the vector ops have been implemented properly).
34
35 I was somewhat aware of that, but hadn't considered the effect on loops,
36 and don't understand it well enough to explain it as you did, nor well
37 enough to grok why, if it's so much more efficient, gcc doesn't do it by
38 default, at least on archs specified tightly enough that it knows the
39 instructions are there and that using them makes sense. (amd64 is new
40 enough not to have all the different generations of mmx/sse/sse2/etc, so
41 it should make sense there, as it would on x86 with -march=pentium4 or
42 whatever, where gcc knows which vector instructions are available, as
43 opposed to plain pentium, which was pre-mmx, let alone the later ones.)
44
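To make the loop effect described in the quote concrete, here is a minimal sketch in plain C. It is not what gcc actually emits; the function names (add_scalar, add_by_fours) are made up for illustration, and the four additions in the second loop body only stand in for what would be a single SIMD add (for instance an SSE packed add) in a really vectorized loop.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar version: one addition per loop iteration.
 * Returns the number of iterations executed. */
size_t add_scalar(const float *a, const float *b, float *out, size_t n)
{
    size_t iterations = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
        iterations++;
    }
    return iterations;
}

/* Hand-written stand-in for a vectorized loop: four elements per
 * iteration. A vectorizer would replace the four additions below
 * with one SIMD instruction; here they are spelled out in plain C. */
size_t add_by_fours(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0); /* simplification: no leftover elements */
    size_t iterations = 0;
    for (size_t i = 0; i < n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
        iterations++;
    }
    return iterations;
}
```

With 12000 elements the first function runs its loop 12000 times and the second 3000 times, matching the numbers in the quote. Whether the second form is actually faster depends on the hardware executing the four-wide operation efficiently, which is the caveat raised in the quote.
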
45 IOW, that explains why it should be more efficient, but not why gcc isn't
46 already doing it on amd64, or maybe it is, and specifying the flag would
47 be redundant? This is precisely what I mean when I say I don't have
48 enough information to make a defensible decision, so I've chosen to stick
49 with the safe defaults. If it's not being done by default, there's likely
50 a good reason somewhere, and lacking enough information to make an
51 informed decision, the defaults are the safe way to go.
52
53 This is also one of those places where the manpage is frustratingly
54 uninformative. The entry for -ftree-vect-loop-version says that it's
55 enabled by default /except/ at -Os, and that it creates both vectorized
56 and unvectorized versions of a loop where gcc can't tell for sure at
57 compile time that vectorizing is possible. Since this flag duplicates
58 code in some cases, disabling it for -Os makes perfect sense, so no
59 problem there. The problem is that this implies that where gcc /can/ tell
60 vectorization is possible, it should be doing that by default as well --
61 only the manpage never /says/ so, neither under the -ftree-vectorize
62 description itself, nor in the lists of what gets enabled at the various
63 -OX levels. The documentation therefore leaves the question of whether
64 vectorization is enabled by default very much up in the air, implying
65 that it is in the description of something else, but nowhere stating it
66 explicitly one way or the other.
67
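The "both versions plus a runtime check" idea behind -ftree-vect-loop-version can be sketched by hand as below. This is only an illustration under assumptions: the function names are hypothetical, the check shown is for overlapping arrays (the manpage mentions alignment and data dependence), and gcc's real generated code will look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if the two n-element float ranges share any bytes. */
static int ranges_overlap(const float *p, const float *q, size_t n)
{
    uintptr_t ps = (uintptr_t)p, qs = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return ps < qs + bytes && qs < ps + bytes;
}

/* Two versions of the same loop, chosen at run time.
 * Returns 1 if the four-at-a-time version ran, 0 if the scalar one did. */
int add_one_versioned(float *dst, const float *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    if (!ranges_overlap(dst, src, n)) {
        /* Iterations are independent, so four at a time is safe.
         * A vectorizer would use SIMD instructions for this body. */
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            dst[i]     = src[i]     + 1.0f;
            dst[i + 1] = src[i + 1] + 1.0f;
            dst[i + 2] = src[i + 2] + 1.0f;
            dst[i + 3] = src[i + 3] + 1.0f;
        }
        for (; i < n; i++)          /* leftover elements */
            dst[i] = src[i] + 1.0f;
        return 1;
    }

    /* Possible dependence between iterations: keep the original order. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
    return 0;
}
```

Both loop bodies end up in the binary, which is the code duplication that makes leaving this out at -Os reasonable.
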
68 Another example of an unclearly specified default is -ftree-pre. It's
69 certainly the default for -O2 and -O3. The section on -Os says it enables
70 all the -O2 optimizations except those that would increase size, and
71 doesn't list -ftree-pre among the ones it disables, nor is there any
72 direct indication that it increases size. However, the description of
73 -ftree-pre itself mentions only -O2 and -O3, so one is left wondering
74 which side of that "except where it increases size" line it falls on, and
75 why. As you can see, I've chosen to include it in my CFLAGS, just in case
76 it's /not/ default for -Os for some reason, because it seems like it
77 should be of benefit (compare -ftree-fre, enabled at -O and higher,
78 including -Os) and shouldn't increase size /too/ much.
79
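For readers unfamiliar with what partial redundancy elimination (the "pre" in -ftree-pre) does, here is a hand-written before/after sketch. It is a textbook-style illustration, not gcc output, and the function names before_pre and after_pre are invented for the example.

```c
/* Before: a + b is computed inside the 'if' and again after it.
 * The second computation is redundant only when flag is nonzero,
 * hence "partially" redundant. */
int before_pre(int a, int b, int flag)
{
    int x = 0;
    if (flag)
        x = a + b;
    int y = a + b;      /* recomputed even if the branch already did it */
    return x + y;
}

/* After, roughly: a copy of the computation is inserted on the path
 * that lacked it, so the one after the join can simply reuse t. */
int after_pre(int a, int b, int flag)
{
    int x = 0;
    int t;
    if (flag) {
        t = a + b;
        x = t;
    } else {
        t = a + b;      /* inserted copy on the other path */
    }
    int y = t;          /* no recomputation on either path */
    return x + y;
}
```

The inserted copy on the else path is where a size increase can come from, which is presumably why the -Os status of the flag matters: each run-time path does less work, but the code itself can grow slightly.
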
80 With -ftree-pre I can be pretty sure it's safe to include since -O2 is
81 known to include it, but -ftree-vectorize is different as there's nothing
82 saying /where/ it's the default (if anywhere), tho as I explained it's
83 implied as the default by the description for the -ftree-vect-loop-version
84 entry.
85
86 --
87 Duncan - List replies preferred. No HTML msgs.
88 "Every nonfree program has a lord, a master --
89 and if you use the program, he is your master." Richard Stallman
90
91 --
92 gentoo-amd64@g.o mailing list
