Olivier Crete <tester@g.o> posted
1158340730.14248.61.camel@×××××××××××××.internal, excerpted below, on Fri,
15 Sep 2006 13:18:49 -0400:

> On Fri, 2006-09-15 at 10:08 -0700, felix@×××××××.com wrote:
>> On Fri, Sep 15, 2006 at 04:47:14PM +0000, Duncan wrote:
>>
>> > I'm unclear as to what "vectorization" means as used here.  My
>> > understanding of "vector" is as a synonym for "line", thus implying
>> > loop unrolling of some form or another, which will increase size.
>> >
>> > I am however aware that vectorization has a somewhat different
>> > meaning in programming terms than the above, but am not sufficiently
>> > educated on the topic to make an informed choice[.]
>> >
>> > If you can sufficiently explain the concept to me such that I
>> > understand enough about it to feel comfortable going with other than
>> > the default (which means I can explain why I chose it and why it
>> > won't interfere with my overall strategy as outlined in the
>> > grandparent, or is worth it even if it does), I'd be very grateful!
>>
>> Back in the day, vectorization was, I believe, a supercomputer SIMD
>> (single instruction, multiple data) concept, where instruction
>> operands were pointers to data, so it would, for instance, add two
>> arrays of numbers to produce a third array.  Isn't this what the
>> Altivec instructions do?
>
> That's exactly what it means.  On x86/amd64 some MMX, SSE/SSE2 and
> 3dnow operations are SIMD operations.  Vectorizing a loop means that
> if you add two arrays of, let's say, 12000 elements, instead of
> running the loop 12000 times for 1 element each time, it runs, say,
> 3000 times with 4 elements each time.  Which should be faster... (but
> isn't always, depending on whether the vector ops have been
> implemented properly).

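The 4-elements-per-iteration idea from the reply above can be sketched in
plain C.  This is only a hand-written illustration of the loop shape that
-ftree-vectorize arranges automatically (and with real SIMD instructions,
not four scalar adds); the array size is just the example figure from the
post:

```c
#include <stddef.h>

#define N 12000  /* example size from the post; assumed divisible by 4 */

/* Scalar version: one element per iteration, 12000 iterations. */
static void add_scalar(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < N; i++)
        out[i] = a[i] + b[i];
}

/* Vectorized shape: four elements per iteration, 3000 iterations.
 * A vectorizing compiler would emit a single SSE addps (or Altivec
 * vaddfp) per iteration here, instead of four separate adds. */
static void add_by_fours(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < N; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
}
```

Both produce identical results; the second just exposes the 4-wide
grouping the vectorizer exploits when the element count and alignment
cooperate.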
I was somewhat aware of that, but hadn't considered the effect on loops,
and don't understand it well enough to be able to explain it as you did,
nor enough to grok why, if it's so much more efficient, gcc doesn't do
it by default, at least on archs sufficiently specified for it to know
the instructions are there and that using them makes sense.  (amd64 is
new enough that it doesn't have all the different generations of
mmx/sse/sse2/etc to worry about, so it should make sense there, as it
would on x86 with -march=pentium4 or whatever, where gcc knows what
vector instructions are available, as opposed to plain pentium, which
was pre-MMX, let alone the later vector extensions.)

IOW, that explains why it should be more efficient, but not why gcc
isn't already doing it on amd64.  Or maybe it is, and specifying the
flag would be redundant?  This is precisely what I mean when I say I
don't have enough information to make a defensible decision, so I've
chosen to stick with the safe defaults.  If it's not being done by
default, there's likely a good reason somewhere, and lacking enough
information to make an informed decision, the defaults are the safe way
to go.

This is also one of those places where the manpage is frustratingly
uninformative.  The entry for -ftree-vect-loop-version says it's enabled
by default, and that both vectorized and unvectorized versions of a loop
are created where the compiler can't tell at compile time whether
vectorizing is safe, /except/ at -Os.  Since this flag duplicates code
in some cases, disabling it for -Os makes perfect sense, so no problem
there.  The problem is the implication: where gcc /can/ tell
vectorization is possible, it should be vectorizing by default as well
-- only the manpage never /says/ so, neither under the -ftree-vectorize
description itself, nor in the lists of what each -OX level enables.
The documentation thus leaves the question of whether it's enabled by
default very much up in the air: implied in the description of a
different flag, but stated explicitly nowhere.

Another flag whose default is unclearly specified is -ftree-pre
(partial redundancy elimination).  It's certainly on at -O2 and -O3.
The -Os section says it enables all of -O2 except where that would
increase size, and doesn't list -ftree-pre as disabled, and there's no
direct indication this one increases size; but the -ftree-pre
description names only -O2 and -O3 specifically, so one is left
wondering which side of "all of -O2 except where that would increase
size" it falls on, and why.  As you can see, I've chosen to include it
in my CFLAGS because it seems like it should be of benefit (compare
-ftree-fre, enabled at -O and higher including -Os) and shouldn't
increase size /too/ much, just in case it's /not/ on by default at -Os
for some reason.

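For what it's worth, the transformation -ftree-pre performs (partial
redundancy elimination) can be illustrated by hand.  This is a made-up
before/after sketch of the idea, not gcc output:

```c
/* Before: x*y is computed on the flag path and then unconditionally
 * again at the return, so on the flag!=0 path it is computed twice --
 * a *partial* redundancy (redundant on some paths, not all). */
static int before_pre(int x, int y, int flag)
{
    int t = 0;
    if (flag)
        t = x * y;      /* first computation, only on this path */
    return t + x * y;   /* recomputed on every path */
}

/* After PRE: the expression is computed exactly once per path and the
 * result is reused at the return. */
static int after_pre(int x, int y, int flag)
{
    int prod = x * y;   /* hoisted; each path computes it once */
    int t = flag ? prod : 0;
    return t + prod;
}
```

The reason PRE can cut both ways on size: to make an expression fully
redundant it sometimes has to /insert/ a computation on a path that
didn't previously have one, which is presumably why its status under
-Os is worth asking about at all.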
With -ftree-pre I can be pretty sure it's safe to include, since -O2 is
known to include it; -ftree-vectorize is different, as nothing says
/where/ it's the default (if anywhere), tho as I explained, it's
implied as the default by the -ftree-vect-loop-version entry.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
gentoo-amd64@g.o mailing list