Uwe Thiem wrote:
> -funroll-loops isn't that good an idea because an unrolled loop might the
> cache.

All this started with Artur Grabowski's mail. And now everybody is
repeating his idea.

No, loop unrolling is not evil. Neither are inline functions.

The following loop:

    do i=1,n
      sum=sum+a(i)
    enddo

is unrolled like

    do i=1,n,4
      sum0=sum0+a(i)
      sum1=sum1+a(i+1)
      sum2=sum2+a(i+2)
      sum3=sum3+a(i+3)
    enddo
    sum=sum0+sum1+sum2+sum3

Some modulo logic must be added for n not a multiple of 4, and I also
omitted the after-loop cleanup.

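A minimal C sketch of the same transformation, with the omitted cleanup
filled in, might look like this. The function names sum_rolled and
sum_unrolled4 are made up for illustration, and integer elements are
assumed so that both versions give exactly the same result (with floating
point, the changed order of additions can alter the last bits).

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one add and one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-unrolled by 4 with four independent accumulators
 * (illustrative only; not necessarily what gcc emits). */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);

    long sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
    size_t limit = n - (n % 4);   /* largest multiple of 4 that is <= n */
    size_t i = 0;

    /* Main loop: one loop test per four elements. */
    for (; i < limit; i += 4) {
        sum0 += a[i];
        sum1 += a[i + 1];
        sum2 += a[i + 2];
        sum3 += a[i + 3];
    }

    long sum = sum0 + sum1 + sum2 + sum3;

    /* Cleanup: the 0 to 3 elements left over when n is not a multiple of 4. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

A compiler given -funroll-loops does not necessarily produce exactly this
shape; the sketch only shows the idea and where the extra code comes from.
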
The gain is a less frequent loop index comparison (and thus fewer
conditional jumps), better register usage, fewer memory operations in
more complex, nested loops, and code that gives the scheduler room for
more aggressive reordering. It also opens up possibilities for other
optimizations.

All this has to be weighed against the increased instruction cache
usage. And keep in mind that every program spends most of its runtime
in loops.

This was the theory. Practice shows that the optimizer in gcc was very
weak before 3.4, and the result still depends on the efficiency of the
scheduler.

But remember: however good all this sounds, there are always programs
where one or more optimizations fail - there is no magic switch for
general, best performance.

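One way to check whether unrolling pays off for a given loop is simply to
time it. The sketch below is a hypothetical harness, not a rigorous
benchmark: time_sum, sum_simple and sum_by4 are illustrative names, and
clock() measures CPU time only coarsely.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef long (*sum_fn)(const int *, size_t);

/* Plain loop; the compiler is free to unroll it or not. */
long sum_simple(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Manually unrolled by 4, with a cleanup loop for the remainder. */
long sum_by4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t limit = n - (n % 4);
    size_t i = 0;
    for (; i < limit; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    long sum = s0 + s1 + s2 + s3;
    for (; i < n; i++)
        sum += a[i];
    return sum;
}

/* Calls fn(a, n) reps times, stores the last result in *out and
 * returns the CPU time used, in seconds, as reported by clock(). */
double time_sum(sum_fn fn, const int *a, size_t n, int reps, long *out)
{
    assert(fn != NULL && out != NULL && reps > 0);

    volatile long sink = 0;   /* volatile: the stores cannot be dropped */
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = fn(a, n);
    clock_t end = clock();

    *out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build the same file once with gcc -O2 and once with gcc -O2 -funroll-loops,
call time_sum() on an array large enough to run for a noticeable time, and
compare. An optimizer that inlines time_sum() may still hoist the call out
of the loop, so timings that look implausibly small should be distrusted.
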

/Ervin

--
gentoo-performance@g.o mailing list