Markus Dittrich wrote:
> On Fri, 25 Aug 2006, Adam Pityszek wrote:
>
>>> Dear Markus, gentoo-science guys,
>>>
>>> Please find below the reply from Clint to my email from yesterday about
>>> our work on ATLAS shared libraries in Gentoo.
>>>
>>> Markus, I think we can help with answering questions (2) and (3). Of
>>> course, volunteers from gentoo-science are welcome as well.
>>>
>>> BR,
>>> /ediap
>>>
>>> (1) Is it true that the extra pointer may still be used if we restore
>>> it at the end of the assembly routine?
>>> (2) Does throwing -fpic or other required compiler flags change the
>>> best cases (thus necessitating doubling the arch defaults)?
>>> (3) What is the overall performance effect when using .so?
>>>
>>> I've tried to answer (1) by looking at some docs, but never got convinced
>>> either way. I've been meaning to write a register stress-test to see if
>>> I can make gcc use the reserved register in a function w/o global data.
>>> Perhaps you know?
>>>
>>> You guys could help with (2) & (3) if you like. You could build
>>> out-of-box to .a on whatever machines you can, and then build it to .so
>>> using your gentoo harness, and post some head-to-head timings . . . If,
>>> as we suspect, the difference is essentially zero, that makes .so a lot
>>> more attractive . . .
>>>
>
> Hi Adam,
>
> Thanks for talking to upstream about this; Clint's response
> sounds encouraging. We could definitely help out with (2) and (3);
> it would be good to know anyway how well we do with our shared libs. In
> doing so we should also test the impact of using the 387 floating point
> unit versus the SSE instruction set. According to Clint, the former can
> give a significant performance gain on some CPUs. If that is the case it
> might be worth a note in the ebuild to make our users aware of it.
>
> We should get hold of a nice benchmark suite for this purpose; Clint
> has posted one on this gcc bug
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
> which we might be able to use. I'll have a look at it.
>
> Best,
> Markus
>
>
> -- Markus Dittrich (markusle)
> Gentoo Linux Developer
> Scientific applications

If you have the time, you can turn off all of the preconceived notions
ATLAS has about your architecture and let it benchmark itself. In fact,
for the hard-core number crunchers, you might actually want to put a USE
flag in the ebuild to do a "brute-force" assume-nothing compile, warning
them that it takes a long time and that it should be run after an
"emerge -f" with Linux in single-user mode. My recollection is that it
used to take about 8 hours on a 1.3 GHz Athlon Thunderbird.
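
For the head-to-head .a versus .so timings mentioned above, here is a
rough sketch of how the numbers could be compared once collected. The
function name and the sample timings are made up for illustration; they
are not ATLAS tooling and not real measurements. It compares the best
(minimum) run of each build, on the assumption that the minimum is the
run least disturbed by other system activity.

```python
def compare_timings(static_secs, shared_secs):
    """Return the relative slowdown of the shared (.so) build versus the
    static (.a) build, based on the best run of each.

    A result of 0.05 means the shared build was 5% slower; a negative
    value means it was faster.
    """
    if not static_secs or not shared_secs:
        raise ValueError("need at least one timing per build")
    best_static = min(static_secs)
    best_shared = min(shared_secs)
    return (best_shared - best_static) / best_static


# Hypothetical wall-clock timings in seconds (placeholders, not measured).
static_runs = [2.00, 2.10, 2.05]
shared_runs = [2.10, 2.20, 2.15]

slowdown = compare_timings(static_runs, shared_runs)
print(f"shared vs static: {slowdown:+.1%}")
```

The same helper could be reused for the 387 versus SSE comparison by
feeding it the timings from the two builds in question.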
--
gentoo-science@g.o mailing list