Bug #445053 deals with the new USE=fma flag in sci-libs/fftw-3.3.3
(~amd64). This flag enables upstream's new-for-that-version fma
instruction set optimizations, but the problem is that there are two
different fma instruction sets, fma3 and fma4. The Wikipedia article
explains the difference, history, etc., in some detail.

Bug URL: https://bugs.gentoo.org/show_bug.cgi?id=445053

fma on Wikipedia: http://en.wikipedia.org/wiki/FMA_instruction_set

When I went to do my update, I saw the new USE flag. Having an AMD
bdver1 (Bulldozer) with fma4, but seeing that the USE flag is plain fma
(no number appended), I got confused, started looking into things, and
then filed that bug.

I've now actually tested USE=fma on my bdver1 (fma4) hardware, with both
the ebuild's "small" tests and a manually run "make bigtest" in all three
subdirs (single/double/long-double) created as part of the build process.
All tests passed, so USE=fma seems to work reliably on fma4 hardware.
What we do NOT yet know for sure is whether it works reliably on fma3
hardware, so we now need someone with fma3 hardware to check there as
well.

According to the Wikipedia article, Intel will support fma3 in hardware
to be released in 2013, so AFAIK there's no released Intel hardware with
fma support at all yet. Still, anyone with a current (definitely this
year's) Intel CPU/APU is welcome to check /proc/cpuinfo and see, and to
run the tests if the flag is there.

The newest AMD hardware, however, should already have fma support, though
it could be fma3 or fma4 depending on the CPU.

Bulldozer (-march=bdver1 in gcc) chips, released in late 2011, should
have fma4 listed in /proc/cpuinfo, as mine does. That's what I tested
with USE=fma here, with all the tests I ran passing.

The new Piledriver CPUs and Trinity APUs, however (I believe that's
-march=bdver2, but am not positive on that), are supposed to support
fma3. I'd guess /proc/cpuinfo reports either fma3 or simply fma for
them. That's what still needs testing.

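For anyone unsure what to look for, here's a minimal sketch of the
/proc/cpuinfo check. The function name fma_flags is just something made
up for illustration, and the pattern assumes the kernel spells the flag
fma, fma3, or fma4; fma4 is the only spelling I've actually seen here.

```shell
# Print which FMA-related flags appear in a cpuinfo-style file.
# $1: file to read; defaults to /proc/cpuinfo
# Assumption: the flag is spelled fma, fma3, or fma4 on the "flags" line.
fma_flags() {
    grep -E '^flags[[:space:]]*:' "${1:-/proc/cpuinfo}" \
        | tr ' \t' '\n\n' \
        | grep -E -x 'fma[34]?' \
        | sort -u
}
```

Run with no argument, it reads /proc/cpuinfo and prints each matching
flag once. Empty output would mean no hardware fma support is reported.
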
So, anyone with that hardware, could you at least set USE=fma, run
ebuild ... test on sci-libs/fftw-3.3.3, and then report the results in
the bug? Based on my results, the whole build and test (the ebuild runs
make smalltest for all three subdirs) should only take perhaps five
minutes or so. It was about three here, including the configure and
build, though my PORTAGE_TMPDIR is on tmpfs, so it might take a bit
longer for those with it on a spinning hard drive.

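The ebuild test run could be wrapped roughly like this, so there's a log
to attach to the bug. This is only a sketch: fma_ebuild_test and the
EBUILD_CMD override are names made up for illustration, the ebuild path
is deliberately left as an argument since it depends on where your tree
lives, and the extra clean step is my own addition so a stale build
without fma isn't reused.

```shell
# Run the fftw test phase with USE=fma and keep a log for the bug report.
# $1: path to the fftw-3.3.3 ebuild in your tree (not filled in here)
# $2: log file (default: ./fftw-fma-test.log)
# EBUILD_CMD: hypothetical override of the command to run; defaults to
#             portage's ebuild
fma_ebuild_test() {
    ebuild_file=$1
    log=${2:-./fftw-fma-test.log}
    # "clean" is an assumption added here, not part of the original request.
    if USE="fma" ${EBUILD_CMD:-ebuild} "$ebuild_file" clean test > "$log" 2>&1; then
        echo "PASS: ebuild test phase succeeded (log: $log)"
    else
        echo "FAIL: ebuild test phase failed (log: $log)"
        return 1
    fi
}
```

Either way, the log file is what's worth attaching to the bug, along with
the fma-related flags from /proc/cpuinfo.
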
Ideally, once the ebuild test passes, you'd also manually cd into the
work dir, source the environment file to get the portage build
environment, and run emake bigtest in all three subdirs. The ebuild uses
a loop thru the subdirs to run smalltest; you can do the same for
bigtest, or cd into each and run the tests manually. That will take
rather longer: perhaps an hour or so for the single subdir, maybe two
hours for the double subdir, and the same or longer for long-double.
However, the tests don't make very efficient use of the CPU, so if you
have a quad-core or better, which is likely with Piledriver anyway, you
could probably run the tests for all three subdirs in parallel and still
have CPU left to run other things.

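Running the three bigtest runs side by side might look something like the
sketch below. It's plain shell, nothing taken from the ebuild itself:
run_parallel_in_dirs is a made-up helper name, the directories are passed
in as arguments because I'm not spelling out the actual subdir paths
here, and it assumes those paths contain no spaces or colons.

```shell
# Run one command in several directories at once, one log per directory.
# $1: command to run (word-split on purpose, e.g. "make bigtest")
# $2...: directories to run it in (no spaces or colons in the paths)
# Prints PASS/FAIL per directory; returns the number of failures.
run_parallel_in_dirs() {
    cmd=$1
    shift
    jobs_started=""
    for dir in "$@"; do
        # Subshell so the cd only affects this job; output goes to the log.
        ( cd "$dir" && $cmd ) > "$dir/bigtest.log" 2>&1 &
        jobs_started="$jobs_started $!:$dir"
    done
    failed=0
    for entry in $jobs_started; do
        pid=${entry%%:*}
        dir=${entry#*:}
        if wait "$pid"; then
            echo "PASS $dir"
        else
            echo "FAIL $dir (see $dir/bigtest.log)"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

Called as run_parallel_in_dirs "make bigtest" followed by the three
subdir paths (or with emake in place of make once the environment file is
sourced), it prints one PASS or FAIL line per subdir and leaves a
bigtest.log in each.
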
If it passes (e)make smalltest (in the ebuild test phase) and the manual
(e)make bigtest for all three subdirs, with USE=fma, on an fma3 system,
it should be safe to change the USE flag description to say it can be
used on either fma3 or fma4 hardware. If not, then since it does seem to
work on my fma4 hardware, perhaps the flag should be renamed to fma4.

So any help testing on fma3 hardware would definitely be appreciated.
Please report results on the bug. Anyone with fma4 hardware can
double-check my results as well, but it does seem to work here.

Thanks. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman