Re: [gentoo-user] USE="mmx mmxext sse sse2 ssse3 3dnow 3dnowext" - gentoo-user

From:	Volker Armin Hemmann <volkerarmin@××××××××××.com>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] USE="mmx mmxext sse sse2 ssse3 3dnow 3dnowext"
Date:	Fri, 29 May 2009 06:36:23
Message-Id:	`200905290836.15586.volkerarmin@googlemail.com`
In Reply to:	Re: [gentoo-user] USE="mmx mmxext sse sse2 ssse3 3dnow 3dnowext" by Graham Murray

On Friday 29 May 2009, Graham Murray wrote:
> Stroller <stroller@××××××××××××××××××.uk> writes:
> > But, surely "-march=" also instructs gcc to support the additional
> > instructions.  Suggest you re-read Daniel's post that I was replying
> > to.
> >
> > What's the difference between supporting the "certain set of
> > instructions" with "-march=" and doing so with USEs?
> >
> > Or doesn't "-march=" support additional "certain sets of
> > instructions". What does it do, then?
>
> I am not sure,
>
> $ gcc -Q --help=target -march=core2
> The following options are target specific:
>   -m128bit-long-double                  [disabled]
>   -m32                                  [enabled]
>   -m3dnow                               [disabled]
>   -m3dnowa                              [disabled]
>   -m64                                  [disabled]
>   -m80387                               [enabled]
>   -m96bit-long-double                   [enabled]
>   -mabm                                 [disabled]
>   -maccumulate-outgoing-args            [disabled]
>   -maes                                 [disabled]
>   -malign-double                        [disabled]
>   -malign-functions=
>   -malign-jumps=
>   -malign-loops=
>   -malign-stringops                     [enabled]
>   -march=                               core2
>   -masm=
>   -mavx                                 [disabled]
>   -mbranch-cost=
>   -mcld                                 [disabled]
>   -mcmodel=
>   -mcx16                                [disabled]
>   -mfancy-math-387                      [enabled]
>   -mfma                                 [disabled]
>   -mforce-drap                          [disabled]
>   -mfp-ret-in-387                       [enabled]
>   -mfpmath=
>   -mfused-madd                          [enabled]
>   -mglibc                               [enabled]
>   -mhard-float                          [enabled]
>   -mieee-fp                             [enabled]
>   -mincoming-stack-boundary=
>   -minline-all-stringops                [disabled]
>   -minline-stringops-dynamically        [disabled]
>   -mintel-syntax                        [disabled]
>   -mlarge-data-threshold=
>   -mmmx                                 [disabled]
>   -mms-bitfields                        [disabled]
>   -mno-align-stringops                  [disabled]
>   -mno-fancy-math-387                   [disabled]
>   -mno-fused-madd                       [disabled]
>   -mno-push-args                        [disabled]
>   -mno-red-zone                         [disabled]
>   -mno-sse4                             [enabled]
>   -momit-leaf-frame-pointer             [disabled]
>   -mpc
>   -mpclmul                              [disabled]
>   -mpopcnt                              [disabled]
>   -mpreferred-stack-boundary=
>   -mpush-args                           [enabled]
>   -mrecip                               [disabled]
>   -mred-zone                            [enabled]
>   -mregparm=
>   -mrtd                                 [disabled]
>   -msahf                                [disabled]
>   -msoft-float                          [disabled]
>   -msse                                 [disabled]
>   -msse2                                [disabled]
>   -msse2avx                             [disabled]
>   -msse3                                [disabled]
>   -msse4                                [disabled]
>   -msse4.1                              [disabled]
>   -msse4.2                              [disabled]
>   -msse4a                               [disabled]
>   -msse5                                [disabled]
>   -msseregparm                          [disabled]
>   -mssse3                               [disabled]
>   -mstack-arg-probe                     [disabled]
>   -mstackrealign                        [enabled]
>   -mstringop-strategy=
>   -mtls-dialect=
>   -mtls-direct-seg-refs                 [enabled]
>   -mtune=
>   -muclibc                              [disabled]
>   -mveclibabi=

and man gcc says:

 --help={class|[^]qualifier}[,...]
           Print (on the standard output) a description of the command line
           options understood by the compiler that fit into all specified
           classes and qualifiers.  These are the supported classes:

           target
               This will display target-specific options.  Unlike the
               --target-help option however, target-specific options of the
               linker and assembler will not be displayed.  This is because
               those tools do not currently support the extended --help=
               syntax.

Which leads to the conclusion that it only shows the options that can be
set, not how they are actually set (besides a few that are enabled by the
architecture). If you leave -march=core2 out, you will probably get the
same result.

For example:

 -mfpmath=unit
           Generate floating point arithmetics for selected unit unit.  The
           choices for unit are:

           387 Use the standard 387 floating point coprocessor present
               majority of chips and emulated otherwise.  Code compiled with
               this option will run almost everywhere.  The temporary results
               are computed in 80bit precision instead of precision specified
               by the type resulting in slightly different results compared
               to most of other chips.  See -ffloat-store for more detailed
               description.

               This is the default choice for i386 compiler.

           sse Use scalar floating point instructions present in the SSE
               instruction set.  This instruction set is supported by
               Pentium3 and newer chips, in the AMD line by Athlon-4,
               Athlon-xp and Athlon-mp chips.  The earlier version of SSE
               instruction set supports only single precision arithmetics,
               thus the double and extended precision arithmetics is still
               done using 387.  Later version, present only in Pentium4 and
               the future AMD x86-64 chips supports double precision
               arithmetics too.

               For the i386 compiler, you need to use -march=cpu-type, -msse
               or -msse2 switches to enable SSE extensions and make this
               option effective.  For the x86-64 compiler, these extensions
               are enabled by default.

               The resulting code should be considerably faster in the
               majority of cases and avoid the numerical instability problems
               of 387 code, but may break some existing code that expects
               temporaries to be 80bit.

               This is the default choice for the x86-64 compiler.

As you can see from the man excerpt, if the help showed enabled options,
-mfpmath=sse should be there. It isn't.
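
One way to cross-check this without relying on --help might be to ask the preprocessor which feature macros it predefines for a given set of flags. What follows is only a minimal sketch, assuming a gcc on x86 that accepts -dM -E on empty input. The function names filter_simd and simd_macros are made up for this example, and the macro list covers just the extensions named in the subject line.

```shell
#!/bin/sh
# Keep only the SIMD feature macros from a "#define NAME VALUE" dump on stdin.
# (filter_simd is a hypothetical helper name, not part of gcc.)
filter_simd() {
    grep -E '^#define __(MMX|SSE|SSE2|SSE3|SSSE3|3dNOW|3dNOW_A)__ ' | sort
}

# Dump the macros the compiler predefines for the given flags, then filter.
# -dM -E prints all predefined macros after preprocessing an empty input.
# (simd_macros is likewise a hypothetical name; CC defaults to gcc.)
simd_macros() {
    "${CC:-gcc}" "$@" -dM -E - </dev/null | filter_simd
}

# Usage: compare the compiler's default target with -march=core2.
#   simd_macros
#   simd_macros -march=core2
```

If -march= really switches the extensions on, the second call should list macros that the first one does not. If both lists are identical, the flag changed nothing that the preprocessor can see.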

...

