[gentoo-amd64] Re: Hyper-threading an AMD64 3800+ - gentoo-amd64

From:	Duncan <1i5t5.duncan@×××.net>
To:	gentoo-amd64@l.g.o
Subject:	[gentoo-amd64] Re: Hyper-threading an AMD64 3800+
Date:	Thu, 11 Jun 2009 09:31:06
Message-Id:	`pan.2009.06.11.09.30.52@cox.net`
In Reply to:	Re: [gentoo-amd64] Hyper-threading an AMD64 3800+ by Volker Armin Hemmann

Volker Armin Hemmann <volkerarmin@××××××××××.com> posted
200906110022.26698.volkerarmin@××××××××××.com, excerpted below, on Thu,
11 Jun 2009 00:22:26 +0200:

> On Thursday 11 June 2009, Greg wrote:
>> I've been having trouble determining if my processor has
>> hyper-threading. I'm thinking that it does. I know that it isn't a
>> dual-core.
>>
>> If it is a hyper-thread processor, I can't seem to figure out exactly
>> how to enable the hyper-thread under linux.
>
> no amd supports hyper-threading. They have that flag because they are
> compatible - and if they are multicore to 'trick' stupid software that
> checks for ht to multi thread but does not multithread on multicore
> cpus.

More to the point, AMD CPUs don't /need/ hyper-threading to run
efficiently.
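
For anyone wanting to check what that flag means on their own box, here's
a rough sketch in Python (my own illustration, not taken from any kernel
documentation).  It assumes an x86 /proc/cpuinfo carrying the usual
"flags", "siblings" and "cpu cores" lines, and treats "more siblings than
cores in the package" as the sign that SMT is really active.  The function
names are made up for the example.

```python
#!/usr/bin/env python3
"""Rough check: is SMT (hyper-threading) active, or is 'ht' just a flag?"""


def parse_cpuinfo(text):
    """Return the fields of the first processor entry in /proc/cpuinfo text."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break              # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def smt_status(text):
    """Compare logical siblings with physical cores in one package."""
    info = parse_cpuinfo(text)
    has_flag = "ht" in info.get("flags", "").split()
    siblings = int(info.get("siblings", 1))   # logical CPUs in the package
    cores = int(info.get("cpu cores", 1))     # physical cores in the package
    # More logical CPUs than physical cores means SMT is actually in use.
    return {"ht_flag": has_flag, "siblings": siblings,
            "cores": cores, "smt_active": siblings > cores}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(smt_status(f.read()))
    except OSError:
        print("no readable /proc/cpuinfo here (not Linux?)")
```

On a multi-core chip without hyper-threading I'd expect ht_flag to come
back True but smt_active False, which is the "compatible flag" situation
Volker describes.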

Here's the deal on hyper-threading.

It first became popular (and I believe was first introduced, but I may be
mistaken on that) with the Intel "NetBurst" architecture, back in the
last gasps of the clock-rate-is-everything era, when Intel was doing
everything they could to wring those last few hundred MHz out of their
CPUs, even at the expense of pipelines so deep that they actually hurt
performance in many cases.  (Plus it ran way hot, and sucked up power at
such a rate that people were doing projections indicating that, at the
rate things were going, in a few years each CPU was going to need its own
nuclear reactor power supply... and the cooling to go along with it!)

Happily, Intel has moved beyond that stage now.  With the Core 2s and
the move to true dual-core and beyond, they once again began competing
extremely favorably against AMD.  But NetBurst was the last gasp of the
old "ever higher clocks" approach, and it simply didn't compete well at
all.

One of the things Intel did with NetBurst to keep the clock rates rising
was create an incredibly deep instruction pipeline.  Once the pipeline
got full, the CPU still dispatched the typical one instruction per clock
tick.  (I say typical because some instructions take more than a tick,
while others can be processed two at a time, so the detail is
considerably more complex than one instruction, one tick, but the general
idea remains "typically" accurate.)  But each instruction took many ticks
to work thru the pipeline, so the penalty was horrible for a branch
mis-predict or other event that emptied the instruction pipeline: the
units at the end of the pipeline effectively had to sit there doing
nothing for dozens of clock ticks, waiting for the new instructions to
get processed to that point again, refilling the pipeline.  To some
degree they could compensate with better branch prediction, pre-caching,
and other techniques, but it wasn't nearly enough to make up for the
penalty they were paying when the prediction was wrong, due to the
incredibly deep pipelining.
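
To put rough numbers on that, here's a back-of-the-envelope sketch.  It
assumes an idealized pipeline that retires one instruction per tick and
loses a fixed number of ticks on every mispredicted branch.  The branch
fraction, mispredict rate and the two penalties are made-up but plausible
figures, not measurements of any real chip.

```python
def cycles_per_instruction(branch_frac, mispredict_rate, flush_penalty):
    """Average ticks per instruction for an ideal one-per-tick pipeline
    that loses flush_penalty ticks on every mispredicted branch."""
    return 1.0 + branch_frac * mispredict_rate * flush_penalty


# Illustrative assumptions only, not measured values.
BRANCH_FRAC = 0.20       # one instruction in five is a branch
MISPREDICT_RATE = 0.05   # the predictor is wrong 5% of the time

if __name__ == "__main__":
    for name, penalty in (("short pipeline", 12), ("deep pipeline", 30)):
        cpi = cycles_per_instruction(BRANCH_FRAC, MISPREDICT_RATE, penalty)
        lost = 1.0 - 1.0 / cpi   # share of ticks spent refilling
        print(f"{name}: penalty {penalty} ticks, "
              f"CPI {cpi:.2f}, {lost:.0%} of ticks lost")
```

With those assumptions the deep pipeline comes out around 1.30 ticks per
instruction against about 1.12 for the short one, so the same predictor
costs the deep design roughly twice as many idle ticks.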

So the Intel engineers came up with the solution the marketers billed as
"hyper-threading", in order to try to claw back some of the performance
they were losing due to all this.  Basically, they added a bit of very
fast local storage holding the state of a second thread, giving the CPU
access to it on a swapping basis.  When one thread ran into a
mis-prediction, thereby emptying the pipeline, instead of the components
at the end of the pipeline waiting idle for several dozen clocks for the
pipeline to refill, they swapped to the other thread (the "hyperthread")
and continued working on that.  Ideally, by the time that one got stuck,
the first one was ready to go again, so they could switch back to it
while they waited on the other.

Thus, what was really happening was that they were trying desperately to
compensate for their design choice of an overly deep pipeline (forced on
them by the pursuit of ever faster clock rates).  Hyper-threading was in
reality a very, very clever but not really adequate compensation for a
bad design choice, yet the marketers billed it as a feature, and they
were able to sell it surprisingly effectively.

Meanwhile, AMD saw the light and decided the MHz game simply wasn't going
to work for them.  They decided the loss of performance per clock they
were seeing by continuing to play the MHz game just wasn't worth it, and
deliberately did NOT continue targeting ever increasing clock rates,
choosing instead to emphasize their AMD64 instruction set and other
features.

As a result, AMD's chips didn't have to pay the price of the incredibly
deep pipeline Intel was using.  With their shorter pipeline, the penalty
for mis-prediction was much lower as well, so it didn't really make sense
to do the hyper-threading thing: with so few idle ticks to recover, there
was little for it to help with.

Thus, AMD never needed hyper-threading as compensation for a bad design
choice and never implemented it, and so never got to sell a very clever
but still poor workaround as a great feature, as Intel was doing at the
time.

So that's where all the hype over hyper-threading first started.
Eventually, tho, Intel realized the cost it was paying for pursuit of the
MHz God wasn't worth it, and they came out with the Core 2s, which REALLY
gave AMD a run for the money.  (Truth be told, the Core 2s were spanking
AMD's butt, performance-wise.  Added to that, AMD in its turn slipped up
with its original quad-core implementation in the Phenoms, handing Intel
the win for another few quarters.  The problem, of course, is that Intel
is a far larger company than AMD, so its fumbling for a couple of years
didn't hurt it nearly as much as AMD's fumbling for just a couple of
quarters hurt AMD!)

Soon enough the real multi-cores came out, and hyper-threading, as a
rather poor substitute, was somewhat forgotten.  However, Intel, having
sold it as this great feature, found it was still in demand, with people
wondering why their dual-cores couldn't use hyper-threading to appear as
four cores, just as the single-core NetBurst arch had appeared as
dual-cores.

So the Intel marketing folks put their heads together with the
engineering folks, and soon enough, hyper-threaded dual-cores were
available as well.  The new architecture didn't really gain that much
benefit from it, as Intel had long since worked thru their
way-too-long-pipeline issues.  With the exception of rare corner-cases,
hyper-threading was now mostly buying its performance at the direct
expense of the real cores, and there was no gain under most loads that
couldn't have been at least equally achieved by using the same transistor
budget elsewhere, say for more cache.  But once the market had been
programmed to accept hyper-threading as a solution, it demanded it, and
seeing those extra "fake" cores listed /did/ look impressive, so Intel
continued to provide what the market was now demanding, real performance
gain or not.

That's where we are today.  On a modern CPU, hyper-threading provides
very little real performance gain, and possibly a net loss if one
considers what else that same transistor budget could have been used for.
But the market, once programmed for it, continues to demand it, so Intel
continues to provide it.

The (main) source for much of my understanding at the level explained
above is Ars Technica's CPU writeups over the years, with additional
articles as found on Tom's Hardware, Slashdot, and elsewhere.  Of course,
when Ars does it, it's complete with unit and instruction flow diagrams,
etc, plus much more detail than I gave above.  Anybody who's interested
in this sort of thing really should follow Ars, as they have a guy who's
really an expert in it following the industry for them, doing writeups on
new developments generally some time after the initial announcement, but
before or immediately after the initial full public release.  I've been
following the articles there since the Pentium Pro era and the
reliability level is very high.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
