Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Hyper-threading an AMD64 3800+
Date: Thu, 11 Jun 2009 09:31:06
Message-Id: pan.2009.06.11.09.30.52@cox.net
In Reply to: Re: [gentoo-amd64] Hyper-threading an AMD64 3800+ by Volker Armin Hemmann
1 Volker Armin Hemmann <volkerarmin@××××××××××.com> posted
2 200906110022.26698.volkerarmin@××××××××××.com, excerpted below, on Thu,
3 11 Jun 2009 00:22:26 +0200:
4
5 > On Donnerstag 11 Juni 2009, Greg wrote:
6 >> I've been having trouble determining if my processor has
7 >> hyper-threading. I'm thinking that it does. I know that it isn't a
8 >> dual-core.
9 >>
10 >> If it is a hyper-thread processor, I can't seem to figure out exactly
11 >> how to enable the hyper-thread under linux.
12 >
13 > no amd supports hyper-threading. They have that flag because they are
14 > compatible - and if they are multicore to 'trick' stupid software that
15 > checks for ht to multi thread but does not multithread on multicore
16 > cpus.
17
18 More to the point, AMD CPUs don't /need/ hyper-threading to run
19 efficiently.
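
On the original question of telling whether hyper-threading is actually present: the "ht" flag alone doesn't settle it, as Volker says. A minimal sketch of one way to check on Linux follows, assuming the kernel exposes the usual x86 "flags", "siblings" and "cpu cores" fields in /proc/cpuinfo. The function name smt_status and the SAMPLE text are made up for illustration; SAMPLE is not real output from any machine.

```python
def smt_status(cpuinfo_text):
    """Summarise the first processor block of /proc/cpuinfo-style text."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break          # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    flags = fields.get("flags", "").split()
    siblings = int(fields.get("siblings", "1"))   # logical CPUs per package
    cores = int(fields.get("cpu cores", "1"))     # physical cores per package
    return {
        "ht_flag": "ht" in flags,
        "siblings": siblings,
        "cpu_cores": cores,
        # More logical CPUs than physical cores means threads share a core.
        "smt_active": siblings > cores,
    }


# Hypothetical sample text, shaped like one /proc/cpuinfo block.
SAMPLE = (
    "processor\t: 0\n"
    "vendor_id\t: AuthenticAMD\n"
    "flags\t\t: fpu vme de pse tsc msr pae mce ht syscall nx lm\n"
    "siblings\t: 1\n"
    "cpu cores\t: 1\n"
)

print(smt_status(SAMPLE))
# On a real Linux box: print(smt_status(open("/proc/cpuinfo").read()))
```

With the sample above, the flag is reported but smt_active comes out false, which is the situation described in the quoted reply: the flag is there, the feature is not.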
20
21 Here's the deal on hyper-threading.
22
23 It first became popular (and I believe was first introduced, but I may be
24 mistaken on that) with the Intel "Netburst" architecture, back in the
25 last gasps of the clock-rate-is-everything era when Intel was doing
26 everything they could to wring those last few hundred MHz out of their
27 CPUs, even at the expense of such deep pipelines that it actually hurt
28 performance in many cases. (Plus it ran way hot, and sucked up power at
29 such a rate that people were doing projections indicating that at the
30 rate things were going, in a few years each CPU was going to need its own
31 Nuclear reactor power supply... and the cooling to go along with it!)
32
33 Happily, Intel has moved beyond that stage now. With the Core 2s and
34 the move to true dual-core and beyond, they once again began competing
35 extremely favorably against AMD, but Netburst was the last gasp of the
36 old "ever higher clocks" approach, and it simply didn't compete well at
37 all.
38
39 One of the things Intel did with netburst to keep the clock rates rising
40 was create an incredibly deep instruction pipeline. Once the pipeline
41 got full, the CPU still dispatched the typical instruction per clock tick
42 (I say typical because some instructions take more than a tick, while
43 others can be processed two at a time, so the detail is considerably more
44 complex than one instruction one tick, but the general idea remains
45 "typically" accurate). But each instruction took many ticks to work thru
46 the pipeline, so the penalty was horrible for a branch mis-predict or
47 other event that emptied the instruction pipeline: the units at the
48 end of the pipeline effectively had to sit there doing nothing for dozens
49 of clock ticks, waiting for the new instructions to get processed to that
50 point again, filling the pipeline. To some degree they could compensate
51 by using better branch prediction, pre-caching, and other techniques, but
52 it really wasn't nearly enough to fully compensate for the penalty they
53 were paying when the prediction was wrong, due to the incredibly deep
54 pipelining.
55
56 So the Intel engineers came up with the solution the marketers billed as
57 "hyper-threading" in order to try to claw back some of the performance
58 they were losing due to all this. Basically, they added a bit of very
59 fast local storage to hold a second thread's state, ready to swap in.
60 When one thread ran into a mis-prediction, thereby emptying the pipeline,
61 instead of the components at the end of the pipeline waiting idle for
62 several dozen clocks for the pipeline to refill, they swapped to the
63 hyperthread and continued working on it. Ideally, by the time it got
64 stuck, the first one was ready to go again, so they could switch back to
65 it, while they waited on the other one now.
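
The swapping idea in the paragraph above can be tried out with a toy model. This is only a sketch of the switch-on-stall picture as described here; real hyper-threading hardware can interleave instructions from both threads within the same cycle, which this does not attempt to model. The function name busy_fraction and all parameter values are assumptions for illustration.

```python
def busy_fraction(n_threads, run_length, refill, total_cycles=10_000):
    """Fraction of cycles in which the execution units do useful work.

    Each thread issues one instruction per cycle for `run_length` cycles,
    then mis-predicts and needs `refill` cycles before it can issue again.
    Only one thread issues per cycle; lower-numbered threads have priority.
    """
    ready_at = [0] * n_threads        # cycle at which each thread may issue
    left = [run_length] * n_threads   # instructions left before a mis-predict
    busy = 0
    for cycle in range(total_cycles):
        for t in range(n_threads):
            if ready_at[t] <= cycle:
                busy += 1
                left[t] -= 1
                if left[t] == 0:      # mis-predict: pipeline must refill
                    ready_at[t] = cycle + 1 + refill
                    left[t] = run_length
                break                 # this cycle is used up
    return busy / total_cycles


# Illustrative numbers: 20 useful cycles, then a 20-cycle refill.
print("one thread :", busy_fraction(1, run_length=20, refill=20))
print("two threads:", busy_fraction(2, run_length=20, refill=20))
```

In this idealised case a single thread keeps the units busy half the time, while two threads cover each other's refills completely. With a short refill the single-thread figure is already close to full, so a second thread has little idle time left to fill.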
66
67 Thus, what was really happening was that they were trying desperately to
68 compensate for their design choice of an overly deep pipeline (forced on
69 them by the pursuit of ever faster clock rates), and the marketers billed
70 hyper-threading, in reality a very very clever but not really adequate
71 compensation for a bad design choice, as a feature they were able to sell
72 surprisingly effectively.
73
74 Meanwhile, AMD saw the light and decided the MHz game simply wasn't going
75 to work for them. They decided the loss of performance per clock they
76 were seeing continuing to play the MHz game just wasn't worth it, and
77 deliberately did NOT continue targeting the ever increasing clock rates,
78 instead choosing to emphasize their AMD64 instruction set and other
79 features.
80
81 As a result, AMD's chips didn't have to pay the price of the incredibly
82 deep pipeline Intel was using. With their shorter pipeline, the penalty
83 for mis-prediction was much lower as well, so hyper-threading didn't
84 really make sense for them: with so little idle time to reclaim, it
85 wouldn't have helped much.
86
87 Thus, AMD never needed hyper-threading as compensation for a bad design
88 choice and never implemented it, thus never getting to sell a very
89 clever but still poor workaround for a poor design choice as a great
90 feature, as Intel was doing at the time.
91
92 So that's where all the hype over hyper-threading first started.
93 Eventually, tho, Intel realized the cost it was paying for pursuit of the
94 MHz God wasn't worth it, and they came out with the Core-2s, which REALLY
95 gave AMD a run for the money. (Truth be told, the core-2s were spanking
96 AMD's butt, performance-wise. Added to that, AMD in its turn slipped up
97 with its original quad-core implementation in the phenoms, handing Intel
98 the win for another few quarters. The problem of course being that Intel
99 is a far larger company than AMD, so its fumbling as it did for a couple
100 years didn't hurt it near as much as AMD's fumbling for just a couple
101 quarters!)
102
103 Soon enough the real multi-cores came out, and hyper-threading as a
104 rather poor substitute was somewhat forgotten. However, Intel, having
105 sold it as this great feature, found it was still in demand, with people
106 wondering why their dual-cores couldn't use hyper-threading to appear as
107 four cores, just as the single-core netburst arch had appeared as dual-
108 cores.
109
110 So the Intel marketing folks stuck their heads together with the
111 engineering folks, and soon enough, hyper-threaded dual-cores were
112 available as well. The new architecture didn't really gain that much
113 benefit from it, as Intel had long since worked thru their way-too-long-
114 pipeline issues. With the exception of rare corner-cases, hyper-
115 threading was now mostly buying performance directly from the real cores,
116 and there was no gain under most loads that couldn't have been at least
117 equally achieved by using the same transistor budget elsewhere, say for
118 more cache. But once the market had been programmed to accept hyper-
119 threading as a solution, it demanded it, and seeing those extra "fake"
120 cores listed /did/ look impressive, so Intel continued to provide what
121 the market was now demanding, real performance gain or not.
122
123 That's where we are today. On a modern CPU, hyper-threading provides
124 very little real performance gain, one that actually may be a loss if one
125 considers what else that same transistor budget could have otherwise been
126 used for, but the market, once programmed for it, now continues to demand
127 it, so Intel continues to provide it.
128
129 The (main) source for much of my understanding at the level explained
130 above is Ars Technica's CPU writeups over the years, with additional
131 articles as found on Tom's Hardware, Slashdot, and elsewhere. Of course,
132 when Ars does it, it's complete with unit and instruction flow diagrams,
133 etc., plus much more detail than I gave above. Anybody that's interested
134 in this sort of thing really should follow Ars, as they have a guy that's
135 really an expert in it following the industry for them, doing writeups on
136 new developments generally some time after initial announcement, but
137 before or immediately after initial full public release. I've been
138 following the articles there since the Pentium Pro era and the
139 reliability level is very high.
140
141 --
142 Duncan - List replies preferred. No HTML msgs.
143 "Every nonfree program has a lord, a master --
144 and if you use the program, he is your master." Richard Stallman
