Volker Armin Hemmann <volkerarmin@××××××××××.com> posted
200906110022.26698.volkerarmin@××××××××××.com, excerpted below, on Thu,
11 Jun 2009 00:22:26 +0200:

> On Thursday 11 June 2009, Greg wrote:
>> I've been having trouble determining if my processor has
>> hyper-threading. I'm thinking that it does. I know that it isn't a
>> dual-core.
>>
>> If it is a hyper-thread processor, I can't seem to figure out exactly
>> how to enable the hyper-thread under linux.
>
> no amd supports hyper-threading. They have that flag because they are
> compatible - and if they are multicore to 'trick' stupid software that
> checks for ht to multi thread but does not multithread on multicore
> cpus.

More to the point, AMD CPUs don't /need/ hyper-threading to run
efficiently.
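
On the original question of checking: as the quote says, the ht flag by
itself proves little. One rough way to look past it, assuming an x86
Linux kernel whose /proc/cpuinfo reports the "siblings" and "cpu cores"
fields, is to compare the two. If a package lists more siblings (logical
CPUs) than cores, hyper-threading is present and enabled. The helper name
smt_active is made up for illustration, and this is only a sketch, not a
definitive test:

```python
def smt_active(cpuinfo_text):
    """Return True if the first processor block reports more logical
    siblings than physical cores (assumes x86-style /proc/cpuinfo fields)."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # Missing fields are treated as 1, so the answer defaults to False.
    siblings = int(fields.get("siblings", "1"))
    cores = int(fields.get("cpu cores", "1"))
    return siblings > cores


# Typical use on a live system (not run here):
#   with open("/proc/cpuinfo") as f:
#       print("hyper-threading active:", smt_active(f.read()))
```

A multi-core CPU without hyper-threading shows siblings equal to cpu
cores even though it carries the ht flag, which is exactly the case
described in the quote.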

Here's the deal on hyper-threading.

It first became popular (and I believe was first introduced, but I may be
mistaken on that) with the Intel "Netburst" architecture, back in the
last gasps of the clock-rate-is-everything era when Intel was doing
everything they could to wring those last few hundred MHz out of their
CPUs, even at the expense of such deep pipelines that it actually hurt
performance in many cases. (Plus it ran way hot, and sucked up power at
such a rate that people were doing projections indicating that at the
rate things were going, in a few years each CPU was going to need its own
nuclear reactor power supply... and the cooling to go along with it!)

Happily, Intel has moved beyond that stage now. With the Core-2s and
their successors, and the move to true dual-core and beyond, they once
again began competing extremely favorably against AMD. But Netburst was
the last gasp of the old "ever higher clocks" approach, and it simply
didn't compete well at all.

One of the things Intel did with Netburst to keep the clock rates rising
was create an incredibly deep instruction pipeline. Once the pipeline
got full, the CPU still dispatched the typical instruction per clock tick
(I say typical because some instructions take more than a tick, while
others can be processed two at a time, so the detail is considerably more
complex than one instruction one tick, but the general idea remains
"typically" accurate). But each instruction took many ticks to work thru
the pipeline, so the penalty was horrible for a branch mis-predict or
other event that emptied the instruction pipeline: the units at the end
of the pipeline effectively had to sit there doing nothing for dozens of
clock ticks, waiting for the new instructions to get processed to that
point again, filling the pipeline. To some degree they could compensate
by using better branch prediction, pre-caching, and other techniques, but
it really wasn't nearly enough to fully compensate for the penalty they
were paying when the prediction was wrong, due to the incredibly deep
pipelining.
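
To put rough numbers on that, here's a back-of-the-envelope sketch. The
usual simplification is: average cycles per instruction = base cost +
(fraction of instructions that are branches) x (mis-predict rate) x
(ticks lost per pipeline flush). Every figure below is an assumption
picked for illustration, not a measurement of any real CPU:

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once pipeline flushes are counted."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Illustrative assumptions only:
BRANCH_FRACTION = 0.20   # one instruction in five is a branch
MISPREDICT_RATE = 0.05   # one branch in twenty is predicted wrong
SHALLOW_PENALTY = 10     # ticks lost per flush on a shorter pipeline
DEEP_PENALTY = 30        # ticks lost per flush on a very deep pipeline

shallow = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, SHALLOW_PENALTY)
deep = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, DEEP_PENALTY)
print(f"shallow pipeline: {shallow:.2f} cycles per instruction")
print(f"deep pipeline:    {deep:.2f} cycles per instruction")
```

With these made-up inputs the deep pipeline needs about 1.3 cycles per
instruction against about 1.1 for the shallow one, so it has to clock
roughly 18% faster just to break even.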

So the Intel engineers came up with the solution the marketers billed as
"hyper-threading" in order to try to claw back some of the performance
they were losing due to all this. Basically, they added a bit of very
fast local storage to hold a second thread's state, giving the CPU access
to it on a swapping basis. When one thread ran into a mis-prediction,
thereby emptying the pipeline, instead of the components at the end of
the pipeline waiting idle for several dozen clocks for the pipeline to
refill, they swapped to the hyperthread and continued working on it.
Ideally, by the time it got stuck, the first one was ready to go again,
so they could switch back to it, while they waited on the other one now.
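
The swapping idea can be played with in a toy model. This sketch follows
the simplified swap-on-stall picture given above (real hardware
interleaves the two threads at a much finer grain), and the function
name, burst length, and stall length are all assumptions made up for
illustration:

```python
def busy_fraction(num_threads, burst, stall, cycles):
    """Fraction of cycles in which the execution units do useful work.

    Each thread runs `burst` useful cycles, then mis-predicts and cannot
    run again until its pipeline has refilled for `stall` cycles.
    """
    ready_at = [0] * num_threads        # cycle when each thread can run again
    work_left = [burst] * num_threads   # useful cycles before the next flush
    current = 0
    busy = 0
    for now in range(cycles):
        if ready_at[current] > now:
            # Current thread is refilling: swap to any thread that is ready.
            ready = [t for t in range(num_threads) if ready_at[t] <= now]
            if not ready:
                continue  # every thread is refilling, so the units sit idle
            current = ready[0]
        busy += 1
        work_left[current] -= 1
        if work_left[current] == 0:
            # Mis-predict: flush, and this thread waits out the refill.
            ready_at[current] = now + 1 + stall
            work_left[current] = burst
    return busy / cycles


# Illustrative numbers only: 20 useful ticks between 30-tick refills.
one_thread = busy_fraction(1, burst=20, stall=30, cycles=5000)
two_threads = busy_fraction(2, burst=20, stall=30, cycles=5000)
print(f"one thread:  units busy {one_thread:.0%} of the time")
print(f"two threads: units busy {two_threads:.0%} of the time")
```

With these inputs the units are busy 40% of the time with one thread and
80% with two. Shrink the stall, as a shorter pipeline does, and the
single-thread figure climbs on its own, leaving the second thread less
idle time to fill.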

Thus, what was really happening was that they were trying desperately to
compensate for their design choice of an overly deep pipeline (forced on
them by the pursuit of ever faster clock rates), and the marketers billed
hyper-threading, in reality a very very clever but not really adequate
compensation for a bad design choice, as a feature they were able to sell
surprisingly effectively.

Meanwhile, AMD saw the light and decided the MHz game simply wasn't going
to work for them. They decided the loss of performance per clock they
were seeing by continuing to play the MHz game just wasn't worth it, and
deliberately did NOT continue targeting ever increasing clock rates,
choosing instead to emphasize their AMD64 instruction set and other
features.

As a result, AMD's chips didn't have to pay the price of the incredibly
deep pipeline Intel was using. With their shorter pipeline, the penalty
for mis-prediction was much lower as well, so it didn't really make sense
to do the hyper-threading thing: with so little idle time to recover,
there wasn't much for it to help with.

Thus, AMD never needed hyper-threading as compensation for an overly deep
pipeline and never implemented it, thus never getting to sell the very
clever but still poor workaround for a poor design choice as a great
feature, as Intel was doing at the time.

So that's where all the hype over hyper-threading first started.
Eventually, tho, Intel realized the cost it was paying for pursuit of the
MHz God wasn't worth it, and they came out with the Core-2s, which REALLY
gave AMD a run for the money. (Truth be told, the Core-2s were spanking
AMD's butt, performance-wise. Added to that, AMD in its turn slipped up
with its original quad-core implementation in the Phenoms, handing Intel
the win for another few quarters. The problem of course being that Intel
is a far larger company than AMD, so its fumbling for a couple of years
didn't hurt it nearly as much as AMD's fumbling for just a couple of
quarters!)

Soon enough the real multi-cores came out, and hyper-threading as a
rather poor substitute was somewhat forgotten. However, Intel, having
sold it as this great feature, found it was still in demand, with people
wondering why their dual-cores couldn't use hyper-threading to appear as
four cores, just as the single-core netburst arch had appeared as
dual-cores.

So the Intel marketing folks put their heads together with the
engineering folks, and soon enough, hyper-threaded dual-cores were
available as well. The new architecture didn't really gain that much
benefit from it, as Intel had long since worked thru their
way-too-long-pipeline issues. With the exception of rare corner-cases,
hyper-threading was now mostly buying performance directly from the real
cores, and there was no gain under most loads that couldn't have been at
least equally achieved by using the same transistor budget elsewhere, say
for more cache. But once the market had been programmed to accept
hyper-threading as a solution, it demanded it, and seeing those extra
"fake" cores listed /did/ look impressive, so Intel continued to provide
what the market was now demanding, real performance gain or not.

That's where we are today. On a modern CPU, hyper-threading provides
very little real performance gain, one that actually may be a loss if one
considers what else that same transistor budget could have otherwise been
used for, but the market, once programmed for it, now continues to demand
it, so Intel continues to provide it.

The (main) source for much of my understanding at the level explained
above is Ars Technica's CPU writeups over the years, with additional
articles as found on Tom's Hardware, Slashdot, and elsewhere. Of course,
when Ars does it, it's complete with unit and instruction flow diagrams,
etc, plus much more detail than I gave above. Anybody that's interested
in this sort of thing really should follow Ars, as they have a guy that's
really an expert in it following the industry for them, doing writeups on
new developments generally some time after initial announcement, but
before or immediately after initial full public release. I've been
following the articles there since the Pentium Pro era and the
reliability level is very high.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman