On Friday 22 June 2007 16:02:59 Duncan wrote:

> Was it you who posted about this before, or someone else? If it wasn't
> you, take a look back thru the list a couple months, as it did come up
> previously. You may have someone to compare notes with. =8^)

Nope, t'was I, and I've now tried lots more things but with no success.

> Separate processes or separate threads? Two CPUs (um, two separate
> sockets) or two cores on the same CPU/socket?

Twin sockets, single Opteron 246 cores. And the BOINC scheduler sets off
separate processes; they have distinct PIDs and on the old motherboard ran
on distinct CPUs.
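
In case it helps anyone checking the same thing: the "processor" field of
/proc/<pid>/stat (field 39 as listed in proc(5)) records which CPU a task
last ran on, so a throwaway helper can watch where each BOINC process is
landing. A rough sketch, nothing more:

/* lastcpu.c - print the CPU a process last ran on, taken from field 39
 * ("processor") of /proc/<pid>/stat.  Build: gcc -o lastcpu lastcpu.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], buf[4096], *p;
    int field = 2, cpu = -1;      /* the ')' closing comm ends field 2 */
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof path, "/proc/%s/stat", argv[1]);
    f = fopen(path, "r");
    if (!f || !fgets(buf, sizeof buf, f)) {
        perror(path);
        return 1;
    }
    fclose(f);
    /* comm may contain spaces, so start counting after the last ')' */
    for (p = strtok(strrchr(buf, ')') + 1, " "); p; p = strtok(NULL, " "))
        if (++field == 39) {
            cpu = atoi(p);
            break;
        }
    printf("pid %s last ran on CPU %d\n", argv[1], cpu);
    return 0;
}

Run it a few times against each BOINC PID to see whether they really do keep
landing on the same CPU.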

> There are [...] differences in architecture between AMD (with its onboard
> memory controller and closer cooperation, both between cores and between
> CPUs on separate sockets, due to the direct Hypertransport links) and
> Intel (with its off-chip controller and looser inter-core, inter-chip,
> and inter-socket cooperation). There's also differences in the way you
> can configure both the memory (thru the BIOS) and the kernel, for separate
> NUMA access or unified view memory. If these settings don't match your
> actual physical layout, efficiency will be less than peak, either because
> there won't be enough resistance to a relatively high cost of switching
> between CPUs/cores and memory, so they'll switch too frequently
> with little reason, incurring expensive delays each time, or because
> there's too much resistance and too much favor placed on what the kernel
> thinks is local vs remote memory, when it's all the same, and there is in
> fact very little cost to switching cores/CPUs.

This board has eight DIMM sockets, four arranged next to each CPU socket and
associated with it electrically and logically. I have four 1GB DIMMs, each
in (I hope) the right pair of sockets in each bank of four. In other words,
each CPU has 2GB of local RAM. I suppose I could buy as much RAM again and
fill up all the sockets :-)
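
Assuming the node directories show up under sysfs (they should, with NUMA
compiled in), one way to see whether the kernel really credits each node
with its own 2GB is to read /sys/devices/system/node/nodeN/meminfo. A small
sketch of that check, nothing authoritative:

/* nodemem.c - print each NUMA node's MemTotal line from
 * /sys/devices/system/node/nodeN/meminfo.  With 2GB beside each socket,
 * node0 and node1 should each report roughly 2GB; one big node would
 * suggest node interleaving or failed detection.
 * Build: gcc -o nodemem nodemem.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char path[128], line[256];
    int node;

    for (node = 0; ; node++) {
        FILE *f;
        snprintf(path, sizeof path,
                 "/sys/devices/system/node/node%d/meminfo", node);
        f = fopen(path, "r");
        if (!f)
            break;                        /* no more nodes */
        while (fgets(line, sizeof line, f))
            if (strstr(line, "MemTotal"))
                fputs(line, stdout);      /* "Node N MemTotal: ... kB" */
        fclose(f);
    }
    if (node == 0)
        puts("no nodes under /sys/devices/system/node - is NUMA off?");
    return 0;
}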

> If it's AMD with its onboard memory controllers, two sockets means two
> controllers, and you'll also want to consider NUMA, tho you can disable it
> and interleave your memory if you wish, for a unified memory view and
> higher bandwidth, but at the tradeoff of higher latency and less efficient
> memory access when separate tasks (each running on a CPU) both want to use
> memory at the same time.

NUMA is switched on in BIOS and kernel config. I still find some of the
BIOS settings mysterious though, so perhaps I don't have it set up right. I
have tried the two sets of defaults, failsafe and optimised, but with no
effect on this problem.

> In particular, you'll want to pay attention to the following kernel config
> settings under Processor type and features:
>
> 1) Symmetric multiprocessing support (CONFIG_SMP). You probably have
> this set right or you'd not be using multiple CPUs/cores.

Yep.

> 2) Under SMP, /possibly/ SMT (CONFIG_SCHED_SMT), tho for Intel only, and
> on the older Hyperthreading Netburst arch models.

Nope.

> 3) Still under SMP, Multi-core scheduler support (CONFIG_SCHED_MC), if
> you have true dual cores. Again, note that the first "dual core" Intel
> units were simply two separate CPUs in the same package, so you probably
> do NOT want this for them.

Nope.

> 4) Non Uniform Memory Access (NUMA) Support (CONFIG_NUMA) [...] You
> probably DO want this on AMD multi-socket Opteron systems, BUT note that
> there may be BIOS settings for this as well. It won't work so efficiently
> if the BIOS setting doesn't agree with the kernel setting.

Yep.

> 5) If you have NUMA support enabled, you'll also want either Old style
> AMD Opteron NUMA detection (CONFIG_K8_NUMA) or (preferred) ACPI NUMA
> detection (CONFIG_X86_64_ACPI_NUMA).

Tried both together and each separately. No difference.

> 6) Make sure you do *NOT* have NUMA emulation (CONFIG_NUMA_EMU) enabled.

No point in emulating something that's present. :-)
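
For what it's worth, the numactl package ships libnuma, which makes it easy
to ask the running kernel whether its NUMA detection (whichever of those two
options ends up doing the work) actually found both nodes. A minimal check,
assuming nothing beyond numa_available() and numa_max_node():

/* numachk.c - report whether NUMA is active and the highest node number
 * the running kernel detected.  Build: gcc -o numachk numachk.c -lnuma */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() == -1) {
        puts("kernel reports no usable NUMA support");
        return 1;
    }
    /* Two Opterons with working detection should give "highest node: 1",
       i.e. nodes 0 and 1. */
    printf("NUMA is available; highest node: %d\n", numa_max_node());
    return 0;
}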

> What I'm wondering, of course, is whether you have NUMA turned on when
> you shouldn't, or don't have core scheduling turned on when you should,
> thus artificially increasing the resistance to switching cores/cpus and
> causing the stickiness.

I don't think so.
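
One related thing that seems worth ruling out: that something has already
handed those processes a restricted CPU mask, which would cause exactly this
sort of stickiness. sched_getaffinity(2) shows the mask a PID is actually
allowed to run on; a rough sketch:

/* showmask.c - print which CPUs a process is allowed to run on, via
 * sched_getaffinity(2).  If a BOINC process only lists CPU 0 here,
 * something has pinned it and the scheduler isn't to blame.
 * Build: gcc -o showmask showmask.c */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    cpu_set_t set;
    int cpu;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    if (sched_getaffinity((pid_t)atoi(argv[1]), sizeof set, &set) == -1) {
        perror("sched_getaffinity");
        return 1;
    }
    printf("pid %s may run on CPUs:", argv[1]);
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            printf(" %d", cpu);
    putchar('\n');
    return 0;
}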

> If [...] you were correct when you said BOINC starts two
> separate /processes/ (not threads),

I'm sure I was correct.

> or if BOINC happens to use the older/heavier Linux threads model (which
> again will cause the threads to show up as separate processes),

I can't be quite certain this isn't happening, but I'm nearly so.

> There are two scheduling utility packages that include utilities to tie
> processes to one or more specific processors.
>
> sys-process/schedutils is what I have installed. It's a collection of
> separate utilities, including taskset, by which I can tell the kernel
> which CPUs I want specific processes to run on.

> If you prefer a single do-it-all scheduler-tool, perhaps easier to learn
> if you plan to fiddle with more than simply which CPU a process runs on,
> and want to learn it all at once, sys-process/schedtool may be more your
> style.

I'll look into those - thanks.
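
From the man pages, taskset looks to be essentially a thin front end to
sched_setaffinity(2), so even before installing either package I could pin a
PID to one CPU with something like the sketch below (untested, details taken
from the man page; the file name is just mine):

/* pinone.c - rough equivalent of "taskset -p -c <cpu> <pid>": restrict an
 * existing process to a single CPU with sched_setaffinity(2).
 * Build: gcc -o pinone pinone.c */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    cpu_set_t set;
    pid_t pid;
    int cpu;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <cpu> <pid>\n", argv[0]);
        return 1;
    }
    cpu = atoi(argv[1]);
    pid = (pid_t)atoi(argv[2]);

    CPU_ZERO(&set);              /* start with an empty mask     */
    CPU_SET(cpu, &set);          /* allow only the requested CPU */

    if (sched_setaffinity(pid, sizeof set, &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pid %d now restricted to CPU %d\n", (int)pid, cpu);
    return 0;
}

Running it once per BOINC PID, with a different CPU number each time, would
at least force the split onto both processors while the real cause gets
hunted down.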

--
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
--
gentoo-amd64@g.o mailing list