On Sun, Nov 27, 2011 at 4:27 AM, Mick <michaelkintzios@×××××.com> wrote:
> On Saturday 26 Nov 2011 15:22:15 Michael Mol wrote:
>> I just wanted to share an experience I had today with optimizing parallel
>> builds after discovering "-l" for Make...
>>
>> I've got a little more tweaking I still want to do, but this is pretty
>> awesome...
>>
>> http://funnybutnot.wordpress.com/2011/11/26/optimizing-parallel-builds/
>>
>> ZZ
>
> Thanks for sharing! How do you determine the optimum value for -l?

I'm making an educated guess. >.>

I figure that the optimal number of simultaneous CPU-consuming
processes is going to be the number of CPU cores, plus enough extra to
keep the CPU occupied while others are blocked on I/O. That's the same
reasoning that drives the selection of a -j number, really.

If I read make's man page correctly, -l acts as a threshold: make
chooses not to spawn an additional child process if the system load
average is above the given value. Since system load is a count of
actively running and ready-to-run processes, you want it to be very
close to your number of logical cores.[1]

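A quick way to see the two numbers -l effectively compares (a sketch,
assuming Linux's /proc/loadavg and coreutils' nproc; as I understand
it, make samples the 1-minute load figure):

```shell
# Core count and current 1-minute load average -- the numbers the
# -l threshold is weighed against.
cores=$(nproc)
load1=$(cut -d' ' -f1 /proc/loadavg)
echo "cores=$cores, 1-min load=$load1"
# e.g. make -j16 -l"$cores" holds off on new jobs while load > cores
```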
Since it's going to be a spot decision for Make as to whether or not
to spawn another child (if it hits its limit, it's not going to check
again until after one of its children returns), there will be many
race cases where the load average is high when it looks, but some
other processes return shortly afterward.[2] That means adding a
process or two as a fudge factor.

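That rule of thumb could be scripted per-machine instead of hardcoded
(a sketch, not from the original post; the 2x oversubscription and +2
fudge factor are my guesses, to be tuned by guess-and-check):

```shell
# Derive MAKEOPTS from the core count: -j oversubscribed so jobs
# blocked on I/O don't leave the CPU idle, -l set to cores plus a
# small fudge factor to absorb the race described above.
cores=$(nproc)
jobs=$(( cores * 2 ))
load=$(( cores + 2 ))
MAKEOPTS="-j${jobs} -l${load}"
echo "MAKEOPTS=${MAKEOPTS}"
```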
That's a lot of guesswork, though, and it still comes down to
guess-and-check.

emerge -j8 @world # MAKEOPTS="-j16 -l10"

was the first combination I tried. This completed in 89 minutes.

emerge -j8 @world # MAKEOPTS="-j16 -l8"

was the second. This took significantly longer.

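For what it's worth, on Gentoo those settings would normally live in
/etc/portage/make.conf rather than be typed each time; something like
this (a sketch using the faster combination above):

```shell
# /etc/portage/make.conf (fragment)
MAKEOPTS="-j16 -l10"                              # passed to make
EMERGE_DEFAULT_OPTS="--jobs=8 --load-average=10"  # parallel emerge
```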
I haven't tried higher than -l10; I needed this box to be able to do
things, which meant installing more software. I've gone from 177
packages to 466.

[1] I don't have a hyperthreading system available, but I suspect that
this is also going to be true of logical cores. It's my understanding
that the overhead from overcommitting CPU comes primarily from context
switching between processes, and hyperthreading adds CPU hardware
specifically to reduce the need to context-switch when splitting
physical CPU resources between threads/processes. So while you'd lose
a little speed for an individual thread, you would gain it back in
aggregate over both threads.

[2] There would also be cases where the load average is low, such as
when a Make recipe calls for a significant bit of I/O before it
consumes a great deal of CPU, but a simple 7200rpm SATA disk appears
to be sufficiently fast that this case is less frequent.
--
:wq