On 9/12/2012 5:58 AM, Ian Stakenvicius wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On 12/09/12 05:55 AM, Gregory M. Turner wrote:
>>
>> Note that, effectively, we have this already, and it's called
>> "portage". But one could certainly make a case for modularizing it
>> better, since, in truth, we are talking about a very common, very
>> abstract problem here which portage shares with any number of
>> batch-build systems.
>>
>> Such an engine could very well do exactly the right thing if it
>> were faced with a constraint that a certain part of a certain build
>> needed to proceed without parallelism due to limitations coming
>> from the build.
>>
>> Also, there are very large parts of most builds -- configure comes
>> to mind -- that don't parallelize even if, perhaps, they should.
>> In such cases, a really smart global parallelism arbiter could
>> easily respond by spawning more jobs from other builds.
>>
>
> So essentially what you're saying here is that it might be worthwhile
> to look into parallelism as a whole and possibly come up with a
> solution that combines 'emerge --jobs' and build-system parallelism
> together to maximum benefit?

Yeah, couldn't have said it better myself ... apparently :)

> Advanced HPC systems (sys-cluster/torque along with an appropriate
> scheduler, for instance) can do such things with their jobs when the
> jobs are properly built; I could see portage being able to handle this
> as well given most of what is necessary is already known (ebuild
> phases, build system type (via eclass), etc). However, given the
> limitations already put on parallelism in terms of emerge order, etc,
> I could see this solution needing to be -very- complex and integration
> needing to occur on multiple levels. We'd also need to consider
> distcc (and other cluster-shared compilation methods if there are
> any??).. It would be an interesting project, though.

ACK all of the above.

Tempting to think more deeply about this, but probably the last thing I
need to do right now is to talk myself into another speculative project.

I've hurt my wrist a bit -- probably an RSI -- so that should help deter me :S

Only a few major sources of parallelism exist in portage: --jobs /
--load-average in emerge opts, multiprocessing eclass & equiv. ebuild
helper, distcc, and make... Infrastructure is already in place for all
of those, so perhaps a good holistic solution exists that isn't /too/
complicated.

...OK another f!#!%$^ brainstorm incoming :)

For "JOBS" syntax... what really seems to be missing in portage:

o a clean way to say "don't parallelize this particular make
  invocation" in ebuilds

o a clean way to globally say "try to use this parallelization
  strategy when emerging."

So what about something like:

o EMERGE_JOBS and EMERGE_LOAD_AVERAGE make.conf vars equiv. to
  --jobs and --load-average emerge options

o EBUILD_JOBS and EBUILD_LOAD_AVERAGE make.conf vars

o If the latter are not specified, they are copied respectively from
  the former (debatable for *_JOBS, since now we get 16 processes when
  we asked for four).

o MAKEOPTS is auto-extended to reflect EBUILD_JOBS/EBUILD_LOAD_AVERAGE
  if & only if -j|--jobs|-l|--load-average options aren't provided in
  make.conf/profile/envvar MAKEOPTS

o however, if MAKEOPTS "overrides" EBUILD_JOBS or EBUILD_LOAD_AVERAGE,
  issue a conspicuous yellow-stars warning

o extend "emake" to accept a "--non-parallel" option which will
  strip all -j|--jobs|-l|--load-average options from MAKEOPTS;
  perhaps support an equivalent EBUILD_NON_PARALLEL envvar as well,
  with support for override in profile.bashrc. Don't warn about this
  overriding EBUILD_JOBS -- treat as SOP.

o debatable: respect EBUILD_NON_PARALLEL in multiprocessing, etc?
  or, perhaps, something like:

    EMAKE_NON_PARALLEL=${EMAKE_NON_PARALLEL:-${EBUILD_NON_PARALLEL:-no}}

  could be used to distinguish between "don't use any parallelism"
  and "don't use GNU make's parallelism in emake". Also maybe a
  better name exists that doesn't use a double negative.

?

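To make the list above a bit more concrete, here is a rough sketch of the
defaulting, the MAKEOPTS auto-extension, and the option stripping that a
"--non-parallel" emake would need. None of this is existing portage code:
the function names are made up for illustration, the EMERGE_*/EBUILD_*
variables are the ones proposed above, and the plain echo only stands in
for whatever warning helper (presumably ewarn) portage would really use.
It is written in portable sh so it also runs under bash, and it assumes
MAKEOPTS holds whitespace-separated options with no glob characters.

```shell
# Hypothetical helpers; nothing here exists in portage today.

# Succeed if the option string in $1 already carries a jobs or
# load-average option.
makeopts_has_parallel_opts() {
    for _opt in $1; do
        case ${_opt} in
            -j*|--jobs|--jobs=*|-l*|--load-average|--load-average=*)
                return 0 ;;
        esac
    done
    return 1
}

# Apply the proposed defaults: EBUILD_* fall back to EMERGE_*, and
# MAKEOPTS is extended only if it has no -j/-l style option of its own.
resolve_parallel_settings() {
    : "${EBUILD_JOBS:=${EMERGE_JOBS:-}}"
    : "${EBUILD_LOAD_AVERAGE:=${EMERGE_LOAD_AVERAGE:-}}"

    if makeopts_has_parallel_opts "${MAKEOPTS:-}"; then
        if [ -n "${EBUILD_JOBS}${EBUILD_LOAD_AVERAGE}" ]; then
            # Stand-in for the "yellow-stars" warning.
            echo " * MAKEOPTS overrides EBUILD_JOBS/EBUILD_LOAD_AVERAGE" >&2
        fi
        return 0
    fi
    if [ -n "${EBUILD_JOBS}" ]; then
        MAKEOPTS="${MAKEOPTS:+${MAKEOPTS} }-j${EBUILD_JOBS}"
    fi
    if [ -n "${EBUILD_LOAD_AVERAGE}" ]; then
        MAKEOPTS="${MAKEOPTS:+${MAKEOPTS} }-l${EBUILD_LOAD_AVERAGE}"
    fi
    return 0
}

# Print the option string in $1 with every jobs/load-average option
# removed, including a numeric argument given as a separate word.
strip_parallel_opts() {
    _out=""
    _skip_next=0
    for _opt in $1; do
        if [ "${_skip_next}" -eq 1 ]; then
            _skip_next=0
            case ${_opt} in
                ''|*[!0-9.]*) ;;   # not a number: handle it normally below
                *) continue ;;     # numeric argument of the dropped option
            esac
        fi
        case ${_opt} in
            -j|-l|--jobs|--load-average) _skip_next=1 ;;
            -j*|-l*|--jobs=*|--load-average=*) ;;
            *) _out="${_out:+${_out} }${_opt}" ;;
        esac
    done
    printf '%s\n' "${_out}"
}
```

With EMERGE_JOBS=4, EMERGE_LOAD_AVERAGE=3.5 and MAKEOPTS="-s", calling
resolve_parallel_settings leaves MAKEOPTS as "-s -j4 -l3.5"; a MAKEOPTS
that already says "-j8" is left alone and only triggers the warning.
Combined short options such as "-sj4" are not recognized by this sketch.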
Seems to me something vaguely like the above would provide

o backward compatibility for ebuilds and make.conf

o an interface not so vastly different from what we have

o a decent way to specify what "we really want" globally;
  insofar as portage doesn't do the best job effecting the requested
  parallelization strategy, more ambitious tactics could be
  implemented later, hopefully without huge interface revisions.

-gmt

P.S.:

(Kind-of-crazy additional idea: put ceil(sqrt(EMERGE_JOBS)) into
EBUILD_JOBS when only the former is specified, and then let
effective_emerge_jobs equal floor(EMERGE_JOBS/EBUILD_JOBS)... but maybe
too much automagic for this to be a good idea.)
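
For what it's worth, the arithmetic in that P.S. is easy to sketch with
integer math only. The function names below are hypothetical, and the
input is assumed to be a positive integer (zero would divide by zero).

```shell
# Hypothetical helpers illustrating the square-root split.

# Print ceil(sqrt(n)) for a positive integer n by counting upward.
ceil_sqrt() {
    _r=0
    while [ $(( _r * _r )) -lt "$1" ]; do
        _r=$(( _r + 1 ))
    done
    echo "${_r}"
}

# Given EMERGE_JOBS in $1, print "effective_emerge_jobs EBUILD_JOBS".
split_jobs() {
    _ebuild_jobs=$(ceil_sqrt "$1")
    _emerge_jobs=$(( $1 / _ebuild_jobs ))   # integer division == floor
    echo "${_emerge_jobs} ${_ebuild_jobs}"
}
```

With EMERGE_JOBS=4, split_jobs prints "2 2": two concurrent emerges with
two build jobs each, so the total stays at the four that were asked for
instead of growing to 16. For EMERGE_JOBS=10 it prints "2 4".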