Hello, Python team and other nice people.

I'd like to discuss the topic of parallel runs again. While it sounded
like a good idea at first, I have my doubts now. I'll try to briefly
describe the implementation, then recap its advantages and
disadvantages.


Implementation
--------------

Right now, parallel runs are a feature of multibuild.eclass, which in
turn uses multiprocessing.eclass. It's a somewhat hacky implementation
in bash, but it works. However, calling 'die' inside such an
implementation is illegal per PMS, and the Council is not interested in
changing that even though we're doing that a lot :).

The parallel support is implemented in python-r1 through the
python_parallel_foreach_impl function. This function is in turn used to
implement parallel running of sub-phases in distutils-r1.

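To illustrate the 'die' problem, here is a minimal sketch of the
general pattern (background subshells plus wait). It is not the code
from multibuild.eclass or multiprocessing.eclass: run_phase and
parallel_foreach are made-up names, and a plain 'exit 1' stands in for
'die'. The only point is that an exit inside a background subshell
ends that subshell alone, so the parent has to collect the failure
itself:

```shell
# Hypothetical stand-in for a per-implementation sub-phase.
run_phase() {
    # Pretend that the build fails for pypy and "die" by exiting.
    if [ "$1" = pypy ]; then
        echo "$1: build failed" >&2
        exit 1    # ends only the background subshell, not the parent
    fi
    echo "$1: done"
}

# Hypothetical helper: start one background subshell per
# implementation, then wait for each and count the failures.
parallel_foreach() {
    pids=
    for impl in "$@"; do
        ( run_phase "${impl}" ) &
        pids="${pids} $!"
    done
    failed=0
    for pid in ${pids}; do
        wait "${pid}" || failed=$((failed + 1))
    done
    return "${failed}"
}

failed_jobs=0
parallel_foreach pypy python2_7 python3_3 || failed_jobs=$?
echo "parent still running, ${failed_jobs} job(s) failed"
```

In this sketch the parent keeps running after the pypy job fails and
only learns about the failure from the exit status that wait returns.

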
Advantages and disadvantages of parallel phases now
---------------------------------------------------

Advantages:

- speedup of non-parallel build tasks -- compiling Python modules,
extensions (before Python 3.4 [?]), running 2to3. The latter tends to
take a lot of CPU time while using only one core of a modern CPU.
Running it in parallel for a few implementations makes it possible to
use the full power of the CPU.

- speedup of PyPy phase runs -- PyPy and PyPy3 take quite a long time
to start. By spawning their phases first and in parallel with the
CPython runs, we can speed the build up a bit. The idea is that
implementations that usually take longer to build are spawned first
so that all cores are kept busy for as long as possible.

- finding silly assumptions in build systems -- we have a lot of
build systems that write to random locations and expect files not to
be touched by anything else.

Disadvantages:

- conflict with parts of the build that are already parallel -- I
think Python 3.4's distutils is capable of building extensions in
parallel [can we backport that?]. The same goes for nosetests and
possibly some other stuff.

- possibility of high resource usage -- this especially applies to
tests that weren't written with the assumption that someone would be
running, say, 4 instances of them in parallel.

- necessity of fighting build system bugs -- it's rather common that
tests and builds write to files in sourcedir or tempdir without
proper unique naming. Long story short, we need to work around that
stuff a lot to keep the tests from failing randomly and to get the
build to install the correct files (and e.g. not mix
implementations).

- some developers are surprised that variables set inside sub-phases
are not preserved in global scope (because sub-phases run in a
subshell).

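The last point is easy to reproduce outside of any eclass. A minimal
sketch, assuming a made-up sub-phase function record_impl and a made-up
variable BUILT_IMPLS (neither exists in python-r1):

```shell
# Hypothetical sub-phase that records a value for later phases.
record_impl() {
    BUILT_IMPLS="${BUILT_IMPLS} $1"
}

BUILT_IMPLS=

# Parallel style: the sub-phase runs in a background subshell, so the
# assignment happens in a copy of the environment and is lost.
( record_impl python2_7 ) &
wait
after_subshell=${BUILT_IMPLS}

# Plain call in the current shell: the assignment is still visible.
record_impl python2_7
after_direct=${BUILT_IMPLS}

echo "after subshell:    '${after_subshell}'"
echo "after direct call: '${after_direct}'"
```

The first echo prints an empty value and the second prints
' python2_7', which is exactly the difference that surprises people.

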
What if we disabled it?
-----------------------

Advantages:

- the eclass becomes a bit simpler and loses the dependency on
multiprocessing (well, it will still be inherited implicitly but not
used).

- developers no longer have to fix all the upstream build system
failures.

- resource-consuming and parallel parts of the build no longer have to
be hacked to avoid issues with multiprocessing.

- we comply with PMS again.

Disadvantages:

- 2to3 and pure Python module build/install steps will be noticeably
slower and less efficient (especially for PyPy and PyPy3).

- some ebuilds may have to be modified because developers assumed that
changes made within a sub-phase (global variables, working directory)
would not affect the subsequent phases.

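To make the second point concrete, here is a small sketch of a
sub-phase that relies on the subshell to undo its 'cd'. The function
name run_tests and the directory layout are assumptions made up for
this example, not taken from any real ebuild:

```shell
# Hypothetical sub-phase that changes directory and never goes back.
run_tests() {
    cd "./$1" 2>/dev/null || return 1
}

workdir=$(mktemp -d)
mkdir "${workdir}/python2_7" "${workdir}/python3_3"

# Sequential, in the current shell: the first cd sticks, so the second
# relative cd is resolved inside python2_7/ and fails.
cd "${workdir}" || exit 1
run_tests python2_7
if run_tests python3_3; then sequential_ok=yes; else sequential_ok=no; fi

# One subshell per sub-phase (as with parallel runs): each cd is
# undone when its subshell exits, so both calls succeed.
cd "${workdir}" || exit 1
( run_tests python2_7 )
if ( run_tests python3_3 ); then subshell_ok=yes; else subshell_ok=no; fi

echo "sequential: ${sequential_ok}, subshell: ${subshell_ok}"

cd / && rm -rf "${workdir}"
```

This prints 'sequential: no, subshell: yes'. An ebuild written like
this would need an explicit cd back (or pushd/popd) once the subshell
goes away.

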
What are your thoughts?

--
Best regards,
Michał Górny