Gentoo Archives: gentoo-python

From: "Michał Górny" <mgorny@g.o>
To: gentoo-python@l.g.o
Cc: python@g.o
Subject: [gentoo-python] The future of parallel phases?
Date: Sat, 22 Nov 2014 22:06:37
Message-Id: 20141122230629.5fae98b1@pomiot.lan
Hello, Python team and other nice people.

I'd like to discuss the topic of parallel runs again. While it sounded
like a good idea at first, I now have my doubts. I'll try to briefly
describe the implementation, then recap its advantages and
disadvantages.


Implementation
--------------

Right now, parallel runs are a feature of multibuild.eclass, which in
turn uses multiprocessing.eclass. It's a rather hacky implementation in
bash, but it works. However, calling 'die' inside such an
implementation is illegal per PMS, and the Council is not interested in
changing that even though we're doing it a lot :).

The parallel support is implemented in python-r1 through the
python_parallel_foreach_impl function. This function in turn is used to
implement parallel running of sub-phases in distutils-r1.
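
For readers unfamiliar with the eclass internals, the general pattern
(run one sub-phase per implementation, at most N at a time, and fail if
any of them failed) can be modelled in Python. This is only an analogy:
the real eclass forks bash subshells via multiprocessing.eclass, whereas
here threads drive child processes, and run_for_impl and
parallel_foreach_impl are hypothetical stand-ins, not the eclass API.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_for_impl(impl: str) -> str:
    """Hypothetical sub-phase for one implementation.

    In the eclass this would be a forked bash subshell; here a child
    Python interpreter simply echoes the implementation name back.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", impl],
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return result.stdout.strip()


def parallel_foreach_impl(impls, func, jobs=2):
    """Run func(impl) for every impl, with at most `jobs` running at once.

    Results come back in the order of `impls`. If any job raised, the
    first exception (in `impls` order) is re-raised once all jobs have
    finished, i.e. the whole phase fails if any implementation failed.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(func, impl) for impl in impls]
    # Leaving the with-block waits for every job to complete.
    return [f.result() for f in futures]
```

Threads are sufficient here because the real work happens in child
processes; the `jobs` cap plays the role of a make-style -j limit.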


Advantages and disadvantages of parallel phases now
---------------------------------------------------

Advantages:

- speedup of non-parallel build tasks -- compiling Python modules,
  extensions (before Python 3.4 [?]), running 2to3. The latter tends to
  take a lot of CPU time while using only one core of a modern CPU.
  Running it in parallel for a few impls makes it possible to use the
  full power of the CPU.

- speedup of PyPy phase runs -- PyPy and PyPy3 take quite long to
  start. By spawning their phases first and in parallel with CPython
  runs, we can speed the build up a bit. The idea is that
  implementations that usually take longer to build are spawned first,
  so that all cores are kept busy for as long as possible.

- finding silly assumptions in build systems -- a lot of build systems
  write to arbitrary locations and expect the files not to be touched
  by anything else.
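
The scheduling argument in the PyPy point is essentially the
longest-processing-time-first heuristic. A toy model in Python compares
greedy scheduling of the same jobs on two cores, shortest first versus
longest first. The job durations are made-up numbers, and start-up
overhead and resource contention are ignored.

```python
import heapq


def makespan(durations, cores):
    """Greedy list scheduling: each job, in the given order, starts on
    whichever core becomes free first. Returns the total wall-clock time."""
    free_at = [0.0] * cores  # time at which each core becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical per-implementation build times (arbitrary units).
durations = {"python2_7": 2, "python3_3": 2, "python3_4": 2, "pypy": 6}

print(makespan(sorted(durations.values()), cores=2))                # → 8.0 (shortest first)
print(makespan(sorted(durations.values(), reverse=True), cores=2))  # → 6.0 (longest first)
```

Starting the long PyPy job first lets the short CPython jobs fill the
other core in the meantime; starting it last leaves one core idle at
the end.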

Disadvantages:

- conflict with parallel parts of the build -- I think Python 3.4's
  distutils is capable of building extensions in parallel [can we
  backport that?]. The same goes for nosetests and possibly some other
  tools.

- possibility of high resource usage -- this especially applies to
  tests which weren't written with the assumption that someone will be
  running, say, 4 instances of them in parallel.

- necessity of fighting build system bugs -- it's rather common that
  tests and builds write to files in sourcedir or tempdir without
  properly unique naming. Long story short, we need to work around that
  a lot to keep the tests from failing randomly and to make the build
  install the correct files (e.g. without mixing implementations).

- some developers are surprised that variables set inside sub-phases
  are not preserved in global scope (since sub-phases run in
  subshells).
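
One common workaround for the shared-file problem is to give every
implementation its own build directory and its own TMPDIR. A Python
sketch of that idea follows; the directory layout (build-<impl>,
tmp-<impl>) and all function names are hypothetical, and a child
interpreter stands in for a real test suite.

```python
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def impl_dirs(impl: str, root: str) -> dict:
    """Create a private build dir and temp dir for one implementation."""
    dirs = {
        "build": os.path.join(root, f"build-{impl}"),
        "tmp": os.path.join(root, f"tmp-{impl}"),
    }
    for path in dirs.values():
        os.makedirs(path, exist_ok=True)
    return dirs


def run_tests_for_impl(impl: str, root: str) -> str:
    """Hypothetical test sub-phase for one implementation.

    The child writes into its own build dir and sees its own TMPDIR,
    so concurrent runs for other implementations cannot clobber its
    files. Returns the temp dir the child actually used.
    """
    dirs = impl_dirs(impl, root)
    env = {**os.environ, "TMPDIR": dirs["tmp"]}
    code = (
        "import sys, tempfile\n"
        "with open(sys.argv[1], 'w') as fh:\n"
        "    fh.write(tempfile.gettempdir())\n"
    )
    out_file = os.path.join(dirs["build"], "result.txt")
    subprocess.run([sys.executable, "-c", code, out_file], env=env, check=True)
    with open(out_file) as fh:
        return fh.read()


def run_all(impls, root):
    """Run every implementation's tests concurrently."""
    with ThreadPoolExecutor(max_workers=len(impls)) as pool:
        return list(pool.map(lambda impl: run_tests_for_impl(impl, root), impls))
```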


What if we disabled it?
-----------------------

Advantages:

- the eclass becomes a bit simpler and loses the dependency on
  multiprocessing.eclass (well, it will still be inherited implicitly,
  just not used).

- developers no longer have to fix all the upstream build system
  failures triggered by parallel runs.

- resource-consuming and parallel parts of the build no longer have to
  be hacked to avoid issues with multiprocessing.

- we comply with PMS again.

Disadvantages:

- 2to3 and pure Python module build/install steps will be noticeably
  slower and less efficient (especially for PyPy and PyPy3).

- some ebuilds may have to be modified because developers assumed that
  changes made inside a sub-phase (global variables, working
  directory) will not affect the successive phases.
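
The subshell behaviour behind the last point can be shown with a Python
analogy, assuming a child process as a stand-in for a bash subshell. (A
bash subshell inherits all shell variables, not just the environment,
but the effect is the same: nothing flows back to the parent.) MY_VAR
is a hypothetical variable name.

```python
import os
import subprocess
import sys

# Parent-side state that a "sub-phase" might try to change.
os.environ["MY_VAR"] = "original"  # hypothetical variable


def subphase_in_child() -> None:
    """Sub-phase run in a separate process, like a bash subshell.

    Its changes to the environment and working directory die with it.
    """
    code = "import os; os.environ['MY_VAR'] = 'changed'; os.chdir('..')"
    subprocess.run([sys.executable, "-c", code], check=True)


def subphase_in_process() -> None:
    """Sub-phase run in the same process, like a serial loop without
    subshells: its changes stay visible to everything that follows."""
    os.environ["MY_VAR"] = "changed"
```

Running sub-phases serially in the same process behaves like
subphase_in_process: changes persist into later phases, which is why
some ebuilds may need adjusting.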


What are your thoughts?

--
Best regards,
Michał Górny

Attachments

signature.asc (application/pgp-signature)

Replies

- [gentoo-python] Re: The future of parallel phases? -- Alex Brandt <alunduil@g.o>
- Re: [gentoo-python] The future of parallel phases? -- "Michał Górny" <mgorny@g.o>