Hi All

Consider this a tentative first email to test the water, but I have started to look at the performance of the install phase of the emerge utility in particular, and I could use some guidance on where to go next.
Firstly, to define the "problem": I have found Gentoo to be a great base for building custom distributions, and I use it to build a small embedded distro which runs on a couple of different architectures (essentially just a "ROOT=/something emerge $some_packages"). However, I use some packaging around binpackages to avoid unnecessary rebuilds, and this highlights that "building" a complete install using only binary packages rarely gets the load over 1. Can we do better than this? It seems to be highly serialised on the install phase of copying the files to disk.
(Note I use parallel build and the parallel-install flag, plus --jobs=N. If there is code to compile then the load will shoot up, but simply installing binpackages struggles to get the load over about 0.7-1.1, so presumably it is single-threaded in all parts?)
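For reference, the relevant knobs in this setup look something like the following make.conf fragment (the values are illustrative for this 12-core box, not a recommendation):

```shell
# /etc/portage/make.conf (illustrative fragment)
MAKEOPTS="-j12"                     # parallel build within a package
FEATURES="parallel-install"         # relax locking between concurrent merges
EMERGE_DEFAULT_OPTS="--jobs=12 --load-average=12"
```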
Now, this is particularly noticeable where I cheated to build my arm install and just used qemu user mode on an amd64 host (rather than cross-compiling). Here it's very noticeable that the install/merge phase of the build is consuming much, if not most, of the install time.
e.g., a random example (under qemu user mode):

# time ROOT=/tmp/timetest emerge -1k --nodeps openssl

>>> Emerging binary (1 of 1) dev-libs/openssl-1.1.1k-r1::gentoo for /tmp/timetest/
...
real    0m30.145s
user    0m29.066s
sys     0m1.685s
Running the same on the native host takes about 5-6 seconds (and I find this ratio fairly consistent for qemu user mode: about 5-6x slower than native).
If I pick another package with fewer files, then this 5-6 seconds drops, suggesting (without offering proof) that the bulk of the time here is some "per file" processing.
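To put a floor under that "per file" cost, one could compare against plain file copies. A minimal stand-in benchmark of my own (not portage code): time copying N small files and derive a ms/file baseline; whatever emerge spends per file beyond this is bookkeeping (hashing, VDB updates, collision checks) rather than raw I/O.

```python
import os
import shutil
import tempfile
import time

def time_copy(n_files, size=4096):
    """Time copying n_files small files between two temp directories."""
    src = tempfile.mkdtemp()
    dst = tempfile.mkdtemp()
    payload = os.urandom(size)
    for i in range(n_files):
        with open(os.path.join(src, f"f{i}"), "wb") as f:
            f.write(payload)
    start = time.perf_counter()
    for name in os.listdir(src):
        # copy2 preserves mtime/permissions, roughly like a merge would
        shutil.copy2(os.path.join(src, name), os.path.join(dst, name))
    elapsed = time.perf_counter() - start
    shutil.rmtree(src)
    shutil.rmtree(dst)
    return elapsed

t = time_copy(500)
print(f"copied 500 files in {t:.3f}s ({t / 500 * 1e3:.2f} ms/file)")
```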
39 |
|
40 |
Note this machine is a 12 core AMD ryzen 3900x with SSDs that bench around the 4GB/s+. So really 5-6 |
41 |
seconds to install a few files is relatively "slow". Random benchmark on this machine might be that |
42 |
I can backup 4.5GB of chroot with tar+zstd in about 4 seconds. |
43 |
|
44 |
|
45 |
So the question is: I assume that further parallelisation of the install phase will be difficult, |
46 |
therefore the low hanging fruit here seems to be the install/merge phase and why there seems to be |
47 |
quite a bit of CPU "per file installed"? Can anyone give me a leg up on how I could benchmark this |
48 |
further and look for the hotspot? Perhaps someone understand the architecture of this point more |
49 |
intimately and could point at whether there are opportunities to do some of the processing on mass, |
50 |
rather than per file? |
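On finding the hotspot, the standard Python tooling should apply: run the merge under cProfile and sort by cumulative time. A sketch of that workflow (profiling a stand-in function here; against portage itself one would presumably run something like "python -m cProfile -o emerge.prof /usr/bin/emerge -1k --nodeps openssl" and load the dump with pstats):

```python
import cProfile
import io
import pstats

def merge_one_file(i):
    # Stand-in for per-file merge work; when profiling portage for real,
    # load the .prof dump written by `python -m cProfile -o` instead.
    return sum(len(str(j)) for j in range(i))

profiler = cProfile.Profile()
profiler.enable()
for i in range(200):
    merge_one_file(i)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
report = stream.getvalue()
summary = next(line for line in report.splitlines() if "function calls" in line)
print(summary.strip())
```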
I'm not really a Python guru, but I'm interested to poke further to see where the time is going.
Many thanks

Ed W