On Thu, Aug 26, 2021 at 4:03 AM Ed W <lists@××××××××××.com> wrote:
>
> Hi All
>
> Consider this a tentative first email to test the water, but I have started to look at the performance
> of the install phase of the emerge utility in particular, and I could use some guidance on where to go
> next.
|
To clarify: the 'install' phase installs the package into ${D}. The
'qmerge' phase is the one that merges it to the livefs.
|
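To make the split concrete, both phases can also be driven by hand with
ebuild(1); rough sketch only, and the ebuild path below is just an
example:

  # ebuild /var/db/repos/gentoo/dev-libs/openssl/openssl-1.1.1k-r1.ebuild install
  # ebuild /var/db/repos/gentoo/dev-libs/openssl/openssl-1.1.1k-r1.ebuild qmerge

The first stages the files into ${D}; the second merges that image into
${ROOT}, running the pkg_preinst/pkg_postinst hooks along the way.
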
>
> Firstly, to define the "problem": I have found gentoo to be a great base for building custom
> distributions and I use it to build a small embedded distro which runs on a couple of different
> architectures. (Essentially just a "ROOT=/something emerge $some_packages"). However, I use some
> packaging around binpackages to avoid unnecessary rebuilds, and this highlights that "building" a
> complete install using only binary packages rarely gets over a load of 1. Can we do better than
> this? It seems to be highly serialised on the install phase of copying the files to the disk?
|
In terms of parallelism, it's not safe to run multiple phase functions
simultaneously. This is a problem in theory and occasionally in
practice (recently discussed in #gentoo-dev.)
The phase functions run arbitrary code that modifies the livefs (the
pre/post install and rm phases can touch $ROOT.) As an example we
observed recently: font ebuilds will generate font-related metadata.
If two ebuilds try to generate the metadata at the same time, they can
race and cause unexpected results. Sometimes this is caught in the
ebuild (e.g. they wrote code like rebuild_indexes || die and the
indexer returned non-zero), but it can simply result in silent data
corruption instead, particularly if the races go undetected.
|
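Purely as an illustration (not taken from any real ebuild), the pattern
looks something like this; merge two packages carrying such a hook at
the same time and the index rebuilds can interleave:

  pkg_postinst() {
      # both packages regenerate the same shared index under ${ROOT};
      # concurrent merges can interleave here, and we only notice if
      # the indexer happens to fail and trip the || die
      rebuild_indexes "${ROOT}/usr/share/fonts" || die "index rebuild failed"
  }

(rebuild_indexes stands in for whatever indexer the ebuild calls.)
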
>
> (Note I use parallel build and parallel-install flags, plus --jobs=N. If there is code to compile
> then load will shoot up, but simply installing binpackages struggles to get the load over about
> 0.7-1.1, so presumably single threaded in all parts?)
>
> |
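
For reference, I assume that maps onto roughly the following in
make.conf; the values here are purely illustrative:

  MAKEOPTS="-j12"
  EMERGE_DEFAULT_OPTS="--jobs=4 --load-average=12"
  FEATURES="parallel-fetch parallel-install"
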
> Now, this is particularly noticeable where I cheated to build my arm install and just used qemu
> user-mode on an amd64 host (rather than using cross-compile). Here it's very noticeable that the
> install/merge phase of the build is consuming much/most of the install time.
>
> e.g. a random example (under qemu user mode):
|
I think perhaps a simpler test is to use qmerge (from portage-utils)?
If you can use emerge (e.g. in --pretend mode) to generate a package
list to merge, you can simply merge them with qmerge. I suspect qmerge
will both (a) be faster and (b) be less safe than emerge, as emerge is
doing a bunch of extra work you may or may not care about. You can
also consider running N qmerges in parallel (again, I'm less sure how
safe this is, as the writes by qmerge may be racy.) Note again that
this speed may not come for free and you may end up with a corrupt
image afterwards.
|
I'm not sure if folks are running qmerge in production like this
(maybe others on the list have experience.)
|
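Something along these lines, as a completely untested sketch (the
qmerge flags are from memory, so double-check against qmerge --help,
and keep the corruption caveats above in mind):

  # a single package, for a like-for-like timing comparison with emerge
  time ROOT=/tmp/timetest qmerge -Ky dev-libs/openssl

  # or several binpkgs at once, one qmerge each (racy, see above)
  for p in dev-libs/openssl sys-libs/zlib dev-libs/libffi; do
      ROOT=/tmp/timetest qmerge -Ky "$p" &
  done
  wait
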
>
> # time ROOT=/tmp/timetest emerge -1k --nodeps openssl
>
> >>> Emerging binary (1 of 1) dev-libs/openssl-1.1.1k-r1::gentoo for /tmp/timetest/
> ...
> real    0m30.145s
> user    0m29.066s
> sys     0m1.685s
>
>
> Running the same on the native host is about 5-6 sec (and I find this ratio fairly consistent for
> qemu usermode, about 5-6x slower than native).
>
> If I pick another package with fewer files, then I will see this 5-6 secs drop, suggesting (without
> offering proof) that the bulk of the time here is some "per file" processing.
>
> Note this machine is a 12-core AMD Ryzen 3900X with SSDs that bench around 4GB/s+. So really, 5-6
> seconds to install a few files is relatively "slow". A random benchmark on this machine might be that
> I can back up 4.5GB of chroot with tar+zstd in about 4 seconds.
>
>
> So the question is: I assume that further parallelisation of the install phase will be difficult,
> therefore the low-hanging fruit here seems to be the install/merge phase and why there seems to be
> quite a bit of CPU "per file installed"? Can anyone give me a leg up on how I could benchmark this
> further and look for the hotspot? Perhaps someone understands the architecture here more
> intimately and could point at whether there are opportunities to do some of the processing en masse,
> rather than per file?
> |
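
For finding the hotspot: one rough option is to run emerge under
Python's built-in profiler. Completely untested sketch; on current
systems /usr/bin/emerge is usually a python-exec wrapper, so point
cProfile at the real script under /usr/lib/python-exec/ (the python3.9
path below is just an example), and under qemu user-mode the numbers
will of course include the emulation overhead:

  ROOT=/tmp/timetest python3 -m cProfile -o /tmp/emerge.prof \
      /usr/lib/python-exec/python3.9/emerge -1k --nodeps openssl
  python3 -m pstats /tmp/emerge.prof   # then e.g. "sort cumtime" and "stats 25"

That should at least show whether the time is going into portage's
per-file handling (checksums, CONTENTS bookkeeping, etc.) or somewhere
else entirely.
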
> I'm not really a python guru, but interested to poke further to see where the time is going.
>
> |
> Many thanks
>
> Ed W
>