Gentoo Archives: gentoo-user

From: Kai Krakow <hurikhan77@×××××.com>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: distributed emerge
Date: Wed, 27 Sep 2017 00:27:41
Message-Id: 20170927022720.5c98fe82@jupiter.sol.kaishome.de
In Reply to: [gentoo-user] Re: distributed emerge by Kai Krakow
On Wed, 27 Sep 2017 02:04:12 +0200,
Kai Krakow <hurikhan77@×××××.com> wrote:

> On Mon, 25 Sep 2017 21:35:02 +1000,
> Damo Brisbane <dhatchett2@×××××.com> wrote:
>
> > Can someone point me to where I might look for a parallel @world
> > build? It is really just for my own curiosity at this time. Currently
> > I stage binaries for multiple machines on a single NFS share, but the
> > plan is to use some distributed filesystem instead. So I think I just
> > need a recipe, pointers or ideas on how to distribute emerge over an
> > @world set. I am thinking granular first, i.e. per package rather
> > than, e.g., distributed gcc within a single package.
>
> As others have already pointed out, distcc introduces more headaches
> than it solves.
>
> If you are looking for a solution because of package build
> performance, you get the most benefit from building on tmpfs.
>
> I also suggest going breadth first, i.e. building more packages at
> the same time.
>
> Your question implies depth first, which means having more compiler
> processes running at a time for a single package. But most build
> processes do not scale out very well, for the following reasons:
>
> 1. Configure phases are serial processes.
>
> 2. Dependencies in Makefiles are often buggy or incomplete.
>
> 3. Dependencies between source files often allow parallel
> building only in short bursts throughout the complete
> build and force serial building otherwise.
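
The cost of those serial stretches can be put into numbers with Amdahl's law. This is a minimal Python sketch, not something Portage computes; the 20% serial fraction in the usage part is a made-up figure for illustration only.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Best-case speedup of one build when only (1 - serial_fraction) of it parallelizes.

    Amdahl's law: S = 1 / (f + (1 - f) / N).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Hypothetical package: 20% of its build time is serial (configure, linking, ...).
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores -> {amdahl_speedup(0.2, n):.2f}x")
```

With that assumed 20% serial share, eight cores give at most about 3.3x for a single package, and doubling to sixteen cores only reaches 4x.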
>
> Building packages in parallel instead solves all these problems: each
> build phase can run in parallel to every other build phase. So while a
> serialized configure phase is running or a package is being
> bundled/merged, another package can have multiple gccs running while a
> third package may be building serially due to source file deps.
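
A toy model of the two strategies, as a minimal Python sketch. Assumptions not taken from the message: each package is a hypothetical (serial seconds, parallelizable seconds) pair, the parallel part splits perfectly across cores, and dependencies between packages are ignored. Depth first builds one package at a time with make -j<cores>; breadth first builds <cores> packages at once, each single-threaded, handing the next package to whichever core frees up first.

```python
import heapq
from typing import List, Sequence, Tuple

# Hypothetical package model: (serial_seconds, parallelizable_seconds).
Package = Tuple[float, float]


def depth_first_time(pkgs: Sequence[Package], cores: int) -> float:
    """One package at a time, each built with make -j<cores>."""
    # The serial part runs on one core; the rest is assumed to split perfectly.
    return sum(serial + work / cores for serial, work in pkgs)


def breadth_first_time(pkgs: Sequence[Package], cores: int) -> float:
    """<cores> packages at a time, each built single-threaded (make -j1)."""
    # Min-heap holding the time at which each core becomes free.
    free_at: List[float] = [0.0] * cores
    for serial, work in pkgs:
        start = heapq.heappop(free_at)  # earliest available core
        heapq.heappush(free_at, start + serial + work)
    return max(free_at)


if __name__ == "__main__":
    world = [(30.0, 90.0)] * 8  # eight similar mid-sized packages
    print(depth_first_time(world, 8), breadth_first_time(world, 8))  # 330.0 120.0
    big = [(0.0, 800.0)]  # a single, perfectly parallel package
    print(depth_first_time(big, 8), breadth_first_time(big, 8))  # 100.0 800.0
```

The second case is why both knobs are set in practice: --jobs in EMERGE_DEFAULT_OPTS provides the breadth, -j in MAKEOPTS still lets a lone large package use all cores, and the load-average limits keep the two from oversubscribing the machine.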
>
> Also, emerge is very IO bound. Resorting to distcc won't solve this,
> as a lot of compiler internals need to be copied back and forth
> between the peers. It may even create more IO than building locally
> only. Using tmpfs instead solves this much better.
>
> I'm using the following settings and see 100% load on all eight cores
> almost all the time during emerge, while IO is idle most of the time:
>
> MAKEOPTS="-s -j9 -l8"
> FEATURES="sfperms parallel-fetch parallel-install protect-owned \
> userfetch splitdebug fail-clean cgroup compressdebug buildpkg \
> binpkg-multi-instance clean-logs userpriv usersandbox"
> EMERGE_DEFAULT_OPTS="--binpkg-respect-use=y --binpkg-changed-deps=y \
> --jobs=10 --load-average 8 --keep-going --usepkg"
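
The parallelism-related parts of those settings can be derived from the core count. This is a hypothetical helper, assuming the +1 and +2 offsets of the eight-core example (-j9, --jobs=10) are a sensible rule of thumb; Portage itself does not prescribe them. It also shows why the load-average limits matter: without them, emerge jobs times make jobs could start far more compilers than there are cores.

```python
import os
from typing import Optional


def make_conf_lines(cores: Optional[int] = None) -> str:
    """Render the parallelism settings in the style of the example above.

    ASSUMPTION: make gets cores + 1 jobs and emerge gets cores + 2 jobs,
    copied from the eight-core example; both are capped by a load average of `cores`.
    """
    n = cores or os.cpu_count() or 1
    return "\n".join([
        f'MAKEOPTS="-s -j{n + 1} -l{n}"',
        f'EMERGE_DEFAULT_OPTS="--jobs={n + 2} --load-average {n}"',
    ])


def uncapped_compiler_slots(cores: int) -> int:
    """Worst-case concurrent make jobs if the load-average caps were removed."""
    return (cores + 2) * (cores + 1)


if __name__ == "__main__":
    print(make_conf_lines())
    print(uncapped_compiler_slots(8))  # 90 for the eight-core example
```

A real make.conf would keep the remaining options (--binpkg-respect-use=y, --keep-going, --usepkg, ...) alongside these.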
>
> $ fgrep portage /etc/fstab
> none /var/tmp/portage tmpfs noauto,x-systemd.automount,x-systemd.idle-timeout=60,size=32G,mode=770,uid=portage,gid=portage
>
> Either have enough swap or lower the tmpfs size, since tmpfs contents
> live in RAM and can only be evicted to swap.
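
A quick sanity check compares the tmpfs size with RAM plus swap. A minimal sketch, assuming the Linux /proc/meminfo format (values in kB, which are really KiB); it ignores the memory the compilers themselves need, so a True result is only a lower bar, not a guarantee.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo lines such as 'MemTotal:  16318480 kB' into {key: KiB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def tmpfs_fits(size_gib: float, meminfo_text: str) -> bool:
    """True if a completely full tmpfs of size_gib fits into RAM + swap."""
    info = parse_meminfo(meminfo_text)
    backing_kib = info["MemTotal"] + info.get("SwapTotal", 0)
    return size_gib * 1024 * 1024 <= backing_kib


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(tmpfs_fits(32, f.read()))  # size=32G from the fstab line above
```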
>
> Using FEATURES buildpkg and binpkg-multi-instance allows reusing
> packages on different but similar machines; EMERGE_DEFAULT_OPTS makes
> use of this. /usr/portage/{distfiles,packages} is on shared media.
>
> Also, I usually build world upgrades with --changed-deps to rebuild
> dependent packages and update the binary packages that way.
>
> I'm not sure, though, whether running emerge in parallel on two
> machines would pick up binpkgs that appear during the process... I
> guess not. I usually don't do that unless the dependency trees of
> both machines look independent.
>
> If your machine cannot saturate its CPU throughout the whole emerge
> process (as long as there are parallel ebuilds running), then distcc
> will clearly not help you: it will make the complete process slower
> due to waiting on remote resources, and even increase the load. Only
> very few huge projects, with Makefile deps clearly optimized or
> specially crafted for distributed builds, can benefit from distcc.
> Most projects aren't of this type; even Chromium and LibreOffice
> aren't. Precisely those projects have way too much metadata to
> transport between the distcc peers.
>
> But YMMV. I'd say try a different path first.

I can imagine one case where distcc could help you: if the building
machine (the one running emerge) is very constrained on system
resources. But even in that case, the much better performing option is
staging the builds on another machine and doing a binary install on
the low-resource machine.


--
Regards,
Kai

Replies to list-only preferred.