Gentoo Archives: gentoo-user

From: Sam Bishop <sam@××××××.email>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Open Question: The feasibility of a complete portage binhost
Date: Wed, 21 Jan 2015 14:00:26
Message-Id: CAC9sXgm=MZJSD2EcZQdGPvecHK24o94Gd==KX_Hpv284N=cuAg@mail.gmail.com
In Reply to: Re: [gentoo-user] Open Question: The feasibility of a complete portage binhost by Alec Ten Harmsel
On 21 January 2015 at 21:23, Alec Ten Harmsel <alec@××××××××××××××.com> wrote:
>
> On 01/21/2015 07:47 AM, Sam Bishop wrote:
>> So I've been thinking crazy thoughts.
>>
>> Theoretically it can't be that hard to do a complete package binhost for Gentoo.
>
> I love that you qualify this with "theoretically."
>

I'm in a position where the cost of these servers may become less than
the cost of paying developers to wait while ebuilds compile. So I'm
having a semi-serious theoretical discussion with myself about the
merits of opening this up to the entire Gentoo community, and a much
more serious theoretical discussion here, right now, with anyone on
this list as to just how one would do this.

>>
>> To be clear, when I say complete, I'm referring to building all
>> versions of all ebuilds marked stable or unstable on amd64, with every
>> combination of USE flags.
>
> Every ebuild with every combination of USE flags? This is likely
> impossible, and definitely not feasible. With 17000ish ebuilds in the
> portage tree and assuming each only has 2 USE flags, this would be
> building 17000*2^2 = 68,000 packages. If average build time is 20
> seconds (nice server w/ SSD and enough RAM to build in /tmp), it'd take
> 377ish hours to do an initial build of the tree. I guess this isn't so
> bad. Of course, there are outliers like www-client/firefox: 19
> non-language USE flags, so 2^19 different firefox permutations at a fast
> 5 minutes apiece would take 43000 hours. I haven't looked at
> REQUIRED_USE, so there could be fewer than 2^19 different combinations of
> flags; taking it down to 2^10 combinations is only 85 hours or so.
>

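Sanity-checking that arithmetic in a quick shell snippet (the 20 s and
5 min per-package build times are your estimates, not measured figures):

```shell
# Reproduce the estimates above: 17000 ebuilds x 2^2 USE combos at
# 20 s each, plus the firefox outlier at 2^19 combos x 5 min each.
ebuilds=17000
combos=$((1 << 2))                 # 2 USE flags per ebuild
packages=$((ebuilds * combos))     # 68,000 packages
hours=$((packages * 20 / 3600))    # ~377 h at 20 s/package

ff_combos=$((1 << 19))             # firefox: 19 USE flags
ff_hours=$((ff_combos * 5 / 60))   # ~43,690 h at 5 min apiece
echo "tree: $packages pkgs, ~${hours}h; firefox alone: ~${ff_hours}h"
```

So the "43000 hours" figure is, if anything, slightly rounded down.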
Or, looking at it another way: in order to do this in under 24 hours,
using your time estimate and a healthy overestimate of capacity, the
initial burst would need approximately 20 'nice servers' for a day for
the initial build, then a much smaller number to continue the ongoing
work of building all the new changes.

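For what it's worth, the "approximately 20" comes from dividing the
~377 h single-machine estimate into a 24 h window and padding the result:

```shell
# Ceiling division of the ~377 h initial-build estimate over 24 h.
total_hours=377
servers=$(( (total_hours + 23) / 24 ))   # 16 servers minimum
echo "~$servers servers minimum; call it 20 with headroom"
```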
>>
>> This pretty much boils down to bytes and bytes of storage + compute
>> resources. Both of which are easily available to me. So I began
>> pondering and here I am, thinking to myself "is this really all there
>> is to it"?
>
> A full CentOS mirror is ~600GB IIRC, so you're gonna need a ton of storage.
>

1TB on AWS S3 costs me $30... that's about 20 minutes of saved
developer time to pay back the cost.
At the moment our build pipeline can take over 45 minutes; most of
that is ebuilds compiling, so it won't be hard to speed up with a
binhost.
We're not exactly going to build 'less often', so this does add up.

>> Does it really come down to CPU cycles and repeatedly running through
>> the following commands for each combination of ebuild, version and USE
>> flags?
>>
>> emerge --emptytree --onlydeps ${name}
>> emerge --emptytree --buildpkgonly --buildpkg ${name}
>>
>> Obviously running them in a clean environment each time, either by
>> chroot or other means.
>> Then just storing the giant binhost somewhere suitable, such as an AWS
>> S3 bucket set up to work via HTTP so the normal tools work fine with
>> it.
>>
>
> I haven't used binpkgs in a long time, but I think USE on the client
> machine has to match the USE of the package being installed. Managing
> all of this would be a nightmare unless you wrote your own special
> portage server that served up binpkgs in a USE-aware way and a portage
> host could request a binpkg with a certain USE.
>
> Theoretically, great idea. I think this would be possible if you had
> maybe 3 or 4 different USE combos (i.e. one for servers, one for KDE
> client machines, one for gnome clients, etc.).
>
> Alec
>
> P.S. I'm reasonably sure my math is correct, but I would appreciate
> corrections.
>

I don't see why it can't be all the combinations; the issue is
storage, and the storage costs could be a lot lower than expected,
given how hard they are to guess. So I would also love to see some
corrected/more accurate estimates, especially any based on numbers
from anyone who has been involved in running a tinderbox; these aren't
exactly numbers many people have sitting around, haha.
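
To make the loop from earlier in this mail concrete, here's a rough
sketch of how the enumeration might look. The package name, the flag
list, and the dry-run echoes are placeholders, not a tested pipeline;
the real thing would run each pair of emerge commands inside a fresh
chroot:

```shell
# Hypothetical sketch only: enumerate every on/off combination of a
# package's USE flags and print the emerge invocations that a real
# pipeline would execute in a clean chroot per combination.
pkg="www-client/firefox"   # placeholder package
flags="dbus pulseaudio"    # in practice, parsed from the ebuild's IUSE

# emit one line per combination, disabled flags negated with '-'
combos() {
    set -- $flags
    n=$#
    i=0
    while [ "$i" -lt $((1 << n)) ]; do
        out=""
        bit=0
        for f in "$@"; do
            if [ $(( (i >> bit) & 1 )) -eq 1 ]; then
                out="$out $f"
            else
                out="$out -$f"
            fi
            bit=$((bit + 1))
        done
        out="${out# }"             # trim the leading space
        printf '%s\n' "$out"
        i=$((i + 1))
    done
}

combos | while read -r use; do
    # dry run; the real pipeline would execute these inside a chroot
    echo "USE='$use' emerge --emptytree --onlydeps $pkg"
    echo "USE='$use' emerge --emptytree --buildpkgonly --buildpkg $pkg"
done
```

With two flags this prints four pairs of commands; for firefox's 19
flags the same loop is where the 2^19 figure above comes from.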
