Gentoo Archives: gentoo-desktop

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-desktop@l.g.o
Subject: [gentoo-desktop] Re: @system and parallel merge speedup
Date: Mon, 22 Oct 2012 09:02:46
Message-Id: pan.2012.10.22.06.26.06@cox.net
In Reply to: [gentoo-desktop] @system and parallel merge speedup by Alex Efros
Alex Efros posted on Sun, 21 Oct 2012 16:24:32 +0300 as excerpted:

> Hi!
>
> On Sun, Oct 21, 2012 at 08:02:47AM +0000, Duncan wrote:
>> Bottom line, an empty @system set really does make a noticeable
>> difference in parallel merge handling, speeding up especially
>> --emptytree @world rebuilds but also any general update that has a
>> significant number of otherwise @system packages and deps,
>> dramatically. I'm happy. =:^)
>
> I think the "@system first" and "@system not merged in parallel" rules
> are safe to break when you're just doing "--emptytree @world" on an
> already updated OS, because that only rebuilds existing packages, and
> all packages will see the same set of other packages (including the
> same versions) while compiling. But when upgrading multiple packages
> (including some from the original @system and some from @world), this
> may well result in bugs.

In theory, you're right. In practice, I've not seen it yet, tho being
cautious I'd say it needs at least six months of testing (I've only been
testing it about a month, maybe six weeks) before I can say for sure.
It /was/ something I was a bit concerned about, however.

That was in fact one of the reasons I decided to try it on the netbook's
chroot as well, which hadn't been upgraded in a year and a half. I
figured that if it could work reasonably well there, the chances of an
undiscovered real problem were much lower.

However, it /is/ worth noting that as a matter of course, I already often
choose to do some system-critical upgrades (portage, gcc, glibc, openrc,
udev) on their own, before doing the general upgrades, in part so I can
deal with their config file changes and note any problems right away,
with a relatively small changeset to deal with, as opposed to having a
whole slew of updates, including critical system package updates, happen
all at once, making it far more difficult to trace which update actually
broke things.
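
In shell terms, that habit looks roughly like the following sketch; the
exact atoms come from whatever the --pretend run flags as critical, and
--ask/--oneshot are simply how I'd hedge it:

  # critical bits first, one at a time, config updates handled right away
  emerge --ask --oneshot --update sys-apps/portage
  emerge --ask --update sys-devel/gcc sys-libs/glibc sys-apps/openrc sys-fs/udev
  dispatch-conf        # deal with their config file changes immediately
  # then the general upgrade
  emerge --ask --update --deep --newuse @world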

That's where the years of gentoo experience I originally mentioned come
in. This isn't going to be as easy for a gentoo newbie, for at least two
reasons. First, they're less likely to know which packages really /are/
system-critical, and are thus more likely to unmerge them without the
extra unmerge warning a package in the system set gets. (I mentioned
that one in the first post.) Second, spotting critical updates in the
initial --pretend run, knowing which packages it's a good idea to
upgrade first, by themselves, and dealing with config file updates, etc,
for just that critical package (and any dependency updates it might pull
in) before going on to the general @world upgrade, probably makes a good
bit of difference in practice, and gentoo newbies are rather less likely
to be able to make that differentiation. (I didn't specifically mention
that one until now.)
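
For reference, the "initial --pretend run" in question is nothing more
exotic than the usual update preview, roughly:

  emerge --pretend --update --deep --newuse @world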

> As for the "--emptytree @world" speedup, can you provide benchmarked
> values? I mean, only a few packages are forced to use just one CPU core
> while compiling. So, merging packages in parallel may save some time,
> mostly on unpack/prepare/configure/install/merge. All of those except
> configure actually do a lot of I/O, which most likely loses speed rather
> than gains it when done in parallel (especially keeping in mind kernel
> bug 12309). So, at a glance, the time you may win on configure you'll
> mostly lose on I/O, and most of the time all your CPU cores will be
> loaded anyway while compiling, so doing configure in parallel with
> compiling is unlikely to save time. This is why I think that without
> actual benchmarking we can't be sure how much faster it became (if it
> became faster at all, which is questionable).

Good points, and no, I can't easily provide benchmarks, both because of
the recent hardware upgrade here, and because portage itself has been
gradually improving its parallel merging abilities -- a recent update
changed the scheduling algorithm so it starts additional merges much
sooner than it did previously. (See gentoo bug 438650, fixed in portage
2.1.11.29 and 2.2.0_alpha140, both released on Oct 17. The fact that I
know about that hints at another thing I do routinely as an experienced
gentooer: I always read portage's changelog and check out any referenced
bugs that look interesting, before I upgrade portage. To the extent
practical without actually reading the individual git commits, I want to
know about package manager changes that might affect me BEFORE I do that
upgrade!)
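
Concretely, that pre-upgrade check is nothing fancier than something
like this (paths assume the usual rsync-tree location):

  less /usr/portage/sys-apps/portage/ChangeLog
  emerge --pretend --verbose --update sys-apps/portage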

But I believe that, as core counts rise, you're underestimating the
effects of portage's parallel merging abilities. In particular, a lot of
packages normally in @system (or deps thereof) are relatively small
packages such as grep, patch, sed... where the single-threaded configure
step takes a MUCH larger share of the total package merge time than it
does with larger packages. Similarly, the unpack and prepare phases,
plus the package phase for folks using FEATURES=buildpkg, tend to be
single-threaded.[1]

Thus, instead of serializing several dozen small, mostly single-threaded
package merges for packages like grep/sed/patch/util-linux/etc,
depending on the --jobs and --load-average numbers you feed to portage,
several of these end up getting done in parallel, with the portage
multi-job output bumping a line every few seconds because it's doing
them in parallel, instead of every minute or so because it's doing one
at a time.
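
The knobs in question, in make.conf terms; these particular numbers are
simply what I run here (mentioned again further down), not a
recommendation:

  EMERGE_DEFAULT_OPTS="--jobs=12 --load-average=12"
  MAKEOPTS="-j20 -l15"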

Meanwhile, it should be obvious, but it's worth stating anyway: the
effect gets *MUCH* bigger as the number of cores increases. For a dual-
core, bah, not worth the trouble, as it could cause more problems than
it solves, especially if people are trying to work on other things while
portage is doing its thing in the background. I suspect the break-over
point is either triple-core or quad-core. One of the reasons portage is
getting better lately is that someone with a 32-core, and a
corresponding amount of memory (64 or 128 gig IIRC), has taken an
interest.

It's worth noting, as I mentioned, that I now have a 6-core, recently
upgraded from a dual dual-core (4 cores), with a corresponding memory
upgrade, to 16 gigs.

One of the first things I noticed doing emerges was how much more
difficult it was to keep the 6-core actually peaked out at 100% CPU than
it had been on the 4-core. While I suspect there would have been a
difference on the quad-core (as I said, I believe the break-over's
probably 3-4 cores), it wasn't a big deal there. Staring at that 6-core
running at 100% on 1-2 cores CPU-freq-maxed at 3.6 GHz, while the other
4-5 cores remained near idle at <20% utilization at the CPU-freq minimum
of 1.4 GHz... was VERY frustrating. So began my drive to empty @system
and get portage properly scheduling parallel merges for former @system
packages and their deps as well!
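
For anyone wondering what "emptying @system" looks like mechanically,
one way to sketch it is the user profile override plus --noreplace, so
the former @system packages land in the world file instead (grep and sed
here are purely example atoms):

  mkdir -p /etc/portage/profile
  echo "-*sys-apps/grep" >> /etc/portage/profile/packages
  echo "-*sys-apps/sed"  >> /etc/portage/profile/packages
  emerge --noreplace sys-apps/grep sys-apps/sed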

For the quad-core plus hyperthreading (thus 8 threads, I take it?) you
mention below (4.6 GHz OC, nice! I see stock is 3.4 GHz), the boost from
killing the @system-forced serialization should definitely make a
difference (unless the hyperthreading doesn't do much for that workload,
making it effectively no better than a non-hyperthreaded quad-core). For
my 6-core, it made a rather big difference, and I guarantee that if you
had the 32-core that one of the devs working on improving portage's
parallelization has, you'd be hot on the trail to improve it as well!

> As for me, I found a very effective way to speed up emerge is upgrading
> from a Core2Duo E6600 to an i7-2600K overclocked to 4.6 GHz. This sped
> up compilation on my system by a factor of six (the kernel now compiles
> in just 1 minute). And to speed up most other (non-compilation) portage
> operations I use a 4 GB tmpfs mounted on /var/tmp/portage/.

I remember reading about the 1-minute kernel compiles on i7s. Very
impressive.

FWIW, there are a lot of variables to fill in the blanks on before we
can be sure kernel build time comparisons are apples to apples (I had
several more paragraphs written on that, but decided it was a digression
too far for this post, so deleted 'em), but AFAIK when I read about it
(on Phoronix, I believe), he was doing an all-yes config, so building
rather more than a typical customized-config gentooer, but was using a
rather fast SSD, which probably improved his times quite a bit compared
to "spinning rust".

But I don't know if his timings included the actual compress (and if so,
with what CONFIG_KERNEL_XXX compression option), and I don't believe
they included the actual install, only the build.

That said, a 1-minute all-yes-config kernel build time is impressive
indeed, the envy of many, including me. (OTOH, my fx6100 was on sale for
$100, $109 post-tax. That's lower than pricewatch's $118 lowest quote
(shipped, no tax), and only about 40% of the $273 low quote for an
i7-2600k.)

My build, compress (CONFIG_KERNEL_XZ), and install runs ~2 minutes
(1:58-2:07, 10+ runs, warm cache), so yes, even if your build time
doesn't include compress and install, which it might, 1 minute is still
VERY impressive. Tho as I said, my CPU cost ~40% of the going price on
yours, so...
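
For the curious, that ~2-minute figure gets measured roughly like this;
-j20 is just my MAKEOPTS value, and in reality the install step is my
kernel-build script's job rather than typed by hand:

  make clean
  time sh -c 'make -j20 && make modules_install && make install'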

Meanwhile...

I too use and DEFINITELY recommend a tmpfs $PORTAGE_TMPDIR. I'm running
16 gig RAM here, and didn't want to run out of room with parallel
builds, so set a nice roomy 12G tmpfs size.
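
The relevant config, roughly as here; the 12G matches what I just
described, so size it to your own RAM, and PORTAGE_TMPDIR's default of
/var/tmp is shown only to make the relationship explicit:

  # /etc/fstab
  tmpfs   /var/tmp/portage   tmpfs   size=12G   0 0
  # make.conf (the default)
  PORTAGE_TMPDIR="/var/tmp"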

A $PORTAGE_TMPDIR on tmpfs also reduces the I/O. At least here, the only
time I've had problems, both on the old hardware and on the new, is when
I go into swap. (And on the old hardware I had swap priority=-striped
across four disks, plus 4-way md/raid0, so the kernel could schedule
swap-out vs. read-in much better, and I didn't see a problem until I hit
nearly half a gig of swap loading at once; the new hardware is only
single-disk ATM, and I see issues starting at 80 meg or so of swap
loading at once.) But with 16 gigs of RAM on the new system, the only
time I see it go into swap is when I run a kernel build with an uncapped
-j, thus hitting 500+ jobs and getting close enough to 16 gigs that
whether I hit swap or not depends on what else I've been doing with the
system.
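
That priority= striping is plain fstab stuff, for what it's worth;
device names below are illustrative only. Equal pri= values make the
kernel stripe swap-out across the devices, raid0-style:

  /dev/sda2   none   swap   defaults,pri=1   0 0
  /dev/sdb2   none   swap   defaults,pri=1   0 0
  /dev/sdc2   none   swap   defaults,pri=1   0 0
  /dev/sdd2   none   swap   defaults,pri=1   0 0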

Basically, I/O is thus not a problem at all with portage here, up to the
--jobs=12 --load-average=12, along with MAKEOPTS="-j20 -l15", that I
normally run, anyway. On the old system with only six gigs of RAM, if I
tried hard enough I could get portage to hit swap there, but I limited
--jobs and MAKEOPTS until that wasn't an issue, and had no additional
problems.

Tho I should mention I also run PORTAGE_NICENESS=19 (and my kernel-
build/install script similarly renices itself to 19 before starting the
kernel build), which effectively puts it in batch-scheduling mode
(idle-only scheduling, but longer timeslices).
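
In config terms that's just the make.conf variable, plus a renice in the
script; the script line here is a sketch of what mine does, not a paste
of it:

  # make.conf
  PORTAGE_NICENESS="19"
  # near the top of the kernel-build/install script
  renice -n 19 -p $$ >/dev/null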

If it matters, the filesystem is reiserfs, the iosched is cfq, and the
drive is a sata2/ahci (amd 990fx/sb950 chipset) 2.5" seagate "spinning
rust".

But I definitely agree with $PORTAGE_TMPDIR on tmpfs. It makes a HUGE
difference!

---
[1] Compression parallelism: There are parallel-threaded alternatives to
bzip2, for instance, but they have certain downsides, such as
decompression only being parallel when the tarball was compressed with
the same parallel tool, and certain compression-buffer nul-fill handling
differences, that keep them from being functionally perfect drop-in
replacements. See the recent discussion on the topic on the gentoo-dev
list, for instance.
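
pbzip2 and lbzip2 are the usual examples; by-hand usage on a tarball
looks something like the below, with the caveats above still applying
(in particular, decompression is only parallel for pbzip2-compressed
input):

  tar -cf - somedir/ | pbzip2 -c > somedir.tar.bz2
  pbzip2 -dc somedir.tar.bz2 | tar -xf -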

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman