Alex Efros posted on Sun, 21 Oct 2012 16:24:32 +0300 as excerpted:

> Hi!
>
> On Sun, Oct 21, 2012 at 08:02:47AM +0000, Duncan wrote:
>> Bottom line, an empty @system set really does make a noticeable
>> difference in parallel merge handling, speeding up especially
>> --emptytree @world rebuilds but also any general update that has a
>> significant number of otherwise @system packages and deps,
>> dramatically. I'm happy. =:^)
>
> I think the "@system first" and "@system not merged in parallel" rules
> are safe to break when you're just doing "--emptytree @world" on an
> already-updated OS, because it only rebuilds existing packages, and
> while compiling, all packages will see the same set of other packages
> (including the same versions). But when upgrading multiple packages
> (including some from the original @system and some from @world), this
> may well result in bugs.

In theory, you're right. In practice, I've not seen it yet, tho being
cautious I'd say it needs at least six months of testing (I've only been
testing it about a month, maybe six weeks) before I can say for sure.
It /was/ something I was a bit concerned about, however.

That was in fact one of the reasons I decided to try it on the netbook's
chroot as well, which hadn't been upgraded in a year and a half. I
figured if it could work reasonably well there, the chances of an
undiscovered real problem were much lower.

However, it /is/ worth noting that as a matter of course, I already often
choose to do some system-critical upgrades (portage, gcc, glibc, openrc,
udev) on their own, before doing the general upgrades, in part so I can
deal with their config file changes and note any problems right away,
with a relatively small changeset to deal with. The alternative, a whole
slew of updates including critical system package updates happening all
at once, makes it far more difficult to trace which update actually
broke things.
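
For anyone wanting to try the same routine, here's a minimal sketch of
that critical-first workflow. The package atoms are the ones named
above, and the emerge/dispatch-conf invocations are standard portage
tooling, tho the exact flags are just one reasonable choice, not the
only one:

```shell
# Preview the full update first, to spot any system-critical packages
# (portage, gcc, glibc, openrc, udev) in the pending changeset.
emerge --pretend --update --deep --newuse @world

# Upgrade the critical packages by themselves, one small changeset at a
# time, so any breakage is easy to trace to the package that caused it.
emerge --ask --oneshot sys-apps/portage
emerge --ask --oneshot sys-devel/gcc sys-libs/glibc

# Deal with their config file changes right away, while the changeset
# is still small.
dispatch-conf

# Only then go on to the general @world upgrade.
emerge --ask --update --deep --newuse @world
```

The point of --oneshot is that these packages are (or were) pulled in as
deps anyway, so there's no need to add them to the world file just to
upgrade them early.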

That's where the years of gentoo experience I originally mentioned come
in. This isn't going to be as easy for a gentoo newbie, for at least two
reasons. First, they're less likely to know which packages really /are/
system-critical, and thus are more likely to unmerge them without the
extra unmerge warning a package in the system set gets. (I mentioned
that one in the first post.) Second, spotting critical updates in the
initial --pretend run -- knowing which packages it's a good idea to
upgrade first, by themselves, dealing with config file updates, etc, for
just that critical package (and any dependency updates it might pull
in), before going on to the general @world upgrade -- probably makes a
good bit of difference in practice, and gentoo newbies are rather less
likely to be able to make that differentiation. (I didn't specifically
mention that one until now.)

> As for the "--emptytree @world" speedup, can you provide benchmarked
> values? I mean, only a few packages are forced to use only one CPU
> core while compiling. So, merging packages in parallel may save some
> time, mostly on the unpack/prepare/configure/install/merge phases. All
> of them except configure actually do a lot of I/O, which most likely
> loses a lot of speed instead of gaining when done in parallel
> (especially keeping in mind kernel bug 12309). So, at a glance, the
> time you may win on configure you'll mostly lose on I/O, and most of
> the time all your CPU cores will be loaded anyway while compiling, so
> doing configure in parallel with compiling is unlikely to save much
> time. This is why I think that without actual benchmarking we can't be
> sure how much faster it became (if it became faster at all, which is
> questionable).

Good points, and no, I can't easily provide benchmarks, both because of
the recent hardware upgrade here, and because portage itself has been
gradually improving its parallel merging abilities -- a recent update
changed the scheduling algorithm so it starts additional merges much
sooner than it did previously. (See gentoo bug 438650, fixed in portage
2.1.11.29 and 2.2.0_alpha140, both released on Oct 17. The fact that I
know about that hints at another thing I do routinely as an experienced
gentooer: I always read portage's changelog and check out any referenced
bugs that look interesting, before I upgrade portage. To the extent
practical without actually reading the individual git commits, I want to
know about package manager changes that might affect me BEFORE I do that
upgrade!)

But I believe that as core-counts rise, you're underestimating the
effects of portage's parallel merging abilities. In particular, a lot of
packages normally in @system (or deps thereof) are relatively small
packages such as grep, patch, sed... where the single-threaded configure
step takes a MUCH larger share of the total package merge time than it
does with larger packages. Similarly, the unpack and prepare phases,
plus the package phase for folks using FEATURES=buildpkg, tend to be
single-threaded.[1]

Thus, instead of serializing several dozen small, mostly single-threaded
package merges for packages like grep/sed/patch/util-linux/etc,
depending on the --jobs and --load-average numbers you feed to portage,
several of these end up getting done in parallel, with the portage
multi-job output bumping a line every few seconds because it's doing
them in parallel, instead of every minute or so because it's doing one
at a time.

Meanwhile, it should be obvious, but it's worth stating anyway: the
effect gets *MUCH* bigger as the number of cores increases. For a
dual-core, bah, not worth the trouble, as it could cause more problems
than it solves, especially if people are trying to work on other things
while portage is doing its thing in the background. I suspect the
break-over point is either triple-core or quad-core. One of the reasons
portage is getting better lately is that someone with a 32-core machine,
and a corresponding amount of memory (64 or 128 gig IIRC), has taken an
interest.

It's worth noting, as I mentioned, that I now have a 6-core, recently
upgraded from a dual-dual-core (4 cores), with a corresponding memory
upgrade, to 16 gigs.

One of the first things I noticed doing emerges was how much more
difficult it was to keep the 6-core actually peaked out at 100% CPU than
it had been with the 4-core. While I suspect there would have been a
difference on the quad-core too (as I said, I believe the break-over's
probably at 3-4 cores), it wasn't a big deal there. Staring at that
6-core running 1-2 cores at 100%, CPU-freq maxed at 3.6 GHz, while the
other 4-5 cores remained near idle at <20% utilization at the 1.4 GHz
CPU-freq minimum... was VERY frustrating. So began my drive to empty
@system and get portage properly scheduling parallel merges for former
@system packages and their deps as well!

For the quad-core plus hyperthreading (thus 8 threads, I take it?) you
mention below (4.6 GHz OC, nice! I see stock is 3.4 GHz), the boost from
killing @system's forced serialization should definitely make a
difference (unless the hyperthreading doesn't do much for that workload,
making it effectively no better than a non-hyperthreaded quad-core).
For my 6-core, it made a rather big difference, and I guarantee that if
you had the 32-core that one of the devs working on improving portage's
parallelization has, you'd be hot on the trail to improve it as well!

> As for me, I found a very effective way to speed up emerge: upgrading
> from a Core2Duo E6600 to an i7-2600K overclocked to 4.6GHz. This sped
> up compilation on my system 6x (the kernel now compiles in just 1
> minute). And to speed up most other (non-compilation) portage
> operations I use a 4GB tmpfs mount on /var/tmp/portage/.

I remember reading about the 1-minute kernel compiles on i7s. Very
impressive.

FWIW, there are a lot of variables to fill in before we can be sure
kernel build time comparisons are apples to apples (I had several more
paragraphs written on that, but decided it was a digression too far for
this post, so deleted 'em). But AFAIK, when I read about it (on phoronix
I believe), he was doing an all-yes config, so building rather more than
a typical customized-config gentooer, but was using a rather fast SSD,
which probably improved his times quite a bit compared to "spinning
rust".

But I don't know if his timings included the actual compress (and if so
with what CONFIG_KERNEL_XXX compression option), and I don't believe
they included the actual install, only the build.

That said, a 1-minute all-yes-config kernel build time is impressive
indeed, the envy of many, including me. (OTOH, my fx6100 was on sale for
$100, $109 post-tax. That's lower than pricewatch's $118 lowest quote
(shipped, no tax), and only about 40% of the $273 low quote for an
i7-2600k.)

My build, compress (CONFIG_KERNEL_XZ) and install runs ~2 minutes
(1:58-2:07 over 10+ runs, warm-cache), so yes, even if your build time
doesn't include compress and install (which it might), 1 minute is still
VERY impressive. Tho as I said, my CPU cost ~40% of the going price on
yours, so...

Meanwhile...

I too use and DEFINITELY recommend a tmpfs $PORTAGE_TMPDIR. I'm running
16 gig RAM here, and didn't want to run out of room with parallel
builds, so set a nice roomy 12G tmpfs size.
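
For reference, that's just an fstab entry; portage's default
PORTAGE_TMPDIR is /var/tmp, so mounting the tmpfs at /var/tmp/portage is
enough. The 12G size is what I use here with 16 gig RAM; scale it to
your own RAM, and the uid/gid/mode options shown are one reasonable
choice, not the only one:

```shell
# /etc/fstab: a roomy tmpfs for portage's build trees.
# size= should leave headroom for everything else running on the box.
tmpfs   /var/tmp/portage   tmpfs   size=12G,uid=portage,gid=portage,mode=0775   0 0
```

With parallel builds, remember the size needs to hold several packages'
work dirs at once, so err on the roomy side if RAM allows.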

A $PORTAGE_TMPDIR on tmpfs also reduces the I/O. At least here, the only
time I've had problems, both on the old hardware and on the new, is when
I go into swap. (On the old hardware I had swap striped across four
disks via equal priority= entries, plus 4-way md/raid0, so the kernel
could schedule swap-out vs read-in much better, and I didn't see a
problem until I hit nearly a half-gig of swap loading at once; the new
hardware is only single-disk ATM, and I see issues starting @ 80 meg or
so of swap loading at once.) But with 16 gig RAM on the new system, the
only time I see it go into swap is when I run a kernel build with
uncapped -j, thus hitting 500+ jobs and getting close enough to 16 gigs
that whether I hit swap or not depends on what else I've been doing with
the system.

Basically, I/O is thus not a problem at all with portage here, up to the
--jobs=12 --load-average=12 along with MAKEOPTS="-j20 -l15" that I
normally run, anyway. On the old system with only six gigs of RAM, I
could get portage to hit swap if I tried hard enough, but I limited
--jobs and MAKEOPTS until that wasn't an issue, and had no additional
problems.

Tho I should mention I also run PORTAGE_NICENESS=19 (and my
kernel-build/install script similarly renices itself to 19 before
starting the kernel build), which puts it in batch-scheduling mode
(idle-only scheduling, but longer timeslices).
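
Pulling the settings from the last couple paragraphs together, the
relevant make.conf fragment here looks roughly like this. The values are
the ones quoted above (tune them to your own core-count and RAM), and
putting the emerge flags in EMERGE_DEFAULT_OPTS rather than on the
command line each time is just my assumption of the convenient spot:

```shell
# /etc/portage/make.conf (excerpt)
# Let portage merge several packages in parallel, capped by load.
EMERGE_DEFAULT_OPTS="--jobs=12 --load-average=12"
# Per-package make parallelism, again load-capped.
MAKEOPTS="-j20 -l15"
# Run portage at lowest priority so builds stay out of the way.
PORTAGE_NICENESS="19"
# Build under /var/tmp (portage appends /portage), tmpfs-mounted here.
PORTAGE_TMPDIR="/var/tmp"
```

Note that --jobs multiplies with MAKEOPTS -j, which is exactly why the
--load-average and -l caps matter: they keep the worst-case combination
from flattening the box.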

If it matters, filesystem is reiserfs, iosched is cfq, drive is
sata2/ahci (amd 990fx/sb950 chipset) 2.5" seagate "spinning rust".

But I definitely agree with $PORTAGE_TMPDIR on tmpfs. It makes a HUGE
difference!

---
[1] Compression parallelism: There are parallel-threaded alternatives to
bzip2, for instance, but they have certain downsides, like decompression
only being parallel where the tarball was compressed with the same
parallel tool, and certain compression buffer nul-fill handling
differences that make them not functionally perfect drop-in
replacements. See the recent discussion on the topic on the gentoo-dev
list, for instance.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman