Gentoo Archives: gentoo-portage-dev

From: Gustavo Barbieri <barbieri@×××××.com>
To: gentoo-portage-dev@l.g.o, ferringb@g.o
Subject: Re: [gentoo-portage-dev] Current portage well designed, but badly used
Date: Wed, 01 Dec 2004 03:56:02
Message-Id: 9ef20ef3041130195572195bea@mail.gmail.com
In Reply to: Re: [gentoo-portage-dev] Current portage well designed, but badly used by Brian Harring
1 On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@g.o> wrote:
2 > On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
3 > > > >The portage library is too heavy, complicated and makes things slow.
4 > > > >Heavy and complicated I noticed from (trying to) look at the source,
5 > > > >slow by usage.
6 > You *really* should explain how it's heavy and complicated.
7 > Generalizations don't help to improve it :)
8
9 You're right, but to explain that I need to understand it a bit more.
10 For now, take it as just a user's impression.
11
12
13 > > > >time emerge # without parameters
14 > > > >real 0m0.614s
15 > > > >user 0m0.487s
16 > > > >sys 0m0.046s
17 > > > >
18 > > > >time emerge -pv world # 16 packages to be upgraded
19 > > > >real 0m22.664s
20 > > > >user 0m12.423s
21 > > > >sys 0m1.130s
22 > There's quite a large difference between just importing portage, and
23 > actually parsing your profile, determining what your use flags are
24 > (since profiles can define defaults, and some use flags are based upon
25 > packages being installed dependent on the profile, perl fex). That, and
26 > walking/building the depgraph, querying the cache, locking, etc.
27
28 These two were not related; I just put them together since I "measured"
29 them in sequence.
30
31 I just think that 1/2 second just to print the "usage" message is too
32 much, and I have already experienced more than seconds. But this doesn't
33 really matter, forget it.
34
35
36 > > > >It's too much, look at debian apt, it's fast. And I can't see why
37 > > > >portage is slow.
38 > > > >Forgive me if I'm wrong, but portage just need to parse
39 > > > >/var/lib/portage/world (237 entries in my case), then for each check
40 > > > >if there is any other version greater than and if so check for
41 > > > >dependencies. Why 22seconds? A hand made take less than 1.
42 >
43 > Checks first level depends of all packages in world also. So that list
44 > just got larger :)
45
46 I know, but that much larger?
47 Anyway, I'll try to understand how you do things, what's read from
48 disk to memory, data structures... any documents on that? just reading
49 the source is painful :/
50
51
52 > Regarding debian apt, it's likely apples/oranges as urilith stated. One
53 > thing to note is that afaik, debian dependencies lack versions- they're
54 > basically a flat namespace.
55 >
56 > fex, if a dpkg states it deps mysql, there is mysql. Singular.
57 > W/ portage, well, need to determine what version is available based upon
58 > keywords, package.mask, and users /etc/portage/package.keywords (and
59 > other things).
60
61 As I see it, this doesn't make the algorithms worse (exponentially), it just
62 adds a constant... is that constant really that huge?
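To make the "constant per dependency" point concrete, here is a rough sketch of picking a version for a single dependency. It is not portage's resolver: `parse_version`, `best_visible` and the data layout are hypothetical names for illustration, and versions are simplified to dotted integers (real ebuild versions have suffixes and revisions).

```python
def parse_version(version):
    """Turn '4.1.7' into (4, 1, 7). Simplified: dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def best_visible(candidates, accepted_keywords, masked):
    """Return the highest version that is unmasked and carries an accepted keyword.

    candidates: dict mapping version string -> set of keywords (hypothetical layout)
    accepted_keywords: set of keywords the user accepts, e.g. {"x86", "~x86"}
    masked: set of version strings hidden by a mask
    """
    visible = [
        version
        for version, keywords in candidates.items()
        if version not in masked and accepted_keywords & keywords
    ]
    # One pass over the candidates of this package: extra work per
    # dependency, but it does not change the shape of the graph walk.
    return max(visible, key=parse_version, default=None)
```

With a flat namespace this step collapses to a single lookup; here it is one filter pass over the versions of each package, which is the extra per-dependency cost being argued about.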
63
64
65 > Note I work on portage, not dpkg/apt. So I could be talking out of my
66 > ass there...
67
68 Ok, and I work on neither so far... :) [but as a gentoo user I want to
69 improve my sys]
70
71
72 > > I'll look at CVS, but I don't see why portage need to be slow. As you
73 > > said, it's being fixed.
74 > Elaborate on how it's slow. There are various algs/processes that you
75 > could be referencing. Rough cvs improvements, 33% bash sourcing
76 > improvement- for those thinking parsing bash == slow portage, it's not
77 > the case. Users *never* see portage sourcing ebuilds for their keys
78 > (DEPENDS, DESCRIPTION, etc) unless they're doing overlay ebuilds, and
79 > the ebuilds in the overlay are _only_ sourced when they've changed. The
80 > improvements in bash sourcing speed in cvs were A) intended to fix env
81 > handling for ebuilds (ancillary benefit), and B) speed up regen for devs
82 > and the server that generates the metacache for rsync users.
83 >
84 > So no, bash isn't really what's slowing things down. :)
85
86 Good to know; from previous messages I believed that every
87 "emerge -s" sourced the whole portage tree :)
88
89
90
91 > If you're referencing the nice long pause after sync'ing, that's
92 > transfer of the cache from ${PORTDIR}/metadata/cache to
93 > /var/cache/edb/dep/${PORTDIR}; that's been sped up also. Potentially
94 > might have that pause eliminated also, although that would require
95 > ensuring the tree is readonly- that's another can of worms.
96
97 No, I don't care about the caching stuff, just that "emerge
98 something_without_deps" takes too long.
99
100
101 > Aside from that, there is searching speed, which is a bit slowed down by
102 > the current use of locks in the default portage_db_flat ebuild metadata
103 > cache. Additionally, portage_db_flat uses separate files for each cpv
104 > (category/package-version), so there is considerable overhead from
105 > opening/closing a crapload of files.
106 >
107 > Using portage_db_anydbm improves this, although it has a few issues of
108 > its own.
109
110 Hum, here it comes. That's the part I think is slow, and my reasons
111 (guesses) are the ones you gave: locks and "lots of small files"
112 instead of one, probably real/optimized/indexed, database.
113
114 Sorry, but I was not aware of the _anydbm stuff, where can I read more about it?
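As a rough illustration of the two layouts being compared (one file per cpv versus a single keyed database), here is a minimal sketch. It does not use portage's cache classes: the function names and the file naming scheme are made up, and it uses Python 3's `dbm` module (the successor of the Python 2 `anydbm` module) only as a stand-in for whatever portage_db_anydbm does internally.

```python
import dbm
import os
import tempfile


def write_flat(root, entries):
    """Flat layout: one small file per cpv (file naming here is hypothetical)."""
    for cpv, description in entries.items():
        with open(os.path.join(root, cpv.replace("/", "_")), "w") as handle:
            handle.write(description)


def read_flat(root, cpvs):
    """Every lookup pays for an open/read/close on its own file."""
    result = {}
    for cpv in cpvs:
        with open(os.path.join(root, cpv.replace("/", "_"))) as handle:
            result[cpv] = handle.read()
    return result


def write_dbm(path, entries):
    """Single-database layout: all cpvs live in one keyed store."""
    with dbm.open(path, "c") as db:
        for cpv, description in entries.items():
            db[cpv.encode()] = description.encode()


def read_dbm(path, cpvs):
    """One open, then keyed lookups inside the same database."""
    with dbm.open(path, "r") as db:
        return {cpv: db[cpv.encode()].decode() for cpv in cpvs}
```

Both variants return the same data; the difference is that the flat layout opens and closes one file per entry, while the dbm layout opens the store once. Timing the two on a few thousand entries would be the obvious next step, and locking is left out entirely.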
115
116
117 > If you're referencing doing a search based on description, well, the
118 > cache backend as mentioned above slows things down pretty majorly. Even
119 > with anydbm, it still has to proceed cpv by cpv- basically walk the
120 > entire cache, *while* verifying the cache isn't stale- eg, check the
121 > stored mtime, and compare it to the ebuilds mtime.
122 >
123 > Things could be sped up by treating the tree/cache on disk as readonly-
124 > this is something being bantered about, and may happen. Treating the
125 > tree/cache as readonly means we don't have to do any locking in the
126 > cache, nor staleness checks (less IO).
127
128 Optimizing for the common case is a valid assumption. It will save
129 us a lot of time and should cause few problems, since portage writes
130 far less often than it reads.
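The staleness check described in the quote (compare the stored mtime with the ebuild's mtime) and what a readonly tree would save can be sketched like this. The cache layout and the names `is_stale` and `lookup` are assumptions for illustration, not portage code.

```python
import os
import tempfile


def is_stale(ebuild_path, stored_mtime):
    """A cache entry is stale when the ebuild's mtime differs from the stored one."""
    return int(os.stat(ebuild_path).st_mtime) != int(stored_mtime)


def lookup(cache, cpv, ebuild_path, readonly=False):
    """Return cached metadata, or None if the entry needs regenerating.

    cache: dict mapping cpv -> (stored_mtime, metadata dict), a made-up layout.
    readonly: if True, trust the cache and skip the stat() call entirely.
    """
    stored_mtime, metadata = cache[cpv]
    if not readonly and is_stale(ebuild_path, stored_mtime):
        return None
    return metadata
```

In readonly mode the per-entry `stat()` disappears, which is the IO saving mentioned above; the price is that a changed ebuild goes unnoticed until the cache is regenerated.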
131
132
133
134 > > and I can't see that difference between portage and apt in the area
135 > > portage is slow, ok apt uses a db and don't need to check use flags,
136 > > but they're orders of magnitude different. Even lemons and apples are
137 > > that different ;)
138 > See above.
139 >
140 > > > > - portage to act as a daemon, queue requests and fetch packages.
141 > > > >If portage could be a daemon with 3 threads: one that download
142 > > > >packages, one that compiles and one to manage the other and accept
143 > > > >requests; then it could schedule download to maximize download
144 > > > >throughput,
145 > parallel fetch is in cvs already. Portage 2.0.51 already supports this
146 > in a way,
147 > (emerge -f targets &> /dev/null &); emerge targets
148 >
149 > > > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
150 > > > portage CVS, although it
151 > > > doesnt use threads, because there is no way to kill processes (wget,
152 > > > etc.) spawned from within
153 > > > a thread, so youd have stale processes after Ctrl+C'ing portage.
154 > Doesn't apply in this case, daemonized ebuild.sh just speeds up bash
155 > sourcing which most users won't see. Devs, on the other hand see it
156 > since they use cvs- no rsyncing of a pregenerated cache.
157 >
158 > > Great!
159 > > BTW, with threads I meant the concept of more than one thing running
160 > > in parallel, don't need to be posix threads, can be process or even
161 > > one process using select()
162 > Currently implemented via fork. Doing long running threads in python is
163 > a bit trickier than you might suspect (tried that route, stopping a
164 > thread w/out having it check up every 5 seconds is pretty fricking
165 > hard/annoying).
166 >
167 > > > Jstubbs is working on an api that will make its way into a later
168 > > > revision of portage. As far as parsing
169 > > > ebuilds, they are sourced directly from bash.
170 > >
171 > > There is any explanation/roadmap/design I can look at? Jstubbs reads
172 > > this list? What's his goals, how he want to achieve it?
173 > He'd have to state his goals-
174 > offhand, afaik he threw in some of his goals in
175 > http://dev.gentoo.org/~jstubbs/portage/goals.txt
176 > as are a collection of mine, and some of genone (Marius Mauch).
177 >
178 > >
179 > > About parsing of ebuilds, what do I need to source before the ebuild
180 > > itself? I mean, to get things like "inherit" working.
181 > All of ebuild.sh. Seriously. :)
182 >
183 > Thing to note is that ebuilds aren't just run via bash /path/to/ebuild;
184 > lot of functions are expected to exist for ebuilds to work, inherit fex
185 > (bash function).
186
187 Ok.
188
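On the fork-versus-thread point quoted above: a background job run as a separate process can be stopped from the outside, which a Python thread cannot. A minimal sketch, where the "fetcher" is just a child interpreter that sleeps (a stand-in, not portage's fetch code) and `start_fetcher` and `stop` are hypothetical names:

```python
import subprocess
import sys


def start_fetcher(seconds):
    """Start a child process standing in for a long-running background fetch."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )


def stop(child):
    """Ask the child to terminate and wait for it, returning its exit status."""
    child.terminate()
    return child.wait()
```

The parent does not need the child to cooperate or poll a flag; that is the property that makes the process route easier than long-running threads.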
189
190 > I'd suggest grabbing
191 > http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through
192 > bin/ebuild*.sh and bin/isolated*
193 >
194 > With the exception of ebuild-daemon.sh, all of that code is required to
195 > create the appropriate bash environment that ebuilds expect. Even with
196 > that default env, eclasses exist to extend it and add new functionality.
197 >
198 > Portage *does* have issues that need correcting, calling
199 > patterns/design/structure changed, etc. Trying to elaborate on the
200 > issues above, hope it provides some insight into why things are the way
201 > they are (and potential avenues to check out for improving performance).
202
203 I'll try the CVS.
204
205 Thank you for your time and patience in replying to my kinda rude
206 questions/doubts. I'll try to help as much as possible. If there are
207 minor tasks, I could use them to start learning portage in more depth.
208
209
210 --
211 Gustavo Sverzut Barbieri
212 ---------------------------------------
213 Computer Engineer 2001 - UNICAMP
214 GPSL - Grupo Pro Software Livre
215 Cell..: +55 (19) 9165 8010
216 Jabber: gsbarbieri@××××××.org
217 ICQ#: 17249123
218 GPG: 0xB640E1A2 @ wwwkeys.pgp.net
219
220 --
221 gentoo-portage-dev@g.o mailing list
