On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@g.o> wrote:
> On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > > >The portage library is too heavy, complicated and makes things slow.
> > > >Heavy and complicated I noticed from (trying to) look at the source,
> > > >slow by usage.
> You *really* should explain how it's heavy and complicated.
> Generalizations don't help to improve it :)
|
You're right, but to explain these I need to understand it a bit more.
For now, take it as just a user's impression.
|
> > > >time emerge # without parameters
> > > >real 0m0.614s
> > > >user 0m0.487s
> > > >sys 0m0.046s
> > > >
> > > >time emerge -pv world # 16 packages to be upgraded
> > > >real 0m22.664s
> > > >user 0m12.423s
> > > >sys 0m1.130s
> There's quite a large difference between just importing portage, and
> actually parsing your profile, determining what your use flags are
> (since profiles can define defaults, and some use flags are based upon
> packages being installed dependent on the profile, perl fex). That, and
> walking/building the depgraph, querying the cache, locking, etc.
|
These two were not related; I just put them together since I "measured"
them in sequence.

I just think that half a second just to print the "usage" message is too
much, and I have already seen it take whole seconds. But this doesn't
really matter, forget it.
|
> > > >It's too much, look at debian apt, it's fast. And I can't see why
> > > >portage is slow.
> > > >Forgive me if I'm wrong, but portage just needs to parse
> > > >/var/lib/portage/world (237 entries in my case), then for each entry
> > > >check if there is a newer version and, if so, check its
> > > >dependencies. Why 22 seconds? A hand-made version takes less than 1.
>
> Checks first level depends of all packages in world also. So that list
> just got larger :)
|
I know, but that much larger?
Anyway, I'll try to understand how you do things: what's read from
disk into memory, the data structures... Are there any documents on that?
Just reading the source is painful :/
|
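For illustration, the update check discussed above (read the world file, look for a newer version of each entry, then also look at first-level dependencies) can be sketched in a few lines of Python. This is a toy model, not portage's resolver: the data layout, the helper names (parse_version, expand_first_level, find_upgrades), the package data and the purely numeric version comparison are all assumptions made for the example.

```python
# Toy model of an "emerge -pv world" style check. NOT portage code:
# the data layout and every name below are hypothetical.

def parse_version(text):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def expand_first_level(world, deps):
    """Return world entries plus their direct (first-level) dependencies."""
    expanded = list(world)
    seen = set(world)
    for pkg in world:
        for dep in deps.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                expanded.append(dep)
    return expanded

def find_upgrades(world, installed, available, deps):
    """Map each package to (installed, best) when a newer version exists."""
    upgrades = {}
    for pkg in expand_first_level(world, deps):
        candidates = available.get(pkg, [])
        if not candidates:
            continue
        best = max(candidates, key=parse_version)
        current = installed.get(pkg)
        if current is None or parse_version(best) > parse_version(current):
            upgrades[pkg] = (current, best)
    return upgrades

# Made-up sample data standing in for the world file and the tree.
world = ["app-editors/vim", "dev-lang/python"]
installed = {"app-editors/vim": "6.3", "dev-lang/python": "2.3.4",
             "sys-libs/ncurses": "5.4"}
available = {"app-editors/vim": ["6.2", "6.3"],
             "dev-lang/python": ["2.3.4", "2.4"],
             "sys-libs/ncurses": ["5.4", "5.5"]}
deps = {"app-editors/vim": ["sys-libs/ncurses"]}

checked = expand_first_level(world, deps)
upgrades = find_upgrades(world, installed, available, deps)
```

Even in this toy version the list of packages to examine grows as soon as dependencies are followed: two world entries become three checked packages, and one of the two upgrades found is a dependency rather than a world entry.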
|
> Regarding debian apt, it's likely apples/oranges as urilith stated. One
> thing to note is that afaik, debian dependencies lack versions- they're
> basically a flat namespace.
>
> fex, if a dpkg states it deps mysql, there is mysql. Singular.
> W/ portage, well, need to determine what version is available based upon
> keywords, package.mask, and users /etc/portage/package.keywords (and
> other things).
|
As I see it, this doesn't make the algorithms worse (exponentially), it
just adds a constant... is that constant really that huge?
|
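To make the version-selection step concrete, here is a minimal sketch of filtering candidate versions by accepted keywords and a mask list. It is only an illustration under stated assumptions: the dictionary layout, the visible_versions helper and the mysql version/keyword data are invented, and portage's real visibility rules (profiles, package.keywords and the "other things" mentioned above) are richer than this.

```python
# Hypothetical sketch of picking visible versions. The data layout is
# invented for illustration; it is not how portage stores this.

def parse_version(text):
    """Turn '4.0.22' into (4, 0, 22) so versions sort numerically."""
    return tuple(int(part) for part in text.split("."))

def visible_versions(versions, accepted_keywords, masked):
    """Keep versions that carry an accepted keyword and are not masked."""
    result = []
    for version, keywords in versions.items():
        if version in masked:
            continue
        if accepted_keywords & set(keywords):
            result.append(version)
    return sorted(result, key=parse_version)

# version -> keywords declared by the (made-up) ebuild
mysql = {"4.0.22": ["x86", "ppc"], "4.1.7": ["~x86"], "5.0.1": ["~x86"]}
masked = {"5.0.1"}

stable = visible_versions(mysql, {"x86"}, masked)
testing = visible_versions(mysql, {"x86", "~x86"}, masked)
```

In this sketch the filter is a single pass over the candidate versions of one package, so it adds per-package work without changing the shape of the overall algorithm. How large that per-package cost is in practice depends on where the keyword and mask data come from, which the sketch does not model.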
|
> Note I work on portage, not dpkg/apt. So I could be talking out of my
> ass there...
|
OK, and I work on neither so far... :) [but as a Gentoo user I want to
improve my system]
|
> > I'll look at CVS, but I don't see why portage needs to be slow. As you
> > said, it's being fixed.
> Elaborate on how it's slow. There are various algs/processes that you
> could be referencing. Rough cvs improvements, 33% bash sourcing
> improvement- for those thinking parsing bash == slow portage, it's not
> the case. Users *never* see portage sourcing ebuilds for their keys
> (DEPENDS, DESCRIPTION, etc) unless they're doing overlay ebuilds, and
> the ebuilds in the overlay are _only_ sourced when they've changed. The
> improvements in bash sourcing speed in cvs were A) intended to fix env
> handling for ebuilds (ancillary benefit), and B) speed up regen for devs
> and the server that generates the metacache for rsync users.
>
> So no, bash isn't really what's slowing things down. :)
|
Good to know. From previous messages I believed that every
"emerge -s" sourced the whole portage tree :)
|
> If you're referencing the nice long pause after sync'ing, that's
> transfer of the cache from ${PORTDIR}/metadata/cache to
> /var/cache/edb/dep/${PORTDIR}; that's been sped up also. Potentially
> might have that pause eliminated also, although that would require
> ensuring the tree is readonly- that's another can of worms.
|
No, I don't care about the caching stuff, just that "emerge
something_without_deps" takes too long.
|
> Aside from that, there is searching speed, which is a bit slowed down by
> the current use of locks in the default portage_db_flat ebuild metadata
> cache. Additionally, portage_db_flat uses separate files for each cpv
> (category/package-version), so there is considerable overhead from
> opening/closing a crapload of files.
>
> Using portage_db_anydbm improves this, although it has a few issues of
> its own.
|
Hmm, here it comes. That's the part I think is slow, and my reasons
(guesses) are the ones you gave: locks and the "lots of small files"
instead of a single, real (optimized, indexed) database.

Sorry, but I was not aware of the _anydbm stuff. Where can I read more about it?
|
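The difference between one small file per cpv and a single keyed database can be sketched as follows. This uses Python's standard dbm module as a stand-in and is neither portage_db_flat nor portage_db_anydbm; the function names, the file layout and the sample entries are assumptions, and locking is left out entirely.

```python
import dbm
import os
import tempfile

def write_flat(root, entries):
    """Store one small file per cpv, e.g. root/category/package-version."""
    for cpv, description in entries.items():
        path = os.path.join(root, cpv)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(description)

def read_flat(root, cpvs):
    """One open/read/close per entry."""
    result = {}
    for cpv in cpvs:
        with open(os.path.join(root, cpv)) as handle:
            result[cpv] = handle.read()
    return result

def write_dbm(path, entries):
    """Store every entry in one keyed database file."""
    with dbm.open(path, "c") as db:
        for cpv, description in entries.items():
            db[cpv.encode()] = description.encode()

def read_dbm(path, cpvs):
    """A single open serves every lookup."""
    with dbm.open(path, "r") as db:
        return {cpv: db[cpv.encode()].decode() for cpv in cpvs}

# Made-up cache entries for the demonstration.
entries = {"app-editors/vim-6.3": "Vim, an improved vi-style text editor",
           "dev-lang/python-2.3.4": "An interpreted programming language"}

with tempfile.TemporaryDirectory() as tmp:
    flat_root = os.path.join(tmp, "flat")
    db_path = os.path.join(tmp, "cache")
    write_flat(flat_root, entries)
    write_dbm(db_path, entries)
    from_flat = read_flat(flat_root, entries)
    from_dbm = read_dbm(db_path, entries)
```

Both layouts return the same data. The flat layout pays an open and a close for every cpv, while the database pays them once per session, which is the overhead described in the quoted text. The sketch does not measure anything, so it says nothing about how large the difference is on a real tree.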
|
> If you're referencing doing a search based on description, well, the
> cache backend as mentioned above slows things down pretty majorly. Even
> with anydbm, it still has to proceed cpv by cpv- basically walk the
> entire cache, *while* verifying the cache isn't stale- eg, check the
> stored mtime, and compare it to the ebuild's mtime.
>
> Things could be sped up by treating the tree/cache on disk as readonly-
> this is something being bantered about, and may happen. Treating the
> tree/cache as readonly means we don't have to do any locking in the
> cache, nor staleness checks (less IO).
|
That's optimizing for the common case, so it's a valid assumption. It
will save us a lot of time and should cause few problems, since portage
reads much more than it writes.
|
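A minimal sketch of the staleness check described above, and of what a read-only tree would save, might look like this. The cache layout, the is_stale and lookup helpers and the trust_cache flag are hypothetical names for the example, not portage's API.

```python
import os
import tempfile

def is_stale(cache_entry, ebuild_path):
    """True when the ebuild's mtime differs from the one stored in the cache."""
    return cache_entry["mtime"] != os.stat(ebuild_path).st_mtime_ns

def lookup(cache, cpv, ebuild_path, trust_cache=False):
    """Return the cached description, or None if missing or stale.

    trust_cache=True models a read-only tree: no stat() call per entry.
    """
    entry = cache.get(cpv)
    if entry is None:
        return None
    if not trust_cache and is_stale(entry, ebuild_path):
        return None
    return entry["description"]

with tempfile.TemporaryDirectory() as tmp:
    ebuild = os.path.join(tmp, "foo-1.0.ebuild")
    with open(ebuild, "w") as handle:
        handle.write('DESCRIPTION="example"\n')
    stat = os.stat(ebuild)
    cache = {"app-misc/foo-1.0": {"mtime": stat.st_mtime_ns,
                                  "description": "example"}}

    fresh = lookup(cache, "app-misc/foo-1.0", ebuild)
    # Simulate an edited ebuild by moving its mtime one second forward.
    os.utime(ebuild, ns=(stat.st_atime_ns, stat.st_mtime_ns + 10**9))
    after_edit = lookup(cache, "app-misc/foo-1.0", ebuild)
    trusted = lookup(cache, "app-misc/foo-1.0", ebuild, trust_cache=True)
```

With the check enabled, every lookup costs one stat() on the ebuild, and a description search pays that for every cpv in the tree. With trust_cache=True the stat() disappears, at the price of returning outdated metadata if the tree was modified after all.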
|
> > and I can't see that difference between portage and apt in the area
> > portage is slow, ok apt uses a db and doesn't need to check use flags,
> > but they're orders of magnitude different. Even lemons and apples are
> > that different ;)
> See above.
>
> > > > - portage to act as a daemon, queue requests and fetch packages.
> > > >If portage could be a daemon with 3 threads: one that downloads
> > > >packages, one that compiles and one to manage the others and accept
> > > >requests; then it could schedule download to maximize download
> > > >throughput,
> parallel fetch is in cvs already. Portage 2.0.51 already supports this
> in a way,
> (emerge -f targets &> /dev/null &); emerge targets
>
> > > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
> > > portage CVS, although it doesn't use threads, because there is no way
> > > to kill processes (wget, etc.) spawned from within a thread, so you'd
> > > have stale processes after Ctrl+C'ing portage.
> Doesn't apply in this case, daemonized ebuild.sh just speeds up bash
> sourcing which most users won't see. Devs, on the other hand see it
> since they use cvs- no rsyncing of a pregenerated cache.
>
> > Great!
> > BTW, with threads I meant the concept of more than one thing running
> > in parallel; they don't need to be POSIX threads, they can be processes
> > or even one process using select()
> Currently implemented via fork. Doing long running threads in python is
> a bit trickier than you might suspect (tried that route, stopping a
> thread w/out having it check up every 5 seconds is pretty fricking
> hard/annoying).
>
> > > Jstubbs is working on an api that will make its way into a later
> > > revision of portage. As far as parsing ebuilds, they are sourced
> > > directly from bash.
> >
> > Is there any explanation/roadmap/design I can look at? Does Jstubbs read
> > this list? What are his goals, and how does he want to achieve them?
> He'd have to state his goals-
> offhand, afaik he threw in some of his goals in
> http://dev.gentoo.org/~jstubbs/portage/goals.txt
> as are a collection of mine, and some of genone (Marius Mauch).
>
> >
> > About parsing of ebuilds, what do I need to source before the ebuild
> > itself? I mean, to get things like "inherit" working.
> All of ebuild.sh. Seriously. :)
>
> Thing to note is that ebuilds aren't just run via bash /path/to/ebuild;
> a lot of functions are expected to exist for ebuilds to work, inherit fex
> (bash function).
|
Ok.
|
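On the threads-versus-fork point quoted earlier, a small sketch may show why stopping a long-running Python thread is awkward compared with stopping a child process. It is an illustration only: the worker function, the 0.01 second polling interval and the use of subprocess in place of a plain fork are assumptions for the example, not what portage does.

```python
import subprocess
import sys
import threading
import time

def worker(stop_event, log):
    """A thread can only be stopped if it checks a flag itself."""
    while not stop_event.is_set():
        log.append("tick")
        stop_event.wait(0.01)  # sleep, but wake early when asked to stop

stop_event = threading.Event()
log = []
thread = threading.Thread(target=worker, args=(stop_event, log))
thread.start()
time.sleep(0.05)
stop_event.set()          # polite request; the thread must cooperate
thread.join(timeout=2)
thread_stopped = not thread.is_alive()

# A child process, by contrast, can be terminated from the outside,
# even while it is blocked in a long sleep.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"])
child.terminate()
child.wait(timeout=5)
child_stopped = child.poll() is not None
```

The thread stops only because its loop polls the event; a thread blocked in a long call would ignore the request until that call returned. The child process is ended by the parent without any cooperation, which is one reason a process-based design is easier to interrupt cleanly.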
|
> I'd suggest grabbing
> http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through
> bin/ebuild*.sh and bin/isolated*
>
> With the exception of ebuild-daemon.sh, all of that code is required to
> create the appropriate bash environment that ebuilds expect. Even with
> that default env, eclasses exist to extend it and add new functionality.
>
> Portage *does* have issues that need correcting, calling
> patterns/design/structure changed, etc. Trying to elaborate on the
> issues above, hope it provides some insight into why things are the way
> they are (and potential avenues to check out for improving performance).
|
I'll try the CVS version.

Thank you for your time and patience in replying to my kinda rude
questions/doubts. I'll try to help as much as I can. If there are
minor tasks, I could use them to start learning portage in more depth.
|
--
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@××××××.org
ICQ#: 17249123
GPG: 0xB640E1A2 @ wwwkeys.pgp.net
|
--
gentoo-portage-dev@g.o mailing list