On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > >The portage library is too heavy, complicated and make things slow.
> > >Heavy and complicated I noticed from (trying to) look at the source,
> > >slow by usage.
You *really* should explain how it's heavy and complicated.
Generalizations don't help to improve it :)
|
> > >time emerge # without parameters
> > >real 0m0.614s
> > >user 0m0.487s
> > >sys 0m0.046s
> > >
> > >time emerge -pv world # 16 packages to be upgraded
> > >real 0m22.664s
> > >user 0m12.423s
> > >sys 0m1.130s
There's quite a large difference between just importing portage and
actually parsing your profile and determining what your use flags are
(profiles can define defaults, and some use flags are based upon
packages being installed, depending on the profile; perl, for example).
That, and walking/building the depgraph, querying the cache, locking, etc.
|
> > >
> > >It's too much, look at debian apt, it's fast. And I can't see why
> > >portage is slow.
> > >Forgive me if I'm wrong, but portage just need to parse
> > >/var/lib/portage/world (237 entries in my case), them for each check
> > >if there is any other version greater than and if so check for
> > >dependencies. Why 22seconds? A hand made take less than 1.
It also checks the first-level depends of all packages in world. So that
list just got larger :)
Regarding debian apt, it's likely apples/oranges as urilith stated. One
thing to note is that afaik, debian dependencies lack versions; they're
basically a flat namespace.
|
E.g., if a dpkg states it deps on mysql, there is mysql. Singular.
W/ portage, well, we need to determine what version is available based
upon keywords, package.mask, and the user's /etc/portage/package.keywords
(and other things).
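A rough sketch of that selection step in Python. Everything here is made up for illustration (the `Candidate` tuple, `pick_visible`, the sample versions and keywords); it is not portage's actual code, and real visibility checks and version comparison are considerably more involved.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one available version of a package."""
    version: tuple        # e.g. (4, 0, 22); real version ordering is more complex
    keywords: frozenset   # e.g. {"x86", "~amd64"}


def pick_visible(candidates, accept_keywords, masked_versions):
    """Return the highest version that is unmasked and keyword-accepted."""
    visible = [
        c for c in candidates
        if c.version not in masked_versions      # stand-in for package.mask
        and c.keywords & accept_keywords         # stand-in for keyword checks
    ]
    return max(visible, key=lambda c: c.version, default=None)


# Made-up sample data: three versions of one package.
mysql = [
    Candidate((4, 0, 22), frozenset({"x86"})),
    Candidate((4, 1, 7), frozenset({"~x86"})),
    Candidate((5, 0, 1), frozenset({"~x86"})),
]
masked = {(5, 0, 1)}

stable = pick_visible(mysql, frozenset({"x86"}), masked)
testing = pick_visible(mysql, frozenset({"x86", "~x86"}), masked)
```

The same dependency name resolves to different versions depending on what the user accepts, which is the extra work a flat namespace does not have to do.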
|
Note I work on portage, not dpkg/apt. So I could be talking out of my
ass there...
|
> I'll look at CVS, but I don't see why portage need to be slow. As you
> said, it's being fixed.
Elaborate on how it's slow. There are various algs/processes that you
could be referencing. Rough cvs improvements: 33% bash sourcing
improvement. For those thinking parsing bash == slow portage, it's not
the case. Users *never* see portage sourcing ebuilds for their keys
(DEPEND, DESCRIPTION, etc) unless they're doing overlay ebuilds, and
the ebuilds in the overlay are _only_ sourced when they've changed. The
improvements in bash sourcing speed in cvs were A) intended to fix env
handling for ebuilds (ancillary benefit), and B) to speed up regen for
devs and the server that generates the metacache for rsync users.
|
So no, bash isn't really what's slowing things down. :)
|
If you're referencing the nice long pause after sync'ing, that's the
transfer of the cache from ${PORTDIR}/metadata/cache to
/var/cache/edb/dep/${PORTDIR}; that's been sped up also. That pause
could potentially be eliminated too, although that would require
ensuring the tree is readonly, and that's another can of worms.
|
Aside from that, there is searching speed, which is a bit slowed down by
the current use of locks in the default portage_db_flat ebuild metadata
cache. Additionally, portage_db_flat uses separate files for each cpv
(category/package-version), so there is considerable overhead from
opening/closing a crapload of files.
|
Using portage_db_anydbm improves this, although it has a few issues of
its own.
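To make the difference concrete, here is a minimal sketch of the two layouts in Python: one file per cpv versus a single dbm file. It uses the standard `dbm` module as a stand-in; the function names and sample entries are made up, and this is not how portage_db_flat or portage_db_anydbm are actually written.

```python
import dbm
import os
import tempfile


def write_flat(root, entries):
    """One file per cpv: every entry costs its own open/close."""
    for cpv, description in entries.items():
        path = os.path.join(root, cpv)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(description)


def read_flat(root, cpvs):
    result = {}
    for cpv in cpvs:
        with open(os.path.join(root, cpv)) as f:   # one open() per entry
            result[cpv] = f.read()
    return result


def write_dbm(path, entries):
    """Single database file: opened once, then keyed lookups."""
    with dbm.open(path, "c") as db:
        for cpv, description in entries.items():
            db[cpv.encode()] = description.encode()


def read_dbm(path, cpvs):
    with dbm.open(path, "r") as db:                # one open() for everything
        return {cpv: db[cpv.encode()].decode() for cpv in cpvs}


# Made-up sample entries.
entries = {
    "dev-db/mysql-4.0.22": "A fast SQL database server",
    "dev-lang/perl-5.8.5": "Larry Wall's Practical Extraction and Report Language",
}

with tempfile.TemporaryDirectory() as tmp:
    write_flat(os.path.join(tmp, "flat"), entries)
    write_dbm(os.path.join(tmp, "cache"), entries)
    from_flat = read_flat(os.path.join(tmp, "flat"), entries)
    from_dbm = read_dbm(os.path.join(tmp, "cache"), entries)
```

With two entries the difference is invisible; with a whole tree of cpvs, the per-entry open/close in the flat layout is where the overhead comes from.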
|
If you're referencing doing a search based on description, well, the
cache backend as mentioned above slows things down pretty majorly. Even
with anydbm, it still has to proceed cpv by cpv: basically walk the
entire cache, *while* verifying the cache isn't stale, e.g. check the
stored mtime and compare it to the ebuild's mtime.
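The walk-and-verify loop described above looks roughly like this in Python. The cache layout (a dict of cpv to stored mtime, ebuild path and description) and the function names are assumptions for the sake of the example, not portage's real structures.

```python
import os
import tempfile


def is_stale(cached_mtime, ebuild_path):
    """True when the ebuild changed (or vanished) since the entry was cached."""
    try:
        return int(os.stat(ebuild_path).st_mtime) != cached_mtime
    except FileNotFoundError:
        return True


def search_descriptions(cache, needle):
    """cache: {cpv: (stored_mtime, ebuild_path, description)} (hypothetical)."""
    hits = []
    for cpv, (mtime, path, description) in cache.items():
        if is_stale(mtime, path):      # one stat() per entry, every search
            continue                   # a real implementation would regenerate here
        if needle.lower() in description.lower():
            hits.append(cpv)
    return hits


with tempfile.TemporaryDirectory() as tmp:
    fresh = os.path.join(tmp, "fresh-1.0.ebuild")
    changed = os.path.join(tmp, "changed-1.0.ebuild")
    for path in (fresh, changed):
        with open(path, "w") as f:
            f.write("# placeholder ebuild\n")
        os.utime(path, (1000, 1000))   # pin atime/mtime to a known value

    cache = {
        "app-misc/fresh-1.0": (1000, fresh, "A database tool"),
        "app-misc/changed-1.0": (999, changed, "Another database tool"),
    }
    hits = search_descriptions(cache, "database")
```

Only the entry whose stored mtime still matches is searched; the stat() on every entry is the extra IO that goes away if the tree can be treated as readonly.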
|
Things could be sped up by treating the tree/cache on disk as readonly;
this is something being bandied about, and may happen. Treating the
tree/cache as readonly means we don't have to do any locking in the
cache, nor staleness checks (less IO).
|
>
> and I can't see that difference between portage and apt in the area
> portage is slow, ok apt uses a db and don't need to check use flags,
> but they're orders of magnitude different. Even lemons and apples are
> that different ;)
See above.
|
> > > - portage to act as a daemon, queue requests and fetch packages.
> > >If portage could be a daemon with 3 threads: one that download
> > >packages, one that compiles and one to manage the other and accept
> > >requests; then it could schedule download to maximize download
> > >throughput,
Parallel fetch is in cvs already. Portage 2.0.51 already supports this
in a way:
(emerge -f targets &> /dev/null &); emerge targets
|
> > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
> > portage CVS, although it doesnt use threads, because there is no way
> > to kill processes (wget, etc.) spawned from within a thread, so youd
> > have stale processes after Ctrl+C'ing portage.
Doesn't apply in this case; daemonized ebuild.sh just speeds up bash
sourcing, which most users won't see. Devs, on the other hand, see it
since they use cvs: no rsyncing of a pregenerated cache.
|
>
> Great!
> BTW, with threads I meant the concept of more than one thing running
> in parallel, don't need to be posix threads, can be process or even
> one process using select()
Currently implemented via fork. Doing long running threads in python is
a bit trickier than you might suspect (tried that route; stopping a
thread w/out having it check up every 5 seconds is pretty fricking
hard/annoying).
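A small Python illustration of that difference, as a sketch only. It uses `subprocess` rather than a bare fork, and the sleeping child and polling worker are placeholders: a separate process can be stopped from the outside at any point, while a thread has no kill switch and has to poll a flag.

```python
import subprocess
import sys
import threading

# Process: the parent can stop it whenever it likes.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"]  # placeholder job
)
child.terminate()              # no cooperation needed from the child
child.wait(timeout=10)
child_stopped = child.returncode is not None

# Thread: there is no terminate(); the worker must check in periodically.
stop = threading.Event()


def worker():
    while not stop.is_set():
        # Placeholder for a slice of real work; waits up to 0.05 s and
        # returns early once the event is set.
        stop.wait(0.05)


thread = threading.Thread(target=worker)
thread.start()
stop.set()                     # only takes effect at the next check
thread.join(timeout=10)
thread_stopped = not thread.is_alive()
```

If the worker were blocked in one long call instead of looping, setting the flag would do nothing until that call returned, which is the annoying part.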
|
> > Jstubbs is working on an api that will make its way into a later
> > revision of portage. As far as parsing ebuilds, they are sourced
> > directly from bash.
>
> There is any explanation/roadmap/design I can look at? Jstubbs reads
> this list? What's his goals, how he want to achieve it?
He'd have to state his goals himself.
Offhand, afaik he threw in some of his goals in
http://dev.gentoo.org/~jstubbs/portage/goals.txt
along with a collection of mine, and some of genone's (Marius Mauch).
|
>
> About parsing of ebuilds, what do I need to source before the ebuild
> itself? I mean, to get things like "inherit" working.
All of ebuild.sh. Seriously. :)
|
Thing to note is that ebuilds aren't just run via bash /path/to/ebuild;
a lot of functions are expected to exist for ebuilds to work, e.g.
inherit (a bash function).
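A quick way to see this, sketched in Python driving bash (it assumes bash is on the PATH). The two-line ebuild and the no-op `inherit` stub are made up for the demonstration; the real inherit in ebuild.sh does actual work (pulling in eclasses), so this only shows that the function has to exist before the ebuild is sourced.

```python
import os
import subprocess
import tempfile

# Made-up minimal ebuild: calls inherit, then sets one metadata key.
EBUILD = 'inherit eutils\nDESCRIPTION="Example package"\n'

# No-op stand-in for the real inherit function (hypothetical stub).
STUB = "inherit() { :; }\n"


def source_ebuild(path, prelude=""):
    """Source the ebuild in a fresh bash and print DESCRIPTION."""
    script = prelude + 'source "$1"; echo "$DESCRIPTION"'
    return subprocess.run(
        ["bash", "-c", script, "bash", path],
        capture_output=True,
        text=True,
        env={**os.environ, "LC_ALL": "C"},   # keep error messages in English
    )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example-1.0.ebuild")
    with open(path, "w") as f:
        f.write(EBUILD)
    bare = source_ebuild(path)                  # plain bash, nothing predefined
    stubbed = source_ebuild(path, prelude=STUB)  # inherit defined first
```

In the bare run bash complains on stderr that it cannot find `inherit`; with the stub defined first, the ebuild sources cleanly and DESCRIPTION comes back.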
|
I'd suggest grabbing
http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through
bin/ebuild*.sh and bin/isolated*
|
With the exception of ebuild-daemon.sh, all of that code is required to
create the appropriate bash environment that ebuilds expect. Even with
that default env, eclasses exist to extend it and add new functionality.
|
Portage *does* have issues that need correcting, calling
patterns/design/structure changed, etc. Trying to elaborate on the
issues above; hope it provides some insight into why things are the way
they are (and potential avenues to check out for improving performance).
~brian
|
--
gentoo-portage-dev@g.o mailing list