Gentoo Archives: gentoo-portage-dev

From: Brian Harring <ferringb@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Current portage well designed, but badly used
Date: Tue, 30 Nov 2004 14:19:54
Message-Id: 1101824527.32056.102.camel@localhost.localdomain
In Reply to: Re: [gentoo-portage-dev] Current portage well designed, but badly used by Gustavo Barbieri
1 On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
2 > > >The portage library is too heavy, complicated and make things slow.
3 > > >Heavy and complicated I noticed from (trying to) look at the source,
4 > > >slow by usage.
5 You *really* should explain how it's heavy and complicated.
6 Generalizations don't help to improve it :)
7
8 > > >time emerge # without parameters
9 > > >real 0m0.614s
10 > > >user 0m0.487s
11 > > >sys 0m0.046s
12 > > >
13 > > >time emerge -pv world # 16 packages to be upgraded
14 > > >real 0m22.664s
15 > > >user 0m12.423s
16 > > >sys 0m1.130s
17 There's quite a large difference between just importing portage, and
18 actually parsing your profile, determining what your use flags are
19 (since profiles can define defaults, and some use flags are based upon
20 packages being installed dependent on the profile, perl fex). That, and
21 walking/building the depgraph, querying the cache, locking, etc.
22
23 > > >
24 > > >It's too much, look at debian apt, it's fast. And I can't see why
25 > > >portage is slow.
26 > > >Forgive me if I'm wrong, but portage just need to parse
27 > > >/var/lib/portage/world (237 entries in my case), them for each check
28 > > >if there is any other version greater than and if so check for
29 > > >dependencies. Why 22seconds? A hand made take less than 1.
30 Checks first level depends of all packages in world also. So that list
31 just got larger :)
32 Regarding debian apt, it's likely apples/oranges as urilith stated. One
33 thing to note is that afaik, debian dependencies lack versions- they're
34 basically a flat namespace.
35
36 fex, if a dpkg states it deps mysql, there is mysql. Singular.
37 W/ portage, well, need to determine what version is available based upon
38 keywords, package.mask, and the user's /etc/portage/package.keywords (and
39 other things).
40
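To make the "which version is available" step concrete, here is a minimal sketch, not portage's actual resolver. The function names, the dotted-integer version parsing, and the sample versions are all assumptions for illustration; real portage version comparison and masking rules are considerably more involved.

```python
def parse_version(version):
    """Simplified: treat a version as dot-separated integers (assumption)."""
    return tuple(int(part) for part in version.split("."))


def best_visible(candidates, accept_keywords, masked):
    """Pick the highest version that is neither masked nor keyword-filtered.

    candidates: list of (version, set_of_keywords) pairs (hypothetical layout)
    accept_keywords: keywords the user accepts, e.g. {"x86"} or {"x86", "~x86"}
    masked: set of versions hidden by package.mask
    """
    visible = [
        version
        for version, keywords in candidates
        if version not in masked and keywords & accept_keywords
    ]
    # max() with default=None returns None when nothing is visible.
    return max(visible, key=parse_version, default=None)


if __name__ == "__main__":
    # Made-up versions of some package, purely illustrative.
    candidates = [("1.2", {"x86"}), ("1.10", {"~x86"}), ("2.0", {"x86"})]
    print(best_visible(candidates, {"x86"}, masked={"2.0"}))
```

Even in this toy form, every candidate version has to be checked against the mask list and the accepted keywords before a dependency can be resolved, which a flat "package name only" namespace never has to do.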
41 Note I work on portage, not dpkg/apt. So I could be talking out of my
42 ass there...
43
44 > I'll look at CVS, but I don't see why portage need to be slow. As you
45 > said, it's being fixed.
46 Elaborate on how it's slow. There are various algs/processes that you
47 could be referencing. Rough cvs improvements, 33% bash sourcing
48 improvement- for those thinking parsing bash == slow portage, it's not
49 the case. Users *never* see portage sourcing ebuilds for their keys
50 (DEPENDS, DESCRIPTION, etc) unless they're doing overlay ebuilds, and
51 the ebuilds in the overlay are _only_ sourced when they've changed. The
52 improvements in bash sourcing speed in cvs were A) intended to fix env
53 handling for ebuilds (ancillary benefit), and B) speed up regen for devs
54 and the server that generates the metacache for rsync users.
55
56 So no, bash isn't really what's slowing things down. :)
57
58 If you're referencing the nice long pause after sync'ing, that's
59 transfer of the cache from ${PORTDIR}/metadata/cache to
60 /var/cache/edb/dep/${PORTDIR}; that's been sped up also. Potentially
61 might have that pause eliminated also, although that would require
62 ensuring the tree is readonly- that's another can of worms.
63
64 Aside from that, there is searching speed, which is a bit slowed down by
65 the current use of locks in the default portage_db_flat ebuild metadata
66 cache. Additionally, portage_db_flat uses separate files for each cpv
67 (category/package-version), so there is considerable overhead from
68 opening/closing a crapload of files.
69
70 Using portage_db_anydbm improves this, although it has a few issues of
71 its own.
72
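The difference between one file per cpv and a single keyed database can be sketched roughly as follows. This is not the portage_db_flat or portage_db_anydbm code; it uses Python's standard `dbm` module as a stand-in, and the helper names and the "store only a description string" layout are assumptions.

```python
import dbm
import os


def write_flat(root, entries):
    """Flat layout: one small file per cpv (category/package-version)."""
    for cpv, description in entries.items():
        path = os.path.join(root, cpv)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(description)


def read_flat(root, cpvs):
    """Every lookup costs an open/read/close on its own file."""
    result = {}
    for cpv in cpvs:
        with open(os.path.join(root, cpv)) as handle:
            result[cpv] = handle.read()
    return result


def write_dbm(path, entries):
    """Keyed layout: all entries live in one database opened once."""
    with dbm.open(path, "c") as db:
        for cpv, description in entries.items():
            db[cpv] = description


def read_dbm(path, cpvs):
    """One open for the whole batch of lookups; values come back as bytes."""
    with dbm.open(path, "r") as db:
        return {cpv: db[cpv].decode() for cpv in cpvs}
```

With thousands of cpvs in the tree, the flat variant pays the open/close cost thousands of times per walk, while the keyed variant pays it once.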
73 If you're referencing doing a search based on description, well, the
74 cache backend as mentioned above slows things down pretty majorly. Even
75 with anydbm, it still has to proceed cpv by cpv- basically walk the
76 entire cache, *while* verifying the cache isn't stale- eg, check the
77 stored mtime, and compare it to the ebuild's mtime.
78
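The walk-and-verify loop for a description search can be sketched like this. It is only an illustration of the idea, not the real cache code: the dict-based cache, the "mtime"/"DESCRIPTION" entry keys, and the function names are assumptions.

```python
import os


def is_stale(stored_mtime, ebuild_path):
    """True if the ebuild changed since the entry was cached, or is gone."""
    try:
        return int(os.stat(ebuild_path).st_mtime) != int(stored_mtime)
    except OSError:
        return True


def search_descriptions(cache, ebuild_paths, needle):
    """Walk the whole cache cpv by cpv, checking staleness on every entry.

    cache: {cpv: {"mtime": int, "DESCRIPTION": str}} (hypothetical layout)
    ebuild_paths: {cpv: path to the ebuild on disk}
    """
    hits = []
    for cpv, entry in cache.items():
        # One stat() per entry: this is the extra IO a readonly tree avoids.
        if is_stale(entry["mtime"], ebuild_paths[cpv]):
            continue  # a real implementation would regenerate the entry here
        if needle.lower() in entry["DESCRIPTION"].lower():
            hits.append(cpv)
    return hits
```

If the tree and cache were treated as readonly, the `is_stale` call (and the locking around it) could simply be dropped from the loop.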
79 Things could be sped up by treating the tree/cache on disk as readonly-
80 this is something being bandied about, and may happen. Treating the
81 tree/cache as readonly means we don't have to do any locking in the
82 cache, nor staleness checks (less IO).
83
84 >
85 > and I can't see that difference between portage and apt in the area
86 > portage is slow, ok apt uses a db and don't need to check use flags,
87 > but they're orders of magnitude different. Even lemons and apples are
88 > that different ;)
89 See above.
90
91 > > > - portage to act as a daemon, queue requests and fetch packages.
92 > > >If portage could be a daemon with 3 threads: one that download
93 > > >packages, one that compiles and one to manage the other and accept
94 > > >requests; then it could schedule download to maximize download
95 > > >throughput,
96 parallel fetch is in cvs already. Portage 2.0.51 already supports this
97 in a way,
98 (emerge -f targets &> /dev/null &); emerge targets
99
100 > > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
101 > > portage CVS, although it
102 > > doesnt use threads, because there is no way to kill processes (wget,
103 > > etc.) spawned from within
104 > > a thread, so youd have stale processes after Ctrl+C'ing portage.
105 Doesn't apply in this case, daemonized ebuild.sh just speeds up bash
106 sourcing which most users won't see. Devs, on the other hand see it
107 since they use cvs- no rsyncing of a pregenerated cache.
108
109 >
110 > Great!
111 > BTW, with threads I meant the concept of more than one thing running
112 > in parallel, don't need to be posix threads, can be process or even
113 > one process using select()
114 Currently implemented via fork. Doing long running threads in python is
115 a bit trickier than you might suspect (tried that route, stopping a
116 thread w/out having it check up every 5 seconds is pretty fricking
117 hard/annoying).
118
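The thread problem mentioned above comes down to this: Python's `threading` module has no call to kill a thread from outside, so a long-running thread has to poll a stop flag itself. A minimal sketch of that cooperative pattern follows; the worker body and the function names are placeholders, not anything from portage.

```python
import threading
import time


def worker(stop_event, results, poll_interval=0.05):
    """Do small units of work, checking the stop flag between each one."""
    units_done = 0
    while not stop_event.is_set():
        units_done += 1  # stand-in for one small chunk of real work
        # wait() doubles as the sleep and returns early once the flag is set.
        stop_event.wait(poll_interval)
    results.append(units_done)


def run_for(seconds):
    """Start the worker, let it run, then ask it to stop and wait for it."""
    stop_event = threading.Event()
    results = []
    thread = threading.Thread(target=worker, args=(stop_event, results))
    thread.start()
    time.sleep(seconds)
    stop_event.set()  # the only way to stop it: the thread must notice this
    thread.join(timeout=2)
    return (not thread.is_alive()), results
```

The catch is that the worker only stops between units of work, so anything blocking for a long time inside one unit (a download, a compile) cannot be interrupted this way. A forked child, by contrast, can simply be sent a signal.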
119 > > Jstubbs is working on an api that will make its way into a later
120 > > revision of portage. As far as parsing
121 > > ebuilds, they are sourced directly from bash.
122 >
123 > There is any explanation/roadmap/design I can look at? Jstubbs reads
124 > this list? What's his goals, how he want to achieve it?
125 He'd have to state his goals-
126 offhand, afaik he threw in some of his goals in
127 http://dev.gentoo.org/~jstubbs/portage/goals.txt
128 along with a collection of mine, and some of genone's (Marius Mauch).
129
130 >
131 > About parsing of ebuilds, what do I need to source before the ebuild
132 > itself? I mean, to get things like "inherit" working.
133 All of ebuild.sh. Seriously. :)
134
135 Thing to note is that ebuilds aren't just run via bash /path/to/ebuild;
136 lot of functions are expected to exist for ebuilds to work, inherit fex
137 (bash function).
138
139 I'd suggest grabbing
140 http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through
141 bin/ebuild*.sh and bin/isolated*
142
143 With the exception of ebuild-daemon.sh, all of that code is required to
144 create the appropriate bash environment that ebuilds expect. Even with
145 that default env, eclasses exist to extend it and add new functionality.
146
147 Portage *does* have issues that need correcting, calling
148 patterns/design/structure changed, etc. Trying to elaborate on the
149 issues above, hope it provides some insight into why things are the way
150 they are (and potential avenues to check out for improving performance).
151 ~brian
152
153
154 --
155 gentoo-portage-dev@g.o mailing list
