Gentoo Archives: gentoo-portage-dev

From:	Brian Harring <ferringb@g.o>
To:	gentoo-portage-dev@l.g.o
Subject:	Re: [gentoo-portage-dev] Current portage well designed, but badly used
Date:	Tue, 30 Nov 2004 14:19:54
Message-Id:	`1101824527.32056.102.camel@localhost.localdomain`
In Reply to:	Re: [gentoo-portage-dev] Current portage well designed, but badly used by Gustavo Barbieri

1	On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
2	> > >The portage library is too heavy, complicated and make things slow.
3	> > >Heavy and complicated I noticed from (trying to) look at the source,
4	> > >slow by usage.
5	You really should explain how it's heavy and complicated.
6	Generalizations don't help to improve it :)
7
8	> > >time emerge # without parameters
9	> > >real 0m0.614s
10	> > >user 0m0.487s
11	> > >sys 0m0.046s
12	> > >
13	> > >time emerge -pv world # 16 packages to be upgraded
14	> > >real 0m22.664s
15	> > >user 0m12.423s
16	> > >sys 0m1.130s
17	There's quite a large difference between just importing portage and
18	actually parsing your profile and determining what your use flags are
19	(profiles can define defaults, and some use flags depend on which
20	packages are installed under the profile, perl for example). On top of
21	that comes walking/building the depgraph, querying the cache, locking, etc.
22
23	> > >
24	> > >It's too much, look at debian apt, it's fast. And I can't see why
25	> > >portage is slow.
26	> > >Forgive me if I'm wrong, but portage just need to parse
27	> > >/var/lib/portage/world (237 entries in my case), them for each check
28	> > >if there is any other version greater than and if so check for
29	> > >dependencies. Why 22seconds? A hand made take less than 1.
30	It also checks the first-level depends of all packages in world. So that
31	list just got larger :)
32	Regarding debian apt, it's likely apples/oranges as urilith stated. One
33	thing to note is that afaik, debian dependencies lack versions; they're
34	basically a flat namespace.
35
36	For example, if a dpkg states it depends on mysql, there is mysql. Singular.
37	With portage, we need to determine which version is available based upon
38	keywords, package.mask, and the user's /etc/portage/package.keywords (and
39	other things).
40
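A minimal sketch of that version-selection step, in Python. This is not portage's actual code: the function names, the version numbers and the keyword sets are made up for illustration, and real version comparison and masking rules are considerably more involved.

```python
# Hypothetical, simplified data: real version strings and mask atoms
# need far more careful parsing than this.
def version_key(version):
    """Turn '4.0.22' into (4, 0, 22) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def best_visible(candidates, accepted_keywords, masked):
    """Pick the highest version that is unmasked and carries an accepted keyword.

    candidates: list of (version, set_of_keywords) pairs.
    """
    visible = [
        version
        for version, keywords in candidates
        if version not in masked and keywords & accepted_keywords
    ]
    return max(visible, key=version_key, default=None)


candidates = [
    ("4.0.20", {"x86"}),
    ("4.0.22", {"~x86"}),
    ("4.1.7", {"~x86"}),
]

stable_pick = best_visible(candidates, {"x86"}, masked=set())
testing_pick = best_visible(candidates, {"x86", "~x86"}, masked={"4.1.7"})
print(stable_pick, testing_pick)
```

The point is only that every dependency lookup involves a filter-then-pick step like this, rather than a single name lookup.
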
41	Note I work on portage, not dpkg/apt. So I could be talking out of my
42	ass there...
43
44	> I'll look at CVS, but I don't see why portage need to be slow. As you
45	> said, it's being fixed.
46	Elaborate on how it's slow. There are various algorithms/processes that you
47	could be referencing. As a rough figure for cvs, bash sourcing is about 33%
48	faster, but for those thinking parsing bash == slow portage, it's not
49	the case. Users never see portage sourcing ebuilds for their keys
50	(DEPENDS, DESCRIPTION, etc) unless they're doing overlay ebuilds, and
51	the ebuilds in the overlay are _only_ sourced when they've changed. The
52	improvements in bash sourcing speed in cvs were A) intended to fix env
53	handling for ebuilds (ancillary benefit), and B) speed up regen for devs
54	and the server that generates the metacache for rsync users.
55
56	So no, bash isn't really what's slowing things down. :)
57
58	If you're referencing the nice long pause after sync'ing, that's the
59	transfer of the cache from ${PORTDIR}/metadata/cache to
60	/var/cache/edb/dep/${PORTDIR}; that's been sped up also. That pause
61	could potentially be eliminated as well, although that would require
62	ensuring the tree is readonly, which is another can of worms.
63
64	Aside from that, there is searching speed, which is a bit slowed down by
65	the current use of locks in the default portage_db_flat ebuild metadata
66	cache. Additionally, portage_db_flat uses separate files for each cpv
67	(category/package-version), so there is considerable overhead from
68	opening/closing a crapload of files.
69
70	Using portage_db_anydbm improves this, although it has a few issues of
71	its own.
72
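To make the difference between the two layouts concrete, here is a rough Python sketch. It uses the standard dbm module as a stand-in and does not use portage_db_flat or portage_db_anydbm themselves; the cpv keys and metadata strings are invented for illustration.

```python
import dbm
import os
import tempfile

# Hypothetical cpv keys and metadata, for illustration only.
entries = {
    b"dev-db/mysql-4.0.22": b"DESCRIPTION=A fast SQL database server",
    b"sys-apps/portage-2.0.51": b"DESCRIPTION=Package manager",
}

with tempfile.TemporaryDirectory() as tmp:
    # Flat layout: one file per cpv, so one open/write/close per entry.
    for cpv, metadata in entries.items():
        path = os.path.join(tmp, "flat", cpv.decode())
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(metadata)

    # Single-database layout: one open, then keyed reads and writes.
    db_path = os.path.join(tmp, "cache")
    with dbm.open(db_path, "c") as db:
        for cpv, metadata in entries.items():
            db[cpv] = metadata
    with dbm.open(db_path, "r") as db:
        found = db[b"dev-db/mysql-4.0.22"]

print(found.decode())
```

With two entries the difference is invisible; with a whole tree's worth of cpvs, the per-file open/close cost in the flat layout is what adds up.
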
73	If you're referencing doing a search based on description, well, the
74	cache backend as mentioned above slows things down pretty majorly. Even
75	with anydbm, it still has to proceed cpv by cpv, basically walking the
76	entire cache while verifying the cache isn't stale, eg, checking the
77	stored mtime and comparing it to the ebuild's mtime.
78
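The staleness check amounts to something like the following Python sketch. The function name and the file used here are hypothetical and this is not the cache code itself; it only shows why every entry costs at least one stat call.

```python
import os
import tempfile


def is_stale(ebuild_path, stored_mtime):
    """Return True if the cached entry no longer matches the ebuild on disk."""
    try:
        current = int(os.stat(ebuild_path).st_mtime)
    except FileNotFoundError:
        # The ebuild is gone, so the cache entry is useless.
        return True
    return current != stored_mtime


with tempfile.TemporaryDirectory() as tmp:
    ebuild = os.path.join(tmp, "foo-1.0.ebuild")  # hypothetical ebuild
    with open(ebuild, "w") as handle:
        handle.write('DESCRIPTION="example"\n')

    cached_mtime = int(os.stat(ebuild).st_mtime)
    fresh = is_stale(ebuild, cached_mtime)

    # Simulate an edit by pushing the file's mtime forward.
    os.utime(ebuild, (cached_mtime + 10, cached_mtime + 10))
    changed = is_stale(ebuild, cached_mtime)

print(fresh, changed)
```
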
79	Things could be sped up by treating the tree/cache on disk as readonly;
80	this is something being bandied about, and may happen. Treating the
81	tree/cache as readonly means we don't have to do any locking in the
82	cache, nor staleness checks (less IO).
83
84	>
85	> and I can't see that difference between portage and apt in the area
86	> portage is slow, ok apt uses a db and don't need to check use flags,
87	> but they're orders of magnitude different. Even lemons and apples are
88	> that different ;)
89	See above.
90
91	> > > - portage to act as a daemon, queue requests and fetch packages.
92	> > >If portage could be a daemon with 3 threads: one that download
93	> > >packages, one that compiles and one to manage the other and accept
94	> > >requests; then it could schedule download to maximize download
95	> > >throughput,
96	Parallel fetch is in cvs already. Portage 2.0.51 already supports this
97	in a way:
98	(emerge -f targets &> /dev/null &); emerge targets
99
100	> > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
101	> > portage CVS, although it
102	> > doesnt use threads, because there is no way to kill processes (wget,
103	> > etc.) spawned from within
104	> > a thread, so youd have stale processes after Ctrl+C'ing portage.
105	Doesn't apply in this case; daemonized ebuild.sh just speeds up bash
106	sourcing, which most users won't see. Devs, on the other hand, see it
107	since they use cvs, with no rsyncing of a pregenerated cache.
108
109	>
110	> Great!
111	> BTW, with threads I meant the concept of more than one thing running
112	> in parallel, don't need to be posix threads, can be process or even
113	> one process using select()
114	Currently implemented via fork. Doing long running threads in python is
115	a bit trickier than you might suspect (tried that route, stopping a
116	thread w/out having it check up every 5 seconds is pretty fricking
117	hard/annoying).
118
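A small Python sketch of the difference, written against the current standard library rather than anything in portage. The worker function and the sleeping child are placeholders, and subprocess is used here in place of a raw fork for brevity.

```python
import subprocess
import sys
import threading


def worker(stop_event, poll_interval=0.1):
    """Placeholder worker: a thread can't be killed from outside,
    so it has to check for a stop request itself."""
    while not stop_event.wait(poll_interval):
        pass  # one unit of work would go here


# Thread: stopping is cooperative, via a flag the thread keeps polling.
stop = threading.Event()
thread = threading.Thread(target=worker, args=(stop,))
thread.start()
stop.set()
thread.join(timeout=5)
thread_stopped = not thread.is_alive()

# Separate process: the parent can simply terminate it, no polling needed.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.terminate()
child.wait(timeout=5)
child_exit = child.returncode

print(thread_stopped, child_exit is not None)
```

The thread only stops because it polls; a child process blocked in a long sleep (or a long download) can be terminated by the parent at any time.
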
119	> > Jstubbs is working on an api that will make its way into a later
120	> > revision of portage. As far as parsing
121	> > ebuilds, they are sourced directly from bash.
122	>
123	> There is any explanation/roadmap/design I can look at? Jstubbs reads
124	> this list? What's his goals, how he want to achieve it?
125	He'd have to state his goals himself.
126	Offhand, afaik he put some of his goals in
127	http://dev.gentoo.org/~jstubbs/portage/goals.txt
128	along with a collection of mine and some of genone's (Marius Mauch).
129
130	>
131	> About parsing of ebuilds, what do I need to source before the ebuild
132	> itself? I mean, to get things like "inherit" working.
133	All of ebuild.sh. Seriously. :)
134
135	Thing to note is that ebuilds aren't just run via bash /path/to/ebuild;
136	a lot of functions are expected to exist for ebuilds to work, for example
137	inherit (a bash function).
138
139	I'd suggest grabbing
140	http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through
141	bin/ebuild.sh and bin/isolated
142
143	With the exception of ebuild-daemon.sh, all of that code is required to
144	create the appropriate bash environment that ebuilds expect. Even with
145	that default env, eclasses exist to extend it and add new functionality.
146
147	Portage does have issues that need correcting: calling
148	patterns/design/structure that need changing, etc. I've tried to elaborate
149	on the issues above; hope it provides some insight into why things are the
150	way they are (and potential avenues to check out for improving performance).
151	~brian
152
153
154	--
155	gentoo-portage-dev@g.o mailing list

