Gentoo Archives: gentoo-portage-dev

From: Paul de Vrieze <pauldv@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Re: [gentoo-dev] EBUILD_FORMAT support
Date: Fri, 26 Aug 2005 14:32:37
Message-Id: 200508261258.09962.pauldv@gentoo.org
In Reply to: [gentoo-portage-dev] Re: [gentoo-dev] EBUILD_FORMAT support by Brian Harring
1 On Friday 26 August 2005 09:35, Brian Harring wrote:
2 > Any parser that doesn't support full bash syntax isn't acceptable from
3 > where I sit; re: slow down, 2.1 is around 33% faster sourcing the
4 > whole tree (some cases 60% faster, some 5%, etc). The speedups are
5 > also what allow templates to be swapped, the eapi concept.
6
7 For the toplevel of the ebuilds there are many things that are not
8 allowed. Basically things must be deterministic for the cache to work. I
9 have built an extension that would parse 98% of current ebuilds properly,
10 and much (more than 10 times) faster than the bash/ecache way. It is in
11 the shape of a python module written in C. It just ignores the functions,
12 so anything is allowed in there. As such the parser understands enough of
13 bash to support it. Even variable substitution and inherit are supported.
14 What's not supported is various kinds of uncommon substitution tricks
15 that should probably not happen in the toplevel either.
16
17 EAPI could also be used to express something like capabilities. Say
18 portage supports version 2-relaxed and version 2-strict. 2-relaxed has all
19 the bash freedom and is parsed using bash. 2-strict would allow parsing
20 by a faster parser module, but would limit the bash freedom. I don't say
21 we have to do this, but if ebuild and eclass EAPI declarations follow a
22 few very simple rules that are normally obeyed, it would be possible to
23 support this thing in the future.
24
25 One of the problems I see with the current ebuild format is that it is
26 impossible to do incompatible changes at all. This means that many
27 features that might be desired can not be implemented. EAPI can relieve
28 that. To make it easier there should be an easy way to get the EAPI of a
29 package.
30
31 >
32 > I'd note limiting the bash capabilities is a restriction that
33 > transcends anything EAPI should supply; changes to what's possible in
34 > the language (a subset of bash syntax as you're suggesting) are a
35 > separate format from where I draw the line in the sand.
36
37 What I suggest is making a policy that would make this possible in the
38 future. Note that I do not wish to restrict any bash functionality in the
39 various functions in the ebuild.
40
41 > Mainly, limiting the syntax has the undesired effect of deviating from
42 > what users/devs know already; mistakes *will* occur. QA tools can be
43 > written, but people are fallible; both in writing a QA tool, and
44 > abiding by the syntax subset allowed.
45
46 The QA tools would just be running the parser. If the parser chokes (which
47 it doesn't easily) then the ebuild does not conform to the correct
48 syntax. It's even possible to just compare the variables returned. If
49 they don't match, the format is wrong for the C parser.
50
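The comparison check described above could be sketched roughly as follows. This is only an illustration in Python, not code from the tarball: the names diff_variables and conforms are hypothetical, and both inputs are assumed to be plain dicts mapping variable names to values, one produced by the bash path and one by the C parser.

```python
def diff_variables(reference, candidate):
    """Return {name: (reference_value, candidate_value)} for every mismatch.

    `reference` is assumed to come from the bash-based path and `candidate`
    from the faster parser; a variable missing on one side shows up as None.
    """
    mismatches = {}
    for name in sorted(set(reference) | set(candidate)):
        ref_value = reference.get(name)
        cand_value = candidate.get(name)
        if ref_value != cand_value:
            mismatches[name] = (ref_value, cand_value)
    return mismatches


def conforms(reference, candidate):
    """An ebuild passes the check when both parsers agree on every variable."""
    return not diff_variables(reference, candidate)
```

A QA run would then flag any ebuild for which conforms() is false and print the mismatching variables.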
51 >
52 > > The restriction I propose would be:
53 > > - If EAPI is defined in the ebuild it should be unconditional, on
54 > > its own line in the toplevel of the ebuild before any functions are
55 > > defined. (preferably the first element after the comments and
56 > > whitespace)
57 > >
58 > > - If EAPI is not defined in the ebuild, but in an eclass, the inherit
59 > > chain should be unconditional and direct. Furthermore, in the
60 > > eclass the above rules should be followed.
61 > >
62 > > Please note that many of the conditions are already true for current
63 > > ebuilds, just portage can "handle" more.
64 >
65 > inherit chain must be unconditional anyways. re: eapi placement, I
66 > would view that as somewhat arbitrary; the question is what gain it
67 > would give.
68
69 The gain of putting it at the top would be that there are fewer chances for
70 parsers to choke on incompatible syntax. If EAPI is at the top, at
71 some point incompatible syntax might be allowed, and older parsers could
72 still retrieve the EAPI. Of course any syntax that works on 'egrep
73 "^[ \t]*EAPI[ \t]*="' should be no problem.
74
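As a rough illustration of what such a line-based lookup could look like in Python, here is a sketch that uses the same pattern as the egrep expression. The function name find_eapi and the fallback value "0" are assumptions made for the example, not part of portage, and trailing comments after the value are not handled.

```python
import re

# Same shape as the egrep pattern: optional blanks, EAPI, optional blanks, "=".
EAPI_LINE = re.compile(r'^[ \t]*EAPI[ \t]*=[ \t]*(.*)$')


def find_eapi(ebuild_text, default="0"):
    """Return the value of the first line that looks like an EAPI assignment.

    `default` is a hypothetical fallback for ebuilds that declare no EAPI.
    """
    for line in ebuild_text.splitlines():
        match = EAPI_LINE.match(line)
        if match:
            value = match.group(1).strip()
            # Strip one layer of matching single or double quotes.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            return value
    return default
```

Because only one line has to be recognised, an older parser can still read the EAPI even when the rest of the file uses syntax it does not understand.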
75 >
76 > I'd wonder about the parsing speed of your parser; the difference
77 > between parsing ebuilds and running from cache metadata is several
78 > orders of magnitude different; the current cache backend flat_list
79 > and portage design properly corrected ought to widen the gap too.
80 > General cache lookup is slow due to-
81 > A) bad call patterns, allowed by the api; N calls to get different
82 > bits of metadata from a cpv, resulting in potentially N to disk set
83 > of ops.
84 > B) default cache requires opening/closing a file per cpv lookup;
85 > syscall's are killer here.
86 > C) every metadata lookup incurs 2 stats, ebuild and cache file.
87
88 This parser was part of a stranded rewrite attempt. One of the features
89 was that it regarded packages and package instances (specific files) as
90 objects whose attributes would be lazily evaluated. That means that it
91 would parse if a value is not available, and look it up otherwise. The
92 speed of "emerge -s" is stunning on the program as it uses a directory
93 search which is orders of magnitude faster than python doing the same thing.
94
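The lazy-evaluation idea can be sketched in Python along these lines. This is not the code from the rewrite attempt; the class name LazyPackage, the parse callable and the dict-like cache are all hypothetical stand-ins chosen for the example.

```python
class LazyPackage:
    """Package instance whose metadata is fetched only on first access."""

    def __init__(self, path, parse, cache):
        self.path = path
        self._parse = parse    # hypothetical callable: path -> dict of variables
        self._cache = cache    # hypothetical dict-like store: path -> dict

    def __getattr__(self, name):
        # Only reached for attributes that are not set on the instance.
        if name.startswith("_"):
            raise AttributeError(name)
        metadata = self._cache.get(self.path)
        if metadata is None:
            # Not cached yet: parse once and remember the whole result,
            # so later attribute lookups cost no further parses.
            metadata = self._parse(self.path)
            self._cache[self.path] = metadata
        try:
            return metadata[name]
        except KeyError:
            raise AttributeError(name) from None
```

Fetching all variables in one step also avoids the pattern of N separate lookups per package mentioned in the quoted text.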
95 > Getting to the point; cache is 100x to 400x faster than sourcing for
96 > <=2.0.51. Haven't tested it under 2.1, should be different due to
97 > cache and regen fixups/rewrites.
98
99 Don't forget the fact that bash must be execed for normal parses, and that
100 python has extremely slow string handling when not using one of the
101 standard parsing modules (that work in C). To put my money where my mouth
102 is, I've tarred up my code and put it on my dev space:
103 http://dev.gentoo.org/~pauldv/portage_native-0.1.tar.bz2
104
105 Just run make in the extracted dir. The binary created is xbuildparse,
106 this is a standalone parser that takes the ebuild as argument. It will
107 look for eclasses in /usr/portage/eclass.
108
109 The python module can be built with "make xbuildparse.so", and includes a
110 little bit of help reachable through the normal python way.
111 >
112 > Back to the point, essentially, EAPI matters in two places;
113 > 1) metadata transfer from the ebuild env into python side during
114 > depends phase; has to know what to transfer key wise.
115 > 2) actual ebuild build phase executions; if it isn't the depends phase,
116 > eapi being required so that the parser can swap in the
117 > appropriate ebuild env template.
118
119 I think it also matters in actually allowing future incompatible versions
120 of ebuild formats. I don't mean to say good bye to the current format,
121 but when redesigning the format, we should now design it for
122 extensibility.
123
124 > The restrictions suggested for EAPI would only make sense if eyeing
125 > #1, an alternative parser; no reason to drop the cache unless the
126 > parser is capable of hitting the same runtime performance the cache
127 > can hit (frankly, it's not possible from where I'm sitting although
128 > the gap can be narrowed).
129
130 You're probably right, but the time needed to parse an ebuild can be
131 reduced so much that parsing is no longer the issue; building the
132 right tree is:
133
134 time ./xbuildparse /usr/portage/sys-libs/db/db-4.2.52_p2.ebuild
135 &>/dev/null
136
137 real 0m0.054s
138 user 0m0.048s
139 sys 0m0.002s
140
141 Please note that the parser is incomplete, does have some small bugs
142 (don't try it on flag-o-matic as it somehow goes into an endless loop),
143 and could probably do some things smarter.
144
145 > So... the EAPI limitations, not much for due to the conclusion above.
146 >
147 > Interested in the parser however, since ebd is effectively a pipe
148 > hack so that pythonic portage can control ebuild.sh. I (and others)
149 > have been after a bashlib for a while, just no one has crunched down
150 > and done it (easier said than done I suspect).
151
152 See it above. It does not fully understand every bash statement around.
153 Importantly, it currently does not understand the "if"
154 statement. This is easy to add though; it just wasn't added out of "policy".
155 But given that even my own ebuilds (like db) use it, it should probably
156 be added.
157
158 I do believe that the parser could be made useful for most ebuilds. This
159 would however still mean a small restriction in allowed syntax. The
160 parser module has basically one function, "parse", which parses an
161 ebuild and its eclasses, and returns a list of variables. Not all variables
162 are substituted though; I have a python function that does this. If
163 people are interested I can take a look at sanitizing my whole tree and
164 providing it.
165
166 Paul
167
168 --
169 Paul de Vrieze
170 Gentoo Developer
171 Mail: pauldv@g.o
172 Homepage: http://www.devrieze.net