On Friday 26 August 2005 09:35, Brian Harring wrote:
> Any parser that doesn't support full bash syntax isn't acceptable from
> where I sit; re: slow down, 2.1 is around 33% faster sourcing the
> whole tree (some cases 60% faster, some 5%, etc). The speedups are
> also what allow templates to be swapped, the eapi concept.

At the toplevel of an ebuild there are many things that are not allowed;
basically things must be deterministic for the cache to work. I have
built an extension that parses 98% of current ebuilds properly, and much
(more than 10 times) faster than the bash/ecache way. It takes the shape
of a python module written in C. It simply ignores function bodies, so
anything is allowed in there, and it understands enough bash to handle
the toplevel. Even variable substitution and inherit are supported. What
is not supported are various uncommon substitution tricks that should
probably not appear at the toplevel either.

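To make "it just ignores the functions" concrete, here is a toy Python sketch of a toplevel scanner. It is not the C module described above. It assumes one-line NAME=value assignments, function headers written as name() { on a single line, and braces that balance within each line; function bodies are skipped by counting braces.

```python
import re

# One-line toplevel assignment: NAME="v", NAME='v' or NAME=v.
ASSIGN_RE = re.compile(r"""^([A-Za-z_][A-Za-z0-9_]*)=(?:"([^"]*)"|'([^']*)'|(\S*))\s*$""")
# Start of a function definition written as "name() {" on one line (assumption).
FUNC_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*\s*\(\)\s*\{")


def scan_toplevel(text):
    """Collect toplevel assignments and inherited eclasses, skipping function bodies.

    Toy illustration only: multi-line values, conditionals and substitution
    tricks are not handled.
    """
    variables = {}
    inherits = []
    depth = 0  # brace depth; > 0 means we are inside a function body
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if depth > 0 or FUNC_RE.match(line):
            # Inside (or entering) a function: only track braces to find its end.
            depth += line.count("{") - line.count("}")
            continue
        if line.startswith("inherit "):
            inherits.extend(line.split()[1:])
            continue
        m = ASSIGN_RE.match(line)
        if m:
            # Exactly one of the three value groups participated in the match.
            value = next(g for g in m.groups()[1:] if g is not None)
            variables[m.group(1)] = value
    return variables, inherits
```

Anything the scanner cannot classify is silently ignored, which is exactly where a real parser would have to decide between guessing and rejecting the ebuild.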
Using EAPI would also make it possible to treat versions as capabilities.
Say portage supports version 2-relaxed and version 2-strict. 2-relaxed
has all the bash freedom and is parsed using bash. 2-strict would allow
parsing by a faster parser module, but would limit the bash freedom. I'm
not saying we have to do this, but if ebuild and eclass EAPI declarations
follow a few very simple rules that are normally obeyed anyway, it would
be possible to support this in the future.

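The relaxed/strict split could be expressed as a small dispatch table. This is a hypothetical Python sketch: the EAPI names come from the example above, not from anything portage defines, and the backend callables stand in for bash sourcing and a fast parser.

```python
from typing import Callable, Dict

# A metadata backend takes an ebuild path and returns its toplevel variables.
Backend = Callable[[str], Dict[str, str]]

# Hypothetical capability table following the 2-relaxed/2-strict example.
BACKEND_FOR_EAPI = {
    "2-relaxed": "bash",  # full bash freedom, so bash has to source it
    "2-strict": "fast",   # restricted toplevel, safe for the fast parser
}


def load_metadata(eapi: str, path: str, backends: Dict[str, Backend]) -> Dict[str, str]:
    """Route an ebuild to the backend its declared EAPI permits."""
    try:
        kind = BACKEND_FOR_EAPI[eapi]
    except KeyError:
        # An unknown EAPI is rejected rather than guessed at.
        raise ValueError(f"unsupported EAPI: {eapi!r}") from None
    return backends[kind](path)
```

The point of the table is that the EAPI, not a heuristic, decides whether the faster path is allowed.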
One of the problems I see with the current ebuild format is that it is
impossible to make incompatible changes at all. This means that many
desirable features cannot be implemented. EAPI can relieve that. To make
it easier there should be a simple way to get the EAPI of a package.

>
> I'd note limiting the bash capabilities is a restriction that
> transcends anything EAPI should supply; changes to what's possible in
> the language (a subset of bash syntax as you're suggesting) are a
> separate format from where I draw the line in the sand.

What I suggest is making a policy that would make this possible in the
future. Note that I do not wish to restrict any bash functionality in the
various functions in the ebuild.

> Mainly, limiting the syntax has the undesired effect of deviating from
> what users/devs know already; mistakes *will* occur. QA tools can be
> written, but people are fallible; both in writing a QA tool, and
> abiding by the syntax subset allowed.

The QA tools would just run the parser. If the parser chokes (which it
doesn't easily) then the ebuild does not conform to the correct syntax.
It's even possible to just compare the variables returned: if they don't
match those obtained from bash, the format is wrong for the C parser.

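The comparison check could be as simple as the following Python sketch. It assumes both the bash run and the fast parser yield a dict of variable names to values; metadata_mismatches is a hypothetical helper, and whitespace is normalized on the assumption that bash and a parser may legitimately differ only there.

```python
from typing import Dict, List, Optional


def _norm(value: Optional[str]) -> Optional[str]:
    # Collapse runs of whitespace so "x86  amd64" equals "x86 amd64".
    return None if value is None else " ".join(value.split())


def metadata_mismatches(reference: Dict[str, str], candidate: Dict[str, str]) -> List[str]:
    """List variables on which the fast parser (candidate) disagrees with bash (reference)."""
    problems = []
    for key in sorted(set(reference) | set(candidate)):
        want, got = reference.get(key), candidate.get(key)
        if _norm(want) != _norm(got):
            problems.append(f"{key}: bash={want!r} parser={got!r}")
    return problems
```

An empty list means the ebuild stays inside the subset the fast parser understands; anything else is what the QA tool would report.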
>
> > The restriction I propose would be:
> > - If EAPI is defined in the ebuild it should be unconditional, on
> > its own line in the toplevel of the ebuild before any functions are
> > defined. (preferably the first element after the comments and
> > whitespace)
> >
> > - If EAPI is not defined in the ebuild, but in an eclass, the inherit
> > chain should be unconditional and direct. Furthermore, in the
> > eclass the above rules should be followed.
> >
> > Please note that many of the conditions are already true for current
> > ebuilds, just portage can "handle" more.
>
> inherit chain must be unconditional anyways. re: eapi placement, I
> would view that as somewhat arbitrary; the question is what gain it
> would give.

The gain of putting it at the top would be that there is less chance of a
parser choking on incompatible syntax before it reaches the EAPI. If EAPI
is at the top, incompatible syntax might be allowed at some point, and
older parsers could still retrieve the EAPI. Of course any syntax that
matches 'egrep "^[ \t]*EAPI[ \t]*="' should be no problem.

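Under the two proposed rules, a cheap extractor only needs the egrep pattern plus the inherit lines. The following Python sketch assumes EAPI and inherit appear as plain toplevel lines before the first function. find_eapi and its read_eclass callback are hypothetical, and letting the first inherited eclass that declares an EAPI win is an assumption, since the proposal does not fix an order.

```python
import re
from typing import Callable, Optional, Set

# Python equivalent of egrep "^[ \t]*EAPI[ \t]*=", additionally capturing the value.
EAPI_RE = re.compile(r"""^[ \t]*EAPI[ \t]*=[ \t]*["']?([^"'\s#]*)""")
INHERIT_RE = re.compile(r"^[ \t]*inherit[ \t]+(.+)$")
FUNC_RE = re.compile(r"^[ \t]*[A-Za-z_][A-Za-z0-9_-]*[ \t]*\(\)")


def find_eapi(text: str, read_eclass: Callable[[str], str],
              _seen: Optional[Set[str]] = None) -> Optional[str]:
    """Return the EAPI declared by an ebuild or, failing that, by its eclasses."""
    seen = set() if _seen is None else _seen
    inherited = []
    for line in text.splitlines():
        if FUNC_RE.match(line):
            break  # by the proposed rule, EAPI never follows a function
        m = EAPI_RE.match(line)
        if m:
            return m.group(1)
        m = INHERIT_RE.match(line)
        if m:
            inherited.extend(m.group(1).split())
    # No EAPI in this file: follow the (unconditional) inherit chain.
    for name in inherited:
        if name in seen:
            continue  # guard against eclasses inheriting each other
        seen.add(name)
        eapi = find_eapi(read_eclass(name), read_eclass, seen)
        if eapi is not None:
            return eapi
    return None
```

Nothing after the EAPI line (or after the inherit lines, for an eclass-provided EAPI) has to be understood, which is what lets an old extractor cope with newer syntax further down.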
>
> I'd wonder about the parsing speed of your parser; the difference
> between parsing ebuilds and running from cache metadata is several
> orders of magnitude; the current cache backend flat_list and portage
> design, properly corrected, ought to widen the gap too.
> General cache lookup is slow due to:
> A) bad call patterns, allowed by the api; N calls to get different
> bits of metadata from a cpv, resulting in potentially N sets of disk
> ops.
> B) default cache requires opening/closing a file per cpv lookup;
> syscalls are killer here.
> C) every metadata lookup incurs 2 stats, ebuild and cache file.

This parser was part of a stranded rewrite attempt. One of its features
was that it regarded packages and package instances (specific files) as
objects whose attributes would be lazily evaluated. That means it would
parse if the data was not available, and look it up otherwise. The speed
of "emerge -s" is stunning in that program, as it uses a directory search
which is orders of magnitude faster than python doing the same thing.

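A minimal Python sketch of such a lazily evaluated package instance, assuming hypothetical cache_lookup and parse callbacks. It also fetches the whole metadata entry at once, which sidesteps the N-lookups-per-cpv call pattern in point A above.

```python
from typing import Callable, Dict, Optional


class PackageInstance:
    """A single ebuild whose metadata is evaluated lazily on first access.

    cache_lookup returns the cached metadata dict for a cpv, or None on a miss;
    parse reads the ebuild itself. Both are hypothetical callbacks.
    """

    def __init__(self, cpv: str,
                 cache_lookup: Callable[[str], Optional[Dict[str, str]]],
                 parse: Callable[[str], Dict[str, str]]):
        self.cpv = cpv
        self._cache_lookup = cache_lookup
        self._parse = parse
        self._metadata: Optional[Dict[str, str]] = None

    def _load(self) -> Dict[str, str]:
        if self._metadata is None:
            entry = self._cache_lookup(self.cpv)
            # Look it up if available, parse otherwise; either way only once.
            self._metadata = entry if entry is not None else self._parse(self.cpv)
        return self._metadata

    def __getattr__(self, name: str) -> str:
        # Only called for attributes not set in __init__, e.g. pkg.DEPEND.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion before __init__ runs
        try:
            return self._load()[name]
        except KeyError:
            raise AttributeError(name) from None
```

Reading pkg.SLOT and then pkg.IUSE triggers a single cache lookup or parse, not two.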
> Getting to the point; cache is 100x to 400x faster than sourcing for
> <=2.0.51. Haven't tested it under 2.1, should be different due to
> cache and regen fixups/rewrites.

Don't forget that bash must be exec'd for normal parses, and that python
has extremely slow string handling when not using one of the standard
parsing modules (that work in C). To put my money where my mouth is, I've
tarred up my code and put it on my dev space:
http://dev.gentoo.org/~pauldv/portage_native-0.1.tar.bz2

Just run make in the extracted dir. The binary created is xbuildparse, a
standalone parser that takes the ebuild as its argument. It will look for
eclasses in /usr/portage/eclass.

The python module can be built with "make xbuildparse.so", and includes a
little bit of help reachable through the normal python way.
>
> Back to the point, essentially, EAPI matters in two places:
> 1) metadata transfer from the ebuild env into the python side during
> the depends phase; it has to know which keys to transfer.
> 2) actual ebuild build phase executions; if it isn't the depends phase,
> eapi is required so that the parser can swap in the appropriate
> ebuild env template.

I think it also matters for actually allowing future incompatible
versions of the ebuild format. I don't mean to say goodbye to the current
format, but when redesigning the format, we should design it for
extensibility.

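Both places Brian lists could hang off one per-EAPI registry, which also gives a clean way to refuse a future, incompatible format instead of misreading it. A hypothetical Python sketch; the key set and the template filename are illustrative, not what portage actually uses.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Mapping


@dataclass(frozen=True)
class EapiProfile:
    metadata_keys: FrozenSet[str]  # keys copied out of the env after the depends phase
    env_template: str              # ebuild env template for the build phases


# Hypothetical registry; key set and template name are illustrative only.
EAPI_PROFILES: Dict[str, EapiProfile] = {
    "0": EapiProfile(
        frozenset({"DEPEND", "RDEPEND", "SLOT", "SRC_URI", "IUSE",
                   "KEYWORDS", "LICENSE", "DESCRIPTION", "HOMEPAGE"}),
        "ebuild-env-0.sh",
    ),
}


def profile_for(eapi: str) -> EapiProfile:
    if eapi not in EAPI_PROFILES:
        # A newer, incompatible format: refuse it instead of misreading it.
        raise ValueError(f"EAPI {eapi!r} is not supported by this package manager")
    return EAPI_PROFILES[eapi]


def transfer_metadata(eapi: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Copy only the keys this EAPI defines from the depends-phase environment."""
    keys = profile_for(eapi).metadata_keys
    return {k: v for k, v in env.items() if k in keys}
```

Adding a new format then means adding a registry entry, while older package managers fail loudly on the unknown EAPI.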
> The restrictions suggested for EAPI would only make sense if eyeing
> #1, an alternative parser; no reason to drop the cache unless the
> parser is capable of hitting the same runtime performance the cache
> can hit (frankly, it's not possible from where I'm sitting although
> the gap can be narrowed).

You're probably right, but the time needed to parse an ebuild can be
reduced so much that parsing is no longer the issue; building the right
tree is:

time ./xbuildparse /usr/portage/sys-libs/db/db-4.2.52_p2.ebuild &>/dev/null

real 0m0.054s
user 0m0.048s
sys 0m0.002s

Please note that the parser is incomplete, has some small bugs (don't try
it on flag-o-matic, as it somehow goes into an endless loop), and could
probably do some things smarter.

> So... the EAPI limitations, not much for due to the conclusion above.
>
> Interested in the parser however, since ebd is effectively a pipe
> hack so that pythonic portage can control ebuild.sh. I (and others)
> have been after a bashlib for a while, just no one has crunched down
> and done it (easier said than done I suspect).

See it above. It does not fully understand every bash statement around.
Importantly, it currently does not understand the "if" statement. This is
easy to add though; it just wasn't added out of "policy". But given that
even my own ebuilds (like db) use it, it should probably be added.

I do believe that the parser could be made useful for most ebuilds. This
would however still mean a small restriction in allowed syntax. The
parser module has basically one function, "parse", which parses an
ebuild and its eclasses and returns a list of variables. Not all
variables are substituted though; I have a python function that does
this. If people are interested I can take a look at sanitizing my whole
tree and providing it.

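A minimal stand-in for such a substitution step, written as a Python sketch (it is not the function mentioned above). It only expands plain $NAME and ${NAME} references, treats unknown names as empty like bash does for unset variables, and iterates to a fixed point. Unlike bash it does not model expansion order at assignment time, so a later reassignment is not reflected.

```python
import re
from typing import Dict

# Plain "$NAME" or "${NAME}" only; operators such as ${NAME/a/b} or
# ${NAME:-x} do not match and are left untouched.
REF_RE = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def substitute(variables: Dict[str, str], max_rounds: int = 10) -> Dict[str, str]:
    """Expand simple variable references using the other parsed values.

    Repeats until nothing changes, bounded by max_rounds so that
    self-referential values cannot loop forever.
    """
    result = dict(variables)
    for _ in range(max_rounds):
        expanded = {
            name: REF_RE.sub(lambda m: result.get(m.group(1) or m.group(2), ""), value)
            for name, value in result.items()
        }
        if expanded == result:
            break
        result = expanded
    return result
```

For example, with PN="db", PV="4.2.52" and P="${PN}-${PV}", a value of "/work/$P" ends up as "/work/db-4.2.52" after two rounds.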
Paul

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@g.o
Homepage: http://www.devrieze.net