Gentoo Archives: gentoo-portage-dev

From: Brian Harring <ferringb@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH] inital Manifest2 support
Date: Tue, 22 Nov 2005 21:27:25
Message-Id: 20051122212623.GE5340@nightcrawler
In Reply to: Re: [gentoo-portage-dev] [PATCH] inital Manifest2 support by Marius Mauch
1 On Tue, Nov 22, 2005 at 09:41:46PM +0100, Marius Mauch wrote:
2 > On Fri, 18 Nov 2005 18:40:14 -0600
3 > Brian Harring <ferringb@g.o> wrote:
4 >
5 > > On Wed, Nov 16, 2005 at 04:33:02AM +0100, Marius Mauch wrote:
6 > > > - digestfn =
7 > > > mysettings["FILESDIR"]+"/digest-"+mysettings["PF"]
8 > > > + digestfn =
9 > > > portdb.finddigest(mysettings["CATEGORY"]+"/"+mysettings["PF"])
10 > > Like this mod personally, but don't like the '/'. You're passing in
11 > > two chunks of data, pass them in seperate imo, rather then collapsing
12 > > it for every call.
13 >
14 > Not sure I understand this, you mean changing finddigest() to use two
15 > params? Doesn't really make sense to me, should just work for any CPV
16 > entry.
17
18 Well, I'd personally prefer passing in an object, but that's
19 probably not going to occur ;-)
20
21 using two params deviates from what we do in other funcs, but it also
22 removes hardcoding of path seperator. For windows support (yes it's
23 nuts, but bear with) it would have to split and rejoin the path,
24 converting from '/' to os.path.sep.
25
26 Pass it in as two seperated components, it can join it internally on
27 it's own (if/when we N deep categories, it'll have to split that param
28 though).
29
30 Debatable obviously, I just dislike the requirement of knowing a heck
31 of a lot more about the func prototype then is required-
32 finddigest(category, package, version) vs
33 finddigest(category+"/"+package+"-"+version) fex. One I can figure
34 out from pydoc, the other I'll probably dig up an example in the code
35 to see if it's cp or cpv.
36
37
38 > > Kind of disliking the 'calledFor' being thrown around here, plus the
39 > > v2 check, then resorting to a seperate method of checking.
40 > > Integrate 'em potentially?
41 > >
42 > > You're basically treating it as two seperate funcs (from how I look
43 > > at it), should split it up (even if the seperated funcs call each
44 > > other, or a common func).
45 >
46 > Yeah, was just going the least intrusive route for now, so keep the API
47 > compatible. This wasn't intended for inclusion yet anyway, just some
48 > "sneak preview" ;)
49
50 Iron out the implementation a bit, and I personally wouldn't have
51 issues with it going into svn.
52
53 Helluva lot easier to hack on it out of a repo, then via a set of
54 patches anyways :)
55
56 > > > @@ -2218,7 +2229,7 @@
57 > > > return 1
58 > > >
59 > > >
60 > > > -def digestParseFile(myfilename):
61 > > > +def digestParseFile(myfilename, calledFor="manifest"):
62 > > > """(filename) -- Parses a given file for entries matching:
63 > > > MD5 MD5_STRING_OF_HEX_CHARS FILE_NAME FILE_SIZE
64 > > > Ignores lines that do not begin with 'MD5' and returns a
65 > > > @@ -2230,6 +2241,10 @@
66 > > >
67 > > > mydigests={}
68 > > > for x in mylines:
69 > > > + tmp = portage_manifest.convertManifest2ToDigest(x,
70 > > > calledAs=calledFor)
71 > > > + if tmp[0] == True:
72 > > > + mydigests[tmp[1]] = tmp[2]
73 > > > + continue
74 > > > myline=string.split(x)
75 > > > if len(myline) < 4:
76 > > > #invalid line
77 > > > @@ -2258,7 +2273,10 @@
78 > > > the basedir given. Returns 1 only if all files exist and
79 > > > match the md5s. """
80 > > > for x in myfiles:
81 > > > - if not mydigests.has_key(x):
82 > > > + if
83 > > > portage_manfest.getManifestVersion(basedir+"/Manifest") == 2 and
84 > > > x.startswith("files/digest-"):
85 > > hardcoding sucks, kthnx
86 > > If possible, try to shove it into the func so modification (down the
87 > > line) occurs in one spot, rather then having to resort to
88 > > search/replace :)
89 >
90 > Shove what into which func? I'm bad at guessing today.
91
92 Well, you're hardcoding /Manifest for all calls to it.
93
94 Kind of an indication the api should be changed and internally handle
95 it imo, or encapsulate in some other fashion.
96
97 Yes it's anal, but new additions to portage api I'm after having saner
98 functions in use rather then what's there now. Saner is relative
99 obviously, but assume you get what I mean.
100
101 > > > Index: pym/portage_checksum.py
102 > > > ===================================================================
103 > > > --- pym/portage_checksum.py (revision 2312)
104 > > > +++ pym/portage_checksum.py (working copy)
105 > > > @@ -126,3 +126,12 @@
106 > > > portage_locks.unlockfile(mylock)
107 > > >
108 > > > return (myhash,mysize)
109 > > > +
110 > > > +
111 > > > +hashfunc_map = {"MD5": md5hash, "SHA1": sha1hash}
112 > > > +
113 > > > +def perform_multiple_checksums(filename, hashes=["MD5"],
114 > > > calc_prelink=0):
115 > > > + rVal = {}
116 > > > + for x in hashes:
117 > > > + rVal[x] = perform_checksum(filename,
118 > > > hashfunc_map[x], calc_prelink)[0]
119 > > > + return rVal
120 > > Not sure if I like the [0] tbh.
121 > > Still think the size test should be broken off as it's own pseudo
122 > > hashfunc, and implemented in that way. Course, would need to weight
123 > > it somehow, so that size test is done first, then the more expensive
124 > > chf tests...
125 >
126 > Yep, see above about "least intrusive".
127
128 I can live with intrusive if the new code is better quality then
129 existing- stable code sucks, if it needs to be slapped around a bit to
130 play nice with new code, works for me. Chunking size off so it's no
131 longer a special case makes sense in my opinion, since otherwise
132 each/every chksum func will have to do it's own stat and return size
133 just to maintain a similar return to md5.
134
135
136 > > > --- pym/portage_manifest.py (revision 0)
137 > > > +++ pym/portage_manifest.py (revision 0)
138 > > > @@ -0,0 +1,163 @@
139 > > > +import os
140 > > > +
141 > > > +import portage, portage_exception, portage_checksum,
142 > > > portage_versions +
143 > > > +class Manifest2Entry:
144 > > > + def __init__(self, entrytype, name, config,
145 > > > methods=portage_checksum.get_valid_checksum_keys()):
146 > > The methods trick you're doing is a one time evaluation... not sure
147 > > if that's the best approach tbh.
148 > >
149 > > > + if entrytype not in ["EBUILD", "AUXFILE",
150 > > > "MISCFILE", "SRCURI"]:
151 > > > + raise
152 > > > portage_exception.PortageException("unknown Manifest entry type:
153 > > > %s" % entrytype)
154 > > break that out into a class constant, and please kill off the capps
155 > > (they kill kittens, or something).
156 >
157 > Ok on the constant, but I like the caps here ;)
158
159 Constants kill kittens.
160
161 You don't want to kill kittens, do you? ;)
162
163 > > > + def generate(self, pkgdir, genOldStyle=True):
164 > > > + if self.type == "EBUILD" or self.type ==
165 > > > "MISCFILE":
166 > > > + filename = pkgdir+"/"+self.name
167 > > > + elif self.type == "AUXFILE":
168 > > > + filename = pkgdir+"/files/"+self.name
169 > > > + elif self.type == "SRCURI":
170 > > > + filename =
171 > > > self.config["DISTDIR"]+"/"+self.name
172 > > > + if not os.path.exists(filename):
173 > > > + raise
174 > > > portage_exception.PortageException("file for Manifest entry doesn't
175 > > > exist: %s" % filename)
176 > > > +
177 > > > + self.size = os.stat(filename).st_size
178 > > > + self.checksums =
179 > > > portage_checksum.perform_multiple_checksums(filename, self.methods)
180 > > See comment about making size into a pseudo chf func- would eliminate
181 > > this case.
182 > > Chunking this up into type handlers might be wise- would make it
183 > > easier to extend (modify a class dict rather then N funcs). You've
184 > > already got the seperated beast, just throw it into a mapping and
185 > > it'll be nice and pretty.
186 >
187 > Yeah, though somehow having a function just for filename composition
188 > seems a bit overkill. Don't mind either way.
189 Eliminates a special case though; see comment above about alternative
190 chf funcs having to return a list instead of the chksum, and why this
191 should be done for the long haul should be clear :)
192
193 > > > + cs_list = [x+" "+self.checksums[x] for x in
194 > > > self.checksums.keys()]
195 > > > + self.entry = " ".join([self.type, self.name,
196 > > > str(self.size)]+cs_list)
197 > > > + if genOldStyle and self.type != "SRCURI":
198 > > > + if "MD5" in self.methods:
199 > > > + oldentry =
200 > > > Manifest2Entry._makeOldEntry(pkgdir, filename, size=self.size,
201 > > > md5sum=self.checksums["MD5"])
202 > > > + else:
203 > > > + oldentry =
204 > > > Manifest2Entry._makeOldEntry(pkgdir, filename, size=self.size)
205 > > Kinda iffy on this approach also, seems unclean (quite subjective I
206 > > realize).
207 >
208 > Better ideas welcome.
209 Seperated class potentially, although kind of overkill.
210
211 Could potentially make it easier to seperate new and old though, so
212 that old can be thrown out easier down the line (instead of an
213 invasive removal of code from the combined class).
214
215 Haven't looked at the code since the first review, so the paragraph
216 above could be mildly insane. :)
217
218 > > > +
219 > > > + if debug:
220 > > > + print "pkgdir:", pkgdir
221 > > > +
222 > > > + cpv_list = [pkgdir.rstrip("/").split("/")[-2]+"/"+x[:-7]
223 > > > for x in portage.listdir(pkgdir) if x.endswith(".ebuild")]
224 > > os.path.sep...
225 >
226 > Cosmetics.
227 Readability/maintainability ;)
228
229 > > > +
230 > > > + return (0, None)
231 > > Eh?
232 > > Exceptions seem like a better route here then magic constants and
233 > > string error msgs.
234 >
235 > Eh, I don't return error messages but the name of the object that
236 > caused the failure. Wether to use exceptions for this comes down on who
237 > is going to call this, don't really see the point in using exceptions
238 > here as I expect only two or three callers (namely repoman and ebuild)
239 > which have to act on the return code directly.
240
241 Those two or three callers are going to have some funky voodoo in
242 place that a year down the line, I'm going to scratch my head over
243 (since I'm an idiot).
244
245 You're passing data back and flagging an error, exception is really
246 the route to go. It's extensible mainly, such that if the '2 or 3'
247 becoms '5 or 6' we don't have magic constants/voodoo spreading through
248 the code.
249
250 > > > +def convertManifest2ToDigest(manifestline, calledAs="manifest"):
251 > > > + mysplit = manifestline.split()
252 > > > + if mysplit[0] not in ["EBUILD", "AUXFILE", "MISCFILE",
253 > > > "SRCURI"] or ((len(mysplit) - 3) % 2) != 0:
254 > > > + return (False, None, None)
255 > > > + if mysplit[0] == "SRCURI" and calledAs == "manifest":
256 > > > + return (False, None, None)
257 > > > + if mysplit[0] in ["EBUILD", "AUXFILE", "MISCFILE"] and
258 > > > calledAs == "digest":
259 > > > + return (False, None, None)
260 > > > +
261 > > > + checksums = {}
262 > > > +
263 > > > + mytype = mysplit.pop(0)
264 > > > + filename = mysplit.pop(0)
265 > > > + if mytype == "AUXFILE":
266 > > > + filename = "files/"+filename
267 > > > + size = long(mysplit.pop(0))
268 > > > + checksums = dict(zip(mysplit[::2], mysplit[1::2]))
269 > > > + if not checksums.has_key("size"):
270 > > > + checksums["size"] = size
271 > > > +
272 > > > + return (True, filename, checksums)
273 > > > +
274 > > > +def getManifestVersion(manifest):
275 > > > + rVal = 0
276 > > > +
277 > > > + if not os.path.exists(manifest):
278 > > > + return rVal
279 > > > + myfile = open(manifest, "r")
280 > > > + lines = myfile.readlines()
281 > > > + for l in lines:
282 > > > + mysplit = l.split()
283 > > > + if mysplit[0] == "MD5" and len(mysplit) == 4 and
284 > > > rVal == 0:
285 > > > + rVal = 1
286 > > > + elif mysplit[0] in ["EBUILD", "AUXFILE",
287 > > > "MISCFILE", "SRCURI"] and ((len(mysplit) - 3) % 2) == 0 and rVal in
288 > > > [0,1]:
289 > > > + rVal = 2
290 > > > + return rVal
291 > >
292 > > Offhand... I really think you need to do some subclassing above.
293 > > Flow's mostly there, but a lot of icky stuff for long term
294 > > maintenance (imo). Seems like a refactoring of the patch would bring
295 > > line count down, and up readability.
296 >
297 > Maybe, but I'm no such an OO maniac and also suck at refactoring ;)
298 Might want to see what you can raid from saviors portage.fs.*
299
300 Yes it's extra, but the fs objs will wind up in a stable portage
301 sooner or later for the contents benefits, and in this case the
302 encapsulation might be useful for you.
303 ~harring