Gentoo Archives: gentoo-portage-dev

From: Alec Warner <antarus@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH 3/3] pym/portage/package/ebuild/fetch.py: Factor out _get_uris
Date: Sun, 19 Jan 2014 22:36:48
Message-Id: CAAr7Pr-vakRAbU5Y8wcHown8JvfE_Y6SjCS+HL+32KiUDm=bLQ@mail.gmail.com
In Reply to: [gentoo-portage-dev] [PATCH 3/3] pym/portage/package/ebuild/fetch.py: Factor out _get_uris by "W. Trevor King"
1 On Sat, Jan 18, 2014 at 7:07 PM, W. Trevor King <wking@×××××××.us> wrote:
2
3 > The current fetch() function is quite long, which makes it hard to
4 > know what I can change without adverse side effects. By pulling this
5 > logic out of the main function, we get clearer logic in fetch() and
6 > more explicit input for the config extraction.
7 >
8 > This block was especially complicated, so I also created the helper
9 > functions _get_file_uri_tuples and _expand_mirror. I'd prefer if
10 > _expand_mirror iterated through URIs instead of (group, URI) tuples,
11 > but we need a distinct marker for third-party URIs to build
12 > third_party_mirror_uris which is used to build primaryuri_dict which
13 > is used way down in fetch():
14 >
15 > if checksum_failure_count == \
16 > checksum_failure_primaryuri:
17 > # Switch to "primaryuri" mode in order
18 > # to increase the probablility of
19 > # of success.
20 > primaryuris = \
21 > primaryuri_dict.get(myfile)
22 > if primaryuris:
23 > uri_list.extend(
24 > reversed(primaryuris))
25 >
26 > I don't know if this is useful enough to motivate the uglier
27 > _expandmirror return values, but I'll kick that can down the road for
28 > now.
29 > ---
30 > pym/portage/package/ebuild/fetch.py | 197
31 > +++++++++++++++++++++---------------
32 > 1 file changed, 117 insertions(+), 80 deletions(-)
33 >
34 > diff --git a/pym/portage/package/ebuild/fetch.py
35 > b/pym/portage/package/ebuild/fetch.py
36 > index bd572fa..a617945 100644
37 > --- a/pym/portage/package/ebuild/fetch.py
38 > +++ b/pym/portage/package/ebuild/fetch.py
39 > @@ -15,9 +15,9 @@ import sys
40 > import tempfile
41 >
42 > try:
43 > - from urllib.parse import urlparse
44 > + from urllib.parse import urlparse, urlunparse
45 > except ImportError:
46 > - from urlparse import urlparse
47 > + from urlparse import urlparse, urlunparse
48 >
49 > import portage
50 > portage.proxy.lazyimport.lazyimport(globals(),
51 > @@ -297,6 +297,118 @@ def _get_fetch_resume_size(settings, default='350K'):
52 > return v
53 >
54 >
55 > +def _get_file_uri_tuples(uris):
56 > + """Return a list of (filename, uri) tuples
57 > + """
58 >
59
60 As mike noted on another thread:
61
62 ""Return a list of (filename, uri) tuples."""
63
64 + file_uri_tuples = []
65 > + # Check for 'items' attribute since OrderedDict is not a dict.
66 > + if hasattr(uris, 'items'):
67 > + for filename, uri_set in uris.items():
68 > + for uri in uri_set:
69 > + file_uri_tuples.append((filename, uri))
70 > + if not uri_set:
71 > + file_uri_tuples.append((filename, None))
72 > + else:
73 > + for uri in uris:
74 > + if urlparse(uri).scheme:
75 > + file_uri_tuples.append(
76 > + (os.path.basename(myuri), myuri))
77 > + else:
78 > + file_uri_tuples.append(
79 > + (os.path.basename(myuri), None))
80 > + return file_uri_tuples
81 > +
82 > +
83 > +def _expand_mirror(uri, custom_mirrors=(), third_party_mirrors=()):
84 > + """Replace the 'mirror://' scheme in the uri
85 >
86
87 Sentence should end in a period.
88
89
90 > +
91 > + Returns an iterable listing expanded (group, URI) tuples,
92 > + where the group is either 'custom' or 'third-party'.
93 > + """
94 > + parsed = urlparse(uri)
95 > + mirror = parsed.netloc
96 > + path = parsed.path
97 > + if path:
98 > + # Try user-defined mirrors first
99 > + if mirror in custom_mirrors:
100 > + for cmirr in custom_mirrors[mirror]:
101 > + m_uri = urlparse(cmirr)
102 > + yield ('custom', urlunparse((
103 > + cmirr.scheme, cmirr.netloc, path) +
104 > + parsed[3:]))
105 > +
106 > + # now try the official mirrors
107 > + if mirror in third_party_mirrors:
108 > + uris = []
109 > + for locmirr in third_party_mirrors[mirror]:
110 > + m_uri = urlparse(cmirr)
111 > + uris.append(urlunparse((
112 > + cmirr.scheme, cmirr.netloc, path) +
113 > + parsed[3:]))
114 > + random.shuffle(uris)
115 > + for uri in uris:
116 > + yield ('third-party', uri)
117 > + else:
118 > + writemsg(_("Invalid mirror definition in SRC_URI:\n"),
119 > + noiselevel=-1)
120 > + writemsg(" %s\n" % (uri), noiselevel=-1)
121
122
123 Is this a py3k thing? You should be able to write:
124
125 writemsg(" %s\n" % uri, noiselevel=-1)
126
127 The tuple is only needed for multiple arguments in py2k.
128
129
130
131 +
132 > +
133 > +def _get_uris(uris, settings, custom_mirrors=(), locations=()):
134 >
135
136 I want you to be careful here. This is bad style when you are constructing
137 mutable objects in parameter defaults:
138
139 >>> def func(a=[]):
140 ... a.append(5)
141 ...
142 >>> del func
143 >>> def func(a=[]):
144 ... a.append(5)
145 ... print a
146 ...
147 >>> func()
148 [5]
149 >>> func()
150 [5, 5]
151 >>> func()
152 [5, 5, 5]
153 >>> func()
154 [5, 5, 5, 5]
155 >>> func()
156 [5, 5, 5, 5, 5]
157
158 That doesn't function as most would expect, because it turns out that 'a'
159 is constructed at function definition time (not call time) and is re-used
160 between function calls.
161
162 Now of course, your code is constructing tuples, which are immutable. That
163 is handy.
164
165 I'm not sure if I appreciate that more or less than the other form....
166
167 def _get_uris(uris, settings, custom_mirrors=None, locations=None):
168 if not custom_mirrors:
169 custom_mirrors = ()
170 if not locations:
171 locations = ()
172
173 Another advisable way to write it is to simply not have default arguments,
174 and to force the caller to sync in an empty iterable:
175
176 def_get_uris(uris, settings, custom_mirrors, locations):
177 ...
178
179 and so they will be forced to create the empty iterables at the call site.
180
181 A brief discussion with Sebastian on irc seemed to indicate that he thought
182 creating immutable objects in this way was permissible (we certainly do it
183 often for strings) so I won't raise a fuss. I merely want to point out that
184 you really need to be aware of whether the objects are mutable or immutable.
185
186
187 > + third_party_mirrors = settings.thirdpartymirrors()
188 >
189
190 Why pass in all of settings? If you only need settings.thirdpartymirrors,
191 then just ask for 'a thirdparty mirrors iterable' or whatever it is. You do
192 not want the entire settings object, it is full of disgusting globals, and
193 it will only screw you over later.
194
195 Think about testing this function. Do you really want to try to construct
196 some sort of 'testing' settings object, or simply construct a list?
197
198
199 > + third_party_mirror_uris = {}
200 > + filedict = OrderedDict()
201 > + primaryuri_dict = {}
202 > + for filename, uri in _get_file_uri_tuples(uris=uris):
203 > + if filename not in filedict:
204 > + filedict[filename] = [
205 > + os.path.join(location, 'distfiles',
206 > filename)
207 > + for location in locations]
208 > + if uri is None:
209 > + continue
210 > + if uri.startswith('mirror://'):
211 > + uris = _expand_mirror(
212 >
213
214 too many uri / uris variables...
215
216 can we called this 'expanded_uris'?
217
218 I'm really having trouble tracking what is what. Remember that 'uris' is
219 already a parameter to this function. Are you shadowing it here, or trying
220 to overwrite it?
221
222
223 > + uri=uri, custom_mirrors=custom_mirrors,
224 > + third_party_mirrors=third_party_mirrors)
225 > + filedict[filename].extend(uri for group, uri in
226 > uris)
227 >
228
229 group appears unused in your implicit iterable here.
230
231 perhaps:
232
233 filedict[filename].extend(uri for _, uri in uris)
234
235
236 > + third_party_mirror_uris.setdefault(filename,
237 > []).extend(
238 > + uri for group, uri in uris
239 > + if group == 'third-party')
240 >
241
242 I'm curious if this matters. We are iterator over 'uris' twice. Is it
243 cheaper to do it once and build the iterables once?
244
245 third_party_uris = []
246 for group, uri in uris:
247 if group == 'third_party':
248 third_party_uris.append(uri)
249 filedict[filename].append(uri)
250
251 I'm guessing the iterable is so small it doesn't matter.
252
253
254
255 > + if not filedict[filename]:
256 > + writemsg(
257 > + _("No known mirror by the name:
258 > %s\n")
259 > + % (mirror,))
260 > + else:
261 > + if restrict_fetch or force_mirror:
262 >
263
264 Are these globals or am I missing somethng?
265
266
267 > + # Only fetch from specific mirrors is
268 > allowed.
269 > + continue
270 > + primaryuris = primaryuri_dict.get(filename)
271 > + if primaryuris is None:
272 > + primaryuris = []
273 > + primaryuri_dict[filename] = primaryuris
274 > + primaryuris.append(uri)
275 > +
276 > + # Order primaryuri_dict values to match that in SRC_URI.
277 > + for uris in primaryuri_dict.values():
278 > + uris.reverse()
279 >
280
281 Is there any guaranteed ordering for dict.values()? I thought dict order
282 was random (and they seriously make it random in modern python versions, to
283 detect bugs.) How does this uris.reverse() match the SRC_URI ordering?
284
285
286 > +
287 > + # Prefer third_party_mirrors over normal mirrors in cases when
288 > + # the file does not yet exist on the normal mirrors.
289 > + for filename, uris in third_party_mirror_uris.items():
290 > + primaryuri_dict.setdefault(filename, []).extend(uris)
291 > +
292 > + # Now merge primaryuri values into filedict (includes mirrors
293 > + # explicitly referenced in SRC_URI).
294 > + if "primaryuri" in restrict:
295 >
296
297 same questoin here about where 'restrict' comes from.
298
299
300 > + for filename, uris in filedict.items():
301 > + filedict[filename] = primaryuri_dict.get(filename,
302 > []) + uris
303 > + else:
304 > + for filename in filedict:
305 > + filedict[filename] +=
306 > primaryuri_dict.get(filename, [])
307 > +
308 > + return filedict, primaryuri_dict
309 > +
310 > +
311 > def fetch(myuris, mysettings, listonly=0, fetchonly=0,
312 > locks_in_subdir=".locks", use_locks=1, try_mirrors=1, digests=None,
313 > allow_missing_digests=True):
314 > @@ -328,7 +440,6 @@ def fetch(myuris, mysettings, listonly=0, fetchonly=0,
315 > # couple of checksum failures, to increase the probablility
316 > # of success before checksum_failure_max_tries is reached.
317 > checksum_failure_primaryuri = 2
318 > - thirdpartymirrors = mysettings.thirdpartymirrors()
319 >
320 > # In the background parallel-fetch process, it's safe to skip
321 > checksum
322 > # verification of pre-existing files in $DISTDIR that have the
323 > correct
324 > @@ -412,83 +523,9 @@ def fetch(myuris, mysettings, listonly=0, fetchonly=0,
325 > else:
326 > locations = mymirrors
327 >
328 > - file_uri_tuples = []
329 > - # Check for 'items' attribute since OrderedDict is not a dict.
330 > - if hasattr(myuris, 'items'):
331 > - for myfile, uri_set in myuris.items():
332 > - for myuri in uri_set:
333 > - file_uri_tuples.append((myfile, myuri))
334 > - if not uri_set:
335 > - file_uri_tuples.append((myfile, None))
336 > - else:
337 > - for myuri in myuris:
338 > - if urlparse(myuri).scheme:
339 > -
340 > file_uri_tuples.append((os.path.basename(myuri), myuri))
341 > - else:
342 > -
343 > file_uri_tuples.append((os.path.basename(myuri), None))
344 > -
345 > - filedict = OrderedDict()
346 > - primaryuri_dict = {}
347 > - thirdpartymirror_uris = {}
348 > - for myfile, myuri in file_uri_tuples:
349 > - if myfile not in filedict:
350 > - filedict[myfile]=[]
351 > - for y in range(0,len(locations)):
352 > -
353 > filedict[myfile].append(locations[y]+"/distfiles/"+myfile)
354 > - if myuri is None:
355 > - continue
356 > - if myuri[:9]=="mirror://":
357 > - eidx = myuri.find("/", 9)
358 > - if eidx != -1:
359 > - mirrorname = myuri[9:eidx]
360 > - path = myuri[eidx+1:]
361 > -
362 > - # Try user-defined mirrors first
363 > - if mirrorname in custommirrors:
364 > - for cmirr in
365 > custommirrors[mirrorname]:
366 > - filedict[myfile].append(
367 > - cmirr.rstrip("/")
368 > + "/" + path)
369 > -
370 > - # now try the official mirrors
371 > - if mirrorname in thirdpartymirrors:
372 > - uris = [locmirr.rstrip("/") + "/"
373 > + path \
374 > - for locmirr in
375 > thirdpartymirrors[mirrorname]]
376 > - random.shuffle(uris)
377 > - filedict[myfile].extend(uris)
378 > -
379 > thirdpartymirror_uris.setdefault(myfile, []).extend(uris)
380 > -
381 > - if not filedict[myfile]:
382 > - writemsg(_("No known mirror by the
383 > name: %s\n") % (mirrorname))
384 > - else:
385 > - writemsg(_("Invalid mirror definition in
386 > SRC_URI:\n"), noiselevel=-1)
387 > - writemsg(" %s\n" % (myuri), noiselevel=-1)
388 > - else:
389 > - if restrict_fetch or force_mirror:
390 > - # Only fetch from specific mirrors is
391 > allowed.
392 > - continue
393 > - primaryuris = primaryuri_dict.get(myfile)
394 > - if primaryuris is None:
395 > - primaryuris = []
396 > - primaryuri_dict[myfile] = primaryuris
397 > - primaryuris.append(myuri)
398 > -
399 > - # Order primaryuri_dict values to match that in SRC_URI.
400 > - for uris in primaryuri_dict.values():
401 > - uris.reverse()
402 > -
403 > - # Prefer thirdpartymirrors over normal mirrors in cases when
404 > - # the file does not yet exist on the normal mirrors.
405 > - for myfile, uris in thirdpartymirror_uris.items():
406 > - primaryuri_dict.setdefault(myfile, []).extend(uris)
407 > -
408 > - # Now merge primaryuri values into filedict (includes mirrors
409 > - # explicitly referenced in SRC_URI).
410 > - if "primaryuri" in restrict:
411 > - for myfile, uris in filedict.items():
412 > - filedict[myfile] = primaryuri_dict.get(myfile, [])
413 > + uris
414 > - else:
415 > - for myfile in filedict:
416 > - filedict[myfile] += primaryuri_dict.get(myfile, [])
417 > + filedict, primaryuri_dict = _get_uris(
418 > + uris=myuris, settings=mysettings,
419 > + custom_mirrors=custom_mirrors, locations=locations)
420 >
421 > can_fetch=True
422 >
423 > --
424 > 1.8.5.2.8.g0f6c0d1
425 >
426 >
427 >

Replies