Gentoo Archives: gentoo-dev

From: Michael Cummings <mcummings@g.o>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Policy/thoughts for ebuilds that survive only because of mirror caches?
Date: Wed, 31 Dec 2003 18:06:58

Should probably start by saying I'm not trying to ruffle any feathers
here. I recently started looking into cleaning up dev-perl of ebuilds
whose tarballs are no longer available at the SRC_URI. Part way into it I
discovered that although the SRC no longer supports/contains the
tarball, the ebuild itself still functions because the tarball has been
mirrored on the gentoo mirror caches. So I started wondering - how
many other ebuilds fit the bill?

My methods were hopefully not too questionable - I did a scan of the
portage tree, attempted as best as possible to replicate the creation of
the P, PV, PN, MY_P, MY_PV, etc. variables, and then took the listings for
the SRC_URIs in that context and checked to see if what they pointed at
was still there. I used two different techniques for this, in two
passes. The first was a set of short perl scripts that attempted to
verify that there was something to stream from the SRC_URI. The first pass
eliminated about 10,000 SRC_URIs (if I remember right, the total
number was roughly 15,000 SRC_URIs, which includes patches, multiple
sources, etc). The second pass took the resulting list of possibly bad
ebuild/src's and attempted a wget against each target. I still have some
misgivings about the accuracy because this is still dependent on my
having created the internal variables correctly back in step one. I
eliminated, from the get-go, any ebuilds that used the mirror://
syntax, and I know that there are false failures for SRC_URIs that use an
inline ${P/some/change/}. But the numbers are still pretty high, and I've
done random spot checking to confirm that, yep, there's nothing there.

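To make the variable-replication step concrete, here is a minimal sketch in
Python. It is not the bash/perl scripts referenced in this mail, and the helper
names (ebuild_vars, expand, candidate_uris) are made up for illustration. It
assumes the package name and version can be read off the ebuild file name,
skips mirror:// entries, and deliberately leaves inline substitutions such as
${P/some/change/} unexpanded so they can be reported separately instead of
showing up as false failures.

```python
import re

# Hypothetical helpers for illustration; this is not Portage's own parser.
# Splits "name-version[-rN]" on the last hyphen that is followed by a version.
_NAME_VERSION = re.compile(
    r"^(?P<pn>.+?)-(?P<pv>\d+(?:\.\d+)*[a-z]?(?:_(?:alpha|beta|pre|rc|p)\d*)*)"
    r"(?:-(?P<rev>r\d+))?$"
)

# Matches only plain ${NAME} or $NAME, never ${NAME/old/new} and friends.
_SIMPLE_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def ebuild_vars(ebuild_filename):
    """Derive PN, PV, P, PR, PVR and PF from an ebuild file name."""
    stem = ebuild_filename
    if stem.endswith(".ebuild"):
        stem = stem[: -len(".ebuild")]
    match = _NAME_VERSION.match(stem)
    if match is None:
        raise ValueError("cannot split name and version: %r" % ebuild_filename)
    pn, pv = match.group("pn"), match.group("pv")
    pr = match.group("rev") or "r0"
    pvr = pv if pr == "r0" else "%s-%s" % (pv, pr)
    return {"PN": pn, "PV": pv, "P": "%s-%s" % (pn, pv),
            "PR": pr, "PVR": pvr, "PF": "%s-%s" % (pn, pvr)}


def expand(template, variables):
    """Substitute simple variable references; leave anything else untouched."""
    def replace(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))
    return _SIMPLE_VAR.sub(replace, template)


def candidate_uris(src_uri, variables):
    """Return (resolved, unresolved) URI lists for one SRC_URI string."""
    resolved, unresolved = [], []
    for token in src_uri.split():
        if "://" not in token or token.startswith("mirror://"):
            continue  # USE-flag syntax, parentheses, or mirror:// entries
        uri = expand(token, variables)
        # A leftover "$" means something we could not replicate.
        (unresolved if "$" in uri else resolved).append(uri)
    return resolved, unresolved
```

Ebuild-local variables such as MY_P would have to be added to the dictionary
by hand (for example variables["MY_P"] = expand("${PN}_${PV}", variables))
before calling candidate_uris; the sketch does not try to parse the ebuild body.
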
So I guess my question is, what's the take on this? Should we only be
providing ebuilds that point to sources that still work outside of our
caching system? My results were that there were 1915 ebuilds pointing to
2290 invalid URLs. Here's the list[1] that I came up with after the second
pass. I welcome (ok, I live in fear of criticism, but that's
counterproductive) feedback on the scripts. This[2] is the bash script that
did the initial pass, as well as the perl[3] script that did the initial
checks. This[4] is the second pass script that attempted to perform actual
wgets on the final list. For the weak of eye, here's the secondpass as
html[5].

If you want to attempt to use my scripts yourself - beware the second pass.
It is definitely necessary (the secondpass file was about half the size
of the first pass - network issues? not sure), but it does a complete wget
of each target to confirm against.

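For anyone who would rather not run the scripts as they are, the two-pass
idea can be sketched roughly as below. This is an illustration in Python, not
the scripts linked above: probe, download and two_pass are hypothetical names,
the 30 second timeout is an arbitrary assumption, and urllib stands in for
wget. The cheap probe only checks that there is something to stream; the
expensive download runs only on what the probe rejected, which is what keeps
transient network failures from the first pass out of the final list.

```python
import http.client
import urllib.error
import urllib.request

# Errors treated as "nothing there"; URLError is already an OSError subclass.
_FETCH_ERRORS = (urllib.error.URLError, OSError, ValueError,
                 http.client.HTTPException)


def probe(uri, timeout=30):
    """Cheap first pass: open the URI and try to read a single byte."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            return len(response.read(1)) == 1
    except _FETCH_ERRORS:
        return False


def download(uri, timeout=30):
    """Expensive second pass: read the whole body, like a complete wget."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            size = 0
            while True:
                chunk = response.read(65536)
                if not chunk:
                    break
                size += len(chunk)
            return size > 0
    except _FETCH_ERRORS:
        return False


def two_pass(uris, first=probe, second=download):
    """Return the URIs that fail both the cheap and the expensive check."""
    suspects = [uri for uri in uris if not first(uri)]
    return [uri for uri in suspects if not second(uri)]
```

The first and second parameters exist so the checks can be swapped out, for
example to call wget through subprocess instead, or to test the filtering
logic without touching the network.
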
Thanks all for taking the time to read this mess, and yes I realize
there are a fair number of perl ebuilds in there too.

Mike

1.
2.
3.
4.
5.

--
gentoo-dev@g.o mailing list