Should probably start by saying I'm not trying to ruffle any feathers
here. I recently started looking into cleaning up dev-perl ebuilds
whose tarballs are no longer available at the SRC_URI. Part way into it I
discovered that although the upstream site no longer hosts the
tarball, the ebuild itself still works because the tarball has been
mirrored on the Gentoo mirror caches. So I started wondering - how
many other ebuilds fit the bill?

My methods were hopefully not too questionable - I did a scan of the
portage tree, attempted as best as possible to replicate the creation of
the P, PV, PN, MY_P, MY_PV, etc. variables, and then took the SRC_URI
listings in that context and checked to see if what they pointed at
was still there. I used a different technique for each of two
passes. The first was a set of short perl scripts that attempted to
verify that there was something to stream from the SRC_URI. The first pass
eliminated about 10,000 SRC_URIs as good (if I remember right, the total
was 15,000-ish SRC_URIs, which includes patches, multiple
sources, etc.). The second pass took the resulting list of possibly bad
ebuilds/sources and attempted a wget against each target. I still have some
misgivings about the accuracy, because this still depends on my
creating the internal variables correctly back in step one. I
eliminated, from the get-go, any ebuilds that used the mirror://
syntax, and I know that there are false failures for SRC_URIs that use an
inline ${P/some/change/}. But the numbers are still pretty high, and I've
done random spot checking to confirm that, yep, there's nothing there.
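For anyone wanting to try the first step without the perl, here is a minimal Python sketch (not the scripts linked below). It assumes only the variables that can be derived from the ebuild file name (PN, PV, PR, P); MY_P, MY_PV and friends are set inside each ebuild and would need that ebuild's own assignments. The version regex is a simplified approximation of Gentoo's version syntax, ${VAR/pattern/string} and ${VAR//pattern/string} match the pattern literally where bash treats it as a glob, and the function names are hypothetical. Like the scan above, it skips mirror:// URIs.

```python
import re

# Simplified Gentoo version syntax (1.25, 2.0b, 1.0_rc1, 2.0_p20040101, optional -rN).
# This is an approximation, not the full version grammar.
_EBUILD_RE = re.compile(
    r"(?P<pn>.+?)-"
    r"(?P<pv>\d+(?:\.\d+)*[a-z]?(?:_(?:alpha|beta|pre|rc|p)\d*)*)"
    r"(?:-r(?P<pr>\d+))?"
)

# Braced references only: ${VAR}, ${VAR/pat/rep}, ${VAR//pat/rep}, ${VAR/pat}.
_REF_RE = re.compile(
    r"\$\{(?P<name>\w+)(?:/(?P<all>/)?(?P<pat>[^/}]*)(?:/(?P<rep>[^}]*))?)?\}"
)


def ebuild_vars(filename):
    """Derive PN, PV, PR and P from a name like 'Net-SSLeay-1.25-r1.ebuild'."""
    stem = filename[: -len(".ebuild")] if filename.endswith(".ebuild") else filename
    m = _EBUILD_RE.fullmatch(stem)  # non-greedy PN: first split that leaves a valid version
    if m is None:
        raise ValueError(f"not a versioned ebuild name: {filename!r}")
    pn, pv, rev = m.group("pn"), m.group("pv"), m.group("pr")
    return {
        "PN": pn,
        "PV": pv,
        "PR": f"r{rev}" if rev else "r0",  # PR defaults to r0 when there is no -rN
        "P": f"{pn}-{pv}",  # P never includes the revision
    }


def expand_src_uri(src_uri, variables):
    """Expand braced variable references in a SRC_URI string."""

    def substitute(m):
        name = m.group("name")
        if name not in variables:
            return m.group(0)  # e.g. ${MY_P}: set in the ebuild body, left unexpanded
        value = variables[name]
        pat, rep = m.group("pat"), m.group("rep") or ""
        if pat:
            # '//' replaces every occurrence, '/' only the first (literal match, not a glob)
            value = value.replace(pat, rep) if m.group("all") else value.replace(pat, rep, 1)
        return value

    return _REF_RE.sub(substitute, src_uri)


def checkable_uris(src_uri, variables):
    """Return the expanded URIs worth checking, skipping mirror:// entries."""
    uris = []
    for token in expand_src_uri(src_uri, variables).split():
        # Drops USE-conditional syntax such as 'ssl?', '(' and ')' as well as mirror:// URIs.
        if "://" not in token or token.startswith("mirror://"):
            continue
        uris.append(token)
    return uris
```

Anything that still contains ${...} after expansion is a candidate for the false failures mentioned above rather than a dead link, so it is worth listing those separately.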

So I guess my question is, what's the take on this? Should we only be
providing ebuilds that point to sources that still work outside of our
caching system? My results were that there were 1915 ebuilds pointing to
2290 invalid URLs. Here's the list[1] that I came up with after the second
pass. I welcome (ok, I live in fear of criticism, but that's
counterproductive) feedback on the scripts. This[2] is the bash script that
did the initial pass, and this[3] is the perl script that did the initial
checks. This[4] is the second pass script that attempted to perform actual
wgets on the final list. For the weak of eye, here's the secondpass list as
html[5].

If you want to attempt to use my scripts yourself, beware the second pass.
It is definitely necessary (the secondpass file was about half the size
of the first pass - network issues? not sure), but it does a complete wget
of every target to confirm against.
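The complete download in the second pass can probably be avoided. Below is a minimal Python sketch (not the pass2 script) that opens each URL and reads a single byte, which is closer to the first pass's "something to stream" test, and retries a failing URL before counting it as dead, in case network hiccups are part of why the first pass list was about twice as long. It assumes Python's urllib (which handles http, https, ftp and file URLs); the names has_content and confirm_dead and the default of two attempts are hypothetical. A server that answers a missing file with a 200 error page would still be counted as alive.

```python
import http.client
import urllib.request


def has_content(url, timeout=30.0):
    """Return True if at least one byte can be read from url.

    Only the first byte is read, so a large tarball is not downloaded in full.
    HTTP errors (404 etc.), DNS failures, refused connections, timeouts and
    malformed URLs all count as failures.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return len(response.read(1)) == 1
    except (OSError, http.client.HTTPException, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers strings urllib cannot parse as a URL.
        return False


def confirm_dead(urls, attempts=2):
    """Return the URLs that failed every one of `attempts` checks."""
    return [u for u in urls if not any(has_content(u) for _ in range(attempts))]
```

Only the URLs that confirm_dead returns would then need a human spot check.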

Thanks all for taking the time to read this mess, and yes, I realize
there are a fair number of perl ebuilds in there too.

Mike

1. http://dev.gentoo.org/~mcummings/secondpass.txt
2. http://dev.gentoo.org/~mcummings/track_builds.txt
3. http://dev.gentoo.org/~mcummings/getter.txt
4. http://dev.gentoo.org/~mcummings/pass2.txt
5. http://dev.gentoo.org/~mcummings/secondpass.html

--
gentoo-dev@g.o mailing list