Marijn Schouten (hkBst) wrote:
> Sebastian Pipping wrote:
>> I start to understand the real benefits of moving a larger
>> part of the maintenance down to the distro level as you proposed.
>
>> Okay, let's add support for CPEs at distro package level
>> and sync up and down with the central packagemap database.
>> Please contact me for collaboration on sync scripts
>> and "modeling" of details.
>
> Do we not already have enough information available to automatically determine
> derived unique identifiers like CPE?
>
> We have the package homepage and the package name (and the package category) and
> the combination should be enough information to do direct comparisons to data
> gathered from other repos (assuming they also contain such data).

You are asking a valid question. The homepage links can be a great
helper in mapping and they have been of help already for the mapping
of the first 1000 Gentoo packages in packagemap.

However, it might not be as easy as you make it sound, as there are
a few things that complicate matters and produce extra work:

- In many cases a project can be reached from several URLs.
  For a project on SF.net you might have
  - http://sf.net/projects/${name}
  - http://${name}.sf.net/
  - http://www.${name}.org/
  That case can be handled rather easily, but there are many more
  special cases and a manual map may be required for stuff that's
  not hosted on a larger hosting site.

- Split packages (think Git or Qt) may all have the same homepage.
  In Debian the source package might help there; in Gentoo you'd
  have to do common prefix detection or so. That means special
  cases again, and continuous review that it still does what you need.

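To make the SF.net case concrete, here is a minimal sketch of such a
normalization in shell. The function name normalize_homepage and the
"sf:" key prefix are made up for illustration; nothing like this is part
of packagemap. It assumes a sed that supports -E (GNU and BSD sed do).
The first two URL forms are folded into one key, while the third form
(www.${name}.org) cannot be recognized by pattern alone and is passed
through unchanged, which is exactly where a manual map would come in.

```shell
# Hypothetical helper: reduce a homepage URL to a comparison key.
# SF.net project URLs become "sf:<name>"; everything else is lowercased
# and stripped of its scheme and trailing slashes.
normalize_homepage() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E \
            -e 's#^https?://##' \
            -e 's#/+$##' \
            -e 's#^(www\.)?(sf|sourceforge)\.net/projects/([^/]+).*$#sf:\3#' \
            -e 's#^([^./]+)\.(sf|sourceforge)\.net.*$#sf:\1#'
}
```

With this, both http://sf.net/projects/synce and http://synce.sf.net/
yield the key sf:synce, whereas http://www.synce.org/ stays
www.synce.org and would still need a hand-written entry.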

> For example you can determine automatically that gentoo:dev-scheme/gambit and
> debian:gambc are the same package because although their names differ they have
> the same homepage and share a category.

To detect equal categories you need a map of categories for all
participating distros. Yes, it's smaller than mapping all packages,
but it still involves a manual map and keeping it in sync.

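Such a category map could be as small as a two-column table. The sketch
below is only an illustration: the shared label lang-scheme and the
Debian section name example-section are placeholders, not real data, and
the function names are made up.

```shell
# Hand-maintained map, one entry per line: "<distro>:<category> <shared label>".
# The label and "debian:example-section" are placeholders for illustration.
category_map='gentoo:dev-scheme lang-scheme
debian:example-section lang-scheme'

# Print the shared label for "<distro>:<category>", or nothing if unmapped.
shared_category() {
    printf '%s\n' "$category_map" \
        | awk -v key="$1" '$1 == key { print $2; exit }'
}

# Succeed only if both categories are mapped and map to the same label.
same_category() {
    a=$(shared_category "$1")
    b=$(shared_category "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Every category that is missing from the table makes same_category fail,
so the table has to follow each distro's category changes by hand.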
Another word on homepage collisions: A few days ago I wrote
a script that builds a map from homepages to package names for the
whole Gentoo tree (code/gentoo/gentoo-world-to-homepage-map.sh).
The generated table from my run was 12330 lines long, one line per
package.

If you run an analysis over that table you see that many
homepages appear far more often than just once.
Here's the top ten:

     68 http://www.gnome.org/
     67 http://www.gentoo.org/
     58 http://www.gentoo.org/proj/en/perl/
     42 http://lingucomponent.openoffice.org/
     26 http://www.kde.org/
     25 http://www.gentoo.org
     20 http://sourceforge.net/projects/synce/
     19 http://www.trolltech.com/
     19 http://search.cpan.org/~rjbs/
     18 http://opensuse.foehr-it.de/

The command I used is

  $ sed 's|[[:space:]].*$||' homepage-to-package.txt \
      | sort | uniq -c | sort -n -r | head -n 10

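Note that the list counts http://www.gentoo.org/ and
http://www.gentoo.org separately. A variant that strips trailing
slashes before counting could look like the sketch below. It assumes
that the homepage is the first whitespace-separated column of the
table; the function name count_homepages is made up.

```shell
# Count how often each homepage occurs, ignoring trailing slashes.
# $1 = table with the homepage in the first whitespace-separated column.
count_homepages() {
    awk '{ sub(/\/+$/, "", $1); print $1 }' "$1" \
        | sort | uniq -c | sort -n -r
}
```

Usage would be "count_homepages homepage-to-package.txt | head -n 10".
This only removes one source of duplicates; other spelling variants of
the same site remain separate entries.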
I think these three cases alone show that automatic matching would
- also be a lot of work
- involve many special cases
- still require manual mappings here and there

Another disadvantage of deriving the mapping in code: the current
static XML approach of packagemap is language independent. We can
easily build tools for packagemap in any language that has an XML parser.
If the data actually is the code, we suddenly have to keep
code in different languages in precise sync, special cases included.

I'm not sure if the approach you describe is less work in total.
I guess to find out we'd have to do both in parallel :-)

It would be interesting to see how much the lists of homepages
in, say, Debian packages and Gentoo packages overlap.

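Given one homepage list per distro, the overlap itself would be cheap
to compute. A minimal sketch, assuming two hypothetical files with one
homepage per line (the file names in the usage line and the function
name homepage_overlap are made up):

```shell
# Print every homepage that occurs in both files, each one only once.
# $1, $2 = files with one homepage per line.
# Assumes the first file is not empty (NR == FNR identifies the first file).
homepage_overlap() {
    awk 'NR == FNR { seen[$0] = 1; next }
         ($0 in seen) && !printed[$0]++' "$1" "$2"
}
```

Usage would be
"homepage_overlap gentoo-homepages.txt debian-homepages.txt | wc -l".
The comparison is literal, so URL variants of the same project only
match if both lists were normalized the same way beforehand.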

Sebastian