Gentoo Archives: gentoo-dev

From: Sebastian Pipping <webmaster@××××××××.org>
To: PackageKit users and developers list <packagekit@×××××××××××××××××.org>
Cc: gentoo-dev@l.g.o, Paul Wise <pabs@××××××.org>, Christian Faulhammer <fauli@g.o>, "Petteri Räty" <betelgeuse@g.o>, Robert Buchholz <rbu@g.o>
Subject: Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
Date: Thu, 18 Jun 2009 00:09:37
Message-Id: 4A3985BD.4040005@hartwork.org
In Reply to: Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap" by "Marijn Schouten (hkBst)"
1 Marijn Schouten (hkBst) wrote:
2 > Sebastian Pipping wrote:
3 >> I start to understand the real benefits of moving a larger
4 >> part of the maintenance down to the distro level as you proposed.
5 >
6 >> Okay, let's add support for CPEs at distro package level
7 >> and sync up and down with the central packagemap database.
8 >> Please contact me for collaboration on sync scripts
9 >> and "modeling" of details.
10 >
11 > Do we not already have enough information available to automatically determine
12 > derived unique identifiers like CPE?
13 >
14 > We have the package homepage and the package name (and the package category) and
15 > the combination should be enough information to do direct comparisons to data
16 > gathered from other repos (assuming they also contain such data).
17
18 You are asking a valid question. The homepage links can be a great
19 helper in mapping and they have been of help already for the mapping
20 of the first 1000 Gentoo packages in packagemap.
21
22 However, it might not be as easy as you make it sound, as there are
23 a few things that complicate matters and produce extra work:
24
25 - In many cases a project can be reached from several URLs.
26 For a project on SF.net you might have
27 - http://sf.net/projects/${name}
28 - http://${name}.sf.net/
29 - http://www.${name}.org/
30 That case can be handled rather easily but there are many more
31 special cases and a manual map may be required for stuff that's
32 not hosted on a larger hosting site.
33
34 - Split packages (think Git or Qt) may all have the same homepage.
35 In Debian the source package might help there, in Gentoo you'd
36 have to do common prefix detection or similar. Those are special
37 cases again, and need continuous review that they still do what you need.
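A minimal sketch of the SF.net case from the first point, assuming only the two sf.net URL shapes listed there. The function name `canonical_homepage` and the `sf:` key prefix are made up for illustration; the third shape (http://www.${name}.org/) cannot be recognised by pattern and would still need a manual map.

```shell
# Hypothetical helper: fold the two sf.net URL shapes into one key "sf:<name>".
# Anything else is passed through unchanged.
canonical_homepage() {
    printf '%s\n' "$1" | sed -E \
        -e 's,^https?://(www\.)?(sf|sourceforge)\.net/projects/([^/]+)/?$,sf:\3,' \
        -e 's,^https?://([^./]+)\.(sf|sourceforge)\.net/?$,sf:\1,'
}

canonical_homepage 'http://sf.net/projects/synce'
canonical_homepage 'http://synce.sf.net/'
canonical_homepage 'http://www.synce.org/'
```

The first two calls print the same key; the third is left as it is, which is exactly where the manual map would come in.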
38
39
40 > For example you can determine automatically that gentoo:dev-scheme/gambit and
41 > debian:gambc are the same package because although their names differ they have
42 > the same homepage and share a category.
43
44 To detect equal categories you need a map for categories for all
45 participating distros. Yes, it's smaller than mapping all packages
46 but it involves a manual map and keeping it in sync.
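To make the category map concrete, here is a rough sketch. Every entry in it is a placeholder: the Debian section names and the shared labels are assumptions for illustration, not data taken from either distro or from packagemap.

```shell
# Hypothetical category map: "<distro>:<category>" -> shared label.
# All entries are illustrative placeholders and would need manual upkeep.
common_category() {
    case "$1" in
        gentoo:dev-scheme|debian:lisp) echo scheme ;;
        gentoo:dev-perl|debian:perl)   echo perl ;;
        *)                             echo unknown ;;
    esac
}

# Succeeds only if both sides map to the same known label.
same_category() {
    a=$(common_category "$1")
    [ "$a" != unknown ] && [ "$a" = "$(common_category "$2")" ]
}

same_category gentoo:dev-scheme debian:lisp && echo "categories match"
```

Each new distro adds its own column of patterns to the `case` statement, which is the manual map that has to be kept in sync.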
47
48 Another word on homepage collisions: A few days ago I wrote
49 a script that builds a map from homepages to packagenames for the
50 whole Gentoo tree (code/gentoo/gentoo-world-to-homepage-map.sh).
51 The generated table from my run was 12330 lines long, each line for
52 a different package.
53
54 If you run an analysis over that table, you see that many
55 homepages appear far more than once.
56 Here's the top ten:
57
58 68 http://www.gnome.org/
59 67 http://www.gentoo.org/
60 58 http://www.gentoo.org/proj/en/perl/
61 42 http://lingucomponent.openoffice.org/
62 26 http://www.kde.org/
63 25 http://www.gentoo.org
64 20 http://sourceforge.net/projects/synce/
65 19 http://www.trolltech.com/
66 19 http://search.cpan.org/~rjbs/
67 18 http://opensuse.foehr-it.de/
68
69 The command I used is
70
71 $ sed 's|[[:space:]].*$||' homepage-to-package.txt \
72 | sort | uniq -c | sort -n -r | head -n 10
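Note that the top ten counts http://www.gentoo.org/ and http://www.gentoo.org as two different homepages. A rough variation of the pipeline that drops a trailing slash before counting, assuming each input line is a homepage followed by whitespace and a package name (the function name and the sample package names are made up):

```shell
# Count how often each homepage occurs, ignoring a trailing slash.
# stdin: lines of "<homepage> <package>" (assumed layout).
count_homepages() {
    awk '{ sub(/\/$/, "", $1); print $1 }' | sort | uniq -c | sort -n -r
}

printf '%s\n' \
    'http://www.gentoo.org/ app-misc/one' \
    'http://www.gentoo.org app-misc/two' \
    'http://www.kde.org/ kde-base/three' \
    | count_homepages | head -n 10
```

On the real table the command would be `count_homepages < homepage-to-package.txt | head -n 10`.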
73
74 I think these three cases alone show that this approach would
75 - also be a lot of work
76 - involve many special cases
77 - still require manual mappings here and there
78
79 Another disadvantage: the current static XML approach of
80 packagemap is language independent, which we would lose. We can easily build
81 tools for packagemap in any language that has an XML parser.
82 If the data actually is the code, we suddenly have to keep
83 code in different languages in precise sync on every special case.
84
85 I'm not sure if the approach you describe is less work in total.
86 I guess to find out we'd have to do both in parallel :-)
87
88 It would be interesting to see how much the lists of homepages
89 in, say, Debian packages and Gentoo packages overlap.
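One way to measure that overlap, sketched under the assumption that each distro's homepages have been dumped to a file with one URL per line (the file names and the function name are hypothetical):

```shell
# Print the homepages that occur in both lists.
# $1, $2: files with one homepage per line (hypothetical inputs).
homepage_overlap() {
    tmp=$(mktemp -d) || return 1
    sort -u "$1" > "$tmp/a"
    sort -u "$2" > "$tmp/b"
    comm -12 "$tmp/a" "$tmp/b"   # keep only lines common to both files
    rm -rf "$tmp"
}
```

Usage would be along the lines of `homepage_overlap gentoo-homepages.txt debian-homepages.txt | wc -l`. Without URL normalisation beforehand, this is only a lower bound on the real overlap.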
90
91
92
93 Sebastian
