On Sun, Nov 30, 2008 at 3:12 PM, Diego 'Flameeyes' Pettenò
<flameeyes@×××××.com> wrote:
> "Alec Warner" <antarus@g.o> writes:
>
>> Diego, What are the concrete benefits of your proposal?
>
> As I said:
>
> - no need to replicate homepage data between versions; even though forks
>   can change homepage, I would expect that to at worst split a package
>   in two, or have to be different by slot, like Java;
> - allows proper handling of packages lacking a HOMEPAGE;
> - less data in metadata cache;
> - users can check the metadata much more easily by just opening the xml
>   file or interfacing to that rather than having to skim through the
>   ebuild; the xml files are probably more user readable than ebuilds
>   using multiple eclasses;
> - displaying info about the package does not require parsing the full
>   ebuild file, with its eclasses;
> - extensible to provide more links than just the homepage (forums,
>   trackers, gentoo-specific documentation, ...);
> - if we also move DESCRIPTION, search software can ignore everything
>   about ebuild parsing, and just use the metadata.xml files; considering
>   how many people actually use or used eix, it would make sense to allow
>   third-party applications to be able to search through the tree;
> - webapps like packages.gentoo.org would be able to display basic
>   information without having to parse the ebuilds or the metadata cache;
> - as much as people might think metadata is easier to parse than
>   anything, XML has one huge advantage: there are plenty of parsers for
>   any language without having to actually write one, even as easy as it
>   can be, and it's easily interfaced with anything; I wrote a simple XSL
>   file that outputs the basic metadata details for packages without
>   having any parser or executable code but xsltproc (or any other XSLT
>   software), correlating data with herds.xml too;
> - it really is metadata, and it makes very little sense to need parsing
>   of eclasses and EAPI handling to get some data from a package that is
>   non-functional in nature and free form (just like DESCRIPTION, and
>   unlike LICENSE like Alec said), and that changes at worst once each
>   slot (unlike LICENSE that can change at any given version).
>
> Disadvantages:
>
> - it requires user-interface software to parse metadata.xml to show
>   data for a package, which is already needed to show per-package USE
>   flag meaning;
>
> General points:
>
> - it does not solve unrelated problems like code replication;
>
> Can someone come up with any other point besides "I don't like XML"
> (which I already said is a puny answer) and "it can theoretically be 10
> different homepages for 10 different versions" (which I sincerely have
> some beef with myself, since if you fork a software you might as well
> change its name)?
>
> As I said, moving out the HOMEPAGE field is non-functional from a
> package manager perspective; if you're showing the user some data
> about a package you might as well show as much as you can, like long
> descriptions, other links, and USE flags. And the fact that you can ask
> the package manager for something is for me not a valid reason to avoid
> moving something to a more approachable place for other software.

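For reference, here is roughly what "parse metadata.xml with an existing parser" looks like from a third-party tool. This is only a sketch in Python using the standard library; the <homepage> element is an assumption (it is what the proposal would add, not something metadata.xml has today), and the sample package data is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml content. The <homepage> element is an
# assumption standing in for what the proposal would introduce.
SAMPLE = """<pkgmetadata>
  <herd>example-herd</herd>
  <homepage>http://www.example.org/</homepage>
  <longdescription>An example package.</longdescription>
  <use>
    <flag name="foo">Enable the hypothetical foo feature</flag>
  </use>
</pkgmetadata>"""


def read_metadata(xml_text):
    """Return the homepage, long description and per-package USE flags."""
    root = ET.fromstring(xml_text)
    return {
        # findtext() returns None when the element is absent, which is
        # one way to represent a package lacking a homepage.
        "homepage": root.findtext("homepage"),
        "longdescription": root.findtext("longdescription"),
        "use": {flag.get("name"): flag.text for flag in root.iter("flag")},
    }


info = read_metadata(SAMPLE)
print(info["homepage"])
```

No ebuild or eclass is sourced here, which is the point being made in the quoted text; whether that is worth the move is the question below.
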
Ciaran covered most of my points already.

Third-party programs should not parse ebuilds and eclasses by hand.
I'd expect half of them to get it wrong if they tried.
Ebuild parsing is hard; that is why we have three complex software
packages that for the most part do it properly.

Why is 'ask the package manager' an invalid reason for not making
something more accessible?
How accessible must this data be?

Writing an XML parser is not accessible enough (for me); we should
just put it in plain text on the hard drive, perhaps in
"/var/cache/edb/dep/${PORTDIR}/$C/$PV"

Oh wait, we do that already[1].

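To show how little it takes to read such a plain-text entry, here is a minimal Python sketch. It assumes one KEY=value pair per line, which is what grepping the cache for HOMEPAGE suggests; the real layout depends on the cache implementation in use (see [1]), and the sample entry is invented.

```python
def parse_cache_entry(text):
    """Parse KEY=value lines into a dict (assumed layout, see lead-in)."""
    entry = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values such as URLs with
        # query strings keep their own '=' characters.
        key, sep, value = line.partition("=")
        if sep:
            entry[key] = value
    return entry


# Made-up cache entry for illustration.
SAMPLE_ENTRY = (
    "DESCRIPTION=An example package\n"
    "HOMEPAGE=http://www.example.org/?page=home\n"
    "SLOT=0\n"
)

entry = parse_cache_entry(SAMPLE_ENTRY)
print(entry["HOMEPAGE"])
```
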
So really this is where I'm confused.
If third parties are using the package manager APIs to get at this
data, the only rationales for moving it out of ebuilds are:

- Space savings. Certainly your scheme may be smaller, but the XML
  tag overhead may eat into the savings. You should do some estimates
  to show the community how much smaller the tree will be from this
  proposal.

Randomly looking:

cd /var/cache/edb/dep/usr/portage
grep -hR HOMEPAGE . | wc -m
yields 1.1 million characters. Each character is 1 byte (is that so in UTF-8?)
So at best you could save the 1.2GB tree 2.2 million bytes (about 2
megs) if your scheme was (more than) 100% efficient.
The extra 1.1 million characters come from the space freed in the
cache (since we don't cache metadata.xml).

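On the characters-versus-bytes question: in UTF-8 only ASCII characters are one byte each, so the two counts can differ slightly. A rough Python equivalent of the grep above that reports both is sketched here. It assumes KEY=value lines in the cache and matches only lines that start with the key (grep would match the key anywhere in the line), so the totals may not be identical to the shell pipeline.

```python
import os


def count_key_in_lines(lines, key):
    """Return (characters, UTF-8 bytes) for lines starting with 'key='."""
    prefix = key + "="
    chars = nbytes = 0
    for line in lines:
        if line.startswith(prefix):
            chars += len(line)
            nbytes += len(line.encode("utf-8"))
    return chars, nbytes


def count_key_in_tree(cache_root, key):
    """Sum the counts over every file below cache_root."""
    total_chars = total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced, so byte totals are
            # approximate for files that are not valid UTF-8.
            with open(path, encoding="utf-8", errors="replace") as fh:
                chars, nbytes = count_key_in_lines(fh, key)
            total_chars += chars
            total_bytes += nbytes
    return total_chars, total_bytes
```

Something like count_key_in_tree("/var/cache/edb/dep/usr/portage", "HOMEPAGE") would then give both numbers in one pass.
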
2 megs into 1200 megs is about 0.17% of the tree. As I thought, not
very compelling.

Looking at DESCRIPTION:

grep -hR DESCRIPTION . | wc -m
yields ~1.5 million characters. Nice!

So if we purge that from the cache and replace it with a (more than)
100% efficient metadata.xml solution we could save: 3 megs

3 megs saved + 2 megs saved = 5 megs saved. 5 / 1200 = about 0.42% of
the tree. Still not very compelling.

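The arithmetic above can be redone without the rounding. This Python sketch assumes, as the estimate does, one byte per character, one meg as a million bytes, a 1200 meg tree, and that every character is saved twice (once in the ebuilds, once in the cache). The function name is just for illustration.

```python
TREE_MEGS = 1200  # tree size used in the estimate above


def percent_saved(chars_in_cache, tree_megs=TREE_MEGS):
    """Return (megs saved, percent of tree) for a best-case removal."""
    # Counted twice: the field leaves both the ebuilds and the cache.
    saved_megs = 2 * chars_in_cache / 1_000_000
    return saved_megs, 100 * saved_megs / tree_megs


homepage = percent_saved(1_100_000)     # HOMEPAGE alone
combined = percent_saved(1_100_000 + 1_500_000)  # plus DESCRIPTION
print("HOMEPAGE: %.1f megs, %.2f%% of the tree" % homepage)
print("Combined: %.1f megs, %.2f%% of the tree" % combined)
```

Unrounded, the figures come out a little higher than the ones quoted above (2.2 and 5.2 megs rather than 2 and 5), but they stay well under half a percent of the tree.
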
- Extra Tags. Extending HOMEPAGE is harder than changing
  metadata.xml, which I imagine is part of the reason why you proposed
  it.
  It will take until EAPI3 at least before we can get the HOMEPAGE tags
  implemented in ebuilds, and then we have to bump affected ebuilds
  to EAPI3.

However if we drop the 'extra tags' bit then the only reason to move
the data is space, and I imagine the space savings will not be
compelling; but feel free to prove me wrong.

[1] For ebuilds that have cache entries, using the default cache
implementation for portage.

>
> --
> Diego "Flameeyes" Pettenò
> http://blog.flameeyes.eu/
>