Gentoo Archives: gentoo-dev

From: "Kevin F. Quinn" <ml@××××××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Re: New category proposal
Date: Thu, 12 May 2005 11:33:48
Message-Id: KUK88.19874BTC@kevquinn.com
In Reply to: Re: [gentoo-dev] Re: New category proposal by Brian Harring
1 Brian Harring wrote:
2 > > The layout on disk and the semantics of categories do not need to be related.
3 > Yes and no. You're assuming that people don't use the layout on
4 > disk for digging around without calling portage. Personally, I do.
5
6 Sometimes I do the same; but other times I find the layout a barrier. Many's the time I've done:
7
8 $ ls -d /usr/portage/*/<package name pattern>
9
10 to find a package, for example - that indicates the categories are actually hindering searches in this case. Incidentally it also treats the tree as if it were a flat namespace.
11
12 However, ideally I wouldn't be searching the tree directly like that at all; I'd be searching metadata based on various criteria. Indeed, package names as they stand are frequently uninformative: if you decide you need something for a particular function, you can have a look in what you think may be the relevant categories, only to find a list of mostly meaningless package names. Then you start grepping the DESCRIPTIONs and so on, finally trying equery. This whole process is rather unsatisfactory, and in my experience often fruitless. Many times I've gone the other way: google for something to find candidate package names, then 'ls -d /usr/portage/*/<name>' to see if some kind soul has already added an ebuild to the tree.
13
14 For me, the whole point of a flat namespace is to _remove_ categories from the atom. Obviously this has far-reaching disruptive consequences as you describe, and in practice is not workable in the short to medium term at least.
15
16 I'd like to be able to ask questions like, "what app-text packages exist for <some function>?". At the moment, listing app-text, grepping app-text/*/*ebuild may get somewhere, but what about packages placed in different categories for reasons like name clash, other functionality and so on?
17
18 Ciaran McCreesh wrote:
19 > So we end up not using upstream naming, leading to major hassle with
20 > tarballs, major user confusion and inconsistent naming (why are some vim
21 > things vim- and others not?). Bad! Now that portage *tells* you when you
22 > need to be more specific, there's no problem with name matches.
23
24 I agree maintaining upstream naming is very important. However, upstream names can and do clash, which raises the question of how such clashes should be resolved. Categories are a rather arbitrary way of doing that: it's quite possible for a clash to occur between two packages that naturally fall into the same category, and in the current system that means one of the packages gets dumped in a second-choice category.
25
26 Talking atoms, one could handle clashes by differentiating occurrences with an extension to the name. To take the sudo example, sudo could be the normal sudo, sudo:vim (or perhaps sudo__vim to be acceptable to more filesystems) could be the vim extension sudo.
27
28 Brian Harring wrote:
29 > Re-asserting that the fs layout *does* matter, how is that more intuitive when trying
30 > to track down the ebuild for dev-util/diffball ? How many directories deep would I have to go before I reached the ebuild?
31
32 $ ls -d /usr/portage/*/<name pattern>
33
34 becomes
35
36 $ find /usr/portage -type d -name <name pattern> -print
37
38 and for quick&dirty things like
39
40 $ grep -l <pattern> /usr/portage/*/<name pattern>/*ebuild
41
42 instead do:
43
44 $ find /usr/portage -type d -name <name pattern> \
45 -exec sh -c 'grep -l <pattern> "$1"/*ebuild' sh {} \;
46
47 or somesuch.
48
49
50 An interesting possibility is that the portage mirrors and clients can have different layouts depending on what is most suitable. Those with reiserfs could sensibly choose the very wide layout. Others on ext2 could choose a s/u/sudo approach to avoid problems with very wide directories. Obviously this means modifying the sync process somewhat to deal with this, but it's quite possible, in a scalable, efficient manner.
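
A minimal sketch of the s/u/sudo mapping, to make the idea concrete. The function name shard_path and the fixed depth of two levels are assumptions made for illustration; nothing in portage works this way today.

```shell
# Hypothetical helper: map a package name to a sharded relative path,
# e.g. sudo -> s/u/sudo. The two-level depth is an assumption of this sketch.
shard_path() {
    name=$1
    rest=${name#?}               # name without its first character
    first=${name%"$rest"}        # first character
    second=${rest%"${rest#?}"}   # second character, empty for 1-char names
    if [ -z "$second" ]; then
        # One-character names only get a single directory level.
        printf '%s/%s\n' "$first" "$name"
    else
        printf '%s/%s/%s\n' "$first" "$second" "$name"
    fi
}
```

A client on a filesystem that copes well with wide directories would simply skip this mapping and keep every package directly under the tree root.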
51
52 Brian Harring wrote:
53 > > The key here is to separate the category (metadata) and filesystem [snip]
54 > This also locks out several possibilities, like relying on dir structure to limit the searches.
55 > You force category classification to be metadata, you need an additional db to do searching,
56 > and basic atom lookup. That's 19000+ keys in a db. No db, and you force a tree wide search, which _will_ be as fast as emerge -S is.
57
58 That holds only if you retain the category in the atom; for me there's no point flattening the namespace without removing the category completely from the atom.
59
60 Where at the moment you perhaps want to do:
61
62 $ grep <pattern> /usr/portage/app-text/*/*ebuild
63
64 then yes, an additional db of some kind is necessary, or perhaps a more efficient way of searching the metadata.xml files. However I disagree with the 19000+ keys. Portage could for example maintain a simple category->package name mapping - only needs to be updated when packages are added/removed from the tree or metadata is changed, and can be trivial. For example, it could be a simple shell script with entries like:
65
66 PC_<category>="<name> <name> <name>"
67
68 at which point you only need to do:
69
70 $ source <category db>
71 $ for pkg in ${PC_<category>}; do ... ; done
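
As a rough sketch of such a category db: shell variable names cannot contain '-', so this version stores app-text as PC_app_text. That mapping, the helper name pkgs_in_category and the package names foo, bar and baz are all assumptions made up for the example. In practice the PC_ lines would live in the generated file that gets sourced.

```shell
# Sketch of a category db as a sourceable shell fragment.
# Hypothetical entries; '-' in category names is replaced by '_'.
PC_app_text="foo bar"
PC_app_finance="bar baz"

# Print the packages recorded for a category, one per line.
# $1 must be a plain category name such as app-text, because it is
# turned into a variable name and expanded with eval.
pkgs_in_category() {
    var="PC_$(printf '%s' "$1" | tr '-' '_')"
    eval "list=\${$var-}"
    for pkg in $list; do
        printf '%s\n' "$pkg"
    done
}
```

Note that bar appears under both categories, which a one-directory-per-package layout cannot express.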
72
73 Brian Harring wrote:
74 > cpvs can't conflict, pure and simple under the current
75 > layout, which is
76 > enforced by the single category/fs layout.
77
78 cpvs can't conflict because when a package name already exists in a category, a conflicting package name has to go into a different category even if it's not the most natural category for the package. What you've done there, is assert a rule (cpvs are unique) thus
79
80 Brian Harring wrote:
81 > What are we gaining? Ability to find a package under two categories?
82
83 That, and stability of package location. Moving packages around the tree is disruptive, not just to the ebuilds that reference them but also because it causes unnecessary mirror activity.
84
85 For me, categories are search criteria. Making them part of the tree makes it difficult to revise those criteria.
86
87 Brian Harring wrote:
88 > > The benefits include
89 > > 1) no more "moving packages around the tree"
90 > cpv conflict. You aren't moving the fs position of it, but it still
91 > requires walking the tree and updating all atom's that reference the old position.
92 The point is that *DEPEND would not mention the category.
93
94 Brian Harring wrote:
95 > > 2) categories can be added to a package in the most natural way
96 > Elaborate.
97
98 The idea is that packages can naturally belong in more than one category. Think of categories more like search keywords, if you like. A package that processes text would match app-text, but perhaps it's also a financial tool, which would therefore also match app-finance.
99
100 Another good example of the usefulness of more than one category is the sys-* categories, where every package naturally falls both into its sys- category and into the relevant non-sys category. Take GCC: it is currently in sys-devel/gcc, not in dev-lang/gcc, which is where a naive user would look for it. With multiple category markings, it could be in both.
101
102 Brian Harring wrote:
103 > > 3) overlays can be tidier
104 > Eh?
105
106 This is a result of the dynamic s/u/sudo approach where the directory depth is arbitrary. In the overlay you could drop the s/u/ bit. I'd guess most overlays modify relatively few packages; I know I have a bunch of categories in my overlay that only contain one package. Given that portage would take a top-down search approach to locate the package (i.e. try sudo, then s/sudo, then s/u/sudo ... first in overlay then in the mirror) this works transparently.
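
The top-down search described above could look something like the following sketch. The function name find_pkg_dir is hypothetical, and the roots are passed in search order by the caller, overlay first.

```shell
# Hypothetical lookup: in each root in turn, try <root>/sudo, then
# <root>/s/sudo, then <root>/s/u/sudo, and print the first directory that
# exists. Returns non-zero if the package is found nowhere.
find_pkg_dir() {
    name=$1
    shift
    rest=${name#?}
    first=${name%"$rest"}        # first character of the name
    second=${rest%"${rest#?}"}   # second character of the name
    for root in "$@"; do
        for candidate in "$root/$name" \
                         "$root/$first/$name" \
                         "$root/$first/$second/$name"; do
            if [ -d "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    return 1
}
```

For example, 'find_pkg_dir sudo <overlay dir> /usr/portage' would report an overlay copy kept directly at <overlay dir>/sudo before it ever looks at /usr/portage/s/u/sudo.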
107
108 Brian Harring wrote:
109 > What do we gain from a flat namespace?
110
111 Eliminating categories from package names
112
113 > Right now, I can infer an atom out of a DEPEND string's purpose to
114 > some degree, based upon it's category.
115
116 You could use this argument for appending the description to the atom, but no one would suggest such a thing seriously. What you're justifying is building metadata into the package name.
117
118 > To head off the "well you
119 > don't need to know the category, you should know the packages
120 > intentions if you're modifying the ebuild", that dodges the point; via the category portion of an atom, I can infer at least -intention- of a package.
121
122 To be more accurate, you can infer an aspect of the intention of a package that the original committer felt was most important whilst avoiding clashes. That's the point - by forcing a package to be a member of exactly one category, the implications from category membership are limited.
123
124
125 I'm the first to admit that making the changes to the fs layout I've talked about would be hugely disruptive, and as such is not sensible, most especially in the short to medium term. This discussion does, however, serve to understand the problem properly before making any changes. I think adding categories to metadata.xml, removing the few clashes (but otherwise leaving the fs layout as it is), and coming up with an efficient search tool (e.g. getting portage to maintain something like the script I mentioned above, or creating a widget to build it from the metadata.xml files) could eliminate the primary problem of moving packages around, along with arguments over whether a package should be in dev-cpp or dev-libs. The rule could then be that once a package is in a physical category in the tree it will not physically move, no matter what. *DEPEND would continue to use the physical category, at least in the short term; it could ultimately drop the category if that becomes sensible. Changing the few existing clashing names could be undertaken gradually (e.g. appending :<differentiator> as described above), to allow clashing names to belong to the same category.
126
127
128 This is quite benign and relatively painless. Ultimately you have a flat namespace; packages will no longer move inside the fs tree; the old q&d ls/grep tricks to try to find suitable packages would work as well as they do now; arguments about which category to place a package in disappear; searches using category can become more intuitive; and different packages that have the same upstream name can be members of the same category.
129
130 Kev.
131
132
133
134 --
135 gentoo-dev@g.o mailing list