Gentoo Archives: gentoo-dev

From: Wyatt Epp <wyatt.epp@×××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)
Date: Sat, 25 Jun 2011 10:48:30
Message-Id: BANLkTimHg946DNNmMny2kPB21n2-o8UZng@mail.gmail.com
In Reply to: Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category) by Kent Fredric
1 On Sat, Jun 25, 2011 at 02:49, Kent Fredric <kentfredric@×××××.com> wrote:
2 > I'm strongly of the mind that by making the tag system arbitrarily
3 > flat, you might be prematurely limiting yourself, as well as risking a
4 > future where the "tag index" is a sea of meaningless words.
5 >
6 > Tags in my mind, should be grouped by the sort of information they are
7 > trying to convey, as opposed to being arbitrary and completely
8 > un-grouped.
9 >
10 > The present category system only has one namespace, which is more or
11 > less "what-you-use-it-for", and if your tag system is likewise going
12 > to take that vector as the only approach, you will ultimately end up
13 > duplicating the category system, albeit without the present limitation
14 > that means one package can only exist in one place.
15 >
16 > This need not be the case, we can suggest alternative tag namespaces,
17 > such as : The sorts of files it supports working with, the sorts of
18 > things it can read, the sorts of things it can write.
19 >
20 > At present, things that migrate one type of media to another, such as
21 > pdf -> image , image -> pdf, image -> video , video -> images , etc
22 > have to be forced to a sort of useless categorisation system.
23 >
24 > However, if via tag data, we were able to annotate a) what can be
25 > written and b) what can be read, this system could be leveraged to
26 > epic proportions of win.
27 >
28 Okay, apologies in advance for my long-windedness. I hope this all
29 makes sense to everyone.
30
31 I should probably clarify that clinging strictly to flatness is not
32 what I'm proposing. Reality has borne out the need for implications
33 and aliases in sanitising an unruly dataset with a complex
34 user-generated index, while arbitrary democratised group building has
35 improved some aspects of discovery. However, I would consider these
36 features to be a lower priority than having a system at all.
37
38 So to break it down:
39 Tags - a concise vocabulary used for search. In their default state
40 they are untyped and non-hierarchical. They identify traits of a
41 package. Suggest using lower-case and simple, descriptive naming
42 conventions. Highest priority.
43 Example: alien {{converter nogui package_management reads_tgz
44 reads_rpm reads_pkg reads_slp reads_lsb writes_tgz writes_rpm
45 writes_pkg writes_slp writes_lsb}}
46
47 Alias - a relationship between two tags establishing equivalence.
48 Querying the left term returns the results of the right. This type of
49 relationship helps reduce dictionary clutter. Low priority.
50 Example: sound = audio. Attempting to add "sound" to a package will
51 instead add "audio" and searches for sound will return the results for
52 audio.
53
54 Implication - a relationship between two tags where the presence of
55 the left term necessarily requires the right. This relationship
56 reduces menial work. Low priority.
57 Example: mpd -> audio. Adding "mpd" to the package will also add "audio".
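
The three definitions above can be sketched in a few lines of Python. This is only an illustration under assumptions, not a proposed implementation: the TagIndex class, its method names, and the package names are hypothetical, and only the "sound = audio" alias and the "mpd -> audio" implication come from the examples above.

```python
class TagIndex:
    """Flat tag store with aliases and implications (illustrative only)."""

    def __init__(self):
        self.aliases = {}       # alias tag -> canonical tag
        self.implications = {}  # tag -> set of tags it implies
        self.packages = {}      # package name -> set of canonical tags

    def canonical(self, tag):
        # Tags are lower-case; an alias resolves to its right-hand term.
        tag = tag.lower()
        return self.aliases.get(tag, tag)

    def add_tag(self, package, tag):
        # Adding a tag also adds everything it implies, transitively.
        tags = self.packages.setdefault(package, set())
        pending = [self.canonical(tag)]
        while pending:
            current = pending.pop()
            if current in tags:
                continue
            tags.add(current)
            implied = self.implications.get(current, ())
            pending.extend(self.canonical(t) for t in implied)

    def search(self, tag):
        # Searching an alias returns the results of the canonical tag.
        wanted = self.canonical(tag)
        return sorted(p for p, tags in self.packages.items() if wanted in tags)


index = TagIndex()
index.aliases["sound"] = "audio"        # sound = audio
index.implications["mpd"] = {"audio"}   # mpd -> audio

# Hypothetical package names and tag assignments.
index.add_tag("media-sound/mpd", "mpd")
index.add_tag("media-sound/alsa-utils", "sound")
```

Here adding "sound" stores "audio" instead, adding "mpd" also stores "audio", and index.search("sound") finds both packages.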
58
59 Kent, your idea is pretty interesting and I rather like it.
60 Fortunately, it's completely possible within the context of the basic
61 flat layout, as I outlined with Alien above. It probably looks ugly
62 to you-- this is no illusion; it's pretty ugly. But it also grants us
63 the flexibility to get a basic system in place quickly and without a
64 lot of hassle. We get 90% of the benefit up front, and can extend it
65 as necessary.
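
As a rough sketch of how the read/write idea works on top of plain flat tags, a "what converts X to Y" query is just a subset test. The find_converters function and the second package entry are made up for illustration; the alien tag set is the one from the example above.

```python
# Tag sets keyed by package name. "alien" uses the tags from the example
# above; "example-pdf2png" is a made-up package for contrast.
PACKAGES = {
    "alien": {
        "converter", "nogui", "package_management",
        "reads_tgz", "reads_rpm", "reads_pkg", "reads_slp", "reads_lsb",
        "writes_tgz", "writes_rpm", "writes_pkg", "writes_slp", "writes_lsb",
    },
    "example-pdf2png": {"converter", "nogui", "reads_pdf", "writes_png"},
}


def find_converters(packages, source, target):
    """Return packages tagged as reading `source` and writing `target`."""
    wanted = {f"reads_{source}", f"writes_{target}"}
    # A package matches when every wanted tag is in its flat tag set.
    return sorted(name for name, tags in packages.items() if wanted <= tags)


rpm_to_tgz = find_converters(PACKAGES, "rpm", "tgz")
pdf_to_png = find_converters(PACKAGES, "pdf", "png")
pdf_to_rpm = find_converters(PACKAGES, "pdf", "rpm")
```

No tag types or hierarchy are needed for this; the reads_/writes_ prefix is only a naming convention inside the flat vocabulary.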
66
67 Unfortunately for "real" hierarchical methods, people still have
68 difficulty with even simple metadata systems. Fetch some MP3s off the
69 internet and check their tags, or look at search engine queries, and
70 you'll find an entire class of people hampered by what is currently a
71 largely alien art. In the end, this system needs to be usable by
72 people, and by keeping it primarily flat we ease the conceptual
73 overhead of its implementation and its use. If it can't be
74 implemented on itch-scratching timescales, we have failed. If people
75 can't use it with very little learning curve, we have failed.
76
77 A word on vocabulary:
78 As you've no doubt noticed, there seems to be a degree of combinatorial
79 explosion of tags in the method I propose. In practical use, it's not
80 as bad as it looks. For Gentoo, I'd recommend a basic "canonical"
81 list of general tags based on the current category system (subject to
82 discussion and addition/subtraction) and incorporate suggestions like
83 Kent's as they come up. It's okay to control the vocabulary. What
84 you find is that after the initial implementation, it grows fairly
85 slowly. (Even with reads_* and writes_* the number will probably be
86 south of 500 tags for a long time; the current categories dissolve
87 into about 175 tags from what I can see.)
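
Controlling the vocabulary could be as simple as checking proposed tags against the canonical list. The sketch below assumes a tiny stand-in list and a reads_*/writes_* naming pattern; the CANONICAL set, the FORMAT_TAG pattern, and the rejected_tags function are all hypothetical, not an agreed vocabulary.

```python
import re

# Stand-in for the canonical list; the real one would come out of discussion.
CANONICAL = {"audio", "converter", "nogui", "package_management"}

# Assumed convention: reads_<format> / writes_<format>, lower-case only.
FORMAT_TAG = re.compile(r"^(reads|writes)_[a-z0-9]+$")


def rejected_tags(tags):
    """Return the proposed tags that are outside the controlled vocabulary."""
    return sorted(
        t for t in tags
        if t not in CANONICAL and not FORMAT_TAG.match(t)
    )


proposed = {"converter", "reads_rpm", "Sound", "writes_"}
rejected = rejected_tags(proposed)
```

With this input, "Sound" is rejected for not being in the list (and not lower-case) and "writes_" for naming no format, while "converter" and "reads_rpm" pass.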
88
89 Regards,
90 Wyatt