On Sat, Jun 25, 2011 at 02:49, Kent Fredric <kentfredric@×××××.com> wrote:
> I'm strongly of the mind that by making the tag system arbitrarily
> flat, you might be prematurely limiting yourself, as well as risking a
> future where the "tag index" is a sea of meaningless words.
>
> Tags in my mind, should be grouped by the sort of information they are
> trying to convey, as opposed to being arbitrary and completely
> un-grouped.
>
> The present category system only has one namespace, which is more or
> less "what-you-use-it-for", and if your tag system is likewise going
> to take that vector as the only approach, you will ultimately end up
> duplicating the category system, albeit without the present limitation
> that means one package can only exist in one place.
>
> This need not be the case, we can suggest alternative tag namespaces,
> such as : The sorts of files it supports working with, the sorts of
> things it can read, the sorts of things it can write.
>
> At present, things that migrate one type of media to another, such as
> pdf -> image , image -> pdf, image -> video , video -> images , etc
> have to be forced to a sort of useless categorisation system.
>
> However, if via tag data, we were able to annotate a) what can be
> written and b) what can be read, this system could be leveraged to
> epic proportions of win.
>
Okay, apologies in advance for my long-windedness.  I hope this all
makes sense to everyone.

I should probably clarify that clinging strictly to flatness is not
what I'm proposing.  Reality has borne out the need for implications
and aliases in sanitising an unruly dataset with a complex
user-generated index, while arbitrary democratised group building has
improved some aspects of discovery.  However, I would consider these
features to be a lower priority than having a system at all.

So to break it down:
Tags - a concise vocabulary used for search.  In their default state
they are untyped and non-hierarchical.  They identify traits of a
package.  I suggest lower-case, simple, descriptive naming
conventions.  Highest priority.
Example: alien {{converter nogui package_management reads_tgz
reads_rpm reads_pkg reads_slp reads_lsb writes_tgz writes_rpm
writes_pkg writes_slp writes_lsb}}
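To make the flat model concrete, here is a minimal Python sketch of a tag index and an all-tags-must-match search.  Only the "alien" entry comes from the example above; the "examplepkg" entry and the names PACKAGE_TAGS, build_index and search are invented for illustration and are not part of any existing Gentoo tool.

```python
from collections import defaultdict

# Hypothetical in-memory tag data. "alien" mirrors the example above;
# "examplepkg" is made up so that a search can return more than one hit.
PACKAGE_TAGS = {
    "alien": {
        "converter", "nogui", "package_management",
        "reads_tgz", "reads_rpm", "reads_pkg", "reads_slp", "reads_lsb",
        "writes_tgz", "writes_rpm", "writes_pkg", "writes_slp", "writes_lsb",
    },
    "examplepkg": {"package_management", "nogui"},
}


def build_index(package_tags):
    """Invert package -> tags into tag -> packages."""
    index = defaultdict(set)
    for package, tags in package_tags.items():
        for tag in tags:
            index[tag].add(package)
    return index


def search(index, *tags):
    """Return the packages that carry every one of the given tags."""
    result = None
    for tag in tags:
        packages = index.get(tag, set())
        result = set(packages) if result is None else result & packages
    return result if result is not None else set()


INDEX = build_index(PACKAGE_TAGS)
```

With this data, search(INDEX, "converter", "reads_rpm") finds only alien, while search(INDEX, "package_management") finds both packages.  Nothing here needs types or hierarchy; the tags are just strings.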

Alias - a relationship between two tags establishing equivalence.
A query for the left term returns the results for the right.  This
type of relationship helps reduce dictionary clutter.  Low priority.
Example: sound = audio.  Attempting to add "sound" to a package will
instead add "audio", and searches for "sound" will return the results
for "audio".
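One way the alias behaviour might look in code, as a rough Python sketch.  The ALIASES table holds the sound = audio example from above; the function names and the package name "some-player" are hypothetical.  Following chains of aliases (and guarding against loops) is my own assumption, not something the proposal requires.

```python
# Left term -> right (canonical) term, as in "sound = audio".
ALIASES = {"sound": "audio"}


def canonical(tag, aliases=ALIASES):
    """Resolve a tag to its canonical form, following alias chains."""
    seen = set()
    while tag in aliases and tag not in seen:
        seen.add(tag)  # guard against accidental alias loops
        tag = aliases[tag]
    return tag


def add_tag(package_tags, package, tag):
    """Store the canonical tag, never the alias itself."""
    package_tags.setdefault(package, set()).add(canonical(tag))


def search(package_tags, tag):
    """Search by alias or canonical name; both give the same result."""
    wanted = canonical(tag)
    return {pkg for pkg, tags in package_tags.items() if wanted in tags}


tags = {}
add_tag(tags, "some-player", "sound")  # stored as "audio"
```

Because only the canonical term is ever stored, the dictionary of tags in use stays small no matter how many synonyms people type.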

Implication - a relationship between two tags where the presence of
the left term necessarily requires the right.  This relationship
reduces menial work.  Low priority.
Example: mpd -> audio.  Adding "mpd" to the package will also add "audio".
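Implications could be sketched in Python along these lines.  The IMPLICATIONS table holds the mpd -> audio example from above; the function names and the package name "some-client" are hypothetical.  Treating implications as transitive (if a -> b and b -> c, then adding a also adds c) is an assumption on my part.

```python
# Left term -> set of terms it implies, as in "mpd -> audio".
IMPLICATIONS = {"mpd": {"audio"}}


def expand(tag, implications=IMPLICATIONS):
    """Return the tag plus everything it implies, directly or indirectly."""
    result = set()
    pending = [tag]
    while pending:
        current = pending.pop()
        if current in result:
            continue  # already handled; also protects against cycles
        result.add(current)
        pending.extend(implications.get(current, ()))
    return result


def add_tag(package_tags, package, tag):
    """Add a tag together with all the tags it implies."""
    package_tags.setdefault(package, set()).update(expand(tag))


tags = {}
add_tag(tags, "some-client", "mpd")  # also adds "audio"
```

Note the relationship is one-way: adding "audio" to a package does not add "mpd".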

Kent, your idea is pretty interesting and I rather like it.
Fortunately, it's completely possible within the context of the basic
flat layout, as I outlined with alien above.  It probably looks ugly
to you, and that is no illusion: it's pretty ugly.  But it also grants
us the flexibility to get a basic system in place quickly and without
a lot of hassle.  We get 90% of the benefit up front, and can extend
it as necessary.
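For instance, Kent's "what can be read / what can be written" query falls out of the flat reads_* and writes_* tags with nothing more than a set comparison.  A Python sketch, using the alien tags from the example above; the converters function and the PACKAGE_TAGS name are hypothetical:

```python
# Tags for alien as listed in the example above.
PACKAGE_TAGS = {
    "alien": {
        "converter", "nogui", "package_management",
        "reads_tgz", "reads_rpm", "reads_pkg", "reads_slp", "reads_lsb",
        "writes_tgz", "writes_rpm", "writes_pkg", "writes_slp", "writes_lsb",
    },
}


def converters(package_tags, source_format, target_format):
    """List packages tagged as reading one format and writing another."""
    wanted = {f"reads_{source_format}", f"writes_{target_format}"}
    return sorted(
        pkg for pkg, tags in package_tags.items() if wanted <= tags
    )
```

Here converters(PACKAGE_TAGS, "rpm", "tgz") returns ["alien"], and a pair nothing is tagged for, such as "pdf" to "png", returns an empty list.  The reads_/writes_ prefix is only a naming convention on otherwise untyped tags, which is what keeps the system flat.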

Unfortunately for "real" hierarchical methods, people still have
difficulty with even simple metadata systems.  Fetch some MP3s off the
internet and check their tags, or look at search engine queries, and
you'll find an entire class of people hampered by what is currently a
largely alien art.  In the end, this system needs to be usable by
people, and by keeping it primarily flat we ease the conceptual
overhead of both its implementation and its use.  If it can't be
implemented on itch-scratching timescales, we have failed.  If people
can't use it with very little of a learning curve, we have failed.

A word on vocabulary:
As you've no doubt noticed, there seems to be a degree of
combinatorial explosion of tags in the method I propose.  In practical
use, it's not as bad as it looks.  For Gentoo, I'd recommend starting
from a basic "canonical" list of general tags based on the current
category system (subject to discussion and addition/subtraction) and
incorporating suggestions like Kent's as they come up.  It's okay to
control the vocabulary.  What you find is that after the initial
implementation, the vocabulary grows fairly slowly.  (Even with
reads_* and writes_* the number will probably be south of 500 tags for
a long time; the current categories dissolve into about 175 tags from
what I can see.)

Regards,
Wyatt