On Mon, May 4, 2020 at 5:48 PM Thomas Deutschmann <whissi@g.o> wrote:
>
> On 2020-04-26 15:46, Kent Fredric wrote:
> > On Sun, 26 Apr 2020 14:38:54 +0200
> > Thomas Deutschmann <whissi@g.o> wrote:
> >
> >> Let's assume we will get reports that app-misc/foo is only installed 20
> >> times. If you are going to judge based on this data, "Obviously, nobody
> >> is using that package, it's stuck on <whatever>... safe to remove" your
> >> view is biased:
> >
> > I see this as more like what bloom filters get you, but in reverse:
> >
> > [...]
> >
> > - But now, instead of having "we don't know if anybody uses this", you
> > *can* have a "we know for sure somebody uses this".
>
> But how does that information really help us to decide anything in the end?
>
> Case A, stats are showing 0 users:
>
> Like said, we can't know if this is true or if this package is only used
> in setups where people don't report stats.
>
>
> Case B, stats are showing x users:
>
> Now what? Package from case A could have similar users -- we just don't
> know. Assume firefox has 1.000 users, chromium has 500 users and vivaldi
> doesn't show up in stats. How does that help us? Would this allow us to
> skip publishing GLSAs for vivaldi because we assume nobody in Gentoo is
> using vivaldi? Does it allow Python project to go forward pushing a mask
> for removal in case vivaldi would depend on Python version, Python
> project want to get rid of? Would this allow Gentoo PR to make a public
> statement like "Firefox is the most popular browser in Gentoo, twice as
> users as chromium"?

I hate the saying "the perfect is the enemy of the good", but I think
it applies here.

You're of course correct that we would not have perfect information.
But the thing about statistics is that you can still know some things
based on a sampling of that perfect information.

I would personally like to have data on whether users of my packages
have certain USE flags enabled. Knowing that would allow me to decide
whether it's worth the maintenance burden of supporting features that I
*think* are very rarely used. If instead the data showed me that 50%
of users had IUSE=xyz enabled, I probably wouldn't consider removing
it.
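
To illustrate the sampling point, here is a rough sketch of how a
reported count could be turned into a range rather than a single
number. The figures are made up, the function name is only
illustrative, and the calculation treats reporting installs as a random
sample of all installs, which opt-in stats would only approximate.

```python
import math


def wilson_interval(enabled, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = enabled / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical numbers: 200 reporting installs, 100 of them with the flag on.
low, high = wilson_interval(100, 200)
print(f"flag enabled on roughly {low:.0%} to {high:.0%} of reporting installs")
```

Even with only a couple of hundred reports, the range is narrow enough
to tell "about half of reporters use this" apart from "almost nobody
does", which is all the decision above needs.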

I think your example of potential misuse of data is a bit overdramatic.