Gentoo Archives: gentoo-dev

From: Matt Turner <mattst88@g.o>
To: gentoo development <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation
Date: Tue, 05 May 2020 05:14:22
Message-Id: CAEdQ38H=6U0XrZc7eDY30Up47rrt+_=SmgMAkGyUUfNBfs=Jag@mail.gmail.com
In Reply to: Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation by Thomas Deutschmann
On Mon, May 4, 2020 at 5:48 PM Thomas Deutschmann <whissi@g.o> wrote:
>
> On 2020-04-26 15:46, Kent Fredric wrote:
> > On Sun, 26 Apr 2020 14:38:54 +0200
> > Thomas Deutschmann <whissi@g.o> wrote:
> >
> >> Let's assume we will get reports that app-misc/foo is only installed 20
> >> times. If you are going to judge based on this data, "Obviously, nobody
> >> is using that package, it's stuck on <whatever>... safe to remove" your
> >> view is biased:
> >
> > I see this as more like what bloom filters get you, but in reverse:
> >
> > [...]
> >
> > - But now, instead of having "we don't know if anybody uses this", you
> > *can* have a "we know for sure somebody uses this".
>
> But how does that information really help us to decide anything in the end?
>
> Case A, stats are showing 0 users:
>
> As said, we can't know if this is true or if this package is only used
> in setups where people don't report stats.
>
>
> Case B, stats are showing x users:
>
> Now what? A package from case A could have a similar number of users --
> we just don't know. Assume firefox has 1,000 users, chromium has 500
> users and vivaldi doesn't show up in stats. How does that help us? Would
> this allow us to skip publishing GLSAs for vivaldi because we assume
> nobody in Gentoo is using vivaldi? Does it allow the Python project to
> go forward with a removal mask if vivaldi depends on a Python version
> that the Python project wants to get rid of? Would this allow Gentoo PR
> to make a public statement like "Firefox is the most popular browser in
> Gentoo, twice as many users as chromium"?

I hate the saying "the perfect is the enemy of the good" but I think
it applies here.

You're of course correct that we would not have perfect information.
But the thing about statistics is that you can still know some things
based on a sampling of that perfect information.

I would personally like to have data on whether users of my packages
have certain USE flags enabled. Knowing that would allow me to decide
whether it's worth the maintenance burden of supporting features that I
*think* are very rarely used. If instead the data showed me that 50%
of users had IUSE=xyz enabled, I probably wouldn't consider removing
it.

I think your example of potential misuse of data is a bit overdramatic.
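
To make the sampling argument concrete, here is a minimal sketch of how a
USE-flag share could be read from incomplete reports. It is not part of any
gentoostats design: the function name and the counts are hypothetical, and
the Wilson score interval is just one common choice for a proportion. It
only accounts for sampling error; it says nothing about the bias from users
who never submit stats, which is the concern raised in the quoted text.

```python
import math


def wilson_interval(enabled, reports, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    enabled: number of reports with the USE flag enabled (hypothetical data)
    reports: total number of reports for the package
    z:       normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if reports <= 0:
        raise ValueError("need at least one report")
    if not 0 <= enabled <= reports:
        raise ValueError("enabled must be between 0 and reports")

    p = enabled / reports
    denom = 1 + z * z / reports
    centre = (p + z * z / (2 * reports)) / denom
    half = z * math.sqrt(p * (1 - p) / reports
                         + z * z / (4 * reports * reports)) / denom
    return centre - half, centre + half


# Hypothetical counts: 50 of 100 reports have the flag enabled.
low, high = wilson_interval(50, 100)
print(f"flag enabled by roughly {low:.0%} to {high:.0%} of reporting users")

# Hypothetical counts: the flag never shows up in 20 reports.
low, high = wilson_interval(0, 20)
print(f"flag enabled by roughly {low:.0%} to {high:.0%} of reporting users")
```

With a few dozen reports the interval is wide, so a flag that never appears
in a small sample may still be in real use. With more reports the interval
narrows enough to separate "about half of reporting users" from "almost
nobody", which is the kind of decision described above.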

Replies

Subject Author
Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation Alec Warner <antarus@g.o>