Gentoo Archives: gentoo-dev

From: Kent Fredric <kentnl@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation
Date: Sun, 26 Apr 2020 14:12:07
Message-Id: 20200427021151.161c9a62@katipo2.lan
In Reply to: Re: [gentoo-dev] [RFC] Ideas for gentoostats implementation by "Michał Górny"
On Sun, 26 Apr 2020 10:52:27 +0200
Michał Górny <mgorny@g.o> wrote:

> Do you have any other idea for spam protection then?

What is the realistic risk here for spamming?

If the record is well formed, and pertains to known packages, the worst
I can currently imagine is astroturfing: a single individual attempting
to make a package seem more popular than it is.

Just generally IME, spamming aims to make a buck somehow, but if there
are no fields in the data set that can be used for this, and attempts
to stuff existing fields with spam prose get filtered because they
don't correlate to any known possible values, then the entire record
is simply invalid, and can be removed on that basis.

Conceptually, you could have a report with
"dev-foo/plz-sir-halp-me-I-have-money-and-an-a-nigerian-prince::nigeria-prince",
but for anybody to see that they'd have to be querying data about the
::nigeria-prince overlay, and that's assuming we even show data about
overlays we can't locate.

Trolling ::gentoo with packages that don't exist seems easy to eliminate.

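A minimal sketch of that filtering idea, assuming a report carries a
UUID, a repository name, a package atom and a version. The Report
fields, the KNOWN index and its sample contents are all hypothetical
stand-ins, not part of any existing gentoostats code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    uuid: str
    repo: str      # e.g. "gentoo"
    package: str   # "category/name"
    version: str


# Hypothetical index of what actually exists, e.g. built from the
# repositories we can locate. The contents here are sample data only.
KNOWN = {
    "gentoo": {
        "dev-lang/perl": {"5.30.1", "5.30.3"},
    },
}


def is_valid(report, known=KNOWN):
    """A record is valid only if repo, package and version are all known."""
    versions = known.get(report.repo, {}).get(report.package)
    return versions is not None and report.version in versions


def filter_reports(reports, known=KNOWN):
    """Drop every record that does not correlate to known values."""
    return [r for r in reports if is_valid(r, known)]
```

With this shape, a record naming an overlay or package that isn't in
the index never reaches the public data, whatever prose was stuffed
into its fields.
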
I don't like that astroturfing could be a thing ... but like, I also
don't really care about that happening.

For instance, crates.io has per-crate and per-crate-version download
statistics.

That's super easy to rig, and you get lots of spiky noise in
infrequently used packages simply due to various automated services
fetching things.

But at scale, the data still turns out to be quasi-useful, as it allows
you to chart adoption and migration... because as soon as a new version
gets shipped, if people are using it, then you'll start to see an
uptick in reports from the new version.

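One rough way to chart that kind of uptick, assuming reports can be
reduced to (day, package, version) tuples. The tuple layout and the
version_share name are made up for illustration:

```python
from collections import Counter, defaultdict


def version_share(reports, package):
    """Return {day: {version: fraction of that day's reports}}.

    `reports` is an iterable of (day, package, version) tuples; this
    layout is an assumption, not a real gentoostats schema.
    """
    counts = defaultdict(Counter)
    for day, pkg, version in reports:
        if pkg == package:
            counts[day][version] += 1

    shares = {}
    for day in sorted(counts):
        total = sum(counts[day].values())
        shares[day] = {v: n / total for v, n in counts[day].items()}
    return shares
```

Tracking the share per version rather than raw counts makes the
migration visible even when the total number of reports changes from
day to day.
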
The "change" and "change response" information is very useful, and a
very odd target for astroturfing.

I for one would be greatly interested in "new perl version shipped,
explosion of results due to people upgrading", because then I can gauge
roughly how many people managed to upgrade perl without having to join
#gentoo and cry about it being broken.

(We could also designate a certain UUID flag for use by Gentoo infra,
possibly even a UUID-per-host, the results of which would be invisible
in the public data, but still visible to people with approved perms.
We really do value the ability to know which packages we have to be
careful about causing problems in, and where infra is at with upgrading
various things before we remove the versions infra is using. Currently,
working out what infra is running requires lots of direct
communication.)
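
The infra-UUID idea could look something like this. It is only a
sketch: the INFRA_UUIDS set, the dict-shaped reports and the
viewer_is_approved flag are all assumptions made for illustration:

```python
# Hypothetical set of UUIDs reserved for Gentoo infra hosts
# (one UUID per host). The value below is a placeholder.
INFRA_UUIDS = frozenset({
    "00000000-0000-4000-8000-000000000001",
})


def visible_reports(reports, viewer_is_approved, infra_uuids=INFRA_UUIDS):
    """Hide infra-originated reports unless the viewer has approved perms.

    Each report is assumed to be a dict with at least a "uuid" key.
    """
    if viewer_is_approved:
        return list(reports)
    return [r for r in reports if r["uuid"] not in infra_uuids]
```

Filtering at query time like this keeps the infra reports in the same
data set, so approved users and the public see the same packages, just
with different subsets of reporters.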