On Sun, 26 Apr 2020 10:52:27 +0200
Michał Górny <mgorny@g.o> wrote:

> Do you have any other idea for spam protection then?

What is the realistic risk here for spamming?

If the record is well formed, and pertains to known packages, the worst
I can currently imagine is astroturfing: a single individual attempting
to make a package seem more popular than it is.

Just generally IME, spamming aims to make a buck somehow, but if
there are no fields in the data set that can be used for this, and
abuse of existing fields to fill them with spam prose gets filtered
because it doesn't correlate to any known possible values, then the
entire record is simply invalid, and can be removed on that basis.

Conceptually, you could have a report with
"dev-foo/plz-sir-halp-me-I-have-money-and-an-a-nigerian-prince::nigeria-prince",
but for anybody to see that they'd have to be querying data about the
::nigeria-prince overlay, and that's assuming we even show data about
overlays we can't locate.

Trolling ::gentoo with packages that don't exist seems easy to eliminate.

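As a rough sketch of that filtering (Python, purely illustrative: the
record layout, KNOWN_PACKAGES and is_valid_record are all made up for
this example and don't reflect any existing design), a record is kept
only if every entry names a known package in a known repository:

```python
# Hypothetical sketch: none of these names come from a real implementation.

# Assumed index of known packages per repository (contents are examples).
KNOWN_PACKAGES = {
    "gentoo": {"dev-lang/perl", "dev-lang/rust"},
}


def is_valid_record(packages, known=KNOWN_PACKAGES):
    """Return True only if every "category/name::repo" entry is known.

    Assumption: an empty report carries no information, so it is
    treated as invalid too.
    """
    if not packages:
        return False
    for entry in packages:
        name, sep, repo = entry.partition("::")
        if not sep:
            # No "::repo" suffix at all, so the entry is malformed.
            return False
        if name not in known.get(repo, set()):
            # Unknown repository or unknown package: drop the whole record.
            return False
    return True
```

With that, a report containing "dev-lang/perl::gentoo" passes, while
anything mentioning ::nigeria-prince, or a made-up package in ::gentoo,
gets the entire record rejected.
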
I don't like that astroturfing could be a thing ... but like, I also
don't really care about that happening.

For instance, crates.io has per-crate and per-crate-version download
statistics.

That's super easy to rig, and you get lots of spiky noise in
infrequently used packages simply due to various automated services
fetching things.

But at scale, the data still turns out to be quasi-useful, as it allows
you to chart adoption and migration... because as soon as a new version
gets shipped, if people are using it, then you'll start to see an
uptick in reports from the new version.

The "change" and "change response" information is very useful, and a
very odd target for astroturfing.

I for one would be greatly interested in "new perl version shipped,
explosion of results due to people upgrading", because then I can gauge
roughly how many people managed to upgrade perl without having to join
#gentoo and cry about it being broken.

(We could also designate a certain UUID flag for use by Gentoo infra,
possibly even a UUID per host, the results of which would be invisible
in the public data, but still visible to people with approved perms,
because we really do value the ability to know which packages we have
to be careful about causing problems in, and where infra is at with
upgrading various things before we remove the versions infra is using.
Currently, working out what infra is running requires lots of direct
communication.)