Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev <gentoo-dev@l.g.o>
Subject: [gentoo-dev] [RFC] Anti-spam for goose
Date: Thu, 21 May 2020 08:47:16
Message-Id: 496f9d713dc1d890d8af717c77429faac20912e1.camel@gentoo.org
Hi,

TL;DR: I'm looking for opinions on how to protect goose from spam,
i.e. mass fake submissions.


Problem
=======
Goose currently lacks proper limiting of submitted data. The only
limiter currently in place is based on a unique submitter id that is
randomly generated at setup time and fully controlled by the submitter.
This only protects against accidental duplicates; it can't protect
against deliberate abuse.
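
To illustrate why an id-based limiter alone cannot stop deliberate abuse,
here is a minimal sketch, assuming the server simply rejects a second
submission from the same id within a week. The class and method names are
hypothetical, not goose's actual code.

```python
# Hypothetical model of an id-based limiter: at most one submission per
# submitter id per weekly window. Illustrative only, not goose's code.
class IdLimiter:
    def __init__(self) -> None:
        # ids already seen in the current weekly window
        self.seen_this_week: set[str] = set()

    def accept(self, submitter_id: str) -> bool:
        """Accept a submission unless this id already submitted this week."""
        if submitter_id in self.seen_this_week:
            return False  # duplicate, e.g. an accidental resubmission
        self.seen_this_week.add(submitter_id)
        return True


limiter = IdLimiter()

# An honest client that accidentally submits twice is deduplicated...
print(limiter.accept("a3f1c2"), limiter.accept("a3f1c2"))  # True False

# ...but since the id is chosen by the client, an attacker just counts.
fake_accepted = sum(limiter.accept(str(n)) for n in range(10_000))
print(fake_accepted)  # 10000
```

Every fabricated id is "new", so the limiter accepts all of them.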

An attacker could easily submit thousands (millions?) of fake entries by
issuing a lot of requests with different ids. Creating them is
as trivial as using successive numbers. The potential damage includes:

- distorting the metrics to the point of them being useless (even though
some people consider them useless by design).

- submitting lots of arbitrary data to cause a DoS by growing
the database until no disk space is left.

- blocking a large range of valid user ids, making collisions with
legitimate users more likely.

I don't think it is worthwhile to discuss the motivation for doing so:
whether it would be someone wishing harm to Gentoo, someone disagreeing
with the project or someone merely wanting to see if it would work. The
case of the SKS keyservers teaches us that you can't leave holes like
this open for long, because someone eventually will abuse them.


Option 1: IP-based limiting
===========================
The original idea was to set a hard limit of submissions per week based
on the IP address of the submitter. This has (at least as far as IPv4 is
concerned) the following advantages:

- the submitter has limited control over his IP address (i.e. he can't
just submit data from arbitrary addresses)

- the IP address range is naturally limited

- IP addresses have a non-zero cost

This method could strongly reduce the number of fake submissions a single
attacker could produce. However, it has a few problems too:

- a low limit would harm legitimate submitters sharing an IP address
(e.g. behind NAT)

- it actively favors people with access to a large number of IP addresses

- it doesn't map cleanly to IPv6 (where some people may have just one IP
address, and others may have whole /64 or /48 ranges)

- it may cause problems for users of anonymizing networks (and we want to
encourage Tor usage for privacy)
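
One common way to apply per-address limits to IPv6 is to treat a whole /64
as a single "address". A minimal sketch, assuming that policy; the function
name and the /64 choice are illustrative, not a decided design.

```python
import ipaddress


# Hypothetical helper: derive the key used for per-address rate limiting.
# Assumption: IPv4 addresses are counted individually, while IPv6
# addresses are collapsed to their /64 prefix.
def rate_limit_key(addr: str, v6_prefix: int = 64) -> str:
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return str(ip)
    # strict=False lets us build the network from a host address
    net = ipaddress.ip_network(f"{ip}/{v6_prefix}", strict=False)
    return str(net)


print(rate_limit_key("203.0.113.7"))       # 203.0.113.7
print(rate_limit_key("2001:db8::1"))       # 2001:db8::/64
print(rate_limit_key("2001:db8::ffff:1"))  # 2001:db8::/64
```

Even so, someone holding a /48 still controls 2^16 = 65536 distinct /64
keys, which is exactly why IP limiting doesn't map cleanly to IPv6.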

All this considered, IP address limiting can't be used as the primary
method of preventing fake submissions. However, I suppose it could work
as an additional DoS prevention measure, limiting the number of
submissions from a single address over short periods of time.

Example: if we limit to 10 requests an hour, then a single IP can be
used to manufacture at most 240 submissions a day (10 × 24). This might
still be enough to render the metrics unusable, but it should keep the
database reasonably safe.
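
The short-term limit from this example could be implemented as a sliding
window per address. A minimal sketch, assuming in-memory state and a
caller-supplied timestamp; the class name and parameters are hypothetical.

```python
from collections import defaultdict, deque


# Hypothetical per-address sliding-window limiter: at most `limit`
# submissions per `window` seconds from one key (e.g. an IP address).
class SlidingWindowLimiter:
    def __init__(self, limit: int = 10, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        q = self.hits[key]
        # drop timestamps that have fallen out of the window
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


# Simulate one address trying to submit once per second for a whole day.
ip_limiter = SlidingWindowLimiter(limit=10, window=3600)
accepted = sum(ip_limiter.allow("203.0.113.7", float(t)) for t in range(86_400))
print(accepted)  # 240
```

The attacker gets 10 submissions at the start of each hour and nothing
else, matching the 240-a-day bound.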


Option 2: proof-of-work
=======================
An alternative, using a proof-of-work algorithm, was suggested to me
yesterday. The idea is that every submission has to be accompanied by
the result of some cumbersome calculation that can't be trivially run
in parallel or offloaded to dedicated hardware.

On the plus side, it would rely more on actual physical hardware than on
IP addresses provided by ISPs. While it would be a waste of CPU time
and memory, doing it just once a week wouldn't do much harm.

On the minus side, it would penalize people with weak hardware.

For example, 'time hashcash -m -b 28 -r test' gives:

- 34 s (38 s estimated with -s) on a Ryzen 5 3600

- 3 minutes (92 s estimated) on some old 32-bit Celeron M
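
The hashcash command above asks for a 28-bit partial preimage, i.e. about
2^28 hash evaluations on average, while checking a solution costs a single
hash. A simplified sketch of that asymmetry, assuming SHA-256 over
"challenge:nonce" (real hashcash uses SHA-1 and its own stamp format; the
function names and the challenge string are hypothetical). Note that a
plain hash like this is not memory-hard, so it does not meet the "can't
be offloaded to dedicated hardware" requirement; schemes like RandomX aim
for that.

```python
import hashlib
from itertools import count


def leading_zero_bits_ok(digest: bytes, bits: int) -> bool:
    # the top `bits` bits of the digest must all be zero
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - bits) == 0


def mint(challenge: str, bits: int) -> int:
    """Search for a nonce; expected cost is about 2**bits hashes."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits_ok(digest, bits):
            return nonce


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Checking a solution costs a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits_ok(digest, bits)


# 16 bits keeps the demo fast; each extra bit doubles the expected work.
nonce = mint("submitter-id:2020-W21", bits=16)
print(nonce, verify("submitter-id:2020-W21", nonce, bits=16))
```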

At the same time, it would still permit a lot of fake submissions. For
example, RandomX [1] claims to require 2 GB of memory in fast mode. This
would still allow me to use 7 threads. If we adjusted the algorithm to
take ~30 seconds, that means 7 submissions every 30 s, i.e. about 20k
submissions a day.
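
Spelled out, the estimate works like this. The figures are the ones from
the text; the comparison with the 10-per-hour IP limit from Option 1 is
added for context.

```python
# Back-of-the-envelope throughput of one attacker machine under PoW.
SECONDS_PER_DAY = 24 * 60 * 60   # 86400
threads = 7                      # parallel solvers that fit in memory
seconds_per_solution = 30        # PoW tuned to ~30 s per submission

pow_per_day = threads * SECONDS_PER_DAY // seconds_per_solution
print(pow_per_day)  # 20160

# Compare with a single address under the 10-requests-per-hour IP limit.
ip_limited_per_day = 10 * 24
print(pow_per_day // ip_limited_per_day)  # 84
```

In other words, one such machine matches the output of 84 rate-limited
addresses.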

So in the end, while this is interesting, it doesn't seem like
a workable anti-spam measure.


Option 3: explicit CAPTCHA
==========================
A traditional way of dealing with spam: require every new system
identifier to be confirmed by solving a CAPTCHA (or allow a few
identifiers per CAPTCHA).
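
The "a few identifiers per CAPTCHA" variant needs some server-side
bookkeeping. A minimal sketch, assuming that a solved CAPTCHA yields a
token with a fixed quota of new ids; the CAPTCHA check itself is out of
scope, and the class name and batch size of 3 are hypothetical.

```python
import secrets

IDS_PER_CAPTCHA = 3  # assumption: batch size per solved CAPTCHA


# Hypothetical bookkeeping for "one CAPTCHA confirms a few identifiers".
class CaptchaQuota:
    def __init__(self) -> None:
        self.remaining: dict[str, int] = {}  # token -> ids left
        self.confirmed_ids: set[str] = set()

    def issue_token(self) -> str:
        """Called once the user has solved a CAPTCHA."""
        token = secrets.token_urlsafe(16)
        self.remaining[token] = IDS_PER_CAPTCHA
        return token

    def confirm_id(self, token: str, system_id: str) -> bool:
        """Register a new system id against a token's quota."""
        if system_id in self.confirmed_ids:
            return True  # already confirmed, costs nothing
        left = self.remaining.get(token, 0)
        if left <= 0:
            return False  # unknown token or quota exhausted
        self.remaining[token] = left - 1
        self.confirmed_ids.add(system_id)
        return True


quota = CaptchaQuota()
tok = quota.issue_token()
print([quota.confirm_id(tok, f"host{n}") for n in range(4)])
# [True, True, True, False]
```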

The advantage of this method is that it requires real human work to be
performed, effectively limiting the ability to submit spam.
The disadvantage is that it is cumbersome to users, so many of them will
just give up on participating.


Other ideas
===========
Do you have any other ideas on how we could resolve this?


[1] https://github.com/tevador/RandomX


--
Best regards,
Michał Górny


Replies

Subject Author
Re: [gentoo-dev] [RFC] Anti-spam for goose "Toralf Förster" <toralf@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose Tomas Mozes <hydrapolic@×××××.com>
Re: [gentoo-dev] [RFC] Anti-spam for goose Fabian Groffen <grobian@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose Robert Bridge <robert@××××××××.com>
Re: [gentoo-dev] [RFC] Anti-spam for goose Gordon Pettey <petteyg359@×××××.com>
Re: [gentoo-dev] [RFC] Anti-spam for goose Kent Fredric <kentnl@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose Kent Fredric <kentnl@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose Peter Stuge <peter@×××××.se>