Gentoo Archives: gentoo-dev

From: Gordon Pettey <petteyg359@×××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] Anti-spam for goose
Date: Thu, 21 May 2020 13:23:35
Message-Id: CAHY5MeeOukM8DDP+AXGtSf4L6sv8yT=95BEcMrWUxjBgwyGU=g@mail.gmail.com
In Reply to: [gentoo-dev] [RFC] Anti-spam for goose by "Michał Górny"
Require browser-based interaction to use the service. Do something funky
with AJAX so the page can't be properly used with curl or similar tools, so
that manual effort is required to get the UUID to submit as. Only allow
registered UUIDs, and only allow one submission per day per UUID.
Sure, somebody can go to Mechanical Turk and pay a few cents to generate
fake submission IDs, but at least you have that tiny deterrent of "I've got
to pay 3 cents per spam account :(".

Maybe also add some minor tracking to the database, if it isn't already
there, to count submissions over time per UUID, and make the default cron
script weekly. If you see a UUID that is submitting at the maximum rate
of once a day, you may lean towards suspecting spam.
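A minimal sketch of the per-UUID limit suggested here, assuming an in-memory store (a real deployment would keep this in goose's database; the class and method names are hypothetical). It accepts only registered UUIDs, allows one submission per calendar day, and flags UUIDs that submit on most days of a recent window, which an honest client on a weekly cron job should not do. The 28-day window and 80% threshold are arbitrary illustrative values.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Set


class SubmissionTracker:
    """Hypothetical in-memory tracker: only registered UUIDs may submit,
    at most once per calendar day each."""

    def __init__(self) -> None:
        self.registered: Set[str] = set()
        # uuid -> submission days; assumes calls arrive in date order
        self.history: Dict[str, List[date]] = defaultdict(list)

    def register(self, uuid: str) -> None:
        self.registered.add(uuid)

    def submit(self, uuid: str, today: date) -> bool:
        if uuid not in self.registered:
            return False  # unknown UUIDs are rejected outright
        days = self.history[uuid]
        if days and days[-1] == today:
            return False  # already submitted today
        days.append(today)
        return True

    def suspicious(self, today: date, window_days: int = 28,
                   threshold: float = 0.8) -> List[str]:
        """UUIDs that submitted on at least `threshold` of the last
        `window_days` days. A weekly cron client hits roughly 1 day in 7."""
        start = today - timedelta(days=window_days - 1)
        return [
            uuid
            for uuid, days in self.history.items()
            if sum(start <= d <= today for d in days) >= threshold * window_days
        ]
```

A client submitting every day for four weeks would be flagged, while one submitting weekly would not.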

On Thu, May 21, 2020 at 3:47 AM Michał Górny <mgorny@g.o> wrote:

> Hi,
>
> TL;DR: I'm looking for opinions on how to protect goose from spam,
> i.e. mass fake submissions.
>
>
> Problem
> =======
> Goose currently lacks proper limiting of submitted data. The only
> limiter currently in place is based on a unique submitter id that is
> randomly generated at setup time and in full control of the submitter.
> This only protects against accidental duplicates but it can't protect
> against deliberate action.
>
> An attacker could easily submit thousands (millions?) of fake entries by
> issuing a lot of requests with different ids. Creating them is
> as trivial as using successive numbers. The potential damage includes:
>
> - distorting the metrics to the point of them being useless (even though
> some people consider them useless by design).
>
> - submitting lots of arbitrary data to cause a DoS by growing
> the database until no disk space is left.
>
> - blocking a large range of valid user ids, making collisions with
> legitimate users more likely.
>
> I don't think it is worthwhile to discuss the motivation for doing so:
> whether it would be someone wishing harm to Gentoo, disagreeing with
> the project or merely wanting to try and see if it would work. The case
> of SKS keyservers teaches us a lesson that you can't leave holes like
> this open for long, because someone eventually will abuse them.
>
>
> Option 1: IP-based limiting
> ===========================
> The original idea was to set a hard limit of submissions per week based
> on the IP address of the submitter. This has (at least as far as IPv4 is
> concerned) the advantages that:
>
> - the submitter has limited control of his IP address (i.e. he can't just
> submit stuff using an arbitrary address)
>
> - IP address range is naturally limited
>
> - IP addresses have non-zero cost
>
> This method could strongly reduce the number of fake submissions one
> attacker could devise. However, it has a few problems too:
>
> - a low limit would harm legitimate submitters sharing an IP address
> (e.g. behind NAT)
>
> - it actively favors people with access to a large number of IP addresses
>
> - it doesn't map cleanly to IPv6 (where some people may have just one IP
> address, and others may have whole /64 or /48 ranges)
>
> - it may cause problems for anonymizing network users (and we want to
> encourage Tor usage for privacy)
>
> All this considered, IP address limiting can't be used as the primary
> method of preventing fake submissions. However, I suppose it could work
> as an additional DoS prevention, limiting the number of submissions from
> a single address over short periods of time.
>
> Example: if we limit to 10 requests an hour, then a single IP can be
> used to manufacture at most 240 submissions a day. This might still be
> enough to render the metrics unusable, but it should keep the database
> reasonably safe.
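A minimal sliding-window sketch of the short-term per-address limit from Option 1. The class name is hypothetical, and grouping IPv6 addresses by their /64 prefix is an assumption made to address the IPv6 concern raised above (/48 would be stricter); IPv4 addresses are keyed individually.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class IPRateLimiter:
    """Sliding-window limiter: at most `limit` submissions per `window`
    seconds per key (an IPv4 address, or an IPv6 prefix)."""

    def __init__(self, limit: int = 10, window: float = 3600.0,
                 v6_prefix: int = 64) -> None:
        self.limit = limit
        self.window = window
        self.v6_prefix = v6_prefix
        # key -> timestamps of accepted submissions, oldest first
        self.events: Dict[str, Deque[float]] = defaultdict(deque)

    def key(self, addr: str) -> str:
        ip = ipaddress.ip_address(addr)
        if ip.version == 6:
            # Collapse the address to its network prefix, e.g. 2001:db8::/64.
            return str(ipaddress.ip_network(f"{ip}/{self.v6_prefix}", strict=False))
        return str(ip)

    def allow(self, addr: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.events[self.key(addr)]
        # Forget submissions that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

With the defaults, an address retrying once a minute for a whole day gets exactly 240 submissions accepted, matching the figure in the example.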
>
>
> Option 2: proof-of-work
> =======================
> The alternative of using a proof-of-work algorithm was suggested to me
> yesterday. The idea is that every submission has to be accompanied by
> the result of some cumbersome calculation that can't be trivially run
> in parallel or offloaded to dedicated hardware.
>
> On the plus side, it would rely more on actual physical hardware than IP
> addresses provided by ISPs. While it would be a waste of CPU time
> and memory, doing it just once a week wouldn't do that much harm.
>
> On the minus side, it would penalize people with weak hardware.
>
> For example, 'time hashcash -m -b 28 -r test' gives:
>
> - 34 s (38 s estimated by -s) on Ryzen 5 3600
>
> - 3 minutes (92 s estimated) on some old 32-bit Celeron M
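A simplified hashcash-style proof-of-work, for illustration only: it is not the actual hashcash stamp format, and the resource string (a system id plus ISO week) is a hypothetical choice. Minting takes about 2^bits SHA-1 evaluations while verifying takes one, which is the asymmetry the scheme relies on. Note that SHA-1 hashcash parallelizes easily and runs well on GPUs; memory-hard designs such as RandomX aim to narrow that gap.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def _digest(resource: str, counter: int) -> bytes:
    return hashlib.sha1(f"{resource}:{counter}".encode()).digest()


def mint(resource: str, bits: int) -> int:
    """Return the first counter whose hash has at least `bits` leading
    zero bits. Expected cost is about 2**bits hash evaluations."""
    for counter in count():
        if leading_zero_bits(_digest(resource, counter)) >= bits:
            return counter


def verify(resource: str, counter: int, bits: int) -> bool:
    """Checking a proof costs a single hash evaluation."""
    return leading_zero_bits(_digest(resource, counter)) >= bits
```

At 28 bits this pure-Python loop would be far slower than the hashcash tool itself; small values such as 12 bits finish almost instantly and are enough to see the mechanism.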
>
> At the same time, it would still permit a lot of fake submissions. For
> example, randomx [1] claims to require 2G of memory in fast mode. This
> would still allow me to use 7 threads. If we adjusted the algorithm to
> take ~30 seconds, that means 7 submissions every 30 s, i.e. 20k
> submissions a day.
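Both throughput figures in the message follow from the same back-of-the-envelope bound: parallel workers times seconds per day, divided by seconds per submission. A small helper (hypothetical name) reproduces them, assuming each worker runs continuously and ignoring network overhead.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_daily_submissions(parallel_workers: int, seconds_per_submission: int) -> int:
    """Upper bound on submissions one attacker can produce per day when
    each worker needs `seconds_per_submission` seconds per submission."""
    return parallel_workers * SECONDS_PER_DAY // seconds_per_submission


# Proof-of-work figures: 7 threads, ~30 s per proof.
pow_bound = max_daily_submissions(parallel_workers=7, seconds_per_submission=30)
# IP limit from Option 1: 10 per hour is one submission per 360 s per address.
ip_bound = max_daily_submissions(parallel_workers=1, seconds_per_submission=360)
print(pow_bound, ip_bound)  # 20160 240
```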
>
> So in the end, while this is interesting, it doesn't seem like
> a workable anti-spam measure.
>
>
> Option 3: explicit CAPTCHA
> ==========================
> A traditional way of dealing with spam -- require every new system
> identifier to be confirmed by solving a CAPTCHA (or a few identifiers
> for one CAPTCHA).
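A minimal sketch of the confirmation step described in Option 3, assuming a hypothetical `check_captcha` callback standing in for whatever CAPTCHA service would actually be used (no real service API is implied). One solved CAPTCHA may confirm a few ids at once, each solution can be redeemed only once, and submissions would only be accepted from confirmed ids. The limit of 3 ids per CAPTCHA is an arbitrary illustrative value.

```python
from typing import Callable, Iterable, Set


class IdRegistry:
    """Hypothetical registry: a new system id is accepted only after a solved
    CAPTCHA, and one solved CAPTCHA may confirm up to `ids_per_captcha` ids."""

    def __init__(self, check_captcha: Callable[[str], bool],
                 ids_per_captcha: int = 3) -> None:
        # check_captcha receives the solution token and returns True if valid.
        self.check_captcha = check_captcha
        self.ids_per_captcha = ids_per_captcha
        self.confirmed: Set[str] = set()
        self.used_tokens: Set[str] = set()

    def confirm(self, token: str, ids: Iterable[str]) -> bool:
        id_list = list(ids)
        if not id_list or len(id_list) > self.ids_per_captcha:
            return False
        if token in self.used_tokens or not self.check_captcha(token):
            return False  # each solved CAPTCHA may be redeemed only once
        self.used_tokens.add(token)
        self.confirmed.update(id_list)
        return True

    def accepts(self, system_id: str) -> bool:
        """Submissions are only accepted from confirmed ids."""
        return system_id in self.confirmed
```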
>
> The advantage of this method is that it requires real human work to be
> performed, effectively limiting the ability to submit spam.
> The disadvantage is that it is cumbersome to users, so many of them will
> simply give up on participating.
>
>
> Other ideas
> ===========
> Do you have any other ideas on how we could resolve this?
>
>
> [1] https://github.com/tevador/RandomX
>
>
> --
> Best regards,
> Michał Górny
>
>