Gentoo Archives: gentoo-dev

From:	Tomas Mozes <hydrapolic@×××××.com>
To:	gentoo-dev@l.g.o
Subject:	Re: [gentoo-dev] [RFC] Anti-spam for goose
Date:	Thu, 21 May 2020 09:48:58
Message-Id:	`CAG6MAzTLh+6P4jTPuZK4SunbXZ8-LhYNmkp6Pv0Y_ZJ+r1=1ig@mail.gmail.com`
In Reply to:	[gentoo-dev] [RFC] Anti-spam for goose by "Michał Górny"

1	On Thu, May 21, 2020 at 10:47 AM Michał Górny <mgorny@g.o> wrote:
2
3	> Hi,
4	>
5	> TL;DR: I'm looking for opinions on how to protect goose from spam,
6	> i.e. mass fake submissions.
7	>
8	>
9	> Problem
10	> =======
11	> Goose currently lacks proper limiting of submitted data. The only
12	> limiter currently in place is based on unique submitter id that is
13	> randomly generated at setup time and in full control of the submitter.
14	> This only protects against accidental duplicates but it can't protect
15	> against deliberate action.
16	>
17	> An attacker could easily submit thousands (millions?) of fake entries by
18	> issuing a lot of requests with different ids. Creating them is
19	> as trivial as using successive numbers. The potential damage includes:
20	>
21	> - distorting the metrics to the point of being useless (even though
22	> some people consider them useless by design).
23	>
24	> - submitting lots of arbitrary data to cause DoS via growing
25	> the database until no disk space is left.
26	>
27	> - blocking a large range of valid user ids, making collisions with
28	> legitimate users more likely.
29	>
30	> I don't think it worthwhile to discuss the motivation for doing so:
31	> whether it would be someone wishing harm to Gentoo, disagreeing with
32	> the project or merely wanting to try and see if it would work. The case
33	> of SKS keyservers teaches us a lesson that you can't leave holes like
34	> this open for a long time because someone will eventually abuse them.
35	>
36	>
37	> Option 1: IP-based limiting
38	> ===========================
39	> The original idea was to set a hard limit of submissions per week based
40	> on IP address of the submitter. This has (at least as far as IPv4 is
41	> concerned) the advantages that:
42	>
43	> - the submitter has limited control over their IP address (i.e. they
44	> can't just submit stuff using arbitrary data)
45	>
46	> - IP address range is naturally limited
47	>
48	> - IP addresses have non-zero cost
49	>
50	> This method could strongly reduce the number of fake submissions one
51	> attacker could devise. However, it has a few problems too:
52	>
53	> - a low limit would harm legitimate submitters sharing IP address
54	> (i.e. behind NAT)
55	>
56	> - it actively favors people with access to a large number of IP addresses
57	>
58	> - it doesn't map cleanly to IPv6 (where some people may have just one IP
59	> address, and others may have whole /64 or /48 ranges)
60	>
61	> - it may cause problems for anonymizing network users (and we want to
62	> encourage Tor usage for privacy)
63	>
64	> All this considered, IP address limiting can't be used as the primary
65	> method of preventing fake submissions. However, I suppose it could work
66	> as an additional DoS prevention, limiting the number of submissions from
67	> a single address over short periods of time.
68	>
69	> Example: if we limit to 10 requests an hour, then a single IP can be
70	> used to manufacture at most 240 submissions a day. This might be
71	> sufficient to render them unusable but should keep the database
72	> reasonably safe.
73	>
74	>
75	> Option 2: proof-of-work
76	> =======================
77	> An alternative of using a proof-of-work algorithm was suggested to me
78	> yesterday. The idea is that every submission has to be accompanied by
79	> the result of some cumbersome calculation that can't be trivially run
80	> in parallel or optimized out to dedicated hardware.
81	>
82	> On the plus side, it would rely more on actual physical hardware than IP
83	> addresses provided by ISPs. While it would be a waste of CPU time
84	> and memory, doing it just once a week wouldn't be that much harm.
85	>
86	> On the minus side, it would penalize people with weak hardware.
87	>
88	> For example, 'time hashcash -m -b 28 -r test' gives:
89	>
90	> - 34 s (-s estimated 38 s) on Ryzen 5 3600
91	>
92	> - 3 minutes (estimated 92 s) on some old 32-bit Celeron M
93	>
94	> At the same time, it would still permit a lot of fake submissions. For
95	> example, randomx [1] claims to require 2G of memory in fast mode. This
96	> would still allow me to use 7 threads. If we adjusted the algorithm to
97	> take ~30 seconds, that means 7 submissions every 30 s, i.e. 20k
98	> submissions a day.
99	>
100	> So in the end, while this is interesting, it doesn't seem like
101	> a workable anti-spam measure.
102	>
103	>
104	> Option 3: explicit CAPTCHA
105	> ==========================
106	> A traditional way of dealing with spam -- require every new system
107	> identifier to be confirmed by solving a CAPTCHA (or a few identifiers
108	> for one CAPTCHA).
109	>
110	> The advantage of this method is that it requires real human work to be
111	> performed, effectively limiting the ability to submit spam.
112	> The disadvantage is that it is cumbersome to users, so many of them will
113	> just give up on participating.
114	>
115	>
116	> Other ideas
117	> ===========
118	> Do you have any other ideas on how we could resolve this?
119	>
120	>
121	> [1] https://github.com/tevador/RandomX
122	>
123	>
124	> --
125	> Best regards,
126	> Michał Górny
127	>
128
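As a rough illustration of the short-period limit in Option 1 above, here
is a minimal Python sketch of a fixed-window counter keyed by source
address. The class and function names, and the choice of a fixed rather
than sliding window, are assumptions made for illustration; nothing in
this thread says how goose would implement it. With the quoted figure of
10 requests an hour, one address gets at most 240 submissions through in
a day, however fast it sends.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600   # one hour, as in the quoted example
LIMIT_PER_WINDOW = 10   # 10 requests an hour, as in the quoted example


class FixedWindowLimiter:
    """Allow at most `limit` submissions per key in each fixed time window."""

    def __init__(self, limit=LIMIT_PER_WINDOW, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        # (key, window index) -> accepted count.
        # Old windows are never pruned here; a real service would expire them.
        self.counts = defaultdict(int)

    def allow(self, key, now):
        """Return True and count the submission if `key` is under its limit."""
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


def accepted_in_one_day(limiter, key, attempts_per_hour):
    """Simulate a 24-hour flood from one key; return how many get accepted."""
    accepted = 0
    for hour in range(24):
        for i in range(attempts_per_hour):
            now = hour * 3600 + i * (3600 / attempts_per_hour)
            accepted += limiter.allow(key, now)
    return accepted
```

Flooding from a single address, e.g.
`accepted_in_one_day(FixedWindowLimiter(), "192.0.2.1", 1000)`, yields 240
accepted submissions. The bound is per address, so it says nothing about
an attacker who controls many addresses.
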
129
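The proof-of-work idea in Option 2 above can be sketched in the same way.
This is a simplified hashcash-style scheme, not the hashcash stamp format
itself (which uses SHA-1) and not RandomX: the client searches for a nonce
whose SHA-256 digest starts with a given number of zero bits, and the
server checks it with a single hash. The function names and the
"resource:nonce" encoding are assumptions for illustration. Note that
SHA-256 is easy to parallelize and to run on dedicated hardware, so the
sketch only shows the mint/verify asymmetry, not the memory-hardness the
quoted text asks for.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(resource: str, nonce: int, bits: int) -> bool:
    """Cheap check run by the server: one hash per submission."""
    digest = hashlib.sha256(f"{resource}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def mint(resource: str, bits: int) -> int:
    """Expensive search run by the client: about 2**bits hashes on average."""
    for nonce in count():
        if verify(resource, nonce, bits):
            return nonce


def max_submissions_per_day(threads: int, seconds_per_proof: float) -> int:
    """Upper bound on proofs one machine can produce in 24 hours."""
    return int(threads * 86400 / seconds_per_proof)
```

With the quoted figures, `max_submissions_per_day(7, 30)` gives 20160,
which is the "20k submissions a day" mentioned above.
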
130
131	Sadly, the problem with IP addresses (in this case) is that they are
132	anonymous. One can easily start an attack with thousands of IPs (all around
133	the world).
134
135	One solution would be to introduce user accounts:
136	- one needs to register with an email
137	- you can rate limit based on the client (not the IP)
138
139	For example, say I have 200 servers: I'd create one account, verify my email
140	(maybe captcha too) and deploy a config with my token on all servers. Then
141	I'd set up a cron job on every server to submit stats. A token can have some
142	lifetime and you could create a new one when the old is about to expire.
143
144	If you discover I'm submitting false reports, you'd block all my submissions. I
145	can still do fake submissions, but you'd need a per-host verification to
146	avoid that.
147
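A minimal Python sketch of the account-and-token scheme described above,
under assumptions that go beyond this message: the `TokenRegistry` class,
its method names, and the in-memory dictionaries are all hypothetical, and
email verification is assumed to have happened before a token is issued.
It shows the three properties mentioned: tokens expire, the limit is
counted per account rather than per IP, and blocking an account rejects
every host that uses its tokens.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Account:
    email: str
    blocked: bool = False


class TokenRegistry:
    """Hypothetical server-side store: tokens belong to verified accounts."""

    def __init__(self, token_lifetime, limit_per_window, window):
        self.token_lifetime = token_lifetime
        self.limit = limit_per_window
        self.window = window
        self.accounts = {}  # email -> Account
        self.tokens = {}    # token -> (email, expiry time)
        self.counts = {}    # (email, window index) -> accepted count

    def issue_token(self, email, now):
        """Issue a token; the email is assumed to be verified already."""
        self.accounts.setdefault(email, Account(email))
        token = secrets.token_urlsafe(32)
        self.tokens[token] = (email, now + self.token_lifetime)
        return token

    def block(self, email):
        """Reject all further submissions from this account."""
        self.accounts[email].blocked = True

    def accept(self, token, now):
        """Return True if the submission is allowed, and count it."""
        entry = self.tokens.get(token)
        if entry is None:
            return False
        email, expires = entry
        if now >= expires or self.accounts[email].blocked:
            return False
        bucket = (email, int(now // self.window))
        if self.counts.get(bucket, 0) >= self.limit:
            return False
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return True
```

In the 200-server example, the per-account limit would have to allow at
least 200 submissions per reporting period, since all hosts share one
account. As noted above, this does not stop an account holder from
submitting fake data within that limit.
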
148	Tomas

Replies

Subject	Author
Re: [gentoo-dev] [RFC] Anti-spam for goose	"Michał Górny" <mgorny@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose	Jaco Kroon <jaco@××××××.za>
