Gentoo Archives: gentoo-dev

From:	"Michał Górny" <mgorny@g.o>
To:	gentoo-dev <gentoo-dev@l.g.o>
Subject:	[gentoo-dev] [RFC] Anti-spam for goose
Date:	Thu, 21 May 2020 08:47:16
Message-Id:	`496f9d713dc1d890d8af717c77429faac20912e1.camel@gentoo.org`

Hi,

TL;DR: I'm looking for opinions on how to protect goose from spam,
i.e. mass fake submissions.


Problem
=======
Goose currently lacks proper limiting of submitted data. The only
limiter currently in place is based on a unique submitter id that is
randomly generated at setup time and fully controlled by the submitter.
This only protects against accidental duplicates; it can't protect
against deliberate abuse.
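
To make the weakness concrete, here is a minimal Python sketch of an id-only check. It assumes, hypothetically, that the server accepts one submission per submitter id per week; `IdOnlyLimiter` and the weekly period are illustrative and not goose's actual code. It catches an accidental resubmission from the same id, but every freshly generated id passes.

```python
from datetime import datetime, timedelta


class IdOnlyLimiter:
    """Hypothetical id-only check: one submission per submitter id per period."""

    def __init__(self, period: timedelta = timedelta(weeks=1)) -> None:
        self.period = period
        self.last_seen: dict[str, datetime] = {}  # submitter id -> last accepted time

    def accept(self, submitter_id: str, now: datetime) -> bool:
        last = self.last_seen.get(submitter_id)
        if last is not None and now - last < self.period:
            return False  # same id resubmitting too early: rejected
        self.last_seen[submitter_id] = now
        return True


now = datetime(2020, 5, 21)
limiter = IdOnlyLimiter()
limiter.accept("abc", now)  # first submission: accepted
limiter.accept("abc", now)  # accidental duplicate: rejected
# An attacker simply uses successive numbers as ids; none of them collide.
accepted = sum(limiter.accept(str(n), now) for n in range(1000))
print(accepted)  # 1000
```

Since the id is the only key and the client chooses it, the check limits nothing but honest mistakes.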

An attacker could easily submit thousands (millions?) of fake entries by
issuing a lot of requests with different ids. Creating them is
as trivial as using successive numbers. The potential damage includes:

- distorting the metrics to the point of them being useless (even though
some people consider them useless by design).

- submitting lots of arbitrary data to cause DoS by growing
the database until no disk space is left.

- blocking a large range of valid user ids, making collisions with
legitimate users more likely.

I don't think it is worthwhile to discuss the motivation for doing so:
whether it would be someone wishing harm to Gentoo, someone disagreeing
with the project or someone merely wanting to see if it would work. The
case of the SKS keyservers teaches us that you can't leave holes like
this open for long, because someone will eventually abuse them.


Option 1: IP-based limiting
===========================
The original idea was to set a hard limit of submissions per week based
on the IP address of the submitter. This has (at least as far as IPv4 is
concerned) the following advantages:

- the submitter has limited control over his IP address (i.e. he can't
just submit stuff from arbitrary addresses)

- the IP address range is naturally limited

- IP addresses have a non-zero cost

This method could strongly reduce the number of fake submissions a single
attacker could produce. However, it has a few problems too:

- a low limit would harm legitimate submitters sharing an IP address
(e.g. behind NAT)

- it actively favors people with access to a large number of IP addresses

- it doesn't map cleanly to IPv6 (where some people may have just one IP
address, and others may have whole /64 or /48 ranges)

- it may cause problems for users of anonymizing networks (and we want to
encourage Tor usage for privacy)
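
The IPv6 problem can be illustrated with Python's standard `ipaddress` module. The sketch assumes a hypothetical `limiter_key()` helper that limits IPv4 per address and IPv6 per /64 prefix; that aggregation rule is a common convention, not something proposed here. Even with it, the holder of a single /48 controls 2^16 = 65536 distinct /64 keys, while a whole NAT'd household shares one IPv4 key.

```python
import ipaddress


def limiter_key(addr: str, v6_prefix: int = 64) -> str:
    """Hypothetical helper: map a client address to a rate-limiting key.

    IPv4 addresses are used as-is; IPv6 addresses are truncated to a prefix,
    since a single host may legitimately control a whole /64.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return str(ip)
    # strict=False lets us pass a host address and get its enclosing network.
    return str(ipaddress.ip_network(f"{ip}/{v6_prefix}", strict=False))


print(limiter_key("192.0.2.7"))    # 192.0.2.7
print(limiter_key("2001:db8::1"))  # 2001:db8::/64

# A /48 allocation contains 2**(64 - 48) distinct /64 keys.
site = ipaddress.ip_network("2001:db8:abcd::/48")
print(2 ** (64 - site.prefixlen))  # 65536
```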

All this considered, IP address limiting can't be used as the primary
method of preventing fake submissions. However, I suppose it could work
as an additional DoS prevention measure, limiting the number of
submissions from a single address over short periods of time.

Example: if we limit to 10 requests an hour, then a single IP can be
used to manufacture at most 240 submissions a day. This might still be
enough to render the metrics unusable, but it should keep the database
reasonably safe.
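
A per-address limit like the one in the example could be implemented as a sliding-window limiter. This is a minimal in-memory sketch, assuming timestamps in seconds are passed in explicitly and the key is the client address; `SlidingWindowLimiter` is a hypothetical name, and a real deployment would need shared, persistent state. Hammering it once per second for a whole day reproduces the 240-per-day ceiling.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self.hits: dict[str, deque[float]] = defaultdict(deque)  # key -> accepted timestamps

    def allow(self, key: str, now: float) -> bool:
        q = self.hits[key]
        while q and now - q[0] >= self.window:
            q.popleft()  # forget requests that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


# One request per second from a single address for 24 hours.
limiter = SlidingWindowLimiter()
accepted = sum(limiter.allow("192.0.2.7", float(t)) for t in range(86400))
print(accepted)  # 240 = 10 per hour * 24 hours
```

A sliding window is only one choice; a fixed hourly window or a token bucket would give the same daily ceiling with different burst behavior.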


Option 2: proof-of-work
=======================
An alternative, using a proof-of-work algorithm, was suggested to me
yesterday. The idea is that every submission has to be accompanied by
the result of some expensive calculation that can't be trivially
parallelized or offloaded to dedicated hardware.

On the plus side, it would rely more on actual physical hardware than on
IP addresses provided by ISPs. While it would be a waste of CPU time
and memory, doing it just once a week wouldn't do that much harm.
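
For illustration, a hashcash-style proof-of-work can be sketched in a few lines of Python. This is not hashcash's own stamp format (hashcash uses SHA-1); it is a sketch using SHA-256 over a hypothetical server-issued challenge string. Minting takes on average 2^difficulty attempts while verification costs a single hash, so at 28 bits a client needs about 2.7e8 attempts on average. Note that a plain hash search like this is trivially parallel and GPU-friendly, so by itself it does not meet the "can't be trivially run in parallel" requirement; memory-hard designs such as RandomX aim at that.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(challenge: str, difficulty: int) -> int:
    """Search for the first nonce whose SHA-256(challenge:nonce) has enough zero bits."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs one hash, regardless of difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


nonce = mint("hypothetical-challenge", 16)  # about 65536 attempts on average
print(verify("hypothetical-challenge", nonce, 16))  # True
```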

On the minus side, it would penalize people with weak hardware.

For example, 'time hashcash -m -b 28 -r test' gives:

- 34 s (-s estimate: 38 s) on a Ryzen 5 3600

- 3 minutes (-s estimate: 92 s) on some old 32-bit Celeron M

At the same time, it would still permit a lot of fake submissions. For
example, RandomX [1] claims to require 2 GB of memory in fast mode. This
would still allow me to run 7 threads in parallel. If we adjusted the
algorithm to take ~30 seconds, that means 7 submissions every 30 s,
i.e. about 20k submissions a day.
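
The throughput estimate is simple arithmetic; as a quick check, assuming (as the estimate does) that each of 7 threads completes one solution every 30 s. For comparison, the result is 84 times the 240-per-day ceiling of the per-IP example in Option 1.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def submissions_per_day(threads: int, seconds_per_solution: float) -> float:
    """Upper bound on proof-of-work-backed submissions from one machine per day."""
    return threads * SECONDS_PER_DAY / seconds_per_solution


print(submissions_per_day(7, 30))        # 20160.0
print(submissions_per_day(7, 30) / 240)  # 84.0
```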

So in the end, while this is interesting, it doesn't seem like
a workable anti-spam measure.


Option 3: explicit CAPTCHA
==========================
A traditional way of dealing with spam -- require every new system
identifier to be confirmed by solving a CAPTCHA (or allow a few
identifiers per CAPTCHA).

The advantage of this method is that it requires real human work to be
performed, effectively limiting the ability to submit spam.
The disadvantage is that it is cumbersome for users, so many of them will
simply give up on participating.
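
The identifier-confirmation flow could look roughly like the sketch below, with one solved CAPTCHA granting a small quota of identifiers. Everything here is hypothetical: `CaptchaGate`, the quota of 3, and the `check_captcha` callback, which stands in for whatever CAPTCHA service would actually verify responses. State is kept in memory for brevity.

```python
import secrets
from typing import Callable, Optional


class CaptchaGate:
    """Hypothetical gate: submissions are accepted only from CAPTCHA-confirmed ids."""

    def __init__(self, check_captcha: Callable[[str], bool], ids_per_captcha: int = 3) -> None:
        self.check_captcha = check_captcha  # stand-in for a real CAPTCHA backend
        self.ids_per_captcha = ids_per_captcha
        self.tokens: dict[str, int] = {}  # token -> identifiers it may still confirm
        self.confirmed: set[str] = set()

    def redeem(self, captcha_response: str) -> Optional[str]:
        """Exchange a solved CAPTCHA for a token good for a few identifiers."""
        if not self.check_captcha(captcha_response):
            return None
        token = secrets.token_hex(16)
        self.tokens[token] = self.ids_per_captcha
        return token

    def register(self, token: str, system_id: str) -> bool:
        """Confirm a new system identifier, consuming one unit of the token's quota."""
        remaining = self.tokens.get(token, 0)
        if remaining <= 0:
            return False
        self.tokens[token] = remaining - 1
        self.confirmed.add(system_id)
        return True

    def accepts_submission(self, system_id: str) -> bool:
        return system_id in self.confirmed


gate = CaptchaGate(check_captcha=lambda response: response == "solved")
token = gate.redeem("solved")
print([gate.register(token, f"id{n}") for n in range(4)])  # [True, True, True, False]
```

The quota trades user convenience (several machines per CAPTCHA) against how many fake identifiers each solved CAPTCHA can buy.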


Other ideas
===========
Do you have any other ideas on how we could resolve this?


[1] https://github.com/tevador/RandomX


--
Best regards,
Michał Górny

Attachments

File name	MIME type
signature.asc	application/pgp-signature

Replies

Subject	Author
Re: [gentoo-dev] [RFC] Anti-spam for goose	"Toralf Förster" <toralf@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose	Tomas Mozes <hydrapolic@×××××.com>
Re: [gentoo-dev] [RFC] Anti-spam for goose	Fabian Groffen <grobian@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose	Robert Bridge <robert@××××××××.com>
Re: [gentoo-dev] [RFC] Anti-spam for goose	Gordon Pettey <petteyg359@×××××.com>
Re: [gentoo-dev] [RFC] Anti-spam for goose	Kent Fredric <kentnl@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose	Kent Fredric <kentnl@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose	Peter Stuge <peter@×××××.se>
