Gentoo Archives: gentoo-dev

From: Alec Warner <antarus@g.o>
To: Gentoo Dev <gentoo-dev@l.g.o>
Cc: "Michał Górny" <mgorny@g.o>, Tomas Mozes <hydrapolic@×××××.com>
Subject: Re: [gentoo-dev] [RFC] Anti-spam for goose
Date: Fri, 22 May 2020 00:38:24
Message-Id: CAAr7Pr-oX2Ox86phBZOb8jcMe=_xoYkNYnv7Hitwi5Pga2WT8w@mail.gmail.com
In Reply to: Re: [gentoo-dev] [RFC] Anti-spam for goose by Viktar Patotski
1 On Thu, May 21, 2020 at 1:13 PM Viktar Patotski <xp.vit.blr@×××××.com>
2 wrote:
3
4 > Hi all,
5 >
6 > I believe that we have all forgotten about Donald Knuth: premature
7 > optimisation is the root of all evil.
8 >
9 > We don't have "spam" yet, but we are already trying to protect against it. There
10 > might be cases when some systems will be posting stats more often than we
11 > want, but probably that will not harm us. Or this will be done by our main
12 > users who run 1kk Gentoo installations, and this "spam" will be
13 > actually valuable. Moreover, nobody forces us to treat info from 'goose' as
14 > first priority, so we are still able to select on which packages to work.
15 > In short: this topic is not so important yet, I think.
16 >
17
18 I raised a similar question on IRC and the conclusion was that 'it is good
19 to have ideas' and I don't necessarily disagree there[0]. We cannot build a
20 foolproof system, but some protections are feasible in some scenarios[1].
21
22 [0] Gentoo offers numerous no-login-required services; most of these are
23 read-only but they typically don't suffer from attacks; or at least, not
24 attacks that we need to respond to. The most obvious one of these is our
25 gentoo.org mail service which accepts unauthenticated email to gentoo.org.
26 Our anti-email-spam countermeasures are what I would call complex, but we
27 still employ broad measures when needed and the tradeoffs are similar to
28 the options for goose; e.g. if we are too broad we can block email from
29 large swaths of the internet.
30 [1] Bugzilla *has* recently been the target of spam attacks; it *does*
31 require logins (e.g. to create / modify bugs), and that has not stopped the
32 spammers from creating accounts. We have discussed different protections
33 for bugzilla, as it has different parameters. A basic bugzilla account
34 can't do all that much (you can't modify the bugs of others easily) and
35 spam posts are easily identified. This differs from goose, where
36 the powers of each token are the same (submit report) and it may be
37 difficult to tell an abusive report from a real report.
38
39
40 > Viktar
41 >
42 >
43 > On Thu, May 21, 2020, 16:28 Jaco Kroon <jaco@××××××.za> wrote:
44 >
45 >> Hi Michał,
46 >>
47 >> On 2020/05/21 13:02, Michał Górny wrote:
48 >> > On Thu, 2020-05-21 at 12:45 +0200, Jaco Kroon wrote:
49 >> >> Even for v4, as an attacker ... well, as I'm sitting here right now
50 >> I've
51 >> >> got direct access to almost a /20 (4096 addresses). I know a number of
52 >> >> people with larger scopes than that. Use bot-nets and the scope goes
53 >> up
54 >> >> even more.
55 >> > See how unfair the world is! You are filling your bathtub with IP
56 >> > addresses, and my ISP has taken mine only recently.
57 >> I must admit, I work for an ISP :$
58 >> >>> Option 3: explicit CAPTCHA
59 >> >>> ==========================
60 >> >>> A traditional way of dealing with spam -- require every new system
61 >> >>> identifier to be confirmed by solving a CAPTCHA (or a few
62 >> identifiers
63 >> >>> for one CAPTCHA).
64 >> >>>
65 >> >>> The advantage of this method is that it requires a real human work
66 >> >>> to be
67 >> >>> performed, effectively limiting the ability to submit spam.
68 >> >>>
69 >> >> Yea. One would think. CAPTCHAs are massively intrusive and in my
70 >> >> opinion more effort than they're worth.
71 >> >>
72 >> >> This may be beneficial to *generate* a token. In other words - when
73 >> >> generating a token, that token needs to be registered by way of
74 >> CAPTCHA.
75 >> >>
76 >> >>> Other ideas
77 >> >>> ===========
78 >> >>> Do you have any other ideas on how we could resolve this?
79 >> >>>
80 >> >> Generated token + hardware based hash.
81 >> > How are you going to verify that the hardware-based hash is real,
82 >> > and not just a random value created to circumvent the protection?
83 >>
84 >> So the generation of the hash is more to validate that it's still on the
85 >> same installation (ie, not a cloned token). Sorry if that wasn't clear,
86 >> so trying to solve two possible problems in one go.
87 >>
88 >> >
89 >> >> Rate limit the combination to 1/day.
90 >> >>
91 >> >> Don't use included results until it's been kept up to date for a
92 >> minimum
93 >> >> period. Say, updated at least 20 times in 30 days.
94 >> > For privacy reasons, we don't correlate the results. So this is
95 >> > impossible to implement.
96 >>
97 >> Ok, but a token cannot (unless we issue it based on an email based
98 >> account) be linked back to a specific user, so does it matter if we
99 >> associate uploads with a token?
100 >>
101 >> >> The downside here is that many machines are not powered up at least
102 >> once
103 >> >> a day to be able to perform that initial submission sequence. So
104 >> >> perhaps it's a bit stringent.
105 >> > Exactly. Even once a week is a bit risky but once a day is too narrow
106 >> > a period.
107 >> >
108 >> > To some degree, we could decide we don't care about exact numbers
109 >> > as much as some degree of weighted proportions. This would mean that,
110 >> > say, people who submit daily get the count of 7, at the loss of people
111 >> > who don't run their machines that much. It would effectively put more
112 >> > emphasis on more active users. It's debatable whether this is desirable
113 >> > or not.
114 >> Decaying averages. Simple to implement, don't need all historic data.
115 >> >
116 >> >> Both the token and hardware hash can of course be tainted and are under
117 >> >> "attacker control".
118 >> > Exactly. So it really looks like exercise for the sake of exercise.
119 >>
120 >> Unless tokens are *issued* as per the rest of my email you snipped
121 >> away. Wherein I proposed an issuing of both anonymous and non-anonymous
122 >> tokens.
123 >>
124 >> Kind Regards,
125 >> Jaco
126 >>
127 >>
128 >>
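
Jaco's "decaying averages" remark above can be illustrated with a minimal
sketch. This is not goose's implementation; the class name, the 7-day
half-life, and the use of integer day numbers are all assumptions made for
the example. The point it shows is that each counter stores only a running
value and the day it was last touched, so no per-submission history has to
be retained.

```python
import math


class DecayingCount:
    """Exponentially decaying counter (hypothetical helper, not part of goose).

    Each submission adds `weight`; the stored value halves every
    `half_life_days`. Only the current value and the last-seen day are kept.
    """

    def __init__(self, half_life_days=7.0):
        # Decay rate chosen so that value * exp(-rate * half_life) == value / 2.
        self.rate = math.log(2) / half_life_days
        self.value = 0.0
        self.last_day = None

    def _decay_to(self, day):
        # Age the stored value forward to `day`. Out-of-order (older) days
        # are not aged backwards; they leave the value unchanged.
        if self.last_day is None:
            self.last_day = day
        elif day > self.last_day:
            self.value *= math.exp(-self.rate * (day - self.last_day))
            self.last_day = day

    def submit(self, day, weight=1.0):
        # Record one submission on the given day.
        self._decay_to(day)
        self.value += weight

    def read(self, day):
        # Return the decayed count as of the given day.
        self._decay_to(day)
        return self.value


if __name__ == "__main__":
    counter = DecayingCount(half_life_days=7.0)
    counter.submit(day=0)
    counter.submit(day=7)
    print(counter.read(day=14))
```

With a scheme like this, a machine that submits daily and one that submits
weekly both converge to bounded values rather than growing without limit,
which is one way to get the "weighted proportions" Michał describes; whether
that emphasis on active users is desirable is the open question from the
thread.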