On Thu, May 21, 2020 at 1:13 PM Viktar Patotski <xp.vit.blr@×××××.com> wrote:

> Hi all,
>
> I believe that we have all forgotten about Donald Knuth: premature
> optimisation is the root of all evil.
>
> We don't have "spam" yet, but we are already trying to protect against it.
> There might be cases when some systems post stats more often than we
> want, but that probably will not harm us. Or this will be done by our main
> users who run 1kk Gentoo installations, and this "spam" will
> actually be valuable. Moreover, nobody forces us to treat info from 'goose' as
> first priority, so we are still able to select which packages to work on.
> In short: this topic is not so important yet, I think.
>
|
I raised a similar question on IRC and the conclusion was that 'it is good
to have ideas' and I don't necessarily disagree there[0]. We cannot build a
foolproof system, but some protections are feasible in some scenarios[1].
|
[0] Gentoo offers numerous no-login-required services; most of these are
read-only but they typically don't suffer from attacks; or at least, not
attacks that we need to respond to. The most obvious one of these is our
gentoo.org mail service, which accepts unauthenticated email to gentoo.org.
Our anti-email-spam countermeasures are what I would call complex, but we
still employ broad measures when needed and the tradeoffs are similar to
the options for goose; e.g. if we are too broad we can block email from
large swaths of the internet.
[1] Bugzilla *has* recently been the target of spam attacks; it *does*
require logins (e.g. to create / modify bugs) and that has not stopped the
spammers from creating accounts. We have discussed different protections
for Bugzilla, as it has different parameters. A basic Bugzilla account
can't do all that much (you can't modify the bugs of others easily) and
spam posts are easily identified. This differs from goose, where
the powers of each token are the same (submit report) and it may be
difficult to tell an abusive report from a real report.


> Viktar
>
>
> On Thu, May 21, 2020, 16:28 Jaco Kroon <jaco@××××××.za> wrote:
>
>> Hi Michał,
>>
>> On 2020/05/21 13:02, Michał Górny wrote:
>> > On Thu, 2020-05-21 at 12:45 +0200, Jaco Kroon wrote:
>> >> Even for v4, as an attacker ... well, as I'm sitting here right now I've
>> >> got direct access to almost a /20 (4096 addresses). I know a number of
>> >> people with larger scopes than that. Use bot-nets and the scope goes up
>> >> even more.
>> > See how unfair the world is! You are filling your bathtub with IP
>> > addresses, and my ISP has taken mine only recently.
>> I must admit, I work for an ISP :$
>> >>> Option 3: explicit CAPTCHA
>> >>> ==========================
>> >>> A traditional way of dealing with spam -- require every new system
>> >>> identifier to be confirmed by solving a CAPTCHA (or a few identifiers
>> >>> for one CAPTCHA).
>> >>>
>> >>> The advantage of this method is that it requires real human work
>> >>> to be performed, effectively limiting the ability to submit spam.
>> >>>
>> >> Yea. One would think. CAPTCHAs are massively intrusive and in my
>> >> opinion more effort than they're worth.
>> >>
>> >> This may be beneficial to *generate* a token. In other words - when
>> >> generating a token, that token needs to be registered by way of CAPTCHA.
>> >>
>> >>> Other ideas
>> >>> ===========
>> >>> Do you have any other ideas on how we could resolve this?
>> >>>
>> >> Generated token + hardware-based hash.
>> > How are you going to verify that the hardware-based hash is real,
>> > and not just a random value created to circumvent the protection?
>>
>> So the generation of the hash is more to validate that it's still on the
>> same installation (ie, not a cloned token). Sorry if that wasn't clear,
>> I was trying to solve two possible problems in one go.
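A minimal sketch of the clone check described above, not the goose implementation: the server remembers the first hash seen for each token and flags later submissions whose hash differs. `binding_hash`, `TokenRegistry`, and the `hardware_id` string are hypothetical names, and SHA-256 is an assumed choice. As the question quoted above points out, this cannot prove the hash came from real hardware; it only notices a token that reappears with a different hash.

```python
import hashlib
import hmac


def binding_hash(token: str, hardware_id: str) -> str:
    """Client side (hypothetical): tie the token to one installation."""
    return hashlib.sha256(f"{token}:{hardware_id}".encode("utf-8")).hexdigest()


class TokenRegistry:
    """Server side (hypothetical): remember the first hash seen per token."""

    def __init__(self):
        self._first_seen = {}

    def same_installation(self, token: str, submitted_hash: str) -> bool:
        # The first submission for a token is accepted and recorded as-is.
        known = self._first_seen.setdefault(token, submitted_hash)
        # Constant-time comparison; False suggests the token was copied
        # to another installation (or the hardware changed).
        return hmac.compare_digest(known, submitted_hash)
```

A legitimate hardware change would also trip this check, so a real design would need some way to re-register a token.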
>>
>> >
>> >> Rate limit the combination to 1/day.
>> >>
>> >> Don't include results until it's been kept up to date for a minimum
>> >> period. Say updated at least 20 times in 30 days.
>> > For privacy reasons, we don't correlate the results. So this is
>> > impossible to implement.
>>
>> Ok, but a token cannot (unless we issue it based on an email based
>> account) be linked back to a specific user, so does it matter if we
>> associate uploads with a token?
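The rate limit and minimum-history rule quoted above could be sketched roughly as follows. This assumes the server is allowed to associate submissions with a token, which is exactly the point under debate here. `SubmissionGate` and its method names are hypothetical, days are plain integer day numbers, and the defaults (20 submissions in 30 days) are the figures floated in the thread.

```python
from collections import defaultdict, deque


class SubmissionGate:
    """Hypothetical per-token gate: one submission per day, and results
    only count once the token has enough recent history."""

    def __init__(self, min_submissions: int = 20, window_days: int = 30):
        self.min_submissions = min_submissions
        self.window_days = window_days
        self._days = defaultdict(deque)  # token -> accepted day numbers

    def submit(self, token: str, day: int) -> bool:
        days = self._days[token]
        # Rate limit: reject a second submission on the same (or an earlier) day.
        if days and day <= days[-1]:
            return False
        days.append(day)
        # Forget submissions that have fallen out of the window.
        while days and days[0] <= day - self.window_days:
            days.popleft()
        return True

    def is_counted(self, token: str, day: int) -> bool:
        # Count only submissions inside the window ending at `day`.
        recent = [d for d in self._days.get(token, ())
                  if day - self.window_days < d <= day]
        return len(recent) >= self.min_submissions
```

Only the day numbers inside the window are kept per token, so the stored history stays small.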
>>
>> >> The downside here is that many machines are not powered up at least once
>> >> a day to be able to perform that initial submission sequence. So
>> >> perhaps it's a bit stringent.
>> > Exactly. Even once a week is a bit risky but once a day is too narrow
>> > a period.
>> >
>> > To some degree, we could decide we don't care about exact numbers
>> > as much as some degree of weighted proportions. This would mean that,
>> > say, people who submit daily get the count of 7, at the loss of people
>> > who don't run their machines that much. It would effectively put more
>> > emphasis on more active users. It's debatable whether this is desirable
>> > or not.
>> Decaying averages. Simple to implement, don't need all historic data.
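One way to read "decaying averages" is an exponentially decayed count, sketched below under that assumption. `DecayingCounter` is a hypothetical name and the 7-day half-life is an arbitrary illustrative choice. Only the current value and the time of the last update are stored, so no historic data is needed.

```python
import math


class DecayingCounter:
    """Hypothetical exponentially decayed count, e.g. one per package."""

    def __init__(self, half_life_days: float = 7.0):
        self.rate = math.log(2) / half_life_days
        self.value = 0.0
        self.last_day = 0.0

    def _decay_to(self, day: float) -> None:
        # Shrink the stored value for the time elapsed since the last update.
        elapsed = day - self.last_day
        if elapsed > 0:
            self.value *= math.exp(-self.rate * elapsed)
            self.last_day = day

    def add(self, day: float, weight: float = 1.0) -> None:
        """Record one submission at time `day` (in days)."""
        self._decay_to(day)
        self.value += weight

    def read(self, day: float) -> float:
        """Current decayed count as of `day`."""
        self._decay_to(day)
        return self.value
```

With this scheme a machine that stops reporting fades out gradually instead of being counted forever, and frequent reporters still weigh more than occasional ones.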
>> >
>> >> Both the token and hardware hash can of course be tainted and are under
>> >> "attacker control".
>> > Exactly. So it really looks like exercise for the sake of exercise.
>>
>> Unless tokens are *issued* as per the rest of my email you snipped
>> away. Wherein I proposed an issuing of both anonymous and non-anonymous
>> tokens.
>>
>> Kind Regards,
>> Jaco
>>
>>
>>