Hi all,

I believe we have all forgotten Donald Knuth: premature
optimisation is the root of all evil.

We don't have "spam" yet, but we are already trying to protect against it.
There might be cases when some systems post stats more often than we want,
but that probably will not harm us. Or it may come from our main users,
who run 1kk Gentoo installations, in which case this "spam" is actually
valuable. Moreover, nobody forces us to treat the info from 'goose' as first
priority, so we are still able to choose which packages to work on. In
short: I don't think this topic is that important yet.

Viktar


On Thu, May 21, 2020, 16:28 Jaco Kroon <jaco@××××××.za> wrote:

> Hi Michał,
>
> On 2020/05/21 13:02, Michał Górny wrote:
> > On Thu, 2020-05-21 at 12:45 +0200, Jaco Kroon wrote:
> >> Even for v4, as an attacker ... well, as I'm sitting here right now I've
> >> got direct access to almost a /20 (4096 addresses). I know a number of
> >> people with larger scopes than that. Use bot-nets and the scope goes up
> >> even more.
> > See how unfair the world is! You are filling your bathtub with IP
> > addresses, and my ISP has taken mine only recently.
> I must admit, I work for an ISP :$
> >>> Option 3: explicit CAPTCHA
> >>> ==========================
> >>> A traditional way of dealing with spam -- require every new system
> >>> identifier to be confirmed by solving a CAPTCHA (or a few identifiers
> >>> for one CAPTCHA).
> >>>
> >>> The advantage of this method is that it requires real human work
> >>> to be performed, effectively limiting the ability to submit spam.
> >>>
> >> Yea. One would think. CAPTCHAs are massively intrusive and in my
> >> opinion more effort than they're worth.
> >>
> >> This may be beneficial to *generate* a token. In other words - when
> >> generating a token, that token needs to be registered by way of CAPTCHA.
> >>
> >>> Other ideas
> >>> ===========
> >>> Do you have any other ideas on how we could resolve this?
> >>>
> >> Generated token + hardware based hash.
> > How are you going to verify that the hardware-based hash is real,
> > and not just a random value created to circumvent the protection?
>
> So the generation of the hash is more to validate that it's still on the
> same installation (ie, not a cloned token). Sorry if that wasn't clear,
> so trying to solve two possible problems in one go.
>
> >
> >> Rate limit the combination to 1/day.
> >>
> >> Don't use included results until it's been kept up to date for a minimum
> >> period. Say updated at least 20 times in 30 days.
> > For privacy reasons, we don't correlate the results. So this is
> > impossible to implement.
>
> Ok, but a token cannot (unless we issue it based on an email based
> account) be linked back to a specific user, so does it matter if we
> associate uploads with a token?
>
> >> The downside here is that many machines are not powered up at least once
> >> a day to be able to perform that initial submission sequence. So
> >> perhaps it's a bit stringent.
> > Exactly. Even once a week is a bit risky but once a day is too narrow
> > a period.
> >
> > To some degree, we could decide we don't care about exact numbers
> > as much as some degree of weighted proportions. This would mean that,
> > say, people who submit daily get the count of 7, at the loss of people
> > who don't run their machines that much. It would effectively put more
> > emphasis on more active users. It's debatable whether this is desirable
> > or not.
> Decaying averages. Simple to implement, don't need all historic data.
> >
> >> Both the token and hardware hash can of course be tainted and are under
> >> "attacker control".
> > Exactly. So it really looks like exercise for the sake of exercise.
>
> Unless tokens are *issued* as per the rest of my email you snipped
> away. Wherein I proposed an issuing of both anonymous and non-anonymous
> tokens.
>
> Kind Regards,
> Jaco
>
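For reference, the "decaying averages" idea from the quoted message could
look roughly like the sketch below. It is only an illustration, not anything
goose implements: the class name DecayingCount, the 7-day half-life and the
use of plain day numbers as timestamps are all assumptions. The point is that
a per-package counter needs just one number and one timestamp, so no
individual submissions have to be stored or correlated.

```python
from typing import Optional


class DecayingCount:
    """Exponentially decaying counter (hypothetical helper, not goose code).

    Each submission adds `weight`; the stored value halves every
    `half_life_days`. Only the current value and the day of the last
    update are kept, never the individual submissions.
    """

    def __init__(self, half_life_days: float = 7.0) -> None:
        if half_life_days <= 0:
            raise ValueError("half_life_days must be positive")
        self.half_life_days = half_life_days
        self.value = 0.0
        self.last_day: Optional[float] = None

    def _decay_to(self, day: float) -> None:
        # Scale the stored value down for the time elapsed since the
        # last update. Timestamps older than the last update are treated
        # as "now" and cause no decay.
        if self.last_day is None:
            self.last_day = day
            return
        elapsed = day - self.last_day
        if elapsed > 0:
            self.value *= 0.5 ** (elapsed / self.half_life_days)
            self.last_day = day

    def add(self, day: float, weight: float = 1.0) -> None:
        """Record one submission made on `day`."""
        self._decay_to(day)
        self.value += weight

    def read(self, day: float) -> float:
        """Return the decayed count as of `day`."""
        self._decay_to(day)
        return self.value


# Two submissions on day 0, read back one half-life later.
counter = DecayingCount(half_life_days=7.0)
counter.add(0)
counter.add(0)
print(counter.read(7))
```

With something like this, a machine that stops reporting simply fades out of
the statistics over a few half-lives, and a machine that reports more often
than others gains weight only gradually.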