Gentoo Archives: gentoo-dev

From: Viktar Patotski <xp.vit.blr@×××××.com>
To: gentoo-dev@l.g.o
Cc: "Michał Górny" <mgorny@g.o>, Tomas Mozes <hydrapolic@×××××.com>
Subject: Re: [gentoo-dev] [RFC] Anti-spam for goose
Date: Thu, 21 May 2020 20:13:52
Message-Id: CAA+wmxzBVioLjBO9i_ZVL9RMAq5gM7C_r=V5tbKxOdCRoTHgZw@mail.gmail.com
In Reply to: Re: [gentoo-dev] [RFC] Anti-spam for goose by Jaco Kroon
1 Hi all,
2
3 I believe we have all forgotten Donald Knuth: premature
4 optimisation is the root of all evil.
5
6 We don't have "spam" yet, but we are already trying to protect against it.
7 There might be cases where some systems post stats more often than we want,
8 but that probably will not harm us. Or it will come from our main users
9 who run a million Gentoo installations, and then this "spam" will actually
10 be valuable. Moreover, nobody forces us to treat info from 'goose' as first
11 priority, so we are still able to select which packages to work on. In
12 short: I think this topic is not that important yet.
13
14 Viktar
15
16
17 On Thu, May 21, 2020, 16:28 Jaco Kroon <jaco@××××××.za> wrote:
18
19 > Hi Michał,
20 >
21 > On 2020/05/21 13:02, Michał Górny wrote:
22 > > On Thu, 2020-05-21 at 12:45 +0200, Jaco Kroon wrote:
23 > >> Even for v4, as an attacker ... well, as I'm sitting here right now I've
24 > >> got direct access to almost a /20 (4096 addresses). I know a number of
25 > >> people with larger scopes than that. Use bot-nets and the scope goes up
26 > >> even more.
27 > > See how unfair the world is! You are filling your bathtub with IP
28 > > addresses, and my ISP has taken mine only recently.
29 > I must admit, I work for an ISP :$
30 > >>> Option 3: explicit CAPTCHA
31 > >>> ==========================
32 > >>> A traditional way of dealing with spam -- require every new system
33 > >>> identifier to be confirmed by solving a CAPTCHA (or a few
34 > >>> identifiers
35 > >>> for one CAPTCHA).
36 > >>>
37 > >>> The advantage of this method is that it requires real human work
38 > >>> to be performed, effectively limiting the ability to submit
39 > >>> spam.
40 > >>>
41 > >> Yea. One would think. CAPTCHAs are massively intrusive and in my
42 > >> opinion more effort than they're worth.
43 > >>
44 > >> This may be beneficial to *generate* a token. In other words - when
45 > >> generating a token, that token needs to be registered by way of a CAPTCHA.
46 > >>
47 > >>> Other ideas
48 > >>> ===========
49 > >>> Do you have any other ideas on how we could resolve this?
50 > >>>
51 > >> Generated token + hardware based hash.
52 > > How are you going to verify that the hardware-based hash is real,
53 > > and not just a random value created to circumvent the protection?
54 >
55 > So the generation of the hash is more to validate that it's still on the
56 > same installation (ie, not a cloned token). Sorry if that wasn't clear,
57 > so trying to solve two possible problems in one go.
58 >
59 > >
60 > >> Rate limit the combination to 1/day.
61 > >>
62 > >> Don't use included results until it's been kept up to date for a minimum
63 > >> period. Say, updated at least 20 times in 30 days.
64 > > For privacy reasons, we don't correlate the results. So this is
65 > > impossible to implement.
66 >
67 > Ok, but a token cannot (unless we issue it based on an email based
68 > account) be linked back to a specific user, so does it matter if we
69 > associate uploads with a token?
70 >
71 > >> The downside here is that many machines are not powered up at least once
72 > >> a day to be able to perform that initial submission sequence. So
73 > >> perhaps it's a bit stringent.
74 > > Exactly. Even once a week is a bit risky but once a day is too narrow
75 > > a period.
76 > >
77 > > To some degree, we could decide we don't care about exact numbers
78 > > as much as some degree of weighted proportions. This would mean that,
79 > > say, people who submit daily get the count of 7, at the loss of people
80 > > who don't run their machines that much. It would effectively put more
81 > > emphasis on more active users. It's debatable whether this is desirable
82 > > or not.
83 > Decaying averages. Simple to implement, don't need all historic data.
84 > >
85 > >> Both the token and hardware hash can of course be tainted and are under
86 > >> "attacker control".
87 > > Exactly. So it really looks like exercise for the sake of exercise.
88 >
89 > Unless tokens are *issued* as per the rest of my email you snipped
90 > away. Wherein I proposed an issuing of both anonymous and non-anonymous
91 > tokens.
92 >
93 > Kind Regards,
94 > Jaco
95 >
96 >
97 >
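
For reference, the "decaying averages" idea quoted above could look roughly
like the sketch below. It is only an illustration of the general technique
(an exponentially decayed counter), not anything taken from goose: the class
name DecayingCount, the 7-day half-life and the use of fractional day numbers
as timestamps are all assumptions made for the example. It keeps just two
numbers per counter, so no historic submissions need to be stored.

```python
from dataclasses import dataclass

# Assumption for illustration: a submission loses half its weight every 7 days.
HALF_LIFE_DAYS = 7.0


@dataclass
class DecayingCount:
    """Exponentially decayed counter; stores only a value and a timestamp."""

    value: float = 0.0     # decayed sum of all weights added so far
    last_day: float = 0.0  # day (as a float) at which `value` was last updated

    def _factor(self, day: float) -> float:
        # Fraction of the stored value that survives until `day`.
        elapsed = max(0.0, day - self.last_day)
        return 0.5 ** (elapsed / HALF_LIFE_DAYS)

    def add(self, day: float, weight: float = 1.0) -> None:
        # Decay the old value up to `day`, then add the new submission.
        # A submission dated earlier than `last_day` is added without decay.
        self.value *= self._factor(day)
        self.last_day = max(self.last_day, day)
        self.value += weight

    def read(self, day: float) -> float:
        # Current decayed value, without modifying the stored state.
        return self.value * self._factor(day)


if __name__ == "__main__":
    count = DecayingCount()
    count.add(day=0.0)           # one submission on day 0
    print(count.read(day=7.0))   # half a submission is left a week later
    count.add(day=7.0)           # a second submission on day 7
    print(count.read(day=14.0))
```

With something like this, a machine that submits daily and one that submits
weekly both converge to a bounded weight instead of growing forever, which is
the "weighted proportions" trade-off discussed in the thread.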

Replies

Subject Author
Re: [gentoo-dev] [RFC] Anti-spam for goose Alec Warner <antarus@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose "Michał Górny" <mgorny@g.o>
Re: [gentoo-dev] [RFC] Anti-spam for goose waebbl <waebbl@×××××.com>