Gentoo Archives: gentoo-soc

From: Joachim Bartosik <jbartosik@×××××.com>
To: gentoo-soc@l.g.o
Subject: Re: [gentoo-soc] Re: Gentoo stats server/client,
Date: Mon, 23 Mar 2009 23:40:20
Message-Id: 53d3ab620903231640w24f07a82y17b2825d0fc96e4f@mail.gmail.com
In Reply to: Re: [gentoo-soc] Re: Gentoo stats server/client, by Alec Warner
1 This idea looks interesting so if you don't mind I'll join the thread.
2 I tried to keep everything short, but it still looks too long :/ And trying
3 to keep it short probably made some parts hard to understand, so please ask.
4 If you see a *, scroll down to the end of the email for an explanation.
5
6
7 >> There have been many stats projects in the past that have failed due to
8 > >> various reasons. A simple question is: How are you planning on making
9 > >> your idea/proposal not fail? ;)
10 >
11
12 By being lazy and putting as much work on others as possible.
13
14 Authentication/ security overview
15
16 The idea from 2006 ( to create an account one has to ask for an id and submit
17 some data) makes usage very simple for users ( they don't even need to know
18 anything about authorisation), but unfortunately it's very easy to write a
19 "client" that would submit a lot of bogus data and spoil the statistics ( I
20 guess that's a major issue with authentication and security).
21
22 To solve this problem I'd use a solution that is less comfortable for users:
23 the user would have to create an account using an email ( of course it
24 wouldn't be stored, I'd store some one-way injective function of it*) and
25 click an emailed link. There would be no need for a password - to confirm
26 his[her] actions [s]he would just click an emailed link.
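
The click-an-emailed-link confirmation could be sketched roughly as below. This is only an illustration under assumptions: the names `PendingActions`, `issue` and `confirm`, the in-memory dict, and the one-day link lifetime are all hypothetical and not part of the proposal.

```python
import secrets
import time

TOKEN_LIFETIME = 24 * 3600  # assumed: emailed links expire after one day


class PendingActions:
    """Hypothetical in-memory store of actions waiting for an emailed click."""

    def __init__(self):
        self._pending = {}  # token -> (account_id, action, expiry timestamp)

    def issue(self, account_id, action, now=None):
        """Create a single-use token to embed in the emailed link."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[token] = (account_id, action, now + TOKEN_LIFETIME)
        return token

    def confirm(self, token, now=None):
        """Return (account_id, action) if the token is valid, else None."""
        now = time.time() if now is None else now
        entry = self._pending.pop(token, None)  # pop makes the token single-use
        if entry is None:
            return None
        account_id, action, expires = entry
        if now > expires:
            return None
        return account_id, action
```

Because the token is the only secret, no password has to be stored; a real server would keep the pending actions in its database rather than in memory.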
27
28 Each user ( email) would have a hosts** limit ( probably set in the server
29 configuration), 2 or 3 by default ( enough for an average user, not enough to
30 easily spoil the data). After some time of inactivity a host/ account would be
31 removed.
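
A minimal sketch of the per-account host limit, assuming accounts and hosts are tracked as a plain dict of sets. The names `register_host` and `DEFAULT_HOST_LIMIT` are made up for this example.

```python
DEFAULT_HOST_LIMIT = 3  # assumed default; the text above suggests 2 or 3


def register_host(hosts_by_account, account_id, host_id, limit=DEFAULT_HOST_LIMIT):
    """Add host_id to the account unless the account is already at its limit.

    hosts_by_account maps an account id to the set of its host ids.
    Returns True if the host is registered after the call, False if refused.
    """
    hosts = hosts_by_account.setdefault(account_id, set())
    if host_id in hosts:
        return True  # re-registering a known host is harmless
    if len(hosts) >= limit:
        return False
    hosts.add(host_id)
    return True
```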
32
33 The problem starts if one needs more hosts per account. Right
34 now I have some ideas ( none very good):
35 - the easiest method to implement is "please email our admin and explain why
36 you need them", but it's user unfriendly and admin unfriendly.
37 - give a really big limit on hosts per email - it would be easy to inject
38 a lot of false data, but it's easier to remove than with the 2006 auth ( identify
39 wrongdoing emails and delete their hosts).
40 - require users to give some non-free ( free as in beer) email to reduce
41 the possibility of using fake emails, and give a big hosts limit.
42
43 I'd try to keep the need to click emailed links to a minimum - registration and
44 administrative tasks ( like removing hosts from an account).
45
46
47
48 Components:
49
50 Client
51
52 Probably in Python, to take advantage of all the work the portage developers have
53 done and save me work. It'd be a simple run-me-from-the-command-line ( cron)
54 program sending arch, all installed cpv and their USE ( for sure, and before
55 the end of summer) and maybe some more if time allows ( "A daemon with 2 working
56 modules is better than a daemon with 10 half finished ones."). Maybe [if
57 time allows] a GUI wrapper to run it in the tray.
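
To make the client's output concrete, here is a rough sketch of how one report ( arch, installed cpv, USE) might be serialised. The JSON layout and the name `build_report` are assumptions; how the package list is read from portage is deliberately left out, so the caller passes it in.

```python
import json


def build_report(arch, installed):
    """Serialise one host's state as JSON.

    installed is an iterable of (cpv, use_flags) pairs, for example
    ("app-editors/vim-7.2", ["acl", "nls"]).
    """
    packages = {cpv: sorted(use) for cpv, use in installed}
    report = {"arch": arch, "packages": packages}
    # sort_keys gives identical output for identical state, which makes it
    # easy to notice that nothing changed since the previous run.
    return json.dumps(report, sort_keys=True)
```

A cron entry would then only need to call the client script, which builds this report and sends it to the server.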
58
59 Server:
60 Would be split into several independent programs ( to save me work). All
61 except the first one would be written in Python.
62
63 User communication:
64 Thanks for the REST idea - I thought about using HTTP/ HTTPS, and making it
65 stateless saves a lot of work. To save me some work I'd start with Apache +
66 PHP + MySQL, one path per action ( register host, register user, send data,
67 ...). It'd put received data in MySQL ( not verify whether it's correct, simply
68 get the data and put it in a table with information on who sent it and when). It's
69 not a very elegant solution ( and may turn out to be slow), so - if time
70 allows and there is a need to - I'll rewrite it in Python.
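
The "store it unverified, with who and when" step might look like the sketch below. It uses Python with sqlite3 purely as a stand-in for the PHP + MySQL setup named above, and the table and column names are invented for illustration.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open the database and make sure the raw submissions table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        " id INTEGER PRIMARY KEY,"
        " host_id TEXT NOT NULL,"
        " received_at REAL NOT NULL,"
        " payload BLOB NOT NULL)"
    )
    return conn


def store_submission(conn, host_id, payload, now=None):
    """Record who sent what and when; the payload is not inspected here."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO submissions (host_id, received_at, payload)"
            " VALUES (?, ?, ?)",
            (host_id, now, payload),
        )
```

Keeping this step free of validation means the HTTP front end stays trivial; all interpretation of the payload happens later, in the data gathering program.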
71
72 Data gathering:
73 It'd take the data provided by the user communication module, decompress it, apply
74 deltas etc. to create all-the-information-available about the current state of
75 hosts.
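
A sketch of the decompress-and-apply-deltas step, under an assumed wire format: a zlib-compressed JSON message holding either a "full" package mapping or "added"/"removed" entries. Neither the format nor the name `apply_submission` comes from the proposal.

```python
import json
import zlib


def apply_submission(state, compressed):
    """Return the host's new package state after one submission.

    state maps cpv -> list of USE flags. The decompressed payload is JSON
    with either a "full" mapping or "added"/"removed" entries (assumed format).
    """
    message = json.loads(zlib.decompress(compressed))
    if "full" in message:
        return dict(message["full"])
    new_state = dict(state)  # leave the caller's state untouched
    for cpv in message.get("removed", []):
        new_state.pop(cpv, None)
    new_state.update(message.get("added", {}))
    return new_state
```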
76
77 Cleaner:
78 Runs from time to time ( by cron, frequency adjusted to needs). Removes hosts
79 and users that no longer send data ( to conserve space) etc.
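
The cleaner's core could be sketched as follows, assuming the server tracks a last-seen timestamp per host. The function names, the dict-based storage, and the rule that an account with no remaining hosts is dropped are all assumptions for this example.

```python
def find_stale(last_seen, now, max_idle):
    """Return ids whose last submission is older than max_idle seconds."""
    return sorted(i for i, seen in last_seen.items() if now - seen > max_idle)


def clean(hosts_last_seen, hosts_by_account, now, max_idle):
    """Drop stale hosts, then accounts left with no hosts."""
    for host in find_stale(hosts_last_seen, now, max_idle):
        del hosts_last_seen[host]
        for hosts in hosts_by_account.values():
            hosts.discard(host)
    # Build the list first so the dict is not modified while iterating over it.
    for account in [a for a, hosts in hosts_by_account.items() if not hosts]:
        del hosts_by_account[account]
```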
80
81 Archiver:
82 Run from time to time ( cron, as needed). Data gathering provides only
83 information about hosts *right now*. The archiver would generate statistics (
84 like package popularity ( % of hosts that have it installed)) and store them to make
85 historical data available ( storing the full state history of every host would be
86 extremely excessive).
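
The package popularity statistic mentioned above ( % of hosts that have a package installed) can be computed from the current host states like this. A minimal sketch; `package_popularity` and the input layout are assumed names for illustration.

```python
from collections import Counter


def package_popularity(host_states):
    """Return {package: percent of hosts that have it installed}.

    host_states maps host id -> iterable of installed package names.
    """
    total = len(host_states)
    if total == 0:
        return {}
    counts = Counter()
    for packages in host_states.values():
        counts.update(set(packages))  # count each package once per host
    return {pkg: 100.0 * n / total for pkg, n in counts.items()}
```

Storing only this small per-package result on each run is what keeps the historical data cheap compared with keeping every host's full state.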
87
88 * The one-way part means that there is no easy way to get users' emails even
89 if someone gets access to all the data stored on the server. The injective part
90 means that no two emails will generate the same output, so no two users will
91 get the same account. Hashes won't work because they are not injective
92 functions, but I'm almost sure someone has already written functions like that. I
93 don't recall any right now, but I'll have plenty of time to look for them or,
94 in the worst case, write one myself ( easy: create an asymmetric pair of keys,
95 throw the private one to /dev/null so no one can decrypt, and encrypt emails with
96 the public one).
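
The "encrypt with a public key whose private half was thrown away" idea can be illustrated with unpadded ( textbook) RSA, which is deterministic and injective on inputs smaller than the modulus. This is a toy sketch only: the modulus below is built from two publicly known Mersenne primes, so it offers no security, and `account_id` is a hypothetical name.

```python
import math

# Toy public key. Both primes are well-known Mersenne primes, so this modulus
# is trivially factorable; a real deployment would generate a fresh key pair
# and destroy the private half.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537

# Textbook RSA is a bijection on 0..N-1 only if E is coprime to (P-1)(Q-1).
assert math.gcd(E, (P - 1) * (Q - 1)) == 1


def account_id(email):
    """Map an email address to a hex account id, deterministically."""
    # A leading 0x01 byte keeps distinct byte strings distinct as integers
    # (without it, leading zero bytes would be lost).
    data = b"\x01" + email.encode("utf-8")
    m = int.from_bytes(data, "big")
    if m >= N:
        raise ValueError("email too long for this key size")
    return format(pow(m, E, N), "x")
```

Since the mapping is deterministic, anyone who holds the public key can test whether a guessed address is registered; the one-way property only protects addresses that cannot be guessed.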
97
98 ** 1 host == 1 data set ( installed packages, arch etc.)
99
100
101
102 I realized I forgot to say who I am:
103
104 I'm Joachim, I live in Poland ( UTC + 1). I study mathematics ( 3rd year). I've used
105 Gentoo since 2005 ( as main desktop OS) or 2004 ( first contact). I've coded since
106 2003 ( training for/ participating in http://www.oi.edu.pl - English version
107 available, look in the top right corner) or 2001 ( started to play with
108 Visual Basic). Cannot drink black tea. Right now extremely tired after a sleepless
109 weekend ( due to several breakdowns at home).
110 Good night.
