This idea looks interesting, so if you don't mind I'll join the thread.
I tried to cut everything short, but it still looks too long :/ And trying
to keep it short probably made some parts hard to understand, so please ask.
If you see *, scroll down to the end of the email for explanations.


>> There have been many stats projects in the past that have failed due to
>> various reasons. A simple question is: How are you planning on making
>> your idea/proposal not fail? ;)
>

By being lazy and putting as much work on others as possible.

Authentication/security overview

The idea from 2006 (to create an account one has to ask for an id and submit
some data) makes usage very simple for users (they don't even need to know
anything about authorisation), but unluckily it's very easy to write a
"client" that would submit a lot of bogus data and spoil the statistics (I
guess that's a major issue with authentication and security).

To solve this problem I'd use a solution that is less comfortable for users:
a user would have to create an account using an email address (of course it
wouldn't be stored; I'd store some one-way injective function of it*) and
click an emailed link. There would be no need for a password: to confirm
his [her] actions [s]he would just click an emailed link.
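
A minimal sketch of how such a passwordless confirmation link could be
signed and checked, assuming the server keeps one secret key. The names
SERVER_SECRET, make_token and verify_token, the "|" separated token layout
and the expiry timestamp are assumptions made for illustration, not part of
the proposal; account ids and action names are assumed not to contain "|".

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; a real deployment would load it from config.
SERVER_SECRET = b"change-me"


def make_token(account_id, action, expires_at, secret=SERVER_SECRET):
    """Return a token to embed in the emailed link for one action."""
    message = f"{account_id}|{action}|{expires_at}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{account_id}|{action}|{expires_at}|{signature}"


def verify_token(token, now=None, secret=SERVER_SECRET):
    """Return (account_id, action) if the token is valid, otherwise None."""
    now = time.time() if now is None else now
    try:
        account_id, action, expires_at, signature = token.split("|")
        expires = int(expires_at)
    except ValueError:
        return None  # wrong number of fields or non-numeric expiry
    message = f"{account_id}|{action}|{expires_at}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison, done on bytes so odd input cannot raise.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    if now > expires:
        return None
    return account_id, action
```

The server would put the token into the link it emails and call
verify_token when the link is clicked, so no password and no per-link
server state would be needed.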

Each user (email) would have a hosts** limit (probably set in the server
configuration), 2 or 3 by default (enough for an average user, not enough to
easily spoil the data). After some time of inactivity a host/account would be
removed.
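
A rough sketch of the per-account host limit, assuming accounts are kept as
a plain mapping from account id to a set of host ids. DEFAULT_HOST_LIMIT,
HostLimitReached and register_host are hypothetical names; a real server
would keep this in its database.

```python
# Hypothetical default; the text suggests 2 or 3, set in server configuration.
DEFAULT_HOST_LIMIT = 3


class HostLimitReached(Exception):
    """Raised when an account already has the maximum number of hosts."""


def register_host(accounts, account_id, host_id, limit=DEFAULT_HOST_LIMIT):
    """Attach host_id to account_id unless the account is at its limit."""
    hosts = accounts.setdefault(account_id, set())
    if host_id in hosts:
        return  # registering the same host twice is harmless
    if len(hosts) >= limit:
        raise HostLimitReached(f"account {account_id} already has {limit} hosts")
    hosts.add(host_id)
```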

The problem starts if one needs more hosts per account. Right now I have
some ideas (none very good):
- the easiest method to implement is "please email our admin and explain why
you need them", but it's user unfriendly and admin unfriendly.
- give a really big limit on hosts per email: it would be easy to inject
a lot of false data, but it's easier to remove than with the 2006 auth
(identify wrongdoing emails and delete their hosts).
- require users to give some non-free (free as in beer) email to reduce
the possibility of using fake emails, and give a big hosts limit.

I'd try to keep the need to click emailed links to a minimum: registration
and administrative tasks (like removing hosts from an account).



Components:

Client:

Probably in Python, to take advantage of all the work the portage developers
have done and save me work. It'd be a simple run-me-from-the-command-line
(cron) program sending arch, all installed cpv and their USE flags (for sure,
and before the end of summer) and maybe some more if time allows ("A daemon
with 2 working modules is better than a daemon with 10 half finished ones.").
Maybe [if time allows] a GUI wrapper to run it in the tray.
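
A small sketch of what the client's report could look like, assuming a JSON
payload. The function name build_report and the field names "arch",
"packages", "cpv" and "use" are made up for illustration; collecting the
installed packages from portage's database is deliberately left out here.

```python
import json


def build_report(arch, installed):
    """Serialize one host's state.

    installed maps each installed cpv (category/package-version) to the
    USE flags enabled for it.
    """
    packages = [
        {"cpv": cpv, "use": sorted(flags)}
        for cpv, flags in sorted(installed.items())
    ]
    # sort_keys keeps the output stable, which helps when computing deltas.
    return json.dumps({"arch": arch, "packages": packages}, sort_keys=True)
```

A cron job would call build_report once per run and send the result to the
server.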

Server:
Would be split into several independent programs (to save me work). All
except the first one would be written in Python.

User communication:
Thanks for the REST idea. I thought about using HTML/HTTPS, but making it
stateless saves a lot of work. To save me some work I'd start with Apache +
PHP + MySQL, one path per action (register host, register user, send data,
...). It'd put received data in MySQL (not verify whether it's correct, simply
get the data and put it in a table with information about who sent it and
when). It's not a very elegant solution (and may turn out to be slow), so,
if time allows and there is a need, I'll rewrite it in Python.
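
A sketch of the "just store it, don't verify it" step. It uses Python's
sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table
name submissions and its columns are assumptions, not a proposed schema.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open the database and make sure the raw submissions table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        " id INTEGER PRIMARY KEY,"
        " host_id TEXT NOT NULL,"
        " received_at INTEGER NOT NULL,"
        " payload BLOB NOT NULL)"
    )
    return conn


def store_submission(conn, host_id, payload, received_at=None):
    """Record who sent what and when; the payload is stored unverified."""
    received_at = int(time.time()) if received_at is None else received_at
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO submissions (host_id, received_at, payload)"
            " VALUES (?, ?, ?)",
            (host_id, received_at, payload),
        )
```

Keeping this step dumb means the web-facing part stays small, and all
interpretation of the data happens later in the other programs.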

Data gathering:
It'd take the data provided by the user communication module, decompress it,
apply deltas etc. to create all-the-information-available about the current
state of hosts.
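
A sketch of the decompress-and-apply-delta step, assuming deltas arrive as
zlib-compressed JSON with a "removed" list and a "changed" mapping. That
delta format is invented for this example; the proposal does not define one.

```python
import json
import zlib


def apply_delta(state, compressed_delta):
    """Return the host's new state after applying one compressed delta.

    state maps cpv -> list of USE flags. The input state is not modified.
    """
    delta = json.loads(zlib.decompress(compressed_delta))
    new_state = dict(state)
    for cpv in delta.get("removed", []):
        new_state.pop(cpv, None)  # ignore packages we never knew about
    new_state.update(delta.get("changed", {}))
    return new_state
```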

Cleaner:
Run from time to time (by cron, frequency adjusted to needs). Removes hosts
and users that do not send data (to conserve space) etc.
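
A sketch of the cleanup pass, assuming the server tracks a last-seen
timestamp per host. The function name remove_inactive and the
max_idle_seconds parameter are hypothetical.

```python
def remove_inactive(last_seen, now, max_idle_seconds):
    """Delete hosts that have been silent for too long.

    last_seen maps host id -> timestamp of its last submission and is
    modified in place. Returns the sorted list of removed host ids.
    """
    stale = [
        host for host, seen in last_seen.items()
        if now - seen > max_idle_seconds
    ]
    for host in stale:
        del last_seen[host]
    return sorted(stale)
```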

Archiver:
Run from time to time (cron, as needed). Data gathering provides only
information about hosts *right now*. The archiver would generate statistics
(like package popularity (% of hosts that installed it)) and store them to
make historical data available (storing the whole history of all host states
would be extremely excessive).
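
A sketch of the package popularity statistic mentioned above, assuming the
current host states are available as a mapping from host id to installed
package names. package_popularity is a hypothetical name.

```python
from collections import Counter


def package_popularity(hosts):
    """Return package -> percentage of hosts that have it installed.

    hosts maps host id -> iterable of installed package names.
    """
    if not hosts:
        return {}
    counts = Counter()
    for packages in hosts.values():
        counts.update(set(packages))  # count each package once per host
    return {pkg: 100.0 * n / len(hosts) for pkg, n in counts.items()}
```

Only the resulting percentages would need to be stored per run, which is
far smaller than a full snapshot of every host.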

* The one-way part means that there is no easy way to get users' emails even
if someone gets access to all the data stored on the server. The injective
part means that no two emails will generate the same output, so no two users
will get the same account. Hashes won't work because they are not injective
functions, but I'm almost sure someone has already written functions like
that. I don't recall any right now, but I'll have plenty of time to look for
them or, in the worst case, write one myself (easy: create an asymmetric pair
of keys, throw the private one to /dev/null so no one can decrypt it, and
encrypt emails with the public one).
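
For comparison only, here is a sketch of a different technique than the one
described above: a keyed hash (HMAC-SHA256) of the email. It is not
injective in the strict sense, though no SHA-256 collisions are publicly
known. The function name email_to_account_id, the server secret and the
lowercasing of the address are all assumptions.

```python
import hashlib
import hmac


def email_to_account_id(email, secret):
    """Map an email address to a stable account id without storing it."""
    # Assumption: addresses are compared case-insensitively.
    normalized = email.strip().lower()
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the same email always gives the same id, the server can look up an
account from an email without ever keeping the address itself.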

** 1 host == 1 data set (installed packages, arch etc.)



I realized I forgot to say who I am:

I'm Joachim, I live in Poland (UTC+1). I study mathematics (3rd year). I've
used Gentoo since 2005 (as my main desktop OS) or 2004 (first contact). I've
been coding since 2003 (training for/participating in http://www.oi.edu.pl ,
English version available, look in the top right corner) or 2001 (started to
play with Vbasic). Cannot drink black tea. Right now extremely tired after a
sleepless weekend (due to several breakdowns at home).
Good night.