-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Myers wrote:
>
> I designed a system where it took feedback from consenting users, sending the
> file lists back to my server, where I was going to do some data crunching. The
> data from just _my_ system was over 60 MB.

It sounds like you really only need to index each package a few times at
most. Sure, the raw data from each user could be 60 MB, but there are
some ways to reduce that significantly:

1. Don't send in data for anything in the base system install.

2. As you populate your database, publish a list of indexed packages
via a URL. Users would exclude any packages you've already indexed. If
this were a GLEP you could probably put the file in the portage
directory and everybody would get it via rsync.

3. Start by only indexing each package ONCE. Don't worry about every
combo of arches, CFLAGS, USE, etc. That means that most users wouldn't
upload anything at all, and the rest would only send their unique
contributions.
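The three points above boil down to a set difference on the client
side. A rough Python sketch, assuming the installed file lists are
already available as a dict and that the base-system and
already-indexed package names arrive as plain sets (the function name,
package selection, and paths are made up for illustration, not taken
from any existing tool):

```python
def packages_to_upload(installed, base_system, already_indexed):
    """Keep only packages that are neither base system nor already indexed."""
    skip = set(base_system) | set(already_indexed)
    return {pkg: files for pkg, files in installed.items() if pkg not in skip}


# Hypothetical local state: package name -> list of installed files.
installed = {
    "sys-apps/coreutils": ["/bin/ls", "/bin/cp"],
    "app-editors/vim": ["/usr/bin/vim"],
    "media-gfx/gimp": ["/usr/bin/gimp"],
}
base_system = {"sys-apps/coreutils"}    # point 1: never sent
already_indexed = {"app-editors/vim"}   # point 2: taken from the published list

upload = packages_to_upload(installed, base_system, already_indexed)
print(upload)  # {'media-gfx/gimp': ['/usr/bin/gimp']}
```

With a well-populated index the result is empty for most users, which
is the whole point of indexing each package only once.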
25 |
|
26 |
If you get everything working without indexing by USE, you could start |
27 |
adding that capability in. Publish in #2 the list of USE flags indexed |
28 |
for each package, and individuals would only upload packages compiled |
29 |
with something that wasn't on that list. |
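One possible reading of the USE-flag check, sketched in Python: a build
is worth uploading if the package is missing from the published list or
if it was compiled with at least one flag that the list does not yet
mention. The data layout (package name mapped to a set of flags) and
the sample flags are assumptions for illustration only:

```python
def needs_upload(pkg, local_flags, indexed_flags):
    """True if pkg is not indexed yet or was built with an unindexed USE flag."""
    if pkg not in indexed_flags:
        return True
    return bool(set(local_flags) - indexed_flags[pkg])


# Hypothetical published list: package -> USE flags already indexed.
indexed_flags = {
    "app-editors/vim": {"perl", "python"},
    "media-gfx/gimp": {"jpeg", "png"},
}

# Hypothetical local builds: package -> USE flags it was compiled with.
local_builds = {
    "app-editors/vim": {"perl"},          # nothing new
    "media-gfx/gimp": {"jpeg", "tiff"},   # "tiff" is not indexed yet
    "net-misc/curl": {"ssl"},             # package not indexed at all
}

to_upload = sorted(
    pkg for pkg, flags in local_builds.items()
    if needs_upload(pkg, flags, indexed_flags)
)
print(to_upload)  # ['media-gfx/gimp', 'net-misc/curl']
```

Indexing whole flag combinations instead of individual flags would need
a list of flag sets per package, at the cost of many more uploads.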
30 |
|
31 |
Sure, the final database could easily be 100MB or so, but if you just |
32 |
put it on a website you won't be sending the whole thing. Just put it |
33 |
in mysql/postgres and build a php front end (sorry, not a web dev |
34 |
personally, but it isn't that hard to do from the little I've messed |
35 |
with it). |
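To illustrate why the size of the full database matters little once it
sits behind a query interface, here is a small sketch. It uses Python's
built-in sqlite3 module as a stand-in for MySQL/PostgreSQL and PHP, and
it assumes the typical question is "which package installs this file";
the table layout and the sample rows are invented for the example:

```python
import sqlite3

# In-memory database standing in for the server-side index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (package TEXT NOT NULL, path TEXT NOT NULL)")
conn.execute("CREATE INDEX files_path ON files (path)")
conn.executemany(
    "INSERT INTO files (package, path) VALUES (?, ?)",
    [
        ("app-editors/vim", "/usr/bin/vim"),
        ("media-gfx/gimp", "/usr/bin/gimp"),
    ],
)


def owners(conn, path):
    """Return the names of the packages that install the given path."""
    rows = conn.execute(
        "SELECT package FROM files WHERE path = ? ORDER BY package", (path,)
    )
    return [row[0] for row in rows]


print(owners(conn, "/usr/bin/vim"))  # ['app-editors/vim']
```

Each request returns only the matching rows, so a visitor never
downloads more than a few lines of the index.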
36 |
|
37 |
Sorry - I don't intend to make it sound like the whole thing can be done |
38 |
in 5 minutes, and I"m sure you've already poured hours into your effort. |
39 |
However, I don't see any theoretical issues with it as long as the |
40 |
design is right. The important thing is that users are only uploading |
41 |
diffs against your master repository - and not doing a complete dump of |
42 |
their entire system. Otherwise you will get buried in data! |
43 |
|
44 |
I must admit that it is easy to just talk about ideas like this - I |
45 |
really do want to commend you on the work you've undoubtedly already |
46 |
accomplished! OSS projects require lots of hard work by many volunteers |
47 |
and it is all too easy for people like me to just sit back and nitpick |
48 |
what could be done better... |
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDXlvkg2bN8aFizRkRArU+AKCnEBdpoO2Acnwh3+FFR8CYj5CLtACcCboB
2QIb31yXVdW0EQST8PEUPeY=
=VF5P
-----END PGP SIGNATURE-----