Gentoo Archives: gentoo-amd64

From: Richard Freeman <rich@××××××××××××××.net>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] dig package
Date: Tue, 25 Oct 2005 16:34:31
Message-Id: 435E5BE4.9040801@thefreemanclan.net
In Reply to: Re: [gentoo-amd64] dig package by John Myers
1 -----BEGIN PGP SIGNED MESSAGE-----
2 Hash: SHA1
3
4 John Myers wrote:
5 >
6 > I designed a system where it took feedback from consenting users, sending the
7 > file lists back to my server, where I was going to do some data crunching. The
8 > data from just _my_ system was over 60 MB.
9
10 It sounds like you really only need to index each package a few times at
11 most. Sure, the raw data from each user could be 60MB, but there are
12 some ways to reduce that significantly:
13
14 1. Don't send in data for anything in the base system install.
15
16 2. As you populate your database, publish a list of indexed packages
17 via a URL. Users would exclude any packages you've already indexed. If
18 this were a GLEP you could probably put the file in the portage
19 directory and everybody would get it via rsync.
20
21 3. Start by only indexing each package ONCE. Don't worry about every
22 combo of arches, CFLAGS, USE, etc. That means that most users wouldn't
23 upload anything at all, and the rest would only send their unique
24 contributions.
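
To make points 1-3 concrete, here is a rough Python sketch of the client-side filter. Everything in it is an assumption for illustration: the function name, the dict-of-file-lists layout, and the sample package names are hypothetical and are not taken from John's existing system.

```python
def packages_to_upload(installed, base_system, already_indexed):
    """Return only the file lists the server does not have yet.

    installed:       dict mapping package name -> list of installed file paths
    base_system:     set of package names in the base install (point 1)
    already_indexed: set of package names from the published list (point 2)

    All three inputs are hypothetical structures chosen for this sketch.
    """
    skip = set(base_system) | set(already_indexed)
    # Point 3: a package is indexed once, so anything in `skip` is dropped
    # regardless of arch, CFLAGS or USE.
    return {pkg: files for pkg, files in installed.items() if pkg not in skip}


if __name__ == "__main__":
    installed = {
        "sys-libs/glibc": ["/lib/libc.so.6"],
        "net-dns/bind-tools": ["/usr/bin/dig", "/usr/bin/host"],
        "app-editors/vim": ["/usr/bin/vim"],
    }
    print(packages_to_upload(installed, {"sys-libs/glibc"}, {"app-editors/vim"}))
```

With a filter like this, a user whose packages are all in the published list ends up with an empty dict and uploads nothing.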
25
26 If you get everything working without indexing by USE, you could then
27 add that capability. Extend the list published in #2 with the USE flags
28 indexed for each package, and individuals would only upload packages
29 compiled with something that wasn't on that list.
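
A possible shape for the USE-aware check, again only as a sketch. It assumes the published list maps each package to the set of individual USE flags already indexed, which is just one reading of the idea; the names below are hypothetical.

```python
def should_upload(package, local_use, indexed_use):
    """Decide whether a locally built package adds anything new.

    package:     package name, e.g. "app-editors/vim"
    local_use:   iterable of USE flags the package was compiled with
    indexed_use: dict mapping package name -> set of USE flags already
                 indexed (hypothetical format of the published list)
    """
    if package not in indexed_use:
        # Never indexed at all, so any build is a new contribution.
        return True
    # Upload only if at least one local flag is missing from the list.
    return bool(set(local_use) - indexed_use[package])
```

Indexing whole flag combinations instead of single flags would be more precise, but the published list would grow much faster.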
30
31 Sure, the final database could easily be 100MB or so, but if you just
32 put it on a website you won't be sending the whole thing to anyone. Just
33 put it in MySQL/PostgreSQL and build a PHP front end (sorry, I'm not a web
34 dev personally, but it isn't that hard to do from the little I've messed
35 with it).
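
To show why the database size matters little once it sits behind a query interface, here is a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL/PostgreSQL, and the table layout and function names are assumptions made for this example.

```python
import sqlite3


def build_index(conn, file_lists):
    """Store (path, package) pairs; file_lists maps package -> list of paths."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT NOT NULL, package TEXT NOT NULL, "
        "PRIMARY KEY (path, package))"
    )
    rows = [(path, pkg) for pkg, paths in file_lists.items() for path in paths]
    # INSERT OR IGNORE keeps repeated uploads of the same package harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO files (path, package) VALUES (?, ?)", rows
    )
    conn.commit()


def packages_owning(conn, path):
    """Return the names of packages that install the given path."""
    cur = conn.execute(
        "SELECT package FROM files WHERE path = ? ORDER BY package", (path,)
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, {"net-dns/bind-tools": ["/usr/bin/dig", "/usr/bin/host"]})
    print(packages_owning(conn, "/usr/bin/dig"))
```

A lookup only ever returns the matching rows, so the front end sends a few package names per request no matter how large the table gets.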
36
37 Sorry - I don't intend to make it sound like the whole thing can be done
38 in 5 minutes, and I'm sure you've already poured hours into your effort.
39 However, I don't see any theoretical issues with it as long as the
40 design is right. The important thing is that users are only uploading
41 diffs against your master repository - and not doing a complete dump of
42 their entire system. Otherwise you will get buried in data!
43
44 I must admit that it is easy to just talk about ideas like this - I
45 really do want to commend you on the work you've undoubtedly already
46 accomplished! OSS projects require lots of hard work by many volunteers
47 and it is all too easy for people like me to just sit back and nitpick
48 what could be done better...
49 -----BEGIN PGP SIGNATURE-----
50 Version: GnuPG v1.4.1 (GNU/Linux)
51 Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
52
53 iD8DBQFDXlvkg2bN8aFizRkRArU+AKCnEBdpoO2Acnwh3+FFR8CYj5CLtACcCboB
54 2QIb31yXVdW0EQST8PEUPeY=
55 =VF5P
56 -----END PGP SIGNATURE-----

Attachments

File name MIME type
smime.p7s application/x-pkcs7-signature

Replies

Subject Author
Re: [gentoo-amd64] dig package John Myers <electronerd@××××××××××.com>