Gentoo Archives: gentoo-dev

From: Ed Grimm <paranoid@××××××××××××××××××××××.org>
To:
Cc: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] A few modest suggestions regarding tree size
Date: Fri, 15 Oct 2004 00:42:19
Message-Id: Pine.LNX.4.58.0410141939150.21079@ybec.rq.iarg
In Reply to: Re: [gentoo-dev] A few modest suggestions regarding tree size by Luke-Jr
1 On Thu, 14 Oct 2004, Luke-Jr wrote:
2 > On Thursday 14 October 2004 4:35 pm, Roman Gaufman wrote:
3 >> On Thu, 14 Oct 2004 16:30:29 +0000, Luke-Jr <luke-jr@×××××××.org> wrote:
4 >>> On Thursday 14 October 2004 2:49 pm, Ciaran McCreesh wrote:
5 >>>> On Thu, 14 Oct 2004 07:43:11 -0700 Mark Dierolf <mark@×××.com> wrote:
6 >>>>| I've been watching this discussion as far as tree size, and i'm
7 >>>>| suprised nobody has brought the idea of on-demand downloading yet.
8
9 You should've watched more closely. I did not mention true on-demand
10 downloading because I had seen in the archive the last time it was
11 discussed and dismissed. I think that what I proposed would probably be
12 quicker than downloading the metadata; it doesn't deviate so widely from
13 the concepts that have already made it into code, and it hasn't
14 been rejected four times yet.
15
16 To reiterate, my idea was that you're probably most interested in the
17 packages you've already installed; so have an option to just sync
18 particular files, to complement the option of not syncing particular
19 files.
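A rough sketch of what "sync only particular files" could look like, assuming the installed packages are known as "category/name" strings and that the sync is driven by rsync filter rules. The function name is hypothetical and is not part of Portage; a real tree would also need shared directories such as eclasses and profiles, which this ignores.

```python
def sync_include_rules(installed):
    """Build rsync filter rules that sync only the listed packages.

    `installed` is an iterable of "category/name" strings (an assumed
    input format). Everything not explicitly included is excluded.
    """
    rules = []
    seen_categories = set()
    for atom in sorted(set(installed)):
        category, name = atom.split("/", 1)
        # rsync needs the parent directory included before its children.
        if category not in seen_categories:
            seen_categories.add(category)
            rules.append(f"+ /{category}/")
        rules.append(f"+ /{category}/{name}/")
        rules.append(f"+ /{category}/{name}/**")
    # Final catch-all: skip everything that was not included above.
    rules.append("- *")
    return rules
```

The resulting lines could be written to a file and handed to rsync as a filter list; whether that is actually cheaper for the mirrors than a full sync is exactly the open question in this thread.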
20
21 >> Huh? -- name 1 binary distribution that does that? -- all of the ones
22 >> I tried fetch a list of available packages -- which is exactly what
23 >> the portage tree provides.
24 >
25 > Why would they need a list of available packages? Such a list is
26 > useful *only* to the user. apt-get, ipkg, and urpmi are going to know
27 > the package name beforehand.
28
29 How do these programs accomplish that? They request a list of available
30 packages.
31
32 >>> On Thursday 14 October 2004 3:14 pm, Patrick Lauer wrote:
33 >>>> So you only have to rsync the dependency info. You save maybe 50%
34 >>>> traffic, but need some ebuild servers that will be hit by millions
35 >>>> of small requests for single ebuilds. No thanks.
36 >>>
37 >>> Actually, you don't even need to sync that. Simply download the
38 >>> primary ebuild, read the dep info, download the next one, etc. Most
39 >>> modern versions of file transfer protocols (HTTP and FTP, at least;
40 >>> don't know about rsync) support multiple transfers in a single
41 >>> connection.
42 >>
43 >> How would it know what ebuild to fetch exactly? --- just think about
44 >> that for a second.
45
46 The metadata files list dependencies, keywords, and a description. It
47 would be technically feasible to do the dependency evaluation and ebuild
48 selection for the entire ebuild session just using metadata, and have a
49 single medium rsync connection per emerge run. However, I couldn't code
50 it in Python, and I can't really explain it in English.
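A minimal sketch of the metadata-only evaluation described above, assuming the cache has already been parsed into a dictionary that maps each package name to a record with a "depend" list of plain package names. That structure and the function name are illustrative assumptions; real dependency strings carry version constraints and USE conditionals that this ignores.

```python
from collections import deque


def ebuilds_to_fetch(targets, metadata):
    """Return every package reachable from `targets` through dependencies.

    `metadata` is an assumed, simplified stand-in for the cache files:
    {"pkg": {"depend": ["other-pkg", ...]}, ...}
    """
    needed = []
    seen = set()
    queue = deque(targets)
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # already scheduled; also guards against cycles
        seen.add(pkg)
        needed.append(pkg)
        # Packages missing from the metadata are treated as leaves.
        queue.extend(metadata.get(pkg, {}).get("depend", []))
    return needed
```

With the full list known up front, all of the ebuilds could in principle be requested over one connection instead of one request per package.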
51
52 > ebuild doesn't deal with dependencys anyway, AFAIK. emerge would need
53 > the fetching functionality and could figure out the name based on
54 > (originally) the user's specification and (for deps) the DEPEND
55 > contents themselves. Portage *already* needs to know what the name of
56 > the package is anyway.
57
58 ebuild files are the ultimate source of the dependency information. The
59 point in your favor is that they're not the sole repository of it;
60 someone saw fit to export that data into cache files, so one could use
61 those cache files for your goal.
62
63 > On Thursday 14 October 2004 4:41 pm, Georgi Georgiev wrote:
64 >> the part where the http and ftp internals get handled by portage
65 >> internally, instead of handling them to an external program like
66 >> wget, are the reason why the idea was dismissed as unworkable several
67 >> times before.
68 >
69 > Not really a good excuse. HTTP isn't an overly complicated protocol.
70 > Including the fetching functionality also has other advantages, such
71 > as one less program to depend on (and thus one fewer that can be
72 > broken and screw up Portage).
73
74 I think the part that probably intimidates them is where we're
75 processing a particular list of stuff, and then we decide we want to get
76 more stuff. This basically requires explicit threading to pull it off
77 properly; it also requires a mindset that can deal with threading. As
78 someone with such a mindset, I can confidently say that no one writes that
79 kind of code without good cause. As an example, email servers could
80 definitely use this type of code, but most of them, including sendmail,
81 do not use it.
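To make the "process a list, then decide we want more" pattern concrete, here is a small threaded sketch. The `fetch` callable is a hypothetical stand-in that retrieves one item and returns the names of any further items it discovered; nothing here is Portage code, and error handling is left out (a `fetch` that raises would need to be caught in real use).

```python
import queue
import threading


def fetch_all(targets, fetch, workers=4):
    """Fetch `targets` plus everything discovered along the way.

    `fetch(name)` is an assumed callable returning an iterable of
    newly discovered names. Returns the set of all names fetched.
    """
    pending = queue.Queue()
    seen = set(targets)
    lock = threading.Lock()
    for name in seen:
        pending.put(name)

    def worker():
        while True:
            name = pending.get()
            if name is None:  # sentinel: no more work
                return
            try:
                for dep in fetch(name):
                    with lock:
                        if dep in seen:
                            continue
                        seen.add(dep)
                    # Enqueue before task_done() so join() cannot
                    # return while new work is still being added.
                    pending.put(dep)
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    pending.join()  # wait until every queued item has been processed
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()
    return seen
```

Even in this toy form, the shared set needs a lock and shutdown needs a sentinel, which gives some sense of why this style of code is avoided without good cause.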
82
83
84 Luke, do you have the coding ability to write the changes that would be
85 required to get something like this to work? I ask because I think
86 what you would need to convince anyone is a proof of concept
87 that makes at most one connection to a mirror. Until you have
88 such a thing, the prior ideas that have been discussed (which I did
89 not find; the previous discussion I did find was just another
90 "this has been discussed before" dismissal) are much firmer in
91 their minds than anything you are presenting, and I don't think you're
92 going to overcome that.
93
94 In any event, I think that you and I, and anyone else interested in
95 having this happen should get together off the list, outside of gentoo
96 discussion space. The idea is only partially formed, and none of the
97 devs are going to be convinced by anything less than a full plan that
98 addresses all of their concerns, although I think a working prototype
99 would be better. (You may think your idea is complete, but it could not
100 be coded simply on the ideas that have been discussed on this list over
101 the past couple of weeks. What we need is something thorough enough to
102 both build the code and demonstrate to all that it won't make the
103 infrastructure hurt. By the way, the only way to do that is to prove
104 that it will actually reduce infrastructure load.)
105
106 Ed
107
108 --
109 gentoo-dev@g.o mailing list