Gentoo Archives: gentoo-portage-dev

From: Emma Strubell <emma.strubell@×××××.com>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Google SoC and "cache sync"
Date: Thu, 02 Apr 2009 17:14:25
Message-Id: 5a8c638a0904021014y2c9a21bdl5297a2c94b6dcf41@mail.gmail.com
In Reply to: Re: [gentoo-portage-dev] Google SoC and "cache sync" by Zac Medico
1 Zac Medico wrote:
2 >
3 > Right. If you wanted to submit a competing "multiple repository
4 > support" soc proposal yourself then you might list the "unavailable
5 > repository" thing as one of your goals. However, that might be a
6 > little over-ambitious since "multiple repository support" alone
7 > would provide enough work for a soc project.
8
9 ok, I will probably submit a "multiple repository support" proposal,
10 although if that other guy ends up submitting one I'm sure he'll be
11 accepted over me, it looks like he's way more qualified. That's okay
12 though, there's always next year, and just because I'm not doing
13 summer of code doesn't mean I can't look at code over the summer :] I
14 actually might just pass on this year's summer of code, and work some
15 more on that search thing this summer, honing my real-world
16 programming/python skills so that I'll be ready to kick some ass next
17 summer.
18
19 Actually, do you think more work on faster search would be an adequate
20 project for soc? In order for it to actually be a plausible
21 modification to portage I still need to implement regex search and
22 support for overlays. As I realized only shortly before I had to submit
23 the project to my prof, integrating regex search into the
24 suffix-tree-like index that I created would require a pretty
25 substantial overhaul of my implementation, if not a separate
26 implementation to deal with regex queries. I'm sure it would be
27 possible though. The biggest problem remaining with my implementation,
28 as I believe I said before, is that cPickle unpickles (and pickles?)
29 the index way too slowly. Without a better serialization module my
30 implementation is pretty much useless, but ignoring the time it takes
31 to unpickle the index, my implementation is something like an order of
32 magnitude faster than the current search implementation. That's
33 promising, right? I haven't tried out any other picklers yet so I'm
34 not sure how much of an improvement it might be possible to get. I'm
35 sure, however, that writing my own superior pickler would be beyond my
36 abilities. Unless serialization is simpler than I think it is, and I
37 could maybe throw together something that is optimized for this
38 specific data structure.
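
To make the "optimized for this specific data structure" idea concrete, here is a rough sketch of what a hand-rolled serializer for a suffix-tree-like index might look like. Everything in it is an assumption for illustration: the node layout (a dict with "children" and "ids"), the function names, and the byte format are made up and are not taken from the actual search implementation. It walks the trie in preorder and writes fixed-width integers with the standard struct module. Whether this is actually faster than cPickle would have to be measured.

```python
import struct


def new_node():
    # Hypothetical node layout: child nodes keyed by character, plus the
    # indexes of every package name that contains the path to this node.
    return {"children": {}, "ids": set()}


def build_suffix_trie(names):
    """Insert every suffix of every name, so any substring can be looked up."""
    root = new_node()
    for idx, name in enumerate(names):
        for start in range(len(name)):
            node = root
            for ch in name[start:]:
                node = node["children"].setdefault(ch, new_node())
                node["ids"].add(idx)
    return root


def search(root, query):
    """Return the indexes of all names that contain `query` as a substring."""
    node = root
    for ch in query:
        node = node["children"].get(ch)
        if node is None:
            return set()
    return set(node["ids"])


def _dump_node(node, out):
    # Preorder layout: id count, ids, child count, then (char code, child)...
    ids = sorted(node["ids"])
    out.append(struct.pack("<I", len(ids)))
    out.append(struct.pack("<%dI" % len(ids), *ids))
    children = node["children"]
    out.append(struct.pack("<I", len(children)))
    for ch in sorted(children):
        out.append(struct.pack("<I", ord(ch)))
        _dump_node(children[ch], out)


def dumps(root):
    """Serialize the whole trie into one bytes object."""
    out = []
    _dump_node(root, out)
    return b"".join(out)


def _load_node(data, pos):
    (n_ids,) = struct.unpack_from("<I", data, pos)
    pos += 4
    ids = set(struct.unpack_from("<%dI" % n_ids, data, pos))
    pos += 4 * n_ids
    (n_children,) = struct.unpack_from("<I", data, pos)
    pos += 4
    children = {}
    for _ in range(n_children):
        (code,) = struct.unpack_from("<I", data, pos)
        pos += 4
        children[chr(code)], pos = _load_node(data, pos)
    return {"children": children, "ids": ids}, pos


def loads(data):
    """Rebuild the trie from bytes produced by dumps()."""
    node, _ = _load_node(data, 0)
    return node
```

A round trip would be `loads(dumps(build_suffix_trie(names)))`, after which `search()` works on the restored index exactly as on the original. The format knows nothing about general Python objects, which is the whole point: it only has to handle integers and single characters.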
39
40 > Expanding on this a bit... if you were going to pack an ebuild into
41 > a single file, you would need to include the eclasses which it
42 > inherits and also any patches that are included with it in cvs. If
43 > the eclasses are included in this way, each source package will
44 > contain a redundant copy of the inherited eclasses. Despite this
45 > redundancy, you might still have a net decrease in bandwidth usage
46 > since you'd only have to download the source packages that you
47 > actually want to build.
48
49 This is an interesting idea... but I think I agree with you on keeping
50 the ebuild format the way it is. Not only because changing to an
51 rpm-like system would be an almost fundamental change in the way
52 portage works (ebuild-wise, anyway), but because this would add a
53 whole new level of complication to ebuilds, and I think one of their
54 strengths is their simplicity. Of course ideally one could write a
55 tool that would simply package ebuilds into these "rpms"... but I
56 suspect that would end up being more complicated than it seems. Plus,
57 I don't know about you but back when I used rpm-based distributions, I
58 found that more often than not, rather than simplifying the install
59 process, rpms were just a pain in the ass and didn't work half the
60 time. This was a few years ago and I'm assuming things have changed...
61 but I still think ebuilds are pretty cool and I don't really want to
62 mess with that.
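
For what it's worth, the packaging tool discussed above (one file holding an ebuild plus its inherited eclasses and patches) could be sketched roughly like this. It is only an illustration under assumptions: the function names, the archive layout ("eclass/" and "files/" subdirectories inside a gzipped tarball), and the returned metadata dict are all hypothetical, not an existing portage format. Signing is left out entirely.

```python
import hashlib
import os
import tarfile


def pack_source_package(ebuild, eclasses, patches, out_path):
    """Bundle one ebuild with its eclasses and patches into a single tarball.

    Returns a small metadata record (hypothetical format) that a server
    could publish so clients can verify the download.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(ebuild, arcname=os.path.basename(ebuild))
        # Each package carries its own copy of the eclasses it inherits,
        # which is the redundancy mentioned in the quoted message.
        for path in eclasses:
            tar.add(path, arcname="eclass/" + os.path.basename(path))
        for path in patches:
            tar.add(path, arcname="files/" + os.path.basename(path))
    with open(out_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"file": os.path.basename(out_path), "sha256": digest}


def list_members(package_path):
    """Return the sorted member names of a packed source package."""
    with tarfile.open(package_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Working out which eclasses an ebuild inherits is the part this sketch skips (the caller has to pass them in), and I suspect that is where most of the real complication would be.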
63
64 On Thu, Apr 2, 2009 at 1:38 AM, Zac Medico <zmedico@g.o> wrote:
65 > -----BEGIN PGP SIGNED MESSAGE-----
66 > Hash: SHA1
67 >
68 > Zac Medico wrote:
69 >> Emma Strubell wrote:
70 >>> And to clarify: the goal of the project is to modify portage so that
71 >>> instead of fetching all of the ebuilds in the portage tree (or in an
72 >>> overlay) upon a sync, portage only fetches the metadata and cache info
73 >>> (via the metadata/cache/ directory) of the tree, and the ebuilds of
74 >>> packages that are already installed (packages found in the world
75 >>> file?) And then, additional ebuilds would be fetched only when they
76 >>> are needed?
77 >>
78 >> The problem with fetching the ebuilds separately is that the remote
79 >> repository might have changed. So, it's not a very reliable approach
80 >> unless there is some kind of guarantee that the remote repository
81 >> will provide a window of time during which older ebuilds that have
82 >> already been removed from the main tree can still be downloaded. In
83 >> order to accomplish this, you'd essentially have to devise a new
84 >> source package format which can be downloaded as a single file
85 >> (something like a source rpm file that an rpm based distro would
86 >> provide).
87 >
88 > Expanding on this a bit... if you were going to pack an ebuild into
89 > a single file, you would need to include the eclasses which it
90 > inherits and also any patches that are included with it in cvs. If
91 > the eclasses are included in this way, each source package will
92 > contain a redundant copy of the inherited eclasses. Despite this
93 > redundancy, you might still have a net decrease in bandwidth usage
94 > since you'd only have to download the source packages that you
95 > actually want to build.
96 >
97 > If you are going to implement something like this, I imagine that
98 > you'd create a tool which would pack an ebuild into a source package
99 > and optionally sign it with a digital signature. Source packages
100 > would be uploaded to a server which would serve them along with a
101 > metadata cache file that clients would download for use in
102 > dependency calculations (similar to how $PKGDIR/Packages is
103 > currently used for binary packages).
104 > - --
105 > Thanks,
106 > Zac
107 > -----BEGIN PGP SIGNATURE-----
108 > Version: GnuPG v2.0.11 (GNU/Linux)
109 >
110 > iEYEARECAAYFAknUT0oACgkQ/ejvha5XGaPmugCfVs0I4a15trwTgLnPwBac2xOj
111 > wI0AoInp1Jf6yaYV5rNvU2EXHbZ30AkS
112 > =tNrz
113 > -----END PGP SIGNATURE-----
114 >
115 >
