Gentoo Archives: gentoo-portage-dev

From: Zac Medico <zmedico@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Google SoC and "cache sync"
Date: Thu, 02 Apr 2009 04:06:22
Message-Id: 49D43992.4050200@gentoo.org
In Reply to: Re: [gentoo-portage-dev] Google SoC and "cache sync" by Emma Strubell
1 -----BEGIN PGP SIGNED MESSAGE-----
2 Hash: SHA1
3
4 Emma Strubell wrote:
5 > Zac Medico wrote:
6 >> The way that I imagine the "cache sync" idea should be implemented
7 >> is like paludis's "unavailable repository" which uses a tarball to
8 >> distribute package metadata[1]. The tarball approach that they use
9 >> seems pretty reasonable. However, it would probably also be nice to
10 >> be able to use a protocol such as rsync to download the
11 >> metadata/cache/ directory from the same URI which is used to fetch
12 >> the ebuilds themselves (maybe paludis supports this already, I don't
13 >> know).
14 >
15 > You're offering two different ideas here, right? The "unavailable
16 > repository" method, and the method using the metadata/cache/
17 > directory?
18
19 My idea was that both methods could be used interchangeably. This
20 would allow you to configure an "unavailable repository" using any
21 repository which provides a metadata/cache/ directory. Such
22 repositories aren't common now, but they should become more common
23 in the future, after people start using the egencache program that
24 I'm working on.
25
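As a rough illustration of what "any repository which provides a metadata/cache/ directory" could mean in practice, here is a minimal Python sketch. It assumes the cache is laid out as metadata/cache/<category>/<package-version>, one file per package; the function name list_cached_packages is hypothetical and not part of portage.

```python
import os


def list_cached_packages(repo_root):
    """Return sorted 'category/package-version' names found under
    repo_root/metadata/cache/, or an empty list if that directory
    does not exist (hypothetical helper, assumed layout)."""
    cache_dir = os.path.join(repo_root, "metadata", "cache")
    if not os.path.isdir(cache_dir):
        # The repository does not ship a cache, so there is nothing to preview.
        return []
    found = []
    for category in os.listdir(cache_dir):
        cat_dir = os.path.join(cache_dir, category)
        if not os.path.isdir(cat_dir):
            continue
        for entry in os.listdir(cat_dir):
            # Each regular file is assumed to be one package's cache entry.
            if os.path.isfile(os.path.join(cat_dir, entry)):
                found.append(category + "/" + entry)
    return sorted(found)
```

A repository for which this returns an empty list could not be previewed this way and would need a full sync first.
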
26 > If so, it makes sense to me to take the metadata/cache/ directory
27 > route, since, as you said, multiple repositories aren't yet supported
28 > in portage. At first I was thinking I could contact the guy who might
29 > be working on multiple repository support this summer and work with
30 > him to some extent... but the "unavailable repository" solution would
31 > basically be dependent on/building off of multiple repository support,
32 > and it seems like building off of something that isn't fully built
33 > would be a bad idea.
34
35 Right. If you wanted to submit a competing "multiple repository
36 support" SoC proposal yourself, then you might list the "unavailable
37 repository" thing as one of your goals. However, that might be a
38 little over-ambitious, since "multiple repository support" alone
39 would provide enough work for an SoC project.
40
41 > And to clarify: the goal of the project is to modify portage so that
42 > instead of fetching all of the ebuilds in the portage tree (or in an
43 > overlay) upon a sync, portage only fetches the metadata and cache info
44 > (via the metadata/cache/ directory) of the tree, and the ebuilds of
45 > packages that are already installed (packages found in the world
46 > file?) And then, additional ebuilds would be fetched only when they
47 > are needed?
48
49 The problem with fetching the ebuilds separately is that the remote
50 repository might have changed. So, it's not a very reliable approach
51 unless there is some kind of guarantee that the remote repository
52 will provide a window of time during which older ebuilds that have
53 already been removed from the main tree can still be downloaded. In
54 order to accomplish this, you'd essentially have to devise a new
55 source package format which can be downloaded as a single file
56 (something like a source rpm file that an rpm-based distro would
57 provide). It would be a major change in the way that our source
58 packages are distributed, and I'm not sure that it's really worth the
59 trouble (I touched on this in a reply on the SoC list [1]). I tend
60 to think that it's better to simply keep the ebuild format as it is,
61 and sync the ebuilds at the same time as the cache. Note that our
62 binary package format is already suitable for the "download cache
63 first and package later" approach. A metadata cache is automatically
64 generated and saved as $PKGDIR/Packages. If you share $PKGDIR via
65 http or ftp, then clients can configure PORTAGE_BINHOST in
66 make.conf to download packages when emerge's --getbinpkg option is
67 specified.
68
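To make the "download cache first and package later" idea more concrete, the following Python sketch reads an index in the style of $PKGDIR/Packages. It assumes the index is a series of "Key: value" stanzas separated by blank lines, with each package stanza carrying a CPV key; the function name parse_index and the sample values are made up for illustration and this is not portage's own parser.

```python
def parse_index(text):
    """Return {cpv: {key: value}} for every stanza that has a CPV key.

    Assumed format: 'Key: value' lines grouped into stanzas that are
    separated by blank lines. Stanzas without CPV (such as a header)
    are skipped.
    """
    packages = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "CPV" in fields:
            packages[fields["CPV"]] = fields
    return packages


# Illustrative sample only; real index files carry more keys.
SAMPLE = """PACKAGES: 2

CPV: app-misc/foo-1.0
SLOT: 0

CPV: sys-apps/bar-2.1
SLOT: 0
"""
```

A client that has only this index can already tell which packages exist and pick the ones it wants, and then download just those package files afterwards.
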
69 > Or will only metadata/cache/ be fetched upon sync, and
70 > then all ebuilds will be fetched only when they are needed? Am I
71 > completely oversimplifying the project?
72
73 As mentioned above, the current ebuild format and the way that
74 ebuilds are distributed is not suited to the "download cache first
75 and package later" approach since the ebuild might have been removed
76 before it could be downloaded. However, the current ebuild format is
77 suitable for the "unavailable repository" approach since that
78 approach only gives you a preview of the available packages and does
79 not allow you to create a real build/install plan for them until
80 you've done a full sync (full in the sense that the ebuilds
81 themselves are downloaded, not just the cache).
82
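The reliability problem described above can be shown with a toy model in Python. Everything here is hypothetical (the RemoteRepo class, fetch_later, and the package names); it only simulates a remote tree that changes between the cache download and the later ebuild fetch.

```python
class RemoteRepo:
    """Toy stand-in for a remote tree that can change at any time."""

    def __init__(self, ebuilds):
        self._ebuilds = dict(ebuilds)

    def snapshot_cache(self):
        # What a client would learn from downloading only the cache.
        return sorted(self._ebuilds)

    def remove(self, cpv):
        # Upstream drops an old ebuild from the tree.
        del self._ebuilds[cpv]

    def fetch_ebuild(self, cpv):
        if cpv not in self._ebuilds:
            raise LookupError(cpv + " is no longer in the remote tree")
        return self._ebuilds[cpv]


def fetch_later(remote, cached_names):
    """Try to fetch every ebuild named in an older cache snapshot.

    Returns (fetched, missing): the ebuilds that could still be
    downloaded, and the names that have vanished in the meantime.
    """
    fetched, missing = {}, []
    for cpv in cached_names:
        try:
            fetched[cpv] = remote.fetch_ebuild(cpv)
        except LookupError:
            missing.append(cpv)
    return fetched, missing
```

If the remote drops app-misc/foo-1.0 after the client took its snapshot, fetch_later reports that name as missing, so any build plan made from the stale cache can no longer be carried out. Syncing the ebuilds together with the cache avoids this window.
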
83 [1]
84 http://archives.gentoo.org/gentoo-soc/msg_8bb95057c99c45d8cceafcf4876f0976.xml
85 - --
86 Thanks,
87 Zac
88 -----BEGIN PGP SIGNATURE-----
89 Version: GnuPG v2.0.11 (GNU/Linux)
90
91 iEYEARECAAYFAknUOZEACgkQ/ejvha5XGaMnqgCg2bxQHlkpyJvqWMfPwSMuD/Y0
92 gSoAoL5iYs1jUN5GZHs+0RXzt4zBc7OP
93 =sWMP
94 -----END PGP SIGNATURE-----
