On Tue, Mar 09, 2010 at 10:42:38AM -0800, Zac Medico wrote:
> On 03/08/2010 09:21 AM, Robert R. Russell wrote:
> > The cache sync project[1] wants a way to generate portage's cache
> > on the portage tree and/or any chosen overlay and then distribute
> > that cache by some method. Correct?
>
> Well, you can already do that with the egencache program that's
> included with portage. I think the gist of the "cache sync" idea is
> that you should be able to download the cache for dependency
> calculations, and defer the download of the source package until
> after the dependency calculation. For doing something like that, a
> portage tree is probably not very suitable since the tree can change
> rapidly and the cache may invalidate quickly. If the cache and the
> source package will be distributed separately, it might be more
> practical to make something like a source RPM that contains an
> ebuild and eclasses. Many of these source packages could be
> distributed in a repository that is independent of the portage tree,
> and its cache may be valid for a longer period of time.
>
|
I did not know about the egencache program, so I got the wrong initial
impression of the project's goals. No problem: the goal is much
simpler than I first thought.
|
The worst-case change I could see with only a partial copy of the
portage tree available locally would be the complete removal of an
ebuild between the last sync time and the attempt to build that
ebuild. By complete removal, I mean the deletion of the ebuild from
the tree and removal of the tar-ball from the Gentoo mirror
infrastructure. The other common problem would be an incomplete or
inaccurate manifest of the ebuild, source tar-balls, and in-tree
patches. This problem is usually eliminated by re-syncing the tree.
So the two most likely sources of problems are already seen in the
wild with the full portage tree and have known workarounds.
|
Talking about source RPMs, do you mean something like a tar-ball of
the ebuild with its associated patches, eclasses, and other directly
dependent data, but no source code? This ebuild tar-ball is then
fetched after the dependency calculation is done, and it provides the
instructions for downloading, building, and installing the package
from the source tar-ball. That sounds like a Gentoo-style replacement
for source RPMs.
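
To make sure we mean the same thing, here is a rough Python sketch of
how such an ebuild tar-ball could be assembled. The function name, the
archive layout, and the file names in the usage note are my own
assumptions for illustration; nothing like this exists in portage as
far as I know.

```python
import tarfile


def build_ebuild_tarball(out_path, members):
    """Pack an ebuild and its directly dependent files into one archive.

    `members` is a hypothetical mapping of name-inside-archive -> local
    file path, e.g. the ebuild itself, its patches, and the eclasses it
    inherits. The upstream source code is deliberately left out.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        # Sort so the archive member order does not depend on dict order.
        for arcname, local_path in sorted(members.items()):
            tar.add(local_path, arcname=arcname)
    return out_path
```

A call might look like build_ebuild_tarball("foo-1.0.tar.gz",
{"foo-1.0.ebuild": "/path/to/foo-1.0.ebuild", "eclass/bar.eclass":
"/path/to/bar.eclass"}), where every path is a placeholder.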
|
>
> > This project sounds very similar to an idea I have been toying
> > around with for a bit, but I have some questions before I apply
> > for this project.
> >
> > How well documented is the current cache format portage uses?
>
> It's not very well documented. You might try experimenting with the
> egencache program to get a feel for how it works. Cache is generated
> by sourcing ebuilds, and it's stored in /var/cache/edb/dep. It's
> validated by comparing ebuild and eclass timestamps to those that
> are saved in the cache entry. After a complete cache entry is
> generated for /var/cache/edb/dep, an incomplete cache entry (lacking
> eclass timestamps, since the format hasn't been extended to support
> them yet) is written into $PORTDIR/metadata/cache. There is a
> discussion about extending the format to include eclass digests here:
>
> http://archives.gentoo.org/gentoo-dev/msg_cfa80e33ee5fa6f854120ddfb9b468b3.xml
>
> > What restrictions if any would be placed on extending the current cache format?
>
> It has to be backward compatible. If we want to change the format in
> a backward incompatible way, for example by combining the whole
> cache into a single text file, we'll have to distribute both formats
> until users have had time to migrate to a package manager that
> supports the new format.
>
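
To check my understanding of the timestamp validation you describe
above, here is a minimal Python sketch. The entry layout (a dict with
"ebuild_mtime" and "eclass_mtimes" keys) and the function name are my
own assumptions for illustration, not portage's actual on-disk format
or API.

```python
import os


def cache_entry_is_valid(entry, ebuild_path, eclass_paths):
    """Return True if the mtimes stored in `entry` match the files on disk.

    `entry` is a hypothetical dict:
        {"ebuild_mtime": int, "eclass_mtimes": {eclass_name: int}}
    `eclass_paths` maps each inherited eclass name to its path on disk.
    """
    try:
        if int(os.stat(ebuild_path).st_mtime) != entry["ebuild_mtime"]:
            return False
        stored = entry["eclass_mtimes"]
        # The set of inherited eclasses must match what was recorded.
        if set(stored) != set(eclass_paths):
            return False
        for name, path in eclass_paths.items():
            if int(os.stat(path).st_mtime) != stored[name]:
                return False
    except (OSError, KeyError):
        # A missing file or a malformed entry counts as invalid.
        return False
    return True
```

An entry without eclass timestamps, like the ones you say are written
to $PORTDIR/metadata/cache, could not pass this check as written, which
I take to be the reason for the digest discussion you linked.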
|
I think that any change to support the ebuild tar-ball format would
require including some sort of cryptographic hash of the ebuild
tar-ball in the cache format. Another solution might be distributing
a large pile of public key signatures with the cache and then
validating the signature of the ebuild tar-ball. Apart from the
package manager support it needs, the cryptographic signature method
is probably the least intrusive, at least at first glance. I might
change my mind on that.
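
For the hash variant, the check on the client side could be as simple
as the following Python sketch. The choice of SHA-256 and both function
names are assumptions on my part; the cache format does not currently
carry such a field.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large tar-balls need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tarball_matches_cache(path, expected_hex):
    """Compare a downloaded tar-ball against the hex digest from the cache.

    `expected_hex` stands for a hypothetical per-package field that the
    extended cache entry would have to provide.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

This only proves that the tar-ball matches the cache. The cache itself
would still need to be trusted or signed by some other means.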
|
>
> > How well documented is the ebuild file format?
>
> It's pretty well documented by PMS. You can get that by installing
> app-doc/pms. For something that's much shorter and less
> comprehensive, there's `man 5 ebuild`.
>
> > How much of the ebuild is essential for portage to create a valid
> > cache entry?
>
> The whole ebuild and any eclasses that it inherits.
>
> > How stable and well documented is the format of the cache
> > essential pieces of an ebuild?
>
> It's very stable because it has to be backward compatible. Breaking
> compatibility would be a severe problem because dependency
> calculations are very slow unless there is a valid/compatible cache
> available.
>
|
I think that keeping the slim tree package manager cache format
compatible with the full tree package manager cache format is not
going to be easy, mainly because of the amount of new data needed in
the slim tree variant of the cache.
|
Stuff like:
1. Repository -- Is this cache information from the main tree or
   from an overlay?
2. Hash or signature of the ebuild tar-ball -- How do I validate
   whether the tar-ball I downloaded is OK?
3. Tags -- If the tags SoC project is accepted, then they will need
   to be cached for searching as well.
4. Speed improvements -- Any required changes that can improve the
   performance of searches and the like.
5. Change tracking -- Any cache format for a slim tree will need to
   be able to update from one revision to another easily and with
   as little bandwidth as reasonably possible.
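
As a rough illustration of items 1, 2, 3, and 5, a slim tree cache
entry and a revision-to-revision delta might look like the Python
sketch below. Every name and field here is hypothetical; it does not
describe an existing format, and it leaves the speed question in item
4 open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlimCacheEntry:
    repository: str                 # item 1: main tree or overlay name
    tarball_sha256: str             # item 2: hash of the ebuild tar-ball
    tags: frozenset = frozenset()   # item 3: searchable tags
    metadata: tuple = ()            # existing cache fields as (key, value) pairs


def diff_caches(old, new):
    """Item 5: compute what changed between two cache revisions.

    `old` and `new` map a package key (e.g. "cat/pkg-1.0") to an entry.
    Only removed keys and new or modified entries go into the delta.
    """
    removed = sorted(k for k in old if k not in new)
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    return {"removed": removed, "changed": changed}


def apply_delta(old, delta):
    """Rebuild the new revision from the old one plus a delta."""
    result = {k: v for k, v in old.items() if k not in delta["removed"]}
    result.update(delta["changed"])
    return result
```

A client would then only download the delta between its revision and
the current one, which is where the bandwidth saving would come from.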
|
>
> > Is there any previous work on this or a project that might overlap
> > with this project? Such as, an attempt at a new parser for portage.
>
> I know that Mounir Lamouri (volkmar@gentoo) has been thinking about
> a new cache format that will use a single file for the whole cache.
>
> > Will there be mandatory discussion between the person doing this
> > project and the person doing the tags support project?
>
> Tags are a separate project.
>
> > Is improving the performance of the cache and/or search feature
> > a mandatory goal of this project?
>
> Well, the cache should probably all go in a single file, and that
> will probably improve performance because generally it's faster to
> load one big file than a bunch of small files.
>
> > Thank you.
> >
> > [1] http://en.gentoo-wiki.com/wiki/Google_Summer_of_Code_2010_ideas#Cache_sync
> >
> --
> Thanks,
> Zac
>
|
Thank you for the information. I will ponder it for a bit and look at
some different design angles.