On Tue, Feb 10, 2009 at 12:55:51PM -0800, Zac Medico wrote:
> Brian Harring wrote:
> > On Mon, Feb 09, 2009 at 11:55:41AM -0800, Zac Medico wrote:
> >> All that I can say right now is that I recall questions about it in
> >> the past from overlay maintainers (I don't have a list) and the
> >> funtoo project is the only one which I can name offhand.
> >>
> >> However, the ability to distribute cache via a vcs is only an
> >> ancillary feature which is made possible by the DIGESTS data. The
> >> DIGESTS data is useful regardless of the protocol that is used to
> >> distribute the cache, since it allows the cache to be properly
> >> validated for integrity. So, the real primary reason for introducing
> >> the DIGESTS data is to provide a proper solution for cases like bug
> >> #139134 [1] in which invalid metadata cache goes undetected.
> >
> > I'm sorry, but this proposal smells something awful. Because of the
> > mtime requirement on cache entries you're proposing jamming another
> > 1.4MB into the cache for validation purposes (which should be 4x that
> > since a full checksum really should be in there) while trying to
> > maintain compatibility.
>
> As I've said before [1], 10 hex digits gives 1.1e12 possible
> combinations and that's probably sufficient for the given application.
|
And as I said before, I don't agree with you on it (repeating it over
and over isn't going to convince the other side either).
|
The 1.4MB is more of a concern than arguments over avalanche, I might
add.
|
|
> > Frankly, forget compatibility- the current format could stand to die.
> > The repository format is an ever growing mess- leave it as is and
> > work on cutting over to something sane.
>
> Changing the repository layout is a pretty radical thing to do.
> You're welcome to start a new subject for that if you'd like but I'd
> prefer to keep the scope of this thread focussed on the cache format
> for the existing repository layout.
|
That's a vacuous argument: it focuses on the 'layout' part rather than
the repository as a whole, which is what I meant. You're stating that
one should not discuss changing the repository standard/spec while
arguing that repealing the requirement that cache entry mtimes match
the ebuild mtime (part of the repository spec) should be the point of
discussion.
|
The daft thing about this is that w/ effectively atomic sync (if the
sync fails then mark the repo as screwed up till a sync completes),
the current cache format can *still* do validation- no clue if
paludis has it, but at least pkgcore and portage can handle this via
awareness of the eclass stacking.
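
To illustrate the "mark the repo as screwed up till a sync completes"
idea, here is a minimal Python sketch. It is not how portage or pkgcore
implement it; the marker file name `.sync-incomplete` and both function
names are assumptions made up for this example.

```python
import os
import subprocess


def sync_repo(repo_root, sync_cmd, marker=".sync-incomplete"):
    """Run sync_cmd, leaving a marker behind if it does not finish."""
    flag = os.path.join(repo_root, marker)
    # Drop the marker first: from here on the tree may be half-updated.
    with open(flag, "w"):
        pass
    # check=True raises CalledProcessError on failure, so the marker stays.
    subprocess.run(sync_cmd, check=True)
    # Only a completed sync clears the marker.
    os.remove(flag)


def repo_is_usable(repo_root, marker=".sync-incomplete"):
    """The cache is only trusted when the last sync ran to completion."""
    return not os.path.exists(os.path.join(repo_root, marker))
```

A PM following this scheme would check `repo_is_usable()` before
trusting any cache entry, and refuse (or regenerate) otherwise.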
|
So for git vcses bundling metadata (a bad idea anyways to be storing
generated content in the mainline vcs), your proposal allows them to
use a cache. Every other distribution mechanism, which already works
fine, winds up paying the cost for that corner case. Having the 80 pay
for the 20 isn't the normal form of the 80/20 rule ;)
|
Note that proper PM implementations *still* have to set the cache
entry's mtime for backwards compatibility w/ older PMs that don't
support this new unversioned change, thus muddying the implementation
even further.
|
I reiterate, this belongs in a separate repository format, along w/
the rest of the unversioned repository changes you've been pushing in
(profile package.mask breaking all non-portage PMs is a perfect
example).
|
|
> > Overlay maintainers who want the latest/greatest obviously can convert
> > over also; one would hope their would be enough cleanup to make it
> > worth their time.
> >
> > As for the nasty gentoo-x86 compatibility, basically, do the
> > following:
> >
> > 1) maintain the existing cvs repo as is
> > 2) iron out what cleanup/restructuring is desired. glep55 being
> > jammed in here is a potential for example. Nail down the new repo
> > format basically (with an eye for translating the cvs repo to it on
> > the fly).
> > 3) use an eclass index holding the checksums, w/ the cache entries
> > referencing the index numbers rather (sorting the index by
> > consumption, meaning the more ebuilds using it the lower the index):
> > this brings the cache addition down to around 285KB (acceptable imo)
> > while giving full flexibility in the checksums available for eclasses.
> > This is assuming the current flat_list format is still in use in the
> > new repo...
>
> As previously discussed [2], having shared integrity data (as you
> suggest) has implications in terms of reduced simplicity and robustness.
|
The complexity argument is a red herring. Rsync is the sole
transport that has atomicity issues; the rest don't (when you check
out from vcs, you get an exact rev effectively). Rsync generation
ought to be preparing the new snapshot then swapping it in, and if I
recall correctly that's exactly what osprey does now (or whatever node
y'all are using for generating gentoo-x86 these days).
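
As a rough sketch of "prepare the new snapshot, then swap it in", the
Python below builds the tree in a fresh directory and repoints a
`current` symlink with a single rename. This is an illustration under
assumptions, not a description of what the gentoo-x86 rsync generation
actually runs; the `current` link, the `snap-` prefix and the `build`
callback are all hypothetical.

```python
import os
import tempfile


def publish_snapshot(root, build):
    """Build a snapshot out of sight, then atomically make it current."""
    # New tree goes in its own directory; readers still see the old one.
    snap = tempfile.mkdtemp(prefix="snap-", dir=root)
    build(snap)

    # Create the new symlink under a temporary name first.
    tmp_link = os.path.join(root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.basename(snap), tmp_link)

    # rename(2) replaces the old link in one step on POSIX, so a reader
    # sees either the old snapshot or the new one, never a mix.
    os.replace(tmp_link, os.path.join(root, "current"))
    return snap
```

Serving rsync from `root/current` would then mean a client can only
catch a mid-update tree if its own transfer is interrupted.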
|
The point there is that there are specific steps taken preparing the
repo- those steps already ensure the snapshot/rev is complete prior to
being available so there isn't real potential of catching it mid-update.
Via that existing machinery, a shared index is *no issue*-
the one spot it rears its head is during a failed/partial sync (the
repo should not be used in such a state since stale cache is the
least of the concerns at that point).
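
For reference, the shared eclass index from point 3 of the quoted
proposal could look roughly like the Python below: one checksum per
eclass, ordered so the most-used eclasses get the lowest numbers, with
each cache entry storing only index numbers. The function names, the
in-memory dict layout and the choice of SHA-1 are assumptions for the
sketch; the on-disk format is not specified in this thread, and ebuild
checksums are left out.

```python
import hashlib
from collections import Counter


def build_eclass_index(entries, eclass_data):
    """entries: {cpv: [eclass names]}; eclass_data: {eclass name: bytes}."""
    usage = Counter(name for used in entries.values() for name in used)
    # Most-consumed eclasses first, so common references are short numbers.
    ordered = sorted(usage, key=lambda name: (-usage[name], name))
    index = [(name, hashlib.sha1(eclass_data[name]).hexdigest())
             for name in ordered]
    position = {name: i for i, name in enumerate(ordered)}
    # Each cache entry keeps index numbers instead of full checksums.
    refs = {cpv: [position[name] for name in used]
            for cpv, used in entries.items()}
    return index, refs


def entry_is_valid(cpv, index, refs, eclass_data):
    """Valid only if every referenced eclass still matches the index."""
    return all(
        hashlib.sha1(eclass_data[index[i][0]]).hexdigest() == index[i][1]
        for i in refs[cpv]
    )
```

Changing one eclass invalidates exactly the entries that reference its
index number; everything else in the cache stays valid.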
|
|
From where I'm sitting, the changes you've either slipped in, or want
to slip into the repository format, are completely ignoring the past
bad history of breakage from doing such things, and ignoring why EAPI
exists these days.
|
The reason a new format is realistically needed here is that existing
PMS-compliant implementations will wind up accessing the repo and
behaving as if everything is fine and dandy without knowing the rules
used to read no longer match the rules used to generate the repo.
This is a no-go, same reason EAPI awareness had to sit for a long ass
time to ensure EAPI=1 would be properly masked by EAPI-aware PMs.
|
Either way, for changes like this (or package.mask as a directory, since
I'm still annoyed by that) the repo needs to be marked in some way, a
versioned format specifically, so that when stuff like this is added
the PM can handle it gracefully instead of doing the wrong thing.
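
A versioned marker of that kind could be checked along these lines.
This is a hypothetical Python sketch only: the `repo_format` file name,
the integer version scheme and the set of supported versions are
invented for illustration and do not correspond to any existing
repository spec.

```python
import os

# Hypothetical: the format versions this PM knows how to read.
SUPPORTED_FORMATS = {0, 1}


class UnsupportedRepoFormat(Exception):
    """Raised when a repo declares a format this PM does not understand."""


def read_repo_format(repo_root, marker="repo_format"):
    """Return the declared format; a missing marker means legacy format 0."""
    try:
        with open(os.path.join(repo_root, marker)) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return 0


def check_repo(repo_root):
    """Refuse the repo up front rather than misread it silently."""
    version = read_repo_format(repo_root)
    if version not in SUPPORTED_FORMATS:
        raise UnsupportedRepoFormat(
            "repo format %d is newer than this PM supports" % version
        )
    return version
```

The point of the check is the failure mode: an older PM stops with a
clear error instead of applying the old rules to a repo generated under
new ones.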
|
~brian