Gentoo Archives: gentoo-dev

From: Zac Medico <zmedico@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Date: Wed, 11 Feb 2009 10:01:10
Message-Id: 4992A1F4.2040502@gentoo.org
In Reply to: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation by Brian Harring
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brian Harring wrote:
> On Tue, Feb 10, 2009 at 12:55:51PM -0800, Zac Medico wrote:
>> Brian Harring wrote:
>>> Frankly, forget compatibility- the current format could stand to die.
>>> The repository format is an ever growing mess- leave it as is and
>>> work on cutting over to something sane.
>> Changing the repository layout is a pretty radical thing to do.
>> You're welcome to start a new subject for that if you'd like but I'd
>> prefer to keep the scope of this thread focussed on the cache format
>> for the existing repository layout.

I don't intend to repeal the cache mtime requirement, at least
(especially) not on gentoo's rsync tree. However, I wouldn't say
that it's something that necessarily needs to be a requirement for
other repositories or overlays, moving forward (assuming that an
alternative validation framework is in place).

> Vacuous argument via focusing on the 'layout' part rather than the
> whole repository I implied; you're stating that one should not
> discuss changing the repository standard/spec while arguing that
> repealing the requirement that cache mtime entries match ebuild
> mtime (part of the repository spec) should be the point of discussion.
>
> The daft thing about this is that w/ effectively atomic sync (if the
> sync fails then mark the repo as screwed up till a sync completes),
> the current cache format can *still* do validation- no clue if
> paludis has it, but at least pkgcore and portage can handle this via
> awareness of the eclass stacking.

I want to have a more fault-tolerant solution than that.

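A minimal sketch of the mtime-based validation described above: a cache entry is trusted only if the ebuild and every eclass it inherits still carry the mtimes recorded when the entry was generated, with eclass lookup following stacking (an overlay's eclass overriding the master repository's). The `CacheEntry` structure and the function names are hypothetical, not Portage's or pkgcore's actual code, and file mtimes are passed in as plain dicts instead of being read with `os.stat` so the example stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    # Hypothetical, simplified cache entry; not Portage's on-disk format.
    ebuild_mtime: int
    # eclass name -> mtime of the eclass file used when the entry was generated
    eclass_mtimes: dict = field(default_factory=dict)


def resolve_eclass_mtime(name, eclass_layers):
    """Return the mtime of the eclass that wins under stacking, or None.

    eclass_layers is ordered highest priority first (e.g. an overlay's
    eclass directory before the master repository's); each layer maps an
    eclass name to the mtime of its file.
    """
    for layer in eclass_layers:
        if name in layer:
            return layer[name]
    return None


def is_valid_by_mtime(entry, current_ebuild_mtime, eclass_layers):
    """True only if the ebuild and every inherited eclass are unchanged by mtime."""
    if current_ebuild_mtime != entry.ebuild_mtime:
        return False
    for name, recorded in entry.eclass_mtimes.items():
        # A missing eclass resolves to None and therefore never matches.
        if resolve_eclass_mtime(name, eclass_layers) != recorded:
            return False
    return True
```

Because the check compares timestamps only, it depends on the distribution method preserving mtimes exactly; a transport that rewrites them (for example, a VCS checkout that stamps files with the checkout time) invalidates every entry even when nothing changed.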
> So for git vcses bundling metadata (a bad idea anyways to be storing
> generated content in the mainline vcs), your proposal allows them to
> use a cache. For every other distribution mechanism that works fine,
> they wind up paying the cost for that corner case. The 80 pays for
> the 20 isn't the normal form of the 80/20 rule ;)

What I'm concerned about is costs in terms of support and usability.
When something goes wrong and there's not enough data to detect it,
it triggers problems that confuse and annoy users. I want to have
the DIGESTS data available so that these sorts of problems are easy
to detect and handle appropriately. I think you're being too stingy
about disk space.

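A minimal sketch of content-digest validation in the spirit of the proposed DIGESTS variable: the cache entry records a digest of the ebuild and of each inherited eclass, and the entry is valid only if every recorded digest matches the current content. The exact DIGESTS syntax from the RFC is not reproduced here; the dict layout, the `"eclass:"` key prefix, and the default SHA-1 algorithm are assumptions for illustration.

```python
import hashlib


def _digest(data, algo):
    # hashlib.new accepts algorithm names such as "sha1" or "sha256".
    return hashlib.new(algo, data).hexdigest()


def make_digests(ebuild_bytes, eclass_bytes, algo="sha1"):
    """Build a hypothetical DIGESTS mapping: key -> (algorithm, hex digest).

    eclass_bytes maps each inherited eclass name to its file content.
    """
    digests = {"ebuild": (algo, _digest(ebuild_bytes, algo))}
    for name, data in eclass_bytes.items():
        digests["eclass:" + name] = (algo, _digest(data, algo))
    return digests


def is_valid_by_digest(digests, ebuild_bytes, eclass_bytes):
    """True only if every recorded digest matches the current content."""
    for key, (algo, recorded) in digests.items():
        if key == "ebuild":
            data = ebuild_bytes
        else:
            # A recorded eclass that no longer exists makes the entry invalid.
            data = eclass_bytes.get(key[len("eclass:"):])
        if data is None or _digest(data, algo) != recorded:
            return False
    return True
```

Unlike the mtime check, this accepts a file whose timestamp changed but whose content did not, and rejects a file whose content changed even if its timestamp happens to match; the cost is the extra bytes per cache entry that the thread is arguing about.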
> Note that proper PM implementations *still* have to set the cache
> entries mtime for backwards compatibility w/ older PMs that don't
> support this new unversioned change thus muddying the implementation
> even further.

As said above, I wasn't intending that, at least (especially) not
for gentoo's rsync tree. I guess you got that idea from the mention
of bug 139134, but you don't need to worry about it.

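The backwards-compatibility cost raised in this exchange amounts to the package manager copying the ebuild's mtime onto each cache entry it writes, so that older mtime-only package managers still accept the entry. A minimal sketch, assuming one flat file per entry; the `write_cache_entry` helper, the `.tmp` staging name, and the line contents are hypothetical.

```python
import os


def write_cache_entry(cache_path, lines, ebuild_path):
    """Write a flat cache entry and copy the ebuild's mtime onto it."""
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    st = os.stat(ebuild_path)
    # Give the entry the ebuild's atime and mtime (nanosecond precision).
    os.utime(tmp_path, ns=(st.st_atime_ns, st.st_mtime_ns))
    # Renaming keeps the file's mtime; os.replace is atomic on POSIX
    # when both paths are on the same filesystem.
    os.replace(tmp_path, cache_path)
```

Setting the timestamp before `os.replace` means the entry never becomes visible with the wrong mtime.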
> I reiterate, this belongs in a separate repository format, along w/
> the rest of the unversioned repository changes you've been pushing in
> (profile package.mask breaking all non-portage PMs is a perfect
> example).

The package.mask thing is a separate discussion. Let's do that in a
separate thread.

>>> Overlay maintainers who want the latest/greatest obviously can convert
>>> over also; one would hope there would be enough cleanup to make it
>>> worth their time.
>>>
>>> As for the nasty gentoo-x86 compatibility, basically, do the
>>> following:
>>>
>>> 1) maintain the existing cvs repo as is
>>> 2) iron out what cleanup/restructuring is desired. glep55 being
>>> jammed in here is a potential for example. Nail down the new repo
>>> format basically (with an eye for translating the cvs repo to it on
>>> the fly).
>>> 3) use an eclass index holding the checksums, w/ the cache entries
>>> referencing the index numbers rather (sorting the index by
>>> consumption, meaning the more ebuilds using it the lower the index):
>>> this brings the cache addition down to around 285KB (acceptable imo)
>>> while giving full flexibility in the checksums available for eclasses.
>>> This is assuming the current flat_list format is still in use in the
>>> new repo...
>> As previously discussed [2], having shared integrity data (as you
>> suggest) has implications in terms of reduced simplicity and robustness.
>
> The complexity argument is a white elephant. Rsync is the sole
> transport that has atomicity issues; the rest don't (when you check
> out from vcs, you get an exact rev effectively). Rsync generation
> ought to be preparing the new snapshot then swapping it in, and if I
> recall correctly that's exactly what osprey does now (or whatever node
> y'all are using for generating gentoo-x86 these days).
>
> The point there is that there are specific steps taken preparing the
> repo- those steps already ensure the snapshot/rev is complete prior to
> being available so there isn't real potential of catching it mid
> update. Via that existing machinery, a shared index is *no issue*-
> the one spot it rears its head is during a failed/partial sync (the
> repo should not be used in such a state since stale cache is the
> least concern at that point).

It's too fragile and the potential usability and support issues are
a liability. As said, I want a more fault-tolerant solution than that.

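A minimal sketch of the shared eclass index from step 3 of the quoted plan: eclasses are numbered by how many ebuilds inherit them, most-used first, so the most common eclasses get the smallest index numbers; each cache entry then lists index numbers, and one shared index carries the per-eclass checksums. The function names and the `name checksum` line format are hypothetical.

```python
from collections import Counter


def build_eclass_index(inherits_by_cpv):
    """Order eclasses by how many ebuilds inherit them, most-used first.

    Ties are broken by name so the index is deterministic.
    Returns (ordered eclass names, name -> index number).
    """
    counts = Counter()
    for eclasses in inherits_by_cpv.values():
        counts.update(set(eclasses))
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return ordered, {name: i for i, name in enumerate(ordered)}


def encode_entry_eclasses(eclasses, index):
    # A cache entry stores small integers instead of names plus checksums.
    return " ".join(str(index[name]) for name in sorted(eclasses, key=index.get))


def render_index(ordered, checksums):
    # Line N of the shared index holds eclass N's name and checksum.
    return "\n".join(f"{name} {checksums[name]}" for name in ordered)
```

This is also where the robustness objection applies: an encoded entry is only meaningful together with the exact index it was built against, so a partial sync that updates one but not the other can pair an entry with the wrong eclass checksum.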
> From where I'm sitting, the changes you've either slipped in, or want
> to slip in to the repository format are completely ignoring the past
> bad history of breakage doing such things, and ignoring why EAPI
> exists these days.

> The reason a new format is realistically needed here is that existing
> PMS-compliant implementations will wind up accessing the repo and
> behaving as if everything is fine and dandy without knowing that the
> rules used to read no longer match the rules used to generate the repo.
> This is a no-go, same reason EAPI awareness had to sit for a long ass
> time to ensure EAPI=1 would be properly masked by EAPI-aware PMs.
>
> Either way for changes like this (or package.mask as a directory since
> I'm still annoyed by that) the repo needs to be marked in some way, a
> versioned format specifically, so that when stuff like this is added
> the PM can handle it gracefully instead of doing the wrong thing.

Again, that's a separate discussion for a different thread.

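A minimal sketch of the versioned-format idea in the quoted argument, analogous to how EAPI-aware package managers mask ebuilds with an unknown EAPI: the repository declares a format identifier, and the package manager refuses formats it does not understand instead of misreading them. The marker-file contents, the identifiers `"0"` and `"1"`, and the exception name are all hypothetical.

```python
# Format identifiers this package manager understands (hypothetical values).
SUPPORTED_FORMATS = {"0", "1"}


class UnsupportedRepoFormat(Exception):
    """Raised instead of silently reading a repository with unknown rules."""


def check_repo_format(declared):
    """Return the repository's format identifier or raise UnsupportedRepoFormat.

    `declared` is the text of a hypothetical format-marker file, or None when
    the marker is absent; an absent marker is treated as the legacy format "0".
    """
    version = "0" if declared is None else declared.strip()
    if version not in SUPPORTED_FORMATS:
        raise UnsupportedRepoFormat(f"repository format {version!r} is not supported")
    return version
```

Failing loudly on an unknown identifier is the point: an older package manager stops at the marker rather than applying its old reading rules to a repository generated under new ones.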
- --
Thanks,
Zac
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)

iEYEARECAAYFAkmSoe4ACgkQ/ejvha5XGaMf4wCg8ldvDu/w/FDarHXCVv/HOkvy
qQUAn1WGYrqZVdM4nCUjE8bZglGWD9yU
=XN/T
-----END PGP SIGNATURE-----

Replies

Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Author: Brian Harring <ferringb@×××××.com>