Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Date: Tue, 10 Feb 2009 12:21:21
Message-Id: 20090210122046.GD4076@hrair
In Reply to: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation by Zac Medico
On Mon, Feb 09, 2009 at 11:55:41AM -0800, Zac Medico wrote:
> All that I can say right now is that I recall questions about it in
> the past from overlay maintainers (I don't have a list) and the
> funtoo project is the only one which I can name offhand.
>
> However, the ability to distribute cache via a vcs is only an
> ancillary feature which is made possible by the DIGESTS data. The
> DIGESTS data is useful regardless of the protocol that is used to
> distribute the cache, since it allows the cache to be properly
> validated for integrity. So, the real primary reason for introducing
> the DIGESTS data is to provide a proper solution for cases like bug
> #139134 [1] in which invalid metadata cache goes undetected.

I'm sorry, but this proposal smells something awful. Because of the
mtime requirement on cache entries you're proposing jamming another
1.4MB into the cache for validation purposes (which should be 4x that
since a full checksum really should be in there) while trying to
maintain compatibility.

Frankly, forget compatibility; the current format could stand to die.
The repository format is an ever-growing mess. Leave it as is and
work on cutting over to something sane.

Overlay maintainers who want the latest/greatest obviously can convert
over also; one would hope there would be enough cleanup to make it
worth their time.

As for the nasty gentoo-x86 compatibility, basically, do the
following:

1) maintain the existing cvs repo as is
2) iron out what cleanup/restructuring is desired; glep55 being
jammed in here is one possibility, for example. Basically, nail down
the new repo format (with an eye for translating the cvs repo to it
on the fly).
3) use an eclass index holding the checksums, w/ the cache entries
referencing the index numbers instead (sorting the index by
consumption, meaning the more ebuilds using an eclass the lower its
index): this brings the cache addition down to around 285KB
(acceptable imo) while giving full flexibility in the checksums
available for eclasses. This is assuming the current flat_list format
is still in use in the new repo...
4) drop the mtime requirement on cache entries and bump the mtime
forward whenever an entry is updated (bug 139134 goes away), jamming
in an ebuild checksum of some sort.
5) rsync nodes are required to have 10GB of storage available, so
storage shouldn't be an issue, but ensuring all nodes have been
updated to sync both the old and *new* format is required.
6) suffer through cvs for a year (or whatever time frame), converting
folks over to the new url.
7) kill the old format after whatever period is deemed best
(potentially leaving a README telling folks how to update if they're
seriously behind).
8) convert the cvs repo to the new format, tear down the
transformation bits.
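As a rough illustration of step 3 only, here is a minimal Python sketch of an eclass index sorted by consumption. Everything named in it is an assumption made for illustration: the function name, the in-memory dict inputs, and the choice of SHA-1 (the plan does not say which checksum to use). It does not reflect any existing package manager code or on-disk format.

```python
import hashlib
from collections import Counter


def build_eclass_index(eclass_contents, ebuild_inherits):
    """Build a shared eclass checksum index plus per-ebuild references.

    eclass_contents: dict mapping eclass name -> raw file bytes (hypothetical input)
    ebuild_inherits: dict mapping ebuild id -> list of inherited eclass names
    """
    # Count how many ebuilds consume each eclass.
    usage = Counter(
        name for inherited in ebuild_inherits.values() for name in inherited
    )
    # Most-used eclasses get the lowest index numbers; ties break by name
    # so the ordering is deterministic.
    ordered = sorted(eclass_contents, key=lambda name: (-usage[name], name))
    # SHA-1 is an assumed checksum choice, used only for this sketch.
    index = [
        (name, hashlib.sha1(eclass_contents[name]).hexdigest()) for name in ordered
    ]
    position = {name: i for i, (name, _) in enumerate(index)}
    # Each cache entry stores small integers instead of repeating checksums.
    refs = {
        ebuild: sorted(position[name] for name in inherited)
        for ebuild, inherited in ebuild_inherits.items()
    }
    return index, refs


index, refs = build_eclass_index(
    {"eutils": b"a", "python": b"c", "toolchain-funcs": b"b"},
    {
        "app/foo-1": ["eutils", "python"],
        "app/bar-2": ["eutils"],
        "app/baz-3": ["eutils", "toolchain-funcs", "python"],
    },
)
```

Because the heavily used eclasses land on the smallest numbers, most cache entries carry only short references, which is where the size saving in step 3 would come from.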

Yes, the plan above is coarse, but there aren't any glaring holes as
far as I can see. It does place restrictions on the repo format
chosen, but careful choices in the new format (heavy format
versioning) should make it possible to make this sort of issue less
of a pain down the line.

At the very least, doing a different repo format for repos/overlays
stored in a vcs that doesn't track mtime would solve their issues; it
also has the nice benefit of not making the repo more bloated for the
99% of folk who didn't even hit the issues spawning this.

If gentoo-x86 is left as is, bug 139134 can be headed off w/out
jamming a new metadata key in; to be clear, I'm likely going to
"Special Hell" for suggesting this, but if the mtime/size of the new
cache entry is the same as the old one's, append a space to the value
in the description field.

All sane managers ought to be doing basic clean up of that value
anyways in their data layer (let alone at the UI level), but it's
enough to make rsync behave.
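A minimal sketch of that padding trick, in Python. It relies on the fact that rsync's default quick check treats a file as unchanged when its size and mtime both match. The "KEY=value" serialization, the dict representation of a cache entry, and all function names here are hypothetical and do not correspond to the real flat_list layout or to any package manager's code.

```python
def serialize(entry):
    """Hypothetical serializer: one KEY=value line per metadata key."""
    return "".join(f"{k}={v}\n" for k, v in sorted(entry.items())).encode("utf-8")


def pad_if_rsync_would_miss(old_bytes, old_mtime, entry, new_mtime):
    """Return an entry whose serialized form rsync's quick check will notice.

    If the regenerated entry differs in content but has the same size and
    mtime as the old one, append a space to DESCRIPTION so the size changes.
    """
    new_bytes = serialize(entry)
    same_quick_check = len(new_bytes) == len(old_bytes) and new_mtime == old_mtime
    if same_quick_check and new_bytes != old_bytes:
        padded = dict(entry)
        padded["DESCRIPTION"] = padded.get("DESCRIPTION", "") + " "
        return padded
    return entry


def clean_description(value):
    """Reader-side cleanup: strip the padding before using the value."""
    return value.strip()


old_entry = {"DESCRIPTION": "foo", "SLOT": "0"}
new_entry = {"DESCRIPTION": "foo", "SLOT": "1"}  # same size, different content
old_bytes = serialize(old_entry)
padded = pad_if_rsync_would_miss(old_bytes, 1000, new_entry, 1000)
```

The reader-side strip is what makes the padding harmless: the extra space only exists to change the file size and never reaches anything that displays or compares the description.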

So... flame away.

~brian
