Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: gentoo-dev@l.g.o
Cc: zmedico@g.o, solar@g.o, ciaran.mccreesh@××××××××××.com, fuzzyray@g.o
Subject: [gentoo-dev] adding a modification timestamp to the installed pkgs database (vdb)
Date: Mon, 26 Oct 2009 01:50:10
Message-Id: 20091026015005.GA12250@hrair.hsd1.ca.comcast.net
1 First of all, feel free to forward this to anyone who is responsible
2 for code pkged in the tree that accesses the vdb (/var/db/pkg) in some
3 fashion.
4
5 The proposal is pretty simple; if code modifies the vdb in any
6 fashion, it needs to update the mtime on a file named
7 '.modification_time' in the root of the vdb.
8
9 For example-
10
11 1) ${PACKAGE_MANAGER} fires up, builds a pkg. It's now ready to
12 install it.
13 2) this step isn't strictly required, but is a zero cost safety
14 measure- prior to modifying the vdb, it updates the timestamp. The
15 reason for doing this is to protect against the manager blowing up in
16 some fashion and not updating the timestamp- there is still a window
17 if the manager breaks down during merging, but it's far reduced.
18 3) manager does its thing to the livefs, and to the vdb.
19 4) once finished, again, updates the timestamp.
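
A minimal sketch of steps 2 through 4, assuming Python. Only the file name '.modification_time' comes from the proposal; the helper names touch_vdb_timestamp and vdb_modification are hypothetical and not part of any manager's API.

```python
import contextlib
import os
import tempfile

# File name taken from the proposal; it lives in the root of the vdb.
VDB_TIMESTAMP_NAME = ".modification_time"


def touch_vdb_timestamp(vdb_root):
    """Create the timestamp file if it is missing and set its mtime to now."""
    path = os.path.join(vdb_root, VDB_TIMESTAMP_NAME)
    # Append mode creates the file without truncating an existing one.
    with open(path, "a"):
        pass
    os.utime(path, None)  # None means "use the current time"
    return path


@contextlib.contextmanager
def vdb_modification(vdb_root):
    """Hypothetical wrapper: touch before (step 2) and after (step 4)."""
    touch_vdb_timestamp(vdb_root)
    try:
        yield  # step 3: the caller modifies the livefs and the vdb here
    finally:
        # Runs even if the merge raised, since the vdb may be half-modified.
        touch_vdb_timestamp(vdb_root)


if __name__ == "__main__":
    # Demonstration against a throwaway directory, not a real /var/db/pkg.
    with tempfile.TemporaryDirectory() as vdb_root:
        with vdb_modification(vdb_root):
            os.mkdir(os.path.join(vdb_root, "app-misc"))
```

The second touch sits in a finally block so that a merge that fails partway still invalidates any caches built from the earlier state. A hard crash of the manager is covered only by the first touch, which is the remaining window described in step 2.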
20
21 This isn't an incredibly complex change. What it enables however is
22 package managers to get serious about optimizing access to the vdb.
23 For example for the 3 managers:
24
25 paludis:
26 installed-cache currently needs to be manually run by the user;
27 specifically, the user is responsible for regenerating this cache if
28 they use a non paludis manager to modify the VDB. This can be
29 automated via checking the vdb timestamp against a stored copy of the
30 vdb timestamp at the time of the cache generation.
31
32 portage:
33 portage maintains a set of denormalized caches of the vdb- it however
34 has to do validation of those caches on each access, meaning quite a
35 few stats. Same thing, can compare timestamp from current vdb to when
36 it was generated to identify if it is no longer authoritative.
37
38 pkgcore:
39 pkgcore maintains a denormalized old style virtuals cache- same thing
40 w/ portage, it has to do validation (stat'ing) whenever it uses that
41 cache to ensure the data is accurate. Same thing, can compare
42 timestamp from current vdb to when it was generated to identify if it
43 is no longer authoritative.
44
45 The existing vdb caching could all be modified to use this timestamp.
46 One stat in the best (common) case, instead of having to either scan
47 the whole vdb each time or do a subset of stats.
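
The single-stat check can be sketched roughly as follows, assuming Python and a JSON cache file. The cache layout and the names vdb_mtime_ns, save_cache and load_cache are made up for illustration; none of the three managers stores its cache this way.

```python
import json
import os
import tempfile

# File name taken from the proposal; it lives in the root of the vdb.
VDB_TIMESTAMP_NAME = ".modification_time"


def vdb_mtime_ns(vdb_root):
    """Return the timestamp file's mtime in nanoseconds, or None if absent."""
    try:
        return os.stat(os.path.join(vdb_root, VDB_TIMESTAMP_NAME)).st_mtime_ns
    except FileNotFoundError:
        return None


def save_cache(cache_path, vdb_root, data):
    """Store the cache payload together with the vdb timestamp it reflects."""
    with open(cache_path, "w") as f:
        json.dump({"vdb_mtime_ns": vdb_mtime_ns(vdb_root), "data": data}, f)


def load_cache(cache_path, vdb_root):
    """Return the cached data if it is still authoritative, else None."""
    current = vdb_mtime_ns(vdb_root)  # the one stat in the common case
    if current is None:
        return None  # no timestamp file, so no cache can be trusted
    try:
        with open(cache_path) as f:
            stored = json.load(f)
    except (FileNotFoundError, ValueError):
        return None  # missing or corrupt cache
    if not isinstance(stored, dict) or stored.get("vdb_mtime_ns") != current:
        return None  # vdb changed since the cache was generated
    return stored.get("data")
```

A caller that gets None back falls through to the existing full scan and then calls save_cache again. The sketch compares for equality rather than "newer than", so a timestamp that moves backwards (a clock change, say) also invalidates the cache.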
48
49 This change enables further caching/denormalization of the vdb data
50 while maintaining the old format- basically, it allows the manager to
51 build out a helluva lot faster access to the vdb while keeping on
52 disk compatibility in /var/db/pkg.
53
54
55 Now unfortunately since the vdb is not format versioned in any
56 fashion, to get this timestamp we have to do the following-
57
58 1) nudge everyone who has code poking into the vdb to update their
59 code to update the timestamp
60 2) sit on our hands for N months until such time we've deemed
61 "everyone we care about has upgraded"
62 3) push out a new release, and start pushing out versions of the
63 managers/vdb consumers that use this timestamp instead of just
64 updating it.
65
66 For anyone who has been around gentoo for a couple of years, this is a
67 pretty familiar pattern- eapi, profile changes, etc, all go through
68 this unfortunately.
69
70
71 That's the core of the proposal; there is a ticket open
72 ( http://bugs.gentoo.org/290428 ) regarding this although there is
73 some debate from ciaran which I'll try to now summarize, along w/ the
74 counterarguments.
75
76 1) do a new vdb.
77 Counter: this mechanism provides a way to synchronize the new vdb
78 while maintaining the old during its transition period, so this is
79 needed anyways. Further, pinning all of our optimization hopes on a
80 new vdb is daft- it's been discussed for 5+ years now and still
81 hasn't materialized (pkgcore has been able to have a new vdb for
82 several years, but without a synchronization mechanism it would
83 require locking users into the new format and locking out old
84 consumers of the vdb- an unfriendly choice to push on users, hence
85 never being implemented).
86
87 2) code that hasn't been updated to adjust the timestamp, but is still
88 in use after the transition period will break things.
89 Counter: nature of any modification of this sort, frankly the gains
90 outweigh the costs of users being ridiculously out of date. Not
91 saying it's perfect, but until someone comes up with a proposal that
92 versions every PMS component (meaning PMS has to start documenting
93 the VDB), it's what we have if we wish to move forward in
94 refactoring.
95
96 3) the correct approach is to require users to tell each manager that
97 changes have occurred outside its purview (run paludis
98 --regenerate-installed-cache after every time you invoke pmerge or
99 emerge).
100 Counter: that's rather unfriendly to users, and isn't what
101 pkgcore/portage do. Further, it's historically the opposite of the
102 norm- consider the ebuild cache (we do validation as we go there,
103 instead of expecting users to do an emerge --regen every time they
104 modify an ebuild).
105
106
107 That's roughly the three points raised; there is some minor quibbling
108 that mtime cannot be trusted, but that's mostly a variation of #2.
109 Feel free to dig into the bug for exact specifics, or wait for
110 ciaran's reply to this post.
111
112 So... thoughts?
113 ~harring
