First of all, feel free to forward this to anyone who is responsible
for code packaged in the tree that accesses the vdb (/var/db/pkg) in
some fashion.

The proposal is pretty simple: if code modifies the vdb in any
fashion, it needs to update the mtime on a file named
'.modification_time' in the root of the vdb.

For example:

1) ${PACKAGE_MANAGER} fires up and builds a pkg. It's now ready to
install it.
2) This step isn't strictly required, but is a zero cost safety
measure: prior to modifying the vdb, it updates the timestamp. The
reason for doing this is to protect against the manager blowing up in
some fashion and not updating the timestamp. There is still a window
if the manager breaks down during merging, but it's far reduced.
3) The manager does its thing to the livefs, and to the vdb.
4) Once finished, it again updates the timestamp.

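The four steps above can be sketched roughly as follows. This is only
an illustration in Python, not code from any of the managers: the
names touch_stamp and vdb_modification are made up for the example,
and touching the stamp in a finally block (so it is also updated when
the merge fails partway) is my assumption about the safest behaviour,
not something the proposal requires.

```python
import os
from contextlib import contextmanager

# File name taken from the proposal; everything else here is hypothetical.
STAMP_NAME = ".modification_time"


def touch_stamp(vdb_root):
    """Create the stamp file if it is missing and set its mtime to now."""
    path = os.path.join(vdb_root, STAMP_NAME)
    with open(path, "a"):
        pass  # opening in append mode creates the file without truncating it
    os.utime(path, None)  # None means "use the current time"
    return path


@contextmanager
def vdb_modification(vdb_root):
    """Wrap any code that writes to the vdb."""
    touch_stamp(vdb_root)  # step 2: safety update before touching the vdb
    try:
        yield  # step 3: the caller modifies the livefs and the vdb here
    finally:
        # Step 4: update again once finished. Assumption: we also do this
        # on failure, since the vdb may have been partially modified.
        touch_stamp(vdb_root)
```

A manager would then wrap its merge/unmerge code in
"with vdb_modification('/var/db/pkg'):" and leave the rest untouched.
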
This isn't an incredibly complex change. What it enables, however, is
for package managers to get serious about optimizing access to the
vdb. For example, for the 3 managers:

paludis:
installed-cache currently needs to be run manually by the user;
specifically, the user is responsible for regenerating this cache if
they use a non-paludis manager to modify the VDB. This can be
automated by checking the vdb timestamp against a stored copy of the
vdb timestamp from the time of the cache generation.

portage:
portage maintains a set of denormalized caches of the vdb. However, it
has to validate those caches on each access, meaning quite a few
stats. Same thing: it can compare the current vdb timestamp to the one
recorded when the cache was generated to identify whether the cache is
no longer authoritative.

pkgcore:
pkgcore maintains a denormalized old style virtuals cache. As w/
portage, it has to do validation (stat'ing) whenever it uses that
cache to ensure the data is accurate. Same thing: it can compare the
current vdb timestamp to the one recorded when the cache was generated
to identify whether the cache is no longer authoritative.

The existing vdb caching could all be modified to use this timestamp.
That means one stat in the best (common) case, instead of having to
either scan the whole vdb each time or do a subset of stats.

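To make the "one stat" check concrete, here is a minimal sketch in
Python. The JSON cache layout and the names vdb_stamp, load_cache and
store_cache are invented for illustration; none of the three managers
stores its caches this way as far as this mail is concerned. The check
uses inequality rather than "newer than", so a clock that jumped
backwards still invalidates the cache.

```python
import json
import os

# File name taken from the proposal; the cache format below is hypothetical.
STAMP_NAME = ".modification_time"


def vdb_stamp(vdb_root):
    """Return the stamp's mtime in nanoseconds, or None if it is missing."""
    try:
        return os.stat(os.path.join(vdb_root, STAMP_NAME)).st_mtime_ns
    except FileNotFoundError:
        return None


def load_cache(cache_path, vdb_root):
    """Return the cached data if it is still authoritative, else None."""
    current = vdb_stamp(vdb_root)  # the single stat in the common case
    if current is None:
        return None  # no stamp yet, so no cache can be trusted
    try:
        with open(cache_path) as f:
            cache = json.load(f)
    except (FileNotFoundError, ValueError):
        return None  # missing or corrupt cache
    if not isinstance(cache, dict) or "data" not in cache:
        return None
    if cache.get("vdb_stamp") != current:
        return None  # vdb changed since the cache was generated
    return cache["data"]


def store_cache(cache_path, stamp, data):
    """Write the cache along with the stamp read *before* scanning the vdb."""
    with open(cache_path, "w") as f:
        json.dump({"vdb_stamp": stamp, "data": data}, f)
```

On a None result the caller falls back to the full vdb scan: read the
stamp first, scan, then call store_cache with that earlier stamp, so a
modification that lands mid-scan invalidates the cache on next use.
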
This change enables further caching/denormalization of the vdb data
while maintaining the old format. Basically, it allows the manager to
build out a helluva lot faster access to the vdb while keeping on-disk
compatibility in /var/db/pkg.


Now unfortunately, since the vdb is not format versioned in any
fashion, to get this timestamp we have to do the following:

1) nudge everyone who has code poking into the vdb to update their
code to update the timestamp
2) sit on our hands for N months until such time as we've deemed
"everyone we care about has upgraded"
3) push out a new release, and start pushing out versions of the
managers/vdb consumers that use this timestamp instead of just
updating it.

For anyone who has been around gentoo for a couple of years, this is a
pretty familiar pattern: eapi, profile changes, etc. all go through
this, unfortunately.


That's the core of the proposal. There is a ticket open
( http://bugs.gentoo.org/290428 ) regarding this, although there is
some debate from ciaran which I'll now try to summarize, along w/ the
counterarguments.

1) do a new vdb.
Counter: this mechanism provides a way to synchronize the new vdb
while maintaining the old during its transition period, so this is
needed anyway. Further, pinning all of our optimization hopes on a
new vdb is daft: it's been discussed for 5+ years now and still
hasn't materialized (pkgcore has been able to have a new vdb for
several years, but without a synchronization mechanism it would
require locking users into the new format and locking out old
consumers of the vdb, an unfriendly choice to push on users, hence
never being implemented).

2) code that hasn't been updated to adjust the timestamp, but is still
in use after the transition period, will break things.
Counter: that's the nature of any modification of this sort; frankly,
the gains outweigh the costs of users being ridiculously out of date.
Not saying it's perfect, but until someone comes up with a proposal
that versions every PMS component (meaning PMS has to start
documenting the VDB), it's what we have if we wish to move forward in
refactoring.

3) the correct approach is to require users to tell each manager that
changes have occurred outside its purview (run paludis
--regenerate-installed-cache after every time you invoke pmerge or
emerge).
Counter: that's rather unfriendly to users, and isn't what
pkgcore/portage do. Further, it's historically the opposite of the
norm: consider the ebuild cache (we do validation as we go there,
instead of expecting users to do an emerge --regen every time they
modify an ebuild).


Those are roughly the three points raised; there is some minor
quibbling that mtime cannot be trusted, but that's mostly a variation
of #2. Feel free to dig into the bug for exact specifics, or wait for
ciaran's reply to this post.

So... thoughts?
~harring