Gentoo Archives: gentoo-dev

From: Brian Harring <ferringb@×××××.com>
To: gentoo-dev@l.g.o
Cc: zmedico@g.o, solar@g.o, ciaran.mccreesh@××××××××××.com, fuzzyray@g.o
Subject: [gentoo-dev] adding a modification timestamp to the installed pkgs database (vdb)
Date: Mon, 26 Oct 2009 01:50:10
First of all, feel free to forward this to anyone who is responsible 
for code pkged in the tree that accesses the vdb (/var/db/pkg) in some 
fashion.

The proposal is pretty simple; if code modifies the vdb in any 
fashion, it needs to update the mtime on a file named 
'.modification_time' in the root of the vdb.

For example-

1) ${PACKAGE_MANAGER} fires up, builds a pkg.  it's now ready to 
install it.
2) this step isn't strictly required, but is a zero cost safety 
measure- prior to modifying the vdb, it updates the timestamp.  The 
reason for doing this is to protect against the manager blowing up in 
some fashion and not updating the timestamp- there still is a window 
if the manager breaks down during merging but it's far reduced.
3) manager does its thing to the livefs, and to the vdb.
4) once finished, again, updates the timestamp.
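
As a rough sketch of steps 2-4 in Python: the helper names and the 
vdb_root parameter below are made up for illustration, and no manager 
is claimed to implement it this way.  Only the '.modification_time' 
file name comes from the proposal.

```python
import os
from contextlib import contextmanager

TIMESTAMP_NAME = ".modification_time"  # file name from the proposal


def touch_vdb_timestamp(vdb_root):
    """Set the mtime of <vdb_root>/.modification_time to now (hypothetical helper)."""
    path = os.path.join(vdb_root, TIMESTAMP_NAME)
    # Append mode creates the file if missing and never truncates it.
    with open(path, "a"):
        pass
    os.utime(path, None)  # None means "use the current time"
    return path


@contextmanager
def vdb_modification(vdb_root):
    """Hypothetical wrapper: touch before (step 2) and after (step 4) a vdb change."""
    touch_vdb_timestamp(vdb_root)
    try:
        yield
    finally:
        # Runs even if the merge raised, since the vdb may be half-modified.
        touch_vdb_timestamp(vdb_root)
```

A manager would then wrap step 3 in `with vdb_modification("/var/db/pkg"):` 
so the second update happens even when the merge fails partway.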

This isn't an incredibly complex change.  What it enables however is 
package managers to get serious about optimizing access to the vdb.  
For example for the 3 managers:

 paludis' installed-cache currently needs to be run manually by the user; 
specifically, the user is responsible for regenerating this cache if 
they use a non paludis manager to modify the VDB.  This can be 
automated via checking the vdb timestamp against a stored copy of the 
vdb timestamp at the time of the cache generation.

 portage maintains a set of denormalized caches of the vdb- it however 
has to do validation of those caches on each access, meaning quite a 
few stats.  Same thing, can compare timestamp from current vdb to when 
it was generated to identify if it is no longer authoritative.

 pkgcore maintains a denormalized old style virtuals cache- same thing 
w/ portage, it has to do validation (stat'ing) whenever it uses that 
cache to ensure the data is accurate.  Same thing, can compare 
timestamp from current vdb to when it was generated to identify if it 
is no longer authoritative.

The existing vdb caching could all be modified to use this timestamp.  
One stat in the best (common) case, instead of having to either scan 
the whole vdb each time or do a subset of stats. 
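
A minimal sketch of that one-stat check, in Python.  The function 
names and the idea of storing the mtime as integer nanoseconds next to 
the cache are assumptions for illustration, not how any of the three 
managers does it.

```python
import os

TIMESTAMP_NAME = ".modification_time"  # file name from the proposal


def vdb_mtime_ns(vdb_root):
    """Return the timestamp file's mtime in nanoseconds, or None if it is missing."""
    try:
        return os.stat(os.path.join(vdb_root, TIMESTAMP_NAME)).st_mtime_ns
    except FileNotFoundError:
        return None


def cache_is_current(vdb_root, recorded_mtime_ns):
    """One stat: trust the cache only if the timestamp is unchanged.

    recorded_mtime_ns is the value vdb_mtime_ns() returned when the
    cache was generated (hypothetical storage, chosen by the caller).
    """
    current = vdb_mtime_ns(vdb_root)
    return current is not None and current == recorded_mtime_ns
```

The comparison is for equality rather than "newer than", so a clock 
that jumps backwards still invalidates the cache.  A missing timestamp 
file is treated as "not current", which falls back to the existing 
full validation.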

This change enables further caching/denormalization of the vdb data 
while maintaining the old format- basically, it allows the manager to 
build out a helluva lot faster access to the vdb while keeping on 
disk compatibility in /var/db/pkg.

Now unfortunately since the vdb is not format versioned in any 
fashion, to get this timestamp we have to do the following-

1) nudge everyone who has code poking into the vdb to update their 
code to update the timestamp
2) sit on our hands for N months until such time we've deemed 
"everyone we care about has upgraded"
3) push out a new release, and start pushing out versions of the 
managers/vdb consumers that use this timestamp instead of just 
updating it.

For anyone who has been around gentoo for a couple of years, this is a 
pretty familiar pattern- eapi, profile changes, etc, all go through 
this unfortunately.

That's the core of the proposal; there is a ticket open 
( ) regarding this although there is 
some debate from ciaran which I'll try to now summarize, along w/ the 
counterarguments-

1) do a new vdb.
Counter: this mechanism provides a way to synchronize the new vdb 
while maintaining the old during its transition period, so this is 
needed anyways.  Further, pinning all of our optimization hopes on a 
new vdb is daft- it's been discussed for 5+ years now and still 
hasn't materialized (pkgcore has been able to have a new vdb for 
several years, but without a synchronization mechanism it would 
require locking users into the new format and locking out old 
consumers of the vdb- an unfriendly choice to push on users, hence 
never being implemented).

2) code that hasn't been updated to adjust the timestamp, but is still 
in use after the transition period will break things.
 Counter: that's the nature of any modification of this sort; frankly the 
gains outweigh the costs of users being ridiculously out of date.  Not 
saying it's perfect, but until someone comes up with a proposal that 
versions every PMS component (meaning PMS has to start documenting 
the VDB), it's what we have if we wish to move forward in 

3) the correct approach is to require users to tell each manager that 
changes have occurred outside its purview (run paludis 
--regenerate-installed-cache after every time you invoke pmerge or 
 Counter: that's rather unfriendly to users, and isn't what 
pkgcore/portage do.  Further, it's historically the opposite of the 
norm- consider the ebuild cache (we do validation as we go there, 
instead of expecting users to do an emerge --regen every time they 
modify an ebuild).

That's roughly the three points raised; there is some minor quibbling 
that mtime cannot be trusted, but that's mostly a variation of #2.  
Feel free to dig into the bug for exact specifics, or wait for 
ciaran's reply to this post.

So... thoughts?