Gentoo Archives: gentoo-dev

From: Patrick Lauer <patrick@g.o>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] [RFC] Overlays and Metadata Cache
Date: Sat, 20 Jun 2009 16:46:42
Message-Id: 200906201846.33981.patrick@gentoo.org
1 Hello everybody,
2
3 those of us using overlays might have noticed that they can seriously slow
4 down dependency calculation. This is mostly because of the lack of a metadata
5 cache.
6 For overlay maintainers providing a metadata cache is quite tricky because to
7 be really consistent and useful it'd have to be regenerated after every
8 commit. That's quite easy to forget or get wrong.
9
10 So I sat down, brained some thoughts and played around a bit. Here's what I
11 came up with:
12
13 * server-side each overlay is checked out
14 * for every overlay in our list:
15   - we add it to make.conf explicitly (avoids any spillover effects)
16   - we let egencache generate a metadata cache for that repository
17 * we rsync the repositories with metadata to a different directory
18
19 The last step is just there to get rid of all the "unneeded" data like .svn
20 directories and can be used to selectively exclude other data that is in the
21 repo but not needed for end-users. Plus it reduces inconsistent data when a
22 client copies the data while the metadata cache is being generated.
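
The server-side steps above could be sketched roughly as follows. This is only an illustration, not the actual setup: the directory layout, the exclude list and the helper names are assumptions, and the egencache options (--update, --repo, --jobs) should be checked against the man page of the installed portage version. The functions only build the command lines; running them (for example with subprocess.run) is left out.

```python
from pathlib import PurePosixPath


def egencache_cmd(repo_name, jobs=2):
    """Build an egencache command line that updates one repository's cache.

    Assumes the overlay is already configured (e.g. listed in make.conf)
    under the repository name repo_name.
    """
    return ["egencache", "--update", f"--repo={repo_name}", f"--jobs={jobs}"]


def export_cmd(checkout, export_root, excludes=(".svn", ".git", "CVS")):
    """Build an rsync command that copies a checkout to the export
    directory, leaving out VCS bookkeeping directories."""
    checkout = PurePosixPath(checkout)
    src = f"{checkout}/"                                   # trailing slash: copy contents
    dst = f"{PurePosixPath(export_root) / checkout.name}/"
    cmd = ["rsync", "-a", "--delete"]
    cmd += [f"--exclude={name}" for name in excludes]
    return cmd + [src, dst]


# Hypothetical paths and overlay name, for illustration only.
print(egencache_cmd("sunrise"))
print(export_cmd("/srv/checkout/sunrise", "/srv/export"))
```

Exporting to a separate directory only after egencache has finished is what keeps clients from seeing a half-written cache.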
23
24 egencache creates the per-repository cache in metadata/cache, so it is nicely
25 bundled and won't interfere with anything else.
26
27 So now we have all repositories, with metadata, in one place. We can start an
28 rsync daemon sharing the parent directory. For users this makes things easier
29 - instead of needing cvs, svn, git, darcs, hg, etc. they only need rsync
30 (which they already have installed!)
31
32 Layman gets easier too - it just needs to understand the rsync protocol and
33 select the right directory(s).
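
On the client side, fetching one overlay from such a server would come down to a single rsync call. A minimal sketch, assuming a hypothetical host (overlays.example.org), a hypothetical rsync module name (overlays) and a hypothetical local target directory; none of these exist in the proposal itself:

```python
def client_sync_cmd(overlay, dest_root,
                    host="overlays.example.org", module="overlays"):
    """Build the rsync command a client (or layman) could run to fetch
    one overlay, metadata cache included, from the shared parent directory."""
    src = f"rsync://{host}/{module}/{overlay}/"
    dst = f"{dest_root.rstrip('/')}/{overlay}/"
    return ["rsync", "-a", "--delete", src, dst]


# Hypothetical overlay name and destination.
print(client_sync_cmd("sunrise", "/var/lib/layman"))
```

Selecting "the right directory(s)" then just means calling this once per overlay the user has enabled.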
34
35 The only issue I have found with this idea relates to eclasses - overriding
36 in-tree eclasses to be precise. The problem there is that it invalidates in-
37 tree metadata and potentially affects other overlays too. So that's a bit of a
38 bummer, but then I wonder how common that case is.
39
40 For performance, the difference is noticeable. As a very rough pointer it
41 takes me ~15 minutes for "emerge -puNDv world" with three overlays and no
42 metadata cache and about 75 seconds with metadata cache. That's of course a
43 "worst case" scenario.
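
To put the two timings above side by side (both are rough figures from a single machine, so the ratio is only indicative):

```python
# Rough timings quoted above for "emerge -puNDv world" with three overlays.
without_cache_s = 15 * 60   # ~15 minutes without a metadata cache
with_cache_s = 75           # ~75 seconds with a metadata cache

speedup = without_cache_s / with_cache_s
print(f"roughly {speedup:.0f}x faster with the cache")
```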
44
45 Generating the metadata cache isn't that expensive - it took about 45 minutes
46 to initially check out almost everything layman provided and then about an
47 hour for the first run. Consecutive runs should be much faster and can be run
48 in parallel per overlay (at least in theory). So unless I missed something
49 really big and really obvious, it should be "small enough" to be run every hour or
50 even faster.
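
Running the per-overlay updates in parallel could look roughly like this. It is a sketch under the assumption that updates of different overlays really are independent, which the text above only claims "in theory"; update_one is a hypothetical callable that would wrap the egencache and rsync invocations for a single overlay.

```python
from concurrent.futures import ThreadPoolExecutor


def update_all(overlays, update_one, max_workers=4):
    """Call update_one(name) for every overlay concurrently and
    return a dict mapping overlay name to that call's result."""
    overlays = list(overlays)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input names.
        results = pool.map(update_one, overlays)
        return dict(zip(overlays, results))


# Stand-in worker for illustration; a real one would launch subprocesses.
print(update_all(["sunrise", "kde-testing"], lambda name: f"{name}: updated"))
```

Threads are enough here because the real work would happen in child processes; max_workers caps how many overlays are processed at once.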
51
52 Advantages are:
53 - fewer deps for layman (if it is adapted)
54 - less complexity client-side
55 - faster sync performance - especially svn and git transfer way too much, the
56 initial checkout of one overlay was >35M data for a few dozen ebuilds
57 - less load server-side. Rsync is easy to replicate and relatively cheap.
58 Popular overlays will appreciate the reduced traffic :)
59 - faster dependency calculation
60 and a few I have already forgotten.
61
62 Disadvantages are:
63 - syncing the main tree can invalidate most of the metadata cache (changed
64 eclasses etc), so you need to sync the overlays at the same time
65 - the eclass override situation I mentioned earlier
66 - slower update time (right now users can checkout immediately after a commit,
67 with this indirection it'd be 30min+ delay)
68
69 If I don't get distracted I might set up a proof of concept public rsync
70 server providing the main repo plus all overlays I can throw in, but it'd have
71 a low initial update frequency (6h to daily).
72
73 Your thoughts, opinions and other input are appreciated.
74
75 Patrick

Replies

Subject Author
Re: [gentoo-dev] [RFC] Overlays and Metadata Cache Fabian Groffen <grobian@g.o>
Re: [gentoo-dev] [RFC] Overlays and Metadata Cache Zac Medico <zmedico@g.o>
Re: [gentoo-dev] [RFC] Overlays and Metadata Cache Ciaran McCreesh <ciaran.mccreesh@××××××××××.com>