Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: Tim Harder <radhermit@g.o>
Cc: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it)
Date: Mon, 15 Sep 2014 07:23:02
Message-Id: 20140915091935.42dd26da@pomiot.lan
In Reply to: Re: [gentoo-dev] My masterplan for git migration (+ looking for infra to test it) by Tim Harder
On 2014-09-14, at 21:30:36
Tim Harder <radhermit@g.o> wrote:

> On 2014-09-14 10:46, Michał Górny wrote:
> > On 2014-09-14, at 15:40:06
> > Davide Pesavento <pesa@g.o> wrote:
> > > How long does the md5-cache regeneration process take? Are you sure it
> > > will be able to keep up with the rate of pushes to the repo during
> > > "peak hours"? If not, maybe we could use a time-based thing similar to
> > > the current cvs->rsync synchronization.
> >
> > This strongly depends on how much data is there to update. A few
> > ebuilds are quite fast, eclass change isn't ;). I was thinking of
> > something along the lines of, in pseudo-code speaking:
> >
> > systemctl restart cache-regen
> >
> > That is, we start the regen on every update. If it finishes in time, it
> > commits the new metadata. If another update occurs during regen, we
> > just restart it to let it catch the new data.
> >
> > Of course, if we can't spare the resources to do intermediate updates,
> > we may as well switch to cron-based update method.
>
> I don't see per push metadata regen working entirely well in this case
> if this is the only way we're generating the metadata cache for users to
> sync. It's easy to imagine a plausible situation where a widely used
> eclass change is made followed by commits less than a minute apart (or
> shorter than however long it would take for metadata regen to occur) for
> at least 30 minutes (rsync refresh period for most user-facing mirrors)
> during a time of high activity.
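
Tim's scenario can be made concrete with a toy model of restart-on-every-push. The model assumes a fixed regen duration and regularly spaced pushes; `commit_times` and the numbers used with it are illustrative, not measurements. A run commits only if the next push arrives after it has finished, so once pushes come more often than a regen takes, nothing is committed until the burst ends.

```python
# Toy model of "restart the regen on every push" (illustrative numbers only).


def commit_times(push_times, regen_seconds):
    """Return the times at which a regen run completes and commits.

    Each push (re)starts the regen; a run commits only if no further
    push arrives before it has been running for regen_seconds.
    """
    pushes = sorted(push_times)
    commits = []
    for i, start in enumerate(pushes):
        finish = start + regen_seconds
        next_push = pushes[i + 1] if i + 1 < len(pushes) else None
        # The run survives only if the next push comes no earlier than its finish.
        if next_push is None or next_push >= finish:
            commits.append(finish)
    return commits


if __name__ == "__main__":
    # A push every 30 s for 30 minutes (60 pushes).
    busy = [t * 30 for t in range(60)]
    # Regen of ~41 s (roughly the warm-cache recheck time): starved until the burst ends.
    print(commit_times(busy, 41))
    # Regen of 10 s: every push gets its own commit.
    print(commit_times(busy, 10))
```

With pushes every 30 s and a 41 s regen, the only commit lands at 1811 s, after the last push; with a 10 s regen, all 60 pushes are committed.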

For a metadata recheck (that is, an egencache run with no changes):

a. cold cache, ext4:

real    3m54.321s
user    0m44.413s
sys     0m13.497s

b. warm cache, ext4:

real    0m40.672s
user    0m35.087s
sys     0m4.687s

I will try to re-run that on btrfs or reiserfs to get more meaningful
numbers.
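
A quick way to read the two timings is to subtract CPU time from wall-clock time. The sketch below parses bash `time`-style durations and computes real - user - sys. Treating that remainder as time spent waiting (presumably mostly on disk I/O in the cold-cache case) is an interpretation, and it only holds for a run that does not use several CPUs in parallel; the helper names are hypothetical.

```python
import re

# Matches bash `time` durations like "3m54.321s" (tolerates "0m 4.687s").
_DURATION = re.compile(r"^(\d+)m\s*([\d.]+)s$")


def parse_duration(text):
    """Convert a bash `time` duration string into seconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def off_cpu_seconds(real, user, system):
    """Wall-clock time not accounted for by user + sys CPU time."""
    return parse_duration(real) - parse_duration(user) - parse_duration(system)


if __name__ == "__main__":
    cold = off_cpu_seconds("3m54.321s", "0m44.413s", "0m13.497s")
    warm = off_cpu_seconds("0m40.672s", "0m35.087s", "0m4.687s")
    print(f"cold cache: {cold:.1f} s off-CPU")
    print(f"warm cache: {warm:.1f} s off-CPU")
```

For the figures above this gives about 176.4 s off-CPU for the cold cache and about 0.9 s for the warm cache, which suggests the cold run is dominated by waiting while the warm run is almost entirely CPU-bound.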

Now, these results back up your claims. However, if we can get that
below 10 s, I doubt we would have a major issue. My idea works like this:

1. the first update is pushed,
1a. egencache starts rechecking and updating the cache,
2. a second update is pushed,
2a. the previous egencache is terminated,
2b. egencache starts rechecking and updating the cache,
2c. egencache finishes in time and commits.
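
Steps 1 to 2c amount to cancel-and-restart with commit-on-completion. Below is a minimal threaded sketch of that control flow, assuming the regen can be cancelled cooperatively. `RegenRunner`, `regen` and `publish` are hypothetical stand-ins for egencache and for whatever copies the finished cache to the user-reachable location; they are not existing infra code.

```python
import threading


class RegenRunner:
    """Cancel-and-restart regen: every push supersedes the run in progress.

    regen(revision, cancel) is a hypothetical stand-in for egencache; it should
    poll `cancel` and return None if asked to stop, else the new metadata.
    publish(revision, metadata) is a hypothetical stand-in for committing the
    cache to the user-reachable location.
    """

    def __init__(self, regen, publish):
        self._regen = regen
        self._publish = publish
        self._lock = threading.Lock()
        self._cancel = None  # cancellation flag of the newest run

    def on_push(self, revision):
        """Handle a push: terminate the previous run (2a), start a new one (2b)."""
        with self._lock:
            if self._cancel is not None:
                self._cancel.set()
            cancel = threading.Event()
            self._cancel = cancel
            worker = threading.Thread(target=self._run, args=(revision, cancel))
            worker.start()
        return worker

    def _run(self, revision, cancel):
        metadata = self._regen(revision, cancel)
        with self._lock:
            # 2c: commit only if the run completed and was not superseded.
            if metadata is not None and not cancel.is_set():
                self._publish(revision, metadata)


if __name__ == "__main__":
    published = []

    def fake_regen(revision, cancel):
        # Pretend the regen takes 0.2 s; bail out early if cancelled.
        return None if cancel.wait(0.2) else f"metadata@{revision}"

    runner = RegenRunner(fake_regen, lambda rev, md: published.append((rev, md)))
    first = runner.on_push("r1")
    second = runner.on_push("r2")  # arrives while r1 is still regenerating
    first.join()
    second.join()
    print(published)  # only r2's metadata is published
```

The cancellation check and the publish happen under the same lock that `on_push` holds while cancelling, so a run superseded just as it finishes can never publish stale metadata.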

The point is that nothing gets committed to the user-reachable location
before egencache finishes. And it works quasi-incrementally: if another
update happens before egencache has finished, the next run only does
the 'slow' regen on metadata that actually changed.
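
The quasi-incremental behaviour relies on being able to tell which cache entries are still valid. A minimal sketch of such a check follows, loosely modeled on the `_md5_` and `_eclasses_` fields that md5-cache entries record: an entry is stale if the ebuild's checksum, or the checksum of any eclass it inherits, has changed. The in-memory dict layout and the `stale_entries` helper are assumptions for illustration, not Portage's implementation.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def stale_entries(ebuilds, eclasses, cache):
    """Return the sorted list of ebuilds whose cached metadata must be regenerated.

    ebuilds:  {cpv: ebuild file contents as bytes}
    eclasses: {eclass name: eclass file contents as bytes}
    cache:    {cpv: {"_md5_": ebuild md5, "_eclasses_": {eclass name: md5}}}
    (hypothetical in-memory layout, for illustration only)
    """
    current_eclass_md5 = {name: md5_hex(body) for name, body in eclasses.items()}
    stale = []
    for cpv, body in sorted(ebuilds.items()):
        entry = cache.get(cpv)
        if entry is None or entry["_md5_"] != md5_hex(body):
            stale.append(cpv)  # new or modified ebuild
        elif any(current_eclass_md5.get(name) != digest
                 for name, digest in entry["_eclasses_"].items()):
            stale.append(cpv)  # an inherited eclass changed or was removed
    return stale


if __name__ == "__main__":
    eclasses = {"foo": b"eclass v1"}
    ebuilds = {"app-misc/a-1": b"inherit foo", "app-misc/b-1": b"EAPI=5"}
    cache = {
        "app-misc/a-1": {"_md5_": md5_hex(b"inherit foo"),
                         "_eclasses_": {"foo": md5_hex(b"eclass v1")}},
        "app-misc/b-1": {"_md5_": md5_hex(b"EAPI=5"), "_eclasses_": {}},
    }
    print(stale_entries(ebuilds, eclasses, cache))               # nothing changed
    print(stale_entries(ebuilds, {"foo": b"eclass v2"}, cache))  # eclass changed
```

This also illustrates why a widely used eclass change is slow: every ebuild inheriting it fails the check and must be regenerated, whereas an ebuild-only change marks just that one entry.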

I will come back with more results soon.

--
Best regards,
Michał Górny
