On 2014-09-14, at 21:30:36,
Tim Harder <radhermit@g.o> wrote:

> On 2014-09-14 10:46, Michał Górny wrote:
> > On 2014-09-14, at 15:40:06,
> > Davide Pesavento <pesa@g.o> wrote:
> > > How long does the md5-cache regeneration process take? Are you sure it
> > > will be able to keep up with the rate of pushes to the repo during
> > > "peak hours"? If not, maybe we could use a time-based thing similar to
> > > the current cvs->rsync synchronization.
> >
> > This strongly depends on how much data is there to update. A few
> > ebuilds are quite fast, eclass change isn't ;). I was thinking of
> > something along the lines of, in pseudo-code speaking:
> >
> > systemctl restart cache-regen
> >
> > That is, we start the regen on every update. If it finishes in time, it
> > commits the new metadata. If another update occurs during regen, we
> > just restart it to let it catch the new data.
> >
> > Of course, if we can't spare the resources to do intermediate updates,
> > we may as well switch to cron-based update method.
>
> I don't see per push metadata regen working entirely well in this case
> if this is the only way we're generating the metadata cache for users to
> sync. It's easy to imagine a plausible situation where a widely used
> eclass change is made followed by commits less than a minute apart (or
> shorter than however long it would take for metadata regen to occur) for
> at least 30 minutes (rsync refresh period for most user-facing mirrors)
> during a time of high activity.

For a metadata recheck (that is, an egencache run with no changes):

a. cold cache ext4:

real    3m54.321s
user    0m44.413s
sys     0m13.497s

b. warm cache ext4:

real    0m40.672s
user    0m35.087s
sys     0m 4.687s

I will try to re-run that on btrfs or reiserfs to get more meaningful
numbers.
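
For reference, the recheck timed above is just a cache update run over an
unchanged tree; an invocation roughly like the one below is what I mean
(the repository name and job count are illustrative, not the exact command
line behind the numbers above):

    # re-validate metadata/md5-cache; entries whose cache is still valid
    # are only checked, not regenerated
    time egencache --update --repo gentoo --jobs 4
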
Now, those results back up your claims. However, if we can get that
to <10s, I doubt we would have a major issue. My idea works like this
(a rough sketch follows the list):

1. first update is pushed,
1a. egencache starts rechecking and updating cache,
2. second update is pushed,
2a. previous egencache is terminated,
2b. egencache starts rechecking and updating cache,
2c. egencache finishes in time and commits.
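
To make the terminate-and-restart step concrete, here is a rough sketch of
what I have in mind, assuming a systemd oneshot unit kicked off by a git
hook; the names (cache-regen.service, regen-and-publish.sh, the
post-receive hook) are placeholders, not an existing setup:

    #!/bin/sh
    # hooks/post-receive on the master repository: every push simply
    # (re)starts the regen service; if a regen is still running, systemd
    # terminates it and starts a fresh one against the new tree.
    exec systemctl restart cache-regen.service

and the unit it restarts would be little more than:

    # cache-regen.service (sketch)
    [Unit]
    Description=Regenerate gentoo md5-cache and publish it

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/regen-and-publish.sh
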
The point is, nothing gets committed to the user-reachable location
before egencache finishes. And it goes quasi-incrementally, so if
another update happens before egencache has finished, it only does
the 'slow' regen on changed metadata.
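
The script itself could be as small as the sketch below; the paths, the
repos.conf assumption and the rsync target are all made up for
illustration. The important property is that nothing after the egencache
line runs unless egencache finished successfully, so a regen that got
restarted mid-way publishes nothing:

    #!/bin/sh
    # regen-and-publish.sh (sketch); assumes repos.conf maps "gentoo"
    # to the staging checkout below
    set -e

    REPO=/var/cache/regen/gentoo            # staging checkout, not user-visible
    PUBLISH=/var/tmp/rsync/gentoo-portage   # tree the rsync mirrors pull from

    # catch up with whatever has just been pushed
    git -C "${REPO}" pull --ff-only

    # quasi-incremental: only stale md5-cache entries get regenerated
    egencache --update --repo gentoo --jobs 4

    # reached only once egencache has finished
    git -C "${REPO}" add -A metadata/md5-cache
    git -C "${REPO}" commit -q -m 'Update md5-cache' || true  # no-op if nothing changed
    rsync -a --delete --exclude=.git "${REPO}/" "${PUBLISH}/"
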
I will come back with more results soon.

--
Best regards,
Michał Górny