Gentoo Archives: gentoo-dev

From: Alec Warner <antarus@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Using emerge-webrsync to simplify the handbook
Date: Fri, 30 Nov 2012 17:31:27
Message-Id: CAAr7Pr-6TZ9Zo46pqf9hP7gbg+us11Q4t2hLqRPGkbJFikQ0Ew@mail.gmail.com
In Reply to: Re: [gentoo-dev] Using emerge-webrsync to simplify the handbook by Ian Stakenvicius
1 On Fri, Nov 30, 2012 at 8:59 AM, Ian Stakenvicius <axs@g.o> wrote:
2 > -----BEGIN PGP SIGNED MESSAGE-----
3 > Hash: SHA256
4 >
5 > On 30/11/12 11:15 AM, Michael Mol wrote:
6 >> On Fri, Nov 30, 2012 at 10:57 AM, Richard Yao <ryao@g.o>
7 >> wrote:
8 >>> On 11/28/2012 11:08 AM, Matthew Thode wrote:
9 >>>> On 11/28/2012 09:05 AM, Richard Yao wrote:
10 >>>>> On 11/28/2012 09:17 AM, Maxim Kammerer wrote:
11 >>>>>> On Wed, Nov 28, 2012 at 3:54 PM, Richard Yao
12 >>>>>> <ryao@g.o> wrote:
13 >>>>>>> We could slightly simplify the handbook installation
14 >>>>>>> procedure if we told people to use emerge-webrsync to
15 >>>>>>> fetch the initial snapshot.
16 >>>>>>
17 >>>>>> Using emerge-webrsync also makes the installation process
18 >>>>>> more robust, since it only requires HTTP access (whereas
19 >>>>>> many firewalls restrict RSYNC). Besides, emerge-webrsync
20 >>>>>> can check PGP signatures, so I think that it should be the
21 >>>>>> primary recommended portage tree synchronization method.
22 >>>>>>
23 >>>>>
24 >>>>> The only downside of which I am aware is increased network
25 >>>>> traffic. However, we could redesign emerge-webrsync to take
26 >>>>> advantage of GNU Tar's incremental archive functionality.
27 >>>>>
28 >>>>> That would permit us to mirror compressed diffs in addition
29 >>>>> to regular portage snapshots. Doing that well could reduce
30 >>>>> bandwidth requirements.
31 >>>>>
32 >>>> weekly fulls and daily diffs?
33 >>>>
34 >>>
35 >>> Determining what is right here probably requires calculus, but
36 >>> this scheme does not seem like a bad choice to me. My main
37 >>> concern is that maintaining weekly full snapshots would require
38 >>> too much space for the mirrors. It might be better to go monthly,
39 >>> with diffs on the following intervals:
40 >>>
41 >>> 1 week 1 day 30 minutes
42 >>>
43 >>> Doing that would eliminate the benefit of rsync entirely, with
44 >>> the caveat that we now need to mirror a ton of diffs. This would
45 >>> make it easy for us to provide the ability to obtain historical
46 >>> snapshots, which would be nice.
47 >>
48 >> Worth noting that all this moves us nicely in the direction of
49 >> allowing HTTP proxies to cache data, reducing load on mirrors. And
50 >> moves us in the direction of implementing mirrors themselves as a
51 >> network of caching proxies.
52 >>
53 >
54 > Idea makes sense, I wonder if implementation would be better served by
55 > leveraging the fact that we already produce daily full snapshots:
56 >
57 > 1 - continue to provide the daily snapshots we do now
58 > 2 - provide two weeks (more?) of daily diffs, such that a daily
59 > snapshot from up to two weeks ago can be updated to present day
60 > 3 - provide hourly or 30-minute update diffs to get latest changes.
61 >
62 > If the tree is more than two weeks old, emerge-webrsync would just
63 > grab the latest daily plus the hourly diffs.
64 >
65 > If the tree is less than two weeks old, grab the daily diffs and
66 > hourly diffs. The local copy of the tree itself would need to be
67 > rolled back to the best-available daily diff before these diff updates
68 > could be applied; this may mean that a local cache of the latest
69 > full-day snapshot needs to be kept and/or generated. Also if said
70 > cache doesn't exist, then the whole full-day snapshot would be grabbed.
71 >
72 > The advantage to this would be significantly fewer distfiles, although
73 > the logic in emerge-webrsync would possibly be more complex.
74 >
75 > Regarding rolling back the local tree to a known-good state, I think
76 > that would be required regardless of the method as any local changes
77 > made to the tree by users would need to be discarded, right?
78
79 How about we not change the docs until someone eagerly implements all
80 the stuff you just said?
81
82 Note that from an infra POV our existing system works fairly well and
83 requires no day-to-day tinkering.
84 I'm always happy to consider new options, but work needs to be put in to
85 make it feasible. I'm sure if we switched to http with zsync or
86 something, we could make it feasible. I want to see a prototype, etc.
87
88 -A
89
90 >
91 > -----BEGIN PGP SIGNATURE-----
92 > Version: GnuPG v2.0.19 (GNU/Linux)
93 >
94 > iF4EAREIAAYFAlC45f0ACgkQ2ugaI38ACPChKgD9GOBptQ9jJ1/eYyq1NEl5Oq1E
95 > dVy9UOab80bG5FZB9LwBAKwsifnT+iE3n/4d/ljnuT2qCnbtXNYr7yBjF/VcEpkq
96 > =y9eB
97 > -----END PGP SIGNATURE-----
98 >
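
The selection logic in Ian's quoted proposal (daily full snapshots, about two weeks of daily diffs, then hourly or 30-minute diffs) can be sketched roughly as follows. This is only an illustration of the decision described in the thread, not anything emerge-webrsync implements: the function name plan_update, the step labels, and the have_cached_snapshot flag are all hypothetical, and the 14-day window is taken from the "two weeks (more?)" figure in the proposal.

```python
from datetime import date, timedelta

# Assumption: daily diffs are kept for two weeks, per the quoted proposal.
DAILY_DIFF_WINDOW = timedelta(days=14)


def plan_update(local_snapshot, today, have_cached_snapshot=True):
    """Return an ordered list of (kind, date) fetch steps.

    local_snapshot: date of the daily snapshot the local tree is based on,
                    or None if there is no local tree yet.
    have_cached_snapshot: whether a pristine copy of that daily snapshot is
                    available to roll back to before applying diffs.
    All names and step labels here are hypothetical.
    """
    steps = []
    too_old = (
        local_snapshot is None
        or today - local_snapshot > DAILY_DIFF_WINDOW
    )
    if too_old or not have_cached_snapshot:
        # Outside the diff window, or nothing to roll back to:
        # fetch the latest full daily snapshot.
        steps.append(("full", today))
    else:
        # Inside the window: apply one daily diff per missing day.
        day = local_snapshot
        while day < today:
            day += timedelta(days=1)
            steps.append(("daily-diff", day))
    # In both cases, finish with the sub-daily diffs for the current day.
    steps.append(("subdaily-diffs", today))
    return steps


if __name__ == "__main__":
    for step in plan_update(date(2012, 11, 28), date(2012, 11, 30)):
        print(step)
```

The sketch leaves out the hard parts raised in the thread: verifying signatures, discarding local modifications, and the mirror-side cost of generating and hosting the diffs.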