Gentoo Archives: gentoo-project

From: Alec Warner <antarus@g.o>
To: gentoo-project <gentoo-project@l.g.o>
Subject: Re: [gentoo-project] Call for agenda items - Council meeting 2021-04-11
Date: Fri, 09 Apr 2021 04:04:18
Message-Id: CAAr7Pr-BG5f9yv41y23mV3bzbDOe+N+YsSd5SbEdrFZ3MygJ+A@mail.gmail.com
In Reply to: Re: [gentoo-project] Call for agenda items - Council meeting 2021-04-11 by Joonas Niilola
On Thu, Apr 8, 2021 at 10:22 AM Joonas Niilola <juippis@g.o> wrote:
>
>
>
> On 4/8/21 7:37 PM, Alec Warner wrote:
> >
> > It's admittedly a grey area here. We use CDNs for various web
> > components (packages.gentoo.org for example) and we use a CDN for
> > distfiles.gentoo.org. Is GitHub simply a CDN for gentoo.git? It's an
> > open question we have discussed on the gentoo-nfp list.
> >
> > In general I'm not really sold on the benefits of git as a replication
> > protocol; while I dislike running a global rsync network, the
> > maintenance of the network is basically nil from infra's end and so I
> > don't feel significant pressure to move to git. Could you perhaps
> > articulate why you think it's important for clients to move to git?
> >
> > -A
> >
>

So I want to discuss two thrusts here. The first is that your argument
below is not very convincing (because it has no data that I can use to
verify anything you said). The second is the social contract: we are
basically saying "hey, we can't make git work, so we are going to use a
non-free solution", and is that OK? I happen to think it is not, but see
more below.

> It provides much better user experience with continuously faster
> sync-times. Also there's error posts frequently in the forums when using
> rsync, even recently.

I don't want to dig too deep here, but I'm looking again for more
engineering-focused material. If we are going to make a technical
argument, make a technical argument.
"It provides a much better user experience" and "it's faster" are not
arguments; or, they are insufficient to convince me of much. I did do
some experimentation.

For example, on my box:
  rsync takes about 5s for an up-to-date check.
  rsync takes about 1m for a daily-like sync (includes up-to-date
    check, incremental delivery, and GPG verification with WKD).
  rsync takes longer for a full sync, but I tend to use web-rsync for
    that task.
  GitHub seems to have taken about 4s for an empty sync (i.e. already
    up to date).
  GitHub seems to have taken about 6s for a small sync (e.g. daily).
  GitHub seems to have taken about 20s for a full sync.
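
A rough sketch of how timings like these could be collected in a
repeatable way. This is not the method used for the numbers above; the
helper name time_command is made up for illustration, and the placeholder
command should be replaced with whatever sync command is being measured
(for instance `emaint sync -r gentoo` on a Portage system).

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a broken sync is not silently counted as a fast one.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; substitute the
    # sync command under test.
    timings = time_command([sys.executable, "-c", "pass"])
    print(f"median {statistics.median(timings):.2f}s over {len(timings)} runs")
```

Reporting the median over several runs helps separate an up-to-date
check from a sync that actually transferred data, since only the first
run after an upstream change moves any content.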

My git-sync verification doesn't seem to be working (isn't it supposed
to check tip-of-tree for a signature? It errors out on me).
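
For reference, a minimal sketch of what checking the tip commit for a
signature can look like when done by hand, assuming the check amounts to
running `git verify-commit` on HEAD. This is an approximation and not
necessarily how Portage implements its own verification; the function
names are made up for illustration.

```python
import subprocess


def verify_command(repo_path, rev="HEAD"):
    """Build the git command that checks the GPG signature on one commit."""
    return ["git", "-C", repo_path, "verify-commit", rev]


def tip_is_signed(repo_path):
    """Return True if git reports a valid signature on the tip commit."""
    try:
        result = subprocess.run(verify_command(repo_path),
                                capture_output=True, text=True)
    except FileNotFoundError:
        # git itself is not installed
        return False
    # git exits non-zero when the signature is missing or invalid, or
    # when repo_path is not a git repository.
    return result.returncode == 0
```

A False result only says that verification did not succeed; the stderr
of the git command would be needed to tell a missing key from a missing
signature.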

I'm skeptical that people care deeply about how long syncing takes (60s
and 5s are both fast enough for me, but I sync once a day or less). I
agree prima facie that GitHub is likely more reliable than rsync for
most users, as it's a better-maintained product with a scalable
interface.

I also made a repo for anongit.g.o and it's very slow; probably 10-20x
slower than GitHub. We know we have a bunch of tuning to do on the
git-serving side but no staff to do it.


>
> The way I see it, utilizing Github, it can already be implemented
> world-wide (wrt rsync mirrors). And those disliking Github can still
> keep using rsync, or pick the Gentoo-infra hosted sync-repo (until it
> breaks). And regarding that, didn't you say you have a lot of money
> needing to be used? ;)

So two things here. One is that we should care about the social
contract, and I think bifurcating the core parts of Gentoo into free
and non-free is a non-trivial change. If we move people to git syncing
and GitHub makes some changes, or goes away, or whatever... it's not
like we have an equivalent free setup; our git hosting is literally not
up to the task of serving those users. I think this is the difference
between the core offering (e.g. "gentoo") and the non-free software in
the tree (the add-ons or additional functionality, or whatever).
Keeping that logic, we could keep rsync as the default sync method and
tell people to use GitHub if they want (syncing from GitHub works
just fine today). I think that is different from making GitHub the
default git sync provider. I think this is similar in concept to, say,
having only free software by default, which is a change we made
somewhat recently, iirc.

This is less true for our CDN stuff, where we could just turn off the
CDN and be fine[0].

>
> Now I don't know if it's actually *doable* already, or if this is one of
> those things nobody brought up yet. That's why I made the post. (infra,
> releng, handbook). If it *is* doable, I don't see why the defaults can't
> be updated to use git. At some point at least, once we figure out the
> global requirements.
>
> -- juippis
>

[0] This is an engineering argument: is the CDN a latency cache (make
things fast) or a capacity cache (make it so our origin servers do not
melt)? I assert the former. If it's the latter, we are in the same
situation as with GitHub; if the CDN is gone and our origin servers
melt, it means we failed.
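
The latency-versus-capacity distinction in [0] can be put as a small
back-of-the-envelope check. This is only a sketch: the function names
and all the numbers are made up for illustration and are not
measurements of Gentoo infrastructure.

```python
def origin_load(total_rps, cdn_hit_ratio):
    """Requests per second that still reach the origin behind a CDN."""
    if not 0.0 <= cdn_hit_ratio <= 1.0:
        raise ValueError("cdn_hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - cdn_hit_ratio)


def classify_cdn(total_rps, origin_capacity_rps):
    """Classify the CDN by whether the origin survives without it."""
    # With the CDN turned off, every request reaches the origin.
    load_without_cdn = origin_load(total_rps, cdn_hit_ratio=0.0)
    if load_without_cdn <= origin_capacity_rps:
        return "latency cache"
    return "capacity cache"


if __name__ == "__main__":
    # Hypothetical numbers, purely illustrative.
    print(classify_cdn(total_rps=500, origin_capacity_rps=800))
    print(classify_cdn(total_rps=500, origin_capacity_rps=200))
```

In the first case the origin could absorb all traffic on its own, so
the CDN only makes things faster; in the second the origin would be
overloaded without it, which is the situation the footnote describes as
a failure.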
