Gentoo Archives: gentoo-project

From: Alec Warner <antarus@g.o>
To: gentoo-project <gentoo-project@l.g.o>
Subject: Re: [gentoo-project] Call for agenda items - Council meeting 2021-04-11
Date: Fri, 09 Apr 2021 04:20:09
Message-Id: CAAr7Pr87GsrR-sC7vPjNSK0diCUoxUAKt6_724u-Ym+ErQFkAA@mail.gmail.com
In Reply to: Re: [gentoo-project] Call for agenda items - Council meeting 2021-04-11 by Alec Warner
1 On Thu, Apr 8, 2021 at 9:03 PM Alec Warner <antarus@g.o> wrote:
2 >
3 > On Thu, Apr 8, 2021 at 10:22 AM Joonas Niilola <juippis@g.o> wrote:
4 > >
5 > >
6 > >
7 > > On 4/8/21 7:37 PM, Alec Warner wrote:
8 > > >
9 > > > It's admittedly a grey area here. We use CDNs for various web
10 > > > components (packages.gentoo.org for example) and we use a CDN for
11 > > > distfiles.gentoo.org. Is Github simply a CDN for gentoo.git? It's an
12 > > > open question we have discussed on the gentoo-nfp list.
13 > > >
14 > > > In general I'm not really sold on the benefits of git as a replication
15 > > > protocol; while I dislike running a global rsync network the
16 > > > maintenance of the network is basically nil from infra's end and so I
17 > > > don't feel significant pressure to move to git. Could you perhaps
18 > > > articulate why you think it's important for clients to move to git?
19 > > >
20 > > > -A
21 > > >
22 > >
23 >
24 > So I want to discuss two thrusts here then. One is that your argument
25 > below is not very convincing (because it has no data that I can use to
26 > verify anything you said) and the second is just the social contract;
27 > like we are basically saying "hey we can't make git work, so we are
28 > going to use a non-free solution" and is that OK; I happen to think it
29 > is not, but see more below.
30 >
31 > > It provides a much better user experience with continuously faster
32 > > sync times. Also, error posts appear frequently in the forums when using
33 > > rsync, even recently.
34 >
35 > I don't want to dig too deep here, but I'm looking again for more
36 > engineering-focused stuff. If we are going to make a technical
37 > argument, make a technical argument.
38 > "it provides a much better user experience" and "it's faster" are not
39 > arguments; or, they are insufficient to convince me of much. I did do
40 > some experimentation.
41 >
42 > For example on my box:
43 > rsync takes about 5s for an up-to-date-check.
44 > rsync takes about 1m for a daily-like sync (includes up-to-date
45 > check, incremental delivery, and GPG verification with WKD.)
46 > rsync takes longer for a full sync, but I tend to use web-rsync for that task.
47 > Github seems to have taken about 4s for an empty sync (e.g. up to date)
48 > Github seems to have taken about 6s for a small sync (e.g. daily)
49 > Github seems to have taken 20s for a full sync.
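The timings above are single runs on one machine. A minimal sketch of how such numbers could be collected repeatably is shown here; `time_command` is a hypothetical helper, and the default command is a harmless placeholder, not the actual sync command used in the message. Pass whatever sync command you want to measure as arguments.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a broken sync is never recorded as a fast one.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Assumption: the command to time is given on the command line.
    # With no arguments, a no-op Python process is timed as a placeholder.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    runs = time_command(argv)
    print(f"min {min(runs):.2f}s, max {max(runs):.2f}s over {len(runs)} runs")
```

Reporting the minimum and maximum over a few runs gives a rough sense of how much of a difference is noise from the network or the mirror.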
50 >
51 > My git-sync verification doesn't seem to be working (isn't it supposed
52 > to check tip-of-tree for a sig? It errors out on me.)
53 >
54 > I'm skeptical people care deeply about how long syncing takes (60s and
55 > 5s are both fast enough for me; but I sync once a day or less.) I
56 > agree prima facie that Github is likely more reliable than rsync for
57 > most users as it's a better maintained product with a scalable
58 > interface.
59 >
60 > I also made a repo for anongit.g.o and it's very slow; probably 10-20x
61 > slower than github. We know we have a bunch of tuning to do on the
62 > git-serving side but no staff to do it.
63
64 Ah I had a bug here; anongit is:
65 4s for empty update.
66 5m for a sync from empty (i.e. a full sync).
67 I have not done an incremental (because I just fixed my bug.)
68
69 So I think if we can make it scale it might reach an rsync level of
70 performance, but we need to do the necessary tuning and buildout.
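As a rough check on the "10-20x slower" estimate, the figures quoted in this thread can be compared directly. This is only arithmetic on the numbers given above. It assumes the 5m anongit figure is a sync from scratch, comparable to the 20s full sync from Github; the dictionary layout and the `slowdown` helper are illustrative names.

```python
# Timings quoted in the message, in seconds.
timings = {
    "rsync": {"up_to_date": 5, "daily": 60},
    "github": {"up_to_date": 4, "daily": 6, "full": 20},
    # Assumption: the 5m anongit figure is a sync from scratch (full sync).
    "anongit": {"up_to_date": 4, "full": 5 * 60},
}


def slowdown(slow, fast, kind):
    """Return how many times longer `slow` takes than `fast` for one sync kind."""
    return timings[slow][kind] / timings[fast][kind]


print(slowdown("anongit", "github", "full"))        # 15.0
print(slowdown("rsync", "github", "daily"))         # 10.0
print(slowdown("anongit", "github", "up_to_date"))  # 1.0
```

Under that assumption the full-sync ratio falls inside the 10-20x range, while the up-to-date check is the same on both.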
71
72 -A
73
74 >
75 >
76 > >
77 > > The way I see it, utilizing Github, it can already be implemented
78 > > world-wide (wrt rsync mirrors). And those disliking Github can still
79 > > keep using rsync, or pick the Gentoo-infra hosted sync-repo (until it
80 > > breaks). And regarding that, didn't you say you have a lot of money
81 > > needing to be used? ;)
82 >
83 > So two things here. One is that we should care about the social
84 > contract and I think bifurcating the core parts of gentoo into free
85 > and non-free is a non-trivial change. If we move people to git syncing
86 > and github makes some changes, or goes away, or whatever...it's not
87 > like we have an equivalent free setup; our git hosting is literally not
88 > up to the task of serving those users. I think this is the difference
89 > between the core offering (e.g. "gentoo") and the non-free software in
90 > the tree (the add-ons or additional functionality, or whatever.)
91 > Keeping that logic, we could keep rsync as the default sync method and
92 > tell people to use github if they want (syncing from github works
93 > today just fine.) I think that is different from making github the
94 > default git sync provider. I think this is similar in concept to say,
95 > having only free software by default; which is a change we made
96 > somewhat recently, iirc.
97 >
98 > This is less true for our CDN stuff, where we could just turn off the
99 > CDN and be fine[0].
100 >
101 > >
102 > > Now I don't know if it's actually *doable* already, or if this is one of
103 > > those things nobody brought up yet. That's why I made the post. (infra,
104 > > releng, handbook). If it *is* doable, I don't see why the defaults can't
105 > > be updated to use git. At some point at least, once we figure out the
106 > > global requirements.
107 > >
108 > > -- juippis
109 > >
110 >
111 > [0] This is an engineering argument: is the CDN a latency cache (make
112 > things fast) or a capacity cache (make it so our origin servers do not
113 > melt)? I assert the former. If it's the latter we are in the same
114 > situation as with github; if the CDN is gone and our origin servers
115 > melt, it means we failed.
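The latency-versus-capacity distinction in [0] can be stated as a simple test: could the origin servers absorb peak traffic with the CDN turned off? A minimal sketch follows. The function name and the request rates are hypothetical and are not measurements of Gentoo infrastructure.

```python
def cdn_role(peak_requests_per_s, origin_capacity_per_s):
    """Classify a CDN by whether the origin alone could serve peak traffic."""
    if peak_requests_per_s <= origin_capacity_per_s:
        # The origin survives without the CDN; the CDN only makes things fast.
        return "latency cache"
    # The origin would be overloaded without the CDN in front of it.
    return "capacity cache"


# Hypothetical numbers, for illustration only.
print(cdn_role(peak_requests_per_s=100, origin_capacity_per_s=500))  # latency cache
print(cdn_role(peak_requests_per_s=900, origin_capacity_per_s=500))  # capacity cache
```

The same test applies to the git case: if the self-hosted service cannot carry the load once the third-party provider is gone, the provider is acting as capacity rather than as a convenience.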