On Thu, Apr 8, 2021 at 10:22 AM Joonas Niilola <juippis@g.o> wrote:
>
>
>
> On 4/8/21 7:37 PM, Alec Warner wrote:
> >
> > It's admittedly a grey area here. We use CDNs for various web
> > components (packages.gentoo.org for example) and we use a CDN for
> > distfiles.gentoo.org. Is Github simply a CDN for gentoo.git? It's an
> > open question we have discussed on the gentoo-nfp list.
> >
> > In general I'm not really sold on the benefits of git as a replication
> > protocol; while I dislike running a global rsync network the
> > maintenance of the network is basically nil from infra's end and so I
> > don't feel significant pressure to move to git. Could you perhaps
> > articulate why you think it's important for clients to move to git?
> >
> > -A
> >
>

So I want to discuss two thrusts here. One is that your argument
below is not very convincing (because it has no data that I can use to
verify anything you said). The second is the social contract: we are
basically saying "hey, we can't make git work, so we are going to use
a non-free solution", and is that OK? I happen to think it is not, but
see more below.

> It provides much better user experience with continuously faster
> sync-times. Also there's error posts frequently in the forums when using
> rsync, even recently.

I don't want to dig too deep here, but I'm looking again for more
engineering-focused stuff. If we are going to make a technical
argument, make a technical argument. "It provides a much better user
experience" and "it's faster" are not arguments; or, they are
insufficient to convince me of much. I did do some experimentation.

For example on my box:
rsync takes about 5s for an up-to-date check.
rsync takes about 1m for a daily-like sync (includes up-to-date
check, incremental delivery, and GPG verification with WKD).
rsync takes longer for a full sync, but I tend to use web-rsync for that task.
Github seems to have taken about 4s for an empty sync (e.g. up to date).
Github seems to have taken about 6s for a small sync (e.g. daily).
Github seems to have taken 20s for a full sync.
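
For anyone who wants to reproduce numbers like these on their own box,
here is a minimal sketch of a timing harness. It is not how the figures
above were measured; the function name `time_command` and the example
sync command are assumptions for illustration, and network conditions
and mirror choice will move the results around.

```python
import subprocess
import sys
import time


def time_command(argv, runs=1):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a broken sync is not silently recorded as a fast one.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Assumption: the sync command is passed on the command line.
    # With no arguments, time a no-op so the script is safe to run anywhere.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    for i, seconds in enumerate(time_command(argv, runs=3), start=1):
        print(f"run {i}: {seconds:.2f}s")
```

Saved as, say, time_sync.py (a hypothetical name), it could be run as
root with a sync command such as `emaint sync -r gentoo` appended.
Repeated runs right after each other mostly measure the up-to-date
check, since the first run already pulled any changes.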

My git-sync verification doesn't seem to be working (isn't it supposed
to check tip-of-tree for a sig? It errors out on me.)

I'm skeptical people care deeply about how long syncing takes (60s and
5s are both fast enough for me, but I sync once a day or less). I
agree prima facie that Github is likely more reliable than rsync for
most users, as it's a better-maintained product with a scalable
interface.
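
As a rough illustration of the sync-time point, the arithmetic can be
sketched as below. It only reuses the daily-sync figures quoted earlier
in this mail (about 60s for rsync, about 6s for Github); the
once-a-day frequency and the helper name `yearly_sync_hours` are
assumptions for the example.

```python
SECONDS_PER_HOUR = 3600


def yearly_sync_hours(seconds_per_sync, syncs_per_day=1, days=365):
    """Total hours spent syncing per year at a fixed per-sync cost."""
    return seconds_per_sync * syncs_per_day * days / SECONDS_PER_HOUR


# Figures quoted earlier for a daily-like sync: rsync ~60s, Github ~6s.
rsync_hours = yearly_sync_hours(60)
git_hours = yearly_sync_hours(6)
print(f"rsync: {rsync_hours:.1f} h/year, git: {git_hours:.1f} h/year, "
      f"difference: {rsync_hours - git_hours:.1f} h/year")
```

For a once-a-day syncer the gap works out to roughly five and a half
hours of wall-clock time per year, most of it unattended. Users who
sync many times a day would see a proportionally larger difference.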

I also made a repo for anongit.g.o and it's very slow; probably 10-20x
slower than github. We know we have a bunch of tuning to do on the
git-serving side but no staff to do it.


>
> The way I see it, utilizing Github, it can already be implemented
> world-wide (wrt rsync mirrors). And those disliking Github can still
> keep using rsync, or pick the Gentoo-infra hosted sync-repo (until it
> breaks). And regarding that, didn't you say you have a lot of money
> needing to be used? ;)

So two things here. One is that we should care about the social
contract, and I think bifurcating the core parts of gentoo into free
and non-free is a non-trivial change. If we move people to git syncing
and github makes some changes, or goes away, or whatever, it's not
like we have an equivalent free setup; our git hosting is literally not
up to the task of serving those users. I think this is the difference
between the core offering (e.g. "gentoo") and the non-free software in
the tree (the add-ons or additional functionality, or whatever).
Keeping that logic, we could keep rsync as the default sync method and
tell people to use github if they want (syncing from github works
today just fine). I think that is different from making github the
default git sync provider. I think this is similar in concept to, say,
having only free software by default, which is a change we made
somewhat recently, iirc.

This is less true for our CDN stuff, where we could just turn off the
CDN and be fine[0].

>
> Now I don't know if it's actually *doable* already, or if this is one of
> those things nobody brought up yet. That's why I made the post. (infra,
> releng, handbook). If it *is* doable, I don't see why the defaults can't
> be updated to use git. At some point at least, once we figure out the
> global requirements.
>
> -- juippis
>

[0] This is an engineering argument: is the CDN a latency cache (make
things fast) or a capacity cache (make it so our origin servers do not
melt)? I assert the former. If it's the latter we are in the same
situation as with github; if the CDN is gone and our origin servers
melt, it means we failed.