On Thu, Apr 8, 2021 at 9:03 PM Alec Warner <antarus@g.o> wrote:
>
> On Thu, Apr 8, 2021 at 10:22 AM Joonas Niilola <juippis@g.o> wrote:
> >
> > On 4/8/21 7:37 PM, Alec Warner wrote:
> > >
> > > It's admittedly a grey area here. We use CDNs for various web
> > > components (packages.gentoo.org for example) and we use a CDN for
> > > distfiles.gentoo.org. Is Github simply a CDN for gentoo.git? It's an
> > > open question we have discussed on the gentoo-nfp list.
> > >
> > > In general I'm not really sold on the benefits of git as a replication
> > > protocol; while I dislike running a global rsync network the
> > > maintenance of the network is basically nil from infra's end and so I
> > > don't feel significant pressure to move to git. Could you perhaps
> > > articulate why you think it's important for clients to move to git?
> > >
> > > -A
> > >
> >
>
> So I want to discuss two thrusts here then. One is that your argument
> below is not very convincing (because it has no data that I can use to
> verify anything you said) and the second is just the social contract;
> we are basically saying "hey we can't make git work, so we are
> going to use a non-free solution" and is that OK? I happen to think it
> is not, but see more below.
>
> > It provides a much better user experience with continuously faster
> > sync-times. Also there are frequent error posts in the forums when using
> > rsync, even recently.
>
> I don't want to dig too deep here, but I'm looking again for more
> engineering-focused stuff. If we are going to make a technical
> argument, make a technical argument.
> "it provides a much better user experience" and "it's faster" are not
> arguments; or, they are insufficient to convince me of much. I did do
> some experimentation.
>
> For example on my box:
> rsync takes about 5s for an up-to-date check.
> rsync takes about 1m for a daily-like sync (includes up-to-date
> check, incremental delivery, and GPG verification with WKD.)
> rsync takes longer for a full sync, but I tend to use web-rsync for that task.
> Github seems to have taken about 4s for an empty sync (e.g. up to date)
> Github seems to have taken about 6s for a small sync (e.g. daily)
> Github seems to have taken 20s for a full sync.
>
> My git-sync verification doesn't seem to be working (isn't it supposed
> to check tip-of-tree for a sig? It errors out on me.)
>
> I'm skeptical people care deeply about how long syncing takes (60s and
> 5s are both fast enough for me; but I sync once a day or less.) I
> agree prima facie that Github is likely more reliable than rsync for
> most users as it's a better maintained product with a scalable
> interface.
>
> I also made a repo for anongit.g.o and it's very slow; probably 10-20x
> slower than github. We know we have a bunch of tuning to do on the
> git-serving side but no staff to do it.

Ah I had a bug here; anongit is:
4s for empty update.
5m for empty sync.
I have not done an incremental (because I just fixed my bug.)
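
A minimal sketch of how timings like the ones above could be collected
reproducibly, so that the comparison rests on data others can verify. It is
not the method used for the figures in this thread. The function names
`time_command` and `summarize` are hypothetical, and the command to measure
is whatever you pass on the command line.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed syncs are not
        # silently counted as fast ones. Output is discarded.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Usage: python time_sync.py <command> [args...]
    if len(sys.argv) > 1:
        print(summarize(time_command(sys.argv[1:])))
```

For example, `python time_sync.py git -C /var/db/repos/gentoo fetch origin`
would time a git fetch (the repository path is an assumption; adjust it to
your setup). Note that after the first run the tree is already up to date, so
repeated runs measure the empty-sync case, not a daily or full sync.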

So I think if we can make it scale it might reach an rsync level of
performance, but we need to do the necessary tuning and buildout.

-A

>
> >
> > The way I see it, utilizing Github, it can already be implemented
> > world-wide (wrt rsync mirrors). And those disliking Github can still
> > keep using rsync, or pick the Gentoo-infra hosted sync-repo (until it
> > breaks). And regarding that, didn't you say you have a lot of money
> > needing to be used? ;)
>
> So two things here. One is that we should care about the social
> contract and I think bifurcating the core parts of gentoo into free
> and non-free is a non-trivial change. If we move people to git syncing
> and github makes some changes, or goes away, or whatever...it's not
> like we have an equivalent free setup; our git hosting is literally not
> up to the task of serving those users. I think this is the difference
> between the core offering (e.g. "gentoo") and the non-free software in
> the tree (the add-ons or additional functionality, or whatever.)
> Keeping that logic, we could keep rsync as the default sync method and
> tell people to use github if they want (syncing from github works
> today just fine.) I think that is different from making github the
> default git sync provider. I think this is similar in concept to, say,
> having only free software by default, which is a change we made
> somewhat recently, iirc.
>
> This is less true for our CDN stuff, where we could just turn off the
> CDN and be fine[0].
>
> >
> > Now I don't know if it's actually *doable* already, or if this is one of
> > those things nobody brought up yet. That's why I made the post. (infra,
> > releng, handbook). If it *is* doable, I don't see why the defaults can't
> > be updated to use git. At some point at least, once we figure out the
> > global requirements.
> >
> > -- juippis
> >
>
> [0] This is an engineering argument: is the CDN a latency cache (make
> things fast) or a capacity cache (make it so our origin servers do not
> melt)? I assert the former. If it's the latter we are in the same
> situation as with github; if the CDN is gone and our origin servers
> melt, it means we failed.