On Fri, Apr 9, 2021 at 12:03 AM Alec Warner <antarus@g.o> wrote:
>
> For example on my box:
> rsync takes about 5s for an up-to-date-check.
> rsync takes about 1m for a daily-like sync (includes up-to-date
> check, incremental delivery, and GPG verification with WKD.)
> rsync takes longer for a full sync, but I tend to use web-rsync for that task.
> Github seems to have taken about 4s for an empty sync (e.g. up to date)
> Github seems to have taken about 6s for a small sync (e.g. daily)
> Github seems to have taken 20s for a full sync.
|
I think the type of storage and the state of the cache are going to
matter a LOT here, as is update frequency. A hard drive is going to
perform far worse than an SSD for rsync, but probably not all that
differently for git.
|
The issue is that rsync has to walk the entire repository to figure
out what changed (the cost of which depends on IOPS and cache state).
Git effectively has an index in the form of the linked list of commits
and the content hashes of its trees, so it can work out what changed
without visiting every file.
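
To make the difference concrete, here is a toy sketch in Python (not
how git or rsync are actually implemented; git's real object format and
fetch negotiation are more involved, and rsync normally compares size
and mtime rather than hashes). The repo layout, the Node class and all
function names are made up for illustration. It only shows why
precomputed tree hashes let you skip unchanged directories, while a
plain walk has to look at every file.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory with a precomputed content hash (hypothetical model)."""
    digest: str
    children: dict = field(default_factory=dict)  # empty for files


def build(tree):
    """Hash every file and directory once, the way a commit would."""
    if isinstance(tree, bytes):
        return Node(hashlib.sha1(tree).hexdigest())
    kids = {name: build(sub) for name, sub in sorted(tree.items())}
    h = hashlib.sha1()
    for name, kid in kids.items():
        h.update(f"{name}\0{kid.digest}\n".encode())
    return Node(h.hexdigest(), kids)


def changed_paths(old, new, path="", stats=None):
    """List changed paths, skipping any subtree whose hash is unchanged."""
    stats = stats if stats is not None else {}
    stats["visited"] = stats.get("visited", 0) + 1
    if old.digest == new.digest:
        return []  # identical subtree: no need to descend
    if not old.children and not new.children:
        return [path]  # a file whose content changed
    out = []
    for name in sorted(set(old.children) | set(new.children)):
        o, n = old.children.get(name), new.children.get(name)
        child = f"{path}/{name}" if path else name
        if o is None or n is None:
            out.append(child)  # entry was added or removed
        else:
            out.extend(changed_paths(o, n, child, stats))
    return out


def count_files(node):
    """Number of files a full walk of the tree would have to examine."""
    if not node.children:
        return 1
    return sum(count_files(c) for c in node.children.values())


def make_repo(changed=False):
    """Made-up repo: 20 categories of 50 packages each."""
    repo = {f"cat{c}": {f"pkg{p}": b"v1" for p in range(50)} for c in range(20)}
    if changed:
        repo["cat7"]["pkg3"] = b"v2"
    return repo


old_root, new_root = build(make_repo()), build(make_repo(changed=True))
stats = {}
changed = changed_paths(old_root, new_root, stats=stats)
full_walk = count_files(new_root)
print(changed, "hash comparisons:", stats["visited"], "full walk:", full_walk)
```

With one package changed, the hash comparison only descends into the
one category that differs; the full walk touches every file no matter
how little changed.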
|
On the other hand, git by default stores a whole lot of history, which
is either a pro or a con depending on your requirements. If you're
syncing once every few months, git by default probably ends up sending
a whole bunch of intermediate stuff you don't care about, while rsync
just jumps you to the present. (I.e., if a package goes through 5
versions in six months, rsync gets you from v1 to v5, while git also
sends you v2-4 and the metadata that tells you not to use them, though
they're still there in case you want to check out those commits.)
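
As a rough illustration of the v1-to-v5 case, here is a small Python
sketch. It is only a model: the revision list and both function names
are invented, it counts revisions rather than bytes, and it ignores
git's delta compression as well as rsync's own delta algorithm.

```python
# Hypothetical revision history of a single package over six months.
revisions = ["v1", "v2", "v3", "v4", "v5"]


def rsync_style(have, want):
    """Only the end state crosses the wire: the target replaces what you have."""
    return [] if have == want else [want]


def git_default_style(have, want, history):
    """Every revision after `have`, up to and including `want`, is sent."""
    start = history.index(have) + 1
    end = history.index(want) + 1
    return history[start:end]


rsync_sent = rsync_style("v1", "v5")
git_sent = git_default_style("v1", "v5", revisions)
print("rsync-style:", rsync_sent)
print("git default:", git_sent)
```

The longer you go between syncs, the more intermediate revisions the
second function returns, while the first never returns more than one.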
|
So, I think any technical comparison needs to pay a lot of attention
to how the repo is used and synced, how it is stored, and how stale
the cache is when it is synced.
|
Really though I think the hosting issue is the bigger concern.
|
> So two things here. One is that we should care about the social
> contract and I think bi-furcating the core parts of gentoo into free
> and non-free is a non-trivial change. If we move people to git syncing
> and github makes some changes, or goes away, or whatever...it's not
> like we have equivalent free setup; our git hosting is literally not
> up to the task of serving those users.
|
I think hosting diversity is the much larger issue here: we have a ton
of http/rsync mirrors, and there aren't really a lot of options for
git mirroring out there.
|
If we were using a single CDN vendor for http or rsync to run our
entire mirror network, I'd have the same concerns there. As you say,
if GitHub changes their TOS/whatever we'd be up the creek. If one of
our bazillion http mirrors told infra they can't host us anymore, it
wouldn't be a big deal because we have a ton of mirrors and they're
largely independent. If we were using CloudFlare as our sole http
provider and CloudFlare decided to change their TOS or something, then
again we'd be up the creek.
|
So, I'd suggest that if we wanted to consider git as a primary
recommended syncing system, the main focus should be on a diversity of
mirroring providers. I think that whether they run FOSS is a
nice-to-have, since in the end git itself is FOSS and the protocol is
what matters there. Just as we don't require http mirrors to run
coreboot, I don't think we should care all that much which git
implementation they're running. A diversity of providers matters more.
|
However, for those who do think that FOSS-only is critical, I'd say the
diversity angle would solve that. If you're going to have 50 git
mirroring providers, it seems very likely that some would be FOSS
top-to-bottom. I'm guessing that at most 2% of our http mirrors are
FOSS top-to-bottom (including firmware), but just having so many
ensures that people who care about that can pick one that is, and
those who don't mind if the http mirror runs IIS as long as it speaks
http can use whatever they want.
|
--
Rich