Gentoo Archives: gentoo-project

From: Rich Freeman <rich0@g.o>
To: gentoo-project <gentoo-project@l.g.o>
Subject: Re: [gentoo-project] Call for agenda items - Council meeting 2021-04-11
Date: Fri, 09 Apr 2021 12:47:50
Message-Id: CAGfcS_nZM7zWT2zEc7v305rpOwmG6+-w4zB_9aiY3SzfZZ_+Qw@mail.gmail.com
In Reply to: Re: [gentoo-project] Call for agenda items - Council meeting 2021-04-11 by Alec Warner
On Fri, Apr 9, 2021 at 12:03 AM Alec Warner <antarus@g.o> wrote:
>
> For example on my box:
> rsync takes about 5s for an up-to-date-check.
> rsync takes about 1m for a daily-like sync (includes up-to-date
> check, incremental delivery, and GPG verification with WKD.)
> rsync takes longer for a full sync, but I tend to use web-rsync for that task.
> Github seems to have taken about 4s for an empty sync (e.g. up to date)
> Github seems to have taken about 6s for a small sync (e.g. daily)
> Github seems to have taken 20s for a full sync.

I think the type of storage and the state of the cache are going to
matter a LOT here, as does update frequency. A hard drive is going to
perform far worse for rsync, but probably not all that differently for
git.

The issue is that rsync has to walk the entire repository to figure
out what changed (the cost of which depends on IOPS and cache state).
Git has an index in the form of the commit linked list and also the
tree content hashes.
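
To make that concrete, here is a minimal Python sketch of the idea. It
is not rsync's or git's actual code: the helper names (build,
count_entries, hash_diff) and the toy repository are made up for
illustration. Each directory is hashed over its children's hashes, so a
comparison can skip any subtree whose hash matches, while a full walk
has to touch every entry.

```python
import hashlib

def build(node):
    """Turn nested dicts (dirs) and strings (files) into (hash, children) pairs."""
    if isinstance(node, str):
        return (hashlib.sha1(b"blob\0" + node.encode()).hexdigest(), None)
    children = {name: build(child) for name, child in node.items()}
    # A directory's hash covers its entry names and their hashes.
    listing = "".join(f"{name}\0{children[name][0]}\n" for name in sorted(children))
    return (hashlib.sha1(b"tree\0" + listing.encode()).hexdigest(), children)

def count_entries(node):
    """Entries a full walk would visit (every file and directory)."""
    kids = node[1]
    return 1 + (sum(count_entries(k) for k in kids.values()) if kids else 0)

def hash_diff(old, new, path=""):
    """Return (changed paths, entries visited), skipping subtrees with equal hashes."""
    if old is not None and new is not None and old[0] == new[0]:
        return [], 1
    old_kids = old[1] if old is not None else None
    new_kids = new[1] if new is not None else None
    if old_kids is None or new_kids is None:
        # A file changed, or something was added or removed at this path.
        return [path], 1
    changed, visited = [], 1
    for name in sorted(set(old_kids) | set(new_kids)):
        c, v = hash_diff(old_kids.get(name), new_kids.get(name), f"{path}/{name}")
        changed += c
        visited += v
    return changed, visited

# Toy repository (hypothetical package names); one ebuild changes.
old_src = {
    "app-misc": {"foo": {"foo-1.ebuild": "v1"}, "bar": {"bar-1.ebuild": "v1"}},
    "dev-lang": {"baz": {"baz-1.ebuild": "v1"}, "qux": {"qux-1.ebuild": "v1"}},
}
new_src = {
    "app-misc": {"foo": {"foo-1.ebuild": "v2"}, "bar": {"bar-1.ebuild": "v1"}},
    "dev-lang": {"baz": {"baz-1.ebuild": "v1"}, "qux": {"qux-1.ebuild": "v1"}},
}
old_tree, new_tree = build(old_src), build(new_src)
changed, visited = hash_diff(old_tree, new_tree)
print(changed, visited, count_entries(new_tree))
```

In this toy case the hash-guided comparison visits 6 entries against
11 for a full walk, and the gap widens as the tree grows relative to
the change.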
20
21 On the other hand, git by default stores a whole lot of history that
22 is either a pro or a con depending on your requirements. If you're
23 syncing once every few months git by default probably ends up sending
24 a whole bunch of intermediate stuff you don't care about, while rsync
25 just jumps you to the present. (Ie, if a package is revised 5 times
26 in six months, rsync gets you from v1 to v5, while git sends you v2-4
27 and the metadata that tells you not to use them (but they're still
28 there in case you want to check out those commits)).
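
A rough Python sketch of that difference, under simplifying
assumptions: the function names and the one-file history are
hypothetical, and it counts whole file versions, ignoring the delta
compression a real git fetch would apply.

```python
def rsync_like(snapshots, have, want):
    """Files that differ between the snapshot we have and the one we want."""
    old, new = snapshots[have], snapshots[want]
    return sorted((p, c) for p, c in new.items() if old.get(p) != c)

def git_like(snapshots, have, want):
    """Every file version introduced by any commit after `have`, up to `want`."""
    known = set(snapshots[have].items())
    sent = []
    for snap in snapshots[have + 1 : want + 1]:
        for item in sorted(snap.items()):
            if item not in known:
                known.add(item)
                sent.append(item)
    return sent

# One package revised four times since the last sync (v1 -> v5).
history = [{"foo.ebuild": f"v{n}"} for n in range(1, 6)]

direct = rsync_like(history, 0, 4)     # only the final state
with_history = git_like(history, 0, 4) # every intermediate version too
print(len(direct), len(with_history))
```

Here the rsync-style transfer carries one file version (v5) while the
history-preserving one carries four (v2 through v5), which is the
overhead an infrequent syncer pays for having the history available.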

So, I think any technical comparison needs to pay a lot of attention
to how the repo is used and synced, and how it is stored, and how
stale the cache is when it is synced.

Really though I think the hosting issue is the bigger concern.

> So two things here. One is that we should care about the social
> contract and I think bi-furcating the core parts of gentoo into free
> and non-free is a non-trivial change. If we move people to git syncing
> and github makes some changes, or goes away, or whatever...it's not
> like we have equivalent free setup; our git hosting is literally not
> up to the task of serving those users.

I think hosting diversity is the much larger issue here: we have a ton
of http/rsync mirrors, and there aren't really a lot of options for
git mirroring out there.

If we were using a single CDN vendor for http or rsync to run our
entire mirror network I'd have the same concerns there. As you say,
if github changes their TOS/whatever we'd be up the creek. If one of
our bazillion http mirrors told infra they can't host us anymore it
wouldn't be a big deal because we have a ton of mirrors and they're
largely independent. If we were using CloudFlare as our sole http
provider and CloudFlare decided to change their TOS or something, then
again we'd be up the creek.

So, I'd suggest that if we wanted to consider git as a primary
recommended syncing system, the main focus should be on a diversity of
mirroring providers. I think that whether they run FOSS is a
nice-to-have, as in the end git itself is FOSS and the protocol is what
matters there. Just as we don't require http mirrors to run coreboot,
I don't think we should care all that much which git implementation
they're running. However, a diversity of providers would matter more.

However, for those who do think that FOSS-only is critical I'd say the
diversity angle would solve that. If you're going to have 50 git
mirroring providers, it seems very likely that some would be FOSS
top-to-bottom. I'm guessing that at most 2% of our http mirrors are
FOSS top-to-bottom (including firmware), but just having so many
ensures that people who care about that can pick one that is, and
those who don't mind if the http mirror runs IIS as long as it speaks
http can use whatever they want.

--
Rich