Gentoo Archives: gentoo-project

From: Andrew Savchenko <bircoph@g.o>
To: gentoo-project@l.g.o
Subject: Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
Date: Tue, 18 Dec 2018 17:14:25
Message-Id: 20181218201415.27c7a5f2cd59fba9102a0156@gentoo.org
In Reply to: Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method by Raymond Jennings
On Tue, 18 Dec 2018 03:36:14 -0800 Raymond Jennings wrote:
> On Tue, Dec 18, 2018 at 1:56 AM Andrew Savchenko <bircoph@g.o> wrote:
> > On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote:
> > > Hi,
> > >
> > > I am currently embarking on a plan to redo our existing rsync[0] mirror
> > > network. The current network has aged a bit. It's likely too large and is
> > > under-maintained. I think in the ideal case we would instead pivot this
> > > project to scaling out our git mirror capabilities and slowly migrate all
> > > consumers to pulling the git tree directly. To that end, I'm looking for
> > > blockers as to why various customers cannot switch to pulling the gentoo
> > > ebuild repository from git[1] instead of rsync.
> > >
> > > So for example:
> > >
> > > - bandwidth concerns (preferably with documentation / data)
> > > - firewall concerns
> > > - CPU concerns (e.g. rsync is great for tiny systems?)
> > > - disk usage for git vs rsync
> > > - other things I have not thought of
> >
> > My main concern with git is downlink fault tolerance. If an rsync
> > connection is broken, it can easily be restored without much data
> > retransmission. If a git download is interrupted, it has to start
> > all over again. So there are cases where rsync will always be
> > preferable to git.
>
> Are you talking about in comparison to the initial clone?
> If so, would making the clone shallow by default mitigate this?
>
> For the curious, I ran a benchmark.
>
> With a completely purged /usr/portage:
>
> emerge-webrsync took 30.302s
> emerge --sync (with git clone --depth 1) took 33.902s
> emerge --sync (with regular rsync) took a whopping 1m25.863s
>
> After a fresh sync:
>
> emerge --sync (with regular rsync) took 7.564s
> emerge --sync (with git fetch --depth 1, after priming the repo with
> a full clone) took 2.086s
>
39 > After a fresh sync:
40 >
41 > emerge-sync (with regular rsync) took 7.564s
42 > emerge-sync (with git fetch --depth 1, and after priming the repo with
43 > a full clone) took 2.086s
44 >
45 >
46 >
47 > Up front, webrsync seems to be a small winner for initial setups, with
48 > git clone a close second, and regular rsync is 3 fold worse
49 >
50 > Routine syncs would seem to prefer git, especially if they are done
51 > with presistent regularity which IMO would amortize things. My
52 > opinion is that over time git would also place less stress on the
53 > servers since it only has to look at the commit chain instead of
54 > checksumming every single file.
55 >
56 >
57 >
58 > That said, would I be correct to surmise that you're advancing a
59 > robustness issue and not simply a performance issue?
60
61 Yes, my interest here is in robustness, not performance. Sometimes I
62 have to use unreliable uplink and other users may face the same
63 problem.
64
65 I agree that in most cases git should be a preferred way to go, but
66 there are exceptions. So it would be nice to have rsync backup just
67 in case.
68
69 Daily or weekly portage snapshots available via rsync should be a
70 solution as well.
71
72 Best regards,
73 Andrew Savchenko
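For a rough sense of scale, the timings in the quoted benchmark can be turned into ratios. This is a minimal Python sketch that uses only the numbers reported above (single runs on one machine, no variance given); the `ratios` helper is a hypothetical name chosen for illustration, not part of any Gentoo tooling.

```python
# Timings in seconds, copied from the quoted benchmark. These are single
# runs on one machine, so the ratios are rough indications only.
initial = {
    "emerge-webrsync": 30.302,
    "git clone --depth 1": 33.902,
    "rsync": 60 + 25.863,  # reported as 1m25.863s
}
incremental = {
    "rsync": 7.564,
    "git fetch --depth 1": 2.086,
}


def ratios(timings, baseline):
    """Hypothetical helper: each method's time divided by the baseline's time."""
    base = timings[baseline]
    return {name: t / base for name, t in timings.items()}


if __name__ == "__main__":
    for name, r in ratios(initial, "emerge-webrsync").items():
        print(f"initial  {name:22s} {r:.2f}x")
    for name, r in ratios(incremental, "git fetch --depth 1").items():
        print(f"update   {name:22s} {r:.2f}x")
```

With these numbers, a fresh rsync comes out at about 2.83x the webrsync time (consistent with "roughly three times slower"), and a routine rsync update at about 3.63x a shallow git fetch. Neither ratio says anything about the robustness concern raised in the reply.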
