Gentoo Archives: gentoo-project

From: Alec Warner <antarus@g.o>
To: gentoo-project <gentoo-project@l.g.o>
Subject: Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
Date: Tue, 18 Dec 2018 18:00:50
Message-Id: CAAr7Pr-yMbsxJjCVYyYZh6XcTFzAp0C_Ah+screBguSfCzb22g@mail.gmail.com
In Reply to: Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method by Andrew Savchenko
1 On Tue, Dec 18, 2018 at 12:14 PM Andrew Savchenko <bircoph@g.o>
2 wrote:
3
4 > On Tue, 18 Dec 2018 03:36:14 -0800 Raymond Jennings wrote:
5 > > On Tue, Dec 18, 2018 at 1:56 AM Andrew Savchenko <bircoph@g.o>
6 > wrote:
7 > > > On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote:
8 > > > > Hi,
9 > > > >
10 > > > > I am currently embarking on a plan to redo our existing rsync[0]
11 > mirror
12 > > > > network. The current network has aged a bit. It's likely too large
13 > and is
14 > > > > under-maintained. I think in the ideal case we would instead pivot
15 > this
16 > > > > project to scaling out our git mirror capabilities and slowly
17 > migrate all
18 > > > > consumers to pulling the git tree directly. To that end, I'm looking
19 > for
20 > > > > blockers as to why various customers cannot switch to pulling the
21 > gentoo
22 > > > > ebuild repository from git[1] instead of rsync.
23 > > > >
24 > > > > So for example:
25 > > > >
26 > > > > - bandwidth concerns (preferably with documentation / data.)
27 > > > > - Firewall concerns
28 > > > > - CPU concerns (e.g. rsync is great for tiny systems?)
29 > > > > - Disk usage for git vs rsync
30 > > > > - Other things I have not thought of.
31 > > >
32 > > > My main concern with git is downlink fault tolerance. If an rsync
33 > > > connection is broken, it can be easily restored without much data
34 > > > retransmission. If a git download connection is broken, it has to
35 > > > start all over again. So there are cases where rsync will always be
36 > > > much preferable to git.
37 > >
38 > > Are you talking about in comparison to the initial clone?
39 > > If so, would having the clone default to shallow mitigate this?
40 > >
41 > > For the curious, I ran a benchmark.
42 > >
43 > > With a completely purged /usr/portage:
44 > >
45 > > emerge-webrsync took 30.302s
46 > > emerge-sync (with git clone --depth 1) took 33.902s
47 > > emerge-sync (with regular rsync) took a whopping 1m25.863s
48 > >
49 > > After a fresh sync:
50 > >
51 > > emerge-sync (with regular rsync) took 7.564s
52 > > emerge-sync (with git fetch --depth 1, and after priming the repo with
53 > > a full clone) took 2.086s
54 > >
55 > >
56 > >
57 > > Up front, webrsync seems to be a small winner for initial setups, with
58 > > git clone a close second, and regular rsync nearly threefold worse.
59 > >
60 > > Routine syncs would seem to prefer git, especially if they are done
61 > > with persistent regularity which IMO would amortize things. My
62 > > opinion is that over time git would also place less stress on the
63 > > servers since it only has to look at the commit chain instead of
64 > > checksumming every single file.
65 > >
66 > >
67 > >
68 > > That said, would I be correct to surmise that you're advancing a
69 > > robustness issue and not simply a performance issue?
70 >
71 > Yes, my interest here is in robustness, not performance. Sometimes I
72 > have to use an unreliable uplink and other users may face the same
73 > problem.
74 >
75 > I agree that in most cases git should be the preferred way to go, but
76 > there are exceptions. So it would be nice to have an rsync backup just
77 > in case.
78
79
80 > Daily or weekly portage snapshots available via rsync should be a
81 > solution as well.
82 >
83
84 Two things here. One is that in an ideal world we would run no rsync
85 service and any design should keep that outcome in mind. Operationally we
86 should continue to offer rsync until these types of problems are addressed
87 by the new system.
88
89 The second is that in this case I think the plan is to, as Robin mentioned,
90 offer "git bundles" that are served over plain HTTP and support resumable
91 downloads. So instead of downloading an "rsync snapshot" you download a git
92 bundle over HTTP. Infra would offer these git bundles in a similar way to
93 existing rsync snapshot offerings[0]. These bundles would be applied to a
94 machine-local clone of a git repo. Does this conceptually address your
95 problem? I agree it will be difficult to know outside of actual practical
96 testing.
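
To make the bundle idea a bit more concrete, here is a rough sketch of what
a client could do: resume a partial download with an HTTP Range request, then
verify the bundle and fast-forward the local clone from it. This is only an
illustration under assumptions. The bundle URL, the file paths, and the
"master" branch name are hypothetical, nothing here is an existing Infra
service or Portage feature, and it assumes the mirror honours Range requests.

```python
import os
import subprocess
import urllib.error
import urllib.request


def range_header(bytes_on_disk: int) -> dict:
    """Header asking the server to skip the bytes we already have."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def download_with_resume(url: str, dest: str, chunk_size: int = 1 << 16) -> None:
    """Fetch url into dest, continuing from a partial file if one exists."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416 and have > 0:
            # "Range Not Satisfiable": the file on disk is already complete.
            return
        raise
    with response:
        # 206 means the Range request was honoured, so append. Anything else
        # (typically 200) means the whole file is being sent again.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)


def bundle_commands(bundle: str, repo: str, ref: str = "master") -> list:
    """Git invocations that check a bundle and fast-forward repo from it."""
    bundle = os.path.abspath(bundle)  # "git -C" changes directory first
    return [
        ["git", "-C", repo, "bundle", "verify", bundle],
        ["git", "-C", repo, "pull", "--ff-only", bundle, ref],
    ]


def apply_bundle(bundle: str, repo: str, ref: str = "master") -> None:
    """Run the commands above, stopping at the first failure."""
    for command in bundle_commands(bundle, repo, ref):
        subprocess.run(command, check=True)
```

A caller would run download_with_resume() again after every dropped
connection until it returns cleanly, then call apply_bundle() once. The
verify step matters because an incremental bundle only applies if the local
clone already has the commits the bundle was built on top of.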
97
98 -A
99
100 [0] http://gentoo.ussg.indiana.edu/snapshots/ is one example of the current
101 system. Instead of tarballs of an 'rsync tree' these would be git
102 bundles[1] that you fetch and apply locally. We would support a worldwide
103 mirror network for these bundles.
104 [1] https://git-scm.com/docs/git-bundle
105
106
107 >
108 > Best regards,
109 > Andrew Savchenko
110 >
