On Tue, Dec 18, 2018 at 1:56 AM Andrew Savchenko <bircoph@g.o> wrote:
> On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote:
> > Hi,
> >
> > I am currently embarking on a plan to redo our existing rsync[0] mirror
> > network. The current network has aged a bit. It's likely too large and is
> > under-maintained. I think in the ideal case we would instead pivot this
> > project to scaling out our git mirror capabilities and slowly migrate all
> > consumers to pulling the git tree directly. To that end, I'm looking for
> > blockers as to why various customers cannot switch to pulling the gentoo
> > ebuild repository from git[1] instead of rsync.
> >
> > So for example:
> >
> > - bandwidth concerns (preferably with documentation / data.)
> > - Firewall concerns
> > - CPU concerns (e.g. rsync is great for tiny systems?)
> > - Disk usage for git vs rsync
> > - Other things I have not thought of.
>
> My main concern with git is downlink fault tolerance. If rsync
> connection is broken, it can be easily restored without much data
> retransmission. If git download connection is broken, it has to
> start all over again. So there are cases where rsync will be always
> much more preferable than git.

Are you comparing against the initial clone here?
If so, would making the clone shallow by default mitigate this?

For the curious, I ran a benchmark.

With a completely purged /usr/portage:

emerge-webrsync took 30.302s
emerge --sync (with git clone --depth 1) took 33.902s
emerge --sync (with regular rsync) took a whopping 1m25.863s

Syncing again right after a fresh sync:

emerge --sync (with regular rsync) took 7.564s
emerge --sync (with git fetch --depth 1, and after priming the repo with
a full clone) took 2.086s
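
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a wall-clock timing helper, assuming Python 3. The name `time_command` is hypothetical, and the command being timed is a harmless stand-in so the sketch runs anywhere; it is not necessarily how the numbers above were measured.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero,
    # so a failed sync is never reported as a (misleadingly fast) timing.
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


# Stand-in command (an assumption for illustration): replace the list with
# the sync command you actually want to measure.
stand_in = [sys.executable, "-c", "pass"]
elapsed = time_command(stand_in)
print(f"{' '.join(stand_in)}: {elapsed:.3f}s")
```

Each run should start from the same state (purged tree, or freshly synced tree), otherwise the numbers are not comparable.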

Up front, emerge-webrsync is a narrow winner for initial setups, with
git clone a close second; regular rsync is roughly 2.5 to 3 times slower.

Routine syncs would seem to favor git, especially if they are done
regularly, which IMO would amortize the cost of the initial clone. My
opinion is that over time git would also place less stress on the
servers, since it only has to walk the commit chain instead of
checksumming every single file.
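
As a rough sketch of the arithmetic behind that comparison, the ratios can be worked out from the timings reported above. The dictionary keys and the `slowdown` helper are names made up for this illustration; only the timing values come from the benchmark.

```python
# Timings in seconds, copied from the benchmark above (1m25.863s = 85.863 s).
initial = {"webrsync": 30.302, "git_shallow_clone": 33.902, "rsync": 85.863}
routine = {"rsync": 7.564, "git_fetch": 2.086}


def slowdown(slower, faster):
    """Return how many times longer `slower` took compared with `faster`."""
    return slower / faster


# Initial setup: rsync against the two faster methods.
rsync_vs_webrsync = slowdown(initial["rsync"], initial["webrsync"])
rsync_vs_git_clone = slowdown(initial["rsync"], initial["git_shallow_clone"])
git_clone_vs_webrsync = slowdown(initial["git_shallow_clone"], initial["webrsync"])

# Routine sync: rsync against an incremental git fetch.
rsync_vs_git_fetch = slowdown(routine["rsync"], routine["git_fetch"])

print(f"initial: rsync vs webrsync      {rsync_vs_webrsync:.2f}x")
print(f"initial: rsync vs git clone     {rsync_vs_git_clone:.2f}x")
print(f"initial: git clone vs webrsync  {git_clone_vs_webrsync:.2f}x")
print(f"routine: rsync vs git fetch     {rsync_vs_git_fetch:.2f}x")
```

These are single runs on one machine and one network path, so the ratios are indicative rather than definitive.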

That said, would I be correct to surmise that you're raising a
robustness issue and not simply a performance issue?

> Best regards,
> Andrew Savchenko