On Tue, 18 Dec 2018 03:36:14 -0800 Raymond Jennings wrote:
> On Tue, Dec 18, 2018 at 1:56 AM Andrew Savchenko <bircoph@g.o> wrote:
> > On Sat, 15 Dec 2018 23:15:47 -0500 Alec Warner wrote:
> > > Hi,
> > >
> > > I am currently embarking on a plan to redo our existing rsync[0] mirror
> > > network. The current network has aged a bit. It's likely too large and is
> > > under-maintained. I think in the ideal case we would instead pivot this
> > > project to scaling out our git mirror capabilities and slowly migrate all
> > > consumers to pulling the git tree directly. To that end, I'm looking for
> > > blockers as to why various customers cannot switch to pulling the Gentoo
> > > ebuild repository from git[1] instead of rsync.
> > >
> > > So for example:
> > >
> > > - Bandwidth concerns (preferably with documentation / data.)
> > > - Firewall concerns
> > > - CPU concerns (e.g. rsync is great for tiny systems?)
> > > - Disk usage for git vs rsync
> > > - Other things I have not thought of.
> >
> > My main concern with git is downlink fault tolerance. If an rsync
> > connection is broken, it can be easily restored without much data
> > retransmission. If a git download connection is broken, it has to
> > start all over again. So there are cases where rsync will always be
> > much preferable to git.
>
> Are you talking about in comparison to the initial clone?
> If so, would having the clone default to shallow mitigate this?
>
> For the curious, I ran a benchmark.
>
> With a completely purged /usr/portage:
>
> emerge-webrsync took 30.302s
> emerge-sync (with git clone --depth 1) took 33.902s
> emerge-sync (with regular rsync) took a whopping 1m25.863s
>
> After a fresh sync:
>
> emerge-sync (with regular rsync) took 7.564s
> emerge-sync (with git fetch --depth 1, and after priming the repo with
> a full clone) took 2.086s
>
> Up front, webrsync seems to be a small winner for initial setups, with
> git clone a close second, and regular rsync roughly 3-fold slower.
>
> Routine syncs would seem to prefer git, especially if they are done
> with persistent regularity, which IMO would amortize things. My
> opinion is that over time git would also place less stress on the
> servers since it only has to look at the commit chain instead of
> checksumming every single file.
>
> That said, would I be correct to surmise that you're advancing a
> robustness issue and not simply a performance issue?

Yes, my interest here is in robustness, not performance. Sometimes I
have to use an unreliable uplink, and other users may face the same
problem.
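
To illustrate the kind of robustness meant here, below is a minimal
sketch of a retry loop around rsync, assuming files that already
arrived are skipped on the next attempt and that --partial keeps any
half-transferred file. The mirror URL, the destination path, the
timeout and the helper names (run_rsync, sync_with_retries) are
assumptions made for illustration only, not what Portage actually
runs; a real sync would use more options (e.g. for deleting stale
files).

```python
import subprocess
import time

# Assumed mirror URL and destination; adjust for the local setup.
RSYNC_CMD = [
    "rsync", "-a", "--partial", "--timeout=60",
    "rsync://rsync.gentoo.org/gentoo-portage/", "/usr/portage/",
]


def run_rsync():
    """Run one rsync attempt; return True if it exited with status 0."""
    return subprocess.run(RSYNC_CMD).returncode == 0


def sync_with_retries(attempt=run_rsync, max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call attempt() until it succeeds or max_attempts is reached.

    Returns the number of the successful attempt, or None if every
    attempt failed. The delay doubles after each failure.
    """
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
        if n < max_attempts:
            sleep(base_delay * 2 ** (n - 1))
    return None
```

Calling sync_with_retries() with the defaults would try rsync up to
five times. The same wrapper could be pointed at a git command, but
an interrupted clone would then start from scratch on every attempt,
which is exactly the concern above.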

I agree that in most cases git should be the preferred way to go, but
there are exceptions. So it would be nice to have rsync as a backup,
just in case.

Daily or weekly portage snapshots available via rsync would be a
solution as well.

Best regards,
Andrew Savchenko