Gentoo Archives: gentoo-project

From: Alec Warner <antarus@g.o>
To: gentoo-project <gentoo-project@l.g.o>
Subject: Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method
Date: Tue, 18 Dec 2018 20:30:08
Message-Id: CAAr7Pr-EYuR02MQS5SE0pNRrDWurAmS8XRnS3P+ijCYH810cnw@mail.gmail.com
In Reply to: Re: [gentoo-project] RFC: Dropping rsync as a tree distribution method by Raymond Jennings
On Tue, Dec 18, 2018 at 1:39 PM Raymond Jennings <shentino@×××××.com> wrote:

> What if as a first step, rsync was only dropped as the default?
>
> If you change the default from rsync to git, you'd be closer to
> removing rsync, but it's not as drastic as a sudden removal. Would
> give time to make sure it works properly without the risk of breaking
> everything.
>

To clarify, my proposal is not a sudden removal of the rsync network.
Cost-wise it is cheap to operate.
Operationally, I'd prefer to operate fewer systems out of human concerns
(fewer moving parts are better).

I'm trying to ascertain what use cases need to be taken into account before
rsync is discontinued, hence this thread.

-A


>
> On Tue, Dec 18, 2018 at 10:37 AM Alec Warner <antarus@g.o> wrote:
> >
> >
> >
> > On Tue, Dec 18, 2018 at 1:15 PM Brian Evans <grknight@g.o> wrote:
> >>
> >> On 12/15/2018 11:15 PM, Alec Warner wrote:
> >> > Hi,
> >> >
> >> > I am currently embarking on a plan to redo our existing rsync[0] mirror
> >> > network. The current network has aged a bit. It's likely too large and is
> >> > under-maintained. I think in the ideal case we would instead pivot this
> >> > project to scaling out our git mirror capabilities and slowly migrate
> >> > all consumers to pulling the git tree directly. To that end, I'm looking
> >> > for blockers as to why various customers cannot switch to pulling the
> >> > gentoo ebuild repository from git[1] instead of rsync.
> >> >
> >> > So for example:
> >> >
> >> > - bandwidth concerns (preferably with documentation / data.)
> >> > - Firewall concerns
> >> > - CPU concerns (e.g. rsync is great for tiny systems?)
> >> > - Disk usage for git vs rsync
> >> > - Other things I have not thought of.
> >> >
> >> > -A
> >> >
> >> > [0] This excludes emerge-webrsync; which I don't plan on touching.
> >> > [1] Rich talked about some downsides earlier
> >> > at https://lwn.net/Articles/759539/; but while these are challenges
> >> > (some fixable) they are not necessarily blockers.
> >>
> >> I personally would be sad to see rsync go as I use the git developer
> >> tree as my main repository on 2 machines. This is so I can develop and
> >> update from the single source. These have no news or md5-cache and it
> >> can be painful to generate metadata on one of them.
> >
> >
> > So my strawperson response is that you should have 2 repos.
> >
> > PORTDIR=https://gitweb.gentoo.org/repo/sync/gentoo.git/log/?h=master # a local copy of this thing.
> > PORTDIR_OVERLAY=/path/to/your/checkout/of/gentoo.git
> >
> > I suspect however that this likely performs ...poorly, particularly in
> > worst case situations as the 'overlay' would of course be massive in this
> > configuration.
> >
> >>
> >>
> >> I rely on scripts to pull down the rsync metadata to expedite this
> >> process, e.g. rsync <host>/gentoo-portage/metadata/md5-cache/. Git has
> >> no easy sub-tree download equivalent that I know of.
> >
> >
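
For readers unfamiliar with the workflow described above, a script that pulls only the md5-cache sub-tree might look roughly like this. It is a minimal sketch, not Brian's actual script: the host `rsync.example.org`, the destination path, and the choice of rsync flags are all assumptions, and the code only builds and prints the command rather than running it.

```python
import shlex


def md5_cache_rsync_cmd(host, repo_dir, module="gentoo-portage"):
    """Build an rsync command that mirrors only metadata/md5-cache/.

    host and repo_dir are placeholders supplied by the caller; the module
    name matches the one quoted in the thread.
    """
    src = f"rsync://{host}/{module}/metadata/md5-cache/"
    dest = f"{repo_dir.rstrip('/')}/metadata/md5-cache/"
    # --delete drops cache entries for ebuilds that no longer exist upstream.
    return ["rsync", "--archive", "--delete", "--compress", src, dest]


if __name__ == "__main__":
    # Hypothetical mirror and checkout location, for illustration only.
    cmd = md5_cache_rsync_cmd("rsync.example.org", "/var/db/repos/gentoo")
    print(shlex.join(cmd))
```

The point of the sketch is that rsync can address a sub-directory of a module directly, which is the property the paragraph above says git lacks.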
> > So I think overlaying the news and GLSA bits is easy (you have a
> > post-sync script that cd's into various directories and clones the news and
> > GLSA repos). The costly bit is likely the metadata regeneration for your
> > development branch of the tree. I'd be curious to see how much this costs
> > (both cold and hot) for you to generate locally.
> >
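
The post-sync idea in the paragraph above could be sketched as follows. This is an illustration under assumptions, not an existing hook: the two clone URLs and the `metadata/news` and `metadata/glsa` target directories are my guesses at a sensible layout and should be checked before use. The function only plans the git commands (clone when the checkout is missing, fast-forward pull otherwise) so it can be inspected as a dry run.

```python
import os

# Assumed clone URLs and target sub-directories; verify both before relying on them.
EXTRA_REPOS = {
    "metadata/news": "https://anongit.gentoo.org/git/data/gentoo-news.git",
    "metadata/glsa": "https://anongit.gentoo.org/git/data/glsa.git",
}


def plan_post_sync(repo_location, extra_repos=EXTRA_REPOS):
    """Return the git commands needed to bring each extra checkout up to date."""
    commands = []
    for subdir, url in sorted(extra_repos.items()):
        target = os.path.join(repo_location, subdir)
        if os.path.isdir(os.path.join(target, ".git")):
            # Already cloned: just fast-forward it.
            commands.append(["git", "-C", target, "pull", "--ff-only"])
        else:
            # First run: a shallow clone is enough for read-only data.
            commands.append(["git", "clone", "--depth=1", url, target])
    return commands


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the repository location as the first argument.
    for command in plan_post_sync(sys.argv[1]):
        print(" ".join(command))
```

Feeding each planned command to `subprocess.run` would turn the dry run into a working hook.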
> > -A
> >
> >>
> >>
> >> Brian
> >>
>
>
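
One way to collect the cold and hot numbers asked for above is to time two consecutive regeneration runs. The sketch below assumes `egencache` is the tool used to regenerate the md5-cache and that the repository is named `gentoo`; the job count is arbitrary. Note that the first run only counts as "cold" if the cache was emptied beforehand, which this sketch deliberately does not do for you.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def egencache_cmd(repo="gentoo", jobs=4):
    """Command line for regenerating a repository's md5-cache (assumed setup)."""
    return ["egencache", "--update", f"--repo={repo}", f"--jobs={jobs}"]


if __name__ == "__main__":
    cmd = egencache_cmd()
    first = time_command(cmd)   # cold only if the cache was cleared first
    second = time_command(cmd)  # hot: cache entries are already up to date
    print(f"first run: {first:.1f}s, second run: {second:.1f}s")
```

Posting both figures, along with the machine's CPU and storage type, would give the thread some actual data to weigh.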