Gentoo Archives: gentoo-mirrors

From: "Michał Górny" <mgorny@g.o>
To: gentoo-mirrors@l.g.o, SoEasyTo Mirrors Manager <mirrors@××××××××.com>
Subject: Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
Date: Tue, 08 Oct 2019 09:24:49
Message-Id: 31E9163B-F50D-4620-A3AF-15B09126B643@gentoo.org
In Reply to: Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks? by SoEasyTo Mirrors Manager
On October 8, 2019 7:13:03 AM UTC, SoEasyTo Mirrors Manager <mirrors@××××××××.com> wrote:
>On 2019-10-08 05:33, Michał Górny wrote:
>
>> Hello, everyone.
>>
>> TL;DR: shortly, distfiles will need to be present under two paths for
>> the transitional period. Would you prefer that we use hardlinks or
>> symlinks for that?
>>
>> We're planning to start deploying a new GLEP 75-based [1] mirror layout
>> to our mirrors soonish. This implies a transitional period during which
>> we'll be using both old and new layouts, so all file entries will be
>> duplicated. The plan is roughly to:
>>
>> 1. Enable the new split layout in emirrordist, and start using both
>> simultaneously for newly-mirrored files.
>>
>> 2. Duplicate the existing distfiles to the new layout.
>>
>> 3. Live with both layouts for some longish time, to support people using
>> old Portage versions.
>>
>> 4. Eventually disable the old (flat) layout and start removing files.
>>
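Step 2 of the plan above can be sketched in Python as follows. This is only an illustration, not the tooling used on the master mirror. It assumes a `filename-hash BLAKE2B 8` style structure, where the directory name is the first two hex digits of the BLAKE2B hash of the file name (see GLEP 75 [1] for the authoritative definition). The function names `split_path` and `duplicate_flat_to_split`, and the choice to skip `layout.conf`, are hypothetical and made up for this example.

```python
import hashlib
import os


def split_path(filename, bits=8):
    """Relative path of `filename` in a hash-split layout (assumed scheme)."""
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    # Each hex digit carries 4 bits, so 8 bits give a 2-character directory.
    return os.path.join(digest[: bits // 4], filename)


def duplicate_flat_to_split(root, use_symlinks=False, skip=("layout.conf",)):
    """Make every regular file in the flat layout also reachable via its
    split-layout path. Returns the number of new entries created."""
    created = 0
    for name in sorted(os.listdir(root)):
        src = os.path.join(root, name)
        if name in skip or os.path.islink(src) or not os.path.isfile(src):
            continue  # directories, symlinks and skipped names are left alone
        dst = os.path.join(root, split_path(name))
        if os.path.lexists(dst):
            continue  # already duplicated on an earlier run
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if use_symlinks:
            # Relative link from <root>/<xx>/<name> back to <root>/<name>.
            os.symlink(os.path.join("..", name), dst)
        else:
            os.link(src, dst)  # second name for the same inode
        created += 1
    return created
```

Running `duplicate_flat_to_split(root)` a second time creates nothing new, so the sketch is safe to repeat while new files keep arriving in the flat layout.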
>> The basic problem is whether to use hardlinks or symlinks
>> for the duplicate files. I've elaborated more on both solutions in [2]
>> but I'll summarize briefly here.
>>
>> Hardlinks have the advantage that for mirrors enabling -H, they avoid
>> extra space usage and extra traffic. However, we don't really know how
>> many mirrors enable that, and I suspect it's around half of them.
>> At initial deployment time, rsync will just hardlink files in the new
>> layout to existing entries, and at cleanup time it will just unlink old
>> entries.
>>
>> For mirrors not enabling -H, hardlinks will mean all distfiles being
>> transferred again during deployment time. Furthermore, throughout the
>> transitional period all files will be duplicated, and so space usage
>> will double. Cleanup should be lightweight though.
>>
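The space argument in the two paragraphs above can be made concrete with a small Python sketch. It does not invoke rsync. It only models the outcome: a receiver that preserves hardlinks stores each inode once, while one that does not ends up with an independent copy per name. The function name `tree_size` and its parameter are hypothetical.

```python
import os
import stat


def tree_size(root, count_hardlinks_once):
    """Sum the sizes (in bytes) of regular files below `root`.

    With count_hardlinks_once=True, files sharing an inode are counted once,
    which models a mirror that preserves hardlinks. With False, every name is
    counted separately, which models a mirror that receives each hardlinked
    name as an independent copy.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks and special files are not counted
            key = (st.st_dev, st.st_ino)
            if count_hardlinks_once and key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total
```

On a tree where every distfile has exactly two names, the second figure is twice the first, which is the doubled space usage described above.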
>> Symlinks have the advantage that we know that all or almost all mirrors
>> enable them. They are lightweight at deployment time since it's just
>> a matter of rsync copying symlinks, and they definitely won't cause
>> double space usage. However, they will cause all files to be
>> retransferred at cleanup time -- due to symlinks being replaced by real
>> files.
>>
>> Technically, I suppose we could avoid that by splitting that into two
>> stages, repeated for smaller groups of files. Firstly, replace symlinks
>> with hardlinks, which will make it light for at least some of the
>> mirrors. Then, remove old files and jump over to the next group. For
>> mirrors not using -H, this will still mean double transfer but we'd limit
>> double space usage to one group at a time, and only for a short period.
>>
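The two-stage cleanup described above could look roughly like this in Python. It is a sketch under assumptions, not an agreed procedure: a "group" is modelled as one directory of the new layout, the temporary file suffix is arbitrary, and the function names `promote_symlinks` and `remove_old_entries` are hypothetical.

```python
import os


def promote_symlinks(group_dir):
    """Stage 1: replace each symlink in `group_dir` with a hardlink to the
    file it points at. Returns the paths of the old (flat-layout) entries."""
    old_entries = []
    for name in sorted(os.listdir(group_dir)):
        link = os.path.join(group_dir, name)
        if not os.path.islink(link):
            continue  # already a real file or a hardlink
        target = os.path.realpath(link)
        tmp = link + ".tmp-hardlink"
        os.link(target, tmp)   # new name for the target's inode
        os.replace(tmp, link)  # atomically swap it in over the symlink
        old_entries.append(target)
    return old_entries


def remove_old_entries(paths):
    """Stage 2: drop the old names; the data stays reachable through the
    hardlinks created in stage 1."""
    for path in paths:
        os.unlink(path)
```

Between the two stages the mirrors need time to sync, so that those using -H can pick up the hardlinks before the old names disappear. The sketch leaves that wait to the caller.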
>> If any mirrors sync over rsync without using -l (talking about private
>> mirrors here), they will not get the new layout at all, which is going to
>> suck for their users.
>>
>> Which way do you prefer?
>
>For the soeasyto mirror, we are already using both -H and --links, and the
>mirror is hosted on a single partition, so, in order to preserve bandwidth as
>you suggested, it's better to use hardlinks, keeping in mind that could cause
>server "overload" as per [1], but it is not an issue here.
>
>One question remains though: how will layout.conf be created? Is it created
>by the mirror maintainer, or only on the master distfiles mirror, with all
>mirrors then replicating it automatically? It could be interesting to let
>the mirror maintainer decide whether to use the split or flat layout depending
>on their usage of hardlinks / symlinks, and to leave that choice open by
>providing a master for each of the flat, hybrid, and split layouts.

We are replicating layout.conf along with distfiles from the master mirror. This makes sense for the majority of the mirrors since they're going to replicate the structure of the master mirror as well.

You can technically override this locally but you'd also have to adjust the fetch procedure to account for the changed layout, i.e. run your custom tooling.

If you do not rsync from the master mirror and e.g. use emirrordist locally, you'd have to create layout.conf yourself. A future version of emirrordist will respect your setting there (the patches are not merged yet).
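
For anyone writing layout.conf by hand, here is a minimal Python sketch of reading one. It assumes the INI-style format from GLEP 75 [1], with a `[structure]` section whose numbered keys list layouts in order of preference. The sample content is illustrative only and is not a copy of the file on the master mirror. The helper name `read_structures` is hypothetical.

```python
import configparser

# Illustrative content only; see GLEP 75 for the authoritative syntax.
EXAMPLE_LAYOUT_CONF = """\
[structure]
0=filename-hash BLAKE2B 8
1=flat
"""


def read_structures(text):
    """Return the layouts from a layout.conf-style string, in key order,
    each as a list of whitespace-separated tokens."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("structure"):
        return []  # the caller decides what a missing section means
    entries = sorted(parser.items("structure"), key=lambda kv: int(kv[0]))
    return [value.split() for _key, value in entries]
```

With the sample above, `read_structures(EXAMPLE_LAYOUT_CONF)` yields the hash-split layout first and the flat layout second, i.e. a hybrid setup of the kind used during the transitional period.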

>
>> [1] https://www.gentoo.org/glep/glep-0075.html
>> [2] https://bugs.gentoo.org/534528#c38

--
Best regards,
Michał Górny