Gentoo Archives: gentoo-mirrors

From: SoEasyTo Mirrors Manager <mirrors@××××××××.com>
To: gentoo-mirrors@l.g.o
Subject: Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
Date: Tue, 08 Oct 2019 07:13:11
Message-Id: 9e5ad79b3917c714e0813501f6349f0b@soeasyto.com
In Reply to: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks? by "Michał Górny"
On 2019-10-08 05:33, Michał Górny wrote:

> Hello, everyone.
>
> TL;DR: shortly, distfiles will need to be present under two paths for
> the transitional period. Would you prefer us using hardlinks or
> symlinks for that?
>
> We're planning to start deploying a new GLEP 75-based [1] mirror layout
> to our mirrors soonish. This implies a transitional period during which
> we'll be using both old and new layouts, so all file entries will be
> duplicated. The plan is roughly to:
>
> 1. Enable new split layout in emirrordist, and start using both
>    simultaneously for newly-mirrored files.
>
> 2. Duplicate the existing distfiles to new layout.
>
> 3. Live with both layouts for some longish time, to support people
>    using old Portage versions.
>
> 4. Eventually disable the old (flat) layout and start removing files.
>
> The basic problem is whether to use hardlinks or symlinks
> for the duplicate files. I've elaborated more on both solutions in [2]
> but I'll summarize briefly here.
>
> Hardlinks have the advantage that for mirrors enabling -H, they avoid
> extra space usage and extra traffic. However, we don't really know how
> many mirrors enable that, and I suspect it's around half of them.
> At initial deployment time, rsync will just hardlink files in the new
> layout to existing entries, and at cleanup time it will just unlink old
> entries.
>
> For mirrors not enabling -H, hardlinks will mean all distfiles being
> transferred again during deployment time. Furthermore, throughout the
> transitional period all files will be duplicated, and so will space
> usage. Cleanup should be lightweight though.
>
> Symlinks have the advantage that we know that all or almost all mirrors
> enable them. They are lightweight at deployment time since it's just
> a matter of rsync copying symlinks, and they definitely won't cause
> double space usage. However, they will cause all files to be
> retransferred at cleanup time -- due to symlinks being replaced by real
> files.
>
> Technically, I suppose we could avoid that by splitting that into two
> stages, repeated for smaller groups of files. Firstly, replace symlinks
> with hardlinks, which will make it light for at least some of the
> mirrors. Then, remove old files and jump over to the next group. For
> mirrors not using -H, this will still mean double transfer but we'd
> limit double space usage to one group at a time, and only for a short
> period.
>
> If any mirrors sync over rsync without using -l (talking about private
> mirrors here), they will not get the new layout at all which is going
> to suck for their users.
>
> Which way do you prefer?

For the soeasyto mirror, we already use both -H and --links, and the
mirror is hosted on a single partition. So, to save bandwidth as you
suggested, we would prefer hardlinks. We keep in mind that this could
cause server "overload" as per [1], but that is not an issue here.
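
To illustrate why the single-partition detail matters: a hardlink is a
second directory entry for the same inode, so it only works within one
filesystem and takes no extra space. Below is a minimal Python sketch of
that idea, not our actual mirror tooling. The names `flat`, `split`, `ab`
and `foo-1.0.tar.gz` are made up for the example, and it assumes a POSIX
filesystem that supports hardlinks.

```python
import os
import tempfile


def link_into_split_layout(flat_path, split_path):
    """Hardlink flat_path to split_path (hypothetical helper)."""
    target_dir = os.path.dirname(split_path)
    os.makedirs(target_dir, exist_ok=True)
    # st_dev identifies the filesystem; hardlinks cannot cross it.
    if os.stat(flat_path).st_dev != os.stat(target_dir).st_dev:
        raise OSError("different filesystems, cannot hardlink")
    os.link(flat_path, split_path)


def demo():
    """Create a file in a flat dir, link it into a split dir, compare."""
    with tempfile.TemporaryDirectory() as root:
        flat = os.path.join(root, "flat", "foo-1.0.tar.gz")
        # "ab" is an arbitrary example subdirectory, not a real hash prefix.
        split = os.path.join(root, "split", "ab", "foo-1.0.tar.gz")
        os.makedirs(os.path.dirname(flat))
        with open(flat, "wb") as f:
            f.write(b"x" * 1024)
        link_into_split_layout(flat, split)
        a, b = os.stat(flat), os.stat(split)
        same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        # Link count of the shared inode after adding the second name.
        return same_inode, a.st_nlink


if __name__ == "__main__":
    print(demo())
```

Both names point at one inode, so the data is stored once; removing the
flat-layout name later only drops one link.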

One question remains though: how will layout.conf be created? Will it be
written by each mirror maintainer, or only by the master distfiles
mirror, with all other mirrors replicating it automatically? It could be
interesting to let mirror maintainers decide whether to use the split or
flat layout, depending on their use of hardlinks / symlinks, by providing
separate masters for flat, hybrid, and split layouts.

> [1] https://www.gentoo.org/glep/glep-0075.html
> [2] https://bugs.gentoo.org/534528#c38
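
For context on what the split layout means in practice: as far as I
understand GLEP 75 [1], a layout entry such as `filename-hash BLAKE2B 8`
places each distfile in a subdirectory named after the first 8 bits (two
hex digits) of the BLAKE2B hash of its file name. The sketch below is my
reading of the GLEP, not Portage's code. The function name and the example
file name are hypothetical, and the 8-bit cutoff is an assumption.

```python
import hashlib


def split_layout_path(filename, cutoff_bits=8):
    """Return '<hash-prefix>/<filename>' for a filename-hash layout.

    Assumes the hash is taken over the file *name*, not its contents,
    and that the cutoff is a multiple of 4 bits (one hex digit each).
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    prefix = digest[: cutoff_bits // 4]
    return prefix + "/" + filename


if __name__ == "__main__":
    # Hypothetical distfile name, for illustration only.
    print(split_layout_path("foo-1.0.tar.gz"))
```

With a flat layout the same file would simply sit at the top of the
distfiles directory under its own name.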
