Gentoo Archives: gentoo-mirrors

From: "Martin Kubiak (TUBS)" <martin.kubiak@×××××.de>
To: gentoo-mirrors@l.g.o
Subject: Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
Date: Tue, 08 Oct 2019 07:23:51
Message-Id: 1091E737-1A2E-4EC4-9EE9-A2A10D87CBDB@tu-bs.de
In Reply to: Re: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks? by SoEasyTo Mirrors Manager
Hi,

We do both @ ftp.rz.tu-bs.de

Martin


Sent while on the go.

> On 08.10.2019 at 09:13, SoEasyTo Mirrors Manager <mirrors@××××××××.com> wrote:
>
>> On 2019-10-08 05:33, Michał Górny wrote:
>>
>> Hello, everyone.
>>
>> TL;DR: shortly, distfiles will need to be present under two paths for
>> the transitional period. Would you prefer that we use hardlinks or
>> symlinks for that?
>>
>> We're planning to start deploying a new GLEP 75-based [1] mirror layout
>> to our mirrors soonish. This implies a transitional period during which
>> we'll be using both old and new layouts, so all file entries will be
>> duplicated. The plan is roughly to:
>> 1. Enable the new split layout in emirrordist, and start using both
>>    layouts simultaneously for newly mirrored files.
>> 2. Duplicate the existing distfiles to the new layout.
>> 3. Live with both layouts for a fairly long time, to support people
>>    using old Portage versions.
>> 4. Eventually disable the old (flat) layout and start removing files.
>>
>> The basic question is whether to use hardlinks or symlinks
>> for the duplicate files. I've elaborated on both solutions in [2],
>> but I'll summarize briefly here.
>>
>> Hardlinks have the advantage that for mirrors enabling -H, they avoid
>> extra space usage and extra traffic. However, we don't really know how
>> many mirrors enable that, and I suspect it's around half of them.
>> At initial deployment time, rsync will just hardlink files in the new
>> layout to existing entries, and at cleanup time it will just unlink the
>> old entries.
>>
>> For mirrors not enabling -H, hardlinks will mean all distfiles being
>> transferred again at deployment time. Furthermore, throughout the
>> transitional period all files will be duplicated, and so will the
>> space usage. Cleanup should be lightweight, though.
>>
>> Symlinks have the advantage that we know that all or almost all mirrors
>> enable them. They are lightweight at deployment time since it's just
>> a matter of rsync copying symlinks, and they definitely won't cause
>> double space usage. However, they will cause all files to be
>> retransferred at cleanup time, due to symlinks being replaced by real
>> files.
>>
>> Technically, I suppose we could avoid that by splitting the cleanup into
>> two stages, repeated for smaller groups of files. First, replace symlinks
>> with hardlinks, which will make it light for at least some of the mirrors.
>> Then, remove the old files and move on to the next group. For mirrors not
>> using -H, this will still mean double transfer, but we'd limit double
>> space usage to one group at a time, and only for a short period.
>>
>> If any mirrors sync over rsync without using -l (talking about private
>> mirrors here), they will not get the new layout at all, which is going to
>> suck for their users.
>>
>> Which way do you prefer?
>
> For the soeasyto mirror, we are already using both -H and --links, and the
> mirror is hosted on a single partition, so, in order to preserve bandwidth
> as you suggested, it's better to use hardlinks, keeping in mind that this
> could cause server "overload" as per [1], but it is not an issue here.
>
> One question remains, though: how will layout.conf be created? Is it by the
> mirror maintainer, or only by the master distfiles mirror, with all mirrors
> then replicating it automatically? It could be interesting to let the mirror
> maintainer decide whether to use the split or flat layout depending on their
> usage of hardlinks / symlinks, and to offer that choice by providing a master
> for each of the flat, hybrid, and split layouts.
>
>> [1] https://www.gentoo.org/glep/glep-0075.html
>> [2] https://bugs.gentoo.org/534528#c38
>
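
To make the hardlink/symlink trade-off from the quoted discussion concrete,
here is a minimal Python sketch that exposes one distfile under a second path,
once as a hardlink and once as a relative symlink. The directory naming (the
first two hex digits of the BLAKE2B hash of the filename) is an assumption
modelled on the filename-hash scheme described in GLEP 75 [1], and the helper
names split_subdir, duplicate and demo are hypothetical. It is not how
emirrordist or rsync implement this; it only shows what each kind of link
looks like on disk.

```python
import hashlib
import os
import tempfile


def split_subdir(filename: str, bits: int = 8) -> str:
    """Hypothetical split-layout directory name.

    Assumption: leading `bits` bits (as hex digits) of the BLAKE2B-512
    hash of the file name, in the spirit of GLEP 75's filename-hash.
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return digest[: bits // 4]


def duplicate(root: str, filename: str, use_hardlink: bool) -> str:
    """Expose root/filename a second time as root/<subdir>/filename."""
    flat_path = os.path.join(root, filename)
    new_dir = os.path.join(root, split_subdir(filename))
    os.makedirs(new_dir, exist_ok=True)
    new_path = os.path.join(new_dir, filename)
    if use_hardlink:
        # Second directory entry for the same inode: no extra data blocks.
        os.link(flat_path, new_path)
    else:
        # Relative symlink pointing back at the flat-layout file.
        os.symlink(os.path.join("..", filename), new_path)
    return new_path


def demo() -> dict:
    """Create one dummy distfile per link type and report what is on disk."""
    results = {}
    for use_hardlink in (True, False):
        with tempfile.TemporaryDirectory() as root:
            name = "foo-1.0.tar.gz"  # made-up distfile name
            flat = os.path.join(root, name)
            with open(flat, "wb") as f:
                f.write(b"x" * 1024)
            new = duplicate(root, name, use_hardlink)
            results["hardlink" if use_hardlink else "symlink"] = {
                "same_inode": os.stat(flat).st_ino == os.lstat(new).st_ino,
                "is_symlink": os.path.islink(new),
                "link_count": os.stat(flat).st_nlink,
                "resolves_to_flat": os.path.samefile(flat, new),
            }
    return results


if __name__ == "__main__":
    for kind, info in demo().items():
        print(kind, info)
```

In the hardlink case both paths share one inode, so a receiver that does not
preserve hardlinks (rsync without -H) sees two independent regular files and
stores the data twice. In the symlink case the new path is only a pointer, so
removing the flat file later leaves the pointer dangling until a real file
takes its place, which is the retransfer at cleanup time described above.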