Gentoo Archives: gentoo-mirrors

From: "Michał Górny" <mgorny@g.o>
To: gentoo-mirrors@l.g.o
Cc: infrastructure <infrastructure@g.o>
Subject: [gentoo-mirrors] New distfile mirror deployment: hardlinks or symlinks?
Date: Tue, 08 Oct 2019 03:33:59
Message-Id: 0a281500f949432666cda1e948db6b062e99ced5.camel@gentoo.org

Hello, everyone.

TL;DR: soon, distfiles will need to be present under two paths for
the transitional period. Would you prefer that we use hardlinks or
symlinks for that?


We're planning to start deploying a new GLEP 75-based [1] mirror layout
to our mirrors soonish. This implies a transitional period during which
we'll be using both the old and new layouts, so all file entries will be
duplicated. The plan is roughly to:

1. Enable the new split layout in emirrordist, and start using both
   layouts simultaneously for newly-mirrored files.

2. Duplicate the existing distfiles to the new layout.

3. Live with both layouts for some longish time, to support people using
   old Portage versions.

4. Eventually disable the old (flat) layout and start removing files.
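
To make the two layouts in the plan above concrete, here is a minimal
Python sketch of how one distfile name maps to both paths. It assumes
the `filename-hash BLAKE2B 8` structure described in GLEP 75 [1], where
the subdirectory is the first two hex digits of the BLAKE2B hash of the
file name. The layout actually deployed is whatever the mirror's
layout.conf declares, and the function names and the "distfiles" prefix
are made up for illustration.

```python
import hashlib
import posixpath


def flat_path(filename: str) -> str:
    """Old (flat) layout: every distfile sits directly in one directory."""
    return posixpath.join("distfiles", filename)


def split_path(filename: str, cutoff_bits: int = 8) -> str:
    """New (split) layout, ASSUMING GLEP 75 'filename-hash BLAKE2B 8'.

    The subdirectory name is the first cutoff_bits/4 hex digits of the
    BLAKE2B (512-bit) hash of the file *name*, not of the file contents.
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    subdir = digest[: cutoff_bits // 4]
    return posixpath.join("distfiles", subdir, filename)


# During the transitional period both paths must resolve to the same file.
name = "foo-1.0.tar.gz"  # hypothetical distfile name
both_paths = (flat_path(name), split_path(name))
```

The mapping depends only on the file name, so a client can compute the
new path without any index.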


The basic problem is whether to use hardlinks or symlinks
for the duplicate files. I've elaborated on both solutions in [2],
but I'll summarize briefly here.

Hardlinks have the advantage that for mirrors enabling -H (rsync's
--hard-links), they avoid extra space usage and extra traffic. However,
we don't really know how many mirrors enable that, and I suspect it's
around half of them. At initial deployment time, rsync will just
hardlink files in the new layout to existing entries, and at cleanup
time it will just unlink the old entries.

For mirrors not enabling -H, hardlinks will mean that all distfiles are
transferred again at deployment time. Furthermore, throughout the
transitional period all files will be duplicated, and so will the space
usage. Cleanup should be lightweight, though.
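
As a rough illustration of the hardlink variant on the master side,
the following Python sketch links one file into a new-layout
subdirectory and checks that both names share a single inode, which is
why no extra space is used when links are preserved. The helper name,
the "ab" subdirectory and the file name are hypothetical; it assumes a
POSIX filesystem that supports hardlinks.

```python
import os
import tempfile


def hardlink_into_new_layout(root: str, subdir: str, name: str) -> str:
    """Create root/subdir/name as a hardlink to the flat-layout root/name."""
    new_dir = os.path.join(root, subdir)
    os.makedirs(new_dir, exist_ok=True)
    new = os.path.join(new_dir, name)
    os.link(os.path.join(root, name), new)
    return new


with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "foo-1.0.tar.gz")  # hypothetical distfile
    with open(old, "wb") as f:
        f.write(b"x" * 4096)

    # Deployment: the new path is just a second name for the same inode.
    new = hardlink_into_new_layout(root, "ab", "foo-1.0.tar.gz")
    same_inode = os.stat(old).st_ino == os.stat(new).st_ino
    link_count = os.stat(new).st_nlink

    # Cleanup: unlinking the old name leaves the data reachable via the new one.
    os.unlink(old)
    survives = os.path.getsize(new) == 4096
```

A mirror syncing without -H sees two unrelated regular files instead,
which is where the double transfer and double space usage come from.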

Symlinks have the advantage that we know that all or almost all mirrors
enable them. They are lightweight at deployment time since it's just
a matter of rsync copying symlinks, and they definitely won't cause
double space usage. However, they will cause all files to be
retransferred at cleanup time, because the symlinks get replaced by real
files.
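
The symlink variant can be sketched the same way. This Python snippet
assumes the new-layout entry is a relative symlink pointing at the
flat-layout file (the direction is an assumption; the file and
directory names are hypothetical). At cleanup the real file is moved
over the symlink, so the new path changes from a symlink to a regular
file, which is what mirrors would then have to fetch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "foo-1.0.tar.gz")  # hypothetical distfile
    with open(old, "wb") as f:
        f.write(b"x" * 4096)

    new_dir = os.path.join(root, "ab")  # hypothetical new-layout subdirectory
    os.mkdir(new_dir)
    new = os.path.join(new_dir, "foo-1.0.tar.gz")

    # Deployment: only a tiny relative symlink is added, no file data.
    os.symlink(os.path.join("..", "foo-1.0.tar.gz"), new)
    is_link_during_transition = os.path.islink(new)
    readable_through_link = os.path.getsize(new) == 4096

    # Cleanup: rename the real file over the symlink. The rename replaces
    # the symlink itself, so the new path becomes a regular file.
    os.replace(old, new)
    is_link_after_cleanup = os.path.islink(new)
    size_after_cleanup = os.path.getsize(new)
```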

Technically, I suppose we could avoid that by splitting the cleanup into
two stages, repeated for smaller groups of files. First, replace symlinks
with hardlinks, which will make it light for at least some of the mirrors.
Then, remove the old files and move on to the next group. For mirrors not
using -H, this will still mean a double transfer, but we'd limit double
space usage to one group at a time, and only for a short period.
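
A minimal sketch of that two-stage cleanup for a single group,
in Python. The function names, the group format and the file names are
invented for illustration, and it assumes the new-layout entries are
symlinks to the flat-layout files. In practice mirrors would need time
to sync between the two stages.

```python
import os
import tempfile


def stage1_symlinks_to_hardlinks(root, group):
    """Stage 1: swap each new-layout symlink for a hardlink to the old file."""
    for subdir, name in group:
        old = os.path.join(root, name)
        new = os.path.join(root, subdir, name)
        if os.path.islink(new):
            tmp = new + ".tmp"
            os.link(old, tmp)      # second name for the same inode
            os.replace(tmp, new)   # atomically replaces the symlink


def stage2_remove_old(root, group):
    """Stage 2 (after mirrors have synced): drop the flat-layout names."""
    for _subdir, name in group:
        os.unlink(os.path.join(root, name))


with tempfile.TemporaryDirectory() as root:
    # One hypothetical group of (subdirectory, file name) pairs.
    group = [("ab", "foo-1.0.tar.gz"), ("cd", "bar-2.0.tar.xz")]
    for subdir, name in group:
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"data")
        os.makedirs(os.path.join(root, subdir), exist_ok=True)
        os.symlink(os.path.join("..", name), os.path.join(root, subdir, name))

    stage1_symlinks_to_hardlinks(root, group)
    links_after_stage1 = [
        os.stat(os.path.join(root, s, n)).st_nlink for s, n in group
    ]

    stage2_remove_old(root, group)
    remaining_top_level = sorted(os.listdir(root))
    all_regular = all(
        os.path.isfile(os.path.join(root, s, n))
        and not os.path.islink(os.path.join(root, s, n))
        for s, n in group
    )
```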

If any mirrors sync over rsync without using -l (talking about private
mirrors here), they will not get the new layout at all, which is going to
suck for their users.
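
For mirror admins who want to check their own setup, here is a small
hypothetical Python helper that inspects an rsync argument list and
reports whether symlinks (-l / --links, also implied by -a) and
hardlinks (-H / --hard-links, not implied by -a) would be preserved.
The parsing is deliberately simplified: it ignores negations such as
--no-links and does not handle short options that take a value.

```python
def rsync_link_handling(args):
    """Report which link types an rsync invocation preserves (simplified)."""
    symlinks = False
    hardlinks = False
    for arg in args:
        if arg in ("--links", "--archive"):
            symlinks = True
        elif arg == "--hard-links":
            hardlinks = True
        elif arg.startswith("-") and not arg.startswith("--"):
            flags = arg[1:]
            # -a expands to -rlptgoD, so it includes -l but not -H.
            if "l" in flags or "a" in flags:
                symlinks = True
            if "H" in flags:
                hardlinks = True
    return {"symlinks": symlinks, "hardlinks": hardlinks}


# Three hypothetical mirror configurations.
plain = rsync_link_handling(["-rt", "--delete"])
archive = rsync_link_handling(["-a", "--delete"])
full = rsync_link_handling(["-rtlH", "--delete"])
```

In these examples only the last invocation would benefit from
hardlinks, and the first would not receive symlinked entries at all.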


Which way do you prefer?


[1] https://www.gentoo.org/glep/glep-0075.html
[2] https://bugs.gentoo.org/534528#c38

--
Best regards,
Michał Górny
