On 2019-10-08 05:33, Michał Górny wrote:

> Hello, everyone.
>
> TL;DR: shortly, distfiles will need to be present under two paths for
> the transitional period. Would you prefer us using hardlinks or
> symlinks for that?
>
> We're planning to start deploying a new GLEP 75-based [1] mirror layout
> to our mirrors soonish. This implies a transitional period during which
> we'll be using both old and new layouts, so all file entries will be
> duplicated. The plan is roughly to:
>
> 1. Enable new split layout in emirrordist, and start using both
>    simultaneously for newly-mirrored files.
>
> 2. Duplicate the existing distfiles to new layout.
>
> 3. Live with both layouts for some longish time, to support people using
>    old Portage versions.
>
> 4. Eventually disable the old (flat) layout and start removing files.
>
> The basic problem is whether to use hardlinks or symlinks
> for the duplicate files. I've elaborated more on both solutions in [2]
> but I'll summarize briefly here.
>
> Hardlinks have the advantage that for mirrors enabling -H, they avoid
> extra space usage and extra traffic. However, we don't really know how
> many mirrors enable that, and I suspect it's around half of them.
> At initial deployment time, rsync will just hardlink files in new layout
> to existing entries, and at cleanup time it will just unlink old entries.
>
> For mirrors not enabling -H, hardlinks will mean all distfiles being
> transferred again during deployment time. Furthermore, throughout the
> transitional period all files will be duplicated, and so will space
> usage. Cleanup should be lightweight though.
>
> Symlinks have the advantage that we know that all or almost all mirrors
> enable them. They are lightweight at deployment time since it's just
> a matter of rsync copying symlinks, and they definitely won't cause
> double space usage. However, they will cause all files to be
> retransferred at cleanup time -- due to symlinks being replaced by real
> files.
>
> Technically, I suppose we could avoid that by splitting that into two
> stages, repeated for smaller groups of files. Firstly, replace symlinks
> with hardlinks which will make it light for at least some of the mirrors.
> Then, remove old files and jump over to the next group. For mirrors not
> using -H, this will still mean double transfer but we'd limit double
> space usage to one group at a time, and only for a short period.
>
> If any mirrors sync over rsync without using -l (talking about private
> mirrors here), they will not get the new layout at all which is going to
> suck for their users.
>
> Which way do you prefer?

For the soeasyto mirror, we are already using both -H and --links, and the
mirror is hosted on a single partition. So, in order to save bandwidth as
you suggested, hardlinks are the better choice for us. They could cause
server "overload" as per [1], but that is not an issue here.
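
To illustrate why the single partition matters: hardlinks only work within
one filesystem. Below is a minimal Python sketch of that check. The function
name and the symlink fallback are purely my own illustration; this is not
what emirrordist or rsync actually do.

```python
import os


def link_into_split_layout(src, dst):
    """Expose src at dst: hardlink if possible, else a relative symlink.

    Hypothetical helper for illustration only. A hardlink needs src and
    the destination directory to live on the same filesystem (same st_dev).
    """
    dst_dir = os.path.dirname(dst) or "."
    os.makedirs(dst_dir, exist_ok=True)
    if os.stat(src).st_dev == os.stat(dst_dir).st_dev:
        os.link(src, dst)  # second name for the same inode, no extra space
        return "hardlink"
    # Different filesystem: fall back to a symlink pointing at the old path.
    os.symlink(os.path.relpath(src, dst_dir), dst)
    return "symlink"
```

On a single-partition mirror like ours, the first branch would always be
taken, so the new layout costs only directory entries.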

One question remains though: how will layout.conf be created? Will each
mirror maintainer write it, or will only the master distfiles server
provide it, with all mirrors replicating it automatically? It could be
useful to let mirror maintainers decide between the split and flat layouts
depending on their use of hardlinks / symlinks, for example by providing
separate masters for the flat, hybrid, and split layouts.
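
For context, here is a rough Python sketch of how a client might read a
layout.conf and map a distfile name to its path. The file contents are an
assumed example in the format I understand GLEP 75 to describe (a
[structure] section listing layouts in order of preference, where
"filename-hash BLAKE2B 8" uses the first 8 bits of the filename's hash as a
directory name). The function names are hypothetical, and this is not
Portage's actual code.

```python
import configparser
import hashlib

# Assumed example of a "hybrid" layout.conf: split layout preferred,
# flat layout still available as a fallback.
EXAMPLE_LAYOUT_CONF = """\
[structure]
0=filename-hash BLAKE2B 8
1=flat
"""


def parse_layouts(text):
    """Return the layout specs from a layout.conf string, in preference order."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["structure"]
    return [section[key].split() for key in sorted(section, key=int)]


def distfile_path(filename, layout):
    """Map a distfile name to its relative path under one layout spec."""
    if layout == ["flat"]:
        return filename
    if layout[0] == "filename-hash" and layout[1] == "BLAKE2B":
        digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
        parts = []
        for bits in layout[2].split(":"):
            chars = int(bits) // 4  # each hex digit carries 4 bits
            parts.append(digest[:chars])
            digest = digest[chars:]
        return "/".join(parts + [filename])
    raise ValueError(f"unsupported layout: {layout}")
```

With a file like this, a mirror serving only the flat layout would simply
ship a layout.conf that lists "flat" alone, which is why who writes the file
matters.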

> [1] https://www.gentoo.org/glep/glep-0075.html
> [2] https://bugs.gentoo.org/534528#c38