Gentoo Archives: gentoo-dev

From: Joshua Kinard <kumba@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] New distfile mirror layout
Date: Sun, 20 Oct 2019 09:21:51
Message-Id: d6964a7b-e085-521b-dfba-f80b7985f414@gentoo.org
In Reply to: Re: [gentoo-dev] New distfile mirror layout by "Michał Górny"
On 10/20/2019 04:32, Michał Górny wrote:
> On Sun, 2019-10-20 at 04:25 -0400, Joshua Kinard wrote:
>> On 10/20/2019 02:51, Michał Górny wrote:
>>> On Sat, 2019-10-19 at 19:24 -0400, Joshua Kinard wrote:
>>>> On 10/18/2019 09:41, Michał Górny wrote:
>>>>> Hi, everybody.
>>>>>
>>>>> It is my pleasure to announce that yesterday (EU) evening we switched
>>>>> to a new distfile mirror layout. Users will be switching to the new
>>>>> layout either as they upgrade Portage to 2.3.77 or -- if they upgraded
>>>>> already -- as their caches expire (24hrs).
>>>>>
>>>>> The new layout is mostly a bow towards mirror admins, for some of whom
>>>>> having 60000+ files in a single directory has been a problem.
>>>>> However, I suppose some of you also found e.g. the directory index
>>>>> hardly usable due to its size.
>>>>>
>>>>> Throughout a transitional period (whose exact length hasn't been decided
>>>>> yet), both layouts will be available. Afterwards, the old layout will
>>>>> be removed from mirrors. This has a few implications:
>>>>>
>>>>> 1. Users who don't upgrade their package managers in time will lose
>>>>> the ability to fetch from Gentoo mirrors. This shouldn't be that
>>>>> much of a problem given that the core software needed to upgrade Portage
>>>>> should all have reliable upstream SRC_URIs.
>>>>>
>>>>> 2. mirror://gentoo/file URIs will stop working. While technically you
>>>>> could use mirror://gentoo/XX/file, I'd rather recommend finally
>>>>> discarding its usage and moving distfiles to devspace.
>>>>>
>>>>> 3. Directly fetching files from distfiles.gentoo.org will become
>>>>> a little harder. To fetch a distfile named 'foo-1.tar.gz', you'd have
>>>>> to use something like:
>>>>>
>>>>> $ printf '%s' foo-1.tar.gz | b2sum | cut -c1-2
>>>>> 1b
>>>>> $ wget http://distfiles.gentoo.org/distfiles/1b/foo-1.tar.gz
>>>>> ...
>>>>>
>>>>> Alternatively, you can:
>>>>>
>>>>> $ wget http://distfiles.gentoo.org/distfiles/INDEX
>>>>>
>>>>> and grep for the right path there. This INDEX is also a more
>>>>> lightweight alternative to the HTML indexes generated by the servers.
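The first method in point 3 can be wrapped in a small shell helper so the hash never has to be worked out by hand. This is a minimal sketch, assuming the layout described above (a two-hex-digit directory taken from the b2sum digest of the file name, not of its contents) and plain HTTP access to distfiles.gentoo.org; the function name distfile_url is hypothetical and not part of Portage.

```shell
#!/bin/sh
# Hypothetical helper (not part of Portage): print the new-layout URL for a
# distfile. Assumes the directory is the first two hex digits of the
# BLAKE2b digest (b2sum's default output) of the file *name*.
distfile_url() {
    name=$1
    # printf '%s' avoids hashing a trailing newline, as in the example above.
    prefix=$(printf '%s' "$name" | b2sum | cut -c1-2)
    printf 'http://distfiles.gentoo.org/distfiles/%s/%s\n' "$prefix" "$name"
}
```

For example, wget "$(distfile_url foo-1.tar.gz)" requests the same URL as the two-step example above, since it runs the identical printf | b2sum | cut pipeline.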
>>>>>
>>>>> If you're interested in more background details and some plots, see [1].
>>>>>
>>>>> [1] https://dev.gentoo.org/~mgorny/articles/improving-distfile-mirror-structure.html
>>>>>
>>>>
>>>> So the answer I didn't really see directly stated here is: where do new
>>>> distfiles need to go //now//? E.g., if on woodpecker, I currently cp a
>>>> distfile to /space/distfiles-local. What is the new directory I need to
>>>> use? And if mirror://gentoo/${FOO} is going away, for the new distfiles
>>>> target, what would be the applicable prefix to use?
>>>>
>>>> Directly using devspace seems like a bad idea, IMHO. Once long ago, we all
>>>> got chastised for doing exactly that. Too much possibility of fragmentation
>>>> as devs retire or package maintainership changes hands.
>>>
>>> Today you get chastised for using /space/distfiles-local and not
>>> following policy changes. The devmanual states that it has been deprecated
>>> since at least 2011, and talks of using d.g.o [1].
>>
>> I don't recall this change being added as far back as 2011. Maybe my memory
>> is bad, but if it was done that long ago, it was done quietly, and it was
>> not enforced. I checked my local mailing list archives for gentoo-dev and
>> don't see any mention of distfiles-local being deprecated back then. Why
>> has it taken 8 years for this to get addressed?
>
> Don't ask me. I think I was already taught to use d.g.o back when I was
> recruited.
>
>> In any event, I still think using devspace is a bad idea. A centralized
>> distfiles repo is what most other distros use, and it's what we should use.
>
> Talking doesn't make things happen. Coming up with good proposals that
> address all the problems (e.g. those listed in the devmanual) does.

Proposing changes when a direction has already been decided, the rudder
position changed, and engines put to full power is equally pointless.
You're the de facto captain of this ship lately. I expect you not to rock
the boat too hard. This change is a pretty hard jolt, IMHO.

>>>> I looked at the whitepaper-ish writeup, and I kinda don't like using a
>>>> hash-based naming scheme for the new distfiles layout. I'd kind of prefer
>>>> breaking the directories up based on the first letter of the distfiles in
>>>> question, factoring case-sensitivity in (so you'd have 52 top-level
>>>> directories for A-Z and a-z, plus 10 more for 0-9). Under each of those
>>>> directories, there would be additional subdirectories for the next few
>>>> letters (say, letters 2-3). Yes, this leads to some orphan cases where a
>>>> distfile might live on its own, but from a direct navigation standpoint,
>>>> it's easy for someone browsing the distfiles server to find, and easy to
>>>> predict where a distfile is.
>>>>
>>>> No math, statistical analysis, or deep-rooted knowledge of filesystems
>>>> behind that paragraph. Just a plain old unfiltered opinion. Sometimes, I
>>>> need to go get a distfile off the Gentoo mirrors, and being able to quickly
>>>> find it in the mirror root is great. Having to do hash calculations to work
>>>> out the file path will be *really* annoying.
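For comparison, the first-letter scheme proposed in this paragraph could be computed just as mechanically. This is a hypothetical sketch only: it assumes the second level is a single directory named after characters 2-3 of the file name (the proposal could also be read as one directory level per letter), and letter_path is an invented name; no Gentoo mirror uses this layout.

```shell
#!/bin/sh
# Hypothetical: path of a distfile under the proposed first-letter layout,
# e.g. "foo-1.tar.gz" -> "f/oo/foo-1.tar.gz". Case is preserved, so "Foo"
# and "foo" land in different top-level directories (52 letters + 10 digits).
letter_path() {
    name=$1
    first=$(printf '%s' "$name" | cut -c1)
    rest=$(printf '%s' "$name" | cut -c2-3)
    if [ -n "$rest" ]; then
        printf '%s/%s/%s\n' "$first" "$rest" "$name"
    else
        # A one-character name has no second-level directory.
        printf '%s/%s\n' "$first" "$name"
    fi
}
```

Unlike the hash scheme, this path can be read off the name by eye, which is the navigation benefit argued for above; the trade-off is that directory sizes follow the skew of real package names instead of being spread roughly evenly.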
>>>
>>> Your solution still doesn't solve the problem of having 8k-24k files
>>> in a single directory, even if you use 7 letters of prefix. So it just
>>> creates a lot of tiny directory noise for no practical gain.
>>
>> Why is having a maximum of ~24k files in a directory a bad idea? Modern
>> filesystems are more than capable of handling that.
>>
>> - ext4: unlimited files in a directory
>> - xfs: virtually unlimited (hard limit of 2^64-1 total files per volume)
>> - ntfs: 4,294,967,295 files per volume
>>
>> And 24k is a bit more than a third of all the distfiles we currently have.
>
> For the same reason having ~60k files in a directory was a problem.
> There is really no point in changing anything if you change BIG_NUMBER
> to SMALLER_BIG_NUMBER.

That doesn't answer my question. Why is it a problem? What criteria are
you using to decide that 24k is a "smaller big number"? Is there some issue
highlighted by the mirror admins where having 24k files in a single
directory offers no significant relief versus the current 60k files?
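The per-directory figures argued over here can be measured rather than estimated, given a list of distfile names. A rough sketch, assuming a plain-text file with one distfile name per line (how that list is produced, e.g. from the INDEX mentioned earlier, is left open since its exact format is not assumed here); max_buckets is a hypothetical name. It reports the size of the fullest directory under the two-hex-digit hash layout and under the top level of the first-letter layout, without the proposed second-level split.

```shell
#!/bin/sh
# Hypothetical helper: given a file with one distfile name per line, print
# the number of files in the fullest directory under each layout.
# Blank lines are ignored.
max_buckets() {
    list=$1
    # Hash layout: bucket = first two hex digits of BLAKE2b(name).
    # Runs b2sum once per name, so it is slow but simple.
    hash_max=$(while IFS= read -r name || [ -n "$name" ]; do
            [ -n "$name" ] || continue
            printf '%s' "$name" | b2sum | cut -c1-2
        done < "$list" | LC_ALL=C sort | uniq -c | sort -rn | head -n 1 |
        awk '{print $1}')
    # First-letter layout, top level only; case-sensitive (C locale).
    letter_max=$(grep -v '^$' "$list" | cut -c1 | LC_ALL=C sort | uniq -c |
        sort -rn | head -n 1 | awk '{print $1}')
    printf 'hash-prefix max: %s\n' "$hash_max"
    printf 'first-letter max: %s\n' "$letter_max"
}
```

Running max_buckets names.txt against the full list of mirrored files would give concrete numbers for the 8k-24k range discussed above instead of estimates.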

>> Under which scenario do you wind up with 24k files in a single directory? I
>> consider the tex package an outlier in this case (one package should not be
>> the sole dictator of policy).
>
> Three versions of TeXLive living simultaneously. If one package falls
> completely out of bounds, no problem is solved by the change, so what's
> the point of making it?

The problem in this case is with texlive, not our current or future
distfiles methodology. Has anyone looked at how other distros deal with
texlive? Has anyone complained or filed a bug with the texlive developers
upstream about their excessive number of distfiles and the burden it places
on distro maintainers?

--
Joshua Kinard
Gentoo/MIPS
kumba@g.o
rsa6144/5C63F4E3F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us. And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
