On Sun, 2019-10-20 at 04:25 -0400, Joshua Kinard wrote:
> On 10/20/2019 02:51, Michał Górny wrote:
> > On Sat, 2019-10-19 at 19:24 -0400, Joshua Kinard wrote:
> > > On 10/18/2019 09:41, Michał Górny wrote:
> > > > Hi, everybody.
> > > >
> > > > It is my pleasure to announce that yesterday (EU) evening we switched
> > > > to a new distfile mirror layout. Users will be switching to the new
> > > > layout either as they upgrade Portage to 2.3.77 or -- if they upgraded
> > > > already -- as their caches expire (24hrs).
> > > >
> > > > The new layout is mostly a bow towards mirror admins, for some of whom
> > > > having 60000+ files in a single directory has been a problem.
> > > > However, I suppose some of you also found e.g. the directory index
> > > > hardly usable due to its size.
> > > >
> > > > Throughout a transitional period (whose exact length hasn't been decided
> > > > yet), both layouts will be available. Afterwards, the old layout will
> > > > be removed from mirrors. This has a few implications:
> > > >
> > > > 1. Users who don't upgrade their package managers in time will lose
> > > > the ability to fetch from Gentoo mirrors. This shouldn't be that
> > > > much of a problem given that the core software needed to upgrade Portage
> > > > should all have reliable upstream SRC_URIs.
> > > >
> > > > 2. mirror://gentoo/file URIs will stop working. While technically you
> > > > could use mirror://gentoo/XX/file, I'd rather recommend finally
> > > > discarding its usage and moving distfiles to devspace.
> > > >
> > > > 3. Directly fetching files from distfiles.gentoo.org will become
> > > > a little harder. To fetch a distfile named 'foo-1.tar.gz', you'd have
> > > > to use something like:
> > > >
> > > > $ printf '%s' foo-1.tar.gz | b2sum | cut -c1-2
> > > > 1b
> > > > $ wget http://distfiles.gentoo.org/distfiles/1b/foo-1.tar.gz
> > > > ...
> > > >
> > > >
> > > > Alternatively, you can:
> > > >
> > > > $ wget http://distfiles.gentoo.org/distfiles/INDEX
> > > >
> > > > and grep for the right path there. This INDEX is also a more
> > > > lightweight alternative to HTML indexes generated by the servers.
> > > >
> > > >
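To make the derivation in item 3 concrete, here is a minimal shell sketch that wraps the two commands quoted above into one function. The function name distfile_url is hypothetical, it assumes b2sum from GNU coreutils is installed, and it hard-codes the distfiles.gentoo.org URL shown above; treat it as an illustration, not a supported tool.

```shell
# distfile_url NAME: print the URL of NAME under the new hashed layout.
# Hypothetical helper; assumes b2sum (GNU coreutils) is available.
distfile_url() {
    name=$1
    # Hash the bare file name. printf '%s' emits no trailing newline,
    # which matters because a newline would be hashed as part of the name.
    prefix=$(printf '%s' "$name" | b2sum | cut -c1-2)
    printf 'http://distfiles.gentoo.org/distfiles/%s/%s\n' "$prefix" "$name"
}
```

Calling wget "$(distfile_url foo-1.tar.gz)" would then fetch the file in one step.
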
> > > > If you're interested in more background details and some plots, see [1].
> > > >
> > > > [1] https://dev.gentoo.org/~mgorny/articles/improving-distfile-mirror-structure.html
> > > >
> > >
> > > So the question I didn't really see answered here is: where do new
> > > distfiles need to go //now//? E.g., on woodpecker, I currently cp a
> > > distfile to /space/distfiles-local. What is the new directory I need to
> > > use? And if mirror://gentoo/${FOO} is going away, for the new distfiles
> > > target, what would be the applicable prefix to use?
> > >
> > > Directly using devspace seems like a bad idea, IMHO. Once long ago, we all
> > > got chastised for doing exactly that. Too much possibility of fragmentation
> > > as devs retire or package maintainership changes hands.
> >
> > Today you get chastised for using /space/distfiles-local and not
> > following policy changes. The devmanual states that it has been deprecated
> > since at least 2011, and talks of using d.g.o [1].
>
> I don't recall this change being added as far back as 2011. Maybe my memory
> is bad, but if it was done that long ago, it was done quietly, and it was
> not enforced. I checked my local mailing list archives for gentoo-dev and
> don't see any mention of distfiles-local being deprecated back then. Why
> has it taken 8 years for this to get addressed?

Don't ask me. I think I was already taught to use d.g.o back when I was
recruited.

> In any event, I still think using devspace is a bad idea. A centralized
> distfiles repo is what most other distros use, and it's what we should use.

Talking doesn't make things happen. Coming up with good proposals that
address all the problems (e.g. those listed in devmanual) does.

> > > I looked at the whitepaper-ish writeup, and I kinda don't like using a
> > > hash-based naming scheme on the new distfiles layout. I really kind of prefer
> > > breaking the directories up based on the first letter of the distfiles in
> > > question, factoring case-sensitivity in (so you'd have 52 top-level
> > > directories for A-Z and a-z, plus 10 more for 0-9). Under each of those
> > > directories, additional subdirectories for the next few letters (say,
> > > letters 2-3). Yes, this leads to some orphan cases where a distfile might
> > > live on its own, but from a direct navigation standpoint, it's easy to find
> > > for someone browsing the distfiles server and easy to predict where a
> > > distfile is at.
> > >
> > > No math, statistical analysis, or deep-rooted knowledge of filesystems
> > > behind that paragraph. Just a plain old unfiltered opinion. Sometimes, I
> > > need to go get a distfile off the Gentoo mirrors, and being able to quickly
> > > find it in the mirror root is great. Having to do hash calculations to work
> > > out the file path will be *really* annoying.
> >
> > Your solution still doesn't solve the problem of having 8k-24k files
> > in a single directory, even if you use 7 letters of prefix. So it just
> > creates a lot of tiny directory noise for no practical gain.
>
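The clustering objection above can be tried out with a small shell sketch. It buckets 200 made-up file names that share a "texlive-module-" prefix (placeholders, not real distfiles) under the name-based scheme (first letter, then letters 2-3) and under the hash-based scheme (first two hex digits of the BLAKE2b hash of the name), and reports the size of the fullest bucket for each. All function names are hypothetical, and b2sum from GNU coreutils is assumed to be installed.

```shell
# Emit 200 made-up file names that share one long prefix.
sample_names() {
    i=1
    while [ "$i" -le 200 ]; do
        printf 'texlive-module-%d.tar.xz\n' "$i"
        i=$((i + 1))
    done
}

# Name-based bucket: first letter, then letters 2-3 (e.g. t/ex).
name_bucket() {
    printf '%s/%s\n' "$(printf '%s' "$1" | cut -c1)" "$(printf '%s' "$1" | cut -c2-3)"
}

# Hash-based bucket: first two hex digits of the BLAKE2b hash of the name.
hash_bucket() {
    printf '%s' "$1" | b2sum | cut -c1-2
}

# largest_bucket FUNC: size of the fullest bucket when FUNC maps each name.
largest_bucket() {
    sample_names | while IFS= read -r n; do "$1" "$n"; done |
        sort | uniq -c | sort -rn | head -n 1 | awk '{print $1}'
}
```

With these names, largest_bucket name_bucket puts all 200 files into the single t/ex directory, whereas largest_bucket hash_bucket should spread them over many of the 256 hash directories.
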
> Why is having a max ~24k files in a directory a bad idea? Modern
> filesystems are more than capable of handling that.
>
> - ext4: unlimited files in a directory
> - xfs: virtually unlimited (hard limit of 2^64-1 total files per volume)
> - ntfs: 4,294,967,295
>
> And 24k is a bit more than 1/3rd of all distfiles that we currently have.

For the same reason having ~60k files in a directory was a problem.
There is really no point in changing anything if you change BIG_NUMBER
to SMALLER_BIG_NUMBER.

> Under which scenario do you wind up with 24k files in a single directory? I
> consider the tex package an outlier in this case (one package should not be
> the sole dictator of policy).

Three versions of TeXLive living simultaneously. If one package falls
completely out of bounds, no problem is solved by the change, so what's
the point of making it?

--
Best regards,
Michał Górny