Gentoo Archives: gentoo-dev

From: Joshua Kinard <kumba@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] New distfile mirror layout
Date: Sun, 20 Oct 2019 20:58:03
Message-Id: c5bcd085-b040-6ba0-3867-6eedff6fc8f3@gentoo.org
In Reply to: Re: [gentoo-dev] New distfile mirror layout by "Michał Górny"
On 10/20/2019 05:44, Michał Górny wrote:
> On Sun, 2019-10-20 at 05:21 -0400, Joshua Kinard wrote:
>> On 10/20/2019 04:32, Michał Górny wrote:
>>> On Sun, 2019-10-20 at 04:25 -0400, Joshua Kinard wrote:
>>>> Why is having a max ~24k files in a directory a bad idea? Modern
>>>> filesystems are more than capable of handling that.
>>>>
>>>> - ext4: unlimited files in a directory
>>>> - xfs: virtually unlimited (hard limit of 2^64-1 total files per volume)
>>>> - ntfs: 4,294,967,295
>>>>
>>>> And 24k is a bit more than 1/3rd of all distfiles that we currently have.
>>>
>>> For the same reason having ~60k files in a directory was a problem.
>>> There is really no point in changing anything if you change BIG_NUMBER
>>> to SMALLER_BIG_NUMBER.
>>
>> That doesn't answer my question. Why is it a problem? What criteria are
>> you using to decide that 24k is a "smaller big number"? Is there some issue
>> highlighted by the mirror admins where having 24k files in a single
>> directory offers no significant relief versus the current 60k files?
>
> IIRC Robin set the goal as:
>
> | the number of files in a single directory should not exceed 1000, [1]
>
> I don't recall how that number was chosen but it's probably pretty
> arbitrary. In any case, I can notice the difference between working
> with a listing of 1k files and 24k files, on the hardware running
> masterdist.
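As a rough check on what Robin's 1000-file cap implies, the Python sketch below computes how many leading hex digits a hash-style directory split would need for a given total. It assumes files spread evenly across directories (an idealization; real directories will vary around the average), and the function name `hex_digits_needed` is hypothetical, not part of any Gentoo tool.

```python
import math


def hex_digits_needed(total_files: int, max_per_dir: int) -> int:
    """Smallest number d of leading hex digits such that 16**d directories
    keep the *average* directory at or below max_per_dir files.
    Assumes an even spread (e.g. from a hash); real directories vary."""
    if total_files <= max_per_dir:
        return 0  # everything fits in a single directory
    dirs_needed = math.ceil(total_files / max_per_dir)
    digits = 0
    while 16 ** digits < dirs_needed:
        digits += 1
    return digits


# Totals taken from this thread: ~24k and ~60k distfiles.
for total in (24_000, 60_000):
    d = hex_digits_needed(total, 1000)
    print(f"{total} files -> {d} hex digit(s), {16 ** d} dirs, "
          f"~{total / 16 ** d:.0f} files/dir on average")
```

Under that even-spread assumption, both the ~24k and ~60k figures fit in 256 directories (two hex digits), averaging roughly 94 and 234 files per directory respectively.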

I think it would be prudent, then, to get some data to help underpin why that
number was chosen and add that to the GLEP, possibly as one of the
references at the bottom. Your personal observations of a system
(masterdist) that few of us have access to are not good enough, especially
for future developers who may revisit this topic long after you or I are gone.
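One low-cost way to start gathering that kind of data is to time directory listings at the sizes discussed in this thread. This is a minimal Python sketch, assuming a local filesystem and empty placeholder files with hypothetical names; results on a warm page cache will not reflect masterdist's hardware or a mirror serving files over the network, so treat the numbers as indicative only.

```python
import os
import tempfile
import time


def time_listing(num_files: int, repeats: int = 3) -> float:
    """Create num_files empty files in a fresh temporary directory and
    return the best-of-`repeats` wall-clock time (seconds) to list the
    directory and stat every entry, roughly what `ls -l` does."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(num_files):
            # Hypothetical distfile-like names; the content is irrelevant.
            open(os.path.join(d, f"distfile-{i:06d}.tar.xz"), "wb").close()
        best = float("inf")
        for _ in range(max(1, repeats)):
            start = time.perf_counter()
            with os.scandir(d) as it:
                for entry in it:
                    entry.stat()  # one stat per entry, like a long listing
            best = min(best, time.perf_counter() - start)
        return best


if __name__ == "__main__":
    # Sizes taken from this thread: the 1000-file goal, ~24k, and ~60k.
    for n in (1_000, 24_000, 60_000):
        print(f"{n:>6} files: {time_listing(n) * 1000:.1f} ms")
```

Taking the best of several runs reduces noise from other activity on the machine; a fuller study would also test cold caches and the mirroring tools actually in use.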


>
>>>> Under which scenario do you wind up with 24k files in a single directory? I
>>>> consider the tex package an outlier in this case (one package should not be
>>>> the sole dictator of policy).
>>>
>>> Three versions of TeXLive living simultaneously. If one package falls
>>> completely out of bounds, no problem is solved by the change, so what's
>>> the point of making it?
>>
>> The problem in this case is with texlive, not our current, or future,
>> distfiles methodology.
>
> Is it? Are you suggesting we should ban upstream from using multiple
> distfiles with similar prefixes? What about other potential packages that
> may suffer from the same problem in the future? Go packages have a good
> potential, given that the majority of them start with 'github.com'.
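The point about shared prefixes can be illustrated by comparing a split on leading filename characters with a split on leading digits of a filename hash. The filenames below are hypothetical, and BLAKE2b is used only because it ships with Python's `hashlib`; this sketch is an assumption for illustration, not a description of the layout the GLEP specifies.

```python
import hashlib
from collections import Counter


def prefix_bucket(name: str, length: int = 2) -> str:
    # Directory chosen by the first `length` characters of the filename.
    return name[:length]


def hash_bucket(name: str, hex_digits: int = 2) -> str:
    # Directory chosen by the leading hex digits of a filename hash
    # (BLAKE2b here is an illustrative choice).
    return hashlib.blake2b(name.encode("utf-8")).hexdigest()[:hex_digits]


def largest_bucket(names, bucket_fn) -> int:
    """Number of files in the fullest directory under bucket_fn."""
    return max(Counter(bucket_fn(n) for n in names).values())


# Hypothetical filenames: many Go module archives sharing one prefix,
# plus a handful of unrelated distfiles.
names = [f"github.com-example-mod{i}-v1.0.0.zip" for i in range(5000)]
names += [f"{c}tool-1.0.tar.gz" for c in "abcdefghijklmnopqrstuvwxyz"]

print("prefix split, fullest dir:", largest_bucket(names, prefix_bucket))
print("hash split,   fullest dir:", largest_bucket(names, hash_bucket))
```

All 5000 names sharing the `gi` prefix land in one directory under the prefix split, while the hash split spreads them across up to 256 directories regardless of how upstream names its files.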

Please highlight which of my words imply in any way that I want to ban
something. I simply said texlive's significant number of distfiles is a
problem. That doesn't mean that I want to resolve the problem by banning
it, or future packages that employ that method.

My concern is that out of the tens of thousands of packages we have, we're
allowing ONE package to dictate how we shape a major piece of Gentoo
infrastructure, and I don't feel that the proposed solution seeks to address
it. Rather, it seeks to band-aid it by wrapping the entire distro up like a
mummy.


>> Has anyone looked at how other distros deal with texlive?
>
> Other distros don't mirror original distfiles.

Has thought been given to doing the same? This is arguably a better approach
than mirroring original distfiles in devspace. Doing so would significantly
reduce the infrastructure burden on the project.


>> Has anyone complained or filed a bug to texlive developers
>> upstream about their excessive number of distfiles and the burden it places
>> on distro maintainers?
>
> You believe it to be a problem. Don't expect others to bother upstream
> with your preferences.

Hah. So you consider texlive having 16k+ distfiles to be completely within
operating norms then?

I took a quick look, and it looks like the TeX project has a fairly
comprehensive mirroring system distributed around the world. In fact, it
looks like they emulate Perl's CPAN system with "CTAN":

https://ctan.org/

I don't know the history of the texlive and other associated tex packages in
Gentoo, but my guess is that instead of doing what our Perl packages do, someone
just decided to mirror the CTAN archive directly on the Gentoo distfiles
system. It seems to me that what should actually happen is that we leverage
CTAN itself, much like CPAN, and use their mirroring system instead of
burdening our infrastructure as an unofficial CTAN archive.

I know we've got a ton of Perl packages for the core set of Perl modules,
but doesn't the CPAN eclass also have the capability to auto-generate an
ebuild package for virtually any Perl package distributed via CPAN? Can
that logic be used with the CTAN system in its own eclass, so that we can remove
the 16k+ texlive modules from our mirrors completely? Or at worst, we
might just have to generate ebuilds for texlive modules and treat them as
discrete, installed packages.

--
Joshua Kinard
Gentoo/MIPS
kumba@g.o
rsa6144/5C63F4E3F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us. And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
