On 10/20/2019 05:44, Michał Górny wrote:
> On Sun, 2019-10-20 at 05:21 -0400, Joshua Kinard wrote:
>> On 10/20/2019 04:32, Michał Górny wrote:
>>> On Sun, 2019-10-20 at 04:25 -0400, Joshua Kinard wrote:
>>>> Why is having a max ~24k files in a directory a bad idea? Modern
>>>> filesystems are more than capable of handling that.
>>>>
>>>> - ext4: unlimited files in a directory
>>>> - xfs: virtually unlimited (hard limit of 2^64-1 total files per volume)
>>>> - ntfs: 4,294,967,295
>>>>
>>>> And 24k is a bit more than 1/3rd of all distfiles that we currently have.
>>>
>>> For the same reason having ~60k files in a directory was a problem.
>>> There is really no point in changing anything if you change BIG_NUMBER
>>> to SMALLER_BIG_NUMBER.
>>
>> That doesn't answer my question. Why is it a problem? What criteria are
>> you using to decide that 24k is a "smaller big number"? Is there some issue
>> highlighted by the mirror admins where having 24k files in a single
>> directory offers no significant relief versus the current 60k files?
>
> IIRC Robin set the goal as:
>
> | the number of files in a single directory should not exceed 1000, [1]
>
> I don't recall how that number was chosen but it's probably pretty
> arbitrary. In any case, I can notice the difference between working
> with a listing of 1k files and 24k files, on the hardware running
> masterdist.

I think it would be prudent then to get some data to help underpin why that
number was chosen and add that to the GLEP, possibly as one of the
references at the bottom. Your personal observations of a system
(masterdist) that few of us have access to are not good enough, especially
for future developers who may revisit this topic long after you or I are gone.

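As a rough idea of the kind of data I mean, something like the following
could be run on any box to time a directory listing at 1k, 24k, and 60k
files. This is only a sketch: the file names are made up, the files are
empty, the cache is warm, and a temp directory on a developer machine is not
masterdist, so the numbers would be indicative at best.

```python
import os
import tempfile
import time


def make_files(directory, count):
    """Create `count` empty files in `directory`."""
    for i in range(count):
        # Hypothetical naming scheme; real distfile names vary widely.
        path = os.path.join(directory, f"pkg-{i:06d}.tar.xz")
        with open(path, "w"):
            pass


def time_listing(directory, repeats=5):
    """Return (entry_count, best_seconds) for a listing that stats each entry."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        with os.scandir(directory) as entries:
            # stat() every entry, roughly what `ls -l` has to do.
            count = sum(1 for entry in entries if entry.stat().st_size >= 0)
        best = min(best, time.perf_counter() - start)
    return count, best


def compare(sizes=(1_000, 24_000, 60_000)):
    """Time a listing for each directory size; returns {size: seconds}."""
    results = {}
    for size in sizes:
        with tempfile.TemporaryDirectory() as tmp:
            make_files(tmp, size)
            count, seconds = time_listing(tmp)
            assert count == size
            results[size] = seconds
    return results


if __name__ == "__main__":
    for size, seconds in compare().items():
        print(f"{size:>6} files: {seconds * 1000:.2f} ms per listing")
```

Numbers gathered this way on the actual mirror hardware, with a cold cache
and the real filesystem, are what I would want to see referenced in the GLEP.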
38 |
|
39 |
> |
40 |
>>>> Under which scenario do you wind up with 24k files in a single directory? I |
41 |
>>>> consider the tex package an outlier in this case (one package should not be |
42 |
>>>> the sole dictator of policy). |
43 |
>>> |
44 |
>>> Three versions of TeXLive living simultaneously. If one package falls |
45 |
>>> completely out of bounds, no problem is solved by the change, so what's |
46 |
>>> the point of making it? |
47 |
>> |
48 |
>> The problem in this case is with texlive, not our current, or future, |
49 |
>> distfiles methodology. |
50 |
> |
51 |
> Is it? Are you suggesting we should ban upstream from using multiple |
52 |
> distfiles with similar prefix? What about other potential packages that |
53 |
> may suffer from the same problem in the future? Go packages have a good |
54 |
> potential, given that majority of them starts with 'github.com'. |
55 |
|
56 |
Please highlight which of my words imply in any way that I want to ban |
57 |
something. I simply said texlive's significant number of distfiles is a |
58 |
problem. That doesn't mean that I want to resolve the problem by banning |
59 |
it, or future packages that employ that method. |
60 |
|
61 |
My concern is that out of the tens of thousands of packages we have, we're |
62 |
allowing ONE package to dictate how we shape a major piece of Gentoo |
63 |
infrastructure, and I don't feel that the proposed solution seeks to address |
64 |
it. Rather, it seeks to band-aid it by wrapping the entire distro up like a |
65 |
mummy. |
66 |
|
67 |
|
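To put a number on how much one package family dominates, a quick count of
filenames per leading prefix would do. The sketch below assumes directories
are keyed on the first few characters of the filename, which is my reading
of the "similar prefix" concern above; the prefix lengths and the sample
names are made up for illustration.

```python
from collections import Counter


def bucket_sizes(filenames, prefix_len=1):
    """Count how many filenames share each leading `prefix_len` characters."""
    return Counter(name[:prefix_len].lower() for name in filenames)


def largest_bucket(filenames, prefix_len=1):
    """Return (prefix, count) for the most crowded bucket.

    `filenames` must be non-empty.
    """
    return bucket_sizes(filenames, prefix_len).most_common(1)[0]


if __name__ == "__main__":
    # Made-up sample: one package family with many similarly named files
    # next to a handful of unrelated ones.
    sample = [f"texlive-module-{i}.tar.xz" for i in range(16)]
    sample += ["bash-5.0.tar.gz", "gcc-9.2.0.tar.xz", "zlib-1.2.11.tar.gz"]
    for plen in (1, 2, 8):
        prefix, count = largest_bucket(sample, plen)
        print(f"prefix_len={plen}: largest bucket {prefix!r} "
              f"holds {count} of {len(sample)}")
```

Fed a real distfiles listing, one name per line, this would show whether
texlive really is the lone outlier or whether other packages crowd a bucket
as well.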
>> Has anyone looked at how other distros deal with texlive?
>
> Other distros don't mirror original distfiles.

Has thought been given to doing the same? This is arguably a better approach
than mirroring original distfiles in devspace. This would significantly
reduce the infrastructure burden on the project.


>> Has anyone complained or filed a bug to texlive developers
>> upstream about their excessive amount of distfiles and the burden it places
>> on distro maintainers?
>
> You believe it to be a problem. Don't expect others to bother upstream
> with your preferences.

Hah. So you consider texlive having 16k+ distfiles to be completely within
operating norms then?

I took a quick look, and it looks like the TeX project has a fairly
comprehensive mirroring system distributed around the world. In fact, it
looks like they emulate Perl's CPAN system with "CTAN":

https://ctan.org/

I don't know the history of texlive and the other associated tex packages in
Gentoo, but my guess is that, instead of doing what our Perl packages do,
someone just decided to mirror the CTAN archive directly on the Gentoo
distfiles system. It seems to me that what should actually happen is that we
leverage CTAN itself, much like CPAN, and use their mirroring system instead
of burdening our infrastructure as an unofficial CTAN archive.

I know we've got a ton of Perl packages for the core set of Perl modules,
but doesn't the CPAN eclass also have the capability to auto-generate an
ebuild package for virtually any Perl package distributed via CPAN? Can
that logic be used with the CTAN system in its own eclass, so that we can
remove the 16k+ texlive modules from our mirrors completely? Or, at worst, we
might just have to generate ebuilds for texlive modules and treat them as
discrete, installed packages.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@g.o
rsa6144/5C63F4E3F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us. And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic