Gentoo Archives: gentoo-dev

From:	Rich Freeman <rich0@g.o>
To:	gentoo-dev <gentoo-dev@l.g.o>
Subject:	Re: [gentoo-dev] New distfile mirror layout
Date:	Wed, 23 Oct 2019 01:21:34
Message-Id:	`CAGfcS_=_dVopi1BNNGgvTQNSkM_yLC7ZTUbQ43uPHCwgB=Oxhw@mail.gmail.com`
In Reply to:	Re: [gentoo-dev] New distfile mirror layout by Richard Yao

1	On Mon, Oct 21, 2019 at 12:42 PM Richard Yao <ryao@g.o> wrote:
2	>
3	> Also, another idea is to use a cheap hash function (e.g. fletcher) and just have the mirrors do the hashing behind the scenes. Then we would have the best of both worlds.
4
5	I think something that is getting missed in this discussion is that we
6	don't control all of our mirrors, and they're generally donated
7	resources. Somebody has some webserver, and they stick a Debian
8	mirror in one directory tree, and an Arch one in another, and they're
9	kind enough to give us one too.
10
11	That is why we're seeing odder situations like ntfs and so on being
12	mentioned. They're not necessarily even running Linux, let alone zfs
13	or some other optimized filesystem. And their webserver might be set
14	up to do browsable directory indexes which could perform terribly even
15	if the filesystem itself is fine with direct filename lookups. It
16	doesn't matter if you have hashed b-trees or whatever for filename
17	lookups if you're going to ask the filesystem to give you a list of
18	every file in a large directory - it is going to have to traverse
19	whatever data structure it uses entirely to do so.
20
21	If we want to start putting requirements on hosting a mirror, then
22	we'll end up with fewer mirrors, and with mirrors more is usually
23	better. Ideally a mirror should just be a black box to us - we don't
24	really care what they're running because we don't depend on any mirror
25	individually. Likewise if we negatively impact mirror hosts we'll end
26	up with fewer mirrors. Sure, maybe those hosts have odd
27	configurations, but we're still better off with them than without.
28	That said, we do seem to have a lot of mirrors, so it probably isn't the
29	end of the world if we lose a limited number.
30
31	And there is nothing to say that we can't have some infra mirror set
32	up more for interactive browsing that we don't have people fetch from
33	but which dispenses with all the hashing or which bins by the first
34	letter of the filename/etc. It seems like most of the use cases where
35	hashing is inconvenient are for more casual use.
36
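As a rough illustration of the two binning schemes mentioned above, here is a minimal Python sketch. The function names, the sample filenames, and the choice of BLAKE2b with a two-hex-digit prefix are assumptions made for the example; the message itself does not say which hash or prefix length the mirrors would use.

```python
import hashlib
from collections import Counter


def bin_by_first_letter(filename: str) -> str:
    """Browsing-friendly bin: the lowercased first character of the name."""
    return filename[0].lower()


def bin_by_hash_prefix(filename: str, hex_digits: int = 2) -> str:
    """Hashed bin: leading hex digits of a hash of the filename.

    BLAKE2b and the default prefix length are assumptions for illustration.
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return digest[:hex_digits]


# Hypothetical distfile names, chosen only to show the skew that
# first-letter binning can produce when many names share a prefix.
names = [
    "python-3.8.0.tar.xz",
    "perl-5.30.0.tar.xz",
    "pcre-8.43.tar.bz2",
    "gcc-9.2.0.tar.xz",
]

letter_bins = Counter(bin_by_first_letter(n) for n in names)
hash_bins = Counter(bin_by_hash_prefix(n) for n in names)

print("first-letter bins:", dict(letter_bins))
print("hash-prefix bins: ", dict(hash_bins))
```

The first-letter scheme keeps related files together and is easy to browse by hand, while the hashed scheme aims for evenly sized directories at the cost of needing a tool to work out where a file lives.
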
37	To avoid another reply, people are talking about having utilities that
38	can fetch distfiles using the new scheme. I'd think that "ebuild
39	foo.ebuild fetch" is probably the simplest solution for this. Chances
40	are that you're dealing with SRC_URI strings that have variable
41	substitution in them anyway, so just letting ebuild do the fetching
42	means you're not substituting ${PV} and so on, let alone all the stuff
43	versionator and its ilk do. And of course you can always just fetch
44	from upstream anyway if you do have a clean URI.
45
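To show what the substitution step involves if done by hand, here is a small Python sketch. It assumes a made-up SRC_URI on example.org and handles only the PN, PV, and P variables. Real ebuilds are bash and can compute these values with arbitrary logic, which this sketch does not attempt to model.

```python
from string import Template


def expand_src_uri(src_uri: str, pn: str, pv: str) -> str:
    """Expand ${PN}, ${PV} and ${P} in a SRC_URI-style string.

    Hypothetical helper: any other variable raises KeyError, which is
    the case where letting ebuild do the fetching is the easier route.
    """
    variables = {"PN": pn, "PV": pv, "P": f"{pn}-{pv}"}
    return Template(src_uri).substitute(variables)


# Made-up package name, version, and URL for illustration only.
uri = expand_src_uri("https://example.org/releases/${P}.tar.gz", "foo", "1.2.3")
print(uri)
```

A string that uses anything beyond these three variables fails here with a KeyError, which is roughly the point at which hand expansion stops being worth the effort.
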
46	--
47	Rich
