Gentoo Archives: gentoo-dev

From: Rich Freeman <rich0@g.o>
To: gentoo-dev <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] New distfile mirror layout
Date: Wed, 23 Oct 2019 01:21:34
Message-Id: CAGfcS_=_dVopi1BNNGgvTQNSkM_yLC7ZTUbQ43uPHCwgB=Oxhw@mail.gmail.com
In Reply to: Re: [gentoo-dev] New distfile mirror layout by Richard Yao
On Mon, Oct 21, 2019 at 12:42 PM Richard Yao <ryao@g.o> wrote:
>
> Also, another idea is to use a cheap hash function (e.g. fletcher) and just have the mirrors do the hashing behind the scenes. Then we would have the best of both worlds.

I think something that is getting missed in this discussion is that we
don't control all of our mirrors, and they're generally donated
resources. Somebody has some webserver, and they stick a Debian
mirror in one directory tree, and an Arch one in another, and they're
kind enough to give us one too.

That is why we're seeing odder situations like NTFS and so on being
mentioned. They're not necessarily even running Linux, let alone ZFS
or some other optimized filesystem. And their webserver might be set
up to do browsable directory indexes, which could perform terribly even
if the filesystem itself is fine with direct filename lookups. It
doesn't matter if you have hashed b-trees or whatever for filename
lookups if you're going to ask the filesystem to give you a list of
every file in a large directory - it is going to have to traverse
whatever data structure it uses entirely to do so.

If we want to start putting requirements on hosting a mirror, then
we'll end up with fewer mirrors, and with mirrors more is usually
better. Ideally a mirror should just be a black box to us - we don't
really care what they're running because we don't depend on any mirror
individually. Likewise, if we negatively impact mirror hosts we'll end
up with fewer mirrors. Sure, maybe those hosts have odd
configurations, but we're still better off with them than without.
That said, we do seem to have a lot of mirrors so it probably isn't the
end of the world if we lose a limited number.

And there is nothing to say that we can't have some infra mirror set
up more for interactive browsing that we don't have people fetch from
but which dispenses with all the hashing or which bins by the first
letter of the filename/etc. It seems like most of the use cases where
hashing is inconvenient are for more casual use.
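
To make the two binning schemes concrete, here is a rough sketch of
how each one maps a distfile name to a path. This is only an
illustration, not the actual mirror layout: the function names, the
use of BLAKE2b, and the two-hex-digit bucket width are all assumptions
picked for the example.

```python
import hashlib


def hashed_path(filename: str, hex_digits: int = 2) -> str:
    """Bucket by the leading hex digits of a hash of the filename.

    Assumption: BLAKE2b and a 2-digit bucket are illustrative choices,
    not something stated in this thread.
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return f"{digest[:hex_digits]}/{filename}"


def first_letter_path(filename: str) -> str:
    """Bucket by the lowercased first character of the filename."""
    return f"{filename[0].lower()}/{filename}"


if __name__ == "__main__":
    # Hypothetical distfile names, used only to show the two mappings.
    for name in ("foo-1.0.tar.gz", "Bar-2.3.tar.xz"):
        print(hashed_path(name), first_letter_path(name))
```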

To avoid another reply, people are talking about having utilities that
can fetch distfiles using the new scheme. I'd think that "ebuild
foo.ebuild fetch" is probably the simplest solution for this. Chances
are that you're dealing with SRC_URI strings that have variable
substitution in them anyway, so just letting ebuild do the fetching
means you're not substituting ${PV} and so on, let alone all the stuff
versionator and its ilk do. And of course you can always just fetch
from upstream anyway if you do have a clean URI.
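
For a sense of what a standalone fetch utility would have to redo by
hand, here is a minimal sketch of expanding ${PV}-style variables in a
SRC_URI string. The helper name, the example.org URL, and the variable
values are made up for the example, and it only handles plain ${VAR}
substitution; a real ebuild is evaluated by bash and can do far more
than this.

```python
from string import Template


def expand_src_uri(src_uri: str, variables: dict) -> str:
    """Expand ${VAR} placeholders in a SRC_URI-like string.

    Hypothetical helper: covers simple substitution only, and raises
    KeyError if the string uses a variable that was not supplied.
    """
    return Template(src_uri).substitute(variables)


if __name__ == "__main__":
    # Made-up package values and URL, for illustration only.
    uri = "https://example.org/dist/${PN}/${PN}-${PV}.tar.gz"
    print(expand_src_uri(uri, {"PN": "foo", "PV": "1.0"}))
```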

--
Rich