On Mon, Oct 21, 2019 at 12:42 PM Richard Yao <ryao@g.o> wrote:
>
> Also, another idea is to use a cheap hash function (e.g. fletcher) and just have the mirrors do the hashing behind the scenes. Then we would have the best of both worlds.
|
I think something that is getting missed in this discussion is that we
don't control all of our mirrors, and they're generally donated
resources. Somebody has some webserver, and they stick a Debian
mirror in one directory tree, and an Arch one in another, and they're
kind enough to give us one too.
|
That is why we're seeing odder situations like ntfs and so on being
mentioned. They're not necessarily even running Linux, let alone zfs
or some other optimized filesystem. And their webserver might be set
up to do browsable directory indexes which could perform terribly even
if the filesystem itself is fine with direct filename lookups. It
doesn't matter if you have hashed b-trees or whatever for filename
lookups if you're going to ask the filesystem to give you a list of
every file in a large directory - it is going to have to traverse
whatever data structure it uses entirely to do so.
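
A rough sketch of that difference, in Python. Everything here (the
function names, the file names, the file count) is made up for
illustration and is not taken from any actual mirror setup. The first
function asks about one known filename; the second does what a
directory index has to do and visits every entry.

```python
import os
import tempfile


def lookup(directory, name):
    """Direct lookup: ask the filesystem about one known filename."""
    return os.path.exists(os.path.join(directory, name))


def count_entries(directory):
    """Full listing: every entry is visited, whatever index the fs has."""
    with os.scandir(directory) as entries:
        return sum(1 for _ in entries)


def demo(n_files=1000):
    """Create n_files empty files, then do one lookup and one listing."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            # Hypothetical distfile names, for illustration only.
            open(os.path.join(d, f"distfile-{i}.tar.gz"), "w").close()
        return lookup(d, "distfile-42.tar.gz"), count_entries(d)


if __name__ == "__main__":
    found, total = demo()
    print(found, total)
```

The lookup touches one name no matter how big the directory is, while
the listing does work proportional to the number of files in it.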
|
If we want to start putting requirements on hosting a mirror, then
we'll end up with fewer mirrors, and with mirrors more is usually
better. Ideally a mirror should just be a black box to us - we don't
really care what they're running because we don't depend on any mirror
individually. Likewise, if we negatively impact mirror hosts we'll end
up with fewer mirrors. Sure, maybe those hosts have odd
configurations, but we're still better off with them than without.
That said, we do seem to have a lot of mirrors so it probably isn't the
end of the world if we lose a limited number.
|
And there is nothing to say that we can't have some infra mirror set
up more for interactive browsing that we don't have people fetch from
but which dispenses with all the hashing or which bins by the first
letter of the filename/etc. It seems like most of the use cases where
hashing is inconvenient are for more casual use.
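
To make the two layouts concrete, here is a minimal sketch in Python.
The choice of BLAKE2b and a two-character hex prefix are assumptions
for the sake of the example, not a statement of what the mirrors
actually use, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def bin_by_first_letter(filename):
    """Browsable layout: the directory is the first letter of the name."""
    return f"{filename[0].lower()}/{filename}"


def bin_by_hash(filename, hex_chars=2):
    """Hashed layout: the directory is a prefix of a hash of the name.

    BLAKE2b and hex_chars=2 are illustrative assumptions.
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return f"{digest[:hex_chars]}/{filename}"


def bucket_sizes(filenames, layout):
    """Count how many files land in each directory under a layout."""
    return Counter(layout(name).split("/", 1)[0] for name in filenames)


if __name__ == "__main__":
    names = ["python-3.8.0.tar.xz", "perl-5.30.0.tar.xz",
             "pcre-8.43.tar.bz2", "zlib-1.2.11.tar.gz"]
    print(bucket_sizes(names, bin_by_first_letter))
    print(bucket_sizes(names, bin_by_hash))
```

First-letter binning is easy for a human to navigate but the buckets
follow whatever letters package names happen to start with; a hash
prefix spreads files more evenly but you can't guess the directory by
eye.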
|
To avoid another reply, people are talking about having utilities that
can fetch distfiles using the new scheme. I'd think that "ebuild
foo.ebuild fetch" is probably the simplest solution for this. Chances
are that you're dealing with SRC_URI strings that have variable
substitution in them anyway, so just letting ebuild do the fetching
means you don't have to substitute ${PV} and so on yourself, let alone
all the stuff versionator and its ilk do. And of course you can always
just fetch from upstream anyway if you do have a clean URI.
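
For a sense of why doing the substitution yourself gets awkward, here
is a small Python sketch of naive SRC_URI expansion. The URLs, package
name and the expand_src_uri helper are all hypothetical; real ebuilds
are bash and can compute variables in ways this does not handle, which
is rather the point.

```python
from string import Template


def expand_src_uri(src_uri, variables):
    """Naively expand ${VAR} placeholders in a SRC_URI-style string.

    Raises KeyError if the string uses a variable we were not given,
    e.g. one that an ebuild or eclass would compute at runtime.
    """
    return Template(src_uri).substitute(variables)


if __name__ == "__main__":
    # Hypothetical package "foo", version 1.2.3.
    known = {"PN": "foo", "PV": "1.2.3", "P": "foo-1.2.3"}
    print(expand_src_uri("https://example.org/dist/${P}.tar.gz", known))
    try:
        # MY_P stands in for a variable derived inside the ebuild.
        expand_src_uri("https://example.org/dist/${MY_P}.tar.gz", known)
    except KeyError as missing:
        print("cannot expand without", missing)
```

The simple case works, but anything the ebuild derives itself is
simply missing, so an external tool ends up reimplementing pieces of
the ebuild environment.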
|
--
Rich |