On 01/28/2018 02:00 PM, Andrew Barchuk wrote:
>> To the contrary, that would not remain balanced, because your
>> boundaries are entirely dependent on exactly what is in the tree at
>> the moment you run your script. Now the package manager has to perform
>> directory listing, sort and find the file name that's closest, open
>> that directory, find the next closest filename (assuming multiple
>> levels of hierarchy), and so on, or you have to store yet another
>> index that duplicates information and takes additional space. Locating
>> by distfile name hash is effectively free.
>
> Sure, the tree won't be perfectly balanced but it will be pretty close.
> E.g. if texlive-* dominates the tree today it will likely continue
> dominating it for another 5 years. Statistical distribution of distfile
> names will likely be changing very slowly.
>
> Doing a binary search through a list of a couple of hundred directories
> is really cheap. I don't see a reason to organize distfiles in a
> multi-level hierarchy: e.g. if the goal is to keep no more than 1000
> files in a folder then the limit of single level hierarchy is a million
> which is more than enough for foreseeable future. The list of 500
> directories takes 15kB when using full file names and will be a couple of
> times smaller when using only unique prefixes.

In order to use that for distfiles mirrors, such that clients could know
where to fetch the files from, you'd need the mirror's http server to
redirect the request to the appropriate location (since the location
would not be predictable from the client side).
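
To illustrate the difference, here is a rough sketch. The directory
layout, the choice of BLAKE2b, the two-character prefix width and both
function names are assumptions made up for this example, not a
description of any existing mirror layout. With a hash, the client can
compute the path from the file name alone; with boundaries, it also
needs the mirror's current boundary list.

```python
import bisect
import hashlib
from typing import Sequence


def hashed_location(distfile: str, width: int = 2) -> str:
    """Hypothetical hash layout: the path depends only on the file name."""
    digest = hashlib.blake2b(distfile.encode("utf-8")).hexdigest()
    return f"{digest[:width]}/{distfile}"


def boundary_location(distfile: str, boundaries: Sequence[str]) -> str:
    """Hypothetical boundary layout: the path depends on the file name AND
    on the sorted list of directory lower bounds currently in use."""
    # Pick the last boundary that sorts at or before the file name.
    index = bisect.bisect_right(boundaries, distfile) - 1
    # Names sorting before the first boundary fall into the first directory.
    return f"{boundaries[max(index, 0)]}/{distfile}"
```

The same file name lands in a different directory as soon as the
boundary list changes, so a client without that list can't build the
URL itself, while the hashed path never changes.
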
--
Thanks,
Zac