> To the contrary, that would not remain balanced, because your
> boundaries are entirely dependent on exactly what is in the tree at
> the moment you run your script. Now the package manager has to perform
> directory listing, sort and find the file name that's closest, open
> that directory, find the next closest filename (assuming multiple
> levels of hierarchy), and so on, or you have to store yet another
> index that duplicates information and takes additional space. Locating
> by distfile name hash is effectively free.

Sure, the tree won't be perfectly balanced, but it will be pretty close.
E.g. if texlive-* dominates the tree today, it will likely continue
dominating it for another 5 years. The statistical distribution of
distfile names will likely change very slowly.

Doing a binary search through a list of a couple of hundred directories
is really cheap. I don't see a reason to organize distfiles in a
multi-level hierarchy: e.g. if the goal is to keep no more than 1000
files in a folder, then the limit of a single-level hierarchy is a
million files (1000 directories of 1000 files each), which is more than
enough for the foreseeable future. A list of 500 directories takes
15 kB when using full file names and will be a couple of times smaller
when using only unique prefixes.
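
To illustrate how cheap the lookup is, here is a rough sketch of it in
Python. It assumes each directory is named after the smallest distfile
name it may hold and that the sorted list of those names is already in
memory. The boundary names and the find_directory helper are made up
for the example; they are not part of any existing tool.

```python
import bisect

# Hypothetical directory boundaries, sorted. Each directory holds every
# distfile whose name sorts at or after its boundary and before the next.
BOUNDARIES = ["a", "gcc-", "linux-", "texlive-", "texlive-m", "xorg-"]


def find_directory(distfile, boundaries=BOUNDARIES):
    """Return the boundary (directory name) that should hold distfile."""
    # bisect_right gives the count of boundaries <= distfile, so the
    # directory we want is the one just before that position.
    i = bisect.bisect_right(boundaries, distfile) - 1
    # Names sorting before the first boundary fall into the first directory.
    return boundaries[max(i, 0)]
```

With a few hundred boundaries this is under ten string comparisons per
lookup, e.g. find_directory("texlive-latex.tar.xz") lands in "texlive-"
and find_directory("texlive-metapost.tar.xz") in "texlive-m".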

---
Andrew