On Sun, Jan 28, 2018 at 2:43 PM, Andrew Barchuk <andrew@×××××××.io> wrote:
> There's another option to use character ranges for each directory
> computed in a way to have the files distributed evenly. One way to do
> that is to use filename prefix of dynamic length so that each range
> holds the same number of files. E.g. we would have Ab/, Ap/, Ar/ but
> texlive-module-te/, texlive-module-th/, texlive-module-ti/. A similar
> but simpler option is to use file names as range bounds (the same way
> dictionaries use words to demarcate page bounds): each directory will
> have a name of the first file located inside. This way files will be
> distributed evenly and it's still easy to pick a correct directory where
> a file will be located manually.
>
> ...snip...
>
> Using the approach above the files will be distributed evenly among the
> directories keeping the possibility to determine the directory for a
> specific file by hand. It's possible if necessary to keep the directory
> structure unchanged for a very long time and it will likely stay
> well-balanced. Picking a directory for a file is very cheap. The only
> obvious downside I see is that it's necessary to know the list of
> directories to pick the correct one (can be mitigated by caching the
> list of directories if important). If it's desirable to make directory
> names shorter or to look less like file names it's fairly easy to
> achieve by keeping only unique prefixes of directories. For example:
|
On the contrary, that would not remain balanced, because your
boundaries depend entirely on exactly what is in the tree at
the moment you run your script. Now the package manager has to perform
a directory listing, sort it, find the filename that's closest, open
that directory, find the next closest filename (assuming multiple
levels of hierarchy), and so on, or you have to store yet another
index that duplicates information and takes additional space. Locating
by distfile name hash is effectively free.
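
To make the difference concrete, here is a rough Python sketch of the two
lookups. It is only an illustration under my own assumptions: the function
names, the choice of BLAKE2b, and the two-character directory width are
hypothetical and not part of either proposal.

```python
import bisect
import hashlib


def dir_by_range(filename, boundaries):
    """Pick a directory under the range scheme.

    `boundaries` is the sorted list of directory names, each named after
    the first file it holds. The caller must obtain it first, either from
    a directory listing or from a cached index.
    """
    # Find the last boundary that sorts at or before the filename.
    i = bisect.bisect_right(boundaries, filename) - 1
    # Names sorting before the first boundary fall into the first directory.
    return boundaries[max(i, 0)]


def dir_by_hash(filename, width=2):
    """Pick a directory under the hash scheme.

    Only the filename is needed. BLAKE2b and width=2 are assumptions
    made for this sketch.
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return digest[:width]


if __name__ == "__main__":
    # Hypothetical boundaries, frozen at the moment they were computed.
    boundaries = ["a-first.tar.gz", "gcc-7.2.0.tar.xz", "perl-5.26.tar.gz"]
    print(dir_by_range("gcc-7.3.0.tar.xz", boundaries))
    print(dir_by_hash("gcc-7.3.0.tar.xz"))
```

The range lookup cannot run without the boundary list, and that list
reflects only the tree as it was when the boundaries were computed. The
hash lookup takes nothing but the filename.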