On Sun, Jan 28, 2018 at 09:30:31PM +0100, Andrew Barchuk wrote:
> Hi everyone,
>
> > three possible solutions for splitting distfiles were listed:
> There's another option to use character ranges for each directory
> computed in a way to have the files distributed evenly. One way to do
> that is to use filename prefix of dynamic length so that each range
> holds the same number of files. E.g. we would have Ab/, Ap/, Ar/ but
> texlive-module-te/, texlive-module-th/, texlive-module-ti/. A similar
> but simpler option is to use file names as range bounds (the same way
> dictionaries use words to demarcate page bounds): each directory will
> have a name of the first file located inside. This way files will be
> distributed evenly and it's still easy to pick a correct directory where
> a file will be located manually.
This was discussed early on, but thank you for the reminder, as it got
dropped from later discussions.
|
> [snip code]
> Using the approach above the files will be distributed evenly among the
> directories keeping the possibility to determine the directory for a
> specific file by hand. It's possible if necessary to keep the directory
> structure unchanged for a very long time and it will likely stay
> well-balanced. Picking a directory for a file is very cheap. The only
> obvious downside I see is that it's necessary to know the list of
> directories to pick the correct one (can be mitigated by caching the
> list of directories if important). If it's desirable to make directory
> names shorter or to look less like file names it's fairly easy to
> achieve by keeping only unique prefixes of directories. For example:
As for the downside you describe: one of the requirements in the
discussion was that, given ONLY the file or filename and NOTHING ELSE,
it must be possible to determine where in the hierarchy it goes. No
prior knowledge about the hierarchy is permitted. Some parties might
answer that you just need an index file then, but that means the index
file has to be kept in sync constantly.
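
To make that requirement concrete, here is a rough Python sketch of the
two kinds of lookup. It is only an illustration: the function names are
made up, the two-hex-digit SHA-256 prefix is just a stand-in for "any
rule that needs the filename alone", and the range lookup is my reading
of the dictionary-style proposal, not the snipped code.

```python
import bisect
import hashlib
from typing import Sequence


def dir_from_name_only(filename: str) -> str:
    """Needs nothing but the filename (hypothetical rule: hash prefix)."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return digest[:2]


def dir_from_ranges(filename: str, bounds: Sequence[str]) -> str:
    """Dictionary-style lookup: needs the sorted list of directory names.

    Each directory is assumed to be named after the first file it holds,
    so the right directory is the last bound that sorts <= filename.
    """
    i = bisect.bisect_right(bounds, filename) - 1
    # Names sorting before the first bound fall into the first directory.
    return bounds[max(i, 0)]
```

The second function cannot be called without `bounds`, which is exactly
the prior knowledge the requirement rules out.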
|
It's a superbly readable result (in the general class of perfect hashes
built from a large, known input set). This class of solution suffers
from another problem in addition to the one you noted: if the input
changes sufficiently, then rebalancing is expensive/hard.
|
As a concrete example, say we add a new category for something with
lots of common prefixes in distfiles: dev-scratch/, for instance, where
all distfiles start with 'scratch-'.
Unless we know up-front that we're going to add a thousand distfiles
there (not unreasonable, dev-python is ~1800 packages), they might start
by going into the 'sc' directory, but later we want them to be in
'scratch', as the tree is unbalanced otherwise.
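
A small Python sketch of that scenario, under assumptions of my own:
directories are plain prefixes, a file goes to the longest directory
name that prefixes it, and the directory lists and distfile names below
are invented for illustration.

```python
def pick_dir(filename, dirs):
    """Return the longest directory name that is a prefix of filename."""
    matches = [d for d in dirs if filename.startswith(d)]
    if not matches:
        raise LookupError(f"no directory covers {filename!r}")
    return max(matches, key=len)


def files_to_move(filenames, old_dirs, new_dirs):
    """List the files whose directory changes between the two layouts."""
    return [f for f in filenames
            if pick_dir(f, old_dirs) != pick_dir(f, new_dirs)]


# Hypothetical layouts before and after splitting 'scratch' out of 'sc'.
old_dirs = ["sa", "sc", "se"]
new_dirs = old_dirs + ["scratch"]

# Hypothetical distfiles; only the 'scratch-' ones are affected.
distfiles = [
    "scons-3.0.1.tar.gz",
    "scratch-foo-1.0.tar.gz",
    "scratch-bar-2.1.tar.gz",
    "sed-4.4.tar.xz",
]

moved = files_to_move(distfiles, old_dirs, new_dirs)
```

Every 'scratch-' file already stored under 'sc' ends up in `moved`, so
the split has to relocate all of them on every mirror.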
|
-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@g.o
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136