Gentoo Archives: gentoo-dev

From: Gordon Pettey <petteyg359@×××××.com>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [pre-GLEP] Split distfile mirror directory structure
Date: Sun, 28 Jan 2018 21:18:00
Message-Id: CAHY5Med=UkgdNy4qFwv8dfM2VakTA3PGexNbHJ8pTn0dj+hVLg@mail.gmail.com
In Reply to: Re: [gentoo-dev] [pre-GLEP] Split distfile mirror directory structure by Andrew Barchuk
On Sun, Jan 28, 2018 at 2:43 PM, Andrew Barchuk <andrew@×××××××.io> wrote:
> There's another option to use character ranges for each directory
> computed in a way to have the files distributed evenly. One way to do
> that is to use filename prefix of dynamic length so that each range
> holds the same number of files. E.g. we would have Ab/, Ap/, Ar/ but
> texlive-module-te/, texlive-module-th/, texlive-module-ti/. A similar
> but simpler option is to use file names as range bounds (the same way
> dictionaries use words to demarcate page bounds): each directory will
> have a name of the first file located inside. This way files will be
> distributed evenly and it's still easy to pick a correct directory where
> a file will be located manually.
>
> ...snip...
>
> Using the approach above the files will be distributed evenly among the
> directories keeping the possibility to determine the directory for a
> specific file by hand. It's possible if necessary to keep the directory
> structure unchanged for a very long time and it will likely stay
> well-balanced. Picking a directory for a file is very cheap. The only
> obvious downside I see is that it's necessary to know the list of
> directories to pick the correct one (can be mitigated by caching the
> list of directories if important). If it's desirable to make directory
> names shorter or to look less like file names it's fairly easy to
> achieve by keeping only unique prefixes of directories. For example:

On the contrary, that would not remain balanced, because your
boundaries are entirely dependent on exactly what is in the tree at
the moment you run your script. Now the package manager has to perform
a directory listing, sort it, find the file name that's closest, open
that directory, find the next closest filename (assuming multiple
levels of hierarchy), and so on; or you have to store yet another
index that duplicates information and takes additional space. Locating
a file by a hash of its distfile name is effectively free.
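
To make the comparison concrete, here is a minimal sketch of the two
lookups. It is only an illustration, not code from either proposal: the
function names, the sample directory list, and the choice of BLAKE2b
with a two-character hex prefix are all assumptions made for the
example.

```python
import bisect
import hashlib


def dir_by_range(filename, sorted_dirs):
    """Range-bound lookup: each directory is named after the first file
    it holds, so the right directory is the last one whose name sorts
    at or before the filename. Requires the sorted directory list."""
    i = bisect.bisect_right(sorted_dirs, filename) - 1
    if i < 0:
        raise ValueError("filename sorts before the first directory")
    return sorted_dirs[i]


def dir_by_hash(filename, width=2):
    """Hash lookup: the directory is a fixed-length prefix of a hash of
    the filename. BLAKE2b and width=2 are assumptions for this sketch."""
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return digest[:width]


# Hypothetical directory list, as it might look after one balancing run.
dirs = sorted(["Ab", "Ap", "Ar", "texlive-module-te", "texlive-module-th"])

range_dir = dir_by_range("Apache-1.0.tar.gz", dirs)  # needs `dirs` first
hash_dir = dir_by_hash("Apache-1.0.tar.gz")          # needs only the name
```

The range lookup cannot run until the client has obtained the directory
list (by listing the mirror or reading a cached index), and its result
changes whenever the boundaries are recomputed. The hash lookup depends
on nothing but the file name.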
