On 12/17/17 14:21, Michał Górny wrote:
> ...

> Rationale
> =========
>
> At this moment, syncing the repository implies fetching 'files'
> directories of all packages, even though the relevant files are used
> only when a ebuild referencing them is being built. This means that our
> users fetch many files that they will never use -- either because they
> don't need the package in question, or because the file belongs
> to an old version.
>
> For example, 'du -h app-shells/bash/files' states 232K while only three
> of those files are used by the newest version, and everything else are
> patches for old versions. And in case of bash, we're keeping those
> versions pretty much 'forever'.
>
> The new policy mostly targets large patchsets and files relevant to old
> package versions. By removing them from the repository, we're hoping to
> reduce the growth of its size a bit and reduce the amount of data
> transferred via rsync.

I'm evaluating transfer size here, since on-disk size is different and the
latter will vary from system to system.

The numbers are interesting:
- Total size of the tree: 224509 KiB #1
- Total size of files in files/: 27809 KiB #2
- Cumulative size of files/ dirs >= 32 KiB: 3289 KiB #2

Some simple math later, we discover that removing _all_ files from
the offending packages would give only a 1.5% reduction in transfer size.
Removing _all_ files/ directories would save 12.4%, or roughly 1/8.

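For anyone who wants to check the arithmetic, here is a minimal sketch that
reproduces the two percentages. The pct helper and the variable names are
made up for this illustration; the three figures are the KiB totals listed
above.

```shell
# pct is a hypothetical helper: prints PART as a percentage of TOTAL,
# rounded to one decimal place. LC_ALL=C keeps the decimal point a dot.
pct() {
    LC_ALL=C awk -v part="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * part / total }'
}

tree_kib=224509        # total size of the tree
files_kib=27809        # all files/ directories
big_files_kib=3289     # files/ directories >= 32 KiB

pct "$big_files_kib" "$tree_kib"   # prints 1.5
pct "$files_kib" "$tree_kib"       # prints 12.4
```
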
I don't have numbers for the past but, if I recall correctly, the
situation is better now than it was 10 years ago.
This goes to show that _some_ policy is _beneficial_ to avoid an explosion
of the repo size.
However, restricting it further would IMO give very little benefit and
(looking at the packages involved) make life harder for no good reason.

It would be interesting instead to evaluate ways to remove _all_ files/
dirs from the tree, keeping ebuilds separate from data.
A different tree for files/ can be seen as a cleaner approach: it would give
all ebuilds the same mechanism to personalize patches & co, and remove the
limits on size (well, not all limits).
Obviously the cost of such an operation is orders of magnitude higher
than putting some policies in place.

#1 obtained with: find * -type f -exec cat {} + | wc -c

#2 list obtained with:
    cd "$PORTDIR" || exit 1
    for files in $(find * -type d -name files) ; do
        # byte count of all regular files under this files/ directory
        size=$(find "${files}" -type f -exec cat {} + | wc -c)
        printf '%d,%s\n' "${size}" "${files%/files}"
    done

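The cumulative figure for the files/ dirs of at least 32 KiB can then be
derived from that list. What follows is only a sketch under assumptions:
sum_large is a made-up helper, sizes.csv is a made-up file name for the saved
loop output, and the threshold is assumed to apply to the byte count of each
files/ directory.

```shell
# sum_large is a hypothetical helper: reads "<bytes>,<package>" lines on
# stdin and prints the total, in whole KiB, of the entries whose size is
# at least MIN_BYTES (first argument, default 32768 = 32 KiB).
sum_large() {
    awk -F, -v min="${1:-32768}" '
        $1 >= min { total += $1 }
        END { printf "%d\n", total / 1024 }
    '
}

# Made-up sample data; with the real list: sum_large < sizes.csv
printf '%s\n' '40960,cat/foo' '1024,cat/bar' '65536,cat/baz' | sum_large   # prints 104
```
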
Best Regards,
Francesco