Gentoo Archives: gentoo-dev

From: Francesco Riosa <vivo75@×××××.com>
To: gentoo-dev@l.g.o, "Michał Górny" <mgorny@g.o>, gentoo-dev-announce <gentoo-dev-announce@l.g.o>
Subject: Re: [gentoo-dev] [QA] New policy: 'files' directory must not be larger than 32 KiB
Date: Mon, 18 Dec 2017 12:45:54
Message-Id: 05aeb982-ece6-31e1-0969-36d1c4f9b6d7@gmail.com
In Reply to: [gentoo-dev] [QA] New policy: 'files' directory must not be larger than 32 KiB by "Michał Górny"
1 On 12/17/17 14:21, Michał Górny wrote:
2 > ...
3
4 > Rationale
5 > =========
6 >
7 > At this moment, syncing the repository implies fetching 'files'
8 > directories of all packages, even though the relevant files are used
9 > only when an ebuild referencing them is being built. This means that our
10 > users fetch many files that they will never use -- either because they
11 > don't need the package in question, or because the file belongs
12 > to an old version.
13 >
14 > For example, 'du -h app-shells/bash/files' states 232K while only three
15 > of those files are used by the newest version, and everything else is
16 > patches for old versions. And in case of bash, we're keeping those
17 > versions pretty much 'forever'.
18 >
19 > The new policy mostly targets large patchsets and files relevant to old
20 > package versions. By removing them from the repository, we're hoping to
21 > reduce the growth of its size a bit and reduce the amount of data
22 > transferred via rsync.
23
24 I am evaluating transfer size here; on-disk size is different and will
25 vary from one system to another.
26
27 The numbers are interesting:
28 - Total size of the tree: 224509 KiB #1
29 - Total size of files in files/: 27809 KiB #2
30 - Cumulative files/ >= 32KiB : 3289 KiB #2
31
32 Some simple math later, we discover that removing _all_ files from
33 the offending packages would give only a 1.5% reduction in transfer size.
34 Removing _all_ files/ directories would save 12.4%, or about 1/8.
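
For anyone who wants to redo the arithmetic, here is a minimal sketch using the three figures quoted above. The helper name `pct` is made up for illustration and is not part of any Gentoo tooling.

```shell
# Figures quoted above, in KiB
total=224509        # whole tree
files_all=27809     # everything under files/
files_big=3289      # files/ directories >= 32 KiB

# pct PART WHOLE: print PART as a percentage of WHOLE, one decimal place
# (hypothetical helper, named here only for this example)
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

echo "files/ >= 32 KiB: $(pct "$files_big" "$total")%"
echo "all files/:       $(pct "$files_all" "$total")%"
```

The second figure comes out just under 12.5%, hence "about 1/8".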
35
36 I don't have numbers for the past but, if I recall correctly, the
37 situation today is better than it was 10 years ago.
38 This is to point out that _some_ policy is _beneficial_ in avoiding an
39 explosion of the repo size.
40 However, restricting it further would IMO give very little benefit and
41 (looking at the packages involved) make life harder for no good reason.
42
43 It would be more interesting to evaluate ways to remove _all_ files/
44 dirs from the tree, keeping ebuilds separate from data.
45 A separate tree for files/ can be seen as a cleaner approach: it would give
46 all ebuilds the same mechanism for customizing patches & co, and remove the
47 size limits (well, not all limits).
48 Obviously the cost of such an operation is orders of magnitude higher
49 than putting some policies in place.
50
51 #1 obtained with: find * -type f -exec cat {} + | wc -c
52
53 #2 list obtained with:
54 cd "$PORTDIR" || exit 1
55 find * -type d -name files | while IFS= read -r files ; do
56     size=$(find "$files" -type f -exec cat {} + | wc -c)
57     printf '%d,%s\n' "$size" "${files%/files}"   # bytes,category/package
58 done
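
The cumulative ">= 32KiB" figure is not produced by the loop itself. One way it could be derived from the list, as a sketch only: the function name `sum_large` and the default threshold of 32768 bytes are my assumptions, not necessarily what was run for the numbers above.

```shell
# sum_large [LIMIT_BYTES]: read "<bytes>,<package>" lines on stdin and print,
# in whole KiB, the combined size of the entries at or above the limit.
# Hypothetical helper; the limit defaults to 32 KiB (32768 bytes).
sum_large() {
    awk -F, -v limit="${1:-32768}" '
        $1 >= limit { total += $1 }            # keep only the large files/ dirs
        END { printf "%d\n", total / 1024 }    # bytes -> KiB, truncated
    '
}
```

Piping the output of the loop above into `sum_large` gives the cumulative size of the offending directories; `sum_large 0` sums every files/ directory instead.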
59
60 Best Regards,
61 Francesco
