Gentoo Archives: gentoo-dev

From: "Gregory M. Turner" <gmt@×××××.us>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Re: Questions about SystemD and OpenRC
Date: Sat, 18 Aug 2012 20:52:16
Message-Id: 50300017.20803@malth.us
In Reply to: Re: [gentoo-dev] Re: Questions about SystemD and OpenRC by Rich Freeman
1 On 8/16/2012 6:26 PM, Rich Freeman wrote:
2 > On Thu, Aug 16, 2012 at 4:05 PM, Michael Mol <mikemol@×××××.com> wrote:
3 >> The limited-visibility build feature discussed a week or so ago would
4 >> go a long way in detecting unexpressed build dependencies.
5
6 [snip]
7
8 > If portage has the
9 > dependency tree in RAM then you just need to dump all the edb listings
10 > for those packages plus @system and feed those into sandbox.
11
12 > That just requires reading a bunch of text files and no searching, so it
13 > should be pretty quick.
14
15 Portage could hypothetically compile such a list while it crawls the
16 package dependency tree, but I suspect the cost will not be as small as
17 you predict.
18
19 > As far as I can tell the relevant calls to
20 > check for read access are already being made in sandbox already, and
21 > obviously they aren't taking forever. We just have to see if the
22 > search gets slow if the access list has tens of thousands of entries
23 > (if it does, that is just a simple matter of optimization, but being
24 > in-RAM I can't see how tens of thousands of entries is going to slow
25 > down a modern CPU even if it is just an unsorted list).
26
27 I appreciate your optimism, but I think you're underestimating the cost.
28 Can't speak for others, but my portage dbs churn too much for comfort
29 as is. Once we start multiplying per-package-dependency iteration by
30 the files-per-package iteration, that's going to be
30 O(packages * files-per-package), i.e. O(a-shit-load).
31
32 Of course, where there's a will there's a way. I'd be surprised if some
33 kind of delayed-evaluation + caching scheme wouldn't suffice, or,
34 barring that, perhaps it's time to create an indexed-database-based
35 drop-in replacement for the current portage db code.
36
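To make the indexed-database idea a bit more concrete, here is a minimal sketch using Python's bundled sqlite3 module. It is only an illustration under assumptions: the table name `files`, its columns, and the helpers `build_index`, `owners` and `allowed_paths` are all hypothetical, and none of this is how portage actually stores its data today (the installed-package db is a tree of flat text files).

```python
import sqlite3


def build_index(files_by_atom):
    """Load {atom: [paths]} into an in-memory SQLite table with indexes.

    Hypothetical schema, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT NOT NULL, atom TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO files (path, atom) VALUES (?, ?)",
        (
            (path, atom)
            for atom, paths in files_by_atom.items()
            for path in paths
        ),
    )
    # The indexes are the point: lookups no longer scan every row.
    conn.execute("CREATE INDEX files_by_path ON files (path)")
    conn.execute("CREATE INDEX files_by_atom ON files (atom)")
    conn.commit()
    return conn


def owners(conn, path):
    """Return the atoms that own a path, sorted by name."""
    rows = conn.execute(
        "SELECT atom FROM files WHERE path = ? ORDER BY atom", (path,)
    )
    return [row[0] for row in rows]


def allowed_paths(conn, atoms):
    """Return the sorted, de-duplicated paths owned by any of the atoms."""
    atoms = list(atoms)
    if not atoms:
        return []
    marks = ",".join("?" * len(atoms))
    rows = conn.execute(
        f"SELECT DISTINCT path FROM files WHERE atom IN ({marks}) ORDER BY path",
        atoms,
    )
    return [row[0] for row in rows]
```

With something like this, the per-build access list becomes one indexed query over the atoms in the dependency tree rather than a walk over every file listing. A very large atom list would need to be passed in batches, since SQLite limits the number of bound parameters per statement.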
37 I've enclosed some scripts you may find helpful in looking at the
38 numbers. They are kind-of kludgey (originally intended for
39 in-house-only use and modified for present purposes) but may help shed
40 some light, if they aren't too buggy, that is...
41
42 "dumpworld" slices and dices "emerge -ep" output to provide a list of
43 atoms in the complete dependency tree of a given list of atoms (add
44 '@system' to get the complete tree, dumpworld won't do so).
45
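For readers without the attachment, the slicing step can be approximated in a few lines of Python. This is a rough sketch and not the attached script: the function name `atoms_from_pretend` is made up for illustration, and it assumes uncolored "emerge -ep" output whose merge-list lines start with a bracketed "[ebuild ...]" or "[binary ...]" status field followed by category/package-version.

```python
import re

# Assumed shape of a merge-list line, e.g.:
#   [ebuild   R    ] sys-libs/zlib-1.2.5.1-r2  USE="-static-libs"
# Lines such as "[blocks B ]" and the summary text do not match.
_MERGE_LINE = re.compile(r"^\[(?:ebuild|binary)[^\]]*\]\s+(\S+)")


def atoms_from_pretend(output):
    """Extract sorted, unique category/package-version strings."""
    atoms = set()
    for line in output.splitlines():
        match = _MERGE_LINE.match(line.strip())
        if match:
            # Drop a trailing "::repository" suffix if one is present.
            atoms.add(match.group(1).split("::", 1)[0])
    return sorted(atoms)
```

Feeding it the captured text of "emerge -ep @system chromium" (for example) would give the merged atom list for both arguments, with duplicates removed.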
46 "dumpfiles" operates only on packages installed in the local system
47 (non-installed atoms are silently dropped), and requires/assumes that
48 'emerge -ep world' would not change anything if it is to give accurate
49 information. It takes a list of atoms, transforms them into the
50 complete lists of atoms in their dependency tree via dumpworld, merges
51 the lists together, and finds the number of files associated with each
52 atom in portage. A file owned by more than one package is counted once
53 per owner, since it doesn't track collisions. It also doesn't add
54 '@system' unless you do. By default it emits:
55
56 o A list of package atoms and the files owned by each atom (stderr)
57 o total atoms and files
58 o average filename length
59
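The counting step can be sketched in Python as well. Again this is not the attached script, just an approximation under assumptions: the helper names are hypothetical, the installed-package db is assumed to live in /var/db/pkg with one CONTENTS file per package, and only "obj" and "sym" entries are counted (whether dumpfiles also counts directories is not stated above).

```python
import os


def parse_contents(text):
    """Return the paths of files and symlinks listed in CONTENTS text."""
    paths = []
    for line in text.splitlines():
        kind, _, rest = line.partition(" ")
        if kind == "obj":
            # obj <path> <md5> <mtime>; the path itself may contain spaces.
            paths.append(rest.rsplit(" ", 2)[0])
        elif kind == "sym":
            # sym <path> -> <target> <mtime>
            paths.append(rest.split(" -> ", 1)[0])
        # "dir" entries are skipped (an assumption, see the lead-in).
    return paths


def load_installed(atoms, vdb="/var/db/pkg"):
    """Map each installed category/package-version to its file list."""
    result = {}
    for atom in atoms:
        try:
            with open(os.path.join(vdb, atom, "CONTENTS"),
                      encoding="utf-8", errors="replace") as handle:
                result[atom] = parse_contents(handle.read())
        except FileNotFoundError:
            continue  # not installed: silently dropped
    return result


def summarize(files_by_atom):
    """Return (package count, file count, rounded average path length).

    A path owned by two packages is counted twice, as described above.
    """
    total_files = sum(len(paths) for paths in files_by_atom.values())
    total_chars = sum(
        len(path) for paths in files_by_atom.values() for path in paths
    )
    average = round(total_chars / total_files) if total_files else 0
    return len(files_by_atom), total_files, average
```

Note that this still opens and parses one text file per package in the tree, so it shares the cost profile being complained about; it only restates the work, it does not make it cheaper.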
60 What is, perhaps, more discouraging than the numbers it reports is how
61 long it takes to run (note: although I suspect an optimized python
62 implementation could be made to do this faster by a moderate constant
63 factor, I'm not sure if the big-oh performance characteristics can be
64 significantly improved without database structure changes like the ones
65 mentioned above).
66
67 My disturbingly bloated and slow workstation gives these answers (note:
68 here it's even slower because it's running in an emulator):
69
70 greg@fedora64vmw ~ $ time bash -c 'dumpfiles @system 2>/dev/null'
71 TOTAL: 402967 files (in 816 ebuilds, average path length: 66)
72
73
74 real 15m33.719s
75 user 13m18.909s
76 sys 2m8.436s
77 greg@fedora64vmw ~ $ time bash -c 'dumpfiles chromium 2>/dev/null'
78 TOTAL: 401300 files (in 807 ebuilds, average path length: 66)
79
80
81 real 15m28.900s
82 user 13m15.126s
83 sys 2m8.088s
84
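A little arithmetic on the @system run puts those totals in perspective. The snippet below uses only the figures printed above; treating one character as one byte is an assumption, and the average path length is itself a rounded number, so the byte total is an estimate.

```python
# Figures copied from the '@system' run above.
total_files = 402967
ebuilds = 816
avg_path_len = 66
elapsed_s = 15 * 60 + 33.719  # "real" wall-clock time

files_per_package = total_files / ebuilds
path_bytes = total_files * avg_path_len      # assumes 1 byte per character
path_mib = path_bytes / (1024 * 1024)
files_per_second = total_files / elapsed_s

print(f"files per package: {files_per_package:.0f}")
print(f"path text:         {path_mib:.1f} MiB")
print(f"throughput:        {files_per_second:.0f} files/s")
```

So the access list itself is on the order of a few hundred files per package and a few tens of MiB of path text in total. Holding that in RAM is not the problem; collecting it at a few hundred files per second is.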
85 My workstation is surely an "outlier" as I have a lot of dependencies
86 and files due to multilib, split-debug, and USE+=$( a lot ). It's also
87 got slow hardware RAID6 and the emulator only gives it 2G of RAM to work
88 with. But I'm a real portage user; I'm sure there are others out
89 there, if not many, with similar constraints.
90
91 -gmt

Attachments

File name MIME type
dumpfiles text/plain
dumpworld text/plain