On 8/16/2012 6:26 PM, Rich Freeman wrote:
> On Thu, Aug 16, 2012 at 4:05 PM, Michael Mol <mikemol@×××××.com> wrote:
>> The limited-visibility build feature discussed a week or so ago would
>> go a long way in detecting unexpressed build dependencies.

[snip]

> If portage has the
> dependency tree in RAM then you just need to dump all the edb listings
> for those packages plus @system and feed those into sandbox.

> That just requires reading a bunch of text files and no searching, so it
> should be pretty quick.

Portage could hypothetically compile such a list while it crawls the
package dependency tree, but I suspect the cost will not be as small as
you predict.

> As far as I can tell the relevant calls to
> check for read access are already being made in sandbox already, and
> obviously they aren't taking forever. We just have to see if the
> search gets slow if the access list has tens of thousands of entries
> (if it does, that is just a simple matter of optimization, but being
> in-RAM I can't see how tens of thousands of entries is going to slow
> down a modern CPU even if it is just an unsorted list).

I appreciate your optimism, but I think you're underestimating the cost.
Can't speak for others, but my portage DBs churn too much for comfort
as is. Once we start multiplying the per-package-dependency iteration by
the files-per-package iteration, that's going to be O(a-shit-load).

Of course, where there's a will there's a way. I'd be surprised if some
kind of delayed-evaluation + caching scheme wouldn't suffice, or,
barring that, perhaps it's time to create an indexed-database-based
drop-in replacement for the current portage db code.

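To make the indexed-database idea a little more concrete, here is a rough
Python sketch. It is not portage code: the table layout and the names
build_index, owners_of and readable_paths are made up for illustration,
and it assumes the (package, path) pairs have already been collected
from somewhere.

```python
import sqlite3


def build_index(entries):
    """Load (package, path) pairs into an in-memory SQLite table.

    The schema is hypothetical; it is not what portage stores today.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contents (package TEXT NOT NULL, path TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO contents (package, path) VALUES (?, ?)", entries
    )
    # With indexes, lookups become B-tree searches instead of full scans.
    conn.execute("CREATE INDEX contents_by_path ON contents (path)")
    conn.execute("CREATE INDEX contents_by_package ON contents (package)")
    conn.commit()
    return conn


def owners_of(conn, path):
    """Return the sorted list of packages that claim `path`."""
    rows = conn.execute(
        "SELECT package FROM contents WHERE path = ? ORDER BY package",
        (path,),
    )
    return [row[0] for row in rows]


def readable_paths(conn, packages):
    """Return the set of paths owned by any package in `packages`.

    One placeholder per package is fine for a few hundred packages;
    a much longer list would need to be queried in batches.
    """
    packages = list(packages)
    if not packages:
        return set()
    marks = ",".join("?" * len(packages))
    rows = conn.execute(
        f"SELECT path FROM contents WHERE package IN ({marks})", packages
    )
    return {row[0] for row in rows}
```

Whether something like this would actually beat reading the flat files
depends on how often the index has to be rebuilt, which I haven't
measured.
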
I've enclosed some scripts you may find helpful in looking at the
numbers. They are kind of kludgy (originally intended for in-house use
only and modified for present purposes) but may help shed some light,
if they aren't too buggy, that is...

"dumpworld" slices and dices "emerge -ep" output to provide a list of
atoms in the complete dependency tree of a given list of atoms (add
'@system' to get the complete tree; dumpworld won't do so for you).

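For anyone who would rather not read the shell, the core of that
slicing can be sketched in a few lines of Python. This is not the
enclosed script: atoms_from_pretend is a made-up name, and the regular
expression assumes the usual "[ebuild ... ] category/package-version"
lines that emerge prints in pretend mode.

```python
import re

# Matches lines such as "[ebuild   R    ] sys-libs/zlib-1.2.7  USE=..."
# and captures the category/package-version token after the bracket.
_MERGE_LINE = re.compile(r"^\[(?:ebuild|binary)[^\]]*\]\s+(\S+)")


def atoms_from_pretend(output):
    """Return the sorted, de-duplicated atoms found in `emerge -ep` output.

    Header lines, blank lines and "[blocks ...]" entries are ignored.
    """
    atoms = set()
    for line in output.splitlines():
        match = _MERGE_LINE.match(line)
        if match:
            atoms.add(match.group(1))
    return sorted(atoms)
```

Feeding it the captured stdout of "emerge -ep @system chromium" should
give roughly the list dumpworld produces, though I haven't compared the
two line for line.
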
"dumpfiles" operates only on packages installed on the local system
(non-installed atoms are silently dropped), and assumes that
'emerge -ep world' would not change anything; otherwise its numbers
won't be accurate. It takes a list of atoms, expands each into the
complete list of atoms in its dependency tree via dumpworld, merges
the lists together, and counts the files that portage records for each
atom. A file owned by more than one atom is counted once per owner,
since the script doesn't track collisions. It also doesn't add '@system'
unless you do. By default it emits:

o A list of package atoms and the files owned by each atom (stderr)
o Total atoms and files
o Average filename length

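The counting step can be sketched in Python as well. Again this is an
illustration, not the enclosed script: parse_contents and summarize are
made-up names, and I'm assuming the input is the text of each package's
CONTENTS file under /var/db/pkg, with "obj", "sym" and "dir" entries.
Counting only obj and sym entries as files is my choice here.

```python
def parse_contents(text):
    """Return the paths of regular files and symlinks in CONTENTS text."""
    paths = []
    for line in text.splitlines():
        kind, _, rest = line.partition(" ")
        if kind == "obj":
            # obj <path> <md5> <mtime>; the path itself may contain spaces
            paths.append(rest.rsplit(" ", 2)[0])
        elif kind == "sym":
            # sym <path> -> <target> <mtime>
            paths.append(rest.split(" -> ", 1)[0])
        # dir entries and anything unrecognised are not counted as files
    return paths


def summarize(contents_by_atom):
    """Return (files per atom, total files, average path length).

    `contents_by_atom` maps an atom to the text of its CONTENTS file.
    A path owned by several atoms is counted once per owner.
    """
    per_atom = {
        atom: parse_contents(text) for atom, text in contents_by_atom.items()
    }
    counts = {atom: len(paths) for atom, paths in per_atom.items()}
    all_paths = [path for paths in per_atom.values() for path in paths]
    total = len(all_paths)
    average = round(sum(len(path) for path in all_paths) / total) if total else 0
    return counts, total, average
```

Note that this is still one pass over every file of every package in the
tree, so it does nothing about the packages-times-files cost; it only
shows what is being counted.
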
What is, perhaps, more discouraging than the numbers it reports is how
long it takes to run. (Note: although I suspect an optimized Python
implementation could do this faster by a moderate constant factor, I'm
not sure the big-O performance characteristics can be significantly
improved without database structure changes like the ones mentioned
above.)

My disturbingly bloated and slow workstation gives these answers (note:
here it's even slower because it's running in an emulator):

greg@fedora64vmw ~ $ time bash -c 'dumpfiles @system 2>/dev/null'
TOTAL: 402967 files (in 816 ebuilds, average path length: 66)

real 15m33.719s
user 13m18.909s
sys 2m8.436s
greg@fedora64vmw ~ $ time bash -c 'dumpfiles chromium 2>/dev/null'
TOTAL: 401300 files (in 807 ebuilds, average path length: 66)

real 15m28.900s
user 13m15.126s
sys 2m8.088s

My workstation is surely an "outlier", as I have a lot of dependencies
and files due to multilib, split-debug, and USE+=$( a lot ). It's also
got slow hardware RAID6, and the emulator only gives it 2G of RAM to
work with. But I'm a real portage user; I'm sure there are others out
there, if not many, with similar constraints.

-gmt