Gentoo Archives: gentoo-portage-dev

From: Brian Harring <ferringb@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Cache rewrite backport
Date: Wed, 12 Oct 2005 04:07:03
Message-Id: 20051012040631.GB8851@nightcrawler
In Reply to: Re: [gentoo-portage-dev] Cache rewrite backport by Bastian Balthazar Bux
1 On Wed, Oct 12, 2005 at 03:49:44AM +0200, Bastian Balthazar Bux wrote:
2 > Brian Harring wrote:
3 > > On Wed, Oct 12, 2005 at 12:01:12AM +0200, Bastian Balthazar Bux wrote:
4 > >
5 > >>Sorry, but here the results are not those expected:
6 > >
7 > > .51.22 vs .53_rc5... try with a vanilla .53_rc5 please
8 >
9 > Here they are; I also added a test with a dirty trick to preload the
10 > portage dir and see what happens. Looks like there is a small improvement.
11 > Now it's late.
12 >
13 > ==== time emerge --metadata; 1st run; 2.0.53_rc5 vanilla
14 > real 9m44.449s
15 > user 4m51.034s
16 > sys 0m24.754s
17 >
18 > ==== time emerge --metadata; 2nd run; 2.0.53_rc5 vanilla
19 > real 2m50.932s
20 > user 0m12.597s
21 > sys 0m3.836s
22 >
23 > ==== time emerge --metadata; 3rd run; 2.0.53_rc5 vanilla
24 > real 1m55.445s
25 > user 0m12.501s
26 > sys 0m3.416s
27 >
28 > ==== tar -c /usr/portage/* >/dev/null & time emerge --metadata
29 > ==== ; 4th run; 2.0.53_rc5 vanilla
30 > real 1m10.275s
31 > user 0m13.377s
32 > sys 0m4.740s
33 >
34 >
35 > ==== time emerge --metadata; 1st run; 2.0.53_rc5 patched
36 > real 4m30.186s
37 > user 0m12.757s
38 > sys 0m9.921s
39 >
40 > ==== time emerge --metadata; 2nd run; 2.0.53_rc5 patched
41 > real 4m41.021s
42 > user 0m12.597s
43 > sys 0m9.297s
44 >
45 > ==== time emerge --metadata; 3rd run; 2.0.53_rc5 patched
46 > real 4m44.544s
47 > user 0m12.521s
48 > sys 0m9.457s
49 >
50 > ==== tar -c /usr/portage/* >/dev/null & time emerge --metadata
51 > ==== ; 4th run; 2.0.53_rc5 patched
52 > real 4m12.131s
53 > user 0m13.661s
54 > sys 0m10.329s
55 >
56 > >
57 > >
58 > >
59 > >>==== time emerge --metadata; 1st run; 2.0.51.22-r3
60 > >>real 2m24.419s
61 > >>user 0m12.329s
62 > >>sys 0m3.644s
63 > >>
64 > >>==== time emerge --metadata; 2nd run; 2.0.51.22-r3
65 > >>real 1m17.700s
66 > >>user 0m12.257s
67 > >>sys 0m2.976s
68 > >>
69 >
70 > [snip]
71 > the 2.0.51.22-r3 ones are still much faster on "real"; please shed
72 > some light on my ignorance
73 The cache had to have been mostly full already for the .51.22-r3 runs;
74 note the 4m51 of user time in the first .53_rc5 run. .51.22-r3 would show
75 the same if its cache were invalid (going from the cache rewrite patch to
75 vanilla .53_rc5 invalidates the local cache).
76
77 So... pretty much I'm ignoring the first .53_rc5 run; the 2nd/3rd match up
78 somewhat with .51.22-r3. The main difference that comes to mind is that
79 .53_rc5's --metadata code had a collection of extra checks/steps
80 thrown in to protect against a lot of annoying tracebacks that were
81 rearing their heads, and EAPI was added, which would result in
82 rewriting the cache entry on the fly.
83
84 I don't think that's the cause, though, since there's no matching increase
85 in user time; the difference in sys pretty much points at some extra IO
86 occurring somewhere.
87
88
89 > > Meanwhile, thanks for testing; contrary to other results, but _any_
90 > > regression I'm after.
91 > > ~harring
92 >
93 > No, thank you for working on this; every time I've tried to dive into
94 > portage I needed a few days in hospital.
95
96 I'd suggest hitting the Jim Beam personally. Replace the pounding
97 headache with something you at least control... ;)
98
99 > look also at this additional try:
100 >
101 > ==== cp -a cache/* /dev/shm/
102 > ==== mount -obind /dev/shm /usr/portage/metadata/cache/
103 > ==== tar -c /usr/portage/* >/dev/null & time emerge --metadata
104 > ==== ; Nth run; 2.0.53_rc5 patched
105 >
106 > real 3m43.653s
107 > user 0m12.937s
108 > sys 0m9.817s
109
110 The copy instead of update accounts for that, which shouldn't occur
111 with the experimental-4 patch in the other email.
112
113 > IMHO the "real" time of an emerge --metadata could be improved in
114 > two ways:
115 > 1) preload as much data as possible (stats included) from disk before
116 > parsing it
117 > 2) separate disk reads from disk writes (i.e. many disk reads followed
118 > by many disk writes followed ...)
119
120 I actually tried threading the bugger a while back... the improvement
121 wasn't quite what I was hoping for, partially due to hitting issues
122 with the global interpreter lock in python.
123 That said, I could attempt it again; the code for it is mostly
124 straightforward.
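
For illustration only, the "many reads, then many writes" idea from point 2
above can be sketched without threads at all. This is not portage's cache
code: batch_copy, _flush and the batch_size parameter are hypothetical names,
and the sketch simply copies files from one tree to another in alternating
read-only and write-only phases.

```python
import os


def _flush(paths, src_root, dst_root):
    """Read every file in the batch first, then write them all out."""
    # Read phase: nothing is written while the batch is being loaded.
    blobs = []
    for path in paths:
        with open(path, "rb") as handle:
            blobs.append((os.path.relpath(path, src_root), handle.read()))

    # Write phase: nothing is read while the batch is being stored.
    for rel_path, data in blobs:
        target = os.path.join(dst_root, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(data)
    return len(blobs)


def batch_copy(src_root, dst_root, batch_size=512):
    """Copy a tree in batches; returns the number of files copied.

    batch_size is an arbitrary illustrative default, not a tuned value.
    """
    pending = []
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            pending.append(os.path.join(dirpath, name))
            if len(pending) >= batch_size:
                copied += _flush(pending, src_root, dst_root)
                pending = []
    if pending:
        copied += _flush(pending, src_root, dst_root)
    return copied
```

Whether grouping IO this way actually helps would have to be measured; the
batch bounds memory use, and the kernel's own page cache and write-back may
already be doing much of the same work.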
125
126 > Fetching virtual information from a cache would save thousands of disk reads
127 > # find /var/db/pkg/ -name PROVIDE \
128 > \( -exec echo -n {}\: \; -and -exec cat {} \; \) \
129 > | egrep -v "PROVIDE:$"
130
131 A central cache of providers for the vdb would certainly make
132 python -c'import portage' a helluva lot faster, I'd expect.
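
A minimal sketch of what such a prototype might look like, under stated
assumptions: it walks a vdb-style tree once (the same PROVIDE files the find
command above reads), keeps only tokens that start with "virtual/", and
ignores any USE-conditional syntax. The names collect_providers and
save_providers and the JSON file format are hypothetical, not part of
portage.

```python
import json
import os


def collect_providers(vdb_root):
    """Map each virtual to the installed packages whose PROVIDE names it."""
    providers = {}
    for dirpath, _dirnames, filenames in os.walk(vdb_root):
        if "PROVIDE" not in filenames:
            continue
        with open(os.path.join(dirpath, "PROVIDE")) as handle:
            tokens = handle.read().split()
        # Assumption: only plain "virtual/..." tokens matter; conditionals
        # such as "flag? ( ... )" are skipped rather than evaluated.
        virtuals = [tok for tok in tokens if tok.startswith("virtual/")]
        if not virtuals:
            continue
        # category/package-version, relative to the vdb root
        cpv = os.path.relpath(dirpath, vdb_root).replace(os.sep, "/")
        for virtual in virtuals:
            providers.setdefault(virtual, []).append(cpv)
    # os.walk order is arbitrary, so sort for a stable result.
    for cpvs in providers.values():
        cpvs.sort()
    return providers


def save_providers(providers, cache_path):
    """Write the mapping to a single file so later runs need one read."""
    with open(cache_path, "w") as handle:
        json.dump(providers, handle, indent=1, sort_keys=True)
```

A real version would also need to invalidate or update the file whenever a
package is merged or unmerged; that part is left out here.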
133
134 Any takers to prototype it?
135 ~harring