Re: [gentoo-portage-dev] Cache rewrite backport - gentoo-portage-dev

From:	Brian Harring <ferringb@g.o>
To:	gentoo-portage-dev@l.g.o
Subject:	Re: [gentoo-portage-dev] Cache rewrite backport
Date:	Wed, 12 Oct 2005 04:07:03
Message-Id:	`20051012040631.GB8851@nightcrawler`
In Reply to:	Re: [gentoo-portage-dev] Cache rewrite backport by Bastian Balthazar Bux

On Wed, Oct 12, 2005 at 03:49:44AM +0200, Bastian Balthazar Bux wrote:
> Brian Harring wrote:
> > On Wed, Oct 12, 2005 at 12:01:12AM +0200, Bastian Balthazar Bux wrote:
> >
> >>Sorry, but here the results are not those expected:
> >
> > .51.22 vs .53_rc5... try with a vanilla .53_rc5 please
>
> here they are, also added a test with a dirty trick to precharge the
> portage dir and see what happen. Look like there is a small improvement.
> Now it's late.
>
> ==== time emerge --metadata; 1st run; 2.0.53_rc5 vanilla
> real    9m44.449s
> user    4m51.034s
> sys     0m24.754s
>
> ==== time emerge --metadata; 2nd run; 2.0.53_rc5 vanilla
> real    2m50.932s
> user    0m12.597s
> sys     0m3.836s
>
> ==== time emerge --metadata; 3rd run; 2.0.53_rc5 vanilla
> real    1m55.445s
> user    0m12.501s
> sys     0m3.416s
>
> ==== tar -c /usr/portage/* >/dev/null & time emerge --metadata
> ==== ; 4th run; 2.0.53_rc5 vanilla
> real 1m10.275s
> user 0m13.377s
> sys  0m4.740s
>
>
> ==== time emerge --metadata; 1st run; 2.0.53_rc5 patched
> real 4m30.186s
> user 0m12.757s
> sys  0m9.921s
>
> ==== time emerge --metadata; 2nd run; 2.0.53_rc5 patched
> real 4m41.021s
> user 0m12.597s
> sys  0m9.297s
>
> ==== time emerge --metadata; 3rd run; 2.0.53_rc5 patched
> real 4m44.544s
> user 0m12.521s
> sys  0m9.457s
>
> ==== tar -c /usr/portage/* >/dev/null & time emerge --metadata
> ==== ; 4th run; 2.0.53_rc5 patched
> real 4m12.131s
> user 0m13.661s
> sys  0m10.329s
>
> >
> >
> >
> >>==== time emerge --metadata; 1st run; 2.0.51.22-r3
> >>real    2m24.419s
> >>user    0m12.329s
> >>sys     0m3.644s
> >>
> >>==== time emerge --metadata; 2nd run; 2.0.51.22-r3
> >>real    1m17.700s
> >>user    0m12.257s
> >>sys     0m2.976s
> >>
>
> [snip]
> the 2.0.51.22-r3 ones are still much faster on "real", please shade a
> light into my ignorance

Cache had to have been mostly full already for the .22-r3 runs; note
the 4m51 of user time in the first .53_rc5 run. .22-r3 would display
the same if the cache was invalid (going from the cache rewrite patch
to .53_rc5 vanilla invalidates the local cache).

So... pretty much I'm ignoring the first .53_rc5 run; the 2nd/3rd match
up somewhat with .51.22-r3. Main difference that comes to mind is that
.53_rc5's --metadata code had a collection of extra checks/steps
thrown in to protect against a lot of annoying tracebacks that were
rearing their heads, and EAPI was added, which would result in
rewriting the cache entry on the fly.

Don't think that's the case though, due to no matching user increase;
the difference in sys pretty much points at some extra IO occurring
somewhere.
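
To put rough numbers on the user/sys/real reasoning, here is a minimal Python sketch that converts the quoted `time` output to seconds and subtracts user and sys from real. The helper names (to_seconds, off_cpu_seconds, report) are made up for illustration, and treating "real minus CPU" as time spent waiting (mostly on disk here) is an assumption that only roughly holds for a single-threaded process on an otherwise idle box.

```python
import re

# (real, user, sys) copied from the quoted `time emerge --metadata` runs.
RUNS = {
    "2.0.51.22-r3, 2nd run": ("1m17.700s", "0m12.257s", "0m2.976s"),
    "2.0.53_rc5 vanilla, 3rd run": ("1m55.445s", "0m12.501s", "0m3.416s"),
    "2.0.53_rc5 patched, 3rd run": ("4m44.544s", "0m12.521s", "0m9.457s"),
}

_STAMP = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(stamp):
    """Convert a bash `time` value such as '4m44.544s' to float seconds."""
    match = _STAMP.match(stamp)
    if match is None:
        raise ValueError("unrecognised time value: %r" % (stamp,))
    return int(match.group(1)) * 60 + float(match.group(2))


def off_cpu_seconds(real, user, sys_time):
    """Wall-clock time not covered by user or sys CPU time."""
    return to_seconds(real) - to_seconds(user) - to_seconds(sys_time)


def report(runs=RUNS):
    """Print user, sys and off-CPU seconds for each run."""
    for label, (real, user, sys_time) in runs.items():
        print("%-28s user=%7.3f sys=%6.3f off-cpu=%8.3f" % (
            label, to_seconds(user), to_seconds(sys_time),
            off_cpu_seconds(real, user, sys_time)))


if __name__ == "__main__":
    report()
```

User time is near enough identical across the three runs, so whatever separates them sits in sys and in the off-CPU remainder.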

> > Meanwhile, thanks for testing; contrary to other results, but _any_
> > regression I'm after.
> > ~harring
>
> No, thank you to work on this, every time I've tryed to dive in portage
> I needed some day of hospital.

I'd suggest hitting the Jim Beam personally.  Replace the pounding
headache with something you at least control... ;)

> look also at this additional try:
>
> ==== cp -a cache/* /dev/shm/
> ==== mount -obind /dev/shm /usr/portage/metadata/cache/
> ==== tar -c /usr/portage/* >/dev/null & time emerge --metadata
> ==== ; Nth run; 2.0.53_rc5 patched
>
> real 3m43.653s
> user 0m12.937s
> sys  0m9.817s

The copy instead of update accounts for that, which shouldn't occur
with the experimental-4 patch in the other email.

> IMHO the "real" time of an emerge --metadata could be improved acting in
> two ways:
> 1) preload as much as possible data (stats included) from disk before to
> parse it
> 2) separating disk read  from disk writes (i.e many disk read followed
> by many disk writes followed ...)

I actually tried threading the bugger a while back... the improvement
wasn't quite what I was hoping for, partially due to hitting issues
with the global interpreter lock in python.
That said, could attempt it again; the code for it is mostly
straightforward.
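
For illustration only, the two suggestions above (preload first, keep reads and writes in separate batches) might look something like the sketch below. This is not portage code: process_in_phases, read_file and the transform callback are hypothetical names, it is written against a current Python 3 (concurrent.futures), and it assumes every source file has a unique basename and that the whole batch fits in memory. Blocking file reads release the GIL, so only the read phase is threaded; the parsing step stays in one thread.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Return the raw bytes of one file."""
    with open(path, "rb") as handle:
        return handle.read()


def process_in_phases(src_paths, dest_dir, transform, workers=4):
    """Read everything, transform in memory, then write everything.

    Returns the list of written paths, in the same order as src_paths.
    """
    # Phase 1: reads only. Threads overlap the disk waits; pool.map
    # keeps the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blobs = list(pool.map(read_file, src_paths))

    # Phase 2: CPU-bound work, single-threaded (no GIL contention).
    results = [transform(blob) for blob in blobs]

    # Phase 3: writes only, after all reads have finished.
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for src, data in zip(src_paths, results):
        dest = os.path.join(dest_dir, os.path.basename(src))
        with open(dest, "wb") as handle:
            handle.write(data)
        written.append(dest)
    return written
```

Whether this buys anything over the page-cache warm-up trick quoted earlier would have to be measured; the sketch only shows the shape of the read/parse/write split.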

> Fetching virtual information from a cache save thousand of disk reads
> # find /var/db/pkg/ -name PROVIDE \
>   \( -exec echo -n {}\: \; -and -exec cat {} \; \) \
>   | egrep -v "PROVIDE:$"

a central cache of providers for the vdb would certainly make
python -c'import portage' a helluva lot faster I'd expect.
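
As a very rough sketch of the idea (not portage code), the same walk the find command does can be done once and its result stored in a single file. Everything named here is an assumption for illustration: the functions collect_providers, write_cache and load_cache, the JSON format, and the cache path are all hypothetical; it also assumes each PROVIDE file holds plain whitespace-separated entries and does not handle USE-conditional syntax or cache invalidation when the vdb changes.

```python
import json
import os


def collect_providers(vdb_root):
    """Map each provided virtual to the installed packages providing it.

    Expects the layout <vdb_root>/<category>/<package-version>/PROVIDE.
    """
    providers = {}
    for category in sorted(os.listdir(vdb_root)):
        cat_dir = os.path.join(vdb_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for pkg in sorted(os.listdir(cat_dir)):
            provide_path = os.path.join(cat_dir, pkg, "PROVIDE")
            try:
                with open(provide_path) as handle:
                    tokens = handle.read().split()
            except OSError:
                continue  # no PROVIDE file for this package
            for virtual in tokens:
                providers.setdefault(virtual, []).append(
                    "%s/%s" % (category, pkg))
    return providers


def write_cache(providers, cache_path):
    """Write the mapping to one file, replacing any old copy atomically."""
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(providers, handle, sort_keys=True)
    os.replace(tmp_path, cache_path)


def load_cache(cache_path):
    """One open() and one read instead of a stat/open per installed package."""
    with open(cache_path) as handle:
        return json.load(handle)
```

The hard part is not shown: the cache has to be rewritten (or thrown away) on every merge and unmerge, otherwise it goes stale.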

Any takers to prototype it?
~harring


Gentoo Archives: gentoo-portage-dev