Re: [gentoo-dev] Packages up for grabs - gentoo-dev

From:	Tom Wijsman <TomWij@g.o>
To:	gentoo-dev@l.g.o
Cc:	pacho@g.o
Subject:	Re: [gentoo-dev] Packages up for grabs
Date:	Sun, 16 Jun 2013 18:26:18
Message-Id:	`20130616202324.45cb3262@TOMWIJ-GENTOO`
In Reply to:	Re: [gentoo-dev] Packages up for grabs by Pacho Ramos

1

On Sun, 16 Jun 2013 19:21:38 +0200

2

Pacho Ramos <pacho@g.o> wrote:

3

4

> El dom, 16-06-2013 a las 10:09 -0700, Brian Dolbec escribió:

5

> [...]

6

> > Thank you for considering helping.  I have stayed away form the

7

> > intricate details of package management in the past, but I also do

8

> > not like how long portage is taking now for dep calculations.

9

>

10

> And, cannot that efforts be put in enhancing portage instead?

11

12

To make you see the problems and decisions, I'm going to elaborate a

13

little and would like you to ask yourself some questions.

14

15

Is it possible to reasonable enhance the Portage code to improve dep

16

calculations in a reasonable amount of time?

17

18

Let's take a call graph, demonstrating the amount of calls on the

19

arrows and the amount of ticks spend in the call in the boxes:

20

21

http://i.imgur.com/A93CdNR.png

22

23

Which part do you think is problematic? What can we do to get an

24

improvement in time that you can actually benefit from? Which

25

improvements are reasonable to implement? ...?

26

27

Ignoring that call graph, you could look at what has recently been

28

introduced to increase the amount of time needed to calculate the

29

dependency graph; you don't have to look far.

30

31

http://blogs.gentoo.org/mgorny/2013/05/27/the-pointless-art-of-subslots/

32

33

While I don't want point out the contents of that blog post, the title

34

is relevant; implementing features like subslots on an algorithm that

35

was not written with subslots in mind introduces a lot of inefficiency.

36

37

And it's not just subslots, newer features keep getting added to the

38

dependency graph calculation and it gets slower and slower over time.

39

It feels like revdep-rebuild moved into the dependency calculation!

40

41

A combination of these two above arguments ("where do we start?" and

42

"why try to fix something broken by design?") makes me feel that there

43

is need for a huge refactoring; ask yourself another question, what if

44

these systems were written from scratch with subslots in mind?

45

46

Exactly, if you write a system with the features in mind you can write

47

it much more efficient and don't have to deal with historical decisions

48

that you have made in the past; you can continue writing without having

49

to change half of your code, because you though about it in advance.

50

But well, this is true in the start; some EAPIs later, history repeats!

51

52

So, when we acknowledge that it is useless to waste effort on fixes

53

that are unlikely to have a huge benefit or rewriting something that

54

might as well get replaced some years later; we should instead look for

55

better opportunities to do, there are multiple options:

56

57

 1) Spend an awful lot of time refactoring our well known Portage;

58

    this doesn't only involve moving code around, but also nuking

59

    legacy code you can't deal with (see my earlier response) as well

60

    as writing new code where it is needed. It may sound easy, it isn't.

61

62

 2) Write a system from scratch that is certain to be future proof;

63

    it is designed with the current and future specifications in mind,

64

    not based on specifications that were awesome some time in the past.

65

66

 3) Spend time on learning how pkgcore is implemented, then improve it;

67

    pkgcore can be somewhat seen as a a mix from (1) and (2).

68

69

Then, which option would you pick?

70

71

I'm not saying Portage or the team behind it is bad; it is just a bit

72

at the end of its time, I'm afraid of what the future will do to it.

73

74

For option (1), I've thinked slightly further than to just generate the

75

dependency graph, there are two things that came to mind that might be

76

interesting _if_ we can get it to somehow work:

77

78

 A) Multiple threads

79

80

    I think the way trees work (branches), threads are a huge benefit.

81

82

    Maybe zmedico can elaborate on this again, but there were some

83

    problems that make it not easy to introduce these threads; there

84

    was something regarding a global interpreter lock, but I can't

85

    recall the details and am not that acquainted with Python.

86

87

    Besides that, the threads will have to be organized; so properly

88

    designing the threads, locks and inter-thread interactions might be

89

    an interesting task that could require some effort.

90

91

 B) Additional caching

92

93

    Some of the parts of the dependency graph calculation could benefit

94

    from a bit of caching; implementing caching right might be a tricky

95

    thing, too small cache is useless and too large cache is overhead.

96

97

    Let me take one example part; resolving the USE flags to consider

98

    which packages are dependencies, this happens again and again.

99

100

    For example, let's say you have

101

102

        >=dev-libs/glib-2.33:2

103

        gnome-keyring? ( >=app-crypt/libsecret-0.5 )

104

        introspection? ( >=dev-libs/gobject-introspection-1 )

105

106

    then Portage has to go and turn that into maybe something like

107

108

        >=dev-libs/glib-2.33:2

109

110

    because the user has neither USE flag set; and it is not only the

111

    USE flags the user has set, but also the USE flag in profiles, the

112

    default USE flags, the REQUIRED_USE and sometimes even other USE

113

    flags like "use1? ( use2? ( ATOM ) )". Heh, complexity!

114

115

    So, let's say we want to cache this operation, then we could store

116

    a pair of the following details in the cache:

117

118

    - Checksum of the ebuild.

119

    - USE flags that the user relevant to the ebuild.

120

    - Resulting dependency variables.

121

122

    So, instead of having to compute the whole thing, it simply can

123

    look up the checksum and the USE flags the user has set and

124

    immediately get back the right dependency string. That sounds

125

    awesome, but how well does the cache function?

126

127

    To know that, we would have to look at cache invalidation.

128

129

    - How often does the ebuild checksum change?

130

      --> Not so much, especially not for stable ebuilds.

131

132

    - How often do the users USE flags change for a specific ebuild?

133

      --> Not so much, only when the user explicitly changes it or some

134

      masking happens in the profile which both don't happen too much.

135

136

    That's really great, but now three sad questions:

137

138

    - But how big does this cache grow?

139

      No idea, it requires another study that implements half of this.

140

141

    - But how much does this really speed up?

142

      Hard to tell without trying it.

143

144

    - Erm, what about the USE flags the reverse dependencies force?

145

      Oops, no idea is perfect; can we resolve that?! Heh, no idea...

146

147

You can see that it is not hard to come up with ideas like (A) and (B);

148

but, it is much harder to come up with ideas that actually work, which

149

is why I think we will not see any beneficial improvement to Portage

150

tree soon, unless we are going to drop features and write alternatives.

151

152

Back to the options...

153

154

For option (2) I made a very small start and might continue with this

155

over the vacation; but before I do that, I'm likely going to evaluate

156

option (3) if other people are going to jump in as well, perhaps

157

helping along pkgcore can help me gain knowledge to better write (2)

158

further in the future when pkgcore is found to be past its time.

159

160

Whatever we do, I hope a good educated choice is made...

161

162

Until then, I hope I can continue to reasonably use Portage.

163

164

--

165

With kind regards,

166

167

Tom Wijsman (TomWij)

168

Gentoo Developer

169

170

E-mail address  : TomWij@g.o

171

GPG Public Key  : 6D34E57D

172

GPG Fingerprint : C165 AF18 AB4C 400B C3D2  ABF0 95B2 1FCD 6D34 E57D

Subject	Author
[gentoo-dev] Re: Packages up for grabs	Duncan <1i5t5.duncan@×××.net>
Re: [gentoo-dev] Packages up for grabs	Ciaran McCreesh <ciaran.mccreesh@××××××××××.com>
Re: [gentoo-dev] Packages up for grabs	Zac Medico <zmedico@g.o>

Gentoo Archives: gentoo-dev

Attachments

Replies

1	On Sun, 16 Jun 2013 19:21:38 +0200
2	Pacho Ramos <pacho@g.o> wrote:
3
4	> El dom, 16-06-2013 a las 10:09 -0700, Brian Dolbec escribió:
5	> [...]
6	> > Thank you for considering helping. I have stayed away form the
7	> > intricate details of package management in the past, but I also do
8	> > not like how long portage is taking now for dep calculations.
9	>
10	> And, cannot that efforts be put in enhancing portage instead?
11
12	To make you see the problems and decisions, I'm going to elaborate a
13	little and would like you to ask yourself some questions.
14
15	Is it possible to reasonable enhance the Portage code to improve dep
16	calculations in a reasonable amount of time?
17
18	Let's take a call graph, demonstrating the amount of calls on the
19	arrows and the amount of ticks spend in the call in the boxes:
20
21	http://i.imgur.com/A93CdNR.png
22
23	Which part do you think is problematic? What can we do to get an
24	improvement in time that you can actually benefit from? Which
25	improvements are reasonable to implement? ...?
26
27	Ignoring that call graph, you could look at what has recently been
28	introduced to increase the amount of time needed to calculate the
29	dependency graph; you don't have to look far.
30
31	http://blogs.gentoo.org/mgorny/2013/05/27/the-pointless-art-of-subslots/
32
33	While I don't want point out the contents of that blog post, the title
34	is relevant; implementing features like subslots on an algorithm that
35	was not written with subslots in mind introduces a lot of inefficiency.
36
37	And it's not just subslots, newer features keep getting added to the
38	dependency graph calculation and it gets slower and slower over time.
39	It feels like revdep-rebuild moved into the dependency calculation!
40
41	A combination of these two above arguments ("where do we start?" and
42	"why try to fix something broken by design?") makes me feel that there
43	is need for a huge refactoring; ask yourself another question, what if
44	these systems were written from scratch with subslots in mind?
45
46	Exactly, if you write a system with the features in mind you can write
47	it much more efficient and don't have to deal with historical decisions
48	that you have made in the past; you can continue writing without having
49	to change half of your code, because you though about it in advance.
50	But well, this is true in the start; some EAPIs later, history repeats!
51
52	So, when we acknowledge that it is useless to waste effort on fixes
53	that are unlikely to have a huge benefit or rewriting something that
54	might as well get replaced some years later; we should instead look for
55	better opportunities to do, there are multiple options:
56
57	1) Spend an awful lot of time refactoring our well known Portage;
58	this doesn't only involve moving code around, but also nuking
59	legacy code you can't deal with (see my earlier response) as well
60	as writing new code where it is needed. It may sound easy, it isn't.
61
62	2) Write a system from scratch that is certain to be future proof;
63	it is designed with the current and future specifications in mind,
64	not based on specifications that were awesome some time in the past.
65
66	3) Spend time on learning how pkgcore is implemented, then improve it;
67	pkgcore can be somewhat seen as a a mix from (1) and (2).
68
69	Then, which option would you pick?
70
71	I'm not saying Portage or the team behind it is bad; it is just a bit
72	at the end of its time, I'm afraid of what the future will do to it.
73
74	For option (1), I've thinked slightly further than to just generate the
75	dependency graph, there are two things that came to mind that might be
76	interesting _if_ we can get it to somehow work:
77
78	A) Multiple threads
79
80	I think the way trees work (branches), threads are a huge benefit.
81
82	Maybe zmedico can elaborate on this again, but there were some
83	problems that make it not easy to introduce these threads; there
84	was something regarding a global interpreter lock, but I can't
85	recall the details and am not that acquainted with Python.
86
87	Besides that, the threads will have to be organized; so properly
88	designing the threads, locks and inter-thread interactions might be
89	an interesting task that could require some effort.
90
91	B) Additional caching
92
93	Some of the parts of the dependency graph calculation could benefit
94	from a bit of caching; implementing caching right might be a tricky
95	thing, too small cache is useless and too large cache is overhead.
96
97	Let me take one example part; resolving the USE flags to consider
98	which packages are dependencies, this happens again and again.
99
100	For example, let's say you have
101
102	>=dev-libs/glib-2.33:2
103	gnome-keyring? ( >=app-crypt/libsecret-0.5 )
104	introspection? ( >=dev-libs/gobject-introspection-1 )
105
106	then Portage has to go and turn that into maybe something like
107
108	>=dev-libs/glib-2.33:2
109
110	because the user has neither USE flag set; and it is not only the
111	USE flags the user has set, but also the USE flag in profiles, the
112	default USE flags, the REQUIRED_USE and sometimes even other USE
113	flags like "use1? ( use2? ( ATOM ) )". Heh, complexity!
114
115	So, let's say we want to cache this operation, then we could store
116	a pair of the following details in the cache:
117
118	- Checksum of the ebuild.
119	- USE flags that the user relevant to the ebuild.
120	- Resulting dependency variables.
121
122	So, instead of having to compute the whole thing, it simply can
123	look up the checksum and the USE flags the user has set and
124	immediately get back the right dependency string. That sounds
125	awesome, but how well does the cache function?
126
127	To know that, we would have to look at cache invalidation.
128
129	- How often does the ebuild checksum change?
130	--> Not so much, especially not for stable ebuilds.
131
132	- How often do the users USE flags change for a specific ebuild?
133	--> Not so much, only when the user explicitly changes it or some
134	masking happens in the profile which both don't happen too much.
135
136	That's really great, but now three sad questions:
137
138	- But how big does this cache grow?
139	No idea, it requires another study that implements half of this.
140
141	- But how much does this really speed up?
142	Hard to tell without trying it.
143
144	- Erm, what about the USE flags the reverse dependencies force?
145	Oops, no idea is perfect; can we resolve that?! Heh, no idea...
146
147	You can see that it is not hard to come up with ideas like (A) and (B);
148	but, it is much harder to come up with ideas that actually work, which
149	is why I think we will not see any beneficial improvement to Portage
150	tree soon, unless we are going to drop features and write alternatives.
151
152	Back to the options...
153
154	For option (2) I made a very small start and might continue with this
155	over the vacation; but before I do that, I'm likely going to evaluate
156	option (3) if other people are going to jump in as well, perhaps
157	helping along pkgcore can help me gain knowledge to better write (2)
158	further in the future when pkgcore is found to be past its time.
159
160	Whatever we do, I hope a good educated choice is made...
161
162	Until then, I hope I can continue to reasonably use Portage.
163
164	--
165	With kind regards,
166
167	Tom Wijsman (TomWij)
168	Gentoo Developer
169
170	E-mail address : TomWij@g.o
171	GPG Public Key : 6D34E57D
172	GPG Fingerprint : C165 AF18 AB4C 400B C3D2 ABF0 95B2 1FCD 6D34 E57D